[ https://issues.apache.org/jira/browse/MADLIB-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873720#comment-16873720 ]

Frank McQuillan commented on MADLIB-1351:
-----------------------------------------

(1)
I would suggest we use the more verbose explanation of 'evaluate_every' and 
'perplexity_tol' that we put in the JIRA description above, i.e., 

{code}
evaluate_every : int, optional (default=0)
How often to evaluate perplexity. Set it to 0 or a negative number to disable 
perplexity evaluation during training. Evaluating perplexity can help you 
check convergence during training, but it will also increase total training 
time. Evaluating perplexity at every iteration may increase training time up 
to two-fold.

perplexity_tol : float, optional (default=1e-1)
Perplexity tolerance to stop iterating. Only used when 'evaluate_every' is 
greater than 0.
{code}
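To make the interaction of the two parameters concrete, here is a minimal sketch of a training loop with the proposed stopping criterion. The callables 'run_iteration' and 'compute_perplexity' are hypothetical stand-ins for the actual LDA work, not MADlib functions:

```python
def lda_train_loop(iter_num, evaluate_every, perplexity_tol,
                   run_iteration, compute_perplexity):
    """Sketch of the proposed stopping criterion (illustrative only).

    Runs up to `iter_num` iterations, evaluating perplexity on every
    `evaluate_every`-th iteration (and the last), and stops early when
    the change in perplexity falls below `perplexity_tol`.
    Returns the iteration at which training stopped.
    """
    prev_perplexity = None
    for it in range(1, iter_num + 1):
        run_iteration()
        # Evaluate only on scheduled iterations; evaluate_every <= 0
        # disables perplexity evaluation entirely.
        if evaluate_every > 0 and (it % evaluate_every == 0 or it == iter_num):
            perplexity = compute_perplexity()
            if (prev_perplexity is not None
                    and abs(prev_perplexity - perplexity) < perplexity_tol):
                return it  # converged early on perplexity
            prev_perplexity = perplexity
    return iter_num
```

With 'evaluate_every=2' and 'perplexity_tol=0.1', a run whose perplexity changes by less than 0.1 between the first two evaluations would stop at iteration 4 rather than running all 'iter_num' iterations.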

(2)
For the model output table, 'perplexity' will need to be an array, along with a 
separate array indicating the iterations on which it was calculated.  This is 
similar to what we do in 'madlib_keras_evaluate':

{code}
perplexity DOUBLE PRECISION[]
        Array of perplexity values as per the 'evaluate_every' parameter. For 
example, if 'evaluate_every=5' this would be an array of perplexity values for 
every 5th iteration, plus the last iteration.

perplexity_iters INTEGER[]
        Array indicating the iterations for which perplexity is calculated, as 
derived from the parameters 'iter_num' and 'evaluate_every'.  For example, if 
'iter_num=5' and 'evaluate_every=2', then 'perplexity_iters' value would be 
{2,4,5} indicating that perplexity is computed at iterations 2, 4 and 5 (at the 
end), unless of course it terminated earlier due to 'perplexity_tol'.  If 
'iter_num=5' and 'evaluate_every=1', then 'perplexity_iters' value would be 
{1,2,3,4,5} indicating that perplexity is computed at every iteration, again 
assuming it ran the full number of iterations. 
{code}
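The schedule above is a pure function of 'iter_num' and 'evaluate_every', so it can be derived without reference to the training run itself. A minimal sketch (assuming the run completes all iterations, i.e. no early stop via 'perplexity_tol'):

```python
def perplexity_iters(iter_num, evaluate_every):
    """Iterations on which perplexity would be evaluated, assuming the
    run completes all `iter_num` iterations (illustrative helper, not a
    MADlib function)."""
    if evaluate_every <= 0:
        return []  # perplexity evaluation disabled
    iters = list(range(evaluate_every, iter_num + 1, evaluate_every))
    # The last iteration is always evaluated under the proposed spec.
    if not iters or iters[-1] != iter_num:
        iters.append(iter_num)
    return iters
```

For example, 'perplexity_iters(5, 2)' yields {2,4,5} and 'perplexity_iters(5, 1)' yields {1,2,3,4,5}, matching the cases described above.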



> Add stopping criteria on perplexity to LDA
> ------------------------------------------
>
>                 Key: MADLIB-1351
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1351
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Parallel Latent Dirichlet Allocation
>            Reporter: Frank McQuillan
>            Assignee: Himanshu Pandey
>            Priority: Minor
>             Fix For: v1.17
>
>
> In LDA 
> http://madlib.apache.org/docs/latest/group__grp__lda.html
> make stopping criteria on perplexity rather than just number of iterations.
> Suggested approach is to do what scikit-learn does
> https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html
> evaluate_every : int, optional (default=0)
> How often to evaluate perplexity. Set it to 0 or negative number to not 
> evaluate perplexity in training at all. Evaluating perplexity can help you 
> check convergence in training process, but it will also increase total 
> training time. Evaluating perplexity in every iteration might increase 
> training time up to two-fold.
> perplexity_tol : float, optional (default=1e-1)
> Perplexity tolerance to stop iterating. Only used when evaluate_every is 
> greater than 0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)