[ https://issues.apache.org/jira/browse/MADLIB-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873720#comment-16873720 ]
Frank McQuillan commented on MADLIB-1351:
-----------------------------------------
(1)
I would suggest we use the more verbose explanation of 'evaluate_every' and
'perplexity_tol' that we put in the JIRA description above, i.e.,
{code}
evaluate_every : int, optional (default=0)
    How often to evaluate perplexity. Set it to 0 or negative number to not
    evaluate perplexity in training at all. Evaluating perplexity can help you
    check convergence in training process, but it will also increase total
    training time. Evaluating perplexity in every iteration might increase
    training time up to two-fold.
perplexity_tol : float, optional (default=1e-1)
    Perplexity tolerance to stop iterating. Only used when evaluate_every is
    greater than 0.
{code}
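To make the interaction of the two parameters concrete, here is a minimal Python sketch of a training loop that stops on perplexity. It is an illustration only, not the MADlib implementation: the function name 'run_with_perplexity_stop' and the callables 'step' and 'perplexity' are hypothetical, and the stopping test (absolute change between two consecutive evaluations smaller than 'perplexity_tol') is an assumption, since the text above does not spell out the exact rule.

```python
def run_with_perplexity_stop(step, perplexity, iter_num,
                             evaluate_every=0, perplexity_tol=1e-1):
    """Run up to iter_num iterations, optionally stopping early on perplexity.

    step and perplexity are hypothetical callables: step() performs one
    training iteration, perplexity() returns the current perplexity.
    """
    values, iters = [], []
    previous = None
    for it in range(1, iter_num + 1):
        step()
        is_last = it == iter_num
        # Perplexity is only evaluated when evaluate_every is greater than 0.
        if evaluate_every > 0 and (it % evaluate_every == 0 or is_last):
            current = perplexity()
            values.append(current)
            iters.append(it)
            # Assumed rule: stop once the change between two consecutive
            # evaluations falls below perplexity_tol.
            if previous is not None and abs(previous - current) < perplexity_tol:
                break
            previous = current
    return values, iters
```

With 'evaluate_every=0' the loop never calls 'perplexity()' and always runs 'iter_num' iterations, which matches the "only used when evaluate_every is greater than 0" wording for 'perplexity_tol'.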
(2)
For the model output table, perplexity will need to be an array, along with a
separate array indicating the iterations at which it was calculated. This is
similar to what we do in 'madlib_keras_evaluate':
{code}
perplexity DOUBLE PRECISION[]
    Array of perplexity values as per the 'evaluate_every' parameter. For
    example, if 'evaluate_every=5' this would be an array of perplexity values
    for every 5th iteration, plus the last iteration.
perplexity_iters INTEGER[]
    Array indicating the iterations for which perplexity is calculated, as
    derived from the parameters 'iter_num' and 'evaluate_every'. For example,
    if 'iter_num=5' and 'evaluate_every=2', then 'perplexity_iters' would be
    {2,4,5}, indicating that perplexity is computed at iterations 2 and 4 and
    at the final iteration 5, unless training terminated earlier due to
    'perplexity_tol'. If 'iter_num=5' and 'evaluate_every=1', then
    'perplexity_iters' would be {1,2,3,4,5}, indicating that perplexity is
    computed at every iteration, again assuming it ran the full number of
    iterations.
{code}
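The rule for 'perplexity_iters' described above can be sketched in a few lines of Python. The helper name 'perplexity_iters' is hypothetical and used only for illustration; the sketch assumes training runs the full 'iter_num' iterations, i.e. it is not cut short by 'perplexity_tol'.

```python
def perplexity_iters(iter_num, evaluate_every):
    """Iterations at which perplexity would be computed on a full run."""
    if evaluate_every <= 0 or iter_num <= 0:
        return []  # perplexity evaluation is disabled
    # Every evaluate_every-th iteration...
    iters = list(range(evaluate_every, iter_num + 1, evaluate_every))
    # ...plus the last iteration, if it is not already included.
    if not iters or iters[-1] != iter_num:
        iters.append(iter_num)
    return iters


print(perplexity_iters(5, 2))  # [2, 4, 5]
print(perplexity_iters(5, 1))  # [1, 2, 3, 4, 5]
```

The 'perplexity' array would then hold one value per entry of this list, in the same order.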
> Add stopping criteria on perplexity to LDA
> ------------------------------------------
>
> Key: MADLIB-1351
> URL: https://issues.apache.org/jira/browse/MADLIB-1351
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Module: Parallel Latent Dirichlet Allocation
> Reporter: Frank McQuillan
> Assignee: Himanshu Pandey
> Priority: Minor
> Fix For: v1.17
>
>
> In LDA
> http://madlib.apache.org/docs/latest/group__grp__lda.html
> add a stopping criterion based on perplexity, rather than only the number of
> iterations. The suggested approach is to do what scikit-learn does:
> https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html
> evaluate_every : int, optional (default=0)
>     How often to evaluate perplexity. Set it to 0 or negative number to not
>     evaluate perplexity in training at all. Evaluating perplexity can help you
>     check convergence in training process, but it will also increase total
>     training time. Evaluating perplexity in every iteration might increase
>     training time up to two-fold.
> perplexity_tol : float, optional (default=1e-1)
>     Perplexity tolerance to stop iterating. Only used when evaluate_every is
>     greater than 0.