[ 
https://issues.apache.org/jira/browse/MADLIB-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan updated MADLIB-1351:
------------------------------------
    Description: 
In LDA 
http://madlib.apache.org/docs/latest/group__grp__lda.html
add a stopping criterion based on perplexity, rather than stopping only after 
a fixed number of iterations.

The suggested approach is to follow what scikit-learn does:
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html

evaluate_every : int, optional (default=0)
How often to evaluate perplexity. Set it to 0 or negative number to not 
evaluate perplexity in training at all. Evaluating perplexity can help you 
check convergence in training process, but it will also increase total training 
time. Evaluating perplexity in every iteration might increase training time up 
to two-fold.

perp_tol : float, optional (default=1e-1)
Perplexity tolerance in batch learning. Only used when evaluate_every is 
greater than 0.
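
As a rough illustration of how these two parameters could drive early 
stopping, here is a minimal Python sketch. It is not MADlib code and not a 
copy of scikit-learn's implementation. The function name and the two callbacks 
(run_iteration, compute_perplexity) are hypothetical placeholders for one 
training pass and one perplexity evaluation. The sketch assumes that training 
stops once the absolute change between two consecutive perplexity evaluations 
falls below perp_tol.

```python
from typing import Callable, Optional


def train_with_perplexity_stop(
    run_iteration: Callable[[], None],        # hypothetical: one training pass
    compute_perplexity: Callable[[], float],  # hypothetical: perplexity of current model
    max_iter: int,
    evaluate_every: int = 0,
    perp_tol: float = 1e-1,
) -> int:
    """Run at most max_iter iterations and return how many were run."""
    last_perplexity: Optional[float] = None
    for i in range(1, max_iter + 1):
        run_iteration()
        # evaluate_every <= 0 disables the perplexity check entirely,
        # so the loop falls back to stopping on max_iter alone.
        if evaluate_every > 0 and i % evaluate_every == 0:
            perplexity = compute_perplexity()
            # Assumed criterion: stop when perplexity has changed by less
            # than perp_tol since the previous evaluation.
            if (last_perplexity is not None
                    and abs(last_perplexity - perplexity) < perp_tol):
                return i
            last_perplexity = perplexity
    return max_iter
```

With evaluate_every=2, for example, perplexity would be computed only on 
iterations 2, 4, 6, and so on, which trades some responsiveness of the 
stopping check for lower evaluation overhead.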


  was:
In LDA 
http://madlib.apache.org/docs/latest/group__grp__lda.html
make stopping criteria on perplexity rather than just number of iterations.

Suggested approach is to do what scikit-learn does
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html

evaluate_every : int, optional (default=0)
How often to evaluate perplexity. Only used in fit method. set it to 0 or 
negative number to not evalute perplexity in training at all. Evaluating 
perplexity can help you check convergence in training process, but it will also 
increase total training time. Evaluating perplexity in every iteration might 
increase training time up to two-fold.

perp_tol : float, optional (default=1e-1)
Perplexity tolerance in batch learning. Only used when evaluate_every is 
greater than 0.



> Add stopping criteria on perplexity to LDA
> ------------------------------------------
>
>                 Key: MADLIB-1351
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1351
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Parallel Latent Dirichlet Allocation
>            Reporter: Frank McQuillan
>            Priority: Major
>             Fix For: v1.17



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
