[ https://issues.apache.org/jira/browse/MADLIB-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873498#comment-16873498 ]

Scott Hajek commented on MADLIB-1351:
-------------------------------------

I like the proposed API for the stopping criterion on LDA. I think *evaluate_every* 
and *perp_tol* are sufficient, and the default values are reasonable. I might 
suggest giving the latter a slightly less abbreviated name for clarity, like 
*perplex_tol* or *perplexity_tol*. Brevity isn't as necessary in MADlib, since 
you can't pass arguments by keyword in SQL function calls anyway; the name 
mainly matters for documentation.
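
For illustration, here is a positional call sketch with the two proposed 
parameters appended to the existing lda_train arguments. The placement of the 
new arguments and the values shown are assumptions, not the final API:

{code:sql}
-- Sketch only: assumes the two proposed parameters are appended after the
-- existing lda_train arguments. Note that parameter names never appear at a
-- positional call site, so a longer name costs the user nothing here.
SELECT madlib.lda_train(
    'documents',     -- data_table
    'lda_model',     -- model_table
    'lda_output',    -- output_data_table
    10000,           -- voc_size
    20,              -- topic_num
    100,             -- iter_num, now an upper bound rather than a fixed count
    5.0,             -- alpha
    0.01,            -- beta
    1,               -- evaluate_every (proposed)
    0.1              -- perplexity_tol (proposed)
);
{code}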

Although I guess [named 
notation|https://www.postgresql.org/docs/9.0/sql-syntax-calling-funcs.html] 
will be (or maybe already is) available once Greenplum catches up with later 
versions of Postgres. In that case, maybe a good compromise would be 
*perplex_tol*. 
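
For example, with named notation the longer name documents itself at the call 
site. The "=>" form is PostgreSQL 9.5+ (9.0 through 9.4 used ":="), and the 
argument names below are illustrative; they would have to match the declared 
names of lda_train:

{code:sql}
-- Named-notation sketch; the argument names are assumptions, not the
-- declared signature. With "name => value" the tolerance is self-documenting.
SELECT madlib.lda_train(
    data_table        => 'documents',
    model_table       => 'lda_model',
    output_data_table => 'lda_output',
    voc_size          => 10000,
    topic_num         => 20,
    iter_num          => 100,
    alpha             => 5.0,
    beta              => 0.01,
    evaluate_every    => 1,
    perplex_tol       => 0.1   -- the compromise name suggested above
);
{code}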

> Add stopping criteria on perplexity to LDA
> ------------------------------------------
>
>                 Key: MADLIB-1351
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1351
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Parallel Latent Dirichlet Allocation
>            Reporter: Frank McQuillan
>            Assignee: Himanshu Pandey
>            Priority: Minor
>             Fix For: v1.17
>
>
> In LDA (http://madlib.apache.org/docs/latest/group__grp__lda.html), make the 
> stopping criterion based on perplexity rather than just the number of 
> iterations.
> The suggested approach is to do what scikit-learn does:
> https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html
> evaluate_every : int, optional (default=0)
> How often to evaluate perplexity. Set it to 0 or a negative number to not 
> evaluate perplexity in training at all. Evaluating perplexity can help you 
> check convergence in the training process, but it will also increase total 
> training time. Evaluating perplexity in every iteration might increase 
> training time up to two-fold.
> perp_tol : float, optional (default=1e-1)
> Perplexity tolerance in batch learning. Only used when evaluate_every is 
> greater than 0.
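
To make the quoted semantics concrete, here is a minimal PL/pgSQL sketch of 
the control flow the two parameters imply. The perplexity value is a synthetic 
stand-in so the block runs on its own; this is not MADlib's implementation:

{code:sql}
DO $$
DECLARE
    iter_num       integer := 100;           -- upper bound on iterations
    evaluate_every integer := 1;             -- check perplexity every iteration
    perp_tol       double precision := 0.1;  -- stop once the drop falls below this
    last_perp      double precision := 'Infinity'::double precision;
    perp           double precision;
BEGIN
    FOR i IN 1 .. iter_num LOOP
        -- One Gibbs-sampling iteration over the corpus would run here.
        IF evaluate_every > 0 AND i % evaluate_every = 0 THEN
            perp := 100.0 / i;               -- synthetic, monotonically falling
            IF last_perp - perp < perp_tol THEN
                RAISE NOTICE 'converged after % iterations', i;
                EXIT;                        -- stop early on small improvement
            END IF;
            last_perp := perp;
        END IF;
    END LOOP;
END;
$$;
{code}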


