[https://issues.apache.org/jira/browse/SPARK-29496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Chris Nardi updated SPARK-29496:
--------------------------------
    Description: In gensim, [the LDA 
model][[https://radimrehurek.com/gensim/models/ldamodel.html]] has a parameter 
eval_every that allows a user to specify that the model should be evaluated 
every X iterations to determine its log perplexity. This helps determine 
whether the model has converged and whether an appropriate number of 
iterations has been chosen. Spark has no similar functionality in its 
implementation of LDA. This should be added, as currently the only way to 
achieve this appears to be to train models with varying numbers of iterations 
and evaluate each model's log perplexity.  (was: In gensim, [the LDA 
model|[https://radimrehurek.com/gensim/models/ldamodel.html]] has a parameter 
eval_every that allows a user to specify that the model should be evaluated 
every X iterations to determine its log perplexity. This helps determine 
whether the model has converged and whether an appropriate number of 
iterations has been chosen. Spark has no similar functionality in its 
implementation of LDA. This should be added, as currently the only way to 
achieve this appears to be to train models with varying numbers of iterations 
and evaluate each model's log perplexity.)
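
For reference, the quantity such a periodic evaluation would track is the per-token log perplexity of a document set D, where w_d are the tokens of document d and N_d is their count. Spark's existing LDAModel.logPerplexity reports this value after training; because Spark substitutes a variational lower bound for log p(w_d), the number it returns is an upper bound (lower is better):

$$
\log \mathrm{perplexity}(D) = -\frac{\sum_{d \in D} \log p(\mathbf{w}_d)}{\sum_{d \in D} N_d}
$$

Evaluating this every X iterations would show where the curve flattens, i.e. where further iterations stop improving the fit.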

> Add ability to estimate perplexity every X iterations for LDA
> -------------------------------------------------------------
>
>                 Key: SPARK-29496
>                 URL: https://issues.apache.org/jira/browse/SPARK-29496
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.4.4
>            Reporter: Chris Nardi
>            Priority: Major
>
> In gensim, [the LDA 
> model][[https://radimrehurek.com/gensim/models/ldamodel.html]] has a 
> parameter eval_every that allows a user to specify that the model should be 
> evaluated every X iterations to determine its log perplexity. This helps 
> determine whether the model has converged and whether an appropriate number 
> of iterations has been chosen. Spark has no similar functionality in its 
> implementation of LDA. This should be added, as currently the only way to 
> achieve this appears to be to train models with varying numbers of 
> iterations and evaluate each model's log perplexity.
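
To make the request concrete, here is a minimal, self-contained sketch in plain Python of what an eval_every-style option does. It is not Spark or gensim code: it uses a toy collapsed Gibbs sampler (Spark's LDA instead offers the "em" and "online" optimizers) purely so the example runs on its own, and the functions train_lda and log_perplexity and their parameters are hypothetical. Every eval_every iterations it records the training-set log perplexity (negative average per-token log-likelihood under the current smoothed topic estimates), so convergence can be read from a single run instead of from several separately trained models.

```python
import math
import random


def log_perplexity(docs, n_dk, n_kw, n_k, alpha, beta):
    """Negative average per-token log-likelihood of `docs` under the
    smoothed topic estimates implied by the current count tables."""
    num_topics = len(n_k)
    vocab_size = len(n_kw[0])
    log_lik = 0.0
    num_tokens = 0
    for d, doc in enumerate(docs):
        # Denominator of the smoothed document-topic proportions theta_d.
        theta_denom = len(doc) + num_topics * alpha
        for w in doc:
            # p(w | d) = sum_t theta_dt * phi_tw
            p = sum(
                (n_dk[d][t] + alpha) / theta_denom
                * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                for t in range(num_topics)
            )
            log_lik += math.log(p)
            num_tokens += 1
    return -log_lik / num_tokens


def train_lda(docs, num_topics, vocab_size, iterations, eval_every,
              alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (hypothetical helper).

    Returns (iteration, log_perplexity) pairs recorded every
    `eval_every` iterations, mimicking a gensim-style eval_every option.
    """
    if eval_every < 1:
        raise ValueError("eval_every must be >= 1")
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # doc-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic
    z = []                                                # topic of each token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)  # random initial assignment
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    history = []
    for it in range(1, iterations + 1):
        # One sweep: resample the topic of every token from its conditional.
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
        # Periodic evaluation: the behaviour requested in this issue.
        if it % eval_every == 0:
            history.append(
                (it, log_perplexity(docs, n_dk, n_kw, n_k, alpha, beta)))
    return history


if __name__ == "__main__":
    # Tiny corpus over word ids 0..5 with two obvious themes.
    docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 1], [5, 4, 3, 5]]
    for it, lp in train_lda(docs, num_topics=2, vocab_size=6,
                            iterations=50, eval_every=10):
        print(f"iteration {it}: log perplexity {lp:.3f}")
```

Lower values are better; once successive values stop decreasing meaningfully, additional iterations are unlikely to help, which is the signal a single training run with periodic evaluation would provide.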



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
