kaknikhil commented on a change in pull request #432: MADLIB-1351: Added stopping criteria on perplexity to LDA
URL: https://github.com/apache/madlib/pull/432#discussion_r320471060
 
 

 ##########
 File path: src/ports/postgres/modules/lda/lda.py_in
 ##########
 @@ -191,7 +199,32 @@ class LDATrainer:
         # etime = time.time()
         # plpy.notice('\t\ttime elapsed: %.2f seconds' % (etime - stime))
 
+        # JIRA: MADLIB-1351
+        # Calculate perplexity every `evaluate_every` iterations.
+        # Skip the calculation at the first iteration because the model
+        # generated at the first iteration is a random model.
 
 Review comment:
   I think we should be more verbose in this comment. Something like (but definitely not limited to):
   ```
   For each iteration:
   1. The model table is updated (for the first iteration it is the random model; for iterations > 1 it is the model learnt in the previous iteration).
   2. __lda_count_topic_agg is called.
   3. lda_gibbs_sample is then called, which learns and updates the model (the updated model is not passed back to Python; the learnt model only shows up in the next iteration).

   Because of this workflow we can safely ignore the first perplexity value.
   ```
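
   For reference, a minimal plain-Python sketch of the workflow above. Every name here (`update_model_table`, `count_topic_agg`, `gibbs_sample`, `compute_perplexity`, `perplexity_tol`) is a hypothetical stand-in rather than the actual lda.py_in API; the point is only that the perplexity from the first iteration reflects the random initial model and can be skipped.
   ```python
   def train_lda(num_iterations, evaluate_every, perplexity_tol,
                 update_model_table, count_topic_agg, gibbs_sample,
                 compute_perplexity):
       """Illustrative sketch only; not the real LDATrainer code."""
       prev_perplexity = None
       for it in range(1, num_iterations + 1):
           # 1. Model table is updated (iteration 1: random model;
           #    iteration > 1: model learnt in the previous iteration).
           update_model_table(it)
           # 2. Topic counts are aggregated.
           count_topic_agg(it)
           # 3. Gibbs sampling learns/updates the model; the learnt model
           #    is only visible to this loop in the next iteration.
           gibbs_sample(it)

           # Evaluate perplexity every `evaluate_every` iterations, skipping
           # iteration 1 because the model table still holds the random model.
           if evaluate_every > 0 and it > 1 and it % evaluate_every == 0:
               perplexity = compute_perplexity(it)
               if (prev_perplexity is not None and
                       abs(prev_perplexity - perplexity) < perplexity_tol):
                   break  # perplexity stopped improving; stop early
               prev_perplexity = perplexity
       return prev_perplexity
   ```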

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 