[ https://issues.apache.org/jira/browse/SPARK-10808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906975#comment-14906975 ]
Mohamed Baddar commented on SPARK-10808:
----------------------------------------

Hello [~josephkb], can I take this task? Thanks.

> LDA user guide: discuss running time of LDA
> -------------------------------------------
>
>                 Key: SPARK-10808
>                 URL: https://issues.apache.org/jira/browse/SPARK-10808
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation, MLlib
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>
> Based on feedback like [SPARK-10791], we should discuss the computational and
> communication complexity of LDA and its optimizers in the MLlib Programming
> Guide. E.g.:
> * Online LDA can be faster than EM.
> * To make online LDA run faster, you can use a smaller miniBatchFraction.
> * Communication
> ** For EM, communication on each iteration is on the order of # topics * (vocabSize + # docs).
> ** For online LDA, communication on each iteration is on the order of # topics * vocabSize.
> * Decreasing vocabSize and # topics can speed things up. It's often fine to
> eliminate uncommon words, unless you are trying to create a very large number
> of topics.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
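To make the communication bullets concrete, here is a back-of-the-envelope sketch in Python. It only evaluates the order-of-magnitude expressions quoted in the issue (constant factors ignored). The function names and the example corpus sizes are hypothetical and are not part of Spark's API.

```python
# Hypothetical helpers: order-of-magnitude communication per iteration,
# using the expressions from SPARK-10808 (constant factors ignored).

def em_comm_per_iter(num_topics: int, vocab_size: int, num_docs: int) -> int:
    """EM: on the order of # topics * (vocabSize + # docs)."""
    return num_topics * (vocab_size + num_docs)


def online_comm_per_iter(num_topics: int, vocab_size: int) -> int:
    """Online LDA: on the order of # topics * vocabSize (no # docs term)."""
    return num_topics * vocab_size


if __name__ == "__main__":
    k, vocab, docs = 100, 10_000, 1_000_000  # hypothetical corpus sizes
    print("EM:    ", em_comm_per_iter(k, vocab, docs))
    print("Online:", online_comm_per_iter(k, vocab))
    # Pruning uncommon words halves vocabSize, which halves the online cost.
    print("Online, pruned vocab:", online_comm_per_iter(k, vocab // 2))
```

With these sizes EM moves roughly 100x more data per iteration than online LDA, because its cost also grows with the number of documents. Shrinking vocabSize or # topics reduces both estimates.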