[ https://issues.apache.org/jira/browse/SPARK-10808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph K. Bradley updated SPARK-10808:
--------------------------------------
    Description:
Based on feedback like [SPARK-10791], we should discuss the computational and communication complexity of LDA and its optimizers in the MLlib Programming Guide. E.g.:
* Online LDA can be faster than EM.
* To make online LDA run faster, you can use a smaller miniBatchFraction.
* Communication
** For EM, communication on each iteration is on the order of # topics * (vocabSize + # docs).
** For online LDA, communication on each iteration is on the order of # topics * vocabSize.
* Decreasing vocabSize and # topics can speed things up. It's often fine to eliminate uncommon words, unless you are trying to create a very large number of topics.

  was:
Based on feedback like [SPARK-10791], we should discuss the computational and communication complexity of LDA and its optimizers in the MLlib Programming Guide. E.g.:
* Online LDA can be faster than EM.
* To make online LDA run faster, you can use a smaller miniBatchFraction.
* Communication
** For EM, communication on each iteration is on the order of # topics * (vocabSize + # docs).
** For online LDA, communication on each iteration is on the order of # topics * vocabSize.
* Decreasing vocabSize and # topics can speed things up.

> LDA user guide: discuss running time of LDA
> -------------------------------------------
>
>                 Key: SPARK-10808
>                 URL: https://issues.apache.org/jira/browse/SPARK-10808
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation, MLlib
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>
> Based on feedback like [SPARK-10791], we should discuss the computational and
> communication complexity of LDA and its optimizers in the MLlib Programming
> Guide. E.g.:
> * Online LDA can be faster than EM.
> * To make online LDA run faster, you can use a smaller miniBatchFraction.
> * Communication
> ** For EM, communication on each iteration is on the order of # topics *
> (vocabSize + # docs).
> ** For online LDA, communication on each iteration is on the order of #
> topics * vocabSize.
> * Decreasing vocabSize and # topics can speed things up. It's often fine to
> eliminate uncommon words, unless you are trying to create a very large number
> of topics.
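
To make the two communication estimates in the description concrete, here is a minimal sketch that evaluates them for a made-up corpus. The function names and the corpus sizes (100 topics, 50,000 vocabulary terms, 1,000,000 documents) are hypothetical and chosen only for illustration. The results are order-of-magnitude counts of values exchanged per iteration, not measured bytes or timings.

```python
def em_comm_per_iter(num_topics, vocab_size, num_docs):
    """Estimate for EM: on the order of # topics * (vocabSize + # docs)."""
    return num_topics * (vocab_size + num_docs)


def online_comm_per_iter(num_topics, vocab_size):
    """Estimate for online LDA: on the order of # topics * vocabSize."""
    return num_topics * vocab_size


# Hypothetical corpus sizes, for illustration only.
k = 100
vocab_size = 50_000
num_docs = 1_000_000

em = em_comm_per_iter(k, vocab_size, num_docs)
online = online_comm_per_iter(k, vocab_size)
print(f"EM:     ~{em:,} values per iteration")
print(f"Online: ~{online:,} values per iteration")

# Pruning uncommon words shrinks vocabSize. Under these estimates that
# helps online LDA proportionally, while EM stays dominated by # docs
# whenever the corpus has many more documents than vocabulary terms.
pruned_vocab = 10_000
em_pruned = em_comm_per_iter(k, pruned_vocab, num_docs)
online_pruned = online_comm_per_iter(k, pruned_vocab)
print(f"EM (pruned vocab):     ~{em_pruned:,}")
print(f"Online (pruned vocab): ~{online_pruned:,}")
```

Under these assumed sizes the EM estimate is dominated by the # docs term, which is why the online estimate, which has no # docs term, comes out much smaller per iteration.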