Joseph K. Bradley created SPARK-10808:
-----------------------------------------

Summary: LDA user guide: discuss running time of LDA
Key: SPARK-10808
URL: https://issues.apache.org/jira/browse/SPARK-10808
Project: Spark
Issue Type: Documentation
Components: Documentation, MLlib
Reporter: Joseph K. Bradley
Priority: Minor

Based on feedback like [SPARK-10791], we should discuss the computational and communication complexity of LDA and its optimizers in the MLlib Programming Guide. E.g.:
* Online LDA can be faster than EM.
* To make online LDA run faster, you can use a smaller miniBatchFraction.
* Communication
** For EM, communication on each iteration is on the order of # topics * (vocabSize + # docs).
** For online LDA, communication on each iteration is on the order of # topics * vocabSize.
* Decreasing vocabSize and # topics can speed things up.
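
To make the two communication estimates above concrete, here is a minimal Python sketch that evaluates them for a hypothetical corpus. The function names and the example sizes (100 topics, a 50,000-term vocabulary, 1,000,000 documents) are assumptions chosen for illustration, not measurements. The results are order-of-magnitude counts with constant factors ignored, not bytes on the wire.

```python
def em_comm_per_iteration(num_topics, vocab_size, num_docs):
    """Order-of-magnitude communication per EM iteration:
    # topics * (vocabSize + # docs). Constant factors are ignored."""
    return num_topics * (vocab_size + num_docs)


def online_comm_per_iteration(num_topics, vocab_size):
    """Order-of-magnitude communication per online LDA iteration:
    # topics * vocabSize. Constant factors are ignored."""
    return num_topics * vocab_size


# Hypothetical corpus sizes, assumed purely for illustration.
num_topics = 100
vocab_size = 50_000
num_docs = 1_000_000

em_cost = em_comm_per_iteration(num_topics, vocab_size, num_docs)
online_cost = online_comm_per_iteration(num_topics, vocab_size)

print("EM per iteration:    ", em_cost)
print("Online per iteration:", online_cost)
print("EM / online ratio:   ", em_cost / online_cost)
```

Under these assumed sizes the EM estimate is dominated by the # docs term, while the online estimate does not depend on # docs at all. Both estimates shrink linearly when # topics is reduced, and the online estimate also shrinks linearly with vocabSize.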