[ https://issues.apache.org/jira/browse/SPARK-19294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-19294:
--------------------------------------
    Issue Type: Improvement  (was: Bug)

> improve LocalLDAModel save/load scaling for large models
> --------------------------------------------------------
>
>                 Key: SPARK-19294
>                 URL: https://issues.apache.org/jira/browse/SPARK-19294
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Asher Krim
>
> The LDA model in ml has some of the same problems addressed by
> https://issues.apache.org/jira/browse/SPARK-19247 for word2vec.
> An LDA model is on the order of `vocabSize` * `k`, which can easily reach
> 3 GB for k=1000 and vocabSize=3m. It is currently saved as a single datum
> in 1 partition.
> Instead, we should represent the matrix as a list, and use the logic from
> https://issues.apache.org/jira/browse/SPARK-11994 to pick a reasonable number
> of partitions.
> cc [~josephkb]
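
For illustration only, the idea in the description (store the topic matrix as a list of rows and derive a partition count from its approximate size) might be sketched as below. This is plain Python, not Spark code and not the logic from SPARK-11994. The function names, the 8 bytes per entry, and the 64 MiB per-partition cap are all assumptions made for the example.

```python
import math

# Hypothetical defaults, chosen only for this sketch.
BYTES_PER_ENTRY = 8                      # assumes one 8-byte double per matrix entry
MAX_PARTITION_BYTES = 64 * 1024 * 1024   # assumed per-partition size cap (64 MiB)


def approx_model_bytes(vocab_size, k, bytes_per_entry=BYTES_PER_ENTRY):
    """Rough size of a vocab_size x k topic matrix, ignoring any overhead."""
    return vocab_size * k * bytes_per_entry


def num_partitions(approx_bytes, max_partition_bytes=MAX_PARTITION_BYTES):
    """Pick enough partitions that each holds roughly at most the cap."""
    return approx_bytes // max_partition_bytes + 1


def matrix_to_rows(matrix):
    """Represent the matrix as a list of (row_index, row_values) records."""
    return [(i, list(row)) for i, row in enumerate(matrix)]


def split_rows(rows, n):
    """Split the row records into at most n contiguous chunks."""
    if not rows:
        return []
    chunk = math.ceil(len(rows) / n)
    return [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
```

With records of this shape, no single saved datum has to hold the whole matrix, and the partition count grows with `vocabSize` * `k` instead of staying fixed at 1.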