[ https://issues.apache.org/jira/browse/SPARK-5560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley resolved SPARK-5560.
--------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.0
 Target Version/s:   (was: 1.5.0)

> LDA EM should scale to more iterations
> --------------------------------------
>
>                 Key: SPARK-5560
>                 URL: https://issues.apache.org/jira/browse/SPARK-5560
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>             Fix For: 1.4.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Latent Dirichlet Allocation (LDA) sometimes fails to run for many iterations
> on large datasets, even when it is able to run for a few iterations. It
> should be able to run for as many iterations as the user likes, with proper
> persistence and checkpointing.
> Here is an example from a test on 16 workers (EC2 r3.2xlarge) on a big
> Wikipedia dataset:
> * 100 topics
> * Training set size: 4072243 documents
> * Vocabulary size: 9869422 terms
> * Training set size: 1041734290 tokens
> It runs for about 10-15 iterations before failing, even when using a variety
> of checkpointInterval values and longer timeout settings (up to 5 minutes).
> The failure varies from disconnections from workers/driver to workers running
> out of disk space. I would not expect workers to run out of memory or disk
> space based on rough calculations. There was some job imbalance, but not a
> significant amount.
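
The issue mentions "rough calculations" without showing them. The sketch below is one possible back-of-envelope estimate built only from the figures quoted above (topics, documents, vocabulary, tokens, workers). It is not the reporter's own calculation. It assumes one dense length-k vector of 64-bit doubles per document and per term, and it ignores graph edges, shuffle files, lineage, and JVM overhead, any of which could dominate in practice. The helper names are hypothetical.

```python
# Figures quoted in the issue description.
NUM_TOPICS = 100
NUM_DOCS = 4_072_243
VOCAB_SIZE = 9_869_422
NUM_TOKENS = 1_041_734_290
NUM_WORKERS = 16

# Assumption (not from the issue): each count is stored as a 64-bit double.
BYTES_PER_DOUBLE = 8


def dense_bytes(rows: int, cols: int, bytes_per_entry: int = BYTES_PER_DOUBLE) -> int:
    """Size in bytes of a dense rows x cols matrix."""
    return rows * cols * bytes_per_entry


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


def estimate() -> dict:
    """Rough model-state sizes under the dense-double assumption."""
    topic_term = dense_bytes(NUM_TOPICS, VOCAB_SIZE)  # one k-vector per term
    doc_topic = dense_bytes(NUM_TOPICS, NUM_DOCS)     # one k-vector per document
    total = topic_term + doc_topic
    return {
        "avg_tokens_per_doc": NUM_TOKENS / NUM_DOCS,
        "topic_term_gib": to_gib(topic_term),
        "doc_topic_gib": to_gib(doc_topic),
        "total_gib": to_gib(total),
        # Assumes state is spread evenly, which the issue says was only roughly true.
        "per_worker_gib": to_gib(total) / NUM_WORKERS,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the per-iteration model state comes to roughly a dozen GiB across the cluster, which fits the reporter's expectation that memory and disk should not be exhausted by the model alone and points toward accumulated intermediate data as the more likely culprit.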