[ https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng closed SPARK-2138.
--------------------------------
          Resolution: Not a Problem
    Target Version/s:   (was: 1.4.0)

I'm closing this issue; setting a larger akka.frameSize is the workaround. SPARK-3424 might be relevant.

> The KMeans algorithm in the MLlib can lead to the Serialized Task size become
> bigger and bigger
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2138
>                 URL: https://issues.apache.org/jira/browse/SPARK-2138
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 0.9.0, 0.9.1
>            Reporter: DjvuLee
>            Assignee: Xiangrui Meng
>              Labels: clustering
>
> When the algorithm reaches a certain stage and runs the reduceByKey()
> function, it can cause Executor Lost and Task Lost errors; after several
> occurrences, the application exits.
> When this error occurs, the serialized task is larger than 10 MB, and it
> grows larger as the iterations increase.
> The data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622
> The running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
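[Editor's note] As a rough illustration of the workaround mentioned above, assuming a Spark 1.x deployment where `spark.akka.frameSize` is specified in MB and defaults to 10, the frame size could be raised like this (the value 128 is an arbitrary example, not a recommendation from the issue):

```
# spark-defaults.conf (Spark 1.x): raise the maximum Akka frame size from
# the 10 MB default so that large serialized tasks fit in one message.
spark.akka.frameSize  128
```

The same setting can be passed per job, e.g. `spark-submit --conf spark.akka.frameSize=128 ...`. Note that in later Spark versions the Akka transport was removed (see SPARK-3424 and related work), so this knob applies only to the affected 0.9.x/1.x releases.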