[ https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045558#comment-14045558 ]
Piotr Szul commented on SPARK-2138:
-----------------------------------

I ran into a similar problem when running KMeans with v1.0.0. However, it looks like the issue is not with the backend but rather with the driver: the driver tries to serialize tasks that are bigger than the default 10MB spark.akka.frameSize. No error is reported, but the tasks do not reach the executors.

In 1.0.0, however, adding:

    --driver-java-options "-Dspark.akka.frameSize=20"

to the spark-submit options helps.

Also, the size of the task seems to grow only after the first iteration of KMeans, and then it remains the same.

The biggest problem for me is that no errors are reported. It took me a while to figure out why my clustering was not working.

Here is a bit of code that has the same result:

    Math.rint(10.0)
    val items = sc.parallelize(List.range(0,10),10);
    val data = (for (i <- Iterator.range(0,1500000)) yield Math.random()).toArray
    println(items.map{ v => data.length * v }.count())

It tries to serialize tasks of 11372865 bytes and fails with the default spark.akka.frameSize (using JavaSerializer), but works OK with:

    --driver-java-options "-Dspark.akka.frameSize=20"

> The KMeans algorithm in the MLlib can lead to the Serialized Task size become bigger and bigger
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2138
>                 URL: https://issues.apache.org/jira/browse/SPARK-2138
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 0.9.0, 0.9.1
>            Reporter: DjvuLee
>            Assignee: Xiangrui Meng
>
> When the algorithm running at certain stage, when running the reduceBykey()
> function, It can lead to Executor Lost and Task lost, after several times.
> the application exit.
> When this error occurred, the size of serialized task is bigger than 10MB,
> and the size become larger as the iteration increase.
> the data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622
> the running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5

--
This message was sent by Atlassian JIRA
(v6.2#6252)
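
To see why the snippet in the comment above runs into the frame limit, here is a rough sketch in plain Scala (no Spark required) that Java-serializes an array of 1,500,000 doubles and compares the result with a frame size given in MB. The object name FrameSizeCheck and both helper methods are hypothetical, and the conversion of the setting to frameSize * 1024 * 1024 bytes is an assumption, not something stated in the issue. It measures only the array, not the full serialized task, so the byte count will not match the 11372865 figure reported above.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical helper, not part of Spark.
object FrameSizeCheck {
  // Number of bytes produced by plain Java serialization of `value`.
  def javaSerializedSize(value: AnyRef): Long = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    buffer.size().toLong
  }

  // Assumption: the limit in bytes is frameSizeMb * 1024 * 1024.
  def exceedsFrame(sizeBytes: Long, frameSizeMb: Int): Boolean =
    sizeBytes > frameSizeMb.toLong * 1024 * 1024

  def main(args: Array[String]): Unit = {
    // Same array size as in the comment's example.
    val data = Array.fill(1500000)(Math.random())
    val size = javaSerializedSize(data)
    println(s"serialized array: $size bytes")
    println(s"exceeds 10 MB frame: ${exceedsFrame(size, 10)}")
    println(s"exceeds 20 MB frame: ${exceedsFrame(size, 20)}")
  }
}
```

Each double takes 8 bytes, so the array alone is at least 12,000,000 bytes once serialized, which is above a 10 MB limit and below a 20 MB one under the assumption stated above.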