[ https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062026#comment-14062026 ]

DjvuLee commented on SPARK-2138:
--------------------------------

In my experiment I set akka.frameSize=200 and my data is 1.8 GB. By the 18th 
iteration, the serialized task grows larger than the 200 MB frame size and the 
Spark application exits, so maybe we should improve this.
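For context, this is how the limit can be raised in application code; a minimal 
sketch assuming Spark 0.9/1.x, where spark.akka.frameSize is given in MB and 
defaults to 10:

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Raise Akka's maximum message size to 200 MB (default: 10 MB),
// mirroring the akka.frameSize=200 setting from the experiment above.
val conf = new SparkConf()
  .setAppName("KMeansLargeFrameSize")
  .set("spark.akka.frameSize", "200")
val sc = new SparkContext(conf)
{code}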

As for this issue, it is actually not solved: the KMeans job can run 
successfully when a large akka.frameSize is set, but that only raises the 
limit while the task size keeps growing. So I think keeping this issue open is 
helpful.
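To illustrate why raising the limit only postpones the failure: if the centers 
are captured directly in each iteration's closure, they are serialized into 
every task, so the task grows along with the model. Below is a hypothetical 
sketch (the names are illustrative, not the actual MLlib code) that broadcasts 
the centers instead; broadcast variables ship large read-only data to each 
executor once, keeping the serialized task size roughly constant:

{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical sketch of one KMeans-style assignment step.
// Capturing `centers` directly in the closure would serialize it into
// every task; broadcasting sends it once per executor instead.
def assignToClosest(sc: SparkContext,
                    points: RDD[Array[Double]],
                    centers: Array[Array[Double]]): RDD[(Int, Array[Double])] = {
  val bcCenters = sc.broadcast(centers)
  points.map { p =>
    val cs = bcCenters.value
    // Index of the center with the smallest squared distance to p.
    val closest = cs.indices.minBy { i =>
      cs(i).zip(p).map { case (c, x) => (c - x) * (c - x) }.sum
    }
    (closest, p)
  }
}
{code}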

> The KMeans algorithm in MLlib can lead to the serialized task size becoming 
> bigger and bigger
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2138
>                 URL: https://issues.apache.org/jira/browse/SPARK-2138
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 0.9.0, 0.9.1
>            Reporter: DjvuLee
>            Assignee: Xiangrui Meng
>
> When the algorithm reaches a certain stage, running the reduceByKey() 
> function can lead to Executor Lost and Task Lost; after several occurrences, 
> the application exits.
> When this error occurs, the size of the serialized task is bigger than 10 MB, 
> and it becomes larger as the iterations increase.
> The data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622
> The running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5



--
This message was sent by Atlassian JIRA
(v6.2#6252)