[ https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062163#comment-14062163 ]
DjvuLee commented on SPARK-2138:
--------------------------------

Oh, I am sorry that I made a mistake in my last comment: 18 is not the iteration number, it is the stage ID. [~mengxr] your speculation is right. After reading the code, I found that it indeed happens in the initialization stage, with no relationship to the iteration number. And if you use a larger number of clusters for the final result, the serialized task size becomes larger during the initialization stage.

> The KMeans algorithm in MLlib can cause the serialized task size to become
> bigger and bigger
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2138
>                 URL: https://issues.apache.org/jira/browse/SPARK-2138
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 0.9.0, 0.9.1
>            Reporter: DjvuLee
>            Assignee: Xiangrui Meng
>
> When the algorithm reaches a certain stage and runs the reduceByKey()
> function, it can lead to executor lost and task lost errors; after several
> occurrences, the application exits.
> When this error occurred, the serialized task size was bigger than 10 MB,
> and the size became larger as the iterations increased.
> the data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622
> the running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5

--
This message was sent by Atlassian JIRA
(v6.2#6252)
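The growth the comment describes can be illustrated outside Spark: a serialized task carries whatever its closure references, so if the k cluster centers are captured in the closure, the serialized size grows roughly linearly with k. Below is a minimal plain-Python sketch (not Spark code) using pickle as a stand-in for Spark's closure serializer; the dimension and the k values are arbitrary choices for illustration.

```python
import pickle

# Stand-in for the cluster centers a KMeans task closure might capture:
# k centers, each a dense vector of `dim` doubles.
dim = 100

for k in (10, 100, 1000):
    centers = [[0.0] * dim for _ in range(k)]
    # pickle.dumps stands in for Spark serializing the task closure.
    serialized = pickle.dumps(centers)
    print(f"k={k:5d}  serialized size = {len(serialized)} bytes")
```

The serialized size scales with k, matching the observation that a larger cluster count inflates tasks in the initialization stage. Broadcasting large shared data (for example with SparkContext.broadcast) instead of capturing it in the task closure is the usual way to keep task sizes small.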