Xiaoye Sun created SPARK-18731:
----------------------------------

             Summary: Task size in K-means is so large
                 Key: SPARK-18731
                 URL: https://issues.apache.org/jira/browse/SPARK-18731
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
    Affects Versions: 1.6.1
            Reporter: Xiaoye Sun
            Priority: Minor


When running the KMeans algorithm for a large model (e.g. 100k features and 
100 centers), many of the stages log a warning that the task size is very 
large. Here is an example warning:
WARN TaskSetManager: Stage 23 contains a task of very large size (56256 KB). 
The maximum recommended task size is 100 KB.
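
For a sense of scale, the sketch below estimates the in-memory size of the 
cluster centers for the model in the example. It rests on assumptions that 
this report does not state: that the centers are dense vectors of 8-byte 
doubles, and that they are shipped along with each task. The function name 
dense_centers_kb is hypothetical and is not part of Spark.

```python
# Back-of-envelope estimate, not a measurement.
# Assumptions: dense storage, 8 bytes per double, no serialization
# overhead and no compression.
BYTES_PER_DOUBLE = 8
RECOMMENDED_TASK_KB = 100  # limit quoted in the TaskSetManager warning


def dense_centers_kb(num_features: int, num_centers: int) -> float:
    """Size in KB of num_centers dense vectors of num_features doubles."""
    return num_features * num_centers * BYTES_PER_DOUBLE / 1024


# Model from the example: 100k features, 100 centers.
size_kb = dense_centers_kb(100_000, 100)
print(f"dense centers: {size_kb:.0f} KB")
print(f"multiple of recommended task size: {size_kb / RECOMMENDED_TASK_KB:.0f}x")
```

Under these assumptions the estimate is in the same order of magnitude as the 
56256 KB in the warning. The exact serialized size depends on details not 
covered here.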

The warning can appear at (sum at KMeansModel.scala:88), (takeSample at 
KMeans.scala:378), (aggregate at KMeans.scala:404) and (collect at 
KMeans.scala:436). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
