[ https://issues.apache.org/jira/browse/SPARK-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395598#comment-14395598 ]

Sean Owen commented on SPARK-6706:
----------------------------------

Hey Xi, I don't think this makes for a good JIRA as it doesn't sound like 
you've investigated what is happening. For example, you can look at exactly 
which step is executing in the UI, and look at the source to know what is 
being computed. It's not clear whether you know it is stuck or simply still 
executing, or whether it is your own RDDs that are being computed. Typically 
it's best to reproduce it locally against master if at all possible. Although 
providing code is good, the whole code dump doesn't narrow it down.
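
Sean's suggestion to reproduce locally can be made concrete. The following is 
a minimal sketch, not the reporter's code: random dense vectors at roughly 
the reported scale (k above 100, dimension ~360, ~100 MB total) are assumed 
to stand in for the real data, and MLlib's KMeans runs with the default 
"k-means||" initialization in local mode. All sizes and names here are 
assumptions chosen to approximate the report.

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansParRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("kmeans-par-repro").setMaster("local[4]"))

    val dim = 360 // feature dimension from the report
    val k   = 150 // "k above 100"
    // ~40k points * 360 doubles * 8 bytes ~= 115 MB, near the reported size.
    val data = sc.parallelize(0 until 40000, 8)
      .map(_ => Vectors.dense(Array.fill(dim)(scala.util.Random.nextDouble())))
      .cache()

    val model = new KMeans()
      .setK(k)
      .setInitializationMode(KMeans.K_MEANS_PARALLEL) // "k-means||", the default
      .setMaxIterations(10)
      .run(data)

    println(s"cost = ${model.computeCost(data)}")
    sc.stop()
  }
}
{code}

With a self-contained job like this, the Spark UI shows exactly which stage 
the job is sitting in, which answers the "what step is executing" question 
above.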

> kmeans|| hangs for a long time if both k and vector dimension are large
> -----------------------------------------------------------------------
>
>                 Key: SPARK-6706
>                 URL: https://issues.apache.org/jira/browse/SPARK-6706
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.2.1, 1.3.0
>         Environment: Windows 64bit, Linux 64bit
>            Reporter: Xi Shen
>            Assignee: Xiangrui Meng
>              Labels: performance
>         Attachments: kmeans-debug.7z
>
>
> When doing k-means clustering with the "kmeans||" algorithm, which is the 
> default one, the algorithm hangs at a "collect" step for a long time.
> Settings:
> - k above 100
> - feature dimension about 360
> - total data size is about 100 MB
> The issue was first noticed with Spark 1.2.1. I tested with both local and 
> cluster mode. On Spark 1.3.0, I can also reproduce this issue in local 
> mode. However, I do not have a 1.3.0 cluster environment to test with.
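
One further narrowing step, again an assumption rather than anything from the 
report: rerun the same data with random initialization. If that completes 
quickly while "k-means||" stalls, the time is going into the parallel seeding 
rather than the Lloyd iterations.

{code:scala}
// Hedged check reusing the assumed `data` RDD from the sketch above; only
// the initialization mode changes.
val randomInitModel = new KMeans()
  .setK(150)
  .setInitializationMode(KMeans.RANDOM) // "random" seeding instead of k-means||
  .setMaxIterations(10)
  .run(data)
{code}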


