[jira] [Commented] (SPARK-11560) Optimize KMeans implementation
[ https://issues.apache.org/jira/browse/SPARK-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15545010#comment-15545010 ]

Apache Spark commented on SPARK-11560:
--------------------------------------

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/15342

> Optimize KMeans implementation
> ------------------------------
>
> Key: SPARK-11560
> URL: https://issues.apache.org/jira/browse/SPARK-11560
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 1.6.0
> Reporter: Xiangrui Meng
> Assignee: Yanbo Liang
>
> After we dropped `runs`, we can simplify and optimize the k-means
> implementation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11560) Optimize KMeans implementation
[ https://issues.apache.org/jira/browse/SPARK-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15458467#comment-15458467 ]

Apache Spark commented on SPARK-11560:
--------------------------------------

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/14937
[jira] [Commented] (SPARK-11560) Optimize KMeans implementation
[ https://issues.apache.org/jira/browse/SPARK-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118937#comment-15118937 ]

Yanbo Liang commented on SPARK-11560:
-------------------------------------

[~yuhaoyan] I think the first step is to use BLAS Level 3 matrix-matrix multiplications to accelerate the pairwise distance computation. After that we can discuss improvements specific to sparse data. I'm open to hearing others' thoughts.
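For readers unfamiliar with the idea in the comment above, here is a minimal sketch of how pairwise squared distances can be reduced to one matrix-matrix product, using the identity ||x - c||^2 = ||x||^2 + ||c||^2 - 2 x.c. It is an illustration only, not the code from any of the pull requests. All function names are hypothetical, and `gemm_nt` is a plain-Python stand-in for what would be a single BLAS Level 3 `gemm` call in a real implementation.

```python
# Sketch only: pairwise squared distances via one matrix-matrix product.
# All names are hypothetical; gemm_nt stands in for a BLAS Level 3 gemm call.

def gemm_nt(a, b):
    """Return a * b^T for matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]

def squared_norms(rows):
    """Squared Euclidean norm of each row."""
    return [sum(v * v for v in row) for row in rows]

def pairwise_sq_dists(points, centers):
    """Squared distance from every point to every center.

    Uses ||x - c||^2 = ||x||^2 + ||c||^2 - 2 x.c, so all dot products
    come from one points * centers^T product.
    """
    dots = gemm_nt(points, centers)
    pn = squared_norms(points)
    cn = squared_norms(centers)
    # Clamp at 0.0: the expanded form can go slightly negative in floating point.
    return [[max(pn[i] + cn[j] - 2.0 * dots[i][j], 0.0)
             for j in range(len(centers))]
            for i in range(len(points))]

def assign(points, centers):
    """Index of the nearest center for each point."""
    dists = pairwise_sq_dists(points, centers)
    return [min(range(len(row)), key=row.__getitem__) for row in dists]
```

The norms can be computed once per point and once per center per iteration, which leaves the product as the dominant cost. How well this carries over to sparse input is the open question raised in the thread.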
[jira] [Commented] (SPARK-11560) Optimize KMeans implementation
[ https://issues.apache.org/jira/browse/SPARK-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116519#comment-15116519 ]

yuhao yang commented on SPARK-11560:
------------------------------------

Will the new version support sparse data better, or is that not a target?
[jira] [Commented] (SPARK-11560) Optimize KMeans implementation
[ https://issues.apache.org/jira/browse/SPARK-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105379#comment-15105379 ]

Apache Spark commented on SPARK-11560:
--------------------------------------

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/10806

> Optimize KMeans implementation
> ------------------------------
>
> Key: SPARK-11560
> URL: https://issues.apache.org/jira/browse/SPARK-11560
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 1.6.0
> Reporter: Xiangrui Meng
>
> After we dropped `runs`, we can simplify and optimize the k-means
> implementation.
[jira] [Commented] (SPARK-11560) Optimize KMeans implementation
[ https://issues.apache.org/jira/browse/SPARK-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105427#comment-15105427 ]

Apache Spark commented on SPARK-11560:
--------------------------------------

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/10306
[jira] [Commented] (SPARK-11560) Optimize KMeans implementation
[ https://issues.apache.org/jira/browse/SPARK-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072403#comment-15072403 ]

Apache Spark commented on SPARK-11560:
--------------------------------------

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/10306
[jira] [Commented] (SPARK-11560) Optimize KMeans implementation
[ https://issues.apache.org/jira/browse/SPARK-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997103#comment-14997103 ]

Joseph K. Bradley commented on SPARK-11560:
-------------------------------------------

Do we want to keep the implementation for the Pipelines API? We had worked on stacking models for linear methods (to do many runs at once) to amortize overhead, and this is the same kind of effort. It should be helpful in some problem domains. Has there been evidence that it's rarely useful?

> Optimize KMeans implementation
> ------------------------------
>
> Key: SPARK-11560
> URL: https://issues.apache.org/jira/browse/SPARK-11560
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 1.7.0
> Reporter: Xiangrui Meng
>
> After we dropped `runs`, we can simplify and optimize the k-means
> implementation.
[jira] [Commented] (SPARK-11560) Optimize KMeans implementation
[ https://issues.apache.org/jira/browse/SPARK-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994556#comment-14994556 ]

Jun Zheng commented on SPARK-11560:
-----------------------------------

By simplification, do you mean we assume the variable `runs` in `initRandom` and `initKMeansParallel` is always 1?
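To make the question above concrete, the sketch below contrasts a random initializer that carries a `runs` parameter with one where `runs` is fixed at 1. It is a guess at the kind of simplification being asked about, not Spark's actual code. The function names, the sampling with replacement, and the seed handling are all assumptions made for illustration.

```python
import random

# Sketch only: hypothetical stand-ins, not the Spark MLlib implementation.

def init_random_with_runs(data, k, runs, seed):
    """Draw runs * k points, then slice them into `runs` sets of k centers."""
    rng = random.Random(seed)
    # Assumption: sampling with replacement.
    sample = [rng.choice(data) for _ in range(runs * k)]
    return [sample[r * k:(r + 1) * k] for r in range(runs)]

def init_random(data, k, seed):
    """Same idea with runs fixed at 1: a single flat list of k centers."""
    rng = random.Random(seed)
    return [rng.choice(data) for _ in range(k)]
```

With `runs` gone, the result loses one level of nesting, and code that consumes it would no longer need to index by run.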