[ https://issues.apache.org/jira/browse/SPARK-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083305#comment-14083305 ]
Apache Spark commented on SPARK-1580:
-------------------------------------

User 'mengxr' has created a pull request for this issue:
https://github.com/apache/spark/pull/1731

> [MLlib] ALS: Estimate communication and computation costs given a partitioner
> -----------------------------------------------------------------------------
>
> Key: SPARK-1580
> URL: https://issues.apache.org/jira/browse/SPARK-1580
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: Tor Myklebust
> Priority: Minor
>
> It would be nice to be able to estimate the amount of work needed to solve an
> ALS problem. The chief components of this "work" are computation time---time
> spent forming and solving the least squares problems---and communication
> cost---the number of bytes sent across the network. Communication cost
> depends heavily on how the users and products are partitioned.
> We currently do not try to cluster users or products so that fewer feature
> vectors need to be communicated. This is intended as a first step toward
> that end---we ought to be able to tell whether one partitioning is better
> than another.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
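
For illustration only, here is a minimal Python sketch of the kind of estimate the issue describes. It is not the code in the pull request. The function name `estimate_als_costs` and its parameters are hypothetical, and the cost model is an assumption: a feature vector is counted once for every distinct block that needs it (including a block on the same machine), each rating contributes a rank x rank update in each half-iteration, and each user or product needs one solve costing about rank^3 operations.

```python
from collections import defaultdict


def estimate_als_costs(ratings, user_part, product_part, rank, bytes_per_value=8):
    """Rough per-iteration cost estimate for blocked ALS (illustrative model).

    ratings      -- list of unique (user, product) pairs that have a rating
    user_part    -- function mapping a user id to its block index
    product_part -- function mapping a product id to its block index
    """
    # For each feature vector, collect the distinct blocks that need it.
    product_dests = defaultdict(set)  # product -> user blocks needing its vector
    user_dests = defaultdict(set)     # user -> product blocks needing its vector
    for user, product in ratings:
        product_dests[product].add(user_part(user))
        user_dests[user].add(product_part(product))

    # Assumed communication model: one vector of `rank` values per destination.
    vectors_sent = (sum(len(blocks) for blocks in product_dests.values())
                    + sum(len(blocks) for blocks in user_dests.values()))
    comm_bytes = vectors_sent * rank * bytes_per_value

    # Assumed computation model: rank^2 work per rating in each of the two
    # half-iterations, plus one rank^3 solve per user and per product.
    flops = (2 * len(ratings) * rank ** 2
             + (len(user_dests) + len(product_dests)) * rank ** 3)

    return {"vectors_sent": vectors_sent, "comm_bytes": comm_bytes, "flops": flops}


if __name__ == "__main__":
    ratings = [(0, 10), (1, 10), (2, 11), (3, 11)]
    # Users spread round-robin versus users grouped with similar raters.
    scattered = estimate_als_costs(ratings, lambda u: u % 2, lambda p: p % 2, rank=2)
    clustered = estimate_als_costs(ratings, lambda u: u // 2, lambda p: p % 2, rank=2)
    print("scattered:", scattered)
    print("clustered:", clustered)
```

Under this model, two partitionings of the same ratings can be compared directly: the one that groups users who rate the same products sends fewer feature vectors, while the computation estimate is unchanged.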