[ https://issues.apache.org/jira/browse/SPARK-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371816#comment-15371816 ]
Ben McCann commented on SPARK-6567:
-----------------------------------

[~hucheng] can you share your code for this?

> Large linear model parallelism via a join and reduceByKey
> ---------------------------------------------------------
>
>                 Key: SPARK-6567
>                 URL: https://issues.apache.org/jira/browse/SPARK-6567
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>            Reporter: Reza Zadeh
>         Attachments: model-parallelism.pptx
>
> To train a linear model, each training point in the training set needs its
> dot product computed against the model, per iteration. If the model is too
> large to fit in memory on a single machine, then SPARK-4590 proposes using
> a parameter server.
> There is an easier way to achieve this without parameter servers. In
> particular, if the data is held as a BlockMatrix and the model as an RDD,
> then each block can be joined with the relevant part of the model, followed
> by a reduceByKey to compute the dot products.
> This obviates the need for a parameter server, at least for linear models.
> However, it's unclear how it compares performance-wise to parameter servers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
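The join-and-reduceByKey scheme described in the issue can be sketched in plain Python. This is a minimal local simulation of the data flow, not Spark code: the list comprehension stands in for `RDD.join` (matching data entries to model weights by feature index) and the accumulation loop stands in for `RDD.reduceByKey` (summing partial products per row). The model values, row ids, and data layout are all invented for illustration.

```python
# Sketch of the join + reduceByKey pattern for computing dot products
# against a model that is itself distributed (too large for one machine).
from collections import defaultdict

# Model, keyed by feature index (in Spark: an RDD[(Int, Double)]).
model = {0: 0.5, 1: -1.0, 2: 2.0}

# Training data in coordinate form: (feature index, (row id, value)).
# A BlockMatrix block expands into entries of exactly this shape.
data = [
    (0, ("row0", 1.0)), (2, ("row0", 3.0)),
    (1, ("row1", 2.0)), (2, ("row1", 1.0)),
]

# "join": pair each data entry with its model weight by feature index,
# producing (row id, partial product) pairs.
joined = [(row, value * model[feat])
          for feat, (row, value) in data if feat in model]

# "reduceByKey": sum the partial products per row id -> dot products.
dots = defaultdict(float)
for row, partial in joined:
    dots[row] += partial

# row0: 1.0*0.5 + 3.0*2.0 = 6.5 ; row1: 2.0*(-1.0) + 1.0*2.0 = 0.0
```

Because both the join and the reduction shuffle only (key, value) pairs, neither the full model nor the full data matrix ever has to reside on a single machine, which is the point the issue makes against needing a parameter server.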