[ https://issues.apache.org/jira/browse/SPARK-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882182#comment-15882182 ]
Nick Pentreath commented on SPARK-6567:
---------------------------------------

This JIRA has been around for a while without any movement. I think the "vector-free" versions of algorithms such as L-BFGS (see https://spark-summit.org/east-2017/events/scaling-apache-spark-mllib-to-billions-of-parameters/) will generally be more efficient. Shall we close this (unless there are major objections)?

> Large linear model parallelism via a join and reduceByKey
> ---------------------------------------------------------
>
>                 Key: SPARK-6567
>                 URL: https://issues.apache.org/jira/browse/SPARK-6567
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>            Reporter: Reza Zadeh
>         Attachments: model-parallelism.pptx
>
>
> To train a linear model, each training point needs its dot product computed against the model once per iteration. If the model is too large to fit in memory on a single machine, SPARK-4590 proposes using a parameter server.
> There is an easier way to achieve this without a parameter server. In particular, if the data is held as a BlockMatrix and the model as an RDD, then each block can be joined with the relevant part of the model, followed by a reduceByKey to compute the dot products.
> This obviates the need for a parameter server, at least for linear models. However, it is unclear how it compares performance-wise to parameter servers.
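For concreteness, here is a minimal sketch of the join-and-reduceByKey idea described in the issue. It uses plain pair RDDs rather than BlockMatrix, and the block layout, key types, and toy values are assumptions for illustration only, not an API from the attached slides:

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BlockedDotProducts {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("blocked-dot-products").setMaster("local[*]"))

    // Each training row is split column-wise into blocks:
    // key = feature-block id, value = (row id, that row's slice of features).
    val dataBlocks: RDD[(Int, (Long, Array[Double]))] = sc.parallelize(Seq(
      (0, (0L, Array(1.0, 2.0))), (1, (0L, Array(3.0, 4.0))),
      (0, (1L, Array(0.5, 1.5))), (1, (1L, Array(2.5, 3.5)))))

    // The model is an RDD keyed by the same feature-block ids, so it never
    // needs to fit in memory on any single machine.
    val modelBlocks: RDD[(Int, Array[Double])] = sc.parallelize(Seq(
      (0, Array(0.1, 0.2)), (1, Array(0.3, 0.4))))

    // Join each data block with its slice of the model, compute the partial
    // dot product for each row, then sum the partials per row via reduceByKey.
    val dotProducts: RDD[(Long, Double)] = dataBlocks
      .join(modelBlocks)
      .map { case (_, ((rowId, xSlice), wSlice)) =>
        (rowId, xSlice.iterator.zip(wSlice.iterator).map { case (x, w) => x * w }.sum)
      }
      .reduceByKey(_ + _)

    // e.g. row 0: 1.0*0.1 + 2.0*0.2 + 3.0*0.3 + 4.0*0.4 = 3.0
    dotProducts.collect().foreach { case (rowId, dot) => println(s"row $rowId: $dot") }
    sc.stop()
  }
}
{code}

A gradient update would presumably follow the same pattern in reverse: key the per-row residuals back by feature-block id and reduceByKey them onto the model blocks, keeping both data and model distributed throughout.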