[ 
https://issues.apache.org/jira/browse/SPARK-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494518#comment-14494518
 ] 

Alexander Ulanov commented on SPARK-5256:
-----------------------------------------

Probably the main issue for MLlib is that iterative algorithms are implemented 
with the aggregate function. Each call has a fixed overhead of around half a 
second, which limits its usefulness when one needs to run a large number of 
iterations. This is the case for larger data sets, which is what Spark is 
intended for. The problem gets worse with stochastic algorithms, because there 
is no good way to pick data at random from an RDD and one has to scan through 
it sequentially.
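
To make the cost concrete, here is a minimal sketch in plain Python (not Spark 
code) of the pattern described above: full-batch gradient descent that issues 
one aggregate-style call per iteration. The names aggregate, gradient_descent 
and overhead_floor_seconds are hypothetical helpers written for this 
illustration, and the 0.5 s constant is only the rough estimate quoted in the 
comment, not a measurement.

```python
from functools import reduce

# Assumption: rough per-call overhead quoted in the comment, not a benchmark.
AGGREGATE_OVERHEAD_S = 0.5


def aggregate(partitions, zero, seq_op, comb_op):
    """Local stand-in for an RDD-style aggregate: fold each partition, then merge."""
    partials = [reduce(seq_op, part, zero) for part in partitions]
    return reduce(comb_op, partials, zero)


def gradient_descent(partitions, w, step, iterations):
    """Fit y ~ w * x with full-batch gradient descent.

    Every iteration scans all partitions through exactly one aggregate call.
    Returns the fitted weight and the number of aggregate calls issued.
    """
    calls = 0
    for _ in range(iterations):
        grad_sum, count = aggregate(
            partitions,
            (0.0, 0),
            # seq_op: add one point's squared-error gradient to the running sum.
            lambda acc, xy: (acc[0] + (w * xy[0] - xy[1]) * xy[0], acc[1] + 1),
            # comb_op: merge the partial sums of two partitions.
            lambda a, b: (a[0] + b[0], a[1] + b[1]),
        )
        w -= step * grad_sum / count
        calls += 1
    return w, calls


def overhead_floor_seconds(calls, overhead_s=AGGREGATE_OVERHEAD_S):
    """Lower bound on time spent on per-call overhead alone."""
    return calls * overhead_s


if __name__ == "__main__":
    data = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]  # two toy "partitions"
    w, calls = gradient_descent(data, w=0.0, step=0.1, iterations=100)
    print(w, calls, overhead_floor_seconds(calls))
```

Under that assumed half-second figure, 10,000 iterations would spend about 
5,000 seconds (roughly 83 minutes) on call overhead alone, regardless of how 
little computation each iteration does.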

> Improving MLlib optimization APIs
> ---------------------------------
>
>                 Key: SPARK-5256
>                 URL: https://issues.apache.org/jira/browse/SPARK-5256
>             Project: Spark
>          Issue Type: Umbrella
>          Components: MLlib
>    Affects Versions: 1.2.0
>            Reporter: Joseph K. Bradley
>
> *Goal*: Improve APIs for optimization
> *Motivation*: There have been several disjoint mentions of improving the 
> optimization APIs to make them more pluggable, extensible, etc.  This JIRA is 
> a place to discuss what API changes are necessary for the long term, and to 
> provide links to other relevant JIRAs.
> Eventually, I hope this leads to a design doc outlining:
> * current issues
> * requirements such as supporting many types of objective functions, 
> optimization algorithms, and parameters to those algorithms
> * ideal API
> * breakdown of smaller JIRAs needed to achieve that API
> I will soon create an initial design doc, and I will try to watch this JIRA 
> and include ideas from JIRA comments.




