is spark a good fit for sequential machine learning algorithms?

2014-11-03 Thread ll
I'm struggling to implement a few algorithms with Spark, and I hope to get
some help from the community.

Most machine learning algorithms today are sequential, while Spark is all
about parallelism.  It seems to me that using Spark doesn't actually help
much, because in most cases you can't really parallelize a sequential
algorithm.

There must be strong reasons why MLlib was created and why so many people
claim Spark is ideal for machine learning.

What are those reasons?

What are some specific examples of when and how to use Spark to implement
sequential machine learning algorithms?

Any comment/feedback/answer is much appreciated.

Thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/is-spark-a-good-fit-for-sequential-machine-learning-algorithms-tp18000.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: is spark a good fit for sequential machine learning algorithms?

2014-11-03 Thread Xiangrui Meng
Many ML algorithms are sequential because they were not designed to be
parallel. In practice, however, ML is driven not by algorithms but by
data and applications. As datasets get bigger and bigger, some
algorithms have been revised to work in parallel, such as SGD and matrix
factorization. MLlib tries to implement those scalable algorithms that
can handle large-scale datasets.
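To make the parallel-SGD idea concrete, here is a minimal pure-Python sketch
(not MLlib code; the data, learning rate, and function names are made up for
illustration). Each "partition" computes a gradient on its own shard of the
data, and the driver averages the gradients and updates the weight, which is
the basic pattern behind data-parallel SGD on Spark:

```python
# Hypothetical sketch of one step of data-parallel SGD for 1-D
# least-squares regression. Each shard stands in for an RDD partition;
# the list comprehension stands in for a parallel map over partitions.

def partition_gradient(w, shard):
    """Gradient of 0.5 * (w*x - y)^2 summed over one data shard."""
    g = 0.0
    for x, y in shard:
        g += (w * x - y) * x
    return g, len(shard)

def parallel_sgd_step(w, shards, lr=0.1):
    # In Spark this would be rdd.map(...).reduce(...); here a plain
    # loop plays the role of the cluster-wide map.
    results = [partition_gradient(w, s) for s in shards]
    total_g = sum(g for g, _ in results)
    n = sum(c for _, c in results)
    return w - lr * total_g / n

# Tiny made-up dataset with y = 2x, split across two "partitions".
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w = 0.0
for _ in range(100):
    w = parallel_sgd_step(w, shards)
# w converges toward the true slope, 2.0
```

The averaging step is the key: because the loss is a sum over examples, the
gradient decomposes over partitions, so the sequential update rule survives
intact while the expensive part (the gradient) is computed in parallel.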

That being said, Spark is helpful even with sequential ML algorithms,
because in practice we need to test multiple sets of parameters and
select the best one. Though each algorithm is sequential, the training
part is embarrassingly parallel: we can broadcast the whole dataset,
then train model 1 on node 1, model 2 on node 2, etc. Cross-validation
also falls into this category.
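A minimal sketch of that pattern, assuming made-up data and hyperparameters
(in real Spark this would be something like parallelizing the parameter list
and mapping a training function over it; here Python's ThreadPoolExecutor
stands in for the cluster):

```python
# Hypothetical sketch: the dataset is shared (playing the role of a
# broadcast variable) and each worker trains one *sequential* SGD model
# with its own learning rate; the driver then picks the best result.

from concurrent.futures import ThreadPoolExecutor

DATA = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # stands in for broadcast data

def train(lr, epochs=200):
    """Plain sequential SGD on the shared dataset, one hyperparameter."""
    w = 0.0
    for _ in range(epochs):
        for x, y in DATA:
            w -= lr * (w * x - y) * x
    loss = sum((w * x - y) ** 2 for x, y in DATA)
    return lr, w, loss

learning_rates = [0.001, 0.01, 0.05]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(train, learning_rates))

# Model selection: keep the parameters with the lowest training loss.
best_lr, best_w, best_loss = min(results, key=lambda r: r[2])
```

Each call to train() is entirely sequential, yet the three calls share no
state beyond the read-only dataset, so they can run on three different nodes
with no coordination, which is exactly the "embarrassingly parallel" case
described above.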

-Xiangrui

On Mon, Nov 3, 2014 at 1:55 PM, ll duy.huynh@gmail.com wrote:
