To add to this, Spark also helps parallelize feature creation on massive
datasets, which can be resource- or memory-intensive depending on the nature
of the application.
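
As a rough illustration of why feature creation parallelizes well, here is a
minimal sketch. It assumes each record's features depend only on that record.
The record layout (text, amount) and the names make_features,
featurize_partition and featurize are made up for this example, and a local
thread pool stands in for Spark executors. In Spark the same shape would be a
map over the partitions of an RDD.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def make_features(record):
    """Derive features from one raw record; no other record is consulted."""
    text, amount = record  # hypothetical record layout
    tokens = text.lower().split()
    return {
        "n_tokens": len(tokens),
        "n_unique": len(set(tokens)),
        "log_amount": math.log1p(amount),
    }


def featurize_partition(partition):
    """Process one chunk of records, as a single worker would."""
    return [make_features(record) for record in partition]


def featurize(records, n_partitions=4):
    """Split records into contiguous chunks and featurize the chunks concurrently."""
    size = max(1, math.ceil(len(records) / n_partitions))
    partitions = [records[i:i + size] for i in range(0, len(records), size)]
    # Threads only illustrate the structure here; on a cluster each chunk
    # would be handled by a separate executor.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        parts = pool.map(featurize_partition, partitions)  # keeps chunk order
        return [row for part in parts for row in part]
```

Because no chunk needs data from another chunk, adding workers does not
change the result, only how the work is spread out.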

Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>



On Tue, Nov 4, 2014 at 10:49 AM, Xiangrui Meng <men...@gmail.com> wrote:

> Many ML algorithms are sequential because they were not designed to be
> parallel. However, ML is not driven by algorithms in practice, but by
> data and applications. As datasets have grown bigger and bigger, some
> algorithms have been revised to work in parallel, like SGD and matrix
> factorization. MLlib tries to implement those scalable algorithms that
> can handle large-scale datasets.
>
> That being said, even with sequential ML algorithms, Spark is helpful,
> because in practice we need to test multiple sets of parameters and
> select the best one. Though each training run is sequential, the runs
> are independent of one another, so training across parameter sets is
> embarrassingly parallel. We can broadcast the whole dataset,
> and then train model 1 on node 1, model 2 on node 2, etc. Cross
> validation also falls into this category.
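
A minimal sketch of that pattern, under stated assumptions: a local thread
pool stands in for the cluster nodes, the toy dataset is invented, and
train_sgd and grid_search are hypothetical names for this example, not MLlib
APIs. On a real cluster the same shape would roughly be sc.broadcast(data)
followed by sc.parallelize(params).map(train).collect().

```python
from concurrent.futures import ThreadPoolExecutor

# Toy dataset on the line y = 2x + 1. Every training run reads it and none
# modifies it, which is the role a broadcast variable would play in Spark.
DATA = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(20)]


def train_sgd(learning_rate, epochs=200):
    """Fit y = w*x + b with plain SGD; each update depends on the previous one."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in DATA:
            err = (w * x + b) - y
            w -= learning_rate * err * x
            b -= learning_rate * err
    loss = sum(((w * x + b) - y) ** 2 for x, y in DATA) / len(DATA)
    return learning_rate, loss


def grid_search(learning_rates, workers=4):
    """Train one model per parameter value concurrently; return the (rate, loss) pair with the lowest loss."""
    # The runs share nothing but the read-only data, so they need no
    # coordination. Threads only illustrate the structure here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(train_sgd, learning_rates))
    return min(results, key=lambda pair: pair[1])
```

For example, grid_search([0.001, 0.01, 0.1]) trains three models side by
side and keeps the learning rate with the lowest training loss. Cross
validation has the same structure, with one task per (fold, parameter) pair.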
>
> -Xiangrui
>
> On Mon, Nov 3, 2014 at 1:55 PM, ll <duy.huynh....@gmail.com> wrote:
> > i'm struggling with implementing a few algorithms with spark.  hope to get
> > help from the community.
> >
> > most of the machine learning algorithms today are "sequential", while spark
> > is all about "parallelism".  it seems to me that using spark doesn't
> > actually help much, because in most cases you can't really parallelize a
> > sequential algorithm.
> >
> > there must be some strong reasons why mllib was created and so many people
> > claim spark is ideal for machine learning.
> >
> > what are those reasons?
> >
> > what are some specific examples when & how to use spark to implement
> > "sequential" machine learning algorithms?
> >
> > any comment/feedback/answer is much appreciated.
> >
> > thanks!
> >
> >
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/is-spark-a-good-fit-for-sequential-machine-learning-algorithms-tp18000.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
> >
>
