Thanks, Jeremy! That's awesome. There's a group at Facebook that is
considering using Spark, so having more projects to refer to is great.

And Matei, I completely agree. MLlib is very exciting. I respect how well
you guys are managing the project for quality. This will set the Spark
ecosystem apart beyond the already impressive gains in performance and
productivity.

cheers,
Ignacio



On Thu, Aug 14, 2014 at 12:21 PM, Matei Zaharia <matei.zaha...@gmail.com>
wrote:

> Just as a note on this paper, apart from implementing the algorithms in
> naive Python, they also run them in a fairly inefficient way. In particular,
> their implementations send the model out with every task closure, which is
> really expensive for a large model, and bring it back with collectAsMap().
> It would be much more efficient to send it with, e.g.,
> SparkContext.broadcast(), or to keep it distributed on the cluster
> throughout the computation, instead of making the driver node a bottleneck
> for communication.
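>
> For concreteness, here is a rough sketch of the difference in Scala. The
> Model class and its predict() method are made up for illustration; the only
> Spark calls involved are parallelize(), map(), and SparkContext.broadcast().
>
>   import org.apache.spark.{SparkConf, SparkContext}
>
>   // Hypothetical model type, standing in for whatever the paper trains.
>   case class Model(weights: Array[Double]) {
>     def predict(x: Array[Double]): Double =
>       weights.zip(x).map { case (w, v) => w * v }.sum
>   }
>
>   val sc = new SparkContext(new SparkConf().setAppName("broadcast-sketch"))
>   val model = Model(Array.fill(10000000)(0.1))   // pretend this is large
>   val data = sc.parallelize(Seq(Array(1.0), Array(2.0)))
>
>   // Inefficient: `model` is captured in the task closure, so the whole
>   // weight array is serialized and shipped with every task.
>   val slow = data.map(x => model.predict(x))
>
>   // Better: broadcast once; each executor caches a single copy and tasks
>   // read it through the broadcast variable's value.
>   val modelBc = sc.broadcast(model)
>   val fast = data.map(x => modelBc.value.predict(x))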
>
> Implementing ML algorithms well by hand is unfortunately difficult, and
> this is why we have MLlib. The hope is that you either get your desired
> algorithm out of the box or get a higher-level primitive (e.g. stochastic
> gradient descent) that you can plug some functions into, without worrying
> about the communication.
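>
> To illustrate the "out of the box" case, the MLlib API looks roughly like
> the sketch below (the tiny training set is made up, and it assumes an
> active SparkContext named sc):
>
>   import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
>   import org.apache.spark.mllib.linalg.Vectors
>   import org.apache.spark.mllib.regression.LabeledPoint
>
>   // Toy hand-written training set, just to show the shape of the API.
>   val training = sc.parallelize(Seq(
>     LabeledPoint(1.0, Vectors.dense(1.0, 0.5)),
>     LabeledPoint(0.0, Vectors.dense(-1.0, -0.3))
>   ))
>
>   // MLlib handles the SGD loop and model distribution internally.
>   val lrModel = LogisticRegressionWithSGD.train(training, numIterations = 100)
>   val prediction = lrModel.predict(Vectors.dense(0.8, 0.2))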
>
> Matei
>
> On August 13, 2014 at 11:10:02 AM, Ignacio Zendejas (
> ignacio.zendejas...@gmail.com) wrote:
>
> Has anyone had a chance to look at this paper (with title in subject)?
> http://www.cs.rice.edu/~lp6/comparison.pdf
>
> Interesting that they chose to use Python alone. Do we know how much faster
> Scala is vs. Python in general, if at all?
>
> As with any benchmark, I'm sure there are caveats, but it'd be nice to
> have an answer to the question above for starters.
>
> Thanks,
> Ignacio
>
>
