On Fri, Mar 14, 2014 at 10:38 AM, Pat Ferrel <[email protected]> wrote:

> So with very little work we could have RSJ, Matrix ops, SSVD+PCA running
> on Spark in the mainline of Mahout? Honestly?


What makes you doubt? There's a unit test there that runs it in local mode.
Good benchmarking is what it lacks, of course. It may require some presplit
tuning (e.g. for cases when HDFS splits are so large that they affect the
run time of individual tasks), but that's an improvement, as everything is.
The point is that since the programming model is very palatable, one would
be able to tweak these things with ease.

The distributed PCA version, I think, was not yet committed though. I think
it still sits on the dev branch. It is not under very active development;
it is more of a POC. But given the results of other ML projects on Spark, I
don't see much reason to expect its performance to be significantly
different from those that already run on Spark. Again, it is more about
environment optimizer, ease of use and prototyping, and the programming
model.
