Re: mlib versus spark

Dmitriy Lyubimov Sun, 01 Jun 2014 12:48:57 -0700

I would add that what we were doing (well, at least what I was doing) was
aimed at building an ML environment rather than simply a collection of
algorithms. In practice I always wanted to customize something in the
off-the-shelf algorithms. E.g. for things like ALS and RLFM there are a
thousand custom schemes published, and perhaps just as many unpublished
ones. (Take, for example, the bias trick Sebastian mentioned in his
tutorial: it is an easy addition to all flavors of factorization
techniques.)


As I also mentioned before, featurization and standardization tweaks are
even more important than the algorithms themselves, and as it stands nobody
is actually doing them well for Spark.

So we are looking into something along the lines of Julia. In that sense,
the difference between the motivations behind recent contributions to Mahout
and those behind MLlib is just as vast as the difference between the
motivations behind Julia and MLlib.

On May 31, 2014 11:16 PM, "Sebastian Schelter" <[email protected]> wrote:

> Hi Saikat,
>
> The differences are that MLlib offers a different set of algorithms (e.g.
> you won't find cooccurrence analysis or stochastic SVD there) and that its
> codebase consists of hand-tuned, Spark-specific implementations.
>
> Mahout, on the other hand, allows you to implement algorithms in an
> engine-agnostic, declarative way. This allows for the automatic
> optimization of our algorithms as well as for running the same code on
> multiple backends (there has been interest from H2O as well as Apache
> Flink in integrating with our DSL).
>
> --sebastian
>
> On 06/01/2014 01:41 AM, Saikat Kanjilal wrote:
>
>> Actually the subject of my email should say spark->mlib versus
>> mahout->spark :)
>>
>>> From: [email protected]
>>> To: [email protected]
>>> Subject: mlib versus spark
>>> Date: Sat, 31 May 2014 16:38:13 -0700
>>>
>>> OK, I'll admit I'm not seeing what the obvious differences are. I'm a bit
>>> confused when I think of Mahout using Spark: since Spark already ships
>>> with an embedded machine learning library (MLlib), what would be the
>>> impetus to use Mahout instead? It seems like you should be able to write
>>> or add algorithms to MLlib and use Spark. Has someone from Mahout looked
>>> at MLlib to see if there will be a strong use case for using one versus
>>> the other?
>>> http://spark.apache.org/mllib/
>>>
>>
>>
>>
>
