Hi Saikat,
The differences are that MLlib offers a different set of algorithms
(e.g. you won't find cooccurrence analysis or stochastic SVD there) and
that its codebase consists of hand-tuned, Spark-specific implementations.
Mahout, on the other hand, allows you to implement algorithms in an
engine-agnostic, declarative way. This allows for the automatic
optimization of our algorithms as well as for running the same code on
multiple backends (there has been interest from H2O as well as Apache
Flink in integrating with our DSL).
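
As a rough illustration of what "engine-agnostic, declarative" can mean,
here is a toy sketch in Python. It is not Mahout's actual DSL: every
class and function name below (Matrix, Transpose, MatMul, optimize,
DenseBackend, SparseBackend, cooccurrence) is hypothetical and made up
for this example. The algorithm only builds a logical expression; a
separate step rewrites it, and interchangeable backends decide how to
execute it.

```python
from dataclasses import dataclass


# Logical plan nodes: they describe a computation without running it.
# All names here are hypothetical, chosen only for this sketch.
@dataclass(frozen=True)
class Matrix:
    rows: tuple  # tuple of row tuples, e.g. ((1, 0), (0, 1))


@dataclass(frozen=True)
class Transpose:
    child: object


@dataclass(frozen=True)
class MatMul:
    left: object
    right: object


def optimize(expr):
    """Toy optimizer: removes double transposes, recursing into children."""
    if isinstance(expr, Transpose):
        child = optimize(expr.child)
        if isinstance(child, Transpose):
            return child.child
        return Transpose(child)
    if isinstance(expr, MatMul):
        return MatMul(optimize(expr.left), optimize(expr.right))
    return expr


class DenseBackend:
    """Hypothetical backend that evaluates plans with nested lists."""

    def run(self, expr):
        if isinstance(expr, Matrix):
            return [list(row) for row in expr.rows]
        if isinstance(expr, Transpose):
            return [list(col) for col in zip(*self.run(expr.child))]
        if isinstance(expr, MatMul):
            a = self.run(expr.left)
            b_cols = list(zip(*self.run(expr.right)))
            return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
                    for row in a]
        raise TypeError(f"unsupported node: {expr!r}")


class SparseBackend:
    """Hypothetical backend that stores only non-zero cells in a dict."""

    def run(self, expr):
        cells, (n_rows, n_cols) = self._eval(expr)
        return [[cells.get((i, j), 0) for j in range(n_cols)]
                for i in range(n_rows)]

    def _eval(self, expr):
        if isinstance(expr, Matrix):
            cells = {(i, j): v
                     for i, row in enumerate(expr.rows)
                     for j, v in enumerate(row) if v != 0}
            return cells, (len(expr.rows), len(expr.rows[0]))
        if isinstance(expr, Transpose):
            cells, (r, c) = self._eval(expr.child)
            return {(j, i): v for (i, j), v in cells.items()}, (c, r)
        if isinstance(expr, MatMul):
            a, (a_rows, _) = self._eval(expr.left)
            b, (_, b_cols) = self._eval(expr.right)
            out = {}
            for (i, k), x in a.items():
                for (k2, j), y in b.items():
                    if k == k2:
                        out[(i, j)] = out.get((i, j), 0) + x * y
            return out, (a_rows, b_cols)
        raise TypeError(f"unsupported node: {expr!r}")


def cooccurrence(a):
    """The 'algorithm': declares A^T A, with no engine-specific code."""
    return MatMul(Transpose(a), a)


# Two users (rows) interacting with three items (columns).
A = Matrix(((1, 0, 1), (0, 1, 1)))
plan = optimize(cooccurrence(A))
dense_result = DenseBackend().run(plan)
sparse_result = SparseBackend().run(plan)
```

The point of the sketch is only that cooccurrence() is written once and
both backends produce the same result from the same plan; a real system
would of course use far more elaborate rewrites and distributed
execution.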
--sebastian
On 06/01/2014 01:41 AM, Saikat Kanjilal wrote:
Actually, the subject of my email should say spark->mllib versus mahout->spark :)
From: sxk1...@hotmail.com
To: dev@mahout.apache.org
Subject: mlib versus spark
Date: Sat, 31 May 2014 16:38:13 -0700
OK, I'll admit I'm not seeing what the obvious differences are. I'm a bit
confused when I think of Mahout using Spark: since Spark already ships with
an embedded machine learning library (MLlib), what would be the impetus to
use Mahout instead? It seems like you should be able to write or add
algorithms to MLlib and use Spark. Has someone from Mahout looked at MLlib
to see if there will be a strong use case for using one versus the other?
http://spark.apache.org/mllib/