Looking back over the last year, Mahout has gone through a lot of changes. Most 
users are still using the legacy mapreduce code, and new users have mostly 
looked elsewhere.

The fact that people as knowledgeable as former committers compare Mahout to 
Oryx or MLlib seems odd to me because Mahout is neither a server nor a loose 
collection of algorithms. It was the latter until all of mapreduce was moved to 
legacy and “no new mapreduce” became the rule.

But what is it now? What is unique and of value? Is it destined to be late to 
the party and chasing the algo checklists of things like MLlib?

First a slight digression. I looked at moving itemsimilarity to raw Spark if 
only to remove mrlegacy from the dependencies. At about the same time another 
Mahouter asked the Spark list how to transpose a matrix. He got the answer “why 
would you want to do that?” The fairly high performance algorithm behind 
spark-itemsimilarity was designed by Sebastian and requires an optimized A’A, 
A’B, A’C… and spark-rowsimilarity requires AA’. None of these are provided by 
MLlib. No actual transpose is required to compute these products, so the 
transpose question and the missing products should be seen as two separate 
comments about MLlib. The moral: unless I want to write optimized matrix 
transpose-and-multiply solvers myself, I will stick with Mahout.
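
To make the "no actual transpose" point concrete, here is a minimal sketch in 
plain Python. It is not Mahout's or Sebastian's implementation, and the function 
names are hypothetical. It only assumes that a matrix is available as a list of 
rows, the way a row-partitioned distributed matrix would be, and shows that A’A 
and A’B can be accumulated row by row without ever materializing A’.

```python
# Illustration only: lists of rows stand in for a row-partitioned matrix.
# Function names are made up for this sketch; they are not Mahout APIs.

def gram_from_rows(rows, n_cols):
    """Compute A'A as the sum over rows of outer(row, row)."""
    result = [[0.0] * n_cols for _ in range(n_cols)]
    for row in rows:
        # Only non-zero entries contribute, which matters for sparse data.
        nonzero = [(j, v) for j, v in enumerate(row) if v != 0]
        for j, vj in nonzero:
            for k, vk in nonzero:
                result[j][k] += vj * vk
    return result


def cross_from_rows(rows_a, rows_b, n_cols_a, n_cols_b):
    """Compute A'B as the sum over aligned rows of outer(row_a, row_b)."""
    result = [[0.0] * n_cols_b for _ in range(n_cols_a)]
    for row_a, row_b in zip(rows_a, rows_b):
        nonzero_a = [(j, v) for j, v in enumerate(row_a) if v != 0]
        nonzero_b = [(k, v) for k, v in enumerate(row_b) if v != 0]
        for j, va in nonzero_a:
            for k, vb in nonzero_b:
                result[j][k] += va * vb
    return result


def transpose_then_multiply(a, b):
    """Reference version that does build A' explicitly, for comparison."""
    a_t = [list(col) for col in zip(*a)]
    return [[float(sum(x * y for x, y in zip(r, c))) for c in zip(*b)]
            for r in a_t]


if __name__ == "__main__":
    A = [[1, 0, 1],
         [0, 1, 1]]
    B = [[1, 1],
         [0, 1]]
    print(gram_from_rows(A, 3))
    print(cross_from_rows(A, B, 3, 2))
```

Each row contributes independently, so in a distributed setting the per-row 
outer products could be computed where the rows live and then summed. The hard 
part, which this sketch ignores, is doing that efficiently at scale.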

So back to Mahout’s unique value. Mahout today is a general linear algebra lib 
and environment that performs optimized calculations on modern engines like 
Spark. It is something like a Scala-fied R on Spark (or another engine).

If this is true then spark-itemsimilarity can be seen as a package/add-on that 
requires Mahout’s core Linear Algebra.

Why use Mahout? Use it if you need scalable general linear algebra. That’s not 
what MLlib does well. 

Should we be chasing MLlib’s algo list? Why would we? If we need some algo, why 
not consume it directly from MLlib or somewhere else? Why is a reimplementation 
important all else being equal?

Is general scalable linear algebra sufficient for all important ML algos? 
Certainly not. For instance, streaming algos, and in particular online-updated 
streaming algos, may have little to gain from Mahout as it is today. 

If the above is true then Mahout is nothing like what it was in 0.9 and is 
being unfairly compared to 0.9 and to things like Oryx and MLlib. This 
misunderstanding of what Mahout _is_ leads to misapplied criticism and lack of 
use for what it does well. At the very least this all implies a very different 
description on the CMS; at most, maybe something as drastic as a name change.

