First, is there anything we can't agree on in that statement? Personally, I see 
nothing to disagree with, though I see no need to discuss potential outside 
contributions here; I'll let that slide.

If this is for the outside world then it needs to clearly answer:
1) If I want to run the _latest_ Mahout code, what do I need to install in my 
lab or datacenter? The question is, "What am I buying into?"
2) If I want to contribute, what does it mean that Mahout accepts no new 
MapReduce code? What is the alternative? What new code would be acceptable? We 
have rejected a couple of proposed contributions because they were MapReduce-based.

For #2
I'd change "Mahout will therefore reject new MapReduce algorithm 
implementations from now on." to "Mahout will therefore reject new Hadoop 
MapReduce contributions--new Spark-based contributions are welcome."

For #1
Maybe the 'platform requirements' or 'installing on a cluster' section is a 
better place to answer.

On May 20, 2014, at 12:42 AM, Sebastian Schelter <s...@apache.org> wrote:

On 05/18/2014 09:28 PM, Ted Dunning wrote:
> On Sun, May 18, 2014 at 11:33 AM, Sebastian Schelter <s...@apache.org> wrote:
> 
>> I suggest we start with a specific draft that someone prepares (maybe Ted
>> as he started the thread)
> 
> 
> This is a good strategy, and I am happy to start the discussion, but I
> wonder if it might help build consensus if somebody else started the ball
> rolling.
> 

Let's take the text from our homepage as a starting point. What should we 
add/remove/modify?

----------------------------------------------------------------------------
The Mahout community decided to move its codebase onto modern data processing 
systems that offer a richer programming model and more efficient execution than 
Hadoop MapReduce. Mahout will therefore reject new MapReduce algorithm 
implementations from now on. We will however keep our widely used MapReduce 
algorithms in the codebase and maintain them.

We are building our future implementations on top of a DSL for linear algebraic 
operations which has been developed over the last months. Programs written in 
this DSL are automatically optimized and executed in parallel on Apache Spark.

Furthermore, there is an experimental contribution underway which aims to 
integrate the H2O platform into Mahout.
----------------------------------------------------------------------------
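If we want the statement to be self-explanatory for outsiders, a short example of the DSL might help. Something along these lines (untested sketch; imports and names follow the current scalabindings code and may differ by version):

```scala
// Sketch of the R-like Scala DSL the statement refers to.
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import RLikeOps._
import RLikeDrmOps._

// A distributed row matrix (DRM), partitioned across the Spark cluster.
val drmA = drmParallelize(dense((1, 2), (3, 4), (5, 6)))

// Expressions such as A' * A are built lazily; the optimizer rewrites the
// plan and executes it in parallel on the Spark backend.
val drmAtA = drmA.t %*% drmA

// Materialize the (small) result as an in-core matrix on the driver.
val ata = drmAtA.collect
```

Whether the example belongs on the homepage itself or in the docs is a separate question, but it answers "what is the alternative to MapReduce code" concretely.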
