Some very good points IMO, but there is no mention of actually making Mahout 
run on something specific. Are we really going for a shell that engine 
companies can put on top of their engine, but that doesn’t actually run on 
anything until the engine guys do some work? Clearly that’s not the intent. 
We seem to be trying hard _not_ to address the 800 lb gorilla in the room.

I’d be +1 if we clearly state a specific “Mahout-supported execution engine”: 
add a #5 stating flatly that “Our reference implementation will be on the 
Apache Spark execution engine”. 

Maybe using the “reference implementation” wording will let us address the 
Spark vs. h2o question in a clear way for users, contributors, and even 
engine people.


On May 21, 2014, at 6:00 AM, Ted Dunning <[email protected]> wrote:

Very good description of benefits.


On Wed, May 21, 2014 at 5:26 AM, Gokhan Capan <[email protected]> wrote:

> I want to express my opinions on the vision, too. I tried to capture these
> points from various discussions on the dev list, and I hope that most of
> them reflect the shared sense of excitement the new Mahout arouses.
> 
> To me, the fundamental benefit of the shift that Mahout is undergoing is a
> better separation of the distributed execution engine, distributed data
> structures, matrix computations, and algorithms layers, which will allow
> the users/devs of Mahout with different roles to focus on the relevant
> parts of the framework (a rough code sketch follows the list):
> 
>   1. A machine learning scientist, independently of the underlying
>   distributed execution engine, can use the matrix language and the
>   decompositions to implement new algorithms (which implies that the
>   current distributed Mahout algorithms are to be rewritten in the matrix
>   language)
>   2. A math-scala module contributor, for the benefit of higher-level
>   algorithms, can add new functions or improve existing ones (the set of
>   decompositions is an example), along with optimization plans (such as
>   "if two matrices are partitioned in the same way, ..."), where the
>   concrete implementations of those optimizations are delegated to the
>   distributed execution engine layer
>   3. A distributed execution engine author can add machine learning
>   capabilities to her platform by providing (i) concrete Matrix and
>   Matrix I/O implementations, (ii) partitioning, checkpointing, and
>   broadcasting behaviors, and (iii) BLAS operations
>   4. A Mahout user with access to a cluster running a Mahout-supported
>   distributed execution engine can run machine learning algorithms
>   implemented on top of the matrix language
> 
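> To make this concrete, here is a rough sketch of what such algorithm code
> could look like on the Spark bindings. The names below (mahoutSparkContext,
> drmParallelize, dense, .t, %*%, checkpoint, collect) are from the current
> math-scala/spark-bindings work as I remember it and may not match the code
> exactly:
> 
>   import org.apache.mahout.math.scalabindings._
>   import org.apache.mahout.math.scalabindings.RLikeOps._
>   import org.apache.mahout.math.drm._
>   import org.apache.mahout.math.drm.RLikeDrmOps._
>   import org.apache.mahout.sparkbindings._
> 
>   // Sketch only: names/signatures approximate the math-scala DSL.
>   object DslSketch extends App {
> 
>     // The only engine-specific line: obtain a Spark-backed context.
>     implicit val ctx = mahoutSparkContext(masterUrl = "local",
>                                           appName = "dsl-sketch")
> 
>     // A small distributed row matrix, parallelized from an in-core matrix.
>     val drmX = drmParallelize(dense((1, 2), (3, 4), (5, 6)),
>                               numPartitions = 2)
> 
>     // Engine-independent algorithm code: the optimizer is free to
>     // recognize patterns such as X' X and pick a fused physical plan.
>     val inCoreXtX = (drmX.t %*% drmX).checkpoint().collect
>     println(inCoreXtX)
>   }
> 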
> Best
> 
> Gokhan
> 
> 
> On Tue, May 20, 2014 at 8:30 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
> 
>> inline
>> 
>> 
>> On Tue, May 20, 2014 at 12:42 AM, Sebastian Schelter <[email protected]>
>> wrote:
>> 
>>> 
>>>> 
>>> Let's take the text from our homepage as a starting point. What should
>>> we add/remove/modify?
>>> 
>>> ----------------------------------------------------------------------------
>>> The Mahout community decided to move its codebase onto modern data
>>> processing systems that offer a richer programming model and more
>>> efficient execution than Hadoop MapReduce. Mahout will therefore reject
>>> new MapReduce algorithm implementations from now on. We will, however,
>>> keep our widely used MapReduce algorithms in the codebase and maintain
>>> them.
>>> 
>>> We are building our future implementations on top of a
>> 
>> Scala
>> 
>>> DSL for linear algebraic operations which has been developed over the
>>> last months. Programs written in this DSL are automatically optimized
>>> and executed in parallel for Apache Spark.
>> 
>> More platforms to be added in the future.
>> 
>>> 
>>> Furthermore, there is an experimental contribution under way which aims
>>> to integrate the h2o platform into Mahout.
>>> ----------------------------------------------------------------------------
>>> 
>> 
> 
