We recognize the value of a non-Java-centric API for doing math/algo
work, similar in spirit to what R has done. So...
0xdata is
- looking at how the h2o API/programming model fits with the existing
Mahout Java API
- doing some initial exploratory porting to the existing Java API
- watching with interest how the API work moves, especially if a
consensus arrives around The Right Way to program these kinds of ML and
Big Data algorithms
- looking to support ALL efforts around open-source ML systems, in part
because we don't know which solution is best
- In particular H2O is a killer-fast backend for doing distributed
computations, but is not the easiest thing to use. We are working to
improve & extend that usability, while keeping our speed.
- modifying H2O's internal API to support the Mahout Java API, an R API,
and a potential Scala-based API (perhaps from Spark/Databricks and
perhaps from Dmitriy's work). This goes directly to our goal of making
H2O more usable, and supports a higher goal of making ML on Big Data more
usable ('cause a faster backend means it's faster on the same-size data,
or possible on bigger data).
Cliff
On 5/6/2014 11:27 AM, Saikat Kanjilal wrote:
The paragraph(s) don't clearly identify whether the non-committers are
currently working only on 0xdata, only on Spark, or on both (which is actually
the case). Ideally, a statement that non-committers are doing work in both
areas would be great, with the obvious open-source addition that outside
contributions are encouraged.
From: [email protected]
Date: Tue, 6 May 2014 18:23:18 +0200
Subject: consensus statement?
To: [email protected]
I have been involved in side conversations to try to build a bit of unity
among our community and would like to propose this as a statement of what
we are doing:
Apache Mahout is moving immediately to a faster execution model. The first
of these is Spark. Outside contributions are always encouraged.
As a bit of commentary, it is clear that what the committers are working on
is Spark and it is clear that Spark will be the first new platform for
Mahout. It is also clear that there are non-committers (the 0xdata crew
for one) who are working with the community to extend Mahout beyond just
Spark. As a statement of where the community is *right* now, however, I
don't think we need to say much more than that we encourage contributions.
Sound fair? Correct?