Thanks Henry!

Do you know of a good source that gives pointers or examples on how to
interact with H2O?

Stephan


On Sun, Jan 4, 2015 at 7:14 PM, Till Rohrmann <trohrm...@apache.org> wrote:

> The idea to work with H2O sounds really interesting.
>
> In terms of the Mahout DSL this would mean that we have to translate a
> Flink dataset into H2O's basic abstraction of distributed data and vice
> versa. Everything other than writing to disk with one system and reading
> from there with the other is probably non-trivial and hard to realize.
> On Jan 4, 2015 9:18 AM, "Henry Saputra" <henry.sapu...@gmail.com> wrote:
>
> > Happy new year all!
> >
> > I like the idea of adding an ML module to Flink.
> >
> > As I have mentioned to Kostas, Stephan, and Robert before, I would
> > love to see if we could work with the H2O project [1], and it seems
> > the community has already added support for it as an Apache Mahout
> > backend binding [2].
> >
> > So we might get some additional scalable ML algorithms, such as deep
> > learning.
> >
> > Definitely would love to help with this initiative =)
> >
> > - Henry
> >
> > [1] https://github.com/h2oai/h2o-dev
> > [2] https://issues.apache.org/jira/browse/MAHOUT-1500
> >
> > On Fri, Jan 2, 2015 at 6:46 AM, Stephan Ewen <se...@apache.org> wrote:
> > > Hi everyone!
> > >
> > > Happy new year, first of all and I hope you had a nice end-of-the-year
> > > season.
> > >
> > > I thought that it is a good time now to officially kick off the
> > > creation of a library of machine learning algorithms. There are a lot
> > > of individual artifacts and algorithms floating around which we should
> > > consolidate.
> > >
> > > The machine-learning library in Flink would stand on two legs:
> > >
> > >  - A collection of efficient implementations for common problems and
> > > algorithms, e.g., Regression (logistic), clustering (k-Means, Canopy),
> > > Matrix Factorization (ALS), ...
> > >
> > >  - An adapter to the linear algebra DSL in Apache Mahout.
> > >
> > > In the long run, it would be the goal to be able to mix and match code
> > > from both parts.
> > > The linear algebra DSL is very convenient when it comes to quickly
> > > composing an algorithm, or some custom pre- and post-processing steps.
> > > For some complex algorithms, however, a low-level, system-specific
> > > implementation is necessary to make the algorithm efficient.
> > > Being able to call the tailored algorithms from the DSL would combine
> > > the benefits of both.
> > >
> > >
> > > As a concrete initial step, I suggest we do the following:
> > >
> > > 1) We create a dedicated Maven sub-project for that ML library
> > > (flink-lib-ml). The project gets two sub-projects, one for the
> > > collection of specialized algorithms, one for the Mahout DSL.
> > >
> > > 2) We add the code for the existing specialized algorithms. As
> > > follow-up work, we need to consolidate data types between those
> > > algorithms, to ensure that they can easily be combined/chained.
> > >
> > > 3) The code for the Flink bindings to the Mahout DSL will actually
> > > reside in the Mahout project, which we need to add as a dependency to
> > > flink-lib-ml.
> > >
> > > 4) We add some examples of Mahout DSL algorithms, and a template for
> > > how to use them within Flink programs.
> > >
> > > 5) Create a good introductory readme.md, outlining this structure. The
> > > readme can also track the implemented algorithms and the ones we put on
> > > the roadmap.
> > >
> > >
> > > Comments welcome :-)
> > >
> > >
> > > Greetings,
> > > Stephan
> >
>
