Re: Call to action – Mahout needs your help

Gokhan Capan Tue, 26 Mar 2013 13:02:29 -0700

Sure.


On Tue, Mar 26, 2013 at 7:14 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:

> Gokhan, I totally agree that we need all of that. Would you mind
> starting a new thread about this?
> This thread is great for listing ideas, but it's already become pretty
> long and it's getting hard to keep track.
>
> On Tue, Mar 26, 2013 at 6:38 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
> > Hi,
> >
> > Would you consider refactoring Mahout so that the project follows a clear,
> > layered structure for all algorithms, and documenting it, such as:
> >
> >
> >    - All algorithms take Mahout matrices as input and output matrices as
> >    the learned model
> >    - All preprocessing tools should be generic enough that they produce
> >    appropriate inputs for Mahout algorithms
> >    - All algorithms should output the learned model so that people can use
> >    it beyond training and testing
> >    - Tools that dump results (e.g. clusterdump) should follow a strictly
> >    defined format suggested by the community.
> >    - Evaluation tools should be generic enough that they can be used by all
> >    similar kinds of algorithms.
> >    - ...
> >
> > Users would know the steps they need to perform to use Mahout, and one step
> > can be replaced by an alternative.
> >
> > Developers would know the inputs and outputs of their contributions clearly,
> > and they would contribute to the layer (preprocessing, algorithm, etc.)
> > they feel comfortable with.
> >
> > Mahout has tools for nearly all of the steps listed here, but personally,
> > when I use Mahout (and I’ve been using it for a long time), I feel lost in
> > the steps I should follow.
> >
> > Moreover, the refactoring could eliminate duplicate data structures and
> > stick to Mahout matrices where available. All similarity measures should
> > operate on vectors, for example.
> >
> > An illustrative example: in our lab, we implemented an HBase-backed Mahout
> > Matrix, which we use in our projects where online algorithms operate on
> > large data and learn a parameter matrix (one needs this for
> > matrix-factorization-based recommenders). The parameter matrix then becomes
> > an input for the live system. This refactoring cascaded, and we replaced the
> > underlying data structures of the Recommender DataModel with a persistent
> > matrix.
> >
> > Now:
> >
> >
> >    - Everyone knows that any dataset should be in Mahout matrix format, and
> >    applies the appropriate preprocessing, or writes one.
> >    - We can use different recommenders interchangeably
> >    - Any optimization of matrix operations applies everywhere.
> >    - Different people can work on different parts (evaluation, model
> >    optimization, recommender algorithms) without bothering others.
> >
> > Apart from all this, I should say that I am always eager to contribute to
> > Mahout, as some of the committers already know.
> >
> > Best Regards
> >
> > On Tue, Mar 26, 2013 at 5:23 PM, Isabel Drost <isa...@apache.org> wrote:
> >
> >> On Tue, Mar 26, 2013 at 3:59 PM, Grant Ingersoll <gsing...@apache.org>
> >> wrote:
> >>
> >> > I believe the GSOC proposal for Mentors is due soon, so if someone is
> >> > doing it, they better hop on comdev ASAP and submit.
> >> >
> >>
> >> For more information also check <http://community.apache.org/gsoc.html> -
> >> in particular the "for mentors" bit of the page.
> >>
> >>
> >> Isabel
> >>
> >
> >
> >
> > --
> > Gokhan
>
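
The layered structure proposed in the quoted message (preprocessing produces a matrix, an algorithm takes a matrix and outputs a matrix as the learned model, and a generic evaluation tool scores any such model) might be sketched roughly as follows. This is only an illustration under stated assumptions: `Matrix`, `Preprocessor`, `Algorithm`, `Evaluator`, and the toy implementations are hypothetical names invented for this sketch and are not Mahout APIs. The plain array-backed `Matrix` merely stands in for a Mahout matrix.

```java
import java.util.List;

/** Hypothetical sketch of the layered structure; none of these types are Mahout APIs. */
final class LayeredSketch {

    /** Stand-in for a Mahout matrix: a dense, row-major array of doubles. */
    static final class Matrix {
        final double[][] rows;
        Matrix(double[][] rows) { this.rows = rows; }
        int numRows() { return rows.length; }
        int numCols() { return rows.length == 0 ? 0 : rows[0].length; }
    }

    /** Preprocessing layer: turns raw input of some type R into a matrix. */
    interface Preprocessor<R> { Matrix toMatrix(R raw); }

    /** Algorithm layer: matrix in, learned model (again a matrix) out. */
    interface Algorithm { Matrix train(Matrix input); }

    /** Evaluation layer: scores any matrix-valued model against data. */
    interface Evaluator { double evaluate(Matrix model, Matrix data); }

    /** Example preprocessor: parses comma-separated lines of numbers. */
    static final Preprocessor<List<String>> CSV = lines -> {
        double[][] rows = new double[lines.size()][];
        for (int i = 0; i < lines.size(); i++) {
            String[] cells = lines.get(i).split(",");
            rows[i] = new double[cells.length];
            for (int j = 0; j < cells.length; j++) {
                rows[i][j] = Double.parseDouble(cells[j].trim());
            }
        }
        return new Matrix(rows);
    };

    /** Toy algorithm: the "model" is a 1 x n matrix of column means. */
    static final Algorithm COLUMN_MEANS = input -> {
        double[] means = new double[input.numCols()];
        for (double[] row : input.rows) {
            for (int j = 0; j < means.length; j++) {
                means[j] += row[j] / input.numRows();
            }
        }
        return new Matrix(new double[][] {means});
    };

    /** Toy evaluator: mean squared error of every data row against the model row. */
    static final Evaluator MSE = (model, data) -> {
        double sum = 0.0;
        int count = 0;
        for (double[] row : data.rows) {
            for (int j = 0; j < row.length; j++) {
                double diff = row[j] - model.rows[0][j];
                sum += diff * diff;
                count++;
            }
        }
        return count == 0 ? 0.0 : sum / count;
    };

    /** Runs the three layers in order; each layer can be swapped independently. */
    static <R> double run(Preprocessor<R> pre, Algorithm algo, Evaluator eval, R raw) {
        Matrix data = pre.toMatrix(raw);
        Matrix model = algo.train(data);
        return eval.evaluate(model, data);
    }

    public static void main(String[] args) {
        double error = run(CSV, COLUMN_MEANS, MSE, List.of("1,2", "3,4"));
        System.out.println("mean squared error: " + error);
    }
}
```

Because every layer only sees the matrix type, replacing `COLUMN_MEANS` with another `Algorithm`, or `CSV` with another `Preprocessor`, leaves the rest of the pipeline untouched. That interchangeability is the property the quoted proposal asks for.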



-- 
Gokhan
