Sure.
On Tue, Mar 26, 2013 at 7:14 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:

> Gokhan, I totally agree that we need all of that. Would you mind
> starting a new thread about this?
> This thread is great for listing ideas, but it's already become pretty
> long and it's getting hard to keep track.
>
> On Tue, Mar 26, 2013 at 6:38 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
> > Hi,
> >
> > Would you consider refactoring Mahout so that the project follows a
> > clear, layered structure for all algorithms, and documenting it, such as:
> >
> > - All algorithms take Mahout matrices as input and output matrices as
> >   the learned model
> > - All preprocessing tools should be generic enough that they produce
> >   appropriate inputs for Mahout algorithms
> > - All algorithms should output the learned model so that people can use
> >   it beyond training and testing
> > - Tools that dump results (e.g. clusterdump) should follow a strictly
> >   defined format suggested by the community.
> > - Evaluation tools should be generic enough that they can be used by all
> >   similar kinds of algorithms.
> > - ...
> >
> > Users would know the steps they need to perform to use Mahout, and one
> > step could be replaced by an alternative.
> >
> > Developers would know the inputs and outputs of their contributions
> > clearly, and they would contribute to the layer (preprocessing,
> > algorithm, etc.) they feel comfortable with.
> >
> > Mahout has tools for nearly all of the steps listed here, but personally,
> > when I use Mahout (and I’ve been using it for a long time), I feel lost
> > in the steps I should follow.
> >
> > Moreover, the refactoring may eliminate duplicate data structures and
> > stick to Mahout matrices where available. All similarity measures should
> > operate on vectors, for example.
> >
> > An illustrative example: In our lab, we implemented an HBase-backed
> > Mahout Matrix, which we use for our projects where online algorithms
> > operate on large data and learn a parameter matrix (one needs this for
> > matrix factorization based recommenders). The parameter matrix then
> > becomes an input for the live system. This refactoring cascaded, and we
> > replaced the underlying data structures of the Recommender DataModel
> > with a persistent matrix.
> >
> > Now:
> >
> > - Everyone knows that any dataset should be in Mahout matrix format, and
> >   applies the appropriate preprocessing, or writes one.
> > - We can use different recommenders interchangeably.
> > - Any optimization of matrix operations applies everywhere.
> > - Different people can work on different parts (evaluation, model
> >   optimization, recommender algorithms) without bothering others.
> >
> > Apart from all this, I should say that I am always eager to contribute
> > to Mahout, as some of the committers already know.
> >
> > Best Regards
> >
> > On Tue, Mar 26, 2013 at 5:23 PM, Isabel Drost <isa...@apache.org> wrote:
> >
> >> On Tue, Mar 26, 2013 at 3:59 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> >>
> >> > I believe the GSOC proposal for Mentors is due soon, so if someone is
> >> > doing it, they better hop on comdev ASAP and submit.
> >>
> >> For more information also check <http://community.apache.org/gsoc.html> -
> >> in particular the "for mentors" bit of the page.
> >>
> >> Isabel
> >
> > --
> > Gokhan

--
Gokhan