I think that you mentioned a very good point in stating that it is not
clear whether Mahout is a library or a standalone program to interact with
via the command line. IMO, it's first and foremost a library (similar to
Lucene), and this should also be reflected in the codebase.
That is my view as well and I think we have been moderately successful at it.

+1

As for the complexity issue, I don't know that we will ever solve it; we just need
to identify contributors in those areas quickly, mentor them, and make them
committers as soon as they are ready.

On that note: GSoC is coming up, and I think it's a great opportunity to build some momentum in this direction. I know that when students see "scalable machine learning" their first thought isn't improving testing and documentation, but if we pushed hard in those areas specifically, in addition to making a broad effort on JIRA to spell out exactly what needs work, we could likely pick up several quality students who could make lasting contributions.


I think that Mahout is and should always be more than recommenders, but
that we should be more courageous in throwing out things that are not
used very much or not maintained very much or don't meet the quality
standards which we would like to see.

+1. On my end of things, while I do think some sort of canonical spectral clustering algorithm would be very useful to have, e.g. spectral k-means, the Eigencuts algorithm is one example of something so specialized that it could probably be jettisoned.
