Lucene has a big problem that Mahout has avoided: continuity of stored data. Mahout has thus far refused to support a permanent data format, which is a wise move.
There is the "patch aging" problem: if a patch does not get committed, API drift causes it to stop working. There are some very useful patches in Solr/Lucene that have not gone in but are worthwhile to a minority of users. These get pushed forward in fits and starts or just stop getting used. There is an opportunity cost for ignoring useful tools; new users don't find them and then give up.

A problem unique to Mahout is algorithm quality regression: we have seen many cases where a program passes all unit tests but loses quality with real-world data (Bayes classification recently?) or just blows up (RecommenderJob).

I'm sure these happen in all projects in slightly different forms, but I've lived in the Solr/Lucene world for a long time and only know them in that context.

Long-term, the trick is to place the right scaffolding that allows existing parts to get filled in well, and new parts to add on in a coherent way. Part of this is technical, and part of it is cultural: the parts are easy to understand and work with, and people feel invited and respected.

On Mon, Oct 10, 2011 at 11:10 PM, Isabel Drost <isa...@apache.org> wrote:

> On 11.10.2011 Lance Norskog wrote:
> > The Hadoop people said "we'll change whatever we feel like" and look where
> > that led to :)
>
> I think we have two conflicting goals here: On the one hand users who have
> Mahout in production need stability - in terms of interfaces, but even more so
> in terms of file formats for trained models, input vectors and such. On the
> other hand we as a project need room for experimentation and innovation.
>
> I think marking experimental interfaces is a nice compromise of making it
> explicit to users which parts they can rely on but also making it as simple as
> using the latest release or even trunk for those that want to be
> early-adopters.
>
> It could be a first step to a 1.0 release we all have been looking forward to
> for so long: Makes obvious which parts of Mahout still need caring hands for
> cleanup, refactoring and improved integratability (Thanks to those who have
> spend time on these tasks lately.).
>
> In addition we need to think about what kind of backwards compatibility
> guarantees we want to give to users - might make sense to steal some of Lucene's
> knowledge in that area as well. I think deciding on whether to use abstract
> classes vs. interfaces may well turn out to be our smallest questionmark.
>
>
> Isabel

--
Lance Norskog
goks...@gmail.com