Lucene has a big problem that Mahout has avoided: continuity of stored data.
Mahout has thus far refused to support a permanent data format, which is a
wise move.

There is the "patch aging" problem: if a patch does not get committed, API
drift causes it to stop working. There are some very useful patches in
Solr/Lucene that have not gone in but are worthwhile to a minority of users.
These get pushed forward in fits and starts or just stop getting used. There
is an opportunity cost for ignoring useful tools; new users don't find them
and then give up.

A problem unique to Mahout is algorithm quality regression: we have seen
many cases where a program passes all unit tests but loses quality with
real-world data (Bayes classification recently?) or just blows up
(RecommenderJob).
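
One way to catch this kind of regression is a quality gate that runs next to
the unit tests: score the program on a held-out sample of real data and fail
if the score drops below a recorded baseline. A minimal sketch in Python, not
Mahout code; the function names, the accuracy metric, and the 0.02 tolerance
are all made up for illustration:

```python
def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("need two non-empty label lists of equal length")
    hits = sum(1 for p, e in zip(predicted, expected) if p == e)
    return hits / len(expected)


def check_quality(predicted, expected, baseline, tolerance=0.02):
    """Return (ok, score); ok is False when the score falls more than
    `tolerance` below the baseline recorded from an earlier known-good run."""
    score = accuracy(predicted, expected)
    return score >= baseline - tolerance, score


# Hypothetical held-out labels and the classifier's output for them.
expected = ["spam", "ham", "ham", "ham"]
predicted = ["spam", "ham", "spam", "ham"]

ok, score = check_quality(predicted, expected, baseline=0.90)
if not ok:
    print("quality regression: accuracy %.2f is below the baseline" % score)
```

The point is that the baseline is a number measured on real data, so a change
that keeps every unit test green can still fail this check.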

I'm sure these happen in all projects in slightly different forms, but I've
lived in the Solr/Lucene world for a long time and only know them in that
context.

Long-term, the trick is to put the right scaffolding in place, so that
existing parts get filled in well and new parts add on in a coherent way.
Part of this is technical, and part of it is cultural: the parts are easy to
understand and work with, and people feel invited and respected.

On Mon, Oct 10, 2011 at 11:10 PM, Isabel Drost <isa...@apache.org> wrote:

> On 11.10.2011 Lance Norskog wrote:
> > The Hadoop people said "we'll change whatever we feel like" and look
> > where that led to :)
>
> I think we have two conflicting goals here: On the one hand users who have
> Mahout in production need stability - in terms of interfaces, but even more
> so in terms of file formats for trained models, input vectors and such. On
> the other hand we as a project need room for experimentation and innovation.
>
> I think marking experimental interfaces is a nice compromise of making it
> explicit to users which parts they can rely on but also making it as simple
> as using the latest release or even trunk for those that want to be
> early-adopters.
>
> It could be a first step to a 1.0 release we all have been looking forward
> to for so long: Makes obvious which parts of Mahout still need caring hands
> for cleanup, refactoring and improved integratability (Thanks to those who
> have spent time on these tasks lately.).
>
> In addition we need to think about what kind of backwards compatibility
> guarantees we want to give to users - might make sense to steal some of
> Lucene's knowledge in that area as well. I think deciding on whether to use
> abstract classes vs. interfaces may well turn out to be our smallest
> question mark.
>
>
> Isabel
>
>


-- 
Lance Norskog
goks...@gmail.com
