I feel like I am most closely aligned with Grant. Very little to add.

Like it or not, Mahout is a library, not a coherent product such as HBase.
It's a collection of algorithms connected together with some fairly thin
structural and persistence glue, but that glue rarely can go much beyond that.
That naturally presents difficulties with support, as not every committer is
broadly qualified to advise on every algorithm (as opposed, for example, to
HBase, which is pretty much a single product and therefore much easier to
gain proficiency in).

If we look around at ML projects, e.g. BUGS, Vowpal Wabbit, libsvm, they all
seem to revolve around a single area of ML. Hence they get support in that
area. There are a few exceptions, like Weka, but those revolve around "non-big"
data and therefore use well-known approaches, whereas Mahout almost always
requires some added value to make a method scalable. That added value rarely
results in a published paper or even decently reviewed working notes, which
makes supporting the thing even more difficult.

Hence, a few thoughts.

1. Request and review more or less detailed working notes from the
contributor before he vanishes from the radar.

2. Don't get upset by the multiplicity of open JIRAs. If a JIRA sits around
unfixed for the upcoming release, just create a special 'backlog' fix target
and throw it there until the author provides more information.

3. I suggest reviewing some contributions from a practicality point of view,
i.e. if the author had a concrete need for his contribution and was using it
himself, take a more favourable view of it. That would result in the majority
of contributions being focused on the most common pragmatic needs, rather than
being a technology in search of a problem. (That's, btw, how my code evolved:
I coded it not because I had an itch, but because I needed a MapReduce-based
LSA solution.) In other words, pragmatically necessary things tend to have a
better chance of being finished and improved upon naturally. But they still
may take months or even years to evolve into a nicely optimized solution, so
there is no need to nix something right away. Just throw it in the backlog,
and even if the author does not reappear for as much as 18 months, don't nix
it; just let it sit in backlog limbo. These things often don't come easily
(to me, anyway).

4. Even though we may not fully understand a method, we may still create
some standard requirements for contributions. I already mentioned working
notes. But we may also ask contributors to specify standard characteristics,
such as the number of MapReduce iterations required, the parallelization
strategy, and FLOPs. It would be ideal if we could also find a way to run and
publish standard benchmarks on, say, 10G of input, just to see if it smells.
It would help (me, at least) if this data, along with the maturity level, were
published in the wiki. Also, request a method tutorial from the contributor,
written to the wiki.
On Oct 22, 2011 10:36 AM, "Benson Margulies" <[email protected]> wrote:

> Drat: I wrote 'is necessarily a badge of shame' when I meant to write
> 'is not necessarily a badge of shame'.
>