I feel like I am most closely aligned with Grant. Very little to add. Like it or not, Mahout is a library, not a coherent product such as HBase. It's a collection of algorithms connected together with some fairly thin structure and persistence glue, but the glue rarely can go much beyond that. That naturally presents difficulties with support, as not every committer is broadly qualified to advise on every algorithm (as opposed, for example, to HBase, which is pretty much a single product and therefore much easier to gain proficiency in).
If we look around at ML projects, e.g. BUGS, Vowpal Wabbit, libsvm, they all seem to revolve around a single area of ML, and hence they get support in that area. There are a few exceptions like Weka, but those revolve around "non-big" data and therefore use well-known approaches, whereas Mahout almost always requires added value to make a method scalable. That added value rarely results in a published paper or even decently reviewed working notes, which makes supporting the thing even more difficult. Hence, a few thoughts.

1. Request and review more or less detailed working notes from the contributor before he vanishes from the radar.

2. Don't get upset by the multiplicity of open JIRAs. If a JIRA sits around unfixed for the upcoming release, just create a special 'backlog' fix target and throw it there until the author provides more information.

3. I suggest reviewing some contributions from a practicality point of view, i.e. if the author had a concrete need for his contribution and was using it himself, take a more favorable view of it. That would result in the majority of contributions being focused on the most common pragmatic needs, rather than being a technology in search of a problem. (That's, btw, how my code evolved: I coded it not because I had an itch, but because I needed an MR-based LSA solution.) In other words, pragmatically necessary things tend to get more of a chance of being finished and improved upon naturally. But they still may take months and even years to evolve into a nicely optimized solution, so there is no need to nix something right away. Just throw it in the backlog, and even if the author does not reappear for as much as 18 months, don't nix it; just let it sit in backlog limbo. These things often don't come easy (to me, anyway).

4. Even though we may not fully understand a method, we still may create some standard requirements for contributions. I already mentioned working notes. But we may also ask contributors to define standard characteristics, such as the number of MapReduce iterations required, the parallelization strategy, and flops. It would be ideal if we could also find a way to run and publish a standard benchmark on, say, 10G of input, just to see if it smells. It would help (me at least) if this data, along with the maturity level, were published on the wiki. Also request a method tutorial from the contributor, written to the wiki. (For one way such a characteristics record could look, see the rough sketch after the quoted message below.)

On Oct 22, 2011 10:36 AM, "Benson Margulies" <[email protected]> wrote:
> Drat: I wrote 'is necessarily a badge of shame' when I meant to write
> 'is not necessarily a badge of shame'.
>
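To make point 4 a little more concrete, here is a minimal sketch in plain Java of what recording those standard characteristics, plus a timing run on the standard input, could look like. Everything below is hypothetical: the class and field names are made up for illustration and nothing here exists in the Mahout tree.

import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch, not actual Mahout code: a tiny harness that captures
 * the "standard characteristics" proposed in point 4 and times one run of a
 * contributed job against a fixed-size (say, 10G) benchmark input.
 */
public class ContributionBenchmark {

  /** What a contributor would publish on the wiki alongside the patch. */
  static final class Characteristics {
    final String algorithm;               // e.g. "MR-based LSA"
    final int mapReduceIterations;        // MR passes needed for one run
    final String parallelizationStrategy; // e.g. "block-wise matrix multiply"
    final double approxGflops;            // rough floating-point cost estimate

    Characteristics(String algorithm, int iterations, String strategy, double gflops) {
      this.algorithm = algorithm;
      this.mapReduceIterations = iterations;
      this.parallelizationStrategy = strategy;
      this.approxGflops = gflops;
    }
  }

  /** Runs the job once and prints the characteristics with the wall time. */
  static void run(Characteristics c, Runnable job) {
    long start = System.nanoTime();
    job.run(); // in practice: submit the contributed MR job on the 10G set
    long seconds = TimeUnit.NANOSECONDS.toSeconds(System.nanoTime() - start);
    System.out.printf("%s: %d MR iterations, strategy=%s, ~%.1f GFLOPs, %ds%n",
        c.algorithm, c.mapReduceIterations, c.parallelizationStrategy,
        c.approxGflops, seconds);
  }

  public static void main(String[] args) {
    Characteristics lsa =
        new Characteristics("MR-based LSA", 7, "block-wise SSVD", 40.0);
    run(lsa, new Runnable() {
      public void run() { /* invoke the contributed driver here */ }
    });
  }
}

The numbers themselves would come from the contributor's working notes; the point is only that a uniform, printable record like this would make it easy to eyeball, next to the maturity level on the wiki, whether a method smells on the standard input.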
