When the board looks at the health of a community, one of the questions it asks (or so I am told) is, 'Is the community responsive to requests for assistance?'
Now, the board's bar here is quite low, and I'm not trying to suggest for a moment that Mahout is in any danger of attracting unfavorable attention. However, I think that this helps to motivate my zoological metaphor.

Every new algo added to Mahout is a potential driver of support demand. This is true of all open source projects, but it's more true, if you will permit the solecism, for a 'zoo' -- a library of related but independent components -- than it is for a more monolithic project. Today's really pretty patch is tomorrow's email thread dangling without anyone to answer it. The community, I think, has to strike a balance between accepting contributions and its capacity to support and maintain those contributions. GitHub (or Apache Extras), as an alternative target for 'one more algorithm', is not necessarily a badge of shame, but perhaps just a reflection of reality. Yes, we want to encourage contributors. However, to abuse another analogy, there's a difference between adding birds to the flock and getting visited by a cuckoo.

Now, the more that the community succeeds in creating reusable cores that get used to build many algorithms, the less expensive it is to support another algo. At the same time, it validates the role of the project as, primarily, providing the core, leaving some individual algorithms to find their own home.

Grant is a long-term veteran of Lucene, which is (in)famous for things that hang out as patches forever. From some points of view, this is perfectly sensible. From others, it's frustrating, confusing, or even infuriating. There's something to be said, in my opinion, for a relatively rapid triage into:

a) a proposed core change. Either the community likes the idea or not.
b) a completely new capability. Let it live in a 'contrib' region, or at Extras, or GitHub, and see what happens to it.

One way or the other, a patch in JIRA is a very clumsy place for people to obtain and try out a proposed change.
Branches (for core changes) in svn are better. Branches in git (coming to Apache R?SN) are better still, and far easier to maintain. Purely additive code is far better off sitting in source control someplace.

I'm merely a plumber around here, when I'm anything at all, so I'm going to restrict myself to relatively few comments on this subject, in which I will try to avoid degenerating into a dog in the manger.