When the board looks at the health of a community, one of the
questions it asks (or so I am told) is, 'Is the community responsive
to requests for assistance?'

Now, the board's bar here is quite low. I'm not trying to suggest for
a moment that Mahout is in any danger of attracting unfavorable
attention. However, I think that this helps to motivate my zoological
metaphor.

Every new algo added to Mahout is a potential driver of support
demand. This is true of all open source projects, but it's more true,
if you will permit the solecism, for a 'zoo' -- a library of related
but independent components -- than it is for a more monolithic
project.

Today's really pretty patch is tomorrow's email thread dangling
without anyone to answer it.

The community, I think, has to strike a balance between accepting
contributions and its capability to support and maintain those
contributions. GitHub (or Apache Extras), as an alternative target for
'one more algorithm', is not necessarily a badge of shame, but perhaps just
a reflection of reality. Yes, we want to encourage contributors.
However, to abuse another analogy, there's a difference between adding
birds to the flock and getting visited by a cuckoo.

Now, the more that the community succeeds in creating reusable cores
that get used to build many algorithms, the less expensive it is to
support another algo. At the same time, it validates the role of the
project as, primarily, providing the core, leaving some individual
algorithms to find their own home.

Grant is a long-term veteran of Lucene, which is (in)famous for things
that hang out in patches forever. From some points of view, this is
perfectly sensible. From others, it's frustrating, confusing, or even
infuriating. There's something to be said, in my opinion, for a
relatively rapid triage into:

  a) a proposed core change. Either the community likes the idea or not.
  b) a completely new capability. Let it live in a 'contrib' region,
     or at Extras, or GitHub, and see what happens to it.

One way or the other, a patch in JIRA is a very clumsy place for
people to obtain and try out a proposed change. Branches (for core
changes) in svn are better. Branches in git (coming to Apache RSN)
are better and far easier to maintain. For purely additive code, far
better to be sitting in source control someplace.

I'm merely a plumber around here when I'm anything at all. So I'm
going to restrain myself to relatively few comments on this subject,
in which I will try to avoid degenerating into a dog in the manger.
