On 03/05/11 01:41, Roy T. Fielding wrote:
> I am constantly amazed at how
> quiet it is in this project, at least until I remember that
> most of the work is done exclusively via jira, unlike any of
> my other followed projects that use jira.  I'd suggest that
> the right place to hold any discussion is on the dev list,
> but I am not on that list because it receives way too many
> automated notifications.  Maybe it would help discussion on
> dev if notices were sent elsewhere and only discussions were
> held on dev.

I've seen this before on the Maven lists, where there's mostly a stream of JIRA changes above anything else:
http://mail-archives.apache.org/mod_mbox/maven-dev/200510.mbox/browser

However, they've got no JIRA issues in their list now, which may imply all changes aren't going to the list, or they aren't using it so much:
http://mail-archives.apache.org/mod_mbox/maven-dev/201104.mbox/browser

(pause: bisecting their list shows that on 1.mar.06 they forked the JIRA traffic to a separate list to hide the details of ongoing work)

In some ways it's a means of dealing with a large and fast moving codebase: you subscribe to the issues that matter to you, all the discussions on a specific feature are archived, etc.

However, it has some flaws:

-discouragement of community: you become a group of people working on JIRA issues, rather than on a large integrated project

-with work spread across common, hdfs and mapreduce JIRAs and mailing lists, it's hard to keep all the things in your head. It is pretty much a full-time job to do so. And I don't know about the others, but I don't have the time.

-we need a way of gently moving people from those who use hadoop to those who develop it. To me, every end user is a warm engineering resource we just need to point at a problem that they care about. The scale of the project, its complexity, JIRA change rate and testing difficulties are all barriers to entry

-you end up needing a team of people:
 * someone to track all the issues and keep the design in their head
 * 1+ person to test
 * 1+ person to code
I don't know about others, but I can't do this on my own.

The attempt to split up into HDFS+MAPREDUCE was one tactic to deal with this, but it hasn't worked; we just have more mailing lists to track (or in my case, fall behind on).

votewise:

-I'm in favour of shipping an apache release of 20.x that has the patches that Y! and others have added to deal with scale and availability, and which has been tested by them. This will provide an apache release for people to use in production systems, because the official apache releases have lagged the CDH and Y! releases.

-I'd like to see all the changes integrated into trunk too, as it doesn't make sense for a patch in this branch not to be in trunk.

Steve
