Bringing this to dev@, mid-thread, per Grant's suggestion. There was a
brief and fruitful thread on private@ to discuss project governance,
but the topic has shifted such that it's useful to just talk on dev@.

If I may paraphrase: I expressed concern about the sprawl of code and
algorithms, aging of JIRA issues, and said I thought we probably had
too few "zookeepers" (to borrow Benson's metaphor) for this big a zoo,
and too many "breeders" adding new bags of code here and there. People
seem to have interest in new non-Hadoop platforms in particular, but
are not giving much attention to what's there now. I expressed concern
this would lead to an increasingly difficult mess of code and project
identity, even before 1.0.


I proposed narrowing the scope of the project -- while not rejecting
all new code, strongly weighting towards contributions that enhance
and fix existing code rather than new green-field code through the 0.6
and 1.0 releases. I also proposed a concerted effort to clean up JIRA
in the short term.

Most discussion centered around the final proposal: in exchange for
putting off a lot of new, different stuff, I proposed thinking of
"Mahout2" as a place for those ideas -- perhaps even as not
Hadoop-based. It could be a mostly green-field rewrite. Now, keep in
mind this would be quite a ways away -- just a point of discussion
now. But it might be a good conceptual rationalization for nearly
freezing scope now and polishing -- because we'd have a future bucket
to put these in and time to talk it over.

So: comments are certainly welcome on the above!


And now I'd like to rejoin the thread by replying to Grant below:

My broader concern is that an unhealthy project is the bigger danger
to the community. And I see some patterns that feel harmful to me.

I do see large patches sit in JIRA for 6 months and get cancelled.
That's not good -- either it was a good patch and should have been
picked up, or it wasn't suitable, and it should have been clearer it
wasn't suitable before the contributor went to the trouble.

I see JIRAs tagged for version X, and then untouched and slipped to
version X+1, X+2. This means that the community doesn't have credible
information about when an issue is going to be addressed. In fact they
don't have info about *if* the issue is even good to address, and so
worth working on (see point above).

Finally I see a lot of "Someone should do X at some point" JIRAs, and
Someone rarely does them. While these feel like work and progress, I
think they're harmful: they show the community that to-dos aren't
always done. It condones a culture of post-it-notes for future work.


I assert that even in open source (perhaps all the more?) we do need
enough project coordination such that these problems don't crop up. We
should be able to meet basic expectations about scope, process, and
roadmap -- not nearly as much as a corporate software project, but
some semblance of it. This is a separate thing from providing space
for ideas, to-dos, thoughts, bits of code, etc.

And maybe we're just disagreeing about how to implement those two
things. I think JIRA is for project coordination, and I think the
mailing list or wiki, or GitHub if you like, are for ideas, open-ended
brainstorms, to-dos. Taking something into JIRA and letting it sit
there, to me, is therefore hurting and not helping (see above).

If you see JIRA as a place for ideas and loose ends -- then of course
this doesn't look like any problem! But then I'd ask, where's the
project plan? Because we need that.

I don't think that it's wrong to close an issue that hasn't been
touched for 9 months as WontFix. I'm not being anti-community. I'm a
messenger of slightly bad news, that's all. The issue already wasn't
going to be fixed, I'll bet you. And there's some reason for that --
which is what I'm trying to address.


My answer of course is simple: rein in scope to match effort
available. It's simplistic, but it sure works.



On Sat, Oct 22, 2011 at 8:47 AM, Grant Ingersoll <[email protected]> wrote:
> Whew, lots to read and a great conversation.  It seems the number 1 rule of 
> open source is these kinds of questions happen while on vacation or at a 
> conference.
>
> So, here are some random thoughts, hopefully trying to take into account this 
> thread and what I feel we have learned in the past few years:
>
> 1. First off, the majority of this conversation should be happening on dev@.  
> With my "Mahout marketing hat on", it probably should be a different subject 
> line like "Moving towards 1.0 and beyond", but I don't care that much, just 
> defaulting to looking forward instead of back.  This very much needs to 
> happen as there is little here that is private other than the Chair 
> discussion.  Sean, do you want to start the thread?  Otherwise, I am happy to 
> do so.
>
> 2. I have a slightly different take on JIRA state.  On the one hand, it is 
> bad that we are not committing issues and I totally agree with you.  But I 
> also sense that you are demoralized by things being left open as I know you 
> do a lot of clean up work.  Personally, I don't think left open is bad.  In 
> fact, I don't see much reason to declare something as "Won't Fix" unless it 
> truly is a wrong idea/concept and we can outright reject it, which is rarely 
> the case.  If it is marked as "Won't Fix" simply because someone hasn't taken 
> up the work to do it in a while, then I would argue that is anti-community.  
> Leaving it open says to the community, "We haven't ruled this out.  If you so 
> desire, please come offer a fix."   It encourages contribution and itch 
> scratching.  I've seen issues completed years later in Lucene because of this 
> and I believe it is a good thing.  I know that runs against common closed 
> source engineering practices, but it is one I think is valuable in open 
> source.  So, how do we get over the clutter factor?  Selecting what patches 
> will be in what version and maintaining "Fix Version".  JIRA has enough 
> filtering capabilities that is quite simple to remove clutter by filtering it 
> out.  In other words, I would say we leave most things open and get much 
> better about saying what issues need to be in what version.
>
> I also still think we need to get auto patch checking implemented.  We have 
> all the instructions, we just need to work through it from a Jenkins 
> standpoint.  I think this will help quite a bit.
>
> 3. I totally agree we should start culling some things.  Again, though, that 
> discussion needs to happen on dev@ and maybe even ask on user@ for input.  In 
> a perfect world, we could write ML code that ran, frictionless, on a layer of 
> abstraction that allowed people to plug in the underlying engine (Hadoop, 
> local, Spark, etc.) and it just worked.  In the meantime, we should just be 
> practical and cut what isn't used knowing we can resurrect it later.  I think 
> our modules more or less make sense and our focus on the three C's make sense 
> and our overarching goal of creating "scalable machine learning algorithms" 
> makes sense.  Let's work from there.  I'll save other thoughts on this stuff 
> for the dev@ conversation.
>
> The hard part that we really need to overcome is that this ensuing 
> conversation is likely to be a long one, filled with opinions.  This is a 
> good thing.  We need to have that discussion over the course of a week or two 
> and out of it needs to come a concrete proposal that we can then vote on.  
> And once that vote is done, we act on it.  The last point there being the one 
> that matters most.
