On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant <oliph...@enthought.com> wrote:
> I think Nathaniel and Matthew provided very
> specific feedback that was helpful in understanding other perspectives of a
> difficult problem. In particular, I really wanted bit-patterns
> implemented. However, I also understand that Mark did quite a bit of work
> and altered his original designs quite a bit in response to community
> feedback. I wasn't a major part of the pull request discussion, nor did I
> merge the changes, but I support Charles if he reviewed the code and felt
> like it was the right thing to do. I likely would have done the same thing
> rather than let Mark Wiebe's work languish.

My connectivity is spotty this week, so I'll stay out of the technical discussion for now, but I want to share a story.

Maybe a year ago now, Jonathan Taylor and I were debating what the best API for describing statistical models would be -- whether we wanted something like R's "formulas" (which I supported), or another approach based on sympy (his idea). To summarize, I thought his API was confusing, pointlessly complicated, and didn't actually solve the problem; he thought R-style formulas were superficially simpler but hopelessly confused and inconsistent underneath. Now, obviously, I was right and he was wrong. Well, obvious to me, anyway... ;-)

But it wasn't like I could just wave a wand and make his arguments go away, no matter how annoying and wrong-headed I thought they were... I could write all the code I wanted, but no-one would use it unless I could convince them it was actually the right solution, so I had to engage with him and dig deep into his arguments.

What I discovered was that (as I thought) R-style formulas *do* have a solid theoretical basis -- but (as he thought) all the existing implementations *are* broken and inconsistent! I'm still not sure I can actually convince Jonathan to go my way, but because of his stubbornness, I had to invent a better way of handling these formulas, and so my library[1] is actually the first implementation of these things that has a rigorous theory behind it, and in the process it avoids two fundamental, decades-old bugs in R. (And I'm not sure the R folks can fix either of them at this point without breaking a ton of code, since they both have API consequences.)

--

It's extremely common for healthy FOSS projects to insist on consensus for almost all decisions, where consensus means something like "every interested party has a veto"[2]. This seems counterintuitive, because if everyone's vetoing all the time, how does anything get done?
The trick is that if anyone *can* veto, then vetoes turn out to actually be very rare. Everyone knows that they can't just ignore alternative points of view -- they have to engage with them if they want to get anything done. So you get buy-in on features early, and no vetoes are necessary. And by forcing people to engage with each other, like me with Jonathan, you get better designs.

But what about the cost of all that code that doesn't get merged, or written, because everyone's spending all this time debating instead? Better designs are nice and all, but how does that justify letting working code languish?

The greatest risk for a FOSS project is that people will ignore you. Projects and features live and die by community buy-in. Consider the "NA mask" feature right now. It works (at least the parts of it that are implemented). It's in mainline. But IIRC, Pierre said last time that he doesn't think the current design will help him improve or replace numpy.ma. Up-thread, Wes McKinney is leaning towards ignoring this feature in favor of his library pandas' current hacky NA support. Members of the neuroimaging crowd are saying that the memory overhead is too high and the benefits too marginal, so they'll stick with NaNs. Together these folks make up a huge proportion of this feature's target audience. So what have we actually accomplished by merging this to mainline? Are we going to be stuck supporting a feature that only a fraction of the target audience actually uses? (Maybe they're being dumb, but if people are ignoring your code for dumb reasons... they're still ignoring your code.)

The consensus rule forces everyone to do the hardest and riskiest part -- building buy-in -- up front. You *have* to do it sooner or later anyway, and doing it sooner doesn't just generate better designs: it also drastically reduces the risk of ending up in a huge trainwreck.

--

In my story at the beginning, I wished I had a magic wand to skip this annoying debate and political stuff.
But giving it to me would have been a bad idea. I think that's what went wrong with the NA discussion in the first place. Mark's an excellent programmer, and he tried his best to act for the good of everyone in the project -- but in the end, he did have a wand like that. He didn't have that sense that he *had* to get everyone on board (even the people who were saying dumb things), or he'd just be wasting his time. He didn't ask Pierre if the NA design would actually work for numpy.ma's purposes -- I did.

You may have noticed that I do have some ideas about how NA support should work. But my ideas aren't really the important thing. The alter-NEP was my attempt to find common ground between the different needs people were bringing up, so we could discuss whether it would work for people or not. I'm not wedded to anything in it. But this is a complicated issue with a lot of conflicting interests, and we need to find something that actually does work for everyone (or as large a subset as is practical).

So here's what I think we should do:

1) I will submit a pull request backing Mark's NA work out of mainline, for now. (This is more or less done; I just need to get it onto GitHub. See above re: connectivity.)

2) I will also put together a new branch containing that work, rebased against current mainline, so it doesn't get lost. (Ditto.)

3) And we'll decide what to do with it *after* we hammer out a design that the various NA-supporting groups all find convincing. Or at least a design for some of the less controversial pieces (like the 'where=' ufunc argument?), get those merged, and then iterate incrementally.

What do you all think?

And in any case, thanks for reading,
-- Nathaniel

[1] https://github.com/charlton/charlton
[2] For example, this is written into the Apache voting procedure: https://www.apache.org/foundation/voting.html (it's the "code modification" rules that are relevant).
    And as usual, Karl Fogel has more useful discussion: http://producingoss.com/en/consensus-democracy.html (see esp. the "When to vote" section, which is entirely about how to avoid voting).

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion