On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith <n...@pobox.com> wrote:
> On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant <oliph...@enthought.com> wrote:
> > I think Nathaniel and Matthew provided very specific feedback that was
> > helpful in understanding other perspectives of a difficult problem. In
> > particular, I really wanted bit-patterns implemented. However, I also
> > understand that Mark did quite a bit of work and altered his original
> > designs quite a bit in response to community feedback. I wasn't a major
> > part of the pull request discussion, nor did I merge the changes, but I
> > support Charles if he reviewed the code and felt like it was the right
> > thing to do. I likely would have done the same thing rather than let
> > Mark Wiebe's work languish.
>
> My connectivity is spotty this week, so I'll stay out of the technical
> discussion for now, but I want to share a story.
>
> Maybe a year ago now, Jonathan Taylor and I were debating what the
> best API for describing statistical models would be -- whether we
> wanted something like R's "formulas" (which I supported), or another
> approach based on sympy (his idea). To summarize, I thought his API
> was confusing, pointlessly complicated, and didn't actually solve the
> problem; he thought R-style formulas were superficially simpler but
> hopelessly confused and inconsistent underneath. Now, obviously, I was
> right and he was wrong. Well, obvious to me, anyway... ;-) But it
> wasn't like I could just wave a wand and make his arguments go away,
> no matter how annoying and wrong-headed I thought they were... I could
> write all the code I wanted but no-one would use it unless I could
> convince them it's actually the right solution, so I had to engage
> with him, and dig deep into his arguments.
>
> What I discovered was that (as I thought) R-style formulas *do* have a
> solid theoretical basis -- but (as he thought) all the existing
> implementations *are* broken and inconsistent!
> I'm still not sure I can actually convince Jonathan to go my way, but,
> because of his stubbornness, I had to invent a better way of handling
> these formulas, and so my library[1] is actually the first
> implementation of these things that has a rigorous theory behind it,
> and in the process it avoids two fundamental, decades-old bugs in R.
> (And I'm not sure the R folks can fix either of them at this point
> without breaking a ton of code, since they both have API consequences.)
>
> --
>
> It's extremely common for healthy FOSS projects to insist on consensus
> for almost all decisions, where consensus means something like "every
> interested party has a veto"[2]. This seems counterintuitive, because
> if everyone's vetoing all the time, how does anything get done? The
> trick is that if anyone *can* veto, then vetoes turn out to actually
> be very rare. Everyone knows that they can't just ignore alternative
> points of view -- they have to engage with them if they want to get
> anything done. So you get buy-in on features early, and no vetoes are
> necessary. And by forcing people to engage with each other, like me
> with Jonathan, you get better designs.
>
> But what about the cost of all that code that doesn't get merged, or
> written, because everyone's spending all this time debating instead?
> Better designs are nice and all, but how does that justify letting
> working code languish?
>
> The greatest risk for a FOSS project is that people will ignore you.
> Projects and features live and die by community buy-in. Consider the
> "NA mask" feature right now. It works (at least the parts of it that
> are implemented). It's in mainline. But IIRC, Pierre said last time
> that he doesn't think the current design will help him improve or
> replace numpy.ma. Up-thread, Wes McKinney is leaning towards ignoring
> this feature in favor of his library pandas' current hacky NA support.
> Members of the neuroimaging crowd are saying that the memory overhead
> is too high and the benefits too marginal, so they'll stick with NaNs.
> Together these folks are a huge proportion of this feature's target
> audience. So what have we actually accomplished by merging this to
> mainline? Are we going to be stuck supporting a feature that only a
> fraction of the target audience actually uses? (Maybe they're being
> dumb, but if people are ignoring your code for dumb reasons... they're
> still ignoring your code.)
>
> The consensus rule forces everyone to do the hardest and riskiest part
> -- building buy-in -- up front. Because you *have* to do it sooner or
> later, and doing it sooner doesn't just generate better designs. It
> drastically reduces the risk of ending up in a huge trainwreck.
>
> --
>
> In my story at the beginning, I wished I had a magic wand to skip this
> annoying debate and political stuff. But giving it to me would have
> been a bad idea. I think that's what went wrong with the NA discussion
> in the first place. Mark's an excellent programmer, and he tried his
> best to act for the good of everyone in the project -- but in the end,
> he did have a wand like that. He didn't have that sense that he *had*
> to get everyone on board (even the people who were saying dumb things),
> or he'd just be wasting his time. He didn't ask Pierre if the NA
> design would actually work for numpy.ma's purposes -- I did.
>
> You may have noticed that I do have some ideas about how NA
> support should work. But my ideas aren't really the important thing.
> The alter-NEP was my attempt to find common ground between the
> different needs people were bringing up, so we could discuss whether
> it would work for people or not. I'm not wedded to anything in it. But
> this is a complicated issue with a lot of conflicting interests, and
> we need to find something that actually does work for everyone (or as
> large a subset as is practical).
>
> So here's what I think we should do:
>   1) I will submit a pull request backing Mark's NA work out of
>      mainline, for now. (This is more or less done, I just need to get
>      it onto github, see above re: connectivity)
>   2) I will also put together a new branch containing that work,
>      rebased against current mainline, so it doesn't get lost. (Ditto.)
>   3) And we'll decide what to do with it *after* we hammer out a
>      design that the various NA-supporting groups all find convincing.
>      Or at least a design for some of the less controversial pieces
>      (like the 'where=' ufunc argument?), get those merged, and then
>      iterate incrementally.
>
> What do you all think?
>

Why don't you and Matthew work up an alternative implementation so we can
compare the two?

Chuck
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion