Hi,

On Sat, Oct 29, 2011 at 11:14 AM, Wes McKinney <wesmck...@gmail.com> wrote:
> On Fri, Oct 28, 2011 at 9:32 PM, Charles R Harris
> <charlesr.har...@gmail.com> wrote:
>>
>>
>> On Fri, Oct 28, 2011 at 6:45 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>>>
>>> On Fri, Oct 28, 2011 at 7:53 PM, Benjamin Root <ben.r...@ou.edu> wrote:
>>> >
>>> >
>>> > On Friday, October 28, 2011, Matthew Brett <matthew.br...@gmail.com>
>>> > wrote:
>>> >> Hi,
>>> >>
>>> >> On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers
>>> >> <ralf.gomm...@googlemail.com> wrote:
>>> >>>
>>> >>>
>>> >>> On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett
>>> >>> <matthew.br...@gmail.com>
>>> >>> wrote:
>>> >>>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris
>>> >>>> <charlesr.har...@gmail.com> wrote:
>>> >>>> >
>>> >>>> >
>>> >>>> > On Fri, Oct 28, 2011 at 3:56 PM, Matthew Brett
>>> >>>> > <matthew.br...@gmail.com>
>>> >>>> > wrote:
>>> >>>> >>
>>> >>>> >> Hi,
>>> >>>> >>
>>> >>>> >> On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett
>>> >>>> >> <matthew.br...@gmail.com>
>>> >>>> >> wrote:
>>> >>>> >> > Hi,
>>> >>>> >> >
>>> >>>> >> > On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris
>>> >>>> >> > <charlesr.har...@gmail.com> wrote:
>>> >>>> >> >>
>>> >>>> >> >>
>>> >>>> >> >> On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith
>>> >>>> >> >> <n...@pobox.com>
>>> >>>> >> >> wrote:
>>> >>>> >> >>>
>>> >>>> >> >>> On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant
>>> >>>> >> >>> <oliph...@enthought.com>
>>> >>>> >> >>> wrote:
>>> >>>> >> >>> > I think Nathaniel and Matthew provided very
>>> >>>> >> >>> > specific feedback that was helpful in understanding other
>>> >>>> >> >>> > perspectives of a difficult problem. In particular, I really
>>> >>>> >> >>> > wanted bit-patterns implemented.
>>> >>>> >> >>> > However, I also understand that Mark did quite a bit of work
>>> >>>> >> >>> > and altered his original designs quite a bit in response to
>>> >>>> >> >>> > community feedback. I wasn't a major part of the pull request
>>> >>>> >> >>> > discussion, nor did I merge the changes, but I support Charles
>>> >>>> >> >>> > if he reviewed the code and felt like it was the right thing
>>> >>>> >> >>> > to do. I likely would have done the same thing rather than let
>>> >>>> >> >>> > Mark Wiebe's work languish.
>>> >>>> >> >>>
>>> >>>> >> >>> My connectivity is spotty this week, so I'll stay out of the
>>> >>>> >> >>> technical discussion for now, but I want to share a story.
>>> >>>> >> >>>
>>> >>>> >> >>> Maybe a year ago now, Jonathan Taylor and I were debating what
>>> >>>> >> >>> the best API for describing statistical models would be --
>>> >>>> >> >>> whether we wanted something like R's "formulas" (which I
>>> >>>> >> >>> supported), or another approach based on sympy (his idea). To
>>> >>>> >> >>> summarize, I thought his API was confusing, pointlessly
>>> >>>> >> >>> complicated, and didn't actually solve the problem; he thought
>>> >>>> >> >>> R-style formulas were superficially simpler but hopelessly
>>> >>>> >> >>> confused and inconsistent underneath. Now, obviously, I was
>>> >>>> >> >>> right and he was wrong. Well, obvious to me, anyway...
>>> >>>> >> >>> ;-) But it wasn't like I could just wave a wand and make his
>>> >>>> >> >>> arguments go away, no
>>> >> I should point out that the implementation hasn't - as far as I can
>>> >> see - changed the discussion. The discussion was about the API.
>>> >> Implementations are useful for agreed APIs because they can point out
>>> >> where the API does not make sense or cannot be implemented. In this
>>> >> case, the API Mark said he was going to implement - he did implement -
>>> >> at least as far as I can see. Again, I'm happy to be corrected.
>>> >>
>>> >>>> In saying that we are insisting on our way, you are saying,
>>> >>>> implicitly, 'I am not going to negotiate'.
>>> >>>
>>> >>> That is only your interpretation. The observation that Mark
>>> >>> compromised quite a bit while you didn't seems largely correct to me.
>>> >>
>>> >> The problem here stems from our inability to work towards agreement,
>>> >> rather than standing on set positions. I set out what changes I think
>>> >> would make the current implementation OK. Can we please, please have
>>> >> a discussion about those points instead of trying to argue about who
>>> >> has given more ground.
>>> >>
>>> >>> That commitment would of course be good. However, even if that were
>>> >>> possible before writing code and everyone agreed that the ideas of
>>> >>> you and Nathaniel should be implemented in full, it's still not clear
>>> >>> that either of you would be willing to write any code. Agreement
>>> >>> without code still doesn't help us very much.
>>> >>
>>> >> I'm going to return to Nathaniel's point - it is a highly valuable
>>> >> thing to set ourselves the target of resolving substantial discussions
>>> >> by consensus. The route you are endorsing here is 'implementor
>>> >> wins'. We don't need to do it that way.
>>> >> We're a mature sensible bunch of adults who can talk out the issues
>>> >> until we agree they are ready for implementation, and then implement.
>>> >> That's all Nathaniel is saying. I think he's obviously right, and I'm
>>> >> sad that it isn't as clear to y'all as it is to me.
>>> >>
>>> >> Best,
>>> >>
>>> >> Matthew
>>> >>
>>> >
>>> > Everyone, can we please not do this?! I had enough of adults doing
>>> > finger pointing back over the summer during the whole debt ceiling
>>> > debate. I think we can all agree that we are better than the US
>>> > congress?
>>> >
>>> > Forget about rudeness or decision processes.
>>> >
>>> > I will start by saying that I am willing to separate ignore and absent,
>>> > but only on the write side of things. On read, I want a single way to
>>> > identify the missing values. I also want only a single way to perform
>>> > calculations (either skip or propagate).
>>> >
>>> > An indicator of success would be that people stop using NaNs and magic
>>> > numbers (-9999, anyone?) and we could even deprecate nansum(), or at
>>> > least strongly suggest in its docs to use NA.
>>>
>>> Well, I haven't completely made up my mind yet, will have to do some
>>> more prototyping and playing (and potentially have some of my users
>>> eat the differently-flavored dogfood), but I'm really not very
>>> satisfied with the API at the moment. I'm mainly worried about the
>>> abstraction leaking through to pandas users (this is a pretty large
>>> group of people judging by # of downloads).
>>>
>>> The basic position I'm in is that I'm trying to push Python into a new
>>> space, namely mainstream data analysis and statistical computing, one
>>> that is solidly occupied by R and other such well-known players. My
>>> target users are not computer scientists. They are not going to invest
>>> in understanding dtypes very deeply or the internals of ndarray.
>>> In fact I've spent a great deal of effort making it so that pandas users
>>> can be productive and successful while having very little
>>> understanding of NumPy. Yes, I essentially "protect" my users from
>>> NumPy because using it well requires a certain level of sophistication
>>> that I think is unfair to demand of people. This might seem totally
>>> bizarre to some of you but it is simply the state of affairs. So far I
>>> have been successful because more people are using Python and pandas
>>> to do things that they used to do in R. The NA concept in R is dead
>>> simple and I don't see why we are incapable of also implementing
>>> something that is just as dead simple. To we, the scipy elite let's
>>> call us, it seems simple: "oh, just pass an extra flag to all my array
>>> constructors!" But this along with the masked array concept is going
>>> to have two likely outcomes:
>>>
>>> 1) Create a great deal more complication in my already very large codebase
>>>
>>> and/or
>>>
>>> 2) force pandas users to understand the new masked arrays after I've
>>> carefully made it so they can be largely ignorant of NumPy
>>>
>>> The mostly-NaN-based solution I've cobbled together and tweaked over
>>> the last 42 months actually *works really well*, amazingly, with
>>> relatively little cost in code complexity. Having found a reasonably
>>> stable equilibrium I'm extremely resistant to upset the balance.
>>>
>>> So I don't know. After watching these threads bounce back and forth
>>> I'm frankly not all that hopeful about a solution arising that
>>> actually addresses my needs.
>>
>> But Wes, what *are* your needs? You keep saying this, but we need examples
>> of how you want to operate and how numpy fails. As to dtypes, internals, and
>> all that, I don't see any of that in the current implementation, unless you
>> mean the maskna and skipna keywords. I believe someone on the previous
>> thread mentioned a way to deal with that.
>>
>> Chuck
>>
>
> Here are my needs:
>
> 1) How NAs are implemented cannot be end user visible. Having to pass
> maskna=True is a problem. I suppose a solution is to set the flag to
> true on every array inside of pandas so the user never knows (you
> mentioned someone else had some other solution, i could go back and
> dig it up?)

I guess this would be the same with bitpatterns, in that the user
would have to specify a custom dtype.

Is it possible to add a bitpattern NA (in the NaN values) to the
current floating point types, at least in principle? So that np.float
etc would have bitpattern NAs without a custom dtype?

See you,

Matthew
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
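
To make the question about a bitpattern NA "in the NaN values" concrete, here is a minimal sketch in plain Python (standard library only, no NumPy). It assumes one particular NaN payload is reserved to mean NA. The payload value 1954 and the names NA_BITS, float_to_bits, bits_to_float and is_na are all made up for this illustration; none of this is an existing or proposed NumPy API.

```python
import math
import struct

# Hypothetical NA bit pattern: a quiet NaN whose payload bits hold 1954.
# The payload value is arbitrary and chosen only for this illustration.
NA_BITS = 0x7FF8000000000000 | 1954


def float_to_bits(x):
    """Return the 64-bit IEEE 754 pattern of a Python float as an int."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(bits):
    """Build a Python float from a 64-bit IEEE 754 pattern."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# The NA value is an ordinary float64 as far as storage is concerned.
NA = bits_to_float(NA_BITS)


def is_na(x):
    """True only for the reserved NA pattern, not for an ordinary NaN."""
    return float_to_bits(x) == NA_BITS


values = [1.0, NA, float("nan"), 4.0]
print([math.isnan(v) for v in values])  # NA and plain NaN both test as NaN
print([is_na(v) for v in values])       # only the reserved pattern is NA
```

The point of the sketch is only that the NA marker fits inside an ordinary float64, so storage needs no separate mask or custom dtype. It does not show that arithmetic keeps the marker: IEEE 754 does not require operations to preserve a NaN payload, so any real design would have to decide how NA behaves once it passes through a computation.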