On Sat, Mar 3, 2012 at 1:30 PM, Travis Oliphant <tra...@continuum.io> wrote:
> Hi all,
>
> I've been thinking a lot about the masked array implementation lately.
> I finally had the time to look hard at what has been done and now am of the
> opinion that I do not think that 1.7 can be released with the current state
> of the masked array implementation *unless* it is clearly marked as
> experimental and may be changed in 1.8
>

That was the intention.

> I wish I had been able to be a bigger part of this conversation last year.
> But, that is why I took the steps I took to try and figure out another
> way to feed my family *and* stay involved in the NumPy community. I would
> love to stay involved in what is happening in the SciPy community, but I am
> more satisfied with what Ralf, Warren, Robert, Pauli, Josef, Charles,
> Stefan, and others are doing there right now, and don't have time to keep
> up with everything. Even though SciPy was the heart and soul of why I
> even got involved with Python for open source in the first place and took
> many years of my volunteer labor, I won't be able to spend significant time
> on SciPy code over the coming months. At some point, I really hope to be
> able to make contributions again to that code-base. Time will tell
> whether or not my aspirations will be realized. It depends quite a bit on
> whether or not my kids have what they need from me (which right now is
> money and time).
>
> NumPy, on the other hand, is not in a position where I can feel
> comfortable leaving my "baby" to others. I recognize and value the
> contributions from many people to make NumPy what it is today (e.g. code
> contributions, code rearrangement and standardization, build and install
> improvement, and most recently some architectural changes). But, I feel
> a personal responsibility for the code base as I spent a great many months
> writing NumPy in the first place, and I've spent a great deal of time
> interacting with NumPy users and feel like I have at least some sense of
> their stories.
> Of course, I built on the shoulders of giants, and much
> of what is there is *because of* where the code was adapted from (it was
> not created de-novo). Currently, there remains much that needs to be
> communicated, improved, and worked on, and I have specific opinions about
> what some changes and improvements should be, how they should be written,
> and how the resulting users need to be benefited.
> It will take time to discuss all of this, and that's where I will spend
> my open-source time in the coming months.
>
> In that vein:
>
> Because it is slated to go into release 1.7, we need to re-visit the
> masked array discussion again. The NEP process is the appropriate one
> and I'm glad we are taking that route for these discussions. My goal is
> to get consensus in order for code to get into NumPy (regardless of who
> writes the code). It may be that we don't come to a consensus
> (reasonable and intelligent people can disagree on things --- look at the
> coming election...). We can represent different parts of what is
> fortunately a very large user-base of NumPy users.
>
> First of all, I want to be clear that I think there is much great work
> that has been done in the current missing data code. There are some nice
> features in the where clause of the ufunc and the machinery for the
> iterator that allows re-using ufunc loops that are not re-written to check
> for missing data. I'm sure there are other things as well that I'm not
> quite aware of yet. However, I don't think the API presented to the
> numpy user presently is the correct one for NumPy 1.X.
>
> A few particulars:
>
> * the reduction operations need to default to "skipna" --- this is
> the most common use case which has been re-inforced again to me today by a
> new user to Python who is using masked arrays presently
>
> * the mask needs to be visible to the user if they use that
> approach to missing data (people should be able to get a hold of the mask
> and work with it in Python)
>
> * bit-pattern approaches to missing data (at least for float64 and
> int32) need to be implemented.
>
> * there should be some way when using "masks" (even if it's hidden
> from most users) for missing data to separate the low-level ufunc operation
> from the operation on the masks...
>

Mind, Mark only had a few weeks to write code. I think the unfinished state
is a direct function of that.

> I have heard from several users that they will *not use the missing data*
> in NumPy as currently implemented, and I can now see why. For better or
> for worse, my approach to software is generally very user-driven and very
> pragmatic. On the other hand, I'm also a mathematician and appreciate the
> cognitive compression that can come out of well-formed structure.
> None-the-less, I'm an *applied* mathematician and am ultimately motivated
> by applications.
>

I think that would be Wes. I thought the current state wasn't that far away
from what he wanted in the only post where he was somewhat explicit. I think
it would be useful for him to sit down with Mark at some time and thrash
things out since I think there is some misunderstanding involved.

> I will get a hold of the NEP and spend some time with it to discuss some
> of this in that document. This will take several weeks (as PyCon is next
> week and I have a tutorial I'm giving there). For now, I do not think
> 1.7 can be released unless the masked array is labeled *experimental*.
>

Chuck
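
Editorial illustration, not part of the original exchange: the behaviours listed under "A few particulars" can be sketched roughly with present-day NumPy. This is only a minimal sketch. It uses the long-standing numpy.ma module and the NaN-aware reductions, not the experimental missing-data API under discussion for 1.7. The sample values and the -999.0 sentinel are assumptions made up for the example, and the `where` argument on reductions is assumed to be available (it arrived in NumPy releases much later than this thread).

```python
import numpy as np

# Hypothetical sample: -999.0 is an assumed sentinel marking a missing reading.
raw = np.array([1.0, 2.0, -999.0, 4.0])

# Mask approach: numpy.ma keeps the data and a boolean mask side by side.
masked = np.ma.masked_values(raw, -999.0)

# Reductions in numpy.ma skip masked entries by default ("skipna"-like).
masked_total = masked.sum()

# The mask is an ordinary boolean array that can be inspected in Python.
mask = np.ma.getmaskarray(masked)

# Bit-pattern-style approach, using NaN as the missing marker for float64.
with_nan = np.where(mask, np.nan, raw)
propagated_total = np.sum(with_nan)   # NaN propagates through a plain reduction
skipped_total = np.nansum(with_nan)   # the NaN entries are skipped explicitly

# The ufunc `where` argument restricts a reduction to the selected elements.
where_total = np.add.reduce(raw, where=~mask)

print(masked_total, mask, propagated_total, skipped_total, where_total)
```

The contrast between `propagated_total` and `skipped_total` is the design question behind the first bullet: whether a reduction over data with missing values should propagate the missing marker or skip it by default.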