Re: [Numpy-discussion] NA masks in the next numpy release?
2011/10/28 Stéfan van der Walt

> On Fri, Oct 28, 2011 at 4:19 PM, Charles R Harris wrote:
> > Memory use is a known problem. One way to start addressing it might be to
> > implement a "bit" arraytype. It might even be possible to prototype that on
> > top of the existing types. Views make bit arrays a bit more interesting ;)
>
> Since 1/8 can be represented exactly in floating point, I guess it's
> technically possible to support non-integer strides?

I think the same effect could be obtained with fixed point integers, i.e., the last three bits are the fractional part.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
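A minimal sketch of the fixed-point idea above, assuming the mask is stored with 8 flags per byte via `np.packbits`. The bit address is an ordinary integer whose low three bits select the bit inside the byte and whose remaining bits select the byte. The helper name `get_bit` is hypothetical and not part of NumPy.

```python
import numpy as np

FRACTION_BITS = 3  # 2**3 == 8 bits per byte


def get_bit(packed, bit_offset):
    """Read one flag; `bit_offset` is a fixed-point address in units of 1/8 byte."""
    byte_index = bit_offset >> FRACTION_BITS   # integer part: which byte
    bit_in_byte = bit_offset & 0b111           # fractional part: which bit
    # np.packbits stores the first flag in the most significant bit
    return bool((packed[byte_index] >> (7 - bit_in_byte)) & 1)


flags = np.array([0, 1, 1, 0, 0, 0, 0, 1, 1, 0], dtype=bool)
packed = np.packbits(flags)                    # 10 flags stored in 2 bytes
recovered = [get_bit(packed, i) for i in range(flags.size)]
```

A "stride" of one element is then simply an increment of 1 in this fixed-point address, so no floating-point strides are needed.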
Re: [Numpy-discussion] NA masks in the next numpy release?
On Fri, Oct 28, 2011 at 4:19 PM, Charles R Harris wrote:
> Memory use is a known problem. One way to start addressing it might be to
> implement a "bit" arraytype. It might even be possible to prototype that on
> top of the existing types. Views make bit arrays a bit more interesting ;)

Since 1/8 can be represented exactly in floating point, I guess it's technically possible to support non-integer strides?

Stéfan
Re: [Numpy-discussion] NA masks in the next numpy release?
On Fri, Oct 28, 2011 at 5:05 PM, Chris.Barker wrote:
> On 10/28/11 11:37 AM, Matthew Brett wrote:
> > The main motivation for the alterNEP was our strong feeling that
> > separating ABSENT and IGNORE was easier to comprehend and cleaner.
>
> I don't know about easier to comprehend, or cleaner, but it is more
> feature-full.
>
> I see two issues here:
>
> 1) being able to distinguish between "ignore" and "not valid"
>    -- and being able to stop ignoring an ignored value.
>
> This could quite easily be accomplished with a mask approach -- indeed
> with 8 bits, you could have 8 different possible masked states (not that
> I'm suggesting that, at least not in core numpy.)
>
> However, with a bit-pattern approach, you simply can't implement
> "ignore". Once it's been set, the previous value is lost.
>
> 2) data size: A full mask takes extra space, sometimes a substantial
> amount -- so a bit-pattern approach would be nice.
>
> I like the idea (that I think Mark attempted to implement) that the
> implementation should be hidden from the user - not necessarily entirely
> hidden, but subtle enough that the casual user wouldn't need to care
> about it.

I believe the main reason it is hidden from the user is so that the implementation can be changed without impacting existing applications. What I would like to see at this point is folks trying out the software and asking questions on the list like: "I want to do A and tried B, which didn't work. Any suggestions?" In short, I want people to actually use the software to see what issues arise so that we can fix things up.

Memory use is a known problem. One way to start addressing it might be to implement a "bit" arraytype. It might even be possible to prototype that on top of the existing types. Views make bit arrays a bit more interesting ;)

> In that case, I think if we could decide that we want both "ignore" and
> "not valid" (and it seems there is a fair bit of interest in that), then
> we can proceed with a mask-based approach, and develop an API that makes
> as little reference to the mask as possible.
>
> Then a bit-pattern approach could be developed that uses the same API --
> it would not have the "ignore" option at all, but would be the same for
> the "not valid" option.
>
> When I write this it seems entirely too complicated for both the
> developers and users, but maybe it's not -- it could be analogous to
> what we have now: arrays can be Fortran or C ordered, contiguous or not,
> be views on other arrays or not. To really make numpy dance, you need to
> understand all that, but you can also do a whole lot, and write a lot of
> generic code, without digging into that.
>
> If we do all that, maybe there could be a sparse mask implementation,
> etc. as well.

Chuck
Re: [Numpy-discussion] NA masks in the next numpy release?
On 10/28/11 11:37 AM, Matthew Brett wrote:
> The main motivation for the alterNEP was our strong feeling that
> separating ABSENT and IGNORE was easier to comprehend and cleaner.

I don't know about easier to comprehend, or cleaner, but it is more feature-full.

I see two issues here:

1) being able to distinguish between "ignore" and "not valid" -- and being able to stop ignoring an ignored value.

This could quite easily be accomplished with a mask approach -- indeed with 8 bits, you could have 8 different possible masked states (not that I'm suggesting that, at least not in core numpy.)

However, with a bit-pattern approach, you simply can't implement "ignore". Once it's been set, the previous value is lost.

2) data size: A full mask takes extra space, sometimes a substantial amount -- so a bit-pattern approach would be nice.

I like the idea (that I think Mark attempted to implement) that the implementation should be hidden from the user - not necessarily entirely hidden, but subtle enough that the casual user wouldn't need to care about it.

In that case, I think if we could decide that we want both "ignore" and "not valid" (and it seems there is a fair bit of interest in that), then we can proceed with a mask-based approach, and develop an API that makes as little reference to the mask as possible.

Then a bit-pattern approach could be developed that uses the same API -- it would not have the "ignore" option at all, but would be the same for the "not valid" option.

When I write this it seems entirely too complicated for both the developers and users, but maybe it's not -- it could be analogous to what we have now: arrays can be Fortran or C ordered, contiguous or not, be views on other arrays or not. To really make numpy dance, you need to understand all that, but you can also do a whole lot, and write a lot of generic code, without digging into that.

If we do all that, maybe there could be a sparse mask implementation, etc. as well.

Maybe I'm dreaming, though...

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
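The two issues above can be sketched with tools that exist in released NumPy. This is only an illustration under assumptions: `numpy.ma` stands in for the mask approach and a NaN sentinel stands in for the bit-pattern approach; neither is the NA implementation under discussion in this thread.

```python
import numpy as np
import numpy.ma as ma

readings = np.array([1.5, 2.5, 3.5])

# Mask approach: flag the element as ignored; the stored value is untouched.
masked = ma.masked_array(readings.copy(), mask=[False, True, False])
mask_bytes = ma.getmaskarray(masked).nbytes   # extra storage: one byte per element
masked.mask = [False, False, False]           # stop ignoring it
restored = float(masked[1])                   # the original value is still there

# Bit-pattern approach: the sentinel takes the place of the value itself.
sentinel = readings.copy()
sentinel[1] = np.nan                          # the previous value has been overwritten
```

The mask version can "un-ignore" at the cost of extra memory; the sentinel version costs no extra memory but cannot get the old value back.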
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi,

On Fri, Oct 28, 2011 at 1:52 PM, Benjamin Root wrote:
> On Fri, Oct 28, 2011 at 3:22 PM, Matthew Brett wrote:
>>
>> Hi,
>>
>> On Fri, Oct 28, 2011 at 1:14 PM, Benjamin Root wrote:
>> >
>> > On Fri, Oct 28, 2011 at 3:02 PM, Matthew Brett wrote:
>> >>
>> >> You and I know that I've got an array with values [99, 100, 3] and a
>> >> mask with values [False, False, True]. So maybe I'd like to see what
>> >> happens if I take off the mask from the second value. I know that's
>> >> what I want to do, but I don't know how to do it, because you won't
>> >> let me manipulate the mask, because I'm not allowed to know that the
>> >> NA values come from the mask.
>> >>
>> >> The alterNEP is just saying - please - be straight with me. If
>> >> you're doing masking, show me the mask, and don't try and hide that
>> >> there are stored values underneath.
>> >
>> > Considering that you have admitted before to not regularly using masked
>> > arrays, I seriously doubt that you would be able to judge whether this
>> > is a significant detriment or not. My entire point that I have been
>> > making is that Mark's implementation is not the same as the current
>> > masked arrays. Instead, it is a cleaner, more mature implementation
>> > that gets rid of extraneous "features".
>>
>> This may explain why we don't seem to be getting anywhere. I am sure
>> that Mark's implementation of masking is great. We're not talking
>> about that. We're talking about whether it's a good idea to make
>> masking look as though it is implementing the ABSENT idea. That's
>> what I think is confusing, and that's the conversation I have been
>> trying to pursue.
>>
>> Best,
>>
>> Matthew
>
> Sorry if I came across too strongly there. No disrespect was intended.

I wasn't worried about the disrespect. It's just I feel the discussion has not been to the point.

> Personally, I think we are getting somewhere. We have been whittling away
> what it is that we do agree upon, and have begun to specify *exactly* what
> it is that we disagree on. I understand your concern, and -- like I
> said in my previous email -- it makes sense from the perspective that
> numpy.ma users have had up to now.

But I'm not a numpy.ma user, I'm just someone who knows that what you are doing is masking out values. The fact that I do not use numpy.ma points out that it's possible to find this highly counter-intuitive without prior bias.

> But, I re-raise my point that I have been making
> about the need to re-think masked arrays. If we consider masks as advanced
> slicing or boolean indexing, then being unable to access the underlying
> values actually makes a lot of sense.
>
> Consider it a contract when I pass a set of data with only certain values
> exposed. Because I passed the data with only those values exposed, then it
> must have been entirely my intention to let the function know of only those
> values. It would be a violation of that contract if the function obtained
> those masked values. If I want to communicate both the original values and
> a particular mask, then I pass the array and a view with a particular mask.

This is the old discussion about what Python users expect. I think they expect to be treated as adults. That is, breaking the contract should not be easy to do by accident, but it should be allowed.

> Maybe it would be helpful that an array can never have its own mask, but
> rather, only views can carry masks?
>
> In conclusion, I submit that this is largely a problem that can be solved
> with the proper documentation. New users who never used numpy.ma before do
> not have to concern themselves with the old way of thinking and are just
> simply taught what masked arrays "are". Meanwhile, a special section of the
> documentation should be made that teaches numpy.ma users how masked arrays
> "should be".

I don't think documentation will solve it. In a way, the ideal user is someone who doesn't know what's going on, because, for a while, they may not realize that when they thought they were doing assignment, in fact they are doing masking. Unfortunately, I suspect almost everyone using these things will start to realize that, and then they will start getting confused. I find it confusing, and I believe myself to understand the issues pretty well, and be of numpy-user-range comprehension powers.

See you,

Matthew
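One way to picture "not easy to break by accident, but allowed" is the hard-mask option that `numpy.ma` already provides. This is a sketch using `numpy.ma` as a stand-in; it is an assumption that it is a fair analogy, and it says nothing about how the NA-mask implementation in this thread behaves.

```python
import numpy.ma as ma

# hard_mask=True: plain assignment cannot unmask an element by accident
a = ma.masked_array([99, 100, 3], mask=[True, True, False], hard_mask=True)
a[1] = 7
still_hidden = bool(ma.getmaskarray(a)[1])   # the mask survives the assignment
kept_value = int(a.data[1])                  # the stored value is unchanged

# Deliberately breaking the contract is still possible, but it is explicit
a.soften_mask()
a[1] = 7
now_visible = not bool(ma.getmaskarray(a)[1])
```

Here the accidental path (assignment) is blocked, while the deliberate path (`soften_mask()`, or reading `a.data`) stays open.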
Re: [Numpy-discussion] NA masks in the next numpy release?
On Fri, Oct 28, 2011 at 3:22 PM, Matthew Brett wrote:
> Hi,
>
> On Fri, Oct 28, 2011 at 1:14 PM, Benjamin Root wrote:
> >
> > On Fri, Oct 28, 2011 at 3:02 PM, Matthew Brett wrote:
> >>
> >> You and I know that I've got an array with values [99, 100, 3] and a
> >> mask with values [False, False, True]. So maybe I'd like to see what
> >> happens if I take off the mask from the second value. I know that's
> >> what I want to do, but I don't know how to do it, because you won't
> >> let me manipulate the mask, because I'm not allowed to know that the
> >> NA values come from the mask.
> >>
> >> The alterNEP is just saying - please - be straight with me. If
> >> you're doing masking, show me the mask, and don't try and hide that
> >> there are stored values underneath.
> >
> > Considering that you have admitted before to not regularly using masked
> > arrays, I seriously doubt that you would be able to judge whether this is
> > a significant detriment or not. My entire point that I have been making is
> > that Mark's implementation is not the same as the current masked arrays.
> > Instead, it is a cleaner, more mature implementation that gets rid of
> > extraneous "features".
>
> This may explain why we don't seem to be getting anywhere. I am sure
> that Mark's implementation of masking is great. We're not talking
> about that. We're talking about whether it's a good idea to make
> masking look as though it is implementing the ABSENT idea. That's
> what I think is confusing, and that's the conversation I have been
> trying to pursue.
>
> Best,
>
> Matthew

Sorry if I came across too strongly there. No disrespect was intended.

Personally, I think we are getting somewhere. We have been whittling away what it is that we do agree upon, and have begun to specify *exactly* what it is that we disagree on. I understand your concern, and -- like I said in my previous email -- it makes sense from the perspective that numpy.ma users have had up to now. But, I re-raise my point that I have been making about the need to re-think masked arrays. If we consider masks as advanced slicing or boolean indexing, then being unable to access the underlying values actually makes a lot of sense.

Consider it a contract when I pass a set of data with only certain values exposed. Because I passed the data with only those values exposed, then it must have been entirely my intention to let the function know of only those values. It would be a violation of that contract if the function obtained those masked values. If I want to communicate both the original values and a particular mask, then I pass the array and a view with a particular mask.

Maybe it would be helpful that an array can never have its own mask, but rather, only views can carry masks?

In conclusion, I submit that this is largely a problem that can be solved with the proper documentation. New users who never used numpy.ma before do not have to concern themselves with the old way of thinking and are just simply taught what masked arrays "are". Meanwhile, a special section of the documentation should be made that teaches numpy.ma users how masked arrays "should be".

Cheers!
Ben Root
Re: [Numpy-discussion] NA masks in the next numpy release?
2011/10/28 Stéfan van der Walt

> On Fri, Oct 28, 2011 at 12:47 PM, Benjamin Root wrote:
> >
> > 2011/10/28 Stéfan van der Walt
> >> The
> >> implementation as it stands essentially gives us a faster and more
> >> integrated version of numpy.ma; but it has become clear from this
> >> conversation that such an approach overlooks a very common subset of
> >> masked-related problems.
> >
> > Which are...? (given the history of this discussion, let's not assume
> > anything is clear).
>
> The case where the number of elements in the array vastly outnumbers
> the number of masked elements. (Images, 3D volumes, large
> time-series, tables, etc.)
>
> E.g., if you are taking measurements from a sensor, but once in a blue
> moon the sensor messes up, you simply want to mark those values as
> missing, but you do not want to allocate a whole new chunk of memory
> to do so.
>
> I had a chat with JB Poline this morning, who mentioned that sparse
> matrix storage of the mask may also be an option. Those containers
> typically trade off insertion vs. lookup speeds, so I'm not sure
> whether it'd be feasible, but I like the idea.

I think simple run length encoding might work well with masks.

Chuck
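A minimal sketch of run-length encoding applied to a 1-D boolean mask, to show why it suits the "rarely masked" case. The function names `rle_encode` and `rle_decode` are hypothetical helpers, not NumPy API, and this is only one of several possible encodings.

```python
import numpy as np


def rle_encode(mask):
    """Return (first_value, run_lengths) for a 1-D boolean mask."""
    mask = np.asarray(mask, dtype=bool)
    if mask.size == 0:
        return False, np.array([], dtype=np.intp)
    # positions where the value changes mark the start of a new run
    change_points = np.flatnonzero(mask[1:] != mask[:-1]) + 1
    boundaries = np.concatenate(([0], change_points, [mask.size]))
    return bool(mask[0]), np.diff(boundaries)


def rle_decode(first_value, run_lengths):
    """Rebuild the full boolean mask from its run-length encoding."""
    values = np.zeros(len(run_lengths), dtype=bool)
    values[0::2] = first_value        # runs alternate between the two values
    values[1::2] = not first_value
    return np.repeat(values, run_lengths)


mask = np.zeros(1000, dtype=bool)
mask[400:403] = True                  # a short burst of bad samples
first, runs = rle_encode(mask)        # three runs describe all 1000 flags
```

The encoding is compact when masked values come in a few bursts, but random access to a single flag then needs a search over the cumulative run lengths.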
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi,

On Fri, Oct 28, 2011 at 1:14 PM, Benjamin Root wrote:
> On Fri, Oct 28, 2011 at 3:02 PM, Matthew Brett wrote:
>>
>> You and I know that I've got an array with values [99, 100, 3] and a
>> mask with values [False, False, True]. So maybe I'd like to see what
>> happens if I take off the mask from the second value. I know that's
>> what I want to do, but I don't know how to do it, because you won't
>> let me manipulate the mask, because I'm not allowed to know that the
>> NA values come from the mask.
>>
>> The alterNEP is just saying - please - be straight with me. If
>> you're doing masking, show me the mask, and don't try and hide that
>> there are stored values underneath.
>
> Considering that you have admitted before to not regularly using masked
> arrays, I seriously doubt that you would be able to judge whether this is a
> significant detriment or not. My entire point that I have been making is
> that Mark's implementation is not the same as the current masked arrays.
> Instead, it is a cleaner, more mature implementation that gets rid of
> extraneous "features".

This may explain why we don't seem to be getting anywhere. I am sure that Mark's implementation of masking is great. We're not talking about that. We're talking about whether it's a good idea to make masking look as though it is implementing the ABSENT idea. That's what I think is confusing, and that's the conversation I have been trying to pursue.

Best,

Matthew
Re: [Numpy-discussion] NA masks in the next numpy release?
On Fri, Oct 28, 2011 at 1:14 PM, Benjamin Root wrote:
> Considering that you have admitted before to not regularly using masked
> arrays, I seriously doubt that you would be able to judge whether this is a
> significant detriment or not.

Let's not be unreasonable; Matthew has a valid concern (maybe from experience in teaching numpy): once the machinery under the hood becomes opaque, it becomes much harder to use numpy intuitively.

Stéfan
Re: [Numpy-discussion] NA masks in the next numpy release?
On Fri, Oct 28, 2011 at 3:02 PM, Matthew Brett wrote:
>
> You and I know that I've got an array with values [99, 100, 3] and a
> mask with values [False, False, True]. So maybe I'd like to see what
> happens if I take off the mask from the second value. I know that's
> what I want to do, but I don't know how to do it, because you won't
> let me manipulate the mask, because I'm not allowed to know that the
> NA values come from the mask.
>
> The alterNEP is just saying - please - be straight with me. If
> you're doing masking, show me the mask, and don't try and hide that
> there are stored values underneath.

Considering that you have admitted before to not regularly using masked arrays, I seriously doubt that you would be able to judge whether this is a significant detriment or not. My entire point that I have been making is that Mark's implementation is not the same as the current masked arrays. Instead, it is a cleaner, more mature implementation that gets rid of extraneous "features". Instead of fussing around with a mask directly in the array, the user of masked arrays should now consider the use of views as the masks. It works beautifully because it works off a well-documented and well-understood feature of numpy.

Of course, when you look at the feature in your way, with those expectations, then I would agree that it might be confusing. But given that this is a completely new feature, then we have the opportunity to properly document and show how to rethink a user's pre-conceptions of masked arrays. Users can keep the original array as a plain array and have mask1, mask2, mask3, etc as being separate views. It is a completely different way to think of masked arrays, and considering that masked arrays are not widely used in other toolkits, I think we can be free to change the paradigm.

Further, there is no reason why we can't keep numpy.ma around for backwards compatibility and for those who "just don't get it".

Cheers,
Ben Root
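The "plain array plus several masked views" pattern can be approximated with `numpy.ma` today. This is a sketch under that assumption; the thread's `a.view(maskna=True)` spelling is not used because it is not assumed to be available, and the names `view1` and `view2` are just illustrative.

```python
import numpy as np
import numpy.ma as ma

data = np.array([99, 100, 3])           # plain array, no mask of its own

# Two independent masks over the same storage
# (numpy.ma convention: True means hidden)
view1 = ma.masked_array(data, mask=[True, True, False])
view2 = ma.masked_array(data, mask=[False, False, True])

data[2] = 7                             # a write to the data shows up in both views
total1 = int(view1.sum())               # only the exposed element of view1
total2 = int(view2.sum())               # only the exposed elements of view2
```

Each view carries its own mask, while the underlying values are stored once and remain reachable through `data`.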
Re: [Numpy-discussion] NA masks in the next numpy release?
On Fri, Oct 28, 2011 at 12:47 PM, Benjamin Root wrote:
>
> 2011/10/28 Stéfan van der Walt
>> The
>> implementation as it stands essentially gives us a faster and more
>> integrated version of numpy.ma; but it has become clear from this
>> conversation that such an approach overlooks a very common subset of
>> masked-related problems.
>
> Which are...? (given the history of this discussion, let's not assume
> anything is clear).

The case where the number of elements in the array vastly outnumbers the number of masked elements. (Images, 3D volumes, large time-series, tables, etc.)

E.g., if you are taking measurements from a sensor, but once in a blue moon the sensor messes up, you simply want to mark those values as missing, but you do not want to allocate a whole new chunk of memory to do so.

I had a chat with JB Poline this morning, who mentioned that sparse matrix storage of the mask may also be an option. Those containers typically trade off insertion vs. lookup speeds, so I'm not sure whether it'd be feasible, but I like the idea.

Stéfan
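To make the insertion-versus-lookup trade-off concrete, here is a sketch of one possible sparse mask: a sorted array holding only the indices of masked elements. The class `SparseMask` and its methods are hypothetical and exist only for illustration; a real implementation would likely use a different container.

```python
import numpy as np


class SparseMask:
    """Hypothetical sparse mask: stores only the flat indices of masked elements."""

    def __init__(self, size):
        self.size = size
        self.indices = np.array([], dtype=np.intp)   # kept sorted

    def mask(self, index):
        pos = np.searchsorted(self.indices, index)
        if pos == len(self.indices) or self.indices[pos] != index:
            # inserting into a sorted array copies it: O(n) per insertion
            self.indices = np.insert(self.indices, pos, index)

    def is_masked(self, index):
        pos = np.searchsorted(self.indices, index)   # binary search: O(log n)
        return bool(pos < len(self.indices) and self.indices[pos] == index)

    def to_dense(self):
        dense = np.zeros(self.size, dtype=bool)
        dense[self.indices] = True
        return dense


sensor_mask = SparseMask(1_000_000)
sensor_mask.mask(123_456)             # the sensor glitched twice
sensor_mask.mask(42)
```

With a million samples and two bad readings, the sparse form stores two indices instead of a million flags; the price is slower insertion and a search on every lookup.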
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi,

On Fri, Oct 28, 2011 at 12:15 PM, Lluís wrote:
> Summarizing: let's forget for a moment that "mask" has a meaning in english:

This is at the core of the problem. You and I know what's really going on - there's a mask over the data. But in what follows we're going to try and pretend that is not what is going on. The result is something that is rather hard to understand, and, when you do understand it, it's surprising and inconvenient. This is all because we tried to conceal what was really going on.

> - "maskna" corresponds to ABSENT
> - "ownmaskna" corresponds to IGNORED
>
> The problem here is that of the two implementation mechanisms (masks and
> bitpatterns), only the first can provide both semantics.

But let's be clear. The current masked array implementation is made so it looks like ABSENT, and makes IGNORED hard to get to.

> Let's start with an array that already supports NAs:
>
> In [1]: a = np.array([1, 2, 3], maskna = True)
>
>
> ABSENT (destructive NA assignment)
> ----------------------------------
>
> Once you assign NA, even if you're using NA masks, the value seems to be lost
> forever (i.e., the assignment is destructive regardless of the value):
>
> In [2]: b = a.view()
> In [3]: c = a.view(maskna = True)
> In [4]: b[0] = np.NA
> In [5]: a
> Out[5]: array([NA, 2, 3])
> In [6]: b
> Out[6]: array([NA, 2, 3])
> In [7]: c
> Out[7]: array([NA, 2, 3])

Right - the mask (fundamentally an IGNORED signal) is pretending to implement ABSENT. But - as you point out below - I'm pasting it here - in fact it's IGNORED.

> In [21]: a = np.array([1, 2, 3])
> Out[21]: array([1, 2, 3])
> In [22]: b = a.view(maskna = True)
> In [23]: b[0] = np.NA
> In [24]: a
> Out[24]: array([1, 2, 3])
> In [25]: b
> Out[25]: array([NA, 2, 3])

But now - I've done this:

>>> a = np.array([99, 100, 3], maskna=True)
>>> a[0:2] = np.NA

You and I know that I've got an array with values [99, 100, 3] and a mask with values [False, False, True]. So maybe I'd like to see what happens if I take off the mask from the second value. I know that's what I want to do, but I don't know how to do it, because you won't let me manipulate the mask, because I'm not allowed to know that the NA values come from the mask.

The alterNEP is just saying - please - be straight with me. If you're doing masking, show me the mask, and don't try and hide that there are stored values underneath.

Best,

Matthew
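For comparison, this is roughly what "taking the mask off the second value" looks like when the mask is exposed, using `numpy.ma` as a stand-in for an explicit-mask design. Note the assumption about polarity: in `numpy.ma`, True means hidden, which is the opposite of the [False, False, True] convention used in the message above.

```python
import numpy as np
import numpy.ma as ma

a = ma.masked_array([99, 100, 3])
a[0:2] = ma.masked                    # hide the first two values
hidden = ma.getmaskarray(a).copy()    # here True marks a hidden element

hidden[1] = False                     # take the mask off the second value
a.mask = hidden
second = int(a[1])                    # the stored value reappears
```

Because the mask is a visible attribute, the stored 100 can be brought back without knowing anything else about the implementation.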
Re: [Numpy-discussion] NA masks in the next numpy release?
2011/10/28 Stéfan van der Walt

> On Fri, Oct 28, 2011 at 11:16 AM, Benjamin Root wrote:
> > this by making missing data front-and-center. However, my belief is that
> > Mark's approach is easier to comprehend and is cleaner. Cleaner features
> > means that it is more likely to be used.
>
> Cleaner features may be easier to adopt, but whether they are used or
> not depends on whether they address the problem in hand. The
> implementation as it stands essentially gives us a faster and more
> integrated version of numpy.ma; but it has become clear from this
> conversation that such an approach overlooks a very common subset of
> masked-related problems.

Which are...? (given the history of this discussion, let's not assume anything is clear).

> We should be concerned about memory use; we often don't have too much
> of it, and accessing it is slow.
>
> Would it be workable to store 8 mask bits per byte instead? I don't
> think it should impact on the speed much, and we can always generate a
> full mask for the user on request.

I suggested such an idea a while back. This is part of the reason why Mark decided that the masks should not be exposed for direct access in case it is decided that masks could be implemented that way. I have a vague recollection of him commenting about some tests he did along that route, but I don't remember it.

Cheers,
Ben Root
Re: [Numpy-discussion] NA masks in the next numpy release?
On Fri, Oct 28, 2011 at 11:16 AM, Benjamin Root wrote:
> this by making missing data front-and-center. However, my belief is that
> Mark's approach is easier to comprehend and is cleaner. Cleaner features
> means that it is more likely to be used.

Cleaner features may be easier to adopt, but whether they are used or not depends on whether they address the problem in hand. The implementation as it stands essentially gives us a faster and more integrated version of numpy.ma; but it has become clear from this conversation that such an approach overlooks a very common subset of masked-related problems.

We should be concerned about memory use; we often don't have too much of it, and accessing it is slow.

Would it be workable to store 8 mask bits per byte instead? I don't think it should impact on the speed much, and we can always generate a full mask for the user on request.

Regards
Stéfan
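The "8 mask bits per byte" storage, with a full mask generated on request, can be sketched with `np.packbits` and `np.unpackbits`. This only illustrates the storage saving; it is an assumption that a real implementation would work this way, and the speed question raised above is not addressed here.

```python
import numpy as np

full_mask = np.zeros(1_000_000, dtype=bool)   # one byte per element
full_mask[[10, 500_000]] = True

packed = np.packbits(full_mask)               # 8 mask bits per byte

# Generate the full mask for the user on request
unpacked = np.unpackbits(packed, count=full_mask.size).astype(bool)

saving = full_mask.nbytes // packed.nbytes    # storage ratio, full vs. packed
```

The packed form is an eighth of the size, at the cost of a shift and a bitwise AND whenever a single flag is read.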
Re: [Numpy-discussion] NA masks in the next numpy release?
I haven't actually tested the code, but AFAIK the following is a short overview with examples of how the two orthogonal feature axes (ABSENT/IGNORE and PROPAGATE/SKIP) are related and how it all is supposed to work.

I have never talked to Mark or anybody else in this list (that is, outside of this list), so I may well be mistaken. Thus, sorry if there are any inaccuracies and/or if you are already aware of what I'm describing here.

So please tell me if this has helped clarify why I (and I hope others) think the implementation mechanism is independent of the semantics.

Lluis


ABSENT vs IGNORE
================

Travis Oliphant writes:
> As I mentioned. I find the ability to separate an ABSENT idea from an
> IGNORED idea convincing. In other words, I think distinguishing between
> masks and bit-patterns is not just an implementation detail, but provides
> a useful concept for multiple use-cases.

I think it's an implementation detail as long as you have two clear ways of separating them.

Summarizing: let's forget for a moment that "mask" has a meaning in English:

- "maskna" corresponds to ABSENT
- "ownmaskna" corresponds to IGNORED

The problem here is that of the two implementation mechanisms (masks and bitpatterns), only the first can provide both semantics.

Let's start with an array that already supports NAs:

    In [1]: a = np.array([1, 2, 3], maskna = True)


ABSENT (destructive NA assignment)
----------------------------------

Once you assign NA, even if you're using NA masks, the value seems to be lost forever (i.e., the assignment is destructive regardless of the value):

    In [2]: b = a.view()
    In [3]: c = a.view(maskna = True)
    In [4]: b[0] = np.NA
    In [5]: a
    Out[5]: array([NA, 2, 3])
    In [6]: b
    Out[6]: array([NA, 2, 3])
    In [7]: c
    Out[7]: array([NA, 2, 3])

This is the default behaviour, and is probably what the regular user expects by what has been learned from previous uses of the "view" method.

Note that here "maskna" acts as an idempotent operation. Once an array has the "maskna" property, all its views will transitively (and destructively) use it.

Also note that an array copy will make a copy of both "regular" data and NA values, as expected.


IGNORED (non-destructive NA assignment)
---------------------------------------

But you can also have non-destructive NA assignments, although *only* if you explicitly (and thus purposefully) ask for it -> ownmaskna

    In [8]: b = a.view(ownmaskna = True)
    In [9]: b[1] = np.NA
    In [10]: a
    Out[10]: array([NA, 2, 3])
    In [11]: b
    Out[11]: array([NA, NA, 3])
    In [12]: a[2] = np.NA
    In [13]: a
    Out[13]: array([NA, 2, NA])
    In [14]: b
    Out[14]: array([NA, NA, 3])

The mask is a copy:

    In [15]: a[0] = 1
    In [16]: a
    Out[16]: array([1, 2, 3], maskna = True)
    In [17]: b
    Out[17]: array([NA, NA, 3])

But the data itself is not (aka, non-NA values are *always* destructive, but I think this is out of the scope of this discussion):

    In [17]: a[0] = -10
    In [18]: a[2] = -30
    In [19]: a
    Out[19]: array([-10, 2, -30], maskna = True)
    In [20]: b
    Out[20]: array([NA, NA, -30])


The dark corner
---------------

The only potential misunderstanding can be the creation of a NA-masked array from a "regular" array. This is precisely why I put this case at the end, as it seems to break the intuition some people have about assignment being always destructive (unless you explicitly ask for IGNORED, which is not the case):

    In [21]: a = np.array([1, 2, 3])
    Out[21]: array([1, 2, 3])
    In [22]: b = a.view(maskna = True)
    In [23]: b[0] = np.NA
    In [24]: a
    Out[24]: array([1, 2, 3])
    In [25]: b
    Out[25]: array([NA, 2, 3])

This is in fact a corner case, and there is no obvious (and efficient!) way to handle it.

As "a" is just a "regular" array, and has no support for any type of NA values (neither masks nor bit-patterns), assignments to any of its views cannot, in any case, be destructive.

Note that the previous holds true because it currently is a design decision to forbid the in-flight conversion from "regular" to "NA-enabled" arrays.

In fact I forgot that, when reading the docs in [1], I thought that a slight change could make it all feel more consistent: the view of a regular array can have NA values only if "ownmaskna" is used (IGNORED/non-destructive NA assignments), and will give an error if "maskna" is used in entry number 19.

[1] http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html#creating-na-masked-views


PROPAGATE vs SKIP
=================

I've also read some comments regarding this. Maybe I didn't explain myself correctly in previous mails, or maybe I just misunderstood other people's mails (which might not be about this at all).


PROPAGATE
---------

All ufuncs in ndarray propagate NA values.

Note that ABSENT (destructive NA-assignment) is also a default, so we could say that the default is R-like behaviour (AFAIK).


SKIP
----

You have a different array type (let's call it skip_array), where all ufuncs do *not* propagate NA value
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi,

On Fri, Oct 28, 2011 at 11:16 AM, Benjamin Root wrote:
> On Fri, Oct 28, 2011 at 12:39 PM, Matthew Brett wrote:
>>
>> Hi,
>>
>> On Thu, Oct 27, 2011 at 10:56 PM, Benjamin Root wrote:
>> >
>> > On Thursday, October 27, 2011, Charles R Harris wrote:
>> >>
>> >> On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant wrote:
>> >>>
>> >>> That is a pretty good explanation. I find myself convinced by
>> >>> Matthew's arguments. I think that being able to separate ABSENT
>> >>> from IGNORED is a good idea. I also like being able to control
>> >>> SKIP and PROPAGATE (but I think the current implementation allows
>> >>> this already).
>> >>>
>> >>> What is the counter-argument to this proposal?
>> >>
>> >> What exactly do you find convincing? The current masks propagate by
>> >> default:
>> >>
>> >> In [1]: a = ones(5, maskna=1)
>> >>
>> >> In [2]: a[2] = NA
>> >>
>> >> In [3]: a
>> >> Out[3]: array([ 1., 1., NA, 1., 1.])
>> >>
>> >> In [4]: a + 1
>> >> Out[4]: array([ 2., 2., NA, 2., 2.])
>> >>
>> >> In [5]: a[2] = 10
>> >>
>> >> In [5]: a
>> >> Out[5]: array([ 1., 1., 10., 1., 1.], maskna=True)
>> >>
>> >> I don't see an essential difference between the implementation using
>> >> masks and one using bit patterns, the mask when attached to the
>> >> original array just adds a bit pattern by extending all the types by
>> >> one byte, an approach that easily extends to all existing and future
>> >> types, which is why Mark went that way for the first implementation
>> >> given the time available. The masks are hidden because folks wanted
>> >> something that behaved more like R and also because of the desire to
>> >> combine the missing, ignore, and later possibly bit patterns in a
>> >> unified manner. Note that the pseudo assignment was also meant to
>> >> look like R. Adding true bit patterns to numpy isn't trivial and I
>> >> believe Mark was thinking of parametrized types for that.
>> >>
>> >> The main problems I see with masks are unified storage and possibly
>> >> memory use. The rest is just behavior and desired API and that can
>> >> be adjusted within the current implementation. There is nothing
>> >> essentially masky about masks.
>> >>
>> >> Chuck
>> >
>> > I think Chuck sums it up quite nicely. The implementation detail about
>> > using mask versus bit patterns can still be discussed and addressed.
>> > Personally, I just don't see how parameterized dtypes would be easier
>> > to use than the pseudo assignment.
>> >
>> > The elegance of Mark's solution was to consider the treatment of
>> > missing data in a unified manner. This puts missing data in a more
>> > prominent spot for extension builders, which should greatly improve
>> > support throughout the ecosystem.
>>
>> Are extension builders then required to use the numpy C API to get
>> their data? Speaking as an extension builder, I would rather you gave
>> me the mask and the bitpattern information and let me do that myself.
>
> Forgive me, I wasn't clear. What I am speaking of is more about a typical
> human failing. If a programmer for a module never encounters masked arrays,
> then when they code up a function to operate on numpy data, it is quite
> likely that they would never take it into consideration. Notice the
> prolific use of "np.asarray()" even within the numpy codebase, which
> destroys masked arrays.

Hmm - that sounds like it could cause some surprises.

So, what you were saying was just that it was good that masked arrays were now closer to the core? That's reasonable, but I don't think it's relevant to the current discussion. I think we all agree it is nice to have masked arrays in the core.

> However, by making missing data support more integral into the core of
> numpy, then it is far more likely that a programmer would take it into
> consideration when designing their algorithm, or at least explicitly
> document that their module does not support missing data. Both NEPs do
> this by making missing data front-and-center. However, my belief is that
> Mark's approach is easier to comprehend and is cleaner. Cleaner features
> means that it is more likely to be used.

The main motivation for the alterNEP was our strong feeling that separating ABSENT and IGNORE was easier to comprehend and cleaner. I think it would be hard to argue that the alterNEP idea is not more explicit.

>> > By letting there be a single missing data framework (instead of
>> > two) all that users need to figure out is when they want nan-like
>> > behavior (propagate) or to be more like masks (skip). Numpy takes care
>> > of the rest. There is a reason why I like using masked arrays because
>> > I don't have to use nansum in my library functions to guard against
>> > the possibility of receiving nans. Duck-typing is a good thing.
>> >
>> > My argument against s
Re: [Numpy-discussion] NA masks in the next numpy release?
On Fri, Oct 28, 2011 at 12:58 PM, Gary Strangman wrote:
> >> I wonder if that might be handled as a scikits-image extension, rather
> >> than core numpy?
> >
> > I think Stefan and Nathaniel and Gary Strangman and others are saying
> > we don't want to pay the price of a large memory hike for masking. I
> > suspect that Nathaniel is right, and that a large majority of those of
> > us who want 'missing data' functionality, also want what we've called
> > ABSENT missing values, and care about memory.
>
> FWIW, Matthew correctly interprets my concerns. I also have very large
> non-image datasets, so pushing the problem into a more custom extension
> (esp. one focused on images) doesn't help me much.
>
> -best
> Gary

I wonder whether the masks could benefit from the approach used for the "carray" (compressed arrays) project?

Ben Root
Re: [Numpy-discussion] NA masks in the next numpy release?
On Fri, Oct 28, 2011 at 12:39 PM, Matthew Brett wrote: > Hi, > > On Thu, Oct 27, 2011 at 10:56 PM, Benjamin Root wrote: > > > > > > On Thursday, October 27, 2011, Charles R Harris < > charlesr.har...@gmail.com> > > wrote: > >> > >> > >> On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant < > oliph...@enthought.com> > >> wrote: > >>> > >>> That is a pretty good explanation. I find myself convinced by > Matthew's > >>> arguments.I think that being able to separate ABSENT from IGNORED > is a > >>> good idea. I also like being able to control SKIP and PROPAGATE (but > I > >>> think the current implementation allows this already). > >>> > >>> What is the counter-argument to this proposal? > >>> > >> > >> What exactly do you find convincing? The current masks propagate by > >> default: > >> > >> In [1]: a = ones(5, maskna=1) > >> > >> In [2]: a[2] = NA > >> > >> In [3]: a > >> Out[3]: array([ 1., 1., NA, 1., 1.]) > >> > >> In [4]: a + 1 > >> Out[4]: array([ 2., 2., NA, 2., 2.]) > >> > >> In [5]: a[2] = 10 > >> > >> In [5]: a > >> Out[5]: array([ 1., 1., 10., 1., 1.], maskna=True) > >> > >> > >> I don't see an essential difference between the implementation using > masks > >> and one using bit patterns, the mask when attached to the original array > >> just adds a bit pattern by extending all the types by one byte, an > approach > >> that easily extends to all existing and future types, which is why Mark > went > >> that way for the first implementation given the time available. The > masks > >> are hidden because folks wanted something that behaved more like R and > also > >> because of the desire to combine the missing, ignore, and later possibly > bit > >> patterns in a unified manner. Note that the pseudo assignment was also > meant > >> to look like R. Adding true bit patterns to numpy isn't trivial and I > >> believe Mark was thinking of parametrized types for that. > >> > >> The main problems I see with masks are unified storage and possibly > memory > >> use. 
The rest is just behavor and desired API and that can be adjusted > >> within the current implementation. There is nothing essentially masky > about > >> masks. > >> > >> Chuck > >> > >> > > > > I think chuck sums it up quite nicely. The implementation detail about > > using mask versus bit patterns can still be discussed and addressed. > > Personally, I just don't see how parameterized dtypes would be easier to > use > > than the pseudo assignment. > > > > The elegance of mark's solution was to consider the treatment of missing > > data in a unified manner. This puts missing data in a more prominent > spot > > for extension builders, which should greatly improve support throughout > the > > ecosystem. > > Are extension builders then required to use the numpy C API to get > their data? Speaking as an extension builder, I would rather you gave > me the mask and the bitpattern information and let me do that myself. > > Forgive me, I wasn't clear. What I am speaking of is more about a typical human failing. If a programmer for a module never encounters masked arrays, then when they code up a function to operate on numpy data, it is quite likely that they would never take it into consideration. Notice the prolific use of "np.asarray()" even within the numpy codebase, which destroys masked arrays. However, by making missing data support more integral into the core of numpy, then it is far more likely that a programmer would take it into consideration when designing their algorithm, or at least explicitly document that their module does not support missing data. Both NEPs does this by making missing data front-and-center. However, my belief is that Mark's approach is easier to comprehend and is cleaner. Cleaner features means that it is more likely to be used. > > By letting there be a single missing data framework (instead of > > two) all that users need to figure out is when they want nan-like > behavior > > (propagate) or to be more like masks (skip). 
Numpy takes care of the > rest. > > There is a reason why I like using masked arrays because I don't have to > > use nansum in my library functions to guard against the possibility of > > receiving nans. Duck-typing is a good thing. > > > > My argument against separating IGNORE and PROPAGATE is that it becomes > too > > tempting to want to mix these in an array, but the desired behavior would > > likely become ambiguous.. > > > > There is one other proplem that I just thought of that I don't think has > > been outlined in either NEP. What if I perform an operation between an > > array set up with propagate NAs and an array with skip NAs? > > These are explicitly covered in the alterNEP: > > https://gist.github.com/1056379/ > > Sort of. You speak of reduction operations for a single array with a mix of NA and IGNOREs. I guess in that case, it wouldn't make a difference for element-wise operations between two arrays (plus adding the NAs propagate harder rule). Alth
Re: [Numpy-discussion] NA masks in the next numpy release?
>> I wonder if that might be handled as a scikits-image extension, rather
>> than core numpy?
>
> I think Stefan and Nathaniel and Gary Strangman and others are saying
> we don't want to pay the price of a large memory hike for masking. I
> suspect that Nathaniel is right, and that a large majority of those of
> us who want 'missing data' functionality, also want what we've called
> ABSENT missing values, and care about memory.

FWIW, Matthew correctly interprets my concerns. I also have very large non-image datasets, so pushing the problem into a more custom extension (esp. one focused on images) doesn't help me much.

-best
Gary
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, On Fri, Oct 28, 2011 at 9:21 AM, Chris.Barker wrote: > On 10/27/11 7:51 PM, Travis Oliphant wrote: >> As I mentioned. I find the ability to separate an ABSENT idea from an >> IGNORED idea convincing. In other words, I think distinguishing between >> masks and bit-patterns is not just an implementation detail, but >> provides a useful concept for multiple use-cases. > > Exactly -- while one can implement ABSENT with a mask, one can not > implement IGNORE with a bit-pattern. So it is not an implementation detail. > > I also think bit-patterns are a bit of a dead end: > > - there is only a standard for one data type family: i.e. NaN for ieee > float types > > - So we would be coming up with our own standard (or adopting an > existing one, but I don't think there is one widely supported) for other > types. This means: > 1) a lot of work to do Largest possible negative integer for ints / largest integer for uints / not allowed for bool? > 2) a binary format incompatible with other code, compilers, etc. This > is a BIG deal -- a major strength of numpy is that it serves as a > wrapper for a data block that is compatible with C, Fortran or whatever > code -- special bit patterns would make this a lot harder. Extension code is going to get harder. At the moment, as far as I understand it, our extension code can receive a masked array and (without an explicit check from us) ignore the mask and process all the values. Then you're in the unfortunate situation of caring what's under the mask. Bitpatterns would - I imagine - be safer in that respect in that they would be new dtypes and thus extension code would by default reject them as unknown. > We also talked about the fact that a 8-bit mask provides the ability to > carry other information in the mask -- not jsut "missing" or "ignored", > but a handful of other possible reasons for masking. I think that has a > lot of possibilities. 
> > On 10/28/11 2:11 AM, Stéfan van der Walt wrote: >> Another data point: I've been spending some time on scikits-image >> recently, and although masked values would be highly useful in that >> context, the cost of doubling memory use (for uint8 images, e.g.) is >> too high. > >> 2) that we make a concerted effort to implement the bitmask mode of >> operation as soon as possible. > > I wonder if that might be handled as a scikits-image extension, rather > than core numpy? I think Stefan and Nathaniel and Gary Strangman and others are saying we don't want to pay the price of a large memory hike for masking. I suspect that Nathaniel is right, and that a large majority of those of us who want 'missing data' functionality, also want what we've called ABSENT missing values, and care about memory. See you, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
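The hazard described in this message, where extension code receives a masked array and processes the values under the mask without noticing, can be reproduced with the long-standing np.ma module. This is only a sketch: np.ma stands in for the maskna arrays under discussion, and `naive_total` and `mask_aware_total` are hypothetical names, not part of any library.

```python
import numpy as np

def naive_total(values):
    # Hypothetical library function: np.asarray() returns the plain
    # underlying ndarray, so the mask is silently dropped.
    return np.asarray(values).sum()

def mask_aware_total(values):
    # np.ma.asarray() keeps the mask, so masked entries are skipped.
    return np.ma.asarray(values).sum()

data = np.ma.array([1.0, 2.0, 1000.0, 4.0],
                   mask=[False, False, True, False])

print(naive_total(data))       # includes the value hidden under the mask
print(mask_aware_total(data))  # skips the masked entry
```

A bit-pattern dtype would not hit this particular trap, since code that does not know the dtype would be expected to reject it rather than read through it.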
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, On Thu, Oct 27, 2011 at 10:56 PM, Benjamin Root wrote: > > > On Thursday, October 27, 2011, Charles R Harris > wrote: >> >> >> On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant >> wrote: >>> >>> That is a pretty good explanation. I find myself convinced by Matthew's >>> arguments. I think that being able to separate ABSENT from IGNORED is a >>> good idea. I also like being able to control SKIP and PROPAGATE (but I >>> think the current implementation allows this already). >>> >>> What is the counter-argument to this proposal? >>> >> >> What exactly do you find convincing? The current masks propagate by >> default: >> >> In [1]: a = ones(5, maskna=1) >> >> In [2]: a[2] = NA >> >> In [3]: a >> Out[3]: array([ 1., 1., NA, 1., 1.]) >> >> In [4]: a + 1 >> Out[4]: array([ 2., 2., NA, 2., 2.]) >> >> In [5]: a[2] = 10 >> >> In [5]: a >> Out[5]: array([ 1., 1., 10., 1., 1.], maskna=True) >> >> >> I don't see an essential difference between the implementation using masks >> and one using bit patterns, the mask when attached to the original array >> just adds a bit pattern by extending all the types by one byte, an approach >> that easily extends to all existing and future types, which is why Mark went >> that way for the first implementation given the time available. The masks >> are hidden because folks wanted something that behaved more like R and also >> because of the desire to combine the missing, ignore, and later possibly bit >> patterns in a unified manner. Note that the pseudo assignment was also meant >> to look like R. Adding true bit patterns to numpy isn't trivial and I >> believe Mark was thinking of parametrized types for that. >> >> The main problems I see with masks are unified storage and possibly memory >> use. The rest is just behavor and desired API and that can be adjusted >> within the current implementation. There is nothing essentially masky about >> masks. >> >> Chuck >> >> > > I think chuck sums it up quite nicely. 
The implementation detail about > using mask versus bit patterns can still be discussed and addressed. > Personally, I just don't see how parameterized dtypes would be easier to use > than the pseudo assignment. > > The elegance of mark's solution was to consider the treatment of missing > data in a unified manner. This puts missing data in a more prominent spot > for extension builders, which should greatly improve support throughout the > ecosystem. Are extension builders then required to use the numpy C API to get their data? Speaking as an extension builder, I would rather you gave me the mask and the bitpattern information and let me do that myself. > By letting there be a single missing data framework (instead of > two) all that users need to figure out is when they want nan-like behavior > (propagate) or to be more like masks (skip). Numpy takes care of the rest. > There is a reason why I like using masked arrays because I don't have to > use nansum in my library functions to guard against the possibility of > receiving nans. Duck-typing is a good thing. > > My argument against separating IGNORE and PROPAGATE is that it becomes too > tempting to want to mix these in an array, but the desired behavior would > likely become ambiguous.. > > There is one other proplem that I just thought of that I don't think has > been outlined in either NEP. What if I perform an operation between an > array set up with propagate NAs and an array with skip NAs? These are explicitly covered in the alterNEP: https://gist.github.com/1056379/ Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
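For readers following the PROPAGATE versus SKIP distinction quoted above, here is a small sketch using features that exist in released NumPy: NaN for nan-like (propagate) behaviour and np.ma for mask-like (skip) behaviour. It is an illustration of the two behaviours only and does not use the maskna API being debated.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

# PROPAGATE: a NaN contaminates the reduction.
propagated_sum = np.sum(values)

# SKIP, done by hand: the caller has to remember to use nansum.
skipped_sum = np.nansum(values)

# SKIP, via a mask: the ordinary sum() already ignores masked entries,
# which is the duck-typing convenience mentioned in the quoted text.
masked = np.ma.masked_invalid(values)
masked_sum = masked.sum()

print(propagated_sum, skipped_sum, masked_sum)
```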
Re: [Numpy-discussion] NA masks in the next numpy release?
On 10/27/11 7:51 PM, Travis Oliphant wrote:
> As I mentioned. I find the ability to separate an ABSENT idea from an
> IGNORED idea convincing. In other words, I think distinguishing between
> masks and bit-patterns is not just an implementation detail, but
> provides a useful concept for multiple use-cases.

Exactly -- while one can implement ABSENT with a mask, one cannot implement IGNORE with a bit-pattern. So it is not an implementation detail.

I also think bit-patterns are a bit of a dead end:

- there is only a standard for one data type family: i.e. NaN for IEEE float types

- So we would be coming up with our own standard (or adopting an existing one, but I don't think there is one widely supported) for other types. This means:
  1) a lot of work to do
  2) a binary format incompatible with other code, compilers, etc. This is a BIG deal -- a major strength of numpy is that it serves as a wrapper for a data block that is compatible with C, Fortran or whatever code -- special bit patterns would make this a lot harder.

We also talked about the fact that an 8-bit mask provides the ability to carry other information in the mask -- not just "missing" or "ignored", but a handful of other possible reasons for masking. I think that has a lot of possibilities.

On 10/28/11 2:11 AM, Stéfan van der Walt wrote:
> Another data point: I've been spending some time on scikits-image
> recently, and although masked values would be highly useful in that
> context, the cost of doubling memory use (for uint8 images, e.g.) is
> too high.

> 2) that we make a concerted effort to implement the bitmask mode of
> operation as soon as possible.

I wonder if that might be handled as a scikits-image extension, rather than core numpy?

Is there a standard bit pattern for missing data in images? -- it's presumably quite important to maintain binary compatibility with image formats, processing tools, etc.

I guess what I'm getting at is that special bit-pattern implementations may be domain specific.
-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R
7600 Sand Point Way NE
Seattle, WA 98115
(206) 526-6959 voice
(206) 526-6329 fax
(206) 526-6317 main reception
chris.bar...@noaa.gov
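The claim that ABSENT can be implemented with a mask but IGNORE cannot be implemented with a bit-pattern can be illustrated with a short sketch. It assumes NaN as the bit pattern and uses np.ma as a stand-in for a mask-based NA; neither is the maskna implementation itself.

```python
import numpy as np

original = np.array([1.5, 2.5, 3.5])

# Bit-pattern approach: the sentinel overwrites the stored value,
# so the previous 2.5 cannot be recovered afterwards.
bitpattern = original.copy()
bitpattern[1] = np.nan

# Mask approach: the flag lives beside the data, so the value survives.
masked = np.ma.array(original.copy())
masked[1] = np.ma.masked        # ignore the value for now
hidden_total = masked.sum()     # 1.5 + 3.5
masked.mask = False             # stop ignoring it
restored_total = masked.sum()   # 1.5 + 2.5 + 3.5

print(hidden_total, restored_total)
```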
Re: [Numpy-discussion] NA masks in the next numpy release?
On Thu, Oct 27, 2011 at 8:51 PM, Travis Oliphant wrote:
> As I mentioned. I find the ability to separate an ABSENT idea from an
> IGNORED idea convincing. In other words, I think distinguishing between
> masks and bit-patterns is not just an implementation detail, but provides a
> useful concept for multiple use-cases.
>
> I understand exactly what it would take to add bit-patterns to NumPy. I
> also understand what Mark did and agree that it is possible to add Matthew's
> idea to the current code-base. I think it is worth exploring

A masked view can be considered as simply a mask on the viewed data. I agree that in that case it might be nicer to have some operations that are only allowed for views, such as taking a view with a mask from somewhere else rather than having to set it up with assignments. It might also be useful if masked values in a view could be exposed without assigning to the underlying value, perhaps with a np.EXPOSE assignment. But I think these operations could be implemented on top of the current code, although we might want an additional flag.

Space saving can be addressed with bit masks. Unified storage can be addressed by bit patterns that get translated between stored data and numpy arrays with NA. So on and so forth. As people begin to use the current implementation I hope that they offer feedback as to what they discover so that the API and implementation can mature into something widely useful.

Chuck
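One way to read the suggestion that stored bit patterns get translated to and from in-memory NA arrays is sketched below. The sentinel value -9999 is an assumption chosen for illustration, and np.ma again stands in for an NA-aware array; this is not how the maskna code does it.

```python
import numpy as np

SENTINEL = -9999  # assumed on-disk marker for a missing value

stored = np.array([10, SENTINEL, 30, SENTINEL], dtype=np.int32)

# Loading: translate the stored bit pattern into a mask.
in_memory = np.ma.masked_equal(stored, SENTINEL)

# Saving: translate the mask back into the stored bit pattern.
written = in_memory.filled(SENTINEL)

print(in_memory.sum())  # only the unmasked values contribute
print(written)
```

The stored form stays one integer per element, while the working form pays for a mask only while it is in memory.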
Re: [Numpy-discussion] NA masks in the next numpy release?
On Fri, Oct 28, 2011 at 10:45:09AM +0200, Han Genuit wrote:
> Also, I like the short and concise abbreviation for 'Not Applicable',
> NA. It has more common uses than IGNORE.
> (See also here:
> http://www.johndcook.com/R_language_for_programmers.html#missing)

That's a very R-centric point of view: you know what NA stands for, thus you find it meaningful. I can tell you that when I work with naive users, I keep having to explain that NA stands for 'not available', whereas IGNORE is at least somewhat explicit.

Acronyms are a curse for communication, and they tend to be very domain specific.

My two euro-cents (a rising currency, now that it has been saved by our generous leaders)

G
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi all, On Thu, Oct 27, 2011 at 7:51 PM, Travis Oliphant wrote: > I understand exactly what it would take to add bit-patterns to NumPy. I > also understand what Mark did and agree that it is possible to add Matthew's > idea to the current code-base. I think it is worth exploring Another data point: I've been spending some time on scikits-image recently, and although masked values would be highly useful in that context, the cost of doubling memory use (for uint8 images, e.g.) is too high. Many users with large data sets (and I think almost all researchers working on >2D data would be included here as well) may have the same problem. So, while I applaud the efforts made to include a masked array implementation, I'd like to ask that: 1) We are mindful that any design decisions taken before the next release should not *preclude* the implementation of bit-masks (with, hopefully, a shared interface) and 2) that we make a concerted effort to implement the bitmask mode of operation as soon as possible. The NEP stated that both would be implemented, and I understand that due to lack of time a pragmatic call had to be made--but that was, in my opinion, one of its strong features. Regards Stéfan ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
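The memory argument can be made concrete with a little arithmetic. The sketch below assumes a 1024 x 1024 uint8 image and compares a one-byte-per-element mask with a packed one-bit-per-element mask built with np.packbits; it says nothing about how a bitmask mode would actually be implemented in the core.

```python
import numpy as np

image = np.zeros((1024, 1024), dtype=np.uint8)

# One byte of mask per pixel: as large as the uint8 image itself.
byte_mask = np.zeros(image.shape, dtype=np.bool_)

# One bit of mask per pixel, packed eight to a byte.
bit_mask = np.packbits(byte_mask)

print(image.nbytes)      # 1048576
print(byte_mask.nbytes)  # 1048576
print(bit_mask.nbytes)   # 131072

# The packed form can be expanded again when a boolean mask is needed.
restored = (np.unpackbits(bit_mask)[:image.size]
            .reshape(image.shape)
            .astype(bool))
```

The packed mask costs an eighth of the byte mask, at the price of unpacking before use and of slices that no longer map onto simple views.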
Re: [Numpy-discussion] NA masks in the next numpy release?
Yes, to elaborate on that, you can also create multiple masked views, each with its own mask properties. It would be ambiguous to mix a bit-pattern NA together with standard NAs in the same mask, but you can make different specialized masked views on the same data.

Also, I like the short and concise abbreviation for 'Not Applicable', NA. It has more common uses than IGNORE. (See also here: http://www.johndcook.com/R_language_for_programmers.html#missing)

Concerning the assignment, it is a bit implicit, I agree, but the representation and application of masks is also implicit. I think you only have to know that NA will be a mask assignment and not a data assignment.
Re: [Numpy-discussion] NA masks in the next numpy release?
> It should be possible to remove a mask when copying an array.

This was a concession on the part of those pushing for masks. Eventually, I ended up realizing that it resulted in a stronger design. Consider the following:

    foo(a[4:10])

Should function foo be able to access the rest of array "a", even though it was given only a part of it? Of course not! Now, if one considers masking as a form of advanced slicing, then it wouldn't make sense for foo() to be able to access parts it wasn't given.

That being said, this is where NumPy array views come into play. You can create a view of the original data, add masks to the view, and still have access to all of the original data, unmasked.

Ben Root
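The pattern described here (a view of the original data that carries its own mask, with the original left fully accessible) can be sketched with np.ma, which is used as a stand-in for masked views in the maskna implementation. The variable names are illustrative only.

```python
import numpy as np

a = np.arange(10.0)

# A masked view of the same buffer; copy=False avoids duplicating the data.
view = np.ma.array(a, mask=False, copy=False)
view[3] = np.ma.masked            # flags the element, leaves a[3] alone

# A second view with an independent mask over the same data.
other = np.ma.array(a, mask=(a > 6), copy=False)

print(view.sum())    # 42.0: 45 minus the hidden 3
print(other.sum())   # 21.0: 0 + 1 + ... + 6
print(a[3])          # 3.0: the original array is untouched
```

A function handed `view` sees only what the mask lets through, while whoever holds `a` keeps access to every element.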
Re: [Numpy-discussion] NA masks in the next numpy release?
There is a way to assign whole masks in the current implementation:

    >>> a = np.arange(9, maskna=True).reshape((3,3))
    >>> a
    array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]])
    >>> mask = np.array([[False, False, True], [False, True, False], [True, False, True]])
    >>> np.copyto(a, np.NA, where=mask)
    >>> a
    array([[0, 1, NA],
           [3, NA, 5],
           [NA, 7, NA]])

I think the "ValueError: Cannot assign NA to an array which does not support NAs" when trying to copy an array with a mask to an array without a mask is a bug..

    >>> a = np.arange(9, maskna=True).reshape((3,3))
    >>> a.flags.maskna
    True
    >>> b = a.copy(maskna=False)
    >>> b.flags.maskna
    False

It should be possible to remove a mask when copying an array.
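For comparison, the same whole-mask assignment can be sketched with calls available in released NumPy. The first half uses np.copyto with `where=` and NaN as a destructive marker (which requires a float array); the second half attaches the mask with np.ma and leaves the data intact. Neither half uses `maskna` or `np.NA`, so this is an analogy rather than the behaviour of the development branch.

```python
import numpy as np

mask = np.array([[False, False, True],
                 [False, True, False],
                 [True, False, True]])

# Destructive, NaN-based analogue: overwrite the selected elements.
a = np.arange(9, dtype=float).reshape((3, 3))
np.copyto(a, np.nan, where=mask)

# Non-destructive analogue: attach the mask, keep the data underneath.
b = np.ma.array(np.arange(9).reshape((3, 3)), mask=mask)

print(a)
print(b)
print(b.data[0, 2])  # the value under the mask is still there
```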
Re: [Numpy-discussion] NA masks in the next numpy release?
On Thursday, October 27, 2011, Charles R Harris wrote: > > > On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant wrote: >> >> That is a pretty good explanation. I find myself convinced by Matthew's arguments.I think that being able to separate ABSENT from IGNORED is a good idea. I also like being able to control SKIP and PROPAGATE (but I think the current implementation allows this already). >> >> What is the counter-argument to this proposal? >> > > What exactly do you find convincing? The current masks propagate by default: > > In [1]: a = ones(5, maskna=1) > > In [2]: a[2] = NA > > In [3]: a > Out[3]: array([ 1., 1., NA, 1., 1.]) > > In [4]: a + 1 > Out[4]: array([ 2., 2., NA, 2., 2.]) > > In [5]: a[2] = 10 > > In [5]: a > Out[5]: array([ 1., 1., 10., 1., 1.], maskna=True) > > > I don't see an essential difference between the implementation using masks and one using bit patterns, the mask when attached to the original array just adds a bit pattern by extending all the types by one byte, an approach that easily extends to all existing and future types, which is why Mark went that way for the first implementation given the time available. The masks are hidden because folks wanted something that behaved more like R and also because of the desire to combine the missing, ignore, and later possibly bit patterns in a unified manner. Note that the pseudo assignment was also meant to look like R. Adding true bit patterns to numpy isn't trivial and I believe Mark was thinking of parametrized types for that. > > The main problems I see with masks are unified storage and possibly memory use. The rest is just behavor and desired API and that can be adjusted within the current implementation. There is nothing essentially masky about masks. > > Chuck > > I think chuck sums it up quite nicely. The implementation detail about using mask versus bit patterns can still be discussed and addressed. 
Personally, I just don't see how parameterized dtypes would be easier to use than the pseudo assignment. The elegance of mark's solution was to consider the treatment of missing data in a unified manner. This puts missing data in a more prominent spot for extension builders, which should greatly improve support throughout the ecosystem. By letting there be a single missing data framework (instead of two) all that users need to figure out is when they want nan-like behavior (propagate) or to be more like masks (skip). Numpy takes care of the rest. There is a reason why I like using masked arrays because I don't have to use nansum in my library functions to guard against the possibility of receiving nans. Duck-typing is a good thing. My argument against separating IGNORE and PROPAGATE is that it becomes too tempting to want to mix these in an array, but the desired behavior would likely become ambiguous.. There is one other proplem that I just thought of that I don't think has been outlined in either NEP. What if I perform an operation between an array set up with propagate NAs and an array with skip NAs? cheers, Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NA masks in the next numpy release?
As I mentioned. I find the ability to separate an ABSENT idea from an IGNORED idea convincing.In other words, I think distinguishing between masks and bit-patterns is not just an implementation detail, but provides a useful concept for multiple use-cases. I understand exactly what it would take to add bit-patterns to NumPy. I also understand what Mark did and agree that it is possible to add Matthew's idea to the current code-base. I think it is worth exploring -Travis On Oct 27, 2011, at 9:08 PM, Charles R Harris wrote: > > > On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant > wrote: > That is a pretty good explanation. I find myself convinced by Matthew's > arguments.I think that being able to separate ABSENT from IGNORED is a > good idea. I also like being able to control SKIP and PROPAGATE (but I > think the current implementation allows this already). > > What is the counter-argument to this proposal? > > > What exactly do you find convincing? The current masks propagate by default: > > In [1]: a = ones(5, maskna=1) > > In [2]: a[2] = NA > > In [3]: a > Out[3]: array([ 1., 1., NA, 1., 1.]) > > In [4]: a + 1 > Out[4]: array([ 2., 2., NA, 2., 2.]) > > In [5]: a[2] = 10 > > In [5]: a > Out[5]: array([ 1., 1., 10., 1., 1.], maskna=True) > > > I don't see an essential difference between the implementation using masks > and one using bit patterns, the mask when attached to the original array just > adds a bit pattern by extending all the types by one byte, an approach that > easily extends to all existing and future types, which is why Mark went that > way for the first implementation given the time available. The masks are > hidden because folks wanted something that behaved more like R and also > because of the desire to combine the missing, ignore, and later possibly bit > patterns in a unified manner. Note that the pseudo assignment was also meant > to look like R. 
Adding true bit patterns to numpy isn't trivial and I believe > Mark was thinking of parametrized types for that. > > The main problems I see with masks are unified storage and possibly memory > use. The rest is just behavor and desired API and that can be adjusted within > the current implementation. There is nothing essentially masky about masks. > > Chuck > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliph...@enthought.com 1-512-536-1057 http://www.enthought.com ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NA masks in the next numpy release?
On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant wrote:
> That is a pretty good explanation. I find myself convinced by Matthew's
> arguments. I think that being able to separate ABSENT from IGNORED is a
> good idea. I also like being able to control SKIP and PROPAGATE (but I
> think the current implementation allows this already).
>
> What is the counter-argument to this proposal?

What exactly do you find convincing? The current masks propagate by default:

    In [1]: a = ones(5, maskna=1)

    In [2]: a[2] = NA

    In [3]: a
    Out[3]: array([ 1., 1., NA, 1., 1.])

    In [4]: a + 1
    Out[4]: array([ 2., 2., NA, 2., 2.])

    In [5]: a[2] = 10

    In [5]: a
    Out[5]: array([ 1., 1., 10., 1., 1.], maskna=True)

I don't see an essential difference between the implementation using masks and one using bit patterns. The mask, when attached to the original array, just adds a bit pattern by extending all the types by one byte, an approach that easily extends to all existing and future types, which is why Mark went that way for the first implementation given the time available. The masks are hidden because folks wanted something that behaved more like R and also because of the desire to combine the missing, ignore, and later possibly bit patterns in a unified manner. Note that the pseudo assignment was also meant to look like R. Adding true bit patterns to numpy isn't trivial and I believe Mark was thinking of parametrized types for that.

The main problems I see with masks are unified storage and possibly memory use. The rest is just behavior and desired API and that can be adjusted within the current implementation. There is nothing essentially masky about masks.

Chuck
Re: [Numpy-discussion] NA masks in the next numpy release?
That is a pretty good explanation. I find myself convinced by Matthew's arguments.I think that being able to separate ABSENT from IGNORED is a good idea. I also like being able to control SKIP and PROPAGATE (but I think the current implementation allows this already). What is the counter-argument to this proposal? -Travis On Oct 27, 2011, at 7:31 PM, Matthew Brett wrote: > Hi, > > On Tue, Oct 25, 2011 at 7:56 PM, Travis Oliphant > wrote: >> So, I am very interested in making sure I remember the details of the >> counterproposal.What I recall is that you wanted to be able to >> differentiate between a "bit-pattern" mask and a boolean-array mask in the >> API. I believe currently even when bit-pattern masks are implemented the >> difference will be "hidden" from the user on the Python level. >> >> I am sure to be missing other parts of the discussion as I have been in and >> out of it. > > The ideas > -- > > The question that we were addressing in the alter-NEP was: should > missing values implemented as bitpatterns appear to be the same as > missing values implemented with masks? We said no, and Mark said yes. > > To restate the argument in brief; Nathaniel and I and some others > thought that there were two separable ideas in play: > > 1) A value that is finally and completely missing. == ABSENT > 2) A value that we would like to ignore for the moment but might want > back at some future time == IGNORED > > (I'm using the adjectives ABSENT and IGNORED here to be short for the > objects 'absent value' and 'ignored value'. This is to distinguish > from the verbs below). > > We thought bitpatterns were a good match for the former, and masking > was a good match for the latter. > > We all agreed there were two things you might like to do with values > that were missing in both senses above: > > A) PROPAGATE; V + 1 == V > B) SKIP; K + 1 == 1 > > (Note verbs for the behaviors). > > I believe the original np.ma masked arrays always SKIP. 
> > In [2]: a = np.ma.masked_array? > In [3]: a = np.ma.masked_array([99, 2], mask=[True, False]) > In [4]: a > Out[4]: > masked_array(data = [-- 2], > mask = [ True False], > fill_value = 99) > In [5]: a.sum() > Out[5]: 2 > > There was some discussion as to whether there was a reason to think > that ABSENT should always or by default PROPAGATE, and IGNORED should > always or by default SKIP. Chuck is referring to this idea when he > said further up this thread: > >> For instance, I'm thinking skipna=1 is the natural default for the masked >> arrays. > > The current implementation > --- > > What we have now is an implementation of masked arrays, but more > tightly integrated into the numpy core. In our language we have an > implementation of IGNORED that is tuned to be nearly indistinguishable > from the behavior we are expecting of ABSENT. > > Specifically, once you have done this: > > In [9]: a = np.array([99, 2], maskna=True) > > you can get something representing the mask: > > In [11]: np.isna(a) > Out[11]: array([False, False], dtype=bool) > > but I believe there is no way of setting the mask directly. 
In order > to set the mask, you have to do what looks like an assignment: > > In [12]: a[0] = np.NA > In [14]: a > Out[14]: array([NA, 2]) > > In fact, what has happened is the mask has changed, but the underlying > value has not: > > In [18]: orig = np.array([99, 2]) > > In [19]: a = orig.view(maskna=True) > > In [20]: a[0] = np.NA > > In [21]: a > Out[21]: array([NA, 2]) > > In [22]: orig > Out[22]: array([99, 2]) > > This is different from real assignment: > > In [23]: a[0] = 0 > > In [24]: a > Out[24]: array([0, 2], maskna=True) > > In [25]: orig > Out[25]: array([0, 2]) > > Some effort has gone into making it difficult to pull off the mask: > > In [30]: a.view(np.int64) > Out[30]: array([NA, 2]) > > In [31]: a.view(np.int64).flags > Out[31]: > C_CONTIGUOUS : True > F_CONTIGUOUS : True > OWNDATA : False > MASKNA : True > OWNMASKNA : False > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > In [32]: a.astype(np.int64) > --- > ValueErrorTraceback (most recent call last) > /home/mb312/ in () > > 1 a.astype(np.int64) > > ValueError: Cannot assign NA to an array which does not support NAs > > The default behavior of the masked values is PROPAGATE, but they can > be individually made to SKIP: > > In [28]: a.sum() # PROPAGATE > Out[28]: NA(dtype='int64') > > In [29]: a.sum(skipna=True) # SKIP > Out[29]: 2 > > Where's the beef? > - > > I personally still think that it is confusing to fuse the concept of: > > 1) Masked arrays > 2) Arrays with bitpattern codes for missing > > and the concepts of > > A) ABSENT and > B) IGNORED > > Consequences for current code > -
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, On Tue, Oct 25, 2011 at 7:56 PM, Travis Oliphant wrote: > So, I am very interested in making sure I remember the details of the > counterproposal. What I recall is that you wanted to be able to > differentiate between a "bit-pattern" mask and a boolean-array mask in the > API. I believe currently even when bit-pattern masks are implemented the > difference will be "hidden" from the user on the Python level. > > I am sure to be missing other parts of the discussion as I have been in and > out of it. The ideas -- The question that we were addressing in the alter-NEP was: should missing values implemented as bitpatterns appear to be the same as missing values implemented with masks? We said no, and Mark said yes. To restate the argument in brief; Nathaniel and I and some others thought that there were two separable ideas in play: 1) A value that is finally and completely missing. == ABSENT 2) A value that we would like to ignore for the moment but might want back at some future time == IGNORED (I'm using the adjectives ABSENT and IGNORED here to be short for the objects 'absent value' and 'ignored value'. This is to distinguish from the verbs below). We thought bitpatterns were a good match for the former, and masking was a good match for the latter. We all agreed there were two things you might like to do with values that were missing in both senses above: A) PROPAGATE; V + 1 == V B) SKIP; K + 1 == 1 (Note verbs for the behaviors). I believe the original np.ma masked arrays always SKIP. In [2]: a = np.ma.masked_array? In [3]: a = np.ma.masked_array([99, 2], mask=[True, False]) In [4]: a Out[4]: masked_array(data = [-- 2], mask = [ True False], fill_value = 99) In [5]: a.sum() Out[5]: 2 There was some discussion as to whether there was a reason to think that ABSENT should always or by default PROPAGATE, and IGNORED should always or by default SKIP. 
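The two behaviors can be tried with features that exist in released NumPy. This is only an analogy, not the maskna code under discussion: NaN stands in here for an in-band (bit-pattern style) ABSENT marker and works for floats only, and np.ma stands in for a mask-based IGNORED marker.

```python
import numpy as np

# In-band marker: NaN is stored in the data itself.
v = np.array([np.nan, 2.0])
propagate_sum = v.sum()     # PROPAGATE: the reduction returns NaN
skip_sum = np.nansum(v)     # SKIP: NaN is left out of the sum

# Mask marker: the value 99 is still stored, but flagged as hidden.
k = np.ma.masked_array([99, 2], mask=[True, False])
elementwise = k + 1         # the masked slot stays masked in the result
ma_sum = k.sum()            # np.ma reductions skip masked slots

print(propagate_sum, skip_sum)
print(elementwise, ma_sum)
```

Note that in this sketch the choice between PROPAGATE and SKIP for the NaN case is made per call (sum versus nansum), which is the same kind of per-call control that a skipna keyword gives.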
Chuck is referring to this idea when he said further up this thread: > For instance, I'm thinking skipna=1 is the natural default for the masked > arrays. The current implementation --- What we have now is an implementation of masked arrays, but more tightly integrated into the numpy core. In our language we have an implementation of IGNORED that is tuned to be nearly indistinguishable from the behavior we are expecting of ABSENT. Specifically, once you have done this: In [9]: a = np.array([99, 2], maskna=True) you can get something representing the mask: In [11]: np.isna(a) Out[11]: array([False, False], dtype=bool) but I believe there is no way of setting the mask directly. In order to set the mask, you have to do what looks like an assignment: In [12]: a[0] = np.NA In [14]: a Out[14]: array([NA, 2]) In fact, what has happened is the mask has changed, but the underlying value has not: In [18]: orig = np.array([99, 2]) In [19]: a = orig.view(maskna=True) In [20]: a[0] = np.NA In [21]: a Out[21]: array([NA, 2]) In [22]: orig Out[22]: array([99, 2]) This is different from real assignment: In [23]: a[0] = 0 In [24]: a Out[24]: array([0, 2], maskna=True) In [25]: orig Out[25]: array([0, 2]) Some effort has gone into making it difficult to pull off the mask: In [30]: a.view(np.int64) Out[30]: array([NA, 2]) In [31]: a.view(np.int64).flags Out[31]: C_CONTIGUOUS : True F_CONTIGUOUS : True OWNDATA : False MASKNA : True OWNMASKNA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False In [32]: a.astype(np.int64) --- ValueErrorTraceback (most recent call last) /home/mb312/ in () > 1 a.astype(np.int64) ValueError: Cannot assign NA to an array which does not support NAs The default behavior of the masked values is PROPAGATE, but they can be individually made to SKIP: In [28]: a.sum() # PROPAGATE Out[28]: NA(dtype='int64') In [29]: a.sum(skipna=True) # SKIP Out[29]: 2 Where's the beef? 
-----------------

I personally still think that it is confusing to fuse the concept of:

1) Masked arrays
2) Arrays with bitpattern codes for missing

and the concepts of

A) ABSENT and
B) IGNORED

Consequences for current code
-----------------------------

Specifically, it still seems to me to make sense to prefer this:

>> a = np.array([99, 2], masking=True)
>> a.mask
[ True, True ]
>> a.sum()
101
>> a.mask[0] = False
>> a.sum()
2

It might make sense, as Chuck suggests, to change the default to 'skipna=True', and I'd further suggest renaming np.NA to np.IGNORED and 'skipna' to 'skipignored' for clarity.

I still think the pseudo-assignment:

In [20]: a[0] = np.NA

is confusing, and should be removed.

Later, should we ever have bitpatterns, there would be something like np.ABSENT. This of course would make sense for assignment:

In [20]: a[0] = np.ABSENT

There would be another keyword argument 'skipabsent=F
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi,

On Tue, Oct 25, 2011 at 7:56 PM, Travis Oliphant wrote:
> So, I am very interested in making sure I remember the details of the
> counterproposal. What I recall is that you wanted to be able to
> differentiate between a "bit-pattern" mask and a boolean-array mask in the
> API. I believe currently even when bit-pattern masks are implemented the
> difference will be "hidden" from the user on the Python level.
>
> I am sure to be missing other parts of the discussion as I have been in and
> out of it.

Nathaniel - are you online today? Do you have time to review the current implementation and see if it affects the initial discussion?

I'm running around most of today but I should have time to do some thinking later this afternoon CA time.

See you,

Matthew
Re: [Numpy-discussion] NA masks in the next numpy release?
There is also:

Missing/accumulating data
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057406.html

An NA compromise idea -- many-NA
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057408.html

NEPaNEP lessons - was: alterNEP
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057435.html

NA/Missing Data Conference Call Summary
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057474.html

HPC missing data - was: NA/Missing Data Conference Call Summary
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057482.html

using the same vocabulary for missing value ideas
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057485.html

towards a more productive missing values/masked arrays discussion...
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057511.html

miniNEP1: where= argument for ufuncs
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057513.html

miniNEP 2: NA support via special dtypes
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057542.html

Missing Data development plan
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057567.html

Missing Values Discussion
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057579.html

NA masks for NumPy are ready to test
http://mail.scipy.org/pipermail/numpy-discussion/2011-August/058103.html
Re: [Numpy-discussion] NA masks in the next numpy release?
On 10/25/2011 04:56 PM, Travis Oliphant wrote:
> So, I am very interested in making sure I remember the details of the
> counterproposal. What I recall is that you wanted to be able to
> differentiate between a "bit-pattern" mask and a boolean-array mask
> in the API. I believe currently even when bit-pattern masks are
> implemented the difference will be "hidden" from the user on the
> Python level.
>
> I am sure to be missing other parts of the discussion as I have been
> in and out of it.
>
> Thanks,
>
> -Travis

The alternative-NEP is here:
https://gist.github.com/1056379/

One thread of discussion is here:
http://www.mail-archive.com/numpy-discussion@scipy.org/msg32268.html

and continued here:
http://www.mail-archive.com/numpy-discussion@scipy.org/msg32371.html

Eric
Re: [Numpy-discussion] NA masks in the next numpy release?
So, I am very interested in making sure I remember the details of the counterproposal.What I recall is that you wanted to be able to differentiate between a "bit-pattern" mask and a boolean-array mask in the API. I believe currently even when bit-pattern masks are implemented the difference will be "hidden" from the user on the Python level. I am sure to be missing other parts of the discussion as I have been in and out of it. Thanks, -Travis On Oct 25, 2011, at 7:02 PM, Matthew Brett wrote: > Hi, > > Thank you for your gracious email. > > On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant > wrote: >> It is a shame that Nathaniel and perhaps Matthew do not feel like their >> voice was heard. I wish I could have participated more fully in some of >> the discussions. I don't know if I could have really helped, but I would >> have liked to have tried to perhaps work alongside Mark to integrate some of >> the other ideas that had been expressed during the discussion. >> Unfortunately, I was traveling in NYC most of the time that Mark was >> working on this project and did not get a chance to interact with him as >> much as I would have liked. >> My view is that we didn't get quite to where I thought we would get, nor >> where I think we could be. I think Nathaniel and Matthew provided very >> specific feedback that was helpful in understanding other perspectives of a >> difficult problem. In particular, I really wanted bit-patterns >> implemented.However, I also understand that Mark did quite a bit of work >> and altered his original designs quite a bit in response to community >> feedback. I wasn't a major part of the pull request discussion, nor did I >> merge the changes, but I support Charles if he reviewed the code and felt >> like it was the right thing to do. I likely would have done the same thing >> rather than let Mark Wiebe's work languish. 
>> Merging Mark's code does not mean there is not more work to be done, but it >> is consistent with the reality that currently development on NumPy happens >> when people have the time to do it.I have not seen anything to convince >> me that there is not still time to make specific API changes that address >> some of the concerns. >> Perhaps, Nathaniel and or Matthew could summarize their concerns again and >> if desired submit a pull request to revert the changes. However, there is >> a definite bias against removing working code unless the arguments are very >> strong and receive a lot of support from others. > > Honestly - I am not sure whether there is any interest now, in the > arguments we made before. If there is, who is interested? I mean, > past politeness. > > I wasn't trying to restart that discussion, because I didn't know what > good it could do. At first I was hoping that we could ask whether > there was a better way of dealing with disagreements like this. > Later it seemed to me that the atmosphere was getting bad, and I > wanted to say that because I thought it was important. > >> Thank you for continuing to voice your opinions even when it may feel that >> the tide is against you. My view is that we only learn from people who >> disagree with us. > > Thank you for saying that. I hope that y'all will tell me if I am > making it harder for you to disagree, and I am sorry if I did so > here. > > Best, > > Matthew > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliph...@enthought.com 1-512-536-1057 http://www.enthought.com ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, Thank you for your gracious email. On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant wrote: > It is a shame that Nathaniel and perhaps Matthew do not feel like their > voice was heard. I wish I could have participated more fully in some of > the discussions. I don't know if I could have really helped, but I would > have liked to have tried to perhaps work alongside Mark to integrate some of > the other ideas that had been expressed during the discussion. > Unfortunately, I was traveling in NYC most of the time that Mark was > working on this project and did not get a chance to interact with him as > much as I would have liked. > My view is that we didn't get quite to where I thought we would get, nor > where I think we could be. I think Nathaniel and Matthew provided very > specific feedback that was helpful in understanding other perspectives of a > difficult problem. In particular, I really wanted bit-patterns > implemented. However, I also understand that Mark did quite a bit of work > and altered his original designs quite a bit in response to community > feedback. I wasn't a major part of the pull request discussion, nor did I > merge the changes, but I support Charles if he reviewed the code and felt > like it was the right thing to do. I likely would have done the same thing > rather than let Mark Wiebe's work languish. > Merging Mark's code does not mean there is not more work to be done, but it > is consistent with the reality that currently development on NumPy happens > when people have the time to do it. I have not seen anything to convince > me that there is not still time to make specific API changes that address > some of the concerns. > Perhaps, Nathaniel and or Matthew could summarize their concerns again and > if desired submit a pull request to revert the changes. However, there is > a definite bias against removing working code unless the arguments are very > strong and receive a lot of support from others. 
Honestly - I am not sure whether there is any interest now, in the arguments we made before. If there is, who is interested? I mean, past politeness. I wasn't trying to restart that discussion, because I didn't know what good it could do. At first I was hoping that we could ask whether there was a better way of dealing with disagreements like this. Later it seemed to me that the atmosphere was getting bad, and I wanted to say that because I thought it was important. > Thank you for continuing to voice your opinions even when it may feel that > the tide is against you. My view is that we only learn from people who > disagree with us. Thank you for saying that. I hope that y'all will tell me if I am making it harder for you to disagree, and I am sorry if I did so here. Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NA masks in the next numpy release?
It is a shame that Nathaniel and perhaps Matthew do not feel like their voice was heard. I wish I could have participated more fully in some of the discussions. I don't know if I could have really helped, but I would have liked to have tried to perhaps work alongside Mark to integrate some of the other ideas that had been expressed during the discussion. Unfortunately, I was traveling in NYC most of the time that Mark was working on this project and did not get a chance to interact with him as much as I would have liked. My view is that we didn't get quite to where I thought we would get, nor where I think we could be. I think Nathaniel and Matthew provided very specific feedback that was helpful in understanding other perspectives of a difficult problem. In particular, I really wanted bit-patterns implemented. However, I also understand that Mark did quite a bit of work and altered his original designs quite a bit in response to community feedback. I wasn't a major part of the pull request discussion, nor did I merge the changes, but I support Charles if he reviewed the code and felt like it was the right thing to do. I likely would have done the same thing rather than let Mark Wiebe's work languish. Merging Mark's code does not mean there is not more work to be done, but it is consistent with the reality that currently development on NumPy happens when people have the time to do it.I have not seen anything to convince me that there is not still time to make specific API changes that address some of the concerns. Perhaps, Nathaniel and or Matthew could summarize their concerns again and if desired submit a pull request to revert the changes. However, there is a definite bias against removing working code unless the arguments are very strong and receive a lot of support from others. Thank you for continuing to voice your opinions even when it may feel that the tide is against you. My view is that we only learn from people who disagree with us. 
Best regards, -Travis On Oct 25, 2011, at 1:24 PM, Benjamin Root wrote: > On Tue, Oct 25, 2011 at 1:03 PM, Matthew Brett > wrote: > Hi, > > On Tue, Oct 25, 2011 at 8:04 AM, Lluís wrote: > > Matthew Brett writes: > >> I'm afraid I find this whole thread very unpleasant. > > > >> I have the odd impression of being back at high school. Some of the > >> big kids are pushing me around and then the other kids join in. > > > >> It didn't have to be this way. > > > >> Someone could have replied like this to Nathaniel: > > > >> "Oh - yes - I'm sorry - we actually had the discussion on the pull > >> request. Looking back, I see that we didn't flag this up on the > >> mailing list and maybe we should have. Thanks for pointing that out. > >> Maybe we could start another discussion of the API in view of the > >> changes that have gone in". > > > >> But that didn't happen. > > > > Well, I really thought that all the interested parties would take a look at > > [1]. > > > > While it's true that the pull requests are not obvious if you're not using > > the > > functionalities of the github web (or unless announced in this list), I > > think > > that Mark's announcement was precisely directed at having a new round of > > discussions after having some code to play around with and see how > > intuitive or > > counter-intuitive the implemented concepts could be. > > I just wanted to be clear what I meant. > > The key point is not whether or not the pull-request or request for > testing was in fact the right place for the discussion that Travis > suggested. I guess you can argue that either way. I'd say no, but > I can see how you would disagree on that. > > > This is getting very meta... a disagreement about the disagreement. > > The key point is - how much do we value constructive disagreement? > > > Personally, I value it very much. 
My impression of the discussion we all had > at the beginning was that the needs of the two distinct communities (R-users > and masked array users) were both heard and largely addressed. Aspects of > both approaches were used, and the final result is, IMHO, inspired and > elegant. Is it perfect? No. Are there ways to improve it? Absolutely, and I > fully expect that to happen. > > If we do value constructive disagreement then we'll go out of our way > to talk through the points of contention, and make sure that the > people who disagree, especially the minority, feel that they have been > fully heard. > > If we don't value constructive disagreement then we'll let the other > side know that further disagreement will be taken as a sign of bad > faith. > > Now - what do you see here? I see the second and that worries me. > > > It is disappointing that you choose not to participate in the thread linked > above or in the pull request itself. If I remember correctly, you were > working on finishing up your dissertation, so I fully understand the time >
Re: [Numpy-discussion] NA masks in the next numpy release?
Matthew Brett writes: [...] >>> If we do value constructive disagreement then we'll go out of our way >>> to talk through the points of contention, and make sure that the >>> people who disagree, especially the minority, feel that they have been >>> fully heard. >>> >>> If we don't value constructive disagreement then we'll let the other >>> side know that further disagreement will be taken as a sign of bad >>> faith. >>> >>> Now - what do you see here? I see the second and that worries me. >>> >> >> It is disappointing that you choose not to participate in the thread linked >> above or in the pull request itself. If I remember correctly, you were >> working on finishing up your dissertation, so I fully understand the time >> constraints involved there. However, the pull request and the email >> notification is the de facto method of staging and discussing changes in any >> development project. No objections were raised in that pull request, so it >> went in after some time passed. To hold off the merge, all one would need >> to do is fire off a quick comment requesting a delay to have a chance to >> review the pull request. > I think the pull-request was not the right vehicle for the discussion, > you think it was, that's fine, I don't think we need to rehearse that. > My question (if you are answering my question) is: if you put yourself > in my or Nathaniel's shoes, would you feel that you had been warmly > encouraged to express disagreement, or would you feel something else. I sense (bear with me, my senses are not very sharp) that you feel your concerns have not been addressed, and thus the sensation that features you disagreed upon were sneaked through a silent pull request. And yes, the initial discussions were too heated on some moments (me included), but that does not imply that the current state is ignoring the concerns everybody raised. >> Luckily, git is a VCS, so we are fully capable of reverting any necessary >> changes if warranted. 
If you have any concerns or suggestions for changes >> in the current implementation, feel free to raise them and open additional >> pull requests. There is no "ganging up" here or any other subterfuge. Tell >> us exactly what are your issues with the current setup, provide example code >> demonstrating the issues, and we can certainly discuss ways to improve this. > Has the situation changed since the counter-NEP that Nathaniel and I wrote up? I couldn't find the link, but AFAIR the main concerns were: - Using bit patterns as a more efficient missing data mechanism that is compatible with third-party binary libraries. As the NEP says, although not implemented (due to lack of time), bit patterns are a desirable extension that will be able to coexist with masks while providing a single and consistent Python and C API for both bit patterns and masks. - Being able to expose the non-destructive nature of masks. There is only one very specific path leading to such behaviour [1], so users not interested in it should never inadvertently fall into its use (aka, they don't even need to know about it). [1] http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html#creating-na-masked-views If we agree that it is reasonable to think that the concerns in the "counter-NEP" have been addressed in the current implementation, then I think it is not unreasonable to take the silence to Mark's mail and the pull request as a green light. Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
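The "non-destructive nature of masks" mentioned above can be seen with the existing np.ma module. This is a sketch by analogy only: it does not use the NA-masked views from the linked documentation, and NaN stands in for an in-band bit-pattern marker.

```python
import numpy as np

# Mask approach: hiding a value leaves the stored data untouched.
a = np.ma.masked_array([99, 2], mask=[True, False])
hidden_sum = a.sum()       # the masked entry is skipped
a.mask = np.ma.nomask      # stop ignoring it
restored_sum = a.sum()     # the original 99 is available again

# In-band approach: the marker overwrites the value it replaces.
b = np.array([99.0, 2.0])
b[0] = np.nan              # 99.0 is no longer stored anywhere in b

print(hidden_sum, restored_sum)
print(b)
```

The difference is where the marker lives: beside the data, so the value can come back, or in place of the data, so it cannot.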
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, On Tue, Oct 25, 2011 at 11:24 AM, Benjamin Root wrote: > On Tue, Oct 25, 2011 at 1:03 PM, Matthew Brett > wrote: >> >> Hi, >> >> On Tue, Oct 25, 2011 at 8:04 AM, Lluís wrote: >> > Matthew Brett writes: >> >> I'm afraid I find this whole thread very unpleasant. >> > >> >> I have the odd impression of being back at high school. Some of the >> >> big kids are pushing me around and then the other kids join in. >> > >> >> It didn't have to be this way. >> > >> >> Someone could have replied like this to Nathaniel: >> > >> >> "Oh - yes - I'm sorry - we actually had the discussion on the pull >> >> request. Looking back, I see that we didn't flag this up on the >> >> mailing list and maybe we should have. Thanks for pointing that out. >> >> Maybe we could start another discussion of the API in view of the >> >> changes that have gone in". >> > >> >> But that didn't happen. >> > >> > Well, I really thought that all the interested parties would take a look >> > at [1]. >> > >> > While it's true that the pull requests are not obvious if you're not >> > using the >> > functionalities of the github web (or unless announced in this list), I >> > think >> > that Mark's announcement was precisely directed at having a new round of >> > discussions after having some code to play around with and see how >> > intuitive or >> > counter-intuitive the implemented concepts could be. >> >> I just wanted to be clear what I meant. >> >> The key point is not whether or not the pull-request or request for >> testing was in fact the right place for the discussion that Travis >> suggested. I guess you can argue that either way. I'd say no, but >> I can see how you would disagree on that. >> > > This is getting very meta... a disagreement about the disagreement. Yes, the important point is a social one. The other points are details. >> The key point is - how much do we value constructive disagreement? >> > > Personally, I value it very much. 
Well - I think everyone believes that that they value constructive discussion, but the question is, what happens when people really disagree? > My impression of the discussion we all > had at the beginning was that the needs of the two distinct communities > (R-users and masked array users) were both heard and largely addressed. > Aspects of both approaches were used, and the final result is, IMHO, > inspired and elegant. Is it perfect? No. Are there ways to improve it? > Absolutely, and I fully expect that to happen. To be clear once more, I personally feel we don't need to discuss: 1) Whether Mark did a good job on the code (I have high bias to imagine so). 2) Whether something along these lines would be good to have in numpy >> If we do value constructive disagreement then we'll go out of our way >> to talk through the points of contention, and make sure that the >> people who disagree, especially the minority, feel that they have been >> fully heard. >> >> If we don't value constructive disagreement then we'll let the other >> side know that further disagreement will be taken as a sign of bad >> faith. >> >> Now - what do you see here? I see the second and that worries me. >> > > It is disappointing that you choose not to participate in the thread linked > above or in the pull request itself. If I remember correctly, you were > working on finishing up your dissertation, so I fully understand the time > constraints involved there. However, the pull request and the email > notification is the de facto method of staging and discussing changes in any > development project. No objections were raised in that pull request, so it > went in after some time passed. To hold off the merge, all one would need > to do is fire off a quick comment requesting a delay to have a chance to > review the pull request. I think the pull-request was not the right vehicle for the discussion, you think it was, that's fine, I don't think we need to rehearse that. 
My question (if you are answering my question) is: if you put yourself in my or Nathaniel's shoes, would you feel that you had been warmly encouraged to express disagreement, or would you feel something else. > Luckily, git is a VCS, so we are fully capable of reverting any necessary > changes if warranted. If you have any concerns or suggestions for changes > in the current implementation, feel free to raise them and open additional > pull requests. There is no "ganging up" here or any other subterfuge. Tell > us exactly what are your issues with the current setup, provide example code > demonstrating the issues, and we can certainly discuss ways to improve this. Has the situation changed since the counter-NEP that Nathaniel and I wrote up? > Remember, we *all* have a common agreement here. NumPy needs better support > for missing data (in whatever form). Let's work from that assumption and > make NumPy a better library to use for everybody! I remember walking past a church in a small town in the California desert. It ha
Re: [Numpy-discussion] NA masks in the next numpy release?
On Tue, Oct 25, 2011 at 1:03 PM, Matthew Brett wrote: > Hi, > > On Tue, Oct 25, 2011 at 8:04 AM, Lluís wrote: > > Matthew Brett writes: > >> I'm afraid I find this whole thread very unpleasant. > > > >> I have the odd impression of being back at high school. Some of the > >> big kids are pushing me around and then the other kids join in. > > > >> It didn't have to be this way. > > > >> Someone could have replied like this to Nathaniel: > > > >> "Oh - yes - I'm sorry - we actually had the discussion on the pull > >> request. Looking back, I see that we didn't flag this up on the > >> mailing list and maybe we should have. Thanks for pointing that out. > >> Maybe we could start another discussion of the API in view of the > >> changes that have gone in". > > > >> But that didn't happen. > > > > Well, I really thought that all the interested parties would take a look > at [1]. > > > > While it's true that the pull requests are not obvious if you're not > using the > > functionalities of the github web (or unless announced in this list), I > think > > that Mark's announcement was precisely directed at having a new round of > > discussions after having some code to play around with and see how > intuitive or > > counter-intuitive the implemented concepts could be. > > I just wanted to be clear what I meant. > > The key point is not whether or not the pull-request or request for > testing was in fact the right place for the discussion that Travis > suggested. I guess you can argue that either way. I'd say no, but > I can see how you would disagree on that. > > This is getting very meta... a disagreement about the disagreement. > The key point is - how much do we value constructive disagreement? > > Personally, I value it very much. My impression of the discussion we all had at the beginning was that the needs of the two distinct communities (R-users and masked array users) were both heard and largely addressed. 
Aspects of both approaches were used, and the final result is, IMHO, inspired and elegant. Is it perfect? No. Are there ways to improve it? Absolutely, and I fully expect that to happen. > If we do value constructive disagreement then we'll go out of our way > to talk through the points of contention, and make sure that the > people who disagree, especially the minority, feel that they have been > fully heard. > > If we don't value constructive disagreement then we'll let the other > side know that further disagreement will be taken as a sign of bad > faith. > > Now - what do you see here? I see the second and that worries me. > > It is disappointing that you choose not to participate in the thread linked above or in the pull request itself. If I remember correctly, you were working on finishing up your dissertation, so I fully understand the time constraints involved there. However, the pull request and the email notification is the de facto method of staging and discussing changes in any development project. No objections were raised in that pull request, so it went in after some time passed. To hold off the merge, all one would need to do is fire off a quick comment requesting a delay to have a chance to review the pull request. Luckily, git is a VCS, so we are fully capable of reverting any necessary changes if warranted. If you have any concerns or suggestions for changes in the current implementation, feel free to raise them and open additional pull requests. There is no "ganging up" here or any other subterfuge. Tell us exactly what are your issues with the current setup, provide example code demonstrating the issues, and we can certainly discuss ways to improve this. Remember, we *all* have a common agreement here. NumPy needs better support for missing data (in whatever form). Let's work from that assumption and make NumPy a better library to use for everybody! Cheers! 
Ben Root
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, On Tue, Oct 25, 2011 at 8:04 AM, Lluís wrote: > Matthew Brett writes: >> I'm afraid I find this whole thread very unpleasant. > >> I have the odd impression of being back at high school. Some of the >> big kids are pushing me around and then the other kids join in. > >> It didn't have to be this way. > >> Someone could have replied like this to Nathaniel: > >> "Oh - yes - I'm sorry - we actually had the discussion on the pull >> request. Looking back, I see that we didn't flag this up on the >> mailing list and maybe we should have. Thanks for pointing that out. >> Maybe we could start another discussion of the API in view of the >> changes that have gone in". > >> But that didn't happen. > > Well, I really thought that all the interested parties would take a look at > [1]. > > While it's true that the pull requests are not obvious if you're not using the > functionalities of the github web (or unless announced in this list), I think > that Mark's announcement was precisely directed at having a new round of > discussions after having some code to play around with and see how intuitive > or > counter-intuitive the implemented concepts could be. I just wanted to be clear what I meant. The key point is not whether or not the pull-request or request for testing was in fact the right place for the discussion that Travis suggested. I guess you can argue that either way. I'd say no, but I can see how you would disagree on that. The key point is - how much do we value constructive disagreement? If we do value constructive disagreement then we'll go out of our way to talk through the points of contention, and make sure that the people who disagree, especially the minority, feel that they have been fully heard. If we don't value constructive disagreement then we'll let the other side know that further disagreement will be taken as a sign of bad faith. Now - what do you see here? I see the second and that worries me. 
Best, Matthew
Re: [Numpy-discussion] NA masks in the next numpy release?
Matthew Brett writes:

> I'm afraid I find this whole thread very unpleasant.

> I have the odd impression of being back at high school. Some of the
> big kids are pushing me around and then the other kids join in.

> It didn't have to be this way.

> Someone could have replied like this to Nathaniel:

> "Oh - yes - I'm sorry - we actually had the discussion on the pull
> request. Looking back, I see that we didn't flag this up on the
> mailing list and maybe we should have. Thanks for pointing that out.
> Maybe we could start another discussion of the API in view of the
> changes that have gone in".

> But that didn't happen.

Well, I really thought that all the interested parties would take a look at [1].

While it's true that the pull requests are not obvious if you're not using the functionalities of the github web (or unless announced in this list), I think that Mark's announcement was precisely directed at having a new round of discussions after having some code to play around with and see how intuitive or counter-intuitive the implemented concepts could be.

[1] http://old.nabble.com/NA-masks-for-NumPy-are-ready-to-test-td32291024.html

Lluis

--
"And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi,

On Mon, Oct 24, 2011 at 11:44 PM, Han Genuit wrote:
> Well, if I may have a say, I think that an open source project is
> especially open when users as developers can contribute to the code
> base and can participate in discussions on how to improve the existing
> designs and ideas. I do not think a project is open when it crumbles
> down into politics.. I have seen a lot of work done by Mark especially
> to ensure that everyone had a say in what he was doing, up to the
> point where this might not be fun anymore. And from what I can see at
> the time, which was back in August, everyone has had plenty of
> opportunity to discuss or contribute to the specific changes that were
> made.
>
> This was an open contribution to the NumPy code, not some cooked up
> shady business by high and mighty developers and I, for one, am happy
> with how it turned out.

I'm afraid I find this whole thread very unpleasant.

I have the odd impression of being back at high school. Some of the big kids are pushing me around and then the other kids join in.

It didn't have to be this way.

Someone could have replied like this to Nathaniel:

"Oh - yes - I'm sorry - we actually had the discussion on the pull request. Looking back, I see that we didn't flag this up on the mailing list and maybe we should have. Thanks for pointing that out. Maybe we could start another discussion of the API in view of the changes that have gone in".

But that didn't happen.

Best,

Matthew
Re: [Numpy-discussion] NA masks in the next numpy release?
Well, if I may have a say, I think that an open source project is especially open when users as developers can contribute to the code base and can participate in discussions on how to improve the existing designs and ideas. I do not think a project is open when it crumbles down into politics. I have seen a lot of work done by Mark especially to ensure that everyone had a say in what he was doing, up to the point where this might not be fun anymore. And from what I could see at the time, which was back in August, everyone has had plenty of opportunity to discuss or contribute to the specific changes that were made.

This was an open contribution to the NumPy code, not some cooked-up shady business by high and mighty developers and I, for one, am happy with how it turned out.
Re: [Numpy-discussion] NA masks in the next numpy release?
Charles R Harris writes:
[...]
> It might be useful to have a way of setting global defaults, or something like a
> with statement. These are the sort of things that can be adjusted based on
> experience. For instance, I'm thinking skipna=1 is the natural default for the
> masked arrays.

I already raised this concern during the initial discussions, and Mark came up with a nice solution.

Instead of having an additional stateful global interface that code would have to check in addition to the "skipna" argument, you can have a simple function that takes and/or constructs an ndarray and redefines its ufunc wrapper to always set the "skipna = True" argument.

Lluis

--
"And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth
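
The idea in the message above (a function that hands back an array whose reductions skip missing values by default, instead of a global setting) can be sketched roughly as follows. This is only an illustration under stated assumptions: it does not use the maskna/skipna API discussed in this thread, it stands in NaN for NA, and the names `SkipNaArray` and `skipna_default` are hypothetical, not part of NumPy.

```python
import numpy as np


class SkipNaArray(np.ndarray):
    """Hypothetical ndarray subclass whose reductions skip NaN by default.

    NaN is used here as a stand-in for the NA value discussed in the thread.
    """

    def mean(self, *args, skipna=True, **kwargs):
        # Work on a plain ndarray view so the NumPy functions do not
        # call back into this override.
        plain = self.view(np.ndarray)
        if skipna:
            return np.nanmean(plain, *args, **kwargs)
        return plain.mean(*args, **kwargs)

    def sum(self, *args, skipna=True, **kwargs):
        plain = self.view(np.ndarray)
        if skipna:
            return np.nansum(plain, *args, **kwargs)
        return plain.sum(*args, **kwargs)


def skipna_default(data):
    """Take or construct a float array and return it as a SkipNaArray view."""
    return np.asarray(data, dtype=float).view(SkipNaArray)


a = skipna_default([0.0, 1.0, np.nan, 3.0, 4.0])
print(a.mean())               # missing value skipped
print(a.mean(skipna=False))   # missing value propagates
```

The point of the design is that the default travels with the array object, so library code that only ever calls `a.mean()` does not need to consult any global state.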
Re: [Numpy-discussion] NA masks in the next numpy release?
On Mon, Oct 24, 2011 at 11:12 AM, Wes McKinney wrote: > On Mon, Oct 24, 2011 at 10:54 AM, Charles R Harris > wrote: > > > > > > On Mon, Oct 24, 2011 at 8:40 AM, Charles R Harris > > wrote: > >> > >> > >> On Sun, Oct 23, 2011 at 11:23 PM, Wes McKinney > >> wrote: > >>> > >>> On Sun, Oct 23, 2011 at 8:07 PM, Eric Firing > wrote: > >>> > On 10/23/2011 12:34 PM, Nathaniel Smith wrote: > >>> > > >>> >> like. And in this case I do think we can come up with an API that > will > >>> >> make everyone happy, but that Mark's current API probably can't be > >>> >> incrementally evolved to become that API.) > >>> >> > >>> > > >>> > No one could object to coming up with an API that makes everyone > happy, > >>> > provided that it actually gets coded up, tested, and is found to be > >>> > fast > >>> > and maintainable. When you say the API probably can't be evolved, do > >>> > you mean that the underlying implementation also has to be redone? > And > >>> > if so, who will do it, and when? > >>> > > >>> > Eric > >>> > ___ > >>> > NumPy-Discussion mailing list > >>> > NumPy-Discussion@scipy.org > >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >>> > > >>> > >>> I personally am a bit apprehensive as I am worried about the masked > >>> array abstraction "leaking" through to users of pandas, something > >>> which I simply will not accept (why I decided against using numpy.ma > >>> early on, that + performance problems). Basically if having an > >>> understanding of masked arrays is a prerequisite for using pandas, the > >>> whole thing is DOA to me as it undermines the usability arguments I've > >>> been making about switching to Python (from R) for data analysis and > >>> statistical computing. > >> > >> The missing data functionality looks far more like R than numpy.ma. 
> >> > > > > For instance > > > > In [8]: a = arange(5, maskna=1) > > > > In [9]: a[2] = np.NA > > > > In [10]: a.mean() > > Out[10]: NA(dtype='float64') > > > > In [11]: a.mean(skipna=1) > > Out[11]: 2.0 > > > > In [12]: a = arange(5) > > > > In [13]: b = a.view(maskna=1) > > > > In [14]: a.mean() > > Out[14]: 2.0 > > > > In [15]: b[2] = np.NA > > > > In [16]: b.mean() > > Out[16]: NA(dtype='float64') > > > > In [17]: b.mean(skipna=1) > > Out[17]: 2.0 > > > > Chuck > > > > ___ > > NumPy-Discussion mailing list > > NumPy-Discussion@scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > I don't really agree with you. > > some sample R code > > > arr <- rnorm(10) > > arr[5:8] <- NA > > arr > [1] 0.6451460 -1.1285552 0.6869828 0.4018868 NA NA > [7] NA NA 0.3322803 -1.9201257 > > In your examples you had to pass maskna=True-- I suppose that my only > recourse would be to make sure that every array inside a DataFrame, > for example, has maskna=True set. I'll have to look in more detail and > see if it's feasible/desirable. There's a memory cost to pay, but you > can't get the functionality for free. I may just end up sticking with > NaN as it's worked pretty well so far the last few years-- it's an > impure solution but one with reasonably good performance > characteristics in the places that matter. > It might useful to have a way of setting global defaults, or something like a with statement. These are the sort of things that can be adjusted based on experience. For instance, I'm thinking skipna=1 is the natural default for the masked arrays. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NA masks in the next numpy release?
On Mon, Oct 24, 2011 at 10:54 AM, Charles R Harris wrote: > > > On Mon, Oct 24, 2011 at 8:40 AM, Charles R Harris > wrote: >> >> >> On Sun, Oct 23, 2011 at 11:23 PM, Wes McKinney >> wrote: >>> >>> On Sun, Oct 23, 2011 at 8:07 PM, Eric Firing wrote: >>> > On 10/23/2011 12:34 PM, Nathaniel Smith wrote: >>> > >>> >> like. And in this case I do think we can come up with an API that will >>> >> make everyone happy, but that Mark's current API probably can't be >>> >> incrementally evolved to become that API.) >>> >> >>> > >>> > No one could object to coming up with an API that makes everyone happy, >>> > provided that it actually gets coded up, tested, and is found to be >>> > fast >>> > and maintainable. When you say the API probably can't be evolved, do >>> > you mean that the underlying implementation also has to be redone? And >>> > if so, who will do it, and when? >>> > >>> > Eric >>> > ___ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion@scipy.org >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> > >>> >>> I personally am a bit apprehensive as I am worried about the masked >>> array abstraction "leaking" through to users of pandas, something >>> which I simply will not accept (why I decided against using numpy.ma >>> early on, that + performance problems). Basically if having an >>> understanding of masked arrays is a prerequisite for using pandas, the >>> whole thing is DOA to me as it undermines the usability arguments I've >>> been making about switching to Python (from R) for data analysis and >>> statistical computing. >> >> The missing data functionality looks far more like R than numpy.ma. 
>
> For instance
>
> In [8]: a = arange(5, maskna=1)
>
> In [9]: a[2] = np.NA
>
> In [10]: a.mean()
> Out[10]: NA(dtype='float64')
>
> In [11]: a.mean(skipna=1)
> Out[11]: 2.0
>
> In [12]: a = arange(5)
>
> In [13]: b = a.view(maskna=1)
>
> In [14]: a.mean()
> Out[14]: 2.0
>
> In [15]: b[2] = np.NA
>
> In [16]: b.mean()
> Out[16]: NA(dtype='float64')
>
> In [17]: b.mean(skipna=1)
> Out[17]: 2.0
>
> Chuck

I don't really agree with you.

some sample R code

> arr <- rnorm(10)
> arr[5:8] <- NA
> arr
 [1]  0.6451460 -1.1285552  0.6869828  0.4018868         NA         NA
 [7]         NA         NA  0.3322803 -1.9201257

In your examples you had to pass maskna=True-- I suppose that my only recourse would be to make sure that every array inside a DataFrame, for example, has maskna=True set. I'll have to look in more detail and see if it's feasible/desirable. There's a memory cost to pay, but you can't get the functionality for free. I may just end up sticking with NaN as it's worked pretty well so far the last few years-- it's an impure solution but one with reasonably good performance characteristics in the places that matter.
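
For comparison, the NaN-as-missing approach mentioned at the end of the message above might look like this in plain NumPy. This is a minimal sketch, not code from the thread: the array values are made up for illustration, and the separate boolean array is only there to show the kind of memory overhead a full mask would add.

```python
import numpy as np

# Float array with NaN as the missing-value sentinel.
arr = np.arange(10, dtype=float)
arr[4:8] = np.nan   # R's arr[5:8] <- NA is 1-based and inclusive

n_missing = int(np.isnan(arr).sum())
plain_mean = arr.mean()        # NaN propagates through the reduction
skip_mean = np.nanmean(arr)    # NaN entries are ignored

# A separate boolean mask would cost one extra byte per element.
mask = np.isnan(arr)
print(n_missing, plain_mean, skip_mean)
print(arr.nbytes, mask.nbytes)
```

The sentinel costs no extra memory, but it only works for floating-point data; an integer array has to be converted to float before it can hold NaN.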
Re: [Numpy-discussion] NA masks in the next numpy release?
On 24.10.2011 16:40, Charles R Harris wrote:
[clip]
> The missing data functionality looks far more like R than numpy.ma

... and masked arrays must be explicitly requested by the user [1].

The MA stuff can "leak through" only if the user makes use of a library that returns masked results (or explicitly creates masked arrays), but as far as I understand that's about the same situation as with np.ma.

.. [1] http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html

--
Pauli Virtanen
Re: [Numpy-discussion] NA masks in the next numpy release?
On Mon, Oct 24, 2011 at 8:40 AM, Charles R Harris wrote: > > > On Sun, Oct 23, 2011 at 11:23 PM, Wes McKinney wrote: > >> On Sun, Oct 23, 2011 at 8:07 PM, Eric Firing wrote: >> > On 10/23/2011 12:34 PM, Nathaniel Smith wrote: >> > >> >> like. And in this case I do think we can come up with an API that will >> >> make everyone happy, but that Mark's current API probably can't be >> >> incrementally evolved to become that API.) >> >> >> > >> > No one could object to coming up with an API that makes everyone happy, >> > provided that it actually gets coded up, tested, and is found to be fast >> > and maintainable. When you say the API probably can't be evolved, do >> > you mean that the underlying implementation also has to be redone? And >> > if so, who will do it, and when? >> > >> > Eric >> > ___ >> > NumPy-Discussion mailing list >> > NumPy-Discussion@scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> >> I personally am a bit apprehensive as I am worried about the masked >> array abstraction "leaking" through to users of pandas, something >> which I simply will not accept (why I decided against using numpy.ma >> early on, that + performance problems). Basically if having an >> understanding of masked arrays is a prerequisite for using pandas, the >> whole thing is DOA to me as it undermines the usability arguments I've >> been making about switching to Python (from R) for data analysis and >> statistical computing. >> > > The missing data functionality looks far more like R than numpy.ma. 
For instance

In [8]: a = arange(5, maskna=1)

In [9]: a[2] = np.NA

In [10]: a.mean()
Out[10]: NA(dtype='float64')

In [11]: a.mean(skipna=1)
Out[11]: 2.0

In [12]: a = arange(5)

In [13]: b = a.view(maskna=1)

In [14]: a.mean()
Out[14]: 2.0

In [15]: b[2] = np.NA

In [16]: b.mean()
Out[16]: NA(dtype='float64')

In [17]: b.mean(skipna=1)
Out[17]: 2.0

Chuck
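
The session above relies on the experimental maskna API from Mark's branch. A rough analogue of the "masked view over an unmasked array" part can be sketched with the existing numpy.ma module; this is an assumption-laden comparison, not the API under discussion. One difference to keep in mind is that numpy.ma reductions skip masked entries by default, whereas in the session above NA propagates unless skipna is passed.

```python
import numpy as np

a = np.arange(5)

# masked_array does not copy by default, so b wraps the same data buffer as a.
b = np.ma.masked_array(a)
b[2] = np.ma.masked   # only the mask changes; a's data is left alone

print(a.sum())     # the original array still sees every element
print(b.sum())     # the masked element is skipped
print(b.count())   # number of unmasked elements
print(a[2])        # underlying value is still there
```

Because the mask lives alongside the data rather than overwriting it, clearing the mask later would make the original value visible again, which is the "stop ignoring an ignored value" behaviour discussed elsewhere in this thread.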
Re: [Numpy-discussion] NA masks in the next numpy release?
On Sun, Oct 23, 2011 at 11:23 PM, Wes McKinney wrote: > On Sun, Oct 23, 2011 at 8:07 PM, Eric Firing wrote: > > On 10/23/2011 12:34 PM, Nathaniel Smith wrote: > > > >> like. And in this case I do think we can come up with an API that will > >> make everyone happy, but that Mark's current API probably can't be > >> incrementally evolved to become that API.) > >> > > > > No one could object to coming up with an API that makes everyone happy, > > provided that it actually gets coded up, tested, and is found to be fast > > and maintainable. When you say the API probably can't be evolved, do > > you mean that the underlying implementation also has to be redone? And > > if so, who will do it, and when? > > > > Eric > > ___ > > NumPy-Discussion mailing list > > NumPy-Discussion@scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > I personally am a bit apprehensive as I am worried about the masked > array abstraction "leaking" through to users of pandas, something > which I simply will not accept (why I decided against using numpy.ma > early on, that + performance problems). Basically if having an > understanding of masked arrays is a prerequisite for using pandas, the > whole thing is DOA to me as it undermines the usability arguments I've > been making about switching to Python (from R) for data analysis and > statistical computing. > The missing data functionality looks far more like R than numpy.ma. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NA masks in the next numpy release?
Nathaniel Smith writes:
[...]
> Is the idea to continue the discussion and rework the API while it is in
> master, delaying the next release for as long as it takes to achieve
> consensus?

Well, for those who missed it, I think the first thing to do should be to carefully read and discuss the contents of:

https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst

Lluis

--
"And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth
Re: [Numpy-discussion] NA masks in the next numpy release?
On Sun, Oct 23, 2011 at 8:07 PM, Eric Firing wrote:
> On 10/23/2011 12:34 PM, Nathaniel Smith wrote:
>
>> like. And in this case I do think we can come up with an API that will
>> make everyone happy, but that Mark's current API probably can't be
>> incrementally evolved to become that API.)
>
> No one could object to coming up with an API that makes everyone happy,
> provided that it actually gets coded up, tested, and is found to be fast
> and maintainable. When you say the API probably can't be evolved, do
> you mean that the underlying implementation also has to be redone? And
> if so, who will do it, and when?
>
> Eric

I personally am a bit apprehensive as I am worried about the masked array abstraction "leaking" through to users of pandas, something which I simply will not accept (why I decided against using numpy.ma early on, that + performance problems). Basically if having an understanding of masked arrays is a prerequisite for using pandas, the whole thing is DOA to me as it undermines the usability arguments I've been making about switching to Python (from R) for data analysis and statistical computing.

Performance is also a concern, but based on prior discussions it seems a great deal can be done there.

- Wes
Re: [Numpy-discussion] NA masks in the next numpy release?
On 10/23/2011 12:34 PM, Nathaniel Smith wrote:

> like. And in this case I do think we can come up with an API that will
> make everyone happy, but that Mark's current API probably can't be
> incrementally evolved to become that API.)

No one could object to coming up with an API that makes everyone happy, provided that it actually gets coded up, tested, and is found to be fast and maintainable. When you say the API probably can't be evolved, do you mean that the underlying implementation also has to be redone? And if so, who will do it, and when?

Eric
Re: [Numpy-discussion] NA masks in the next numpy release?
On Sun, Oct 23, 2011 at 2:29 PM, Eric Firing wrote: > Ultimately, though, the numpy core developers must decide what goes in > and what does not. Consensus is desirable but may not always be > possible or optimal, especially if "consensus" is interpreted as > "unanimity". There is a risk in deciding to accept a major change, but > it is mitigated by the ability to make future changes, and it is a risk > that must be taken if progress is to be made. As a numpy user, I was > pleased to see Travis make the decision that Mark should get on with the > coding, and I was pleased to see Charles make the decision to merge the > pull request. Well, let's not jump to conclusions -- this is why I wrote an email asking questions in the first place :-). Consensus certainly does not mean unanimity, but yes, of course, sometimes disagreements are irreconcilable. As a benevolent dictator[1] on other projects I've been stuck dealing with some of these myself. But of the two core numpy developers who have been most involved in this, Charles has just stated that he thought there had been more discussion than had actually occurred, and Travis described a "reasonable opposition", so it's not at all clear to me that the core developers have decided that no consensus is possible and they simply have to step in. (And in general, irreconcilable differences are quite rare in FOSS projects... e.g., I remember the Subversion folks set up a voting procedure to handle these cases, and then the only time they used it in like a 5 year period was to settle an argument about code formatting. Insisting on consensus really does mostly work, even though it does often take longer than one would like. And in this case I do think we can come up with an API that will make everyone happy, but that Mark's current API probably can't be incrementally evolved to become that API.) So if there's been an executive decision then I can live with it, but I'd like to see someone say that before I assume it's true. 
It's just as likely that there was confusion, or Charles jumped the gun, or whatever, and that consensus is still useful and desired in this case. I hope so. [1] https://secure.wikimedia.org/wikipedia/en/wiki/Benevolent_Dictator_For_Life -- Nathaniel
Re: [Numpy-discussion] NA masks in the next numpy release?
On Sun, Oct 23, 2011 at 3:28 PM, Benjamin Root wrote: > > > On Sunday, October 23, 2011, Nathaniel Smith wrote: > > On Sun, Oct 23, 2011 at 12:53 PM, Charles R Harris > > wrote: > >> On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett < > matthew.br...@gmail.com> > >> wrote: > >>> I think this email might be a plea to the numpy steering group, and to > >>> Travis in particular, to see if we can use a discussion of this series > >>> of events to decide on a good way to proceed in future. > >> > >> Oh come, people had plenty to say, you and Nathaniel in particular. > Mark > >> pointed to the pull request, anyone who was interested could comment on > it, > > > > Ah, this helps answer my initial question -- I can see how you might > > have thought things were more resolved if you thought that we were > > aware of the pull request and chose not to participate. That's a > > reasonable source of confusion. > > > > But I (and presumably others) were unaware of the pull request, > > because it turns out that actually Mark did *not* point to the pull > > request, at least in email to either me or numpy-discussion. As far as > > I can tell, the first time that pull request has ever been mentioned > > on the list is in Pauli's email today. (I did worry I might have > > missed it, so I just double-checked the archives for August 18-August > > 27, which is the time period the pull request was open, and couldn't > > find anything there.) > > > > (Also, for the record, I'd ask that next time you want to make sure > > that there has been sufficient discussion on a controversial feature > > that has "strong and reasonable opposition", you make more of an > > effort to make sure that the relevant stakeholders are aware...?) > > > >> Benjamin Root did so, for instance. The fact things didn't go the way > you > >> wanted doesn't indicate insufficient discussion. And you are certainly > >> welcome to put together an alternative and put up a pull request. 
> > > > In the interests of not turning this into a game of procedural > > brinksmanship, can we agree that the point of pull requests and such > > is to make sure that code which ends up in numpy releases generally > > matches what the community wants? Obviously the community has not > > reached a consensus on this code and API, so I'll prepare a pull > > request to temporarily revert the change, and we can work from there. > > > > -- Nathaniel > > > > The discussion started on mark's branches, which was referred to several > times in emails (that's how I started). When it reached a particular level > of maturity, a pull request was made and additional work went into it. The > initial discussion happened for quite a while. > > Plus, my understanding is that it isnt the full Nep, but the core parts > (but I haven't checked in a while). > > In its current state, it is a working implementation that can be used to explore the API. Bit patterns are missing and the masks are handled at the iterator level rather than in the low level ufunc loops, so it isn't particularly fast. IIRC, Mark was careful to leave some hooks for further development and also set things up so that in the future masks could be adapted to allow different mask values with different interpretations. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NA masks in the next numpy release?
On 10/23/2011 10:49 AM, Nathaniel Smith wrote: > But I (and presumably others) were unaware of the pull request, > because it turns out that actually Mark did *not* point to the pull > request, at least in email to either me or numpy-discussion. As far as > I can tell, the first time that pull request has ever been mentioned > on the list is in Pauli's email today. (I did worry I might have > missed it, so I just double-checked the archives for August 18-August > 27, which is the time period the pull request was open, and couldn't > find anything there.) Ideally, Mark's message announcing that his branch was ready for testing (a message that started a thread of constructive comment) would have mentioned the pull request: http://www.mail-archive.com/numpy-discussion@scipy.org/msg33151.html Ultimately, though, the numpy core developers must decide what goes in and what does not. Consensus is desirable but may not always be possible or optimal, especially if "consensus" is interpreted as "unanimity". There is a risk in deciding to accept a major change, but it is mitigated by the ability to make future changes, and it is a risk that must be taken if progress is to be made. As a numpy user, I was pleased to see Travis make the decision that Mark should get on with the coding, and I was pleased to see Charles make the decision to merge the pull request. Eric
Re: [Numpy-discussion] NA masks in the next numpy release?
On Sunday, October 23, 2011, Nathaniel Smith wrote: > On Sun, Oct 23, 2011 at 12:53 PM, Charles R Harris > wrote: >> On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett >> wrote: >>> I think this email might be a plea to the numpy steering group, and to >>> Travis in particular, to see if we can use a discussion of this series >>> of events to decide on a good way to proceed in future. >> >> Oh come, people had plenty to say, you and Nathaniel in particular. Mark >> pointed to the pull request, anyone who was interested could comment on it, > > Ah, this helps answer my initial question -- I can see how you might > have thought things were more resolved if you thought that we were > aware of the pull request and chose not to participate. That's a > reasonable source of confusion. > > But I (and presumably others) were unaware of the pull request, > because it turns out that actually Mark did *not* point to the pull > request, at least in email to either me or numpy-discussion. As far as > I can tell, the first time that pull request has ever been mentioned > on the list is in Pauli's email today. (I did worry I might have > missed it, so I just double-checked the archives for August 18-August > 27, which is the time period the pull request was open, and couldn't > find anything there.) > > (Also, for the record, I'd ask that next time you want to make sure > that there has been sufficient discussion on a controversial feature > that has "strong and reasonable opposition", you make more of an > effort to make sure that the relevant stakeholders are aware...?) > >> Benjamin Root did so, for instance. The fact things didn't go the way you >> wanted doesn't indicate insufficient discussion. And you are certainly >> welcome to put together an alternative and put up a pull request. 
> > In the interests of not turning this into a game of procedural > brinksmanship, can we agree that the point of pull requests and such > is to make sure that code which ends up in numpy releases generally > matches what the community wants? Obviously the community has not > reached a consensus on this code and API, so I'll prepare a pull > request to temporarily revert the change, and we can work from there. > > -- Nathaniel > The discussion started on Mark's branches, which were referred to several times in emails (that's how I started). When it reached a particular level of maturity, a pull request was made and additional work went into it. The initial discussion went on for quite a while. Plus, my understanding is that it isn't the full NEP, but the core parts (but I haven't checked in a while). Ben Root
Re: [Numpy-discussion] NA masks in the next numpy release?
On Sun, Oct 23, 2011 at 12:53 PM, Charles R Harris wrote: > On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett > wrote: >> I think this email might be a plea to the numpy steering group, and to >> Travis in particular, to see if we can use a discussion of this series >> of events to decide on a good way to proceed in future. > > Oh come, people had plenty to say, you and Nathaniel in particular. Mark > pointed to the pull request, anyone who was interested could comment on it, Ah, this helps answer my initial question -- I can see how you might have thought things were more resolved if you thought that we were aware of the pull request and chose not to participate. That's a reasonable source of confusion. But I (and presumably others) were unaware of the pull request, because it turns out that actually Mark did *not* point to the pull request, at least in email to either me or numpy-discussion. As far as I can tell, the first time that pull request has ever been mentioned on the list is in Pauli's email today. (I did worry I might have missed it, so I just double-checked the archives for August 18-August 27, which is the time period the pull request was open, and couldn't find anything there.) (Also, for the record, I'd ask that next time you want to make sure that there has been sufficient discussion on a controversial feature that has "strong and reasonable opposition", you make more of an effort to make sure that the relevant stakeholders are aware...?) > Benjamin Root did so, for instance. The fact things didn't go the way you > wanted doesn't indicate insufficient discussion. And you are certainly > welcome to put together an alternative and put up a pull request. In the interests of not turning this into a game of procedural brinksmanship, can we agree that the point of pull requests and such is to make sure that code which ends up in numpy releases generally matches what the community wants? 
Obviously the community has not reached a consensus on this code and API, so I'll prepare a pull request to temporarily revert the change, and we can work from there. -- Nathaniel
Re: [Numpy-discussion] NA masks in the next numpy release?
On 10/23/2011 04:07 PM, Robert Kern wrote: > On Sun, Oct 23, 2011 at 20:58, Matthew Brett wrote: >> Hi, >> >> On Sun, Oct 23, 2011 at 12:53 PM, Charles R Harris >> wrote: >>> >>> >>> On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett >>> wrote: Hi, On Sun, Oct 23, 2011 at 11:21 AM, Nathaniel Smith wrote: > Hi all, > > I was surprised today to notice that Mark's NA mask support appears to > have been merged into numpy master and is described in the draft > release notes[1]. My surprise is because merging it to mainline > without any discussion on the list seems to contradict what > Travis wrote in July, that it was being developed as an experiment and > explicitly *not* intended to be merged without further discussion: > > "Basically, because there is not consensus and in fact a strong and > reasonable opposition to specific points, Mark's NEP as proposed > cannot be accepted in its entirety right now. However, I believe an > implementation of his NEP is useful and will be instructive in > resolving the issues and so I have instructed him to spend Enthought > time on the implementation. Any changes that need to be made to the > API before it is accepted into a released form of NumPy can still be > made even after most of the implementation is completed as far as I > understand it."[2] > > Can anyone explain what the plan is here? Is the idea to continue the > discussion and rework the API while it is in master, delaying the next > release for as long as it takes to achieve consensus? Or is there some > mysterious git thing going on where "master" is actually an > experimental branch and the real mainline development is happening > somewhere else? Or something else I'm not thinking of? Please help me > understand. I don't know about you, but watching the development from a distance it became increasingly clear to me that this would happen.
I'm sure you've had the experience as I have, of mixing several desirable changes into the same set of commits, and it's hard work to avoid this. I imagine this is what happened with Mark's MA changes. The result is actually an extension of the problems of the original discussion, which is a feeling that we the community do not have a say in the development. I think this email might be a plea to the numpy steering group, and to Travis in particular, to see if we can use a discussion of this series of events to decide on a good way to proceed in future. >>> >>> Oh come, people had plenty to say, you and Nathaniel in particular. Mark >>> pointed to the pull request, anyone who was interested could comment on it, >>> Benjamin Root did so, for instance. The fact things didn't go the way you >>> wanted doesn't indicate insufficient discussion. And you are certainly >>> welcome to put together an alternative and put up a pull request. >> >> I was also guessing that something like this would be the reply to >> Nathaniel's post. > > But it wasn't. It was a reply to your message. > >> I think this reply is rude because it implies some sort of sour-grapes >> from Nathaniel, when he is politely referring back to an explicit >> reassurance from Travis. > > What Travis assured did happen, just on the pull request (on which > everyone's input was requested and where most "should this be merged?" > discussions are *meant* to happen) rather than on the mailing list. Except that for a project with a large user community (like numpy), you will _not_ get the feedback you are looking for on github pull-request pages. That's because most users do not look at detailed developer related things like pull requests. But they do read the mailing list. I don't use these features so I don't have a dog in this fight.
But potentially controversial changes really should be discussed on the mailing list rather than on pull requests (and yes, I know that there was a lot of discussion about this stuff some months ago). Scott -- Scott M. Ransom. Address: NRAO, 520 Edgemont Rd., Charlottesville, VA 22903 USA. Phone: (434) 296-0320. Email: sran...@nrao.edu. GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, On Sun, Oct 23, 2011 at 1:07 PM, Robert Kern wrote: > On Sun, Oct 23, 2011 at 20:58, Matthew Brett wrote: >> Hi, >> >> On Sun, Oct 23, 2011 at 12:53 PM, Charles R Harris >> wrote: >>> >>> >>> On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett >>> wrote: Hi, On Sun, Oct 23, 2011 at 11:21 AM, Nathaniel Smith wrote: > Hi all, > > I was surprised today to notice that Mark's NA mask support appears to > have been merged into numpy master and is described in the draft > release notes[1]. My surprise is because merging it to mainline > without any discussion on the list seems to contradict what > Travis wrote in July, that it was being developed as an experiment and > explicitly *not* intended to be merged without further discussion: > > "Basically, because there is not consensus and in fact a strong and > reasonable opposition to specific points, Mark's NEP as proposed > cannot be accepted in its entirety right now. However, I believe an > implementation of his NEP is useful and will be instructive in > resolving the issues and so I have instructed him to spend Enthought > time on the implementation. Any changes that need to be made to the > API before it is accepted into a released form of NumPy can still be > made even after most of the implementation is completed as far as I > understand it."[2] > > Can anyone explain what the plan is here? Is the idea to continue the > discussion and rework the API while it is in master, delaying the next > release for as long as it takes to achieve consensus? Or is there some > mysterious git thing going on where "master" is actually an > experimental branch and the real mainline development is happening > somewhere else? Or something else I'm not thinking of? Please help me > understand. I don't know about you, but watching the development from a distance it became increasingly clear to me that this would happen.
I'm sure you've had the experience as I have, of mixing several desirable changes into the same set of commits, and it's hard work to avoid this. I imagine this is what happened with Mark's MA changes. The result is actually an extension of the problems of the original discussion, which is a feeling that we the community do not have a say in the development. I think this email might be a plea to the numpy steering group, and to Travis in particular, to see if we can use a discussion of this series of events to decide on a good way to proceed in future. >>> >>> Oh come, people had plenty to say, you and Nathaniel in particular. Mark >>> pointed to the pull request, anyone who was interested could comment on it, >>> Benjamin Root did so, for instance. The fact things didn't go the way you >>> wanted doesn't indicate insufficient discussion. And you are certainly >>> welcome to put together an alternative and put up a pull request. >> >> I was also guessing that something like this would be the reply to >> Nathaniel's post. > > But it wasn't. It was a reply to your message. If you read the message again I think you will see that, although it is addressed to me, it is referring to Nathaniel's question which was, 'Why was this not discussed as promised'. My post was 'This was obviously going to happen and that is a problem, do you all agree and what can we do about it?'. >> I think this reply is rude because it implies some sort of sour-grapes >> from Nathaniel, when he is politely referring back to an explicit >> reassurance from Travis. > > What Travis assured did happen, just on the pull request (on which > everyone's input was requested and where most "should this be merged?" > discussions are *meant* to happen) rather than on the mailing list. It just isn't reasonable to ask for high-level API discussions on the pull-request in this situation.
Unless Travis tells me he did mean that, I can only assume that he didn't and he meant that we would revisit the high-level mailing list discussions - on the mailing list. Best, Matthew
Re: [Numpy-discussion] NA masks in the next numpy release?
On Sun, Oct 23, 2011 at 20:58, Matthew Brett wrote: > Hi, > > On Sun, Oct 23, 2011 at 12:53 PM, Charles R Harris > wrote: >> >> >> On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett >> wrote: >>> >>> Hi, >>> >>> On Sun, Oct 23, 2011 at 11:21 AM, Nathaniel Smith wrote: >>> > Hi all, >>> > >>> > I was surprised today to notice that Mark's NA mask support appears to >>> > have been merged into numpy master and is described in the draft >>> > release notes[1]. My surprise is because merging it to mainline >>> > without any discussion on the list seems to contradict what >>> > Travis wrote in July, that it was being developed as an experiment and >>> > explicitly *not* intended to be merged without further discussion: >>> > >>> > "Basically, because there is not consensus and in fact a strong and >>> > reasonable opposition to specific points, Mark's NEP as proposed >>> > cannot be accepted in its entirety right now. However, I believe an >>> > implementation of his NEP is useful and will be instructive in >>> > resolving the issues and so I have instructed him to spend Enthought >>> > time on the implementation. Any changes that need to be made to the >>> > API before it is accepted into a released form of NumPy can still be >>> > made even after most of the implementation is completed as far as I >>> > understand it."[2] >>> > >>> > Can anyone explain what the plan is here? Is the idea to continue the >>> > discussion and rework the API while it is in master, delaying the next >>> > release for as long as it takes to achieve consensus? Or is there some >>> > mysterious git thing going on where "master" is actually an >>> > experimental branch and the real mainline development is happening >>> > somewhere else? Or something else I'm not thinking of? Please help me >>> > understand. >>> >>> I don't know about you, but watching the development from a distance >>> it became increasingly clear to me that this would happen.
I'm sure >>> you've had the experience as I have, of mixing several desirable >>> changes into the same set of commits, and it's hard work to avoid >>> this. I imagine this is what happened with Mark's MA changes. >>> >>> The result is actually an extension of the problems of the original >>> discussion, which is a feeling that we the community do not have a say >>> in the development. >>> >>> I think this email might be a plea to the numpy steering group, and to >>> Travis in particular, to see if we can use a discussion of this series >>> of events to decide on a good way to proceed in future. >>> >> >> Oh come, people had plenty to say, you and Nathaniel in particular. Mark >> pointed to the pull request, anyone who was interested could comment on it, >> Benjamin Root did so, for instance. The fact things didn't go the way you >> wanted doesn't indicate insufficient discussion. And you are certainly >> welcome to put together an alternative and put up a pull request. > > I was also guessing that something like this would be the reply to > Nathaniel's post. But it wasn't. It was a reply to your message. > I think this reply is rude because it implies some sort of sour-grapes > from Nathaniel, when he is politely referring back to an explicit > reassurance from Travis. What Travis assured did happen, just on the pull request (on which everyone's input was requested and where most "should this be merged?" discussions are *meant* to happen) rather than on the mailing list. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Re: [Numpy-discussion] NA masks in the next numpy release?
On Sun, Oct 23, 2011 at 12:21 PM, Nathaniel Smith wrote: > Hi all, > > I was surprised today to notice that Mark's NA mask support appears to > have been merged into numpy master and is described in the draft > release notes[1]. My surprise is because merging it to mainline > without any discussion on the list seems to contradict what > Travis wrote in July, that it was being developed as an experiment and > explicitly *not* intended to be merged without further discussion: > > "Basically, because there is not consensus and in fact a strong and > reasonable opposition to specific points, Mark's NEP as proposed > cannot be accepted in its entirety right now. However, I believe an > implementation of his NEP is useful and will be instructive in > resolving the issues and so I have instructed him to spend Enthought > time on the implementation. Any changes that need to be made to the > API before it is accepted into a released form of NumPy can still be > made even after most of the implementation is completed as far as I > understand it."[2] > > Can anyone explain what the plan is here? Is the idea to continue the > discussion and rework the API while it is in master, delaying the next > release for as long as it takes to achieve consensus? Or is there some > mysterious git thing going on where "master" is actually an > experimental branch and the real mainline development is happening > somewhere else? Or something else I'm not thinking of? Please help me > understand. > > No, it's in and has been for a while. You should spend some time with it and make specific suggestions for improvement. Chuck
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, On Sun, Oct 23, 2011 at 12:53 PM, Charles R Harris wrote: > > > On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett > wrote: >> >> Hi, >> >> On Sun, Oct 23, 2011 at 11:21 AM, Nathaniel Smith wrote: >> > Hi all, >> > >> > I was surprised today to notice that Mark's NA mask support appears to >> > have been merged into numpy master and is described in the draft >> > release notes[1]. My surprise is because merging it to mainline >> > without any discussion on the list seems to contradict what >> > Travis wrote in July, that it was being developed as an experiment and >> > explicitly *not* intended to be merged without further discussion: >> > >> > "Basically, because there is not consensus and in fact a strong and >> > reasonable opposition to specific points, Mark's NEP as proposed >> > cannot be accepted in its entirety right now. However, I believe an >> > implementation of his NEP is useful and will be instructive in >> > resolving the issues and so I have instructed him to spend Enthought >> > time on the implementation. Any changes that need to be made to the >> > API before it is accepted into a released form of NumPy can still be >> > made even after most of the implementation is completed as far as I >> > understand it."[2] >> > >> > Can anyone explain what the plan is here? Is the idea to continue the >> > discussion and rework the API while it is in master, delaying the next >> > release for as long as it takes to achieve consensus? Or is there some >> > mysterious git thing going on where "master" is actually an >> > experimental branch and the real mainline development is happening >> > somewhere else? Or something else I'm not thinking of? Please help me >> > understand. >> >> I don't know about you, but watching the development from a distance >> it became increasingly clear to me that this would happen.
I'm sure >> you've had the experience as I have, of mixing several desirable >> changes into the same set of commits, and it's hard work to avoid >> this. I imagine this is what happened with Mark's MA changes. >> >> The result is actually an extension of the problems of the original >> discussion, which is a feeling that we the community do not have a say >> in the development. >> >> I think this email might be a plea to the numpy steering group, and to >> Travis in particular, to see if we can use a discussion of this series >> of events to decide on a good way to proceed in future. >> > > Oh come, people had plenty to say, you and Nathaniel in particular. Mark > pointed to the pull request, anyone who was interested could comment on it, > Benjamin Root did so, for instance. The fact things didn't go the way you > wanted doesn't indicate insufficient discussion. And you are certainly > welcome to put together an alternative and put up a pull request. I was also guessing that something like this would be the reply to Nathaniel's post. I think this reply is rude because it implies some sort of sour-grapes from Nathaniel, when he is politely referring back to an explicit reassurance from Travis. I was trying to avoid this sort of thing by concentrating on thinking about what to do in future. Best, Matthew
Re: [Numpy-discussion] NA masks in the next numpy release?
On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett wrote: > Hi, > > On Sun, Oct 23, 2011 at 11:21 AM, Nathaniel Smith wrote: > > Hi all, > > > > I was surprised today to notice that Mark's NA mask support appears to > > have been merged into numpy master and is described in the draft > > release notes[1]. My surprise is because merging it to mainline > > without any discussion on the list seems to contradict what > > Travis wrote in July, that it was being developed as an experiment and > > explicitly *not* intended to be merged without further discussion: > > > > "Basically, because there is not consensus and in fact a strong and > > reasonable opposition to specific points, Mark's NEP as proposed > > cannot be accepted in its entirety right now. However, I believe an > > implementation of his NEP is useful and will be instructive in > > resolving the issues and so I have instructed him to spend Enthought > > time on the implementation. Any changes that need to be made to the > > API before it is accepted into a released form of NumPy can still be > > made even after most of the implementation is completed as far as I > > understand it."[2] > > > > Can anyone explain what the plan is here? Is the idea to continue the > > discussion and rework the API while it is in master, delaying the next > > release for as long as it takes to achieve consensus? Or is there some > > mysterious git thing going on where "master" is actually an > > experimental branch and the real mainline development is happening > > somewhere else? Or something else I'm not thinking of? Please help me > > understand. > > I don't know about you, but watching the development from a distance > it became increasingly clear to me that this would happen. I'm sure > you've had the experience as I have, of mixing several desirable > changes into the same set of commits, and it's hard work to avoid > this. I imagine this is what happened with Mark's MA changes.
> > The result is actually an extension of the problems of the original > discussion, which is a feeling that we the community do not have a say > in the development. > > I think this email might be a plea to the numpy steering group, and to > Travis in particular, to see if we can use a discussion of this series > of events to decide on a good way to proceed in future. > > Oh come, people had plenty to say, you and Nathaniel in particular. Mark pointed to the pull request, anyone who was interested could comment on it, Benjamin Root did so, for instance. The fact things didn't go the way you wanted doesn't indicate insufficient discussion. And you are certainly welcome to put together an alternative and put up a pull request. Chuck
Re: [Numpy-discussion] NA masks in the next numpy release?
Hi, On Sun, Oct 23, 2011 at 11:21 AM, Nathaniel Smith wrote: > Hi all, > > I was surprised today to notice that Mark's NA mask support appears to > have been merged into numpy master and is described in the draft > release notes[1]. My surprise is because merging it to mainline > without any discussion on the list seems to contradict what > Travis wrote in July, that it was being developed as an experiment and > explicitly *not* intended to be merged without further discussion: > > "Basically, because there is not consensus and in fact a strong and > reasonable opposition to specific points, Mark's NEP as proposed > cannot be accepted in its entirety right now. However, I believe an > implementation of his NEP is useful and will be instructive in > resolving the issues and so I have instructed him to spend Enthought > time on the implementation. Any changes that need to be made to the > API before it is accepted into a released form of NumPy can still be > made even after most of the implementation is completed as far as I > understand it."[2] > > Can anyone explain what the plan is here? Is the idea to continue the > discussion and rework the API while it is in master, delaying the next > release for as long as it takes to achieve consensus? Or is there some > mysterious git thing going on where "master" is actually an > experimental branch and the real mainline development is happening > somewhere else? Or something else I'm not thinking of? Please help me > understand. I don't know about you, but watching the development from a distance it became increasingly clear to me that this would happen. I'm sure you've had the experience as I have, of mixing several desirable changes into the same set of commits, and it's hard work to avoid this. I imagine this is what happened with Mark's MA changes. The result is actually an extension of the problems of the original discussion, which is a feeling that we the community do not have a say in the development.
I think this email might be a plea to the numpy steering group, and to Travis in particular, to see if we can use a discussion of this series of events to decide on a good way to proceed in future. See you, Matthew
Re: [Numpy-discussion] NA masks in the next numpy release?
23.10.2011 20:21, Nathaniel Smith wrote: > I was surprised today to notice that Mark's NA mask support appears to > have been merged into numpy master and is described in the draft > release notes[1]. My surprise is because merging it to mainline > without any discussion on the list seems to contradict what > Travis wrote in July, that it was being developed as an experiment and > explicitly *not* intended to be merged without further discussion: FWIW, the changes did not go in through a "back door", but through a pull request: https://github.com/numpy/numpy/pull/141 Whether issues with the API were resolved or not before merging, I don't know. (One can also ask whether it would be a good idea to forward noise from the pull requests to the ML.) [clip] > Can anyone explain what the plan is here? Is the idea to continue the > discussion and rework the API while it is in master, delaying the next > release for as long as it takes to achieve consensus? Or is there some > mysterious git thing going on where "master" is actually an > experimental branch and the real mainline development is happening > somewhere else? Or something else I'm not thinking of? Please help me > understand. No, master is supposed to be the integration branch with only finished stuff in it. -- Pauli Virtanen