Re: [Numpy-discussion] Masked Arrays in NumPy 1.x
Hi,

I finished reading the doc I listed in the other thread. As the NA stuff will be marked as experimental in numpy 1.7, why not define a new macro like NPY_NA_VERSION that gives the version of the NA C-API? That way, people will be able to detect whether the NA C-API has changed since they wrote their code, which makes it much easier to break this interface later. We would just need a big warning telling people to do this check. The current NPY_VERSION and NPY_FEATURE_VERSION macros don't allow removing features. A function like PyArray_GetNACVersion would probably be useful too.[1]

Continuing from my previous post: old code needs to be changed so that it rejects NA inputs. With the current trunk, this can be done like this:

    PyObject* an_input = ;
    if (!PyArray_Check(an_input)) {
        PyErr_SetString(PyExc_ValueError, "expected an ndarray");
        %(fail)s
    }
    #if NPY_FEATURE_VERSION >= 0x0008
    if (PyArray_HasNASupport((PyArrayObject*) an_input)) {
        PyErr_SetString(PyExc_ValueError,
                        "masked arrays are not supported by this function");
        %(fail)s
    }
    #endif

In the 1.6.1 release, NPY_FEATURE_VERSION had value 0x0007. This value wasn't changed in the trunk. I suppose it will be raised to 0x0008 for numpy 1.7.

Can we assume that old code checks its inputs with PyArray_Check()? I think so, but it would be really helpful if people who have been here longer than me could confirm or deny this.

Frédéric

[1] http://docs.scipy.org/doc/numpy/reference/c-api.array.html#checking-the-api-version

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
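The NA test in the snippet has to be a compile-time decision: with headers older than the NA work, PyArray_HasNASupport is not declared at all, so a plain runtime `if` would reference an undeclared name. The standalone sketch below mimics that guard pattern using hypothetical stand-ins (`HYPO_FEATURE_VERSION`, `HYPO_NDARRAY`, `HYPO_ALLOWNA`, `hypo_object`) instead of the real NumPy headers, so it builds without NumPy; it is an illustration of the pattern, not NumPy code.

```c
#include <stddef.h>

/* Hypothetical stand-in for NPY_FEATURE_VERSION; define it as 0x0007
   at build time to simulate headers without NA support. */
#ifndef HYPO_FEATURE_VERSION
#define HYPO_FEATURE_VERSION 0x0008
#endif

/* Hypothetical flag bits: "is an ndarray" and "has NA support". */
enum { HYPO_NDARRAY = 1 << 0, HYPO_ALLOWNA = 1 << 1 };

/* Hypothetical stand-in for a Python object; only the flags matter here. */
typedef struct {
    unsigned flags;
} hypo_object;

/* Returns NULL if the input is accepted, otherwise an error message.
   The type check always runs; the NA check is compiled in only when the
   (simulated) headers are new enough to know about NA support. */
static const char *reject_unsupported(const hypo_object *obj)
{
    if (!(obj->flags & HYPO_NDARRAY))
        return "expected an ndarray";
#if HYPO_FEATURE_VERSION >= 0x0008
    if (obj->flags & HYPO_ALLOWNA)
        return "masked arrays are not supported by this function";
#endif
    return NULL;
}
```

Compiling with `-DHYPO_FEATURE_VERSION=0x0007` drops the NA branch, which models an extension built against older headers: it still accepts every ndarray, NA-enabled or not, which is why old code would need to be rebuilt or changed.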
Re: [Numpy-discussion] Masked Arrays in NumPy 1.x
2012/4/24 Frédéric Bastien <no...@nouiz.org>
> Hi,
>
> I finished reading the doc I listed in the other thread. As the NA stuff will be marked as experimental in numpy 1.7, why not define a new macro like NPY_NA_VERSION that gives the version of the NA C-API? That way, people will be able to detect whether the NA C-API has changed since they wrote their code, which makes it much easier to break this interface later. We would just need a big warning telling people to do this check.

This sounds like a good thing to do.

> The current NPY_VERSION and NPY_FEATURE_VERSION macros don't allow removing features. A function like PyArray_GetNACVersion would probably be useful too.[1]
>
> Continuing from my previous post: old code needs to be changed so that it rejects NA inputs. With the current trunk, this can be done like this:
> [clip]
> In the 1.6.1 release, NPY_FEATURE_VERSION had value 0x0007. This value wasn't changed in the trunk. I suppose it will be raised to 0x0008 for numpy 1.7.
>
> Can we assume that old code checks its inputs with PyArray_Check()? I think so, but it would be really helpful if people who have been here longer than me could confirm or deny this.

Should be 6 in 1.6:

    # Binary compatibility version number. This number is increased whenever the
    # C-API is changed such that binary compatibility is broken, i.e. whenever a
    # recompile of extension modules is needed.
    C_ABI_VERSION = 0x0109

    # Minor API version. This number is increased whenever a change is made to the
    # C-API -- whether it breaks binary compatibility or not. Some changes, such
    # as adding a function pointer to the end of the function table, can be made
    # without breaking binary compatibility. In this case, only the C_API_VERSION
    # (*not* C_ABI_VERSION) would be increased. Whenever binary compatibility is
    # broken, both C_API_VERSION and C_ABI_VERSION should be increased.
    C_API_VERSION = 0x0006

It's now 7. This is set in numpy/core/setup_common.py. Where are you seeing 7 for 1.6?

Chuck
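The setup_common.py comments imply two different compatibility rules: the ABI version must match exactly, while the API (feature) version of the running library only has to be at least the one an extension was built against. NumPy's `import_array()` performs checks along these lines using `PyArray_GetNDArrayCVersion()` and `PyArray_GetNDArrayCFeatureVersion()`. The sketch below models both rules, plus the exact-match rule that an experimental NA version number would need because features may be removed. `runtime_versions`, the `BUILT_*` macros and the NA version are hypothetical, and the runtime values are passed in rather than queried from NumPy.

```c
#include <stdbool.h>

/* Versions the extension was built against. The ABI/API values are the
   setup_common.py numbers quoted in the thread; the NA one is hypothetical. */
#define BUILT_ABI_VERSION 0x0109u
#define BUILT_API_VERSION 0x0006u
#define BUILT_NA_VERSION 1u

/* Versions reported by the library at run time (passed in for this sketch). */
typedef struct {
    unsigned abi;
    unsigned api;
    unsigned na;
} runtime_versions;

/* An ABI break means extensions must be recompiled: require an exact match. */
static bool abi_ok(const runtime_versions *rt) { return rt->abi == BUILT_ABI_VERSION; }

/* API additions keep old extensions working: a newer runtime is fine. */
static bool api_ok(const runtime_versions *rt) { return rt->api >= BUILT_API_VERSION; }

/* An experimental API may also remove or change things: exact match only. */
static bool na_ok(const runtime_versions *rt) { return rt->na == BUILT_NA_VERSION; }

/* Combined check an extension could run once at import time. */
static bool versions_compatible(const runtime_versions *rt)
{
    return abi_ok(rt) && api_ok(rt) && na_ok(rt);
}
```

The asymmetry is the point of the proposal: `>=` can only express "features were added", so a separate, exactly-matched number is what would allow the NA interface to shrink or change between releases.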
Re: [Numpy-discussion] Masked Arrays in NumPy 1.x
On Tue, Apr 24, 2012 at 4:03 PM, Charles R Harris <charlesr.har...@gmail.com> wrote:
> Should be 6 in 1.6
> [clip]
> It's now 7. This is set in numpy/core/setup_common.py. Where are you seeing 7 for 1.6?

My bad. When I grepped, I found this line:

    ./build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h:#define NPY_API_VERSION 0x0007

That line gives version 0x0007, but it is in a file in the build directory. As my last build was from a later version, it isn't the right number!

Fred
Re: [Numpy-discussion] Masked Arrays in NumPy 1.x
Hi Paul,

On Wed, Apr 11, 2012 at 8:57 PM, Paul Hobson <pmhob...@gmail.com> wrote:
> [clip]
> The main thing I'd like to do is convey my typical use of the numpy.ma module as an environmental engineer analyzing censored datasets -- contaminant concentrations that are either at well-understood values (not masked) or at some unknown value below an upper bound (masked).
> [clip]
> https://gist.github.com/2361814
>
> The main message here is that I include the upper bounds of the unknown values (detection limits) in my array and use them to statistically estimate their values. I must be able to retrieve the masked detection limits throughout this process. Additionally, the masks as currently implemented allow me to sort first the undetected values, then the detected values (see __rosRanks in the gist).
> [clip]

Thanks for this note -- getting feedback from people on how they're actually using numpy.ma is *very* helpful, because there's a lot more data out there on the missing-data use case.

But I couldn't quite figure out what you're actually doing in this code. It looks like the measurements that you're masking out have some values hidden behind the mask, which you then make use of? Unfortunately, I don't know anything about environmental engineering or the method of Hirsch and Stedinger (1987). Could you elaborate a bit on what these masked values mean and how you process them?

-- Nathaniel
Re: [Numpy-discussion] Masked Arrays in NumPy 1.x
Thank you very much for contributing this description. It is very helpful to see how people use numpy.ma in the wild.

-Travis

On Apr 11, 2012, at 2:57 PM, Paul Hobson wrote:
> [clip]
Re: [Numpy-discussion] Masked Arrays in NumPy 1.x
Travis et al,

This isn't a reply to anything specific in your email, and I apologize if there is a better thread or place to share this information. I've been meaning to participate in the discussion for a long time and never got around to it. The main thing I'd like to do is convey my typical use of the numpy.ma module as an environmental engineer analyzing censored datasets -- contaminant concentrations that are either at well-understood values (not masked) or at some unknown value below an upper bound (masked).

My basic understanding is that this discussion revolved around how to treat masked data (ignored vs. missing) and how to implement one, both, or some middle ground between those two concepts. If I'm off-base, just ignore all of the following.

For my purposes, numpy.ma is implemented in a way very well suited to my needs. Here's a gist of something that was *really* hard for me before I discovered numpy.ma and numpy in general (this is a bit much; see below for the highlights):
https://gist.github.com/2361814

The main message here is that I include the upper bounds of the unknown values (detection limits) in my array and use them to statistically estimate their values. I must be able to retrieve the masked detection limits throughout this process. Additionally, the masks as currently implemented allow me to sort first the undetected values, then the detected values (see __rosRanks in the gist).

As a boots-on-the-ground user of numpy, I'm ecstatic that this tool exists. I'm also pretty flexible and don't anticipate any major snags in my work if things change dramatically as the masked/missing/ignored functionality evolves.
Thanks to everyone for the hard work and great tools,
-Paul Hobson

On Mon, Apr 9, 2012 at 9:52 PM, Travis Oliphant <tra...@continuum.io> wrote:
> [clip]
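Paul's description suggests a simple data layout: each non-detect keeps its detection limit as the stored value, and the mask bit marks it as censored, so the limit can still be retrieved later. The C sketch below is a hypothetical illustration of that layout and of the ordering he mentions (non-detects first, then detects); it is not the code from his gist, and ordering by value within each group is an assumption.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical record for one measurement. For a non-detect, `value`
   holds the detection limit (the known upper bound) and `censored` plays
   the role of the mask bit, so the limit is never lost. */
typedef struct {
    double value;
    bool censored;
} sample;

/* qsort comparator: censored samples first, then detected ones;
   ascending by value within each group. */
static int cmp_samples(const void *pa, const void *pb)
{
    const sample *a = pa;
    const sample *b = pb;
    if (a->censored != b->censored)
        return a->censored ? -1 : 1;
    return (a->value > b->value) - (a->value < b->value);
}

static void sort_samples(sample *s, size_t n)
{
    qsort(s, n, sizeof *s, cmp_samples);
}

/* Count the non-detects, e.g. to split the sorted array into its two groups. */
static size_t count_censored(const sample *s, size_t n)
{
    size_t k = 0;
    for (size_t i = 0; i < n; ++i)
        k += s[i].censored;
    return k;
}
```

Keeping the bound in the same slot as the data, rather than discarding it when masking, is what lets later statistical steps reuse the detection limits.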
Re: [Numpy-discussion] Masked Arrays in NumPy 1.x
On 04/09/2012 06:52 PM, Travis Oliphant wrote:
> Hey all,
>
> I've been waiting for Mark Wiebe to arrive in Austin where he will spend several weeks, but I also know that masked arrays will be only one of the things he and I are hoping to make headway on while he is in Austin. Nevertheless, we need to make progress on the masked array discussion, and if we want to finalize the masked array implementation we will need to finish the design.
>
> I've caught up on most of the discussion including Mark's NEP, Nathaniel's NEP and other writings and the very nice mailing list discussion that included a somewhat detailed discussion on the algebra of IGNORED. I think there are some things still to be decided. However, I think some things are pretty clear:
>
> 1) Masked arrays are going to be fundamental in NumPy and these should replace most people's use of numpy.ma. The numpy.ma code will remain as a compatibility layer.

Excellent! In mpl and other heavy users of numpy.ma there will still be work to do to handle all varieties of input, but it should be manageable.

> 2) The reality of #1 and NumPy's general philosophy to date means that masked arrays in NumPy should support the common use-cases of masked arrays (including getting and setting of the mask from the Python and C layers). However, the semantic of what the mask implies may change from what numpy.ma uses to having a True value meaning selected.

I never understood what the strong argument for that change from numpy.ma was. When editing data, it is natural to use flag bits to indicate various rejection criteria; no bit set means it's all good, so False naturally means good and True naturally means mask it out. But I can live with the change if you and Mark see a good reason for it.

> 3) There will be missing-data dtypes in NumPy. Likely only a limited subset (string, bytes, int64, int32, float32, float64, complex64, complex32, and object) with an API that allows more to be defined if desired. These will most likely use Mark's nice machinery for managing the calculation structure without requiring new C-level loops to be defined.

So, these will be the bit-pattern versions of NA, correct? With the bit pattern specified as an attribute of the dtype? Good, but... Are we getting into trouble here, figuring out how to handle all combinations of numpy.ma, masked dtypes, and Mark's masked NA?

> 4) I'm still not sure about whether the IGNORED concept is necessary or not. I really like the separation that was emphasized between implementation (masks versus bit-patterns) and operations (propagating versus non-propagating). Pauli even created another dimension which I don't totally grok and therefore can't remember. Pauli? Do you still feel that is a necessary construction? But do we need the IGNORED concept to indicate what amounts to different default keyword arguments to functions that operate on NumPy arrays containing missing data (however that is represented)? My current weak view is that it is not really necessary. But I could be convinced otherwise.

I agree (if I understand you correctly); the goal is an expressive, explicit language that lets people accomplish what they want, clearly and quickly, and I think this is more a matter of practicality than purity of theory. Nevertheless, achieving that is easier said than done, and figuring out how to handle corner cases is better done sooner rather than later.

Numpy.ma has never been perfect, but it has proven a good tool for practical work in my experience. (Many thanks to Pierre GM for all his work on it.) One of the nice things it does is to automatically mask out invalid results. This saves quite a bit of explicit checking that would otherwise be required.

Eric

> I think the good news is that given Mark's hard work and Nathaniel's follow-up we are really quite far along. I would love to get Nathaniel's opinion about what remains undone in the current NumPy code base. I would also appreciate knowing (from anyone with an interest) opinions of items 1-4 above and anything else I've left out.
>
> Thanks,
>
> -Travis
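Eric's flag-bit convention is easy to picture in C. In this hypothetical sketch (the `REJ_*` names are made up for illustration), each rejection criterion is one bit; under the numpy.ma convention an element is masked when any bit is set, and the proposed "True means selected" convention simply inverts that test. `masked_divide` flags a zero divisor instead of producing an invalid result, in the spirit of the automatic masking Eric mentions, but it is not NumPy code.

```c
#include <stddef.h>

/* Hypothetical rejection criteria, one bit each; 0 means "all good". */
enum {
    REJ_NONE = 0,
    REJ_OUT_OF_RANGE = 1 << 0, /* outside plausible physical limits */
    REJ_INSTRUMENT = 1 << 1,   /* flagged by an instrument check */
    REJ_INVALID = 1 << 2       /* produced by an invalid operation */
};

/* numpy.ma convention: true means "masked out" (any flag set). */
static int is_masked(unsigned flags) { return flags != REJ_NONE; }

/* Proposed alternative: true means "selected" (no flag set). */
static int is_selected(unsigned flags) { return flags == REJ_NONE; }

/* Element-wise out = a / b for elements that are not already rejected.
   A zero divisor sets REJ_INVALID instead of producing inf/nan;
   out[i] is left untouched for every rejected element. */
static void masked_divide(const double *a, const double *b, double *out,
                          unsigned *flags, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (flags[i] != REJ_NONE)
            continue;
        if (b[i] == 0.0) {
            flags[i] |= REJ_INVALID;
            continue;
        }
        out[i] = a[i] / b[i];
    }
}
```

With flag bits, "no bit set" is the single good state, which is why `False` reads naturally as "keep"; the inverted convention works just as well but needs the extra negation at every test.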
Re: [Numpy-discussion] Masked Arrays in NumPy 1.x
10.04.2012 06:52, Travis Oliphant wrote:
[clip]
> 4) I'm still not sure about whether the IGNORED concept is necessary or not. I really like the separation that was emphasized between implementation (masks versus bit-patterns) and operations (propagating versus non-propagating). Pauli even created another dimension which I don't totally grok and therefore can't remember. Pauli? Do you still feel that is a necessary construction?

I think the conclusion from that discussion subthread was only that retaining commutativity of binary operations is probably impossible if unmasking of values is allowed. (I think in that discussion the big difference between IGNORED and MISSING was that ignored values could be unmasked while missing values could not.)

If I recall correctly, my suggestion was that you might be able to rescue the situation by changing what assignment means, e.g. in `x[:5] = y`, what gets written to `x` at the points where values in `x` and/or `y` are masked/ignored. But I think some counterexamples showing why this will not work as intended came up.

`numpy.ma` operations are not commutative, which can sometimes be surprising, but apparently one just has to be pragmatic and live with this, as there's no real way around it.

I don't have very good suggestions on how these features should be designed; I use them too seldom.

Pauli
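Pauli's point about commutativity can be made concrete with a toy model. The rule below is an assumption chosen for illustration, not a specification of numpy.ma or of the proposed NA semantics: a result is masked if either operand is, and a masked result keeps the left operand's stored value so that it could be unmasked later. Under that rule `x + y` and `y + x` agree on the mask but differ in what unmasking reveals.

```c
#include <stdbool.h>

/* One masked scalar: the stored value plus a mask bit (true = masked). */
typedef struct {
    double data;
    bool mask;
} mval;

/* Toy rule (an assumption, not a NumPy specification): the result is masked
   if either operand is, and a masked result keeps the LEFT operand's
   stored value so that it can be unmasked later. */
static mval masked_add(mval x, mval y)
{
    mval r;
    r.mask = x.mask || y.mask;
    r.data = r.mask ? x.data : x.data + y.data;
    return r;
}

/* Unmasking exposes whatever value is stored under the mask. */
static double unmask(mval v) { return v.data; }
```

While both results stay masked they are indistinguishable; the asymmetry only shows up once unmasking is allowed, and any other choice of hidden value (right operand, sum, zero) breaks symmetry in some other case.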