Re: [Numpy-discussion] Masked Arrays in NumPy 1.x

2012-04-24 Thread Frédéric Bastien
Hi,

I finished reading the doc I listed in the other thread. As the NA
stuff will be marked as experimental in numpy 1.7, why not define a
new macro like NPY_NA_VERSION that gives the version of the NA C-API?
That way, people will be able to detect whether the NA C-API has
changed since they wrote their code, which would make it easier to
break this interface later. We would just need a big warning telling
people to do this check.

The current NPY_VERSION and NPY_FEATURE_VERSION macros don't allow for
removing features. A function like PyArray_GetNACVersion would
probably be useful too.[1]

Continuing from my previous post, old code needs to be changed so that
it doesn't accept NA inputs.  With the current trunk, this can be done
like this:

PyObject* an_input = ...;

if (!PyArray_Check(an_input)) {
    PyErr_SetString(PyExc_ValueError, "expected an ndarray");
    %(fail)s
}

if (NPY_FEATURE_VERSION >= 0x0008) {
    if (PyArray_HasNASupport((PyArrayObject*) an_input)) {
        PyErr_SetString(PyExc_ValueError,
                        "masked arrays are not supported by this function");
        %(fail)s
    }
}
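
For the Python side, here is a rough sketch of the same kind of guard
using feature detection (np.NA and the maskna flag are the names used
by the current 1.7-dev NA work and could change -- which is exactly
the point of having a version macro):

import numpy as np

# Sketch: feature-detect the experimental NA support instead of assuming it.
HAS_NA = hasattr(np, 'NA')

def reject_na(arr):
    """Refuse NA-backed arrays in code written before NA support existed."""
    if not isinstance(arr, np.ndarray):
        raise ValueError("expected an ndarray")
    # 'maskna' is the flag name from the 1.7-dev NA work (an assumption here);
    # the getattr default only matters if the attribute is missing.
    if HAS_NA and getattr(arr.flags, 'maskna', False):
        raise ValueError("masked (NA) arrays are not supported by this function")
    return arr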

In the 1.6.1 release, NPY_FEATURE_VERSION had the value 0x0007. This
value wasn't changed in the trunk. I suppose it will be raised to
0x0008 for numpy 1.7.

Can we assume that old code checks its inputs with PyArray_Check()? I
think so, but it would be really helpful if people who have been
around longer than me could confirm or deny this.

Frédéric

[1] http://docs.scipy.org/doc/numpy/reference/c-api.array.html#checking-the-api-version


Re: [Numpy-discussion] Masked Arrays in NumPy 1.x

2012-04-24 Thread Charles R Harris
2012/4/24 Frédéric Bastien no...@nouiz.org

 Hi,

 I finished reading the doc I listed in the other thread. As the NA
 stuff will be marked as experimental in numpy 1.7, why not define a
 new macro like NPY_NA_VERSION that gives the version of the NA C-API?
 That way, people will be able to detect whether the NA C-API has
 changed since they wrote their code, which would make it easier to
 break this interface later. We would just need a big warning telling
 people to do this check.


This sounds like a good thing to do.


 The current NPY_VERSION and NPY_FEATURE_VERSION macros don't allow for
 removing features. A function like PyArray_GetNACVersion would
 probably be useful too.[1]

 Continuing from my previous post, old code needs to be changed so that
 it doesn't accept NA inputs.  With the current trunk, this can be done
 like this:

 PyObject* an_input = ...;

 if (!PyArray_Check(an_input)) {
     PyErr_SetString(PyExc_ValueError, "expected an ndarray");
     %(fail)s
 }

 if (NPY_FEATURE_VERSION >= 0x0008) {
     if (PyArray_HasNASupport((PyArrayObject*) an_input)) {
         PyErr_SetString(PyExc_ValueError,
                         "masked arrays are not supported by this function");
         %(fail)s
     }
 }

 In the 1.6.1 release, NPY_FEATURE_VERSION had the value 0x0007. This
 value wasn't changed in the trunk. I suppose it will be raised to
 0x0008 for numpy 1.7.

 Can we assume that old code checks its inputs with PyArray_Check()? I
 think so, but it would be really helpful if people who have been
 around longer than me could confirm or deny this.


Should be 6 in 1.6

# Binary compatibility version number. This number is increased whenever the
# C-API is changed such that binary compatibility is broken, i.e. whenever a
# recompile of extension modules is needed.
C_ABI_VERSION = 0x0109

# Minor API version.  This number is increased whenever a change is made to the
# C-API -- whether it breaks binary compatibility or not.  Some changes, such
# as adding a function pointer to the end of the function table, can be made
# without breaking binary compatibility.  In this case, only the C_API_VERSION
# (*not* C_ABI_VERSION) would be increased.  Whenever binary compatibility is
# broken, both C_API_VERSION and C_ABI_VERSION should be increased.
C_API_VERSION = 0x0006

It's now 7. This is set in numpy/core/setup_common.py. Where are you seeing
7 for 1.6?

Chuck


Re: [Numpy-discussion] Masked Arrays in NumPy 1.x

2012-04-24 Thread Frédéric Bastien
On Tue, Apr 24, 2012 at 4:03 PM, Charles R Harris
charlesr.har...@gmail.com wrote:

 Should be 6 in 1.6

 # Binary compatibility version number. This number is increased whenever the
 # C-API is changed such that binary compatibility is broken, i.e. whenever a
 # recompile of extension modules is needed.
 C_ABI_VERSION = 0x0109

 # Minor API version.  This number is increased whenever a change is made to the
 # C-API -- whether it breaks binary compatibility or not.  Some changes, such
 # as adding a function pointer to the end of the function table, can be made
 # without breaking binary compatibility.  In this case, only the C_API_VERSION
 # (*not* C_ABI_VERSION) would be increased.  Whenever binary compatibility is
 # broken, both C_API_VERSION and C_ABI_VERSION should be increased.
 C_API_VERSION = 0x0006

 It's now 7. This is set in numpy/core/setup_common.py. Where are you seeing
 7 for 1.6?

My bad. When I grepped, I found this line:

./build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h:#define NPY_API_VERSION 0x0007

That says the version is 0x0007. But this file is in the build
directory, and since my last build was with a later version, it isn't
the right number!

Fred


Re: [Numpy-discussion] Masked Arrays in NumPy 1.x

2012-04-23 Thread Nathaniel Smith
Hi Paul,

On Wed, Apr 11, 2012 at 8:57 PM, Paul Hobson pmhob...@gmail.com wrote:
 Travis et al,

 This isn't a reply to anything specific in your email, and I apologize
 if there is a better thread or place to share this information. I've
 been meaning to participate in the discussion for a long time and
 never got around to it. The main thing I'd like to do is convey my
 typical use of the numpy.ma module as an environmental engineer
 analyzing censored datasets -- contaminant concentrations that are
 either at well-understood values (not masked) or at some unknown value
 below an upper bound (masked).

 My basic understanding is that this discussion revolved around how to
 treat masked data (ignored vs missing) and how to implement one, both,
 or some middle ground between those two concepts. If I'm off-base,
 just ignore all of the following.

 For my purposes, numpy.ma is implemented in a way very well suited to
 my needs. Here's a gist of something that was *really* hard for me
 before I discovered numpy.ma and numpy in general (this is a bit
 much, see below for the highlights):
 https://gist.github.com/2361814

 The main message here is that I include the upper bounds of the
 unknown values (detection limits) in my array and use them to
 statistically estimate their values. I must be able to retrieve the
 masked detection limits throughout this process. Additionally, the
 masks as currently implemented allow me to sort the undetected
 values first, then the detected values (see __rosRanks in the gist).

 As a boots-on-the-ground user of numpy, I'm ecstatic that this tool
 exists. I'm also pretty flexible and don't anticipate any major snags
 in my work if things change dramatically as the masked/missing/ignored
 functionality evolves.

 Thanks to everyone for the hard work and great tools,
 -Paul Hobson

Thanks for this note -- getting feedback on how people are actually
using numpy.ma is *very* helpful; it gives us a lot more data on the
missing-data use cases out there.

But, I couldn't quite figure out what you're actually doing in this
code. It looks like the measurements that you're masking out have some
values hidden behind the mask, which you then make use of?
Unfortunately, I don't know anything about environmental engineering
or the method of Hirsch and Stedinger (1987). Could you elaborate a
bit on what these masked values mean and how you process them?

-- Nathaniel


Re: [Numpy-discussion] Masked Arrays in NumPy 1.x

2012-04-23 Thread Travis Oliphant
Thank you very much for contributing this description. It is very
helpful to see how people use numpy.ma in the wild.

-Travis

On Apr 11, 2012, at 2:57 PM, Paul Hobson wrote:

 Travis et al,
 
 This isn't a reply to anything specific in your email, and I apologize
 if there is a better thread or place to share this information. I've
 been meaning to participate in the discussion for a long time and
 never got around to it. The main thing I'd like to do is convey my
 typical use of the numpy.ma module as an environmental engineer
 analyzing censored datasets -- contaminant concentrations that are
 either at well-understood values (not masked) or at some unknown value
 below an upper bound (masked).

 My basic understanding is that this discussion revolved around how to
 treat masked data (ignored vs missing) and how to implement one, both,
 or some middle ground between those two concepts. If I'm off-base,
 just ignore all of the following.

 For my purposes, numpy.ma is implemented in a way very well suited to
 my needs. Here's a gist of something that was *really* hard for me
 before I discovered numpy.ma and numpy in general (this is a bit
 much, see below for the highlights):
 https://gist.github.com/2361814

 The main message here is that I include the upper bounds of the
 unknown values (detection limits) in my array and use them to
 statistically estimate their values. I must be able to retrieve the
 masked detection limits throughout this process. Additionally, the
 masks as currently implemented allow me to sort the undetected
 values first, then the detected values (see __rosRanks in the gist).

 As a boots-on-the-ground user of numpy, I'm ecstatic that this tool
 exists. I'm also pretty flexible and don't anticipate any major snags
 in my work if things change dramatically as the masked/missing/ignored
 functionality evolves.

 Thanks to everyone for the hard work and great tools,
 -Paul Hobson
 
 On Mon, Apr 9, 2012 at 9:52 PM, Travis Oliphant tra...@continuum.io wrote:
 Hey all,
 
 I've been waiting for Mark Wiebe to arrive in Austin where he will spend 
 several weeks, but I also know that masked arrays will be only one of the 
 things he and I are hoping to make head-way on while he is in Austin.
 Nevertheless, we need to make progress on the masked array discussion and if 
 we want to finalize the masked array implementation we will need to finish 
 the design.
 
 I've caught up on most of the discussion including Mark's NEP, Nathaniel's 
 NEP and other writings and the very-nice mailing list discussion that 
 included a somewhat detailed discussion on the algebra of IGNORED.   I think 
 there are some things still to be decided.  However, I think some things are 
 pretty clear:
 
1) Masked arrays are going to be fundamental in NumPy and these 
 should replace most people's use of numpy.ma.   The numpy.ma code will 
 remain as a compatibility layer
 
2) The reality of #1 and NumPy's general philosophy to date means 
 that masked arrays in NumPy should support the common use-cases of masked 
 arrays (including getting and setting of the mask from the Python and 
 C-layers).  However, the semantic of what the mask implies may change from 
 what numpy.ma uses to having  a True value meaning selected.
 
3) There will be missing-data dtypes in NumPy.   Likely only a 
 limited sub-set (string, bytes, int64, int32, float32, float64, complex64, 
 complex32, and object) with an API that allows more to be defined if 
 desired.   These will most likely use Mark's nice machinery for managing the 
 calculation structure without requiring new C-level loops to be defined.
 
4) I'm still not sure about whether the IGNORED concept is necessary 
 or not.I really like the separation that was emphasized between 
 implementation (masks versus bit-patterns) and operations (propagating 
 versus non-propagating).   Pauli even created another dimension which I 
 don't totally grok and therefore can't remember.   Pauli?  Do you still feel 
 that is a necessary construction?  But, do we need the IGNORED concept to 
 indicate what amounts to different default key-word arguments to functions 
 that operate on NumPy arrays containing missing data (however that is 
 represented)?My current weak view is that it is not really necessary.   
 But, I could be convinced otherwise.
 
 I think the good news is that given Mark's hard-work and Nathaniel's 
 follow-up we are really quite far along.   I would love to get Nathaniel's 
 opinion about what remains un-done in the current NumPy code-base.   I would 
 also appreciate knowing (from anyone with an interest) opinions of items 1-4 
 above and anything else I've left out.
 
 Thanks,
 
 -Travis
 
 
 
 

Re: [Numpy-discussion] Masked Arrays in NumPy 1.x

2012-04-11 Thread Paul Hobson
Travis et al,

This isn't a reply to anything specific in your email, and I apologize
if there is a better thread or place to share this information. I've
been meaning to participate in the discussion for a long time and
never got around to it. The main thing I'd like to do is convey my
typical use of the numpy.ma module as an environmental engineer
analyzing censored datasets -- contaminant concentrations that are
either at well-understood values (not masked) or at some unknown value
below an upper bound (masked).

My basic understanding is that this discussion revolved around how to
treat masked data (ignored vs missing) and how to implement one, both,
or some middle ground between those two concepts. If I'm off-base,
just ignore all of the following.

For my purposes, numpy.ma is implemented in a way very well suited to
my needs. Here's a gist of something that was *really* hard for me
before I discovered numpy.ma and numpy in general (this is a bit
much, see below for the highlights):
https://gist.github.com/2361814

The main message here is that I include the upper bounds of the
unknown values (detection limits) in my array and use them to
statistically estimate their values. I must be able to retrieve the
masked detection limits throughout this process. Additionally, the
masks as currently implemented allow me to sort the undetected
values first, then the detected values (see __rosRanks in the gist).
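
A rough sketch of that workflow (made-up numbers, not the gist itself;
__rosRanks is in the gist):

import numpy as np
import numpy.ma as ma

# Censored dataset: measured concentrations for detects, detection limits
# for non-detects; the mask flags the non-detects (True = upper bound only).
conc = np.array([2.5, 0.5, 7.1, 0.5, 3.3])
nondetect = np.array([False, True, False, True, False])
censored = ma.masked_array(conc, mask=nondetect)

# Statistics on detected values only -- masked entries are skipped.
print(censored.mean())

# The detection limits stay retrievable behind the mask.
print(censored.data[censored.mask])    # the two 0.5 detection limits

# Sort non-detects first, then detects, each by value (roughly the
# ordering the ranking step needs).
order = np.lexsort((censored.data, ~censored.mask))
print(censored.data[order])            # 0.5, 0.5, 2.5, 3.3, 7.1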

As a boots-on-the-ground user of numpy, I'm ecstatic that this tool
exists. I'm also pretty flexible and don't anticipate any major snags
in my work if things change dramatically as the masked/missing/ignored
functionality evolves.

Thanks to everyone for the hard work and great tools,
-Paul Hobson

On Mon, Apr 9, 2012 at 9:52 PM, Travis Oliphant tra...@continuum.io wrote:
 Hey all,

 I've been waiting for Mark Wiebe to arrive in Austin where he will spend 
 several weeks, but I also know that masked arrays will be only one of the 
 things he and I are hoping to make head-way on while he is in Austin.    
 Nevertheless, we need to make progress on the masked array discussion and if 
 we want to finalize the masked array implementation we will need to finish 
 the design.

 I've caught up on most of the discussion including Mark's NEP, Nathaniel's 
 NEP and other writings and the very-nice mailing list discussion that 
 included a somewhat detailed discussion on the algebra of IGNORED.   I think 
 there are some things still to be decided.  However, I think some things are 
 pretty clear:

        1) Masked arrays are going to be fundamental in NumPy and these should 
 replace most people's use of numpy.ma.   The numpy.ma code will remain as a 
 compatibility layer

        2) The reality of #1 and NumPy's general philosophy to date means that 
 masked arrays in NumPy should support the common use-cases of masked arrays 
 (including getting and setting of the mask from the Python and C-layers).  
 However, the semantic of what the mask implies may change from what numpy.ma 
 uses to having  a True value meaning selected.

        3) There will be missing-data dtypes in NumPy.   Likely only a limited 
 sub-set (string, bytes, int64, int32, float32, float64, complex64, complex32, 
 and object) with an API that allows more to be defined if desired.   These 
 will most likely use Mark's nice machinery for managing the calculation 
 structure without requiring new C-level loops to be defined.

        4) I'm still not sure about whether the IGNORED concept is necessary 
 or not.    I really like the separation that was emphasized between 
 implementation (masks versus bit-patterns) and operations (propagating versus 
 non-propagating).   Pauli even created another dimension which I don't 
 totally grok and therefore can't remember.   Pauli?  Do you still feel that 
 is a necessary construction?  But, do we need the IGNORED concept to indicate 
 what amounts to different default key-word arguments to functions that 
 operate on NumPy arrays containing missing data (however that is 
 represented)?    My current weak view is that it is not really necessary.   
 But, I could be convinced otherwise.

 I think the good news is that given Mark's hard-work and Nathaniel's 
 follow-up we are really quite far along.   I would love to get Nathaniel's 
 opinion about what remains un-done in the current NumPy code-base.   I would 
 also appreciate knowing (from anyone with an interest) opinions of items 1-4 
 above and anything else I've left out.

 Thanks,

 -Travis






Re: [Numpy-discussion] Masked Arrays in NumPy 1.x

2012-04-10 Thread Eric Firing
On 04/09/2012 06:52 PM, Travis Oliphant wrote:
 Hey all,

 I've been waiting for Mark Wiebe to arrive in Austin where he will
 spend several weeks, but I also know that masked arrays will be only
 one of the things he and I are hoping to make head-way on while he is
 in Austin.Nevertheless, we need to make progress on the masked
 array discussion and if we want to finalize the masked array
 implementation we will need to finish the design.

 I've caught up on most of the discussion including Mark's NEP,
 Nathaniel's NEP and other writings and the very-nice mailing list
 discussion that included a somewhat detailed discussion on the
 algebra of IGNORED.   I think there are some things still to be
 decided.  However, I think some things are pretty clear:

 1) Masked arrays are going to be fundamental in NumPy and these
 should replace most people's use of numpy.ma.   The numpy.ma code
 will remain as a compatibility layer

Excellent!  In mpl and other heavy users of numpy.ma there will still be 
work to do to handle all varieties of input, but it should be manageable.


 2) The reality of #1 and NumPy's general philosophy to date means
 that masked arrays in NumPy should support the common use-cases of
 masked arrays (including getting and setting of the mask from the
 Python and C-layers).  However, the semantic of what the mask implies
 may change from what numpy.ma uses to having  a True value meaning
 selected.

I never understood a strong argument for that change from numpy.ma.
When editing data, it is natural to use flag bits to indicate various
rejection criteria; no bit set means it's all good, so False is
naturally "good" and True is naturally "mask it out".  But I can live
with the change if you and Mark see a good reason for it.
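
To make the difference concrete, a small sketch with today's numpy.ma:

import numpy as np
import numpy.ma as ma

data = np.array([1.0, -999.0, 3.0])
bad = np.array([False, True, False])   # flag bits: True = reject

# numpy.ma convention today: True in the mask means "masked out".
print(ma.masked_array(data, mask=bad).sum())    # 4.0, the flagged point excluded

# Under a "True means selected" convention, the same intent would need
# the inverted mask.
print(ma.masked_array(data, mask=~bad).sum())   # -999.0, only the flagged point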


 3) There will be missing-data dtypes in NumPy.   Likely
 only a limited sub-set (string, bytes, int64, int32, float32,
 float64, complex64, complex32, and object) with an API that allows
 more to be defined if desired.   These will most likely use Mark's
 nice machinery for managing the calculation structure without
 requiring new C-level loops to be defined.

So, these will be the bit-pattern versions of NA, correct?  With the bit 
pattern specified as an attribute of the dtype?  Good, but...

Are we getting into trouble here, figuring out how to handle all 
combinations of numpy.ma, masked dtypes, and Mark's masked NA?


 4) I'm still not sure about whether the IGNORED concept is necessary
 or not.I really like the separation that was emphasized between
 implementation (masks versus bit-patterns) and operations
 (propagating versus non-propagating).   Pauli even created another
 dimension which I don't totally grok and therefore can't remember.
 Pauli?  Do you still feel that is a necessary construction?  But, do
 we need the IGNORED concept to indicate what amounts to different
 default key-word arguments to functions that operate on NumPy arrays
 containing missing data (however that is represented)?My current
 weak view is that it is not really necessary.   But, I could be
 convinced otherwise.

I agree (if I understand you correctly); the goal is an expressive, 
explicit language that lets people accomplish what they want, clearly 
and quickly, and I think this is more a matter of practicality than 
purity of theory.  Nevertheless, achieving that is easier said than 
done, and figuring out how to handle corner cases is better done sooner 
than later.
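
For concreteness, a small sketch of the propagating vs. non-propagating
behaviours item 4 is about, using tools that already exist:

import numpy as np
import numpy.ma as ma

# Non-propagating ("ignored"/skip) semantics: numpy.ma reductions skip
# masked entries.
x = ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
print(x.sum())          # 4.0

# Propagating ("missing") semantics: one missing value poisons the result,
# as NaN does for plain ndarrays.
y = np.array([1.0, np.nan, 3.0])
print(y.sum())          # nan
print(np.nansum(y))     # 4.0 -- opting back into the non-propagating view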

Numpy.ma has never been perfect, but it has proven a good tool for 
practical work in my experience.  (Many thanks to Pierre GM for all his 
work on it.) One of the nice things it does is to automatically mask out 
invalid results.  This saves quite a bit of explicit checking that would 
otherwise be required.
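
A minimal illustration of that automatic masking:

import numpy as np
import numpy.ma as ma

x = ma.masked_array([4.0, 0.0, -1.0])

# Division by zero and log of a non-positive number come back masked
# instead of raising or propagating inf/nan.
print(1.0 / x)       # the zero entry is masked
print(ma.log(x))     # only log(4.0) survives; 0.0 and -1.0 are masked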

Eric


 I think the good news is that given Mark's hard-work and Nathaniel's
 follow-up we are really quite far along.   I would love to get
 Nathaniel's opinion about what remains un-done in the current NumPy
 code-base.   I would also appreciate knowing (from anyone with an
 interest) opinions of items 1-4 above and anything else I've left
 out.

 Thanks,

 -Travis


Re: [Numpy-discussion] Masked Arrays in NumPy 1.x

2012-04-10 Thread Pauli Virtanen
On 10.04.2012 06:52, Travis Oliphant wrote:
[clip]
   4) I'm still not sure about whether the IGNORED
 concept is necessary or not. I really like the separation
 that was emphasized between implementation (masks versus
 bit-patterns) and operations (propagating versus non-propagating).
 Pauli even created another dimension which I don't totally grok
 and therefore can't remember.   Pauli?  Do you still feel that
 is a necessary construction?

I think the conclusion from that discussion subthread was only that
retaining commutativity of binary operations is probably impossible, if
unmasking of values is allowed. (I think in that discussion the big
difference between IGNORED and MISSING was that ignored values could be
unmasked while missing values could not.)

If I recall correctly, my suggestion was that you might be able to
rescue the situation by changing what assignment means, e.g. in
`x[:5] = y` what gets written to `x` at the points where values in `x`
and/or `y` are masked/ignored. But I think some counterexamples why this
will not work as intended came up.
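
For reference, a sketch of what numpy.ma currently does for that kind
of assignment:

import numpy as np
import numpy.ma as ma

x = ma.masked_array(np.arange(5, dtype=float))
y = ma.masked_array([10.0, 20.0, 30.0], mask=[False, True, False])

# Assigning a masked array into a slice copies both the data and the mask,
# so x picks up y's masked point (with y's data hidden behind it).
x[:3] = y
print(x)        # 10.0, masked, 30.0, 3.0, 4.0
print(x.data)   # 10., 20., 30., 3., 4.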

`numpy.ma` operations are not commutative, which can sometimes be
surprising, but apparently one just has to be pragmatic and live with
this, as there's no real way around it. I don't have very good
suggestions on how these features should be designed --- I use them too
seldom.

Pauli
