[Numpy-discussion] Masked Arrays in NumPy 1.x

Travis Oliphant Mon, 09 Apr 2012 21:53:06 -0700

Hey all, 

I've been waiting for Mark Wiebe to arrive in Austin, where he will spend
several weeks, but I also know that masked arrays will be only one of the
things he and I are hoping to make headway on while he is in Austin.
Nevertheless, we need to make progress on the masked array discussion, and if we
want to finalize the masked array implementation, we will need to finish the
design.


I've caught up on most of the discussion, including Mark's NEP, Nathaniel's NEP,
other writings, and the very nice mailing-list thread that included a
somewhat detailed look at the algebra of IGNORED. I think some things are
still to be decided. However, some things are pretty clear:

        1) Masked arrays are going to be fundamental in NumPy, and they should
replace most people's use of numpy.ma. The numpy.ma code will remain as a
compatibility layer.

        2) The reality of #1, together with NumPy's general philosophy to date, means
that masked arrays in NumPy should support the common use-cases of masked arrays
(including getting and setting the mask from both the Python and C layers).
However, the semantics of the mask may change from what numpy.ma uses (True
means masked, i.e. invalid) to having a True value mean selected (valid).
        
        3) There will be missing-data dtypes in NumPy, likely only a limited
subset (string, bytes, int64, int32, float32, float64, complex64, complex128,
and object) with an API that allows more to be defined if desired. These will
most likely use Mark's nice machinery for managing the calculation structure
without requiring new C-level loops to be defined.

        4) I'm still not sure whether the IGNORED concept is necessary. I really
like the separation that was emphasized between implementation (masks versus
bit-patterns) and operations (propagating versus non-propagating). Pauli even
created another dimension, which I don't totally grok and therefore can't
remember. Pauli, do you still feel that it is a necessary construction? But do
we need the IGNORED concept to indicate what amounts to different default
keyword arguments to functions that operate on NumPy arrays containing missing
data (however that is represented)? My current weak view is that it is not
really necessary, but I could be convinced otherwise.
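To make the distinctions in items 2-4 concrete, here is a minimal sketch in plain NumPy. It does not use any proposed missing-data machinery; the function names `masked_sum` and `bitpattern_sum`, the `skipna` keyword, and the use of NaN as a stand-in for an NA bit pattern are all illustrative assumptions, not a NumPy API. It shows the numpy.ma convention (True means masked) next to an inverted "True means selected" mask, the same missing value stored as a mask versus as a bit-pattern, and propagating versus non-propagating behavior expressed purely as a keyword default.

```python
import numpy as np
import numpy.ma as ma

data = np.array([1.0, 2.0, 3.0, 4.0])

# numpy.ma convention: True in the mask means "masked" (excluded).
ma_mask = np.array([False, True, False, False])
m = ma.masked_array(data, mask=ma_mask)

# Inverted convention (item 2): True means "selected" (valid).
# Converting between the two conventions is a logical NOT.
valid = np.logical_not(ma_mask)

# Bit-pattern representation: the missing value lives in the data itself.
# NaN stands in for a float64 NA bit pattern here (assumption for illustration).
bitpattern = np.where(valid, data, np.nan)


def masked_sum(values, valid, skipna=False):
    """Hypothetical reduction over mask-based missing data."""
    if not skipna and not valid.all():
        # Propagating: any missing input makes the result missing.
        return np.nan
    # Non-propagating: missing entries are simply ignored.
    return values[valid].sum()


def bitpattern_sum(values, skipna=False):
    """Hypothetical reduction over NaN-encoded missing data, same semantics."""
    return np.nansum(values) if skipna else np.sum(values)


print(masked_sum(data, valid))                 # nan
print(masked_sum(data, valid, skipna=True))    # 8.0
print(bitpattern_sum(bitpattern))              # nan
print(bitpattern_sum(bitpattern, skipna=True)) # 8.0
print(m.sum())                                 # 8.0
```

Both representations give the same answers, and the only thing separating the propagating and non-propagating results is the default of one keyword argument. Note that numpy.ma's own reductions, such as `m.sum()`, skip masked entries by default, so in this vocabulary they behave as non-propagating.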

I think the good news is that, thanks to Mark's hard work and Nathaniel's
follow-up, we are really quite far along. I would love to get Nathaniel's opinion
about what remains undone in the current NumPy code base. I would also appreciate
hearing opinions (from anyone with an interest) on items 1-4 above and on
anything else I've left out.

Thanks,

-Travis




_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion