Clearly there are some overlaps between what masked arrays are
trying to achieve and what R's NA mechanisms are trying to achieve.
Are they really similar enough that they should function using
the same API?
Yes.
And if so, won't that be confusing?
No, I don't believe so, any more than NA's in R, NaN's, or Inf's are already
confusing.
As one who's been silently following (most of) this thread, and a heavy R
and numpy user, perhaps I should chime in briefly here with a use case. I
more-or-less always work with partially masked data, like Matthew, but not
with numpy masked arrays, because the memory overhead is prohibitive. And, sad
to say, my experiments don't always go perfectly. I therefore have arrays
in which there is /both/ (1) data that is simply missing (np.NA?)--it
never had a value and never will--as well as simultaneously (2) data that
is temporarily masked (np.IGNORE? np.MASKED?) where I want to
mask/unmask different portions for different purposes/analyses. I consider
these two separate, completely independent issues and I unfortunately
currently have to kluge a lot to handle this.
Concretely, consider a list of 100,000 observations (rows), with 12
measures per observation-row (a 100,000 x 12 array). Every now and then,
sprinkled throughout this array, I have missing values (someone didn't
answer a question, or a computer failed to record a response, or
whatever). For some analyses I want to mask the whole row (e.g.,
complete-case analysis), leaving me with array entries that should be
tagged with all 4 possible labels:
1) not masked, not missing
2) masked, not missing
3) not masked, missing
4) masked, missing
Obviously #4 is "overkill" ... but only until I want to unmask that row.
At that point, I need to be sure that missing values remain missing when
unmasked. Can a single API really handle this?
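To make the four labels concrete, here is a rough sketch of the bookkeeping using plain NumPy and two independent boolean arrays. It is only an illustration under stated assumptions: the helper names (complete_case_mask, state_labels, usable_column_means) are hypothetical, NaN stands in for "missing", and nothing here is a proposed np.NA or np.IGNORE API.

```python
import numpy as np

def complete_case_mask(missing):
    """Hypothetical helper: mask every row that has at least one missing entry."""
    row_has_missing = missing.any(axis=1)[:, None]
    return np.repeat(row_has_missing, missing.shape[1], axis=1)

def state_labels(masked, missing):
    """Hypothetical helper: return 1..4, numbered as in the list above.

    1 = not masked, not missing    2 = masked, not missing
    3 = not masked, missing        4 = masked, missing
    """
    return 1 + masked.astype(np.int8) + 2 * missing.astype(np.int8)

def usable_column_means(data, masked, missing):
    """Column means over entries that are neither masked nor missing."""
    usable = ~masked & ~missing
    return np.where(usable, data, 0.0).sum(axis=0) / usable.sum(axis=0)

# Tiny stand-in for the 100,000 x 12 array: 4 observations, 3 measures.
data = np.arange(12, dtype=float).reshape(4, 3)

# (1) Permanently missing: one response was never recorded.
missing = np.zeros(data.shape, dtype=bool)
missing[1, 2] = True
data[missing] = np.nan  # assumption: NaN is the stored placeholder

# (2) Temporarily masked: complete-case analysis hides all of row 1.
masked = complete_case_mask(missing)
labels_while_masked = state_labels(masked, missing)
means = usable_column_means(data, masked, missing)

# Unmasking touches only the mask; the missing flags are a separate array,
# so the missing entry stays missing.
unmasked = np.zeros(data.shape, dtype=bool)
labels_after_unmask = state_labels(unmasked, missing)
```

While row 1 is masked its labels are 2, 2, 4; after unmasking they become 1, 1, 3, so the missing entry survives the round trip. Note that this keeps two separate byte-per-entry boolean arrays, so it does nothing about the memory overhead mentioned above; it only shows why the two notions need independent state.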
-best
Gary
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion