On 6/24/19 3:09 PM, Marten van Kerkwijk wrote:
> Hi Allan,
> 
> Thanks for bringing up the noclobber explicitly (and Stephan for asking
> for clarification; I was similarly confused).
> 
> It does clarify the difference in mental picture. In mine, the operation
> would indeed be guaranteed to be done on the underlying data, without
> copy and without `.filled(...)`. I should clarify further that I use
> `where` only to skip reading elements (i.e., in reductions), not writing
> elements (as you mention, the unwritten element will often be nonsense -
> e.g., wrong units - which to me is worse than infinity or something
> similar; I've not worried at all about runtime warnings). Note that my
> main reason here is not that I'm against filling with numbers for
> numerical arrays, but rather wanting to make minimal assumptions about
> the underlying data itself. This may well be a mistake (but I want to
> find out where it breaks).
> 
> Anyway, it would seem in many ways all the better that our models are
> quite different. I definitely see the advantages of your choice to
> decide one can do with masked data elements whatever is logical ahead of
> an operation!
> 
> Thanks also for bringing up a useful example with `np.dot(m, m)` -
> clearly, I didn't yet get beyond overriding ufuncs!
> 
> In my mental model, where I'd apply `np.dot` on the data and the mask
> separately, the result will be wrong, so the mask has to be set (which
> it would be). For your specific example, that might not be the best
> solution, but when using `np.dot(matrix_shaped, matrix_shaped)`, I think
> it does give the correct masking: any masked element in a matrix better
> propagate to all parts that it influences, even if there is a reduction
> of sorts happening. So, perhaps a price to pay for a function that tries
> to do multiple things.
> 
> The alternative solution in my model would be to replace `np.dot` with a
> masked-specific implementation of what `np.dot` is supposed to stand for
> (in your simple example, `np.add.reduce(np.multiply(m, m))` - more
> generally, add relevant `outer` and `axes`). This would be similar to
> what I think all implementations do for `.mean()` - we cannot calculate
> that from the data using any fill value or skipping, so rather use a
> more easily cared-for `.sum()` and divide by a suitable number of
> elements. But in both examples the disadvantage is that we took away the
> option to use the underlying class's `.dot()` or `.mean()` implementations.
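(The `.sum()`-and-divide strategy for `.mean()` that Marten describes can be sketched in plain NumPy with a separate boolean mask; this is illustrative only, not code from either of our prototypes:)

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([False, True, False, False])  # True marks a masked element

# Sum that skips masked entries by filling them with the additive
# identity 0, then divide by the count of unmasked entries -- rather
# than calling the underlying class's .mean() directly.
total = np.add.reduce(np.where(mask, 0.0, data))
count = np.add.reduce(~mask)
masked_mean = total / count
print(masked_mean)  # mean of 1, 3, 4 -> 2.666...
```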

Just to note, my current implementation uses the IGNORE style of mask,
so does not propagate the mask in np.dot:

    >>> a = MaskedArray([[1,1,1], [1,X,1], [1,1,1]])
    >>> np.dot(a, a)

    MaskedArray([[3, 2, 3],
                 [2, 2, 2],
                 [3, 2, 3]])

I'm not at all set on that behavior and we can do something else. For
now, I chose this way since it seemed to best match the "IGNORE" mask
behavior.
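For concreteness, the IGNORE behavior in the example above is equivalent to filling masked entries with the additive identity 0 before taking the dot product, so they drop out of each sum of products. A plain-NumPy sketch (not the prototype's actual implementation):

```python
import numpy as np

data = np.ones((3, 3))
mask = np.zeros((3, 3), dtype=bool)
mask[1, 1] = True  # the X element in the example above

# Fill masked entries with 0; they then contribute nothing to any
# sum-of-products in the dot product.
filled = np.where(mask, 0.0, data)
result = np.dot(filled, filled)
print(result)
# [[3. 2. 3.]
#  [2. 2. 2.]
#  [3. 2. 3.]]
```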

The behavior you described further above, where the output row/col would
be masked, corresponds better to "NA" (propagating) mask behavior, which
I am leaving for a later implementation.
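A plain-NumPy sketch of what that NA-propagating output mask for `np.dot` would look like (hypothetical, just to illustrate which output elements would be masked; any masked input element masks every output it contributes to):

```python
import numpy as np

mask = np.zeros((3, 3), dtype=bool)
mask[1, 1] = True  # one masked input element

# out[i, j] depends on row i of the left operand and column j of the
# right operand, so it is masked if either contains a masked element.
row_masked = mask.any(axis=1)  # rows of the left operand
col_masked = mask.any(axis=0)  # columns of the right operand
out_mask = row_masked[:, None] | col_masked[None, :]
print(out_mask)
# [[False  True False]
#  [ True  True  True]
#  [False  True False]]
```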

best,
Allan

> 
> (Aside: considerations such as these underlie my longed-for exposure of
> standard implementations of functions in terms of basic ufunc calls.)
> 
> Another example of a function for which I think my model is not
> particularly insightful (and for which it is difficult to know what to
> do generally) is `np.fft.fft`. Since an fft is equivalent to a
> sine/cosine fit to the data points, the answer for masked data is in
> principle quite well-defined. But much less easy to implement!
> 
> All the best,
> 
> Marten
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 
