[Numpy-discussion] [JOB] Full-time opportunity – Software engineer for open source project

2019-06-24 Thread Umberto Lupo
Who we are
L2F is a start-up based at the EPFL Innovation Park (Lausanne, Switzerland).
We are currently working at the frontier of machine learning and topological 
data analysis, in collaboration with several academic partners.

Our Mission
We are developing an open source Python library implementing innovative
topological data analysis algorithms designed by our team of full-time
research scientists and post-doctoral researchers.  The library shall be
user-friendly, well documented, high-performance, and well integrated with
state-of-the-art machine learning libraries (such as NumPy/SciPy, scikit-learn
and Keras or other popular deep learning frameworks).  We are offering a
full-time position in our company to help us develop this library.  The
candidate will join the L2F research team.

Profile description
We are looking for a computer scientist matching these characteristics:

  *   2+ years of experience in software engineering.
  *   Skilled with Python and C++ (in particular, at ease wrapping C++ code for 
Python).
  *   Aware of how open source communities work.  Contributions to
open-source projects such as scikit-learn are a plus.
  *   At ease writing specifications, developer documentation and good user 
documentation.
  *   Fluent with continuous integration, Git and common developer tools.
  *   Skilled in testing architectures (unit tests, integration tests, etc.).

How to apply
Applicants can write an e-mail to Dr. Matteo Caorsi 
(m.cao...@l2f.ch), attaching their CV and a short letter
detailing their relevant experience and motivation.

Starting date
This position is available for immediate start for the right candidate.


Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-24 Thread Ilhan Polat
Please don't introduce more errors for 1D arrays. They are already very
counter-intuitive for transposition and for other details not relevant to
this issue. Emitting errors for such a basic operation is very bad for user
experience. This is already the case with the wildly changing slicing
syntax. It would have made sense if 2D arrays were the default objects and
1D arrays required extra effort to create. But it is the other way around,
hence a transpose operation is "expected" from them. This would effectively
force all NumPy users to shift their code one tab further to accommodate
the extra try/except blocks for "Oh wait, what if a 1D array comes in?"
checks for the existence of transposability every time I write `.T` in the
code.
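
A minimal sketch of the 1D no-op being described (plain NumPy, runnable
as-is; `np.atleast_2d` is just one way to spell the defensive promotion):

import numpy as np

a2 = np.arange(6).reshape(2, 3)
a1 = np.arange(3)

a2.T.shape  # (3, 2): .T swaps the two axes of a 2D array
a1.T.shape  # (3,): .T is a silent no-op on a 1D array

# the defensive promotion such code ends up doing everywhere:
col = np.atleast_2d(a1).T  # shape (3, 1), an explicit column vector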

Code example: I am continuously writing code involving lots of matrix
products with inverses and transposes/hermitians (say, the 2nd eq. of
https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.solve_continuous_are.html).
That means I have to check whether at least 4-6 matrices are transposable
to make that equation go through.

The dot-H solution is actually my ideal choice, but I get the point that the
base namespace is already crowded. I am even OK with `x.conj(T=True)`, i.e.,
a keyword for an extra transposition, so that I can get away with
`x.conj(1)`; it doesn't solve the fundamental issue but at least gives some
convenience.

Best,
ilhan

On Mon, Jun 24, 2019 at 3:11 AM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> I had not looked at any implementation (only remembered the nice idea of
> "importing from the future"), and looking at the links Eric shared, it
> seems that the only way this would work is, effectively, pre-compilation
> doing a `.replace('.T', '._T_from_the_future')`, where you'd be
> hoping that there never is any other meaning for a `.T` attribute, for any
> class, since it is impossible to be sure a given variable is an ndarray.
> (Actually, a lot less implausible than for the case of numpy indexing
> discussed in the link...)
>
> Anyway, what I had in mind was something along the lines of inside the
> `.T` code there being a check on whether a particular future item was
> present in the environment. But thinking more, I can see that it is not
> trivial to get to know something about the environment in which the code
> that called you was written.
>
> So, it seems there is no (simple) way to tell numpy that inside a given
> module you want `.T` to have the new behaviour, but still to warn if
> outside the module it is used in the old way (when risky)?
>
> -- Marten
>
> p.s. I'm somewhat loath to add new properties to ndarray, but `.T` and
> `.H` have such obvious and clear meaning to anyone dealing with (complex)
> matrices that I think it is worth it. See
> https://mail.python.org/pipermail/numpy-discussion/2019-June/079584.html
> for a list of options of attributes that we might deprecate "in exchange"...
>
> All the best,
>
> Marten


Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-24 Thread Hameer Abbasi
Given that np.dot and np.matmul do the right thing for 1-D arrays, I would be 
opposed to introducing an error as well.



Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-24 Thread Marten van Kerkwijk
Dear Hameer, Ilhan,

Just to be sure: for a 1-d array, you'd both consider `.T` giving a shape
of `(n, 1)` the right behaviour? I.e., it should still change from what it
is now - which is to leave the shape at `(n,)`.

Your argument about `dot` and `matmul` having similar behaviour certainly
adds weight (but then, as I wrote before, my opinion on this changes by the
second, so I'm very happy to defer to others who have a clearer sense of
what is the right thing to do here!).

I think my main worry now is how to make it possible to use the new
behaviour without having to wait 4-6 releases...

All the best,

Marten


Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-24 Thread Hameer Abbasi
Hello Marten,

I was suggesting not changing the shape at all, since dot/matmul/solve do the 
right thing already in such a case.

In my proposal, only for ndim >=2 do we switch the last two dimensions.

Ilhan is right that adding a special case for ndim=1 (error) adds programmer
overhead, which I feel is against the general philosophy of NumPy.
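
A sketch of the difference using only current NumPy (`np.swapaxes` stands
in for the proposed behaviour; nothing here is a new API):

import numpy as np

a = np.arange(24).reshape(2, 3, 4)
a.T.shape                     # (4, 3, 2): current .T reverses all axes
np.swapaxes(a, -2, -1).shape  # (2, 4, 3): swap only the last two dimensions
np.arange(3).T.shape          # (3,): the 1D case stays a no-op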



Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-24 Thread Todd
I think we need to do something about the 1D case.  I know that from a
strict mathematical standpoint it doesn't do anything, and that
philosophically we should avoid special cases, but the current solution
leads to enough confusion, and to silently doing an unexpected thing, that
I think we need a better approach.

Personally I think it is a nonsensical operation and so should result in an
exception, but at the very least I think it needs to raise a warning.



Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-24 Thread Alan Isaac

Points of reference:
Mathematica: https://reference.wolfram.com/language/ref/Transpose.html
Matlab: https://www.mathworks.com/help/matlab/ref/permute.html

Personally I would find any divergence between a.T and a.transpose()
to be rather surprising.

Cheers, Alan Isaac


Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-24 Thread Todd
I think the corresponding MATLAB function/operation is this:

https://www.mathworks.com/help/matlab/ref/transpose.html




Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-24 Thread Stephan Hoyer
On Sun, Jun 23, 2019 at 10:05 PM Stewart Clelland 
wrote:

> Hi All,
>
> Based on discussion with Marten on github
> , I have a couple of
> suggestions on syntax improvements on array transpose operations.
>
> First, introducing a shorthand for the Hermitian Transpose operator. I
> thought "A.HT" might be a viable candidate.
>

I agree that short-hand for the Hermitian transpose would make sense,
though I would try to stick with "A.H". It's one of the last reasons to
prefer the venerable np.matrix. NumPy arrays already have loads of
methods/properties, and this is a case (like @ for matrix multiplication)
where the operator significantly improves readability: consider "(x.H @
M @ x) / (x.H @ x)" vs "(x.conj().T @ M @ x) / (x.conj().T @ x)" [1].
Nearly everyone who does linear algebra with complex numbers would find
this useful.
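
For concreteness, the quoted expression runs today with the verbose
spelling; ".H" is the proposed shorthand, not an existing ndarray attribute:

import numpy as np

M = np.array([[2, 1 - 1j], [1 + 1j, 3]])  # Hermitian: M equals M.conj().T
x = np.array([1 + 0j, 1j])

xH = x.conj().T  # today's spelling of the proposed x.H
r = (xH @ M @ x) / (xH @ x)  # Rayleigh quotient; real-valued for Hermitian M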

If I recall correctly, the last time this came up, it was suggested that we
might implement this as a NumPy view with a "complex conjugate" dtype rather
than a memory copy. This would allow the operation to be essentially free.
I find this very appealing, both due to symmetry with ".T" and because of
the principle that properties should be cheap to compute.

So my tentative vote would be (1) yes, let's do the short-hand attribute,
but (2) let's wait until we have a complex conjugate dtype that does this
efficiently. My hope is that this should be relatively doable in a year or
two after the current dtype refactor/usability effort comes to fruition.

Best,
Stephan

[1]  I copied the first non-trivial example off the Wikipedia page for a
Hermitian matrix:  https://en.wikipedia.org/wiki/Hermitian_matrix


Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-24 Thread Todd
I would call it .CT or something like that, based on the term "Conjugate
transpose".  Wikipedia redirects "Hermitian transpose" to "Conjugate
transpose", and google has 49,800 results for "Hermitian transpose" vs
201,000 for "Conjugate transpose" (both with quotes).  So "Conjugate
transpose" seems to be the more widely-known name.  Further, I think what a
"Conjugate transpose" does is immediately obvious to someone who isn't
already familiar with the term so long as they know what a "conjugate" and
"transpose" are, while no one would be able to tell what a "Hermitian
transpose" unless they are already familiar with the name.  So I have no
problem calling it a "Hermitian transpose" somewhere in the docs, but I
think the naming and documentation should focus on the "Conjugate
transpose" term.


Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-24 Thread Alan Isaac

Iirc, that works only on (2-d) matrices.
Cheers, Alan Isaac


On 6/24/2019 10:45 AM, Todd wrote:

I think the corresponding MATLAB function/operation is this:

https://www.mathworks.com/help/matlab/ref/transpose.html




Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Allan Haldane
On 6/22/19 11:50 AM, Marten van Kerkwijk wrote:
> Hi Allan,
> 
> I'm not sure I would go too much by what the old MaskedArray class did. 
> It indeed made an effort not to overwrite masked values with a new 
> result, even to the extent of copying back masked input data elements to 
> the output data array after an operation. But the fact that this is 
> non-sensical if the dtype changes (or the units in an operation on 
> quantities) suggests that this mental model simply does not work.
> 
> I think a sensible alternative mental model for the MaskedArray class is 
> that all it does is forward any operations to the data it holds and 
> separately propagate a mask, ORing elements together for binary 
> operations, etc., and explicitly skipping masked elements in reductions 
> (ideally using `where` to be as agnostic as possible about the 
> underlying data, for which, e.g., setting masked values to `0` for 
> `np.add.reduce` may or may not be the right thing to do - what if they 
> are strings?)
> With this mental picture, the underlying data always have a 
> well-defined meaning: they have been operated on as if the mask did not 
> exist. There then is also less reason to try to avoid getting it back to 
> the user.
> 
> As a concrete example (maybe Ben has others): in astropy we have a 
> sigma-clipping average routine, which uses a `MaskedArray` to 
> iteratively mask items that are too far off from the mean; here, the 
> mask varies each iteration (an initially masked element can come back 
> into play), but the data do not.
> 
> All the best,
> 
> Marten

I want to distinguish 3 different behaviors we are discussing:

 1. Making a "no-clobber" guarantee on the underlying data
 2. whether the data under the mask should have "sensible" values
 3. whether to allow "unmasking"



1. For "no clobber"
===

I agree this would be a nice guarantee to make, as long as it does not
impose too much burden on us. Sometimes it is better not to chain
ourselves down.

By using "where", I can indeed make many api-functions and ufuncs
guarantee no-clobber. There are still a bunch of functions that seem
tricky to implement currently without either clobbering or making a
copy: dot, cross, outer, sort/argsort/searchsorted, correlate, convolve,
nonzero, and similar functions.

I'll think about how to implement those in a no-clobber way. Any
suggestions welcome, eg for np.dot/outer.

A no-clobber guarantee makes your "iterative mask" example solvable in
an efficient (no-copy) way:

mask = last_mask = False
while True:
    dat_mean = np.mean(MaskedArray(data, mask))
    mask, last_mask = np.abs(data - mask) > cutoff, mask
    if np.all(mask == last_mask):
        break

The MaskedArray constructor should have pretty minimal overhead.


2. Whether we can make "masked" data keep sensible values
=

I'm not confident this is a good guarantee to make. Certainly in a
simple case like c = a + b we can make masked values in c contain the
correct sum of the masked data in a and b. But I foresee complications
in other cases. For instance,

MaskedArray([4,4,4])/MaskedArray([0, 1, 2], mask=[1,0,1])

If we use the "where" ufunc argument to evaluate the division operation,
a division-by-0 warning is not output which is good. However, if we use
"where" index 2 does not get updated correctly and will contain
"nonsense". If we use "where" a lot (which I think we should) we can
expect a lot of uninitialized masked values to commonly appear.
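
A sketch of that effect with a plain ufunc call (the out elements skipped
by "where" simply keep whatever happened to be in the buffer):

import numpy as np

a = np.array([4., 4., 4.])
b = np.array([0., 1., 2.])
mask = np.array([True, False, True])

out = np.empty_like(a)  # uninitialized buffer
np.divide(a, b, out=out, where=~mask)  # no division-by-zero warning
# out[1] == 4.0, but out[0] and out[2] were never written: "nonsense" values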

So my interpretation is that this comment:

> I think a sensible alternative mental model for the MaskedArray class
is that all it does is forward any operations to the data it holds and
separately propagate a mask

should only roughly be true, in the sense that we will not simply
"forward any operations" but will also use "where" arguments, which
produce nonsense masked values.


3. Whether to allow unmasking
=

If we agree that masked values will contain nonsense, it seems like a
bad idea for those values to be easily exposed.

Further, in all the comments so far I have not seen an example of a need
for unmasking that is not more easily, efficiently and safely achieved
by simply creating a new MaskedArray with a different mask.

If super-users want to access the ._data attribute they can, but I don't
think it should be recommended.

Normal users can use the ".filled" method, which by the way I
implemented to optionally support returning a readonly view rather than
a copy (the default).
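
A small usage sketch of that method (the signature is assumed from the
description above, not a fixed API):

m = MaskedArray(np.arange(3.), mask=[False, True, False])
m.filled(0.0)  # array([0., 0., 2.]): the masked slot replaced by the fill
# by default this is a copy; the option described above would instead
# hand back a readonly view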

Cheers,
Allan

> 
> On Sat, Jun 22, 2019 at 10:54 AM Allan Haldane  > wrote:
> 
> On 6/21/19 2:37 PM, Benjamin Root wrote:
>  > Just to note, data that is masked isn't always garbage. There are
> plenty
>  > of use-cases where one may want to temporarily apply a mask for a
> set of
>  > computation, or possibly want to apply a series of di

Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-24 Thread Stephan Hoyer

Sure, we should absolutely document the name as the "Conjugate transpose".
But the standard mathematical notation is definitely "A^H" rather than
"A^{CT}".




Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Allan Haldane
On 6/23/19 6:58 PM, Eric Wieser wrote:
> I think we'd need to consider separately the operation on the mask
> and on the data. In my proposal, the data would always do
> `np.sum(array, where=~mask)`, while how the mask would propagate
> might depend on the mask itself,
> 
> I quite like this idea, and I think Stephan's strawman design is
> actually plausible, where `MaskedArray.mask` is either an `InvalidMask`
> or an `IgnoreMask` instance to pick between the different propagation
> types. Both classes could simply have an underlying `._array` attribute
> pointing to a duck-array of some kind that backs their boolean data.
> 
> The second version requires that you *also* know how Mask classes
> work, and how they implement +
> 
> I remain unconvinced that Mask classes should behave differently on
> different ufuncs. I don't think `np.minimum(ignore_na, b)` is any
> different to `np.add(ignore_na, b)` - either both should produce `b`, or
> both should produce `ignore_na`. I would lean towards producing
> `ignore_na`, and propagation behavior differing between "ignore" and
> "invalid" only for `reduce` / `accumulate` operations, where the concept
> of skipping an application is well-defined.
> 
> Some possible follow-up questions that having two distinct masked types
> raise:
> 
>   * what if I want my data to support both invalid and skip fields at
> the same time? sum([invalid, skip, 1]) == invalid
>   * is there a use case for more than these two types of mask?
> `invalid_due_to_reason_A`, `invalid_due_to_reason_B` would be
> interesting things to track through a calculation, possibly a
> dictionary of named masks.
> 
> Eric

General comments on the last few emails:

For now I intentionally decided not to worry about NA masks in my
implementation. I want to get a first working implementation finished
for IGNORE style.

I agree it would be nice to have some support for NA style later, either
by a new MaskedArray subclass, a ducktype'd .mask attribute, or by some
other modification. In the latter category, consider that currently the
mask is stored as a boolean (1-byte) mask. One idea I have not put much
thought into is that internally we could make the mask a uint8, so that
unmasked would be 0, IGNORE would be 1, and NA would be 2.
That allows mixing of mask types. Not sure how messy it would be to
implement.
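
A tiny sketch of that encoding (the three values are the ones suggested
above; everything else is hypothetical):

import numpy as np

UNMASKED, IGNORE, NA = 0, 1, 2
mask = np.array([UNMASKED, IGNORE, NA, UNMASKED], dtype=np.uint8)

is_masked = mask != UNMASKED  # "masked in any way", equals the old bool mask
is_na = mask == NA            # the propagating subset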

For the idea of replacing the mask by a ducktype for NA style, my
instinct is that would be tricky. Many of the MaskedArray
__array_function__ method implementations, like sort and dot and many
others, do "complicated" computations using the mask that I don't think
you could easily get to work right by simply substituting a mask
ducktype. I think those would need to be reimplemented for NA style, in
other words you would need to make a MaskedArray subclass anyway.

Cheers,
Allan


> On Sun, 23 Jun 2019 at 15:28, Stephan Hoyer  > wrote:
> 
> On Sun, Jun 23, 2019 at 11:55 PM Marten van Kerkwijk
> mailto:m.h.vankerkw...@gmail.com>> wrote:
> 
> Your proposal would be something like np.sum(array,
> where=np.ones_like(array))? This seems rather verbose for a
> common operation. Perhaps np.sum(array, where=True) would
> work, making use of broadcasting? (I haven't actually
> checked whether this is well-defined yet.)
> 
> I think we'd need to consider separately the operation on the
> mask and on the data. In my proposal, the data would always do
> `np.sum(array, where=~mask)`, while how the mask would propagate
> might depend on the mask itself, i.e., we'd have different mask
> types for `skipna=True` (default) and `False` ("contagious")
> reductions, which differed in doing `logical_and.reduce` or
> `logical_or.reduce` on the mask.
> 
> 
> OK, I think I finally understand what you're getting at. So suppose
> this this how we implement it internally. Would we really insist on
> a user creating a new MaskedArray with a new mask object, e.g., with
> a GreedyMask? We could add sugar for this, but certainly
> array.greedy_masked().sum() is significantly less clear than
> array.sum(skipna=False).
> 
> I'm also a little concerned about a proliferation of
> MaskedArray/Mask types. New types are significantly harder to
> understand than new functions (or new arguments on existing
> functions). I don't know if we have enough distinct use cases for
> this many types.
> 
> Are there use-cases for propagating masks separately from
> data? If not, it might make sense to only define mask
> operations along with data, which could be much simpler.
> 
> 
> I had only thought about separating out the concern of mask
> propagation from the "MaskedArray" class to the mask proper, but
> it might indeed make things easier if the mask a

Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Allan Haldane
On 6/24/19 11:46 AM, Allan Haldane wrote:
> A no-clobber guarantee makes your "iterative mask" example solvable in
> an efficient (no-copy) way:
> 
> mask = last_mask = False
> while True:
>     dat_mean = np.mean(MaskedArray(data, mask))
>     mask, last_mask = np.abs(data - mask) > cutoff, mask
>     if np.all(mask == last_mask):
>         break

Whoops, that should read "np.abs(data - dat_mean)" in there.

Allan


Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Stephan Hoyer
On Mon, Jun 24, 2019 at 8:46 AM Allan Haldane 
wrote:

>  1. Making a "no-clobber" guarantee on the underlying data
>

Hi Allan -- could you kindly clarify what you mean by "no-clobber"?

Is this referring to allowing masked arrays to mutate masked data values
in-place, even on apparently non-in-place operators? If so, that definitely
seems like a bad idea to me. I would much rather do an unnecessary copy
than have surprisingly non-thread-safe operations.


>  If we agree that masked values will contain nonsense, it seems like a
> bad idea for those values to be easily exposed.
>
> Further, in all the comments so far I have not seen an example of a need
> for unmasking that is not more easily, efficiently and safely achieved
> by simply creating a new MaskedArray with a different mask.


My understanding is that essentially every low-level MaskedArray function
is implemented by looking at the data and mask separately. If so, we should
definitely expose this API directly to users (as part of the public API for
MaskedArray), so they can write their own efficient algorithms.

As a concrete example, suppose I wanted to implement a low-level "grouped
mean" operation for masked arrays like that found in pandas. This isn't a
builtin NumPy function, so I would need to write this myself. This would be
relatively straightforward to do in Numba or Cython with raw NumPy arrays
(e.g., see my example here for a NaN skipping version:
https://github.com/shoyer/numbagg/blob/v0.1.0/numbagg/grouped.py), but to
do it efficiently you definitely don't want to make an unnecessary copy.
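
As a sketch of what that looks like once the data and mask are exposed
(pure NumPy; "labels" assigns each element to a group, and the function
name is just illustrative):

import numpy as np

def grouped_mean(data, mask, labels, ngroups):
    # count and sum only the unmasked elements of each group
    valid = ~mask
    sums = np.bincount(labels[valid], weights=data[valid], minlength=ngroups)
    counts = np.bincount(labels[valid], minlength=ngroups)
    return sums / counts  # NaN for groups with no unmasked elements

data = np.array([1., 2., 3., 4.])
mask = np.array([False, True, False, False])
grouped_mean(data, mask, np.array([0, 0, 1, 1]), 2)  # array([1. , 3.5])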

The usual reason for hiding implementation details is when we want to
reserve the right to change them. But if we're sure about the data model
(which I think we are for MaskedArray) then I think there's a lot of value
in exposing it directly to users, even if it's lower level than is
appropriate to use in most cases.


Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Allan Haldane
On 6/24/19 12:16 PM, Stephan Hoyer wrote:
> On Mon, Jun 24, 2019 at 8:46 AM Allan Haldane  > wrote:
> 
>  1. Making a "no-clobber" guarantee on the underlying data
> 
> 
> Hi Allan -- could you kindly clarify what you mean by "no-clobber"?
> 
> Is this referring to allowing masked arrays to mutate masked data values
> in-place, even on apparently non-in-place operators? If so, that
> definitely seems like a bad idea to me. I would much rather do an
> unnecessary copy than have surprisingly non-thread-safe operations.

Yes. In my current implementation, the operation:

 >>> a = np.arange(6)
 >>> m = MaskedArray(a, mask=a < 3)
 >>> res = np.dot(m, m)

will clobber elements of a. It appears that to avoid clobbering we will
need to have np.dot make a copy. I also discuss how my implementation
clobbers views in the docs:

https://github.com/ahaldane/ndarray_ducktypes/blob/master/doc/MaskedArray.md#views-and-copies

I expect I could be convinced to make a no-clobber guarantee, if others
agree it is better to accept the performance loss by making a copy
internally.

I just still have a hard time thinking of cases where clobbering is
really that confusing, or easily avoidable by the user making an
explicit copy. I like giving the user control over whether a copy is
made or not, since I expect in the vast majority of cases a copy is
unnecessary.

I think it will be rare for people to hold on to the data array
("a" in the example above). Most of the time you create the MaskedArray
on data created on the spot, which you never touch directly again. We are
all already used to numpy's "view" behavior (eg, for the np.array
function), where if you don't explicitly make a copy of your original
array you can expect further operations to modify it. Admittedly for
MaskedArray it's a bit different, since apparently read-only operations
like np.dot can clobber, but again, it doesn't seem hard to know about
or burdensome to avoid by an explicit copy, and it can give big performance
advantages.

> 
>  If we agree that masked values will contain nonsense, it seems like a
> bad idea for those values to be easily exposed.
> 
> Further, in all the comments so far I have not seen an example of a need
> for unmasking that is not more easily, efficiently and safely achieved
> by simply creating a new MaskedArray with a different mask.
> 
> 
> My understanding is that essentially every low-level MaskedArray
> function is implemented by looking at the data and mask separately. If
> so, we should definitely expose this API directly to users (as part of
> the public API for MaskedArray), so they can write their own efficient
> algorithms.>
> As a concrete example, suppose I wanted to implement a low-level
> "grouped mean" operation for masked arrays like that found in pandas.
> This isn't a builtin NumPy function, so I would need to write this
> myself. This would be relatively straightforward to do in Numba or
> Cython with raw NumPy arrays (e.g., see my example here for a NaN
> skipping
> version: https://github.com/shoyer/numbagg/blob/v0.1.0/numbagg/grouped.py),
> but to do it efficiently you definitely don't want to make an
> unnecessary copy.
> 
> The usual reason for hiding implementation details is when we want to
> reserve the right to change them. But if we're sure about the data model
> (which I think we are for MaskedArray) then I think there's a lot of
> value in exposing it directly to users, even if it's lower level than is
> appropriate to use in most cases.

Fair enough, I think it is all right to allow people access to ._data
and make some guarantees about it if they are implementing subclasses or
defining new ducktypes.

There should be a section in the documentation describing what
guarantees we make about ._data (or ._array if we change the name) and
how/when to use it.

Best,
Allan




[Numpy-discussion] NumPy Community Meeting Thursday, June 26

2019-06-24 Thread Sebastian Berg
Hi all,

There will be a NumPy Community meeting on _Thursday_ June 26 at 11 am
Pacific Time. Everyone is invited to join in and edit the work-in-
progress meeting notes: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both

Best wishes

Sebastian


PS: We decided to move to Thursday, since at least Matti is traveling.
If you have something to discuss and Thursday is not good, we can do
Wednesday as well.




Re: [Numpy-discussion] NumPy Community Meeting Thursday, June 27

2019-06-24 Thread Sebastian Berg
Sorry, Thursday is the 27th of course.




Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-24 Thread Marten van Kerkwijk
Hi Stephan,

Yes, the complex conjugate dtype would make things a lot faster, but I
don't quite see why we would wait for that with introducing the `.H`
property.

I do agree that `.H` is the correct name, giving most immediate clarity
(i.e., people who know what a conjugate transpose is will recognize it,
but would likely have to look up `.CT`, while people who do not know it
will have to look it up regardless). But at the same time I agree that the
docstring and other documentation should start with "Conjugate transpose" -
good to try to avoid using names of people where you have to be in the
"in crowd" to know what it means.

The above said, if we were going with the initial suggestion of `.MT` for
matrix transpose, then I'd prefer `.CT` over `.HT` as its conjugate version.

But it seems there is little interest in that suggestion, although sadly a
clear path forward has not yet emerged either.

All the best,

Marten


Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Marten van Kerkwijk
Hi Allan,

Thanks for bringing up the noclobber explicitly (and Stephan for asking for
clarification; I was similarly confused).

It does clarify the difference in mental picture. In mine, the operation
would indeed be guaranteed to be done on the underlying data, without copy
and without `.filled(...)`. I should clarify further that I use `where`
only to skip reading elements (i.e., in reductions), not writing elements
(as you mention, the unwritten element will often be nonsense - e.g., wrong
units - which to me is worse than infinity or something similar; I've not
worried at all about runtime warnings). Note that my main reason here is
not that I'm against filling with numbers for numerical arrays, but rather
wanting to make minimal assumptions about the underlying data itself. This
may well be a mistake (but I want to find out where it breaks).

Anyway, it would seem in many ways all the better that our models are quite
different. I definitely see the advantages of your choice to decide one can
do with masked data elements whatever is logical ahead of an operation!

Thanks also for bringing up a useful example with `np.dot(m, m)` - clearly,
I didn't yet get beyond overriding ufuncs!

In my mental model, where I'd apply `np.dot` on the data and the mask
separately, the result will be wrong, so the mask has to be set (which it
would be). For your specific example, that might not be the best solution,
but when using `np.dot(matrix_shaped, matrix_shaped)`, I think it does give
the correct masking: any masked element in a matrix better propagate to all
parts that it influences, even if there is a reduction of sorts happening.
So, perhaps a price to pay for a function that tries to do multiple things.

The alternative solution in my model would be to replace `np.dot` with a
masked-specific implementation of what `np.dot` is supposed to stand for
(in your simple example, `np.add.reduce(np.multiply(m, m))` - more
generally, add relevant `outer` and `axes`). This would be similar to what
I think all implementations do for `.mean()` - we cannot calculate that
from the data using any fill value or skipping, so rather use a more easily
cared-for `.sum()` and divide by a suitable number of elements. But in both
examples the disadvantage is that we took away the option to use the
underlying class's `.dot()` or `.mean()` implementations.
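
For the plain 2-D case, a sketch of that rewrite in terms of basic ufunc
calls (ordinary arrays shown; a masked version would forward the same two
calls to the data and handle the mask separately):

import numpy as np

def dot_via_ufuncs(a, b):
    # outer elementwise product, shape (n, k, m), then reduce the shared axis
    return np.add.reduce(np.multiply(a[:, :, None], b[None, :, :]), axis=1)

a = np.arange(6.).reshape(2, 3)
b = np.arange(12.).reshape(3, 4)
np.allclose(dot_via_ufuncs(a, b), np.dot(a, b))  # True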

(Aside: considerations such as these underlie my longed-for exposure of
standard implementations of functions in terms of basic ufunc calls.)

Another example of a function for which I think my model is not
particularly insightful (and for which it is difficult to know what to do
generally) is `np.fft.fft`. Since an fft is equivalent to a sine/cosine
fit to data points, the answer for masked data is in principle quite
well-defined. But much less easy to implement!

All the best,

Marten


Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Eric Firing

On 2019/06/24 9:09 AM, Marten van Kerkwijk wrote:
Another example of a function for which I think my model is not 
particularly insightful (and for which it is difficult to know what to 
do generally) is `np.fft.fft`. Since an fft is equivalent to a 
sine/cosine fit to data points, the answer for masked data is in 
principle quite well-defined. But much less easy to implement!


How is it well-defined?  If you have N points of which M are masked, you 
have a vector with N-M pieces of information in the time domain.  That 
can't be *uniquely* mapped to N points in the frequency domain.  I think 
fft applied to a MaskedArray input should raise an exception, at least 
if there are any masked points.  It must be left to the user to decide 
how to handle the masked points, e.g. fill with zero, or with the mean, 
or with linear interpolation, etc.
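
One way a user might spell those choices explicitly before calling the fft
(a sketch with a boolean mask; the variable names are just illustrative):

import numpy as np

data = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
mask = np.array([False, False, True, False, False, True, False, False])

filled_zero = np.where(mask, 0.0, data)                 # fill with zero
filled_mean = np.where(mask, data[~mask].mean(), data)  # fill with the mean
np.fft.fft(filled_zero)  # the user, not the library, made the choice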


Eric


Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Marten van Kerkwijk
Hi Eric,

The easiest definitely is for the mask to just propagate, which means that
even if just one point is masked, all points in the fft will be masked.

On the direct point I made, I think it is correct that, since one can
think of the Fourier transform as a sine/cosine fit, there is a solution
even in the presence of some masked data, and this solution is distinct
from that for a specific choice of fill value. But of course it is also
true that the solution will be at least partially degenerate in its result
true that the solution will be at least partially degenerate in its result
and possibly indeterminate (e.g., for the extreme example of a real
transform for which all but the first point are masked, all cosine term
amplitudes are equal to the value of the first term, and are completely
degenerate with each other, and all sine term amplitudes are indeterminate;
one has only one piece of information, after all). Yet the inverse of any
of those choices reproduces the input. That said, clearly there is a choice
to be made whether this solution is at all interesting, which means that
you are right that it needs an explicit user decision.

All the best,

Marten


Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Warren Weckesser

FWIW: The discrete Fourier transform is equivalent to a matrix
multiplication (https://en.wikipedia.org/wiki/DFT_matrix,
https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.dft.html),
so whatever behavior you define for a nonmasked array times a masked
vector also applies to the FFT of a masked vector.
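
Concretely, with the scipy.linalg.dft linked above:

import numpy as np
from scipy.linalg import dft

x = np.arange(4.)
D = dft(4)  # the 4x4 DFT matrix
np.allclose(D @ x, np.fft.fft(x))  # True: the FFT is this matrix product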

Warren




Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Eric Firing

On 2019/06/24 11:39 AM, Marten van Kerkwijk wrote:

Hi Eric,

The easiest definitely is for the mask to just propagate, which means that 
even if just one point is masked, all points in the fft will be masked.


This is perfectly reasonable, and consistent with what happens with 
nans, of course.  My suggestion of raising an Exception is probably not 
a good idea, as I realized shortly after sending the message.


As a side note, I am happy to see the current burst of effort toward 
improved MaskedArray functionality, and very grateful for the work you, 
Allan, and others are doing in that regard.


Eric


Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Charles R Harris
>
Might be a good fit with the NUFFT.

Chuck


Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-24 Thread Ilhan Polat
I think enumerating the cases along the way makes it a bit more tangible
for the discussion:


import numpy as np
z = 1+1j
z.conjugate()  # 1-1j

zz = np.array(z)
zz  # array(1+1j)
zz.T  # array(1+1j)  # OK expected.
zz.conj()  # 1-1j ?? what happened; no arrays?
zz.conjugate()  # 1-1j ?? same

zz1d = np.array([z]*3)
zz1d.T  # no change so this is not the regular 2D array
zz1d.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
zz1d.conj().T  # array([1.-1.j, 1.-1.j, 1.-1.j])
zz1d.T.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
zz1d[:, None].conj()  # 2D column vector - no surprises if [:, None] is known

zz2d = zz1d[:, None]  # 2D column vector - no surprises if [:, None] is known
zz2d.conj()  # 2D col vec conjugated
zz2d.conj().T  # 2D col vec conjugated transposed

zz3d = np.arange(24.).reshape(2,3,4).view(complex)
zz3d.conj()  # no surprises, conjugated
zz3d.conj().T  # ?? Why not the last two dims swapped like other stacked ops

# For scalar arrays conjugation strips the number
# For 1D arrays transpose is a no-op but conjugation works
# For 2D arrays conjugate it is the matlab's elementwise conjugation op .'
# and transpose is acting like expected
# For 3D arrays conjugate it is the matlab's elementwise conjugation op .'
# but transpose is the reversing all dims just like matlab's permute()
# with static dimorder.

and so on. Maybe we can try to identify all the use cases and the quirks
before we design the solution, because these are a bit more involved and I
don't even know if this list is exhaustive.




Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Allan Haldane

Just to note, my current implementation uses the IGNORE style of mask,
so does not propagate the mask in np.dot:

>>> a = MaskedArray([[1,1,1], [1,X,1], [1,1,1]])
>>> np.dot(a, a)

MaskedArray([[3, 2, 3],
             [2, 2, 2],
             [3, 2, 3]])

I'm not at all set on that behavior and we can do something else. For
now, I chose this way since it seemed to best match the "IGNORE" mask
behavior.
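
For concreteness, the result above is what you get by treating masked
entries as contributing nothing, i.e. filling them with zero before the
product. A minimal sketch with a plain data/mask pair (the MaskedArray
internals here are hypothetical):

import numpy as np

data = np.ones((3, 3))
mask = np.zeros((3, 3), dtype=bool)
mask[1, 1] = True                   # the X in the example above

filled = np.where(mask, 0, data)    # IGNORE: masked entries drop out
print(np.dot(filled, filled))
# [[3. 2. 3.]
#  [2. 2. 2.]
#  [3. 2. 3.]]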

The behavior you described further above where the output row/col would
be masked corresponds better to "NA" (propagating) mask behavior, which
I am leaving for later implementation.

best,
Allan

> 
> (Aside: considerations such as these underlie my longed-for exposure of
> standard implementations of functions in terms of basic ufunc calls.)
> 
> Another example of a function for which I think my model is not
> particularly insightful (and for which it is difficult to know what to
> do generally) is `np.fft.fft`. Since an fft is equivalent to a
> sine/cosine fits to data points, the answer for masked data is in
> principle quite well-defined. But much less easy to implement!
> 
> All the best,
> 
> Marten
> 



Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Stephan Hoyer
On Mon, Jun 24, 2019 at 3:56 PM Allan Haldane 
wrote:

> I'm not at all set on that behavior and we can do something else. For
> now, I chose this way since it seemed to best match the "IGNORE" mask
> behavior.
>
> The behavior you described further above where the output row/col would
> be masked corresponds better to "NA" (propagating) mask behavior, which
> I am leaving for later implementation.


This does seem like a clean way to *implement* things, but from a user
perspective I'm not sure I would want separate classes for "IGNORE" vs "NA"
masks.

I tend to think of "IGNORE" vs "NA" as descriptions of particular
operations rather than of the data itself. There is a spectrum of ways to
handle missing data, and the right way to propagate missing values is
often highly context dependent. The right place to set this is in the
functions where operations are defined, not on classes that may be defined
far away from where the computation happens. For example, pandas has a
"min_count" parameter in functions for intermediate use-cases between
"IGNORE" and "NA" semantics, e.g., "take an average, unless the number of
data points is fewer than min_count."
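
As a sketch of what such an operation-level option could look like on top
of a plain data/mask pair (the helper and its signature are hypothetical,
merely mirroring the pandas idea):

import numpy as np

def masked_mean(data, mask, min_count=1):
    """Mean over unmasked elements; NaN if fewer than min_count remain."""
    n = np.count_nonzero(~mask)
    if n < min_count:
        return np.nan
    return np.add.reduce(data, where=~mask) / n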

Are there examples of existing projects that define separate user-facing
types for different styles of masks?


Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Marten van Kerkwijk
Hi Allan,

> The alternative solution in my model would be to replace `np.dot` with a
> > masked-specific implementation of what `np.dot` is supposed to stand for
> > (in your simple example, `np.add.reduce(np.multiply(m, m))` - more
> > generally, add relevant `outer` and `axes`). This would be similar to
> > what I think all implementations do for `.mean()` - we cannot calculate
> > that from the data using any fill value or skipping, so rather use a
> > more easily cared-for `.sum()` and divide by a suitable number of
> > elements. But in both examples the disadvantage is that we took away the
> > option to use the underlying class's `.dot()` or `.mean()`
> implementations.
>
> Just to note, my current implementation uses the IGNORE style of mask,
> so does not propagate the mask in np.dot:
>
> >>> a = MaskedArray([[1,1,1], [1,X,1], [1,1,1]])
> >>> np.dot(a, a)
>
> MaskedArray([[3, 2, 3],
>              [2, 2, 2],
>              [3, 2, 3]])
>
> I'm not at all set on that behavior and we can do something else. For
> now, I chose this way since it seemed to best match the "IGNORE" mask
> behavior.
>

It is a nice example, I think. In terms of action on the data, one would
get this result as well in my pseudo-representation of
`np.add.reduce(np.multiply(m, m))` - as long as the multiply is taken as an
outer product along the relevant axes (which does not ignore the mask,
i.e., if either element is masked, the product is too), followed by a sum
which works like other reductions and skips masked elements.
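
A sketch of that pseudo-representation with a plain data/mask pair (same
hypothetical 3x3 example as above, using the `where` argument of
reductions):

import numpy as np

data = np.ones((3, 3))
mask = np.zeros((3, 3), dtype=bool)
mask[1, 1] = True

# outer product over the contracted axis; a term is masked if either
# factor is masked
prod = data[:, :, None] * data[None, :, :]
bad = mask[:, :, None] | mask[None, :, :]
print(np.add.reduce(prod, axis=1, where=~bad))
# [[3. 2. 3.]
#  [2. 2. 2.]
#  [3. 2. 3.]]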

From the FFT array multiplication analogy, though, it is not clear that,
effectively, replacing masked elements by 0 is reasonable.

Equivalently, thinking of `np.dot` in its 1-D form as giving the length
of the projection of one vector along another, it is not clear what a
single masked element is supposed to mean. In a way, masking just one
element of a vector or of a matrix makes vector or matrix operations
meaningless.

I thought fitting data with a mask might give a counterexample, but there
one usually calculates at some point r = y - A x, so there is no masking of
the matrix; the subtraction y - A x passes on the mask of y, and summing r
while ignoring masked elements does just the right thing.
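
In code, that fit might look something like this sketch (the data, mask and
design matrix are made up for illustration):

import numpy as np

A = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])   # unmasked matrix
y = np.array([0.5, 1.4, 2.6, 9.9])                       # last point is bad
mask = np.array([False, False, False, True])

x, *_ = np.linalg.lstsq(A[~mask], y[~mask], rcond=None)  # fit unmasked rows
r = y - A @ x                      # subtraction would pass on y's mask
chi2 = np.sum(r**2, where=~mask)   # summing r ignores masked elements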

All the best,

Marten


Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Marten van Kerkwijk
On Mon, Jun 24, 2019 at 7:21 PM Stephan Hoyer  wrote:

> On Mon, Jun 24, 2019 at 3:56 PM Allan Haldane 
> wrote:
>
>> I'm not at all set on that behavior and we can do something else. For
>> now, I chose this way since it seemed to best match the "IGNORE" mask
>> behavior.
>>
>> The behavior you described further above where the output row/col would
>> be masked corresponds better to "NA" (propagating) mask behavior, which
>> I am leaving for later implementation.
>
>
> This does seem like a clean way to *implement* things, but from a user
> perspective I'm not sure I would want separate classes for "IGNORE" vs "NA"
> masks.
>
> I tend to think of "IGNORE" vs "NA" as descriptions of particular
> operations rather than of the data itself. There is a spectrum of ways to
> handle missing data, and the right way to propagate missing values is
> often highly context dependent. The right place to set this is in the
> functions where operations are defined, not on classes that may be defined
> far away from where the computation happens. For example, pandas has a
> "min_count" parameter in functions for intermediate use-cases between
> "IGNORE" and "NA" semantics, e.g., "take an average, unless the number of
> data points is fewer than min_count."
>

Anything that specific is probably indeed outside the purview of a
MaskedArray class.

But your general point is well taken: we really need to ask clearly what
the mask means not in terms of operations but conceptually.

Personally, I guess like Benjamin I have mostly thought of it as "data here
is bad" (because corrupted, etc.) or "data here is irrelevant" (because of
sea instead of land in a map). And I would like to proceed nevertheless
with calculating things on the remainder. For an expectation value (or,
less obviously, a minimum or maximum), this is mostly OK: just ignore the
masked elements. But even for something as simple as a sum, what is correct
is not obvious: if I ignore the count, I'm effectively assuming the
expectation is symmetric around zero (this is why `vector.dot(vector)`
fails); a better estimate would be `np.add.reduce(data, where=~mask) *
N(total) / N(unmasked)`.
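
Spelled out with a plain data/mask pair, that estimate is (sketch):

import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([False, True, False, False])

n_total = data.size
n_unmasked = np.count_nonzero(~mask)
est = np.add.reduce(data, where=~mask) * n_total / n_unmasked
# (1 + 3 + 4) * 4 / 3 = 32/3 ~ 10.67, vs. the naive masked sum of 8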

Of course, the logical conclusion would be that this is not possible to do
without guidance from the user, or, thinking more, that really a masked
array is not at all what I want for this problem; really I am just using
(1-mask) as a weight, and the sum of the weights matters, so I should have
a WeightArray class where that is returned along with the sum of the data
(or, a bit less extreme, a `CountArray` class, or, more extreme, a value
and its uncertainty - heck, sounds a lot like my Variable class from 4
years ago, https://github.com/astropy/astropy/pull/3715, which even takes
care of covariance [following the Uncertainty package]).

OK, it seems I've definitely worked myself into a corner tonight where I'm
not sure any more what a masked array is good for in the first place...
I'll stop for the day!

All the best,

Marten


Re: [Numpy-discussion] new MaskedArray class

2019-06-24 Thread Stephan Hoyer
On Mon, Jun 24, 2019 at 5:36 PM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

>
>
> On Mon, Jun 24, 2019 at 7:21 PM Stephan Hoyer  wrote:
>
>> On Mon, Jun 24, 2019 at 3:56 PM Allan Haldane 
>> wrote:
>>
>>> I'm not at all set on that behavior and we can do something else. For
>>> now, I chose this way since it seemed to best match the "IGNORE" mask
>>> behavior.
>>>
>>> The behavior you described further above where the output row/col would
>>> be masked corresponds better to "NA" (propagating) mask behavior, which
>>> I am leaving for later implementation.
>>
>>
>> This does seem like a clean way to *implement* things, but from a user
>> perspective I'm not sure I would want separate classes for "IGNORE" vs "NA"
>> masks.
>>
>> I tend to think of "IGNORE" vs "NA" as descriptions of particular
>> operations rather than of the data itself. There is a spectrum of ways to
>> handle missing data, and the right way to propagate missing values is
>> often highly context dependent. The right place to set this is in the
>> functions where operations are defined, not on classes that may be
>> defined far away from where the computation happens. For example, pandas
>> has a "min_count" parameter in functions for intermediate use-cases
>> between "IGNORE" and "NA" semantics, e.g., "take an average, unless the
>> number of data points is fewer than min_count."
>>
>
> Anything that specific is probably indeed outside the purview of a
> MaskedArray class.
>

I agree that it doesn't make much sense to have a "min_count" attribute on
a MaskedArray class, but certainly it makes sense for operations on
MaskedArray objects, e.g., to write something like
masked_array.mean(min_count=10). This is what users do in pandas today.


> But your general point is well taken: we really need to ask clearly what
> the mask means not in terms of operations but conceptually.
>
> Personally, I guess like Benjamin I have mostly thought of it as "data
> here is bad" (because corrupted, etc.) or "data here is irrelevant"
> (because of sea instead of land in a map). And I would like to proceed
> nevertheless with calculating things on the remainder. For an expectation
> value (or, less obviously, a minimum or maximum), this is mostly OK: just
> ignore the masked elements. But even for something as simple as a sum, what
> is correct is not obvious: if I ignore the count, I'm effectively assuming
> the expectation is symmetric around zero (this is why `vector.dot(vector)`
> fails); a better estimate would be `np.add.reduce(data, where=~mask) *
> N(total) / N(unmasked)`.
>

I think it's fine and logical to define default semantics for operations on
MaskedArray objects. Much of the time, replacing masked values with 0 is
the right thing to do for sum. Certainly IGNORE semantics are more useful
overall than the NA semantics.

But even if a MaskedArray conceptually always represents "bad" or
"irrelevant" data, the way to handle those missing values will differ based
on the use case, and not everything will fall cleanly into either IGNORE or
NA buckets. I think it makes sense to provide users with functions/methods
that expose these options, rather than requiring that they convert their
data into a different type of MaskedArray.

"It is better to have 100 functions operate on one data structure than 10
functions on 10 data structures." —Alan Perlis
https://stackoverflow.com/questions/6016271/why-is-it-better-to-have-100-functions-operate-on-one-data-structure-than-10-fun


Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-24 Thread Cameron Blocker
I would love for there to be a .H property. I have .conj().T in almost every
math function that I write so that it will be general enough for complex
numbers.

Besides being less readable, what puts me in a bind is trying to
accommodate LinearOperator/LinearMap-like duck-typed objects in place of
matrix inputs, such as an object that does an FFT but acts like a matrix
and supports @. For my objects to work in my code, I have to create .conj()
and .T methods, which are not as simple as defining .H (the adjoint) for,
say, an FFT. Sometimes I just define .T to be the adjoint/conjugate
transpose, and .conj() to do nothing, so it will work with my code and I
can avoid making useless objects along the way, but then I am in a weird
state where np.asarray(A).T != np.asarray(A.T).

In my opinion, the matrix transpose operator and the conjugate transpose
operator should be one and the same. Something nice about both Julia and
MATLAB is that it takes more keystrokes to do a regular transpose than a
conjugate transpose. Then people who work exclusively with real
numbers can just forget that it's a conjugate transpose, and for relatively
simple algorithms, their code will just work with complex numbers with
little modification.

Ideally, I'd like to see a .H that was the de facto matrix/linear-algebra
conjugate transpose: for 2 or more dimensions it conjugate-transposes the
last two dimensions, and for 1 dimension it just conjugates (if necessary).
And then .T can stay the array/tensor transpose for general axis
manipulation. I'd be okay with .T raising an error/warning on 1D
arrays if .H did not. I commonly write things like u.conj().T@v even if I
know both u and v are 1D just so it looks more like an inner product.
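
A free-function sketch of those semantics (the name H is made up here; this
is not an existing NumPy API):

import numpy as np

def H(a):
    """Conjugate transpose: swap the last two axes and conjugate;
    for 0D/1D input, just conjugate."""
    a = np.asarray(a)
    if a.ndim < 2:
        return a.conj()
    return a.swapaxes(-1, -2).conj()

u = np.array([1+1j, 2-1j])
print(H(u) @ u)   # (7+0j), an honest inner product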

-Cameron

On Mon, Jun 24, 2019 at 6:43 PM Ilhan Polat  wrote:

> I think enumerating the cases along the way makes it a bit more tangible
> for the discussion
>
>
> import numpy as np
> z = 1+1j
> z.conjugate()  # 1-1j
>
> zz = np.array(z)
> zz  # array(1+1j)
> zz.T  # array(1+1j)  # OK expected.
> zz.conj()  # 1-1j ?? what happened; no arrays?
> zz.conjugate()  # 1-1j ?? same
>
> zz1d = np.array([z]*3)
> zz1d.T  # no change so this is not the regular 2D array
> zz1d.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
> zz1d.conj().T  # array([1.-1.j, 1.-1.j, 1.-1.j])
> zz1d.T.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
> zz1d[:, None].conj()  # 2D column vector - no surprises if [:, None] is known
>
> zz2d = zz1d[:, None]  # 2D column vector - no surprises if [:, None] is known
> zz2d.conj()  # 2D col vec conjugated
> zz2d.conj().T  # 2D col vec conjugated transposed
>
> zz3d = np.arange(24.).reshape(2,3,4).view(complex)
> zz3d.conj()  # no surprises, conjugated
> zz3d.conj().T  # ?? Why not the last two dims swapped like other stacked ops
>
> # For 0D (scalar) arrays conjugation returns a scalar, not an array
> # For 1D arrays transpose is a no-op but conjugation works
> # For 2D arrays conjugate is elementwise conjugation (MATLAB's conj())
> # and transpose acts as expected
> # For 3D arrays conjugate is again elementwise conjugation,
> # but transpose reverses all dims, just like MATLAB's permute()
> # with a fixed reversed dim order.
>
> and so on. Maybe we can try to identify all the use cases and the quirks
> before we design the solution, because these are a bit more involved and I
> don't even know if this list is exhaustive.
>
>
> On Mon, Jun 24, 2019 at 8:21 PM Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>> Hi Stephan,
>>
>> Yes, the complex conjugate dtype would make things a lot faster, but I
>> don't quite see why we would wait for that with introducing the `.H`
>> property.
>>
>> I do agree that `.H` is the correct name, giving most immediate clarity
>> (i.e., people who know what a conjugate transpose is will recognize it,
>> whereas they would likely have to look up `.CT`; people who do not know
>> will have to look it up regardless). But at the same time I agree that
>> the docstring and other documentation should start with "Conjugate
>> transpose" - good to try to avoid using names of people where you have to
>> be in the "in crowd" to know what it means.
>>
>> The above said, if we were going with the initial suggestion of `.MT` for
>> matrix transpose, then I'd prefer `.CT` over `.HT` as its conjugate version.
>>
>> But it seems there is little interest in that suggestion, although sadly
>> a clear path forward has not yet emerged either.
>>
>> All the best,
>>
>> Marten
>>