Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-07 Thread Ryan May
On Tue, Nov 7, 2017 at 1:20 PM, Chris Barker  wrote:

> On Mon, Nov 6, 2017 at 4:28 PM, Stephan Hoyer  wrote:
>
>>
>>> What's needed, though, is not just a single ABC. Some thought and design
>>> needs to go into segmenting the ndarray API to declare certain behaviors,
>>> just like was done for collections:
>>>
>>> https://docs.python.org/3/library/collections.abc.html
>>>
>>> You don't just have a single ABC declaring a collection, but rather "I
>>> am a mapping" or "I am a mutable sequence". It's more of a pain for
>>> developers to properly specify things, but this is not a bad thing to
>>> actually give code some thought.
>>>
>>
>> I agree, it would be nice to nail down a hierarchy of duck-arrays, if
>> possible. Although, there are quite a few options, so I don't know how
>> doable this is.
>>
>
> Exactly -- there are an exponential amount of options...
>
>
>> Well, to get the ball rolling a bit, the key thing that matplotlib needs
>> to know is if `shape`, `reshape`, 'size', broadcasting, and logical
>> indexing is respected. So, I see three possible abc's here: one for
>> attribute access (things like `shape` and `size`) and another for shape
>> manipulations (broadcasting and reshape, and assignment to .shape).
>
>
> I think we're going to get into an string of ABCs:
>
> ArrayLikeForMPL_ABC
>
> etc, etc.
>

Only if you try to provide perfectly-sized options for every occasion--but
that's not how we do things in (sane) software development. You provide a
few options that optimize the common use cases, and you don't try to cover
everything--let client code figure out the right combination from the
primitives you provide. One can always just inherit/register *all* the ABCs
if need be. The status quo is that we have 1 interface that covers
everything from multiple dims and shape to math and broadcasting to the
entire __array__ interface. Even breaking that up into the 3 "obvious"
chunks would be a massive improvement.

I just don't want to see this effort bog down into "this is so hard".
Getting it perfect is hard; getting it useful is much easier.

It's important to note that we can always break up/combine existing ABCs
into other ones later.


> And then a third abc for indexing support, although, I am not sure how
>> that could get implemented...
>
>
> This is the really tricky one -- all ABCs really check is the existence of
> methods -- making sure they behave the same way is up to the developer of
> the ducktype.
>
> which is K, but will require discipline.
>
> But indexing, specifically fancy indexing, is another matter -- I'm not
> sure if there even a way with an ABC to check for what types of indexing
> are support, but we'd still have the problem with whether the semantics are
> the same!
>
> For example, I work with netcdf variable objects, which are partly
> duck-typed as ndarrays, but I think n-dimensional fancy indexing works
> differently... how in the world do you detect that with an ABC???
>

Even documenting expected behavior as part of these ABCs would go a long
way towards helping standardize behavior.

Another idea would be to put together a conformance test suite as part of
this effort, in lieu of some kind of run-time checking of behavior (which
would be terrible). That would help developers of other "ducks" check that
they're doing the right things. I'd imagine the existing NumPy test suite
would largely cover this.

Ryan

-- 
Ryan May
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-07 Thread Chris Barker
On Mon, Nov 6, 2017 at 4:28 PM, Stephan Hoyer  wrote:

>
>> What's needed, though, is not just a single ABC. Some thought and design
>> needs to go into segmenting the ndarray API to declare certain behaviors,
>> just like was done for collections:
>>
>> https://docs.python.org/3/library/collections.abc.html
>>
>> You don't just have a single ABC declaring a collection, but rather "I am
>> a mapping" or "I am a mutable sequence". It's more of a pain for developers
>> to properly specify things, but this is not a bad thing to actually give
>> code some thought.
>>
>
> I agree, it would be nice to nail down a hierarchy of duck-arrays, if
> possible. Although, there are quite a few options, so I don't know how
> doable this is.
>

Exactly -- there are an exponential amount of options...


> Well, to get the ball rolling a bit, the key thing that matplotlib needs
> to know is if `shape`, `reshape`, 'size', broadcasting, and logical
> indexing is respected. So, I see three possible abc's here: one for
> attribute access (things like `shape` and `size`) and another for shape
> manipulations (broadcasting and reshape, and assignment to .shape).


I think we're going to get into an string of ABCs:

ArrayLikeForMPL_ABC

etc, etc.


> And then a third abc for indexing support, although, I am not sure how
> that could get implemented...


This is the really tricky one -- all ABCs really check is the existence of
methods -- making sure they behave the same way is up to the developer of
the ducktype.

which is K, but will require discipline.

But indexing, specifically fancy indexing, is another matter -- I'm not
sure if there even a way with an ABC to check for what types of indexing
are support, but we'd still have the problem with whether the semantics are
the same!

For example, I work with netcdf variable objects, which are partly
duck-typed as ndarrays, but I think n-dimensional fancy indexing works
differently... how in the world do you detect that with an ABC???

For the shapes and reshaping, I wrote an ShapedLikeNDArray mixin/ABC
> for astropy, which may be a useful starting point as it also provides
> a way to implement the methods ndarray uses to reshape and get
> elements: see
> https://github.com/astropy/astropy/blob/master/astropy/utils/misc.py#L863


Sounds like a good starting point for discussion.

-CHB



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-07 Thread Marten van Kerkwijk
Hi Benjamin,

For the shapes and reshaping, I wrote an ShapedLikeNDArray mixin/ABC
for astropy, which may be a useful starting point as it also provides
a way to implement the methods ndarray uses to reshape and get
elements: see 
https://github.com/astropy/astropy/blob/master/astropy/utils/misc.py#L863

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-06 Thread Ryan May
On Mon, Nov 6, 2017 at 3:18 PM, Chris Barker  wrote:

> Klunky, and maybe we could come up with a standard way to do it and
> include that in numpy, but I'm not sure that ABCs are the way to do it.
>

ABCs are *absolutely* the way to go about it. It's the only way baked into
the Python language itself that allows you to register a class for purposes
of `isinstance` without needing to subclass--i.e. duck-typing.

What's needed, though, is not just a single ABC. Some thought and design
needs to go into segmenting the ndarray API to declare certain behaviors,
just like was done for collections:

https://docs.python.org/3/library/collections.abc.html

You don't just have a single ABC declaring a collection, but rather "I am a
mapping" or "I am a mutable sequence". It's more of a pain for developers
to properly specify things, but this is not a bad thing to actually give
code some thought.

Ryan

-- 
Ryan May
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-06 Thread Chris Barker
On Sat, Nov 4, 2017 at 6:47 AM, Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

>
> You just summarized excellently why I'm on a quest to change `asarray`
> to `asanyarray` within numpy


+1 -- we should all be using asanyarray() most of the time. However a
couple notes:

asarray() pre-dates asanyarray() by a LOT. asanyarray was added to better
handle subclasses, but there is a lot of legacy code out there.

An legacy coders -- I know that I still usually use asarray without
thinking about it -- sorry!

Obviously, this covers only ndarray
> subclasses, not duck types, though I guess in principle one could use
> the ABC registration mechanism mentioned above to let those types pass
> through.
>

The trick there is that what does it mean to be duck-typed to an ndarray?
For may applications its' critical that the C API be the same, so
duck-typing doesn't really apply.

And in other cases, in only needs to support a small portion of the numpy
API. IS essence, there are an almost infinite number of possible ABCs for
an ndarray...

For my part, I've been known to write custom "array_like" code -- it checks
for the handful of methods I know I need to use, and tI test it against the
small handful of duck-typed arrays that I know I want my code to work with.

Klunky, and maybe we could come up with a standard way to do it and include
that in numpy, but I'm not sure that ABCs are the way to do it.


-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-04 Thread Marten van Kerkwijk
Hi Ben,

You just summarized excellently why I'm on a quest to change `asarray`
to `asanyarray` within numy (or at least add a `subok` keyword for
things like `broadcast_arrays`)! Obviously, this covers only ndarray
subclasses, not duck types, though I guess in principle one could use
the ABC registration mechanism mentioned above to let those types pass
through.

Returning to the original topic of the thread, with `__array_ufunc__`
it now is even easier to keep track of your meta data for ufuncs, and
has become possible to massage input data before the ufunc is called
(rather than just the output).

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-04 Thread Ben Rowland


> On 2 Nov 2017, at 22:39, Marten van Kerkwijk  
> wrote:
> 
> I guess my argument boils down to it being better to state that a
> function only accepts arrays and happily let it break on, e.g.,
> matrix, than use `asarray` to make a matrix into an array even though
> it really isn’t.

I would support this attitude, the user can always call `asarray` when passing 
their
data into the function if necessary, then they know upfront what the 
consequences
will be.

For my own ndarray subclass, I want it to behave exactly as a standard ndarray,
but in addition I add some metadata and some functions that act on that, for
example an affine transform and functions to convert between coordinate systems.
The current numpy system of overriding __array_wrap__, __array_finalize__ and
__new__ is great to allow the subclass and metadata to propagate through most
basic operations.

The problem is that many functions using `asarray` strip out all of this 
metadata
and return a bare ndarray. My current solution is to implement an `inherit` 
method
on my subclass which converts an ndarray and copies back all the metadata,
which often looks like this:
spec_data = data.inherit(np.fft.fft(data))

To use composition instead of inheritance would require me to forward every part
of the ndarray API as is, which would be a great deal of work which in nearly
every case would only achieve the same results as replacing `asarray` by
`asanyarray` in various library functions. I don’t want to change the behaviour 
of
the existing class, just to add some data and methods, and I can’t imagine I am
alone in that.

Ben

> 
> I do like the dtype ideas, but think I'd agree they're likely to come
> with their own problems. But just making new numerical types possible
> is interesting.
> 
> -- Marten
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-03 Thread Marten van Kerkwijk
Yes, I like the idea of, effectively, creating an ABC for ndarray -
with which one can register. -- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Stephan Hoyer
On Thu, Nov 2, 2017 at 3:35 PM Nathan Goldbaum 
wrote:

> Ah, but what if the dtype modifies the interface? That might sound evil,
> but it's something that's been proposed. For example, if I wanted to
> replace yt's YTArray in a backward compatibile way with a dtype and just
> use plain ndarrays everywhere, the dtype would need to *at least* modify
> ndarray's API, adding e.g. to(), convert_to_unit(), a units attribute, and
> several other things.
>

I suppose we'll need to sort this out. But adding new methods/properties
feels pretty safe to me, as long as existing ones are guaranteed to work in
the same way.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Marten van Kerkwijk
I guess my argument boils down to it being better to state that a
function only accepts arrays and happily let it break on, e.g.,
matrix, than use `asarray` to make a matrix into an array even though
it really isn't.

I do like the dtype ideas, but think I'd agree they're likely to come
with their own problems. But just making new numerical types possible
is interesting.

-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Nathan Goldbaum
On Thu, Nov 2, 2017 at 5:21 PM, Stephan Hoyer  wrote:

> On Thu, Nov 2, 2017 at 12:42 PM Nathan Goldbaum 
> wrote:
>
>> Would this issue be ameliorated given Nathaniel's proposal to try to move
>> away from subclasses and towards storing data in dtypes? Or would that just
>> mean that xarray would need to ban dtypes it doesn't know about?
>>
>
> Yes, I think custom dtypes would definitely help. Custom dtypes have a
> well contained interface, so lots of operations (e.g., concatenate,
> reshaping, indexing) are guaranteed to work in a dtype independent way. If
> you try to do an unsupported operation for such a dtype (e.g.,
> np.datetime64), you will generally get a good error message about an
> invalid dtype.
>
> In contrast, you can overload a subclass with totally arbitrary semantics
> (e.g., np.matrix) and of course for duck-types as well.
>
> This makes a big difference for libraries like dask or xarray, which need
> a standard interface to guarantee they do the right thing. I'm pretty sure
> we can wrap a custom dtype ndarray with units, but there's no way we're
> going to support np.matrix without significant work. It's hard to know
> which is which without well defined interfaces.
>

Ah, but what if the dtype modifies the interface? That might sound evil,
but it's something that's been proposed. For example, if I wanted to
replace yt's YTArray in a backward compatibile way with a dtype and just
use plain ndarrays everywhere, the dtype would need to *at least* modify
ndarray's API, adding e.g. to(), convert_to_unit(), a units attribute, and
several other things.

Of course if I don't care about backward compatibility I can just do all of
these operations on the dtype object itself. However, I suspect whatever
implementation of custom dtypes gets added to numpy, it will have the
property that it can act like an arbitrary ndarray subclass otherwise
libraries like yt, Pint, metpy, and astropy won't be able to switch to it.

-Nathan


>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Stephan Hoyer
On Thu, Nov 2, 2017 at 12:42 PM Nathan Goldbaum 
wrote:

> Would this issue be ameliorated given Nathaniel's proposal to try to move
> away from subclasses and towards storing data in dtypes? Or would that just
> mean that xarray would need to ban dtypes it doesn't know about?
>

Yes, I think custom dtypes would definitely help. Custom dtypes have a well
contained interface, so lots of operations (e.g., concatenate, reshaping,
indexing) are guaranteed to work in a dtype independent way. If you try to
do an unsupported operation for such a dtype (e.g., np.datetime64), you
will generally get a good error message about an invalid dtype.

In contrast, you can overload a subclass with totally arbitrary semantics
(e.g., np.matrix) and of course for duck-types as well.

This makes a big difference for libraries like dask or xarray, which need a
standard interface to guarantee they do the right thing. I'm pretty sure we
can wrap a custom dtype ndarray with units, but there's no way we're going
to support np.matrix without significant work. It's hard to know which is
which without well defined interfaces.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Marten van Kerkwijk
On Thu, Nov 2, 2017 at 5:09 PM, Benjamin Root  wrote:
> Duck typing is great and all for classes that implement some or all of the
> ndarray interface but remember what the main reason for asarray() and
> asanyarray(): to automatically promote lists and tuples and other
> "array-likes" to ndarrays. Ignoring the use-case of lists of lists is
> problematic at best.

How I wish numpy had never gone there! Convenience for what, exactly?
For the user not having to put `array()` around the list themselves?
We slow down everything for that? And even now we're trying to remove
some of the cases where both tuples and lists are allowed. Grr. Of
course, we are well and truly stuck with it - now it is one of the
main reasons to subclass rather than duck-type... Anyway, water under
the bridge...  -- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Marten van Kerkwijk
My 2¢ here is that all code should feel free to assume certain type of
input, as long as it is documented properly, but there is no reason to
enforce that by, e.g., putting `asarray` everywhere. Then, for some
pieces ducktypes and subclasses will just work like magic, and uses
you might never have foreseen become possible. For others, whoever
wants to use them has to do work (and up to a package maintainers to
decide whether or not to accept PRs that implement hooks, etc.)

I do see the argument that this way one becomes constrained in the
internal implementation, as a change may break an outward-looking
function, but while at times this may be inconvenient, in my
experience at others it may just make one realize an even better
implementation is possible. But then, I really like duck-typing...

-- Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Stephan Hoyer
On Thu, Nov 2, 2017 at 9:45 AM  wrote:

> similar, scipy.special has ufuncs
> what units are those?
>
> Most code that I know (i.e. scipy.stats and statsmodels) does not use only
> "normal mathematical operations with ufuncs"
> I guess there are a lot of "abnormal" mathematical operations
> where just simply propagating the units will not work.
>

> Aside: The problem is more general also for other datastructures.
> E.g. statsmodels for most parts uses only numpy ndarrays inside the
> algorithm and computations because that provides well defined
> behavior. (e.g. pandas behaved too differently in many cases)
> I don't have much idea yet about how to change the infrastructure to
> allow the use of dask arrays, sparse matrices and similar and possibly
> automatic differentiation.
>

This is the exact same reason why pandas and xarray do not support wrapping
arbitrary ndarray subclasses or duck array types. The operations we use
internally (on numpy.ndarray objects) may not be what you would expect
externally, and may even be implementation details not considered part of
the public API. For example, in xarray we use numpy.nanmean() or
bottleneck.nanmean() instead of numpy.mean().

For NumPy and xarray, I think we could (and should) define an interface to
support subclasses and duck types for generic operations for core
use-cases. My main concern with subclasses / duck-arrays is
undefined/untested behavior, especially where we might silently give the
wrong answer or trigger some undesired operation (e.g., loading a lazily
computed into memory) rather than raising an informative error. Leaking
implementation details is another concern: we have already had several
cases in NumPy where a function only worked on a subclass if a particular
method was called internally, and broke when that was changed.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Marten van Kerkwijk
Hi Josef,

Indeed, for some applications one would like to have different units
for different parts of an array. And that means that, at present, the
quantity implementations that we have are no good at storing, say, a
covariance matrix involving parameters with different units, where
thus each element of the covariance matrix has a different unit. I
fear at present it would have to be an object array instead; other
cases may be a bit easier to solve, by, e.g., allowing structured
arrays with similarly structured units. I do note that actually doing
it would clarify, e.g., what the axes in Vandermonde (spelling?)
matrices mean.

That said, there is truly an enormous benefit for checking units on
"regular" operations. Spacecraft have missed Mars because people
didn't do it properly...

All the best,

Marten

p.s. The scipy functions should indeed be included in the ufuncs
covered; there is a fairly long-standing issue for that in astropy...
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread josef . pktd
On Thu, Nov 2, 2017 at 12:23 PM, Ryan May  wrote:

> On Thu, Nov 2, 2017 at 6:46 AM,  wrote:
>
>>
>>
>> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum 
>> wrote:
>>
>>> I think the biggest issues could be resolved if __array_concatenate__
>>> were finished. Unfortunately I don't feel like I can take that on right now.
>>>
>>> See Ryan May's talk at scipy about using an ndarray subclass for units
>>> and the issues he's run into:
>>>
>>> https://www.youtube.com/watch?v=qCo9bkT9sow
>>>
>>
>>
>> Interesting talk, but I don't see how general library code should know
>> what units the output has.
>> for example if units are some flows per unit of time and we average, sum
>> or integrate over time, then what are the new units? (e.g. pandas time
>> aggregation)
>>
>
> A general library doesn't have to do anything--just not do annoying things
> like isinstance() checks and calling np.asarray() everywhere. Honestly one
> of those is the core of most of the problems I run into. It's currently
> more complicated when doing things in compiled land, but that's
> implementation details, not any kind of fundamental incompatibility.
>
> For basic mathematical operations, units have perfectly well defined
> semantics that many of us encountered in an introductory physics or
> chemistry class:
> - Want to add or subtract two things? They need to have the same units; a
> units library can handle conversion provided they have the same
> dimensionality (e.g. length, time)
> - Multiplication/Divison: combine and cancel units ( m/s * s -> m)
>
> Everything else we do on a computer with data in some way boils down to:
> add, subtract, multiply, divide.
>
> Average keeps the same units -- it's just a sum and division by a
> unit-less constant
> Integration (in 1-D) involves *two* variables, your data as well as the
> time/space coordinates (or dx or dt); fundamentally it's a multiplication
> by dx and a summation. The units results then are e.g. data.units *
> dx.units. This works just like it does in Physics 101 where you integrate
> velocity (i.e. m/s) over time (e.g. s) and get displacement (e.g. m)
>
> What are units of covariance or correlation between two variables with the
>> same units, and what are they between variables with different units?
>>
>
> Well, covariance is subtracting the mean from each variable and
> multiplying the residuals; therefore the units for cov(x, y):
>
> (x.units - x.units) * (y.units - y.units) -> x.units * y.units
>
> Correlation takes covariance and divides by the product of the standard
> deviations, so that's:
>
> (x.units * y.units) / (x.units * y.units) -> dimensionless
>
> Which is what I'd expect for a correlation.
>
>
>> How do you concatenate and operate arrays with different units?
>>
>
> If all arrays have compatible dimensionality (say meters, inches, miles),
> you convert to one (say the first) and concatenate like normal. If they're
> not compatible, you error out.
>
>
>> interpolation or prediction would work with using the existing units.
>>
>
> I'm sure you wrote that thinking units didn't play a role, but the math
> behind those operations works perfectly fine with units, with things
> cancelling out properly to give the same units out as in.
>

Some of it is in my reply to Marten.

regression and polyfit requires an X matrix with different units and then
some linear algebra like solve, pinv or svd.

So, while the predicted values have well defined units, the computation
involves some messier operations, unless you want to forgo linear algebra
in all intermediate step and reduce it to sum, division and inverse.

Josef


>
> Ryan
>
> --
> Ryan May
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Ryan May
On Thu, Nov 2, 2017 at 6:46 AM,  wrote:

>
>
> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum 
> wrote:
>
>> I think the biggest issues could be resolved if __array_concatenate__
>> were finished. Unfortunately I don't feel like I can take that on right now.
>>
>> See Ryan May's talk at scipy about using an ndarray subclass for units
>> and the issues he's run into:
>>
>> https://www.youtube.com/watch?v=qCo9bkT9sow
>>
>
>
> Interesting talk, but I don't see how general library code should know
> what units the output has.
> for example if units are some flows per unit of time and we average, sum
> or integrate over time, then what are the new units? (e.g. pandas time
> aggregation)
>

A general library doesn't have to do anything--just not do annoying things
like isinstance() checks and calling np.asarray() everywhere. Honestly one
of those is the core of most of the problems I run into. It's currently
more complicated when doing things in compiled land, but that's
implementation details, not any kind of fundamental incompatibility.

For basic mathematical operations, units have perfectly well defined
semantics that many of us encountered in an introductory physics or
chemistry class:
- Want to add or subtract two things? They need to have the same units; a
units library can handle conversion provided they have the same
dimensionality (e.g. length, time)
- Multiplication/Divison: combine and cancel units ( m/s * s -> m)

Everything else we do on a computer with data in some way boils down to:
add, subtract, multiply, divide.

Average keeps the same units -- it's just a sum and division by a unit-less
constant
Integration (in 1-D) involves *two* variables, your data as well as the
time/space coordinates (or dx or dt); fundamentally it's a multiplication
by dx and a summation. The units results then are e.g. data.units *
dx.units. This works just like it does in Physics 101 where you integrate
velocity (i.e. m/s) over time (e.g. s) and get displacement (e.g. m)

What are units of covariance or correlation between two variables with the
> same units, and what are they between variables with different units?
>

Well, covariance is subtracting the mean from each variable and multiplying
the residuals; therefore the units for cov(x, y):

(x.units - x.units) * (y.units - y.units) -> x.units * y.units

Correlation takes covariance and divides by the product of the standard
deviations, so that's:

(x.units * y.units) / (x.units * y.units) -> dimensionless

Which is what I'd expect for a correlation.


> How do you concatenate and operate arrays with different units?
>

If all arrays have compatible dimensionality (say meters, inches, miles),
you convert to one (say the first) and concatenate like normal. If they're
not compatible, you error out.


> interpolation or prediction would work with using the existing units.
>

I'm sure you wrote that thinking units didn't play a role, but the math
behind those operations works perfectly fine with units, with things
cancelling out properly to give the same units out as in.

Ryan

-- 
Ryan May
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread Marten van Kerkwijk
Hi Josef,

astropy's Quantity is well developed and would give similar results to
pint; all those results  make sense if one wants to have consistent
units. A general library code will actually do the right thing as long
as it just uses normal mathematical operations with ufuncs - and as
long as it just duck types! - the unit code will then override and
properly propagate units to outputs, as can be seen in this example:
```
import astropy.units as u
np.fft.fftfreq(8, 1*u.min)
# 
np.fft.fftfreq(8, 1*u.min).var()
# 
```

> for example if units are some flows per unit of time and we average, sum or 
> integrate over time, then what are the new units? (e.g. pandas time 
> aggregation)

The units module will force you to take into account `dt`! This is in
fact one reason why it is so powerful. So, your example might go
something like:
```
flow = [1., 1.5, 1.5] * u.g / u.s
dt = [0.5, 0.5, 1.] * u.hr
np.sum(flow * dt)
# 
np.sum(flow * dt).to(u.kg)
# 
```

> How do you concatenate and operate arrays with different units?

This is where Nathaniel's `__array_concatenate__` would come in. For
regular arrays it is fine to just concatenate, but for almost anything
else you need a different approach. For quantities, the most logical
one would be to first create an empty array of the right size with the
unit of, e.g., the first part to be concatenated, and then set
sections to the input quantities (where the setter does unit
conversion and will fail if that is not possible).

All the best,

Marten

p.s. A fun subject is what to do with logarithmic units, such as the
magnitudes in astronomy... We have a module for that as well;
http://docs.astropy.org/en/latest/units/logarithmic_units.html
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-02 Thread josef . pktd
On Thu, Nov 2, 2017 at 8:46 AM,  wrote:

>
>
> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum 
> wrote:
>
>> I think the biggest issues could be resolved if __array_concatenate__
>> were finished. Unfortunately I don't feel like I can take that on right now.
>>
>> See Ryan May's talk at scipy about using an ndarray subclass for units
>> and the issues he's run into:
>>
>> https://www.youtube.com/watch?v=qCo9bkT9sow
>>
>
>
> Interesting talk, but I don't see how general library code should know
> what units the output has.
> for example if units are some flows per unit of time and we average, sum
> or integrate over time, then what are the new units? (e.g. pandas time
> aggregation)
> What are units of covariance or correlation between two variables with the
> same units, and what are they between variables with different units?
>
> How do you concatenate and operate arrays with different units?
>
> interpolation or prediction would work with using the existing units.
>
> partially related:
> statsmodels uses a wrapper for pandas Series and DataFrames and tries to
> preserve the index when possible and make up a new DataFrame or Series if
> the existing index doesn't apply.
> E.g. predicted values and residuals are in terms of the original provided
> index, and could also get original units assigned. That would also be
> possible with prediction confidence intervals. But for the rest, see above.
>

using pint

>>> x

>>> x / x


>>> x / (1 + x)
Traceback (most recent call last):
  File "", line 1, in 
  File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line
669, in __add__
raise DimensionalityError(self._units, 'dimensionless')
return self._add_sub(other, operator.add)
  File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py", line
580, in _add_sub
pint.errors.DimensionalityError: Cannot convert from 'meter' to
'dimensionless'

np.exp(x)
raises
pint.errors.DimensionalityError: Cannot convert from 'meter' ([length]) to
'dimensionless' (dimensionless)


Josef



>
> Josef
>
>
>>
>>
>> On Wed, Nov 1, 2017 at 5:50 PM, Marten van Kerkwijk <
>> m.h.vankerkw...@gmail.com> wrote:
>>
>>> From my experience with Quantity, routines that properly ducktype work
>>> well, those that feel the need to accept list and blatantly do
>>> `asarray` do not - even if in many cases they would have worked if
>>> they used `asanyarray`...  But there are lots of nice surprises, with,
>>> e.g., `np.fft.fftfreq` just working as one would hope.  Anyway, bottom
>>> line, I think you should let this stop you from trying only if you
>>> know something important does not work. -- Marten
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-11-01 Thread Nathan Goldbaum
I think the biggest issues could be resolved if __array_concatenate__ were
finished. Unfortunately I don't feel like I can take that on right now.

See Ryan May's talk at scipy about using an ndarray subclass for units and
the issues he's run into:

https://www.youtube.com/watch?v=qCo9bkT9sow

On Wed, Nov 1, 2017 at 5:50 PM, Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> From my experience with Quantity, routines that properly ducktype work
> well, those that feel the need to accept list and blatantly do
> `asarray` do not - even if in many cases they would have worked if
> they used `asanyarray`...  But there are lots of nice surprises, with,
> e.g., `np.fft.fftfreq` just working as one would hope.  Anyway, bottom
> line, I think you should let this stop you from trying only if you
> know something important does not work. -- Marten
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-10-31 Thread William Sheffler
Thank you all kindly for your responses! Based on your encouragement, I
will pursue an ndarray subclass / __array_ufunc__ implementation. I had
been toying with np.set_numeric_ops, which is less than ideal (for example,
np.ndarray.around segfaults if I use set_numeric_ops in any way).

A second question: very broadly speaking, how much 'pain' can I expect
trying to use an np.ndarray subclass in the broader python scientific
computing ecosystem, and is there general consensus that projects 'should'
support ndarray subclasses?

Will

> We spent a *long time* sorting out the messy details of __array_ufunc__
[1], especially for handling interactions between different types, e.g.,
between numpy arrays, non-numpy array-like objects, builtin Python objects,
objects that override arithmetic to act in non-numpy-like ways, and of
course subclasses of all the above.

> We hope that we have it right this time, but as we wrote in the NumPy
1.13 release notes "The API is provisional, we do not yet guarantee
backward compatibility as modifications may be made pending feedback." That
said, let's give it a try!

> If any changes are necessary, I expect it would likely relate to how we
handle interactions between different types. That's where we spent the
majority of the design effort, but debate is a poor substitute for
experience. I would be very surprised if the basic cases (one argument or
two arguments of the same type) need any changes.

Best,
Stephan
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-10-27 Thread Marten van Kerkwijk
Hi Nathan,

Happy to hear that it works well for yt! In astropy's Quantity as
well, it greatly simplifies the code, and has made many operations
about two times faster (which is why I pushed so hard to get
__array_ufunc__ done...).  But for now we're stuck with supporting
__array_prepare__ and __array_wrap__ as well.

Of course, do let us know if there are issues with the current design.
Some further cleanup, especially with the use of `super` for ndarray
subclasses, is still foreseen (but none of that should impact the
API).

All the best,

Marten
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-10-27 Thread Nathan Goldbaum
I’m using it in yt. If we were able to drop support for all old numpy
versions, switching would allow me to delete hundreds of lines of code.
As-is since we need to simultaneously support old and new versions, it adds
some additional complexity. If you’re ok with only supporting numpy >=
1.13, array_ufunc will make you life a lot easier.

On Fri, Oct 27, 2017 at 6:55 PM Marten van Kerkwijk <
m.h.vankerkw...@gmail.com> wrote:

> Just to second Stephan's comment: do try it! I've moved astropy's
> Quantity over to it, and am certainly counting on the basic interface
> staying put...  -- Marten
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-10-27 Thread Stephan Hoyer
Hi Will,

We spent a *long time* sorting out the messy details of __array_ufunc__
[1], especially for handling interactions between different types, e.g.,
between numpy arrays, non-numpy array-like objects, builtin Python objects,
objects that override arithmetic to act in non-numpy-like ways, and of
course subclasses of all the above.

We hope that we have it right this time, but as we wrote in the NumPy 1.13
release notes "The API is provisional, we do not yet guarantee backward
compatibility as modifications may be made pending feedback." That said,
let's give it a try!

If any changes are necessary, I expect it would likely relate to how we
handle interactions between different types. That's where we spent the
majority of the design effort, but debate is a poor substitute for
experience. I would be very surprised if the basic cases (one argument or
two arguments of the same type) need any changes.

Best,
Stephan

[1] https://docs.scipy.org/doc/numpy-1.13.0/neps/ufunc-overrides.html


On Fri, Oct 27, 2017 at 12:39 PM William Sheffler 
wrote:

> Right before 1.12, I arranged an API around an np.ndarray subclass, making
> use of __array_ufunc__ to customize behavior based on structured dtype (we
> come from c++ and really like operator overloading). Having seen
> __array_ufunc__ featured in Travis Oliphant's Guide to NumPy: 2nd Edition,
> I assumed this was the way to go. But it was removed in 1.12. Now that 1.13
> has reintroduced __array_ufunc__, can I now rely on its continued
> availability? I am considering using it as a base-level component in
> several libraries... is this a dangerous idea?
>
> Thanks!
> Will
>
> --
> William H. Sheffler Ph.D.
> Principal Engineer
> Institute for Protein Design
> University of Washington
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] is __array_ufunc__ ready for prime-time?

2017-10-27 Thread William Sheffler
Right before 1.12, I arranged an API around an np.ndarray subclass, making
use of __array_ufunc__ to customize behavior based on structured dtype (we
come from c++ and really like operator overloading). Having seen
__array_ufunc__ featured in Travis Oliphant's Guide to NumPy: 2nd Edition,
I assumed this was the way to go. But it was removed in 1.12. Now that 1.13
has reintroduced __array_ufunc__, can I now rely on its continued
availability? I am considering using it as a base-level component in
several libraries... is this a dangerous idea?

Thanks!
Will

-- 
William H. Sheffler Ph.D.
Principal Engineer
Institute for Protein Design
University of Washington
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion