Re: [Numpy-discussion] Custom Dtype/Units discussion

2016-07-11 Thread Travis Oliphant
On Mon, Jul 11, 2016 at 12:58 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:

>
>
> On Mon, Jul 11, 2016 at 11:39 AM, Chris Barker 
> wrote:
>
>>
>>
>> On Sun, Jul 10, 2016 at 8:12 PM, Nathan Goldbaum 
>> wrote:
>>
>>>
>>> Maybe this can be an informal BOF session?
>>>
>>
>> or  maybe a formal BoF? after all, how formal do they get?
>>
>> Anyway, it was my understanding that we really needed to do some
>> significant refactoring of how numpy deals with dtypes in order to do this
>> kind of thing cleanly -- so where has that gone since last year?
>>
>> Maybe this conversation should be about how to build a more flexible
>> dtype system generally, rather than specifically about unit support.
>> (though unit support is a great use-case to focus on)
>>
>
> Note that Mark Wiebe will also be giving a talk Friday, so he may be
> around. As the last person to add a type to Numpy and the designer of DyND
> he might have some useful input. DyND development is pretty active and I'm
> always curious how we can somehow move in that direction.
>
>
There has been a lot of work over the past 6 months on making DyND
implement the "pluribus" concept that I have talked about briefly in the
past.  DyND now has a separate C++ ndt data-type library.  The Python
interface to that type library is still unified in the dynd module, but it
is separable, and work is in progress to make a separate Python wrapper
for this type library.  The dynd type library implements datashape,
described at http://datashape.pydata.org
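
For a flavor of that notation, here are a couple of datashape examples
(illustrative only; the dshape() call assumes the standalone datashape
package is installed):

    from datashape import dshape

    # A fixed-size 5x3 array of doubles:
    print(dshape('5 * 3 * float64'))

    # A variable-length array of records (a ragged table):
    print(dshape('var * {name: string, amount: float64}'))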

This type system is extensible and could be the foundation of a refactored
NumPy.  My view (and the direction in which I am encouraging work) is that
array computing in Python should be refactored into a "type subsystem" (I
think ndt is the right model there), a generic ufunc system (I think dynd
has a very promising approach there as well), and a container (the
memoryview already in Python might be enough).  These modules could be
separately installed and maintained, and eventually moved into Python
itself.
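
As a toy illustration of that split (plain Python, not DyND/ndt code): the
bytes live in a generic container, and all of the interpretation comes
from a separate, swappable type object.

    import array, struct

    # "Container": raw bytes exposed through Python's memoryview.
    raw = memoryview(array.array('d', [1.0, 2.0, 3.0])).cast('B')

    class Float64Type:
        # Stand-in for an ndt-style type object (hypothetical).
        itemsize = 8
        def unpack(self, buf, i):
            # Interpret item i of the raw buffer as a native double.
            return struct.unpack_from('d', buf, i * self.itemsize)[0]

    t = Float64Type()
    print([t.unpack(raw, i) for i in range(len(raw) // t.itemsize)])
    # -> [1.0, 2.0, 3.0]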

Then, a potential future NumPy project could be ported to be a layer of
calculations and connections to other C libraries on top of this system.
Many parts of the current code could be reused in that effort --- or the
new system could be part of a refactoring of NumPy that makes the innards
of NumPy more accessible to a JIT compiler.

We are already far enough along that this could be pursued by a motivated
person.  Completing the system would take about 18 months, but first light
could come in under 6 months for a dedicated, motivated, and talented
developer.  DyND, together with Cython and/or Numba, is far enough along
to make this pretty straightforward.  For this refactored array-computing
project to take the NumPy name, this community would have to decide that
that is the right thing to do.  But other projects like Pandas, xarray,
numpy-py, or NumPy on Jython could use this subsystem as well.

It has taken me a long time to get to the point where I would recommend a
specific way forward.  I have thought about this for many years and don't
make these recommendations lightly.  The pluribus concept is my
recommendation for what would be best now and in the future.  I will be
pursuing it and working toward a point where this community will accept it
if possible, because it would be ideal if this new array library were
still called NumPy.

My working view is that someone will have to build the new prototype NumPy
for the community to evaluate whether it's the right approach and to build
consensus that it is the right way forward.  There is enough there now
with DyND, datashape, and Numba/Cython to do this fairly quickly.  It is
not strictly necessary to use DyND or Numba or even datashape to
accomplish this general plan --- but they are already available and a
great place to start, as they were built explicitly with the intention of
improving array computing in Python.

This potential NumPy could be backwards compatible from an API perspective
(including the C-API) --- though recompilation would be necessary, and
there would be some semantic differences in corner cases that could either
be fixed where necessary or simply accepted as part of the new version.

I will be at the Continuum happy hour on Thursday at our offices and
welcome anyone to come discuss things with me there.  I am also willing to
meet with anyone on Thursday or Friday if I can, but I don't have a ticket
to SciPy itself.  Please CC me directly if you have questions; I try to
follow the numpy-discussion mailing list but am not always successful at
keeping up.

To be clear, as some have misinterpreted me in the past: while I
originally wrote NumPy (borrowing heavily from Numeric, drawing
inspiration from Numarray, and receiving a lot of help on specific modules
from many of you), the community has continued to

Re: [Numpy-discussion] deterministic, reproducible matmul / __matmult_

2016-07-11 Thread Pauli Virtanen
Mon, 11 Jul 2016 13:01:49 -0400, Jason Newton wrote:
> Does the ML have any ideas on how one could get a matmul that will not
> allow any funny business in the evaluation of the products?  Funny
> business here is something like changing the evaluation order of
> additions of terms. I want strict IEEE 754 compliance - no 80-bit
> registers, perhaps control of the rounding mode, and no unsafe math
> optimizations.

If you link Numpy with BLAS and LAPACK libraries that have been compiled
for this purpose, and turn on the compiler flags that enforce strict IEEE
(and disable SSE) when compiling Numpy, you will probably get reproducible
results.  Numpy itself just offloads the dot computations to BLAS, so if
your BLAS is reproducible, things should mainly be OK.

You may also need to turn off the SSE optimizations in Numpy, because 
these can make results depend on memory alignment --- not in dot 
products, but in other computations.
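
If a slow but fully defined reference is enough, you can also bypass BLAS
entirely and fix the summation order by hand -- a sketch, not a drop-in
replacement for dot/@:

    import numpy as np

    def matmul_reference(a, b):
        # Accumulate each element strictly left-to-right in float64, so
        # the result does not depend on BLAS blocking, threading, or SIMD
        # width.  Orders of magnitude slower than BLAS, of course.
        a = np.asarray(a, dtype=np.float64)
        b = np.asarray(b, dtype=np.float64)
        m, k = a.shape
        k2, n = b.shape
        assert k == k2
        out = np.empty((m, n), dtype=np.float64)
        for i in range(m):
            for j in range(n):
                acc = np.float64(0.0)
                for p in range(k):
                    acc = acc + a[i, p] * b[p, j]  # one fixed order
                out[i, j] = acc
        return out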

Out of curiosity, what is the application where this is necessary?
Maybe there is a numerically stable formulation?

-- 
Pauli Virtanen



Re: [Numpy-discussion] Custom Dtype/Units discussion

2016-07-11 Thread Charles R Harris
On Mon, Jul 11, 2016 at 11:39 AM, Chris Barker 
wrote:

>
>
> On Sun, Jul 10, 2016 at 8:12 PM, Nathan Goldbaum 
> wrote:
>
>>
>> Maybe this can be an informal BOF session?
>>
>
> or  maybe a formal BoF? after all, how formal do they get?
>
> Anyway, it was my understanding that we really needed to do some
> significant refactoring of how numpy deals with dtypes in order to do this
> kind of thing cleanly -- so where has that gone since last year?
>
> Maybe this conversation should be about how to build a more flexible dtype
> system generally, rather than specifically about unit support. (though unit
> support is a great use-case to focus on)
>

Note that Mark Wiebe will also be giving a talk Friday, so he may be
around. As the last person to add a type to Numpy and the designer of DyND
he might have some useful input. DyND development is pretty active and I'm
always curious how we can somehow move in that direction.

Chuck


Re: [Numpy-discussion] Custom Dtype/Units discussion

2016-07-11 Thread Ryan May
On Mon, Jul 11, 2016 at 12:39 PM, Chris Barker 
wrote:
>
> Maybe this conversation should be about how to build a more flexible dtype
> system generally, rather than specifically about unit support. (though unit
> support is a great use-case to focus on)
>


I agree that a more general solution is a good goal -- it's just that
units are my "sine qua non". Also, I would have loved to hear that someone
had solved the units + ndarray-like problem. :)

Ryan

-- 
Ryan May


Re: [Numpy-discussion] Custom Dtype/Units discussion

2016-07-11 Thread Chris Barker
On Sun, Jul 10, 2016 at 8:12 PM, Nathan Goldbaum 
wrote:

>
> Maybe this can be an informal BOF session?
>

or  maybe a formal BoF? after all, how formal do they get?

Anyway, it was my understanding that we really needed to do some
significant refactoring of how numpy deals with dtypes in order to do this
kind of thing cleanly -- so where has that gone since last year?

Maybe this conversation should be about how to build a more flexible dtype
system generally, rather than specifically about unit support. (though unit
support is a great use-case to focus on)

-CHB



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R   (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


[Numpy-discussion] deterministic, reproducible matmul / __matmult_

2016-07-11 Thread Jason Newton
Hello

I'm a long-time user of numpy, but one issue I've had with it is making
sure I can reproduce the results of a floating-point matrix multiplication
in other languages/modules (like C or on the GPU), or across
installations.  I take great pains in doing this type of work because it
allows me both to prototype with python/numpy and to use it, in a fairly
strong/useful capacity, as a strict reference implementation.  For me,
small differences accumulate such that allclose-style thinking starts
failing within a few iterations of an algorithm and things diverge.

I've had success with einsum before for some cases (by chance) where
no difference was ever observed between it and Eigen (C++), but I'm
not sure if I should use this any longer.  The new @ operator is very
tempting to use too, in prototyping.

Does the ML have any ideas on how one could get a matmul that will not
allow any funny business in the evaluation of the products?  Funny
business here is something like changing the evaluation order of additions
of terms. I want strict IEEE 754 compliance - no 80-bit registers, perhaps
control of the rounding mode, and no unsafe math optimizations.

I'm definitely willing to sacrifice performance (especially multi-threaded
enhancements, which already cause problems with reduction ordering) in
order to get these guarantees.  I was looking around and found a few BLAS
implementations that might be worth a mention; comments on these would
also be welcome:

http://bebop.cs.berkeley.edu/reproblas/
https://exblas.lip6.fr/


-Jason


Re: [Numpy-discussion] Added atleast_nd, request for clarification/cleanup of atleast_3d

2016-07-11 Thread Joseph Fox-Rabinovitz
I would like to follow up on my original PR (#7804). While there appears
to be some debate as to whether the PR is numpy material to begin with,
there do not appear to be any technical issues with it. To make the
decision more straightforward, I factored out the non-controversial bug
fixes to masked arrays, along with their regression tests, into PR #7823.
This way, the original enhancement can be closed or left hanging
indefinitely (though I hope neither happens). PR #7804 still has the bug
fixes duplicated in it.
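
For anyone who has not followed the PR, here is a minimal sketch of the
kind of helper being discussed (hypothetical code, not the actual PR 7804
implementation; "pos" is the keyword name mentioned in the thread below):

    import numpy as np

    def atleast_nd(ary, ndim, pos=0):
        # Pad the shape with length-1 axes until ary.ndim == ndim.
        # pos=0 inserts the new axes on the left, anything else on the
        # right.
        ary = np.asanyarray(ary)
        extra = max(ndim - ary.ndim, 0)
        if pos == 0:
            shape = (1,) * extra + ary.shape
        else:
            shape = ary.shape + (1,) * extra
        return ary.reshape(shape)

    atleast_nd(np.arange(6).reshape(2, 3), 4).shape          # (1, 1, 2, 3)
    atleast_nd(np.arange(6).reshape(2, 3), 4, pos=-1).shape  # (2, 3, 1, 1)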

Regards,

-Joe


On Thu, Jul 7, 2016 at 9:11 AM, Joseph Fox-Rabinovitz
 wrote:
> On Thu, Jul 7, 2016 at 4:34 AM, Sebastian Berg
>  wrote:
>> On Wed, 2016-07-06 at 15:30 -0400, Benjamin Root wrote:
>>> I don't see how one could define a spec that would take an arbitrary
>>> array of indices at which to place new dimensions. By definition, you
>>>
>>
>> You just give a reordered range, so that (1, 0, 2) would be the current
>> 3D version. If 1D, fill in `1` and `2`, if 2D, fill in only `2` (0D,
>> add everything of course).
>
> I was originally thinking (-1, 0) for the 2D case. Just go along the
> list and fill as many dims as necessary. Your way is much better since
> it does not require a different operation for positive and negative
> indices.
>
>> However, I have my doubts that it is actually easier to understand than
>> to write yourself ;).
>
> A dictionary or ragged list would be better for that: either {1: (1,
> 0), 2: (2,)} or [(1, 0), (2,)]. The first is more clear since the
> index in the list is the starting ndim - 1.
>
>>
>> - Sebastian
>>
>>
>>> don't know how many dimensions are going to be added. If you knew,
>>> then you wouldn't be calling this function. I can only imagine simple
>>> rules such as 'left' or 'right' or maybe something akin to what
>>> at_least3d() implements.
>>>
>>> > On Wed, Jul 6, 2016 at 3:20 PM, Joseph Fox-Rabinovitz wrote:
>>> > On Wed, Jul 6, 2016 at 2:57 PM, Eric Firing 
>>> > wrote:
>>> > > On 2016/07/06 8:25 AM, Benjamin Root wrote:
>>> > >>
>>> > >> I wouldn't have the keyword be "where", as that collides with the
>>> > >> notion of "where" elsewhere in numpy.
>>> > >
>>> > >
>>> > > Agreed.  Maybe "side"?
>>> >
>>> > I have tentatively changed it to "pos". The reason that I don't like
>>> > "side" is that it implies only a subset of the possible ways that the
>>> > position of the new dimensions can be specified. The current
>>> > implementation only puts things on one side or the other, but I have
>>> > considered also allowing an array of indices at which to place new
>>> > dimensions, and/or a dictionary keyed by the starting ndims. I do not
>>> > think "side" would be appropriate for these extended cases, even if
>>> > they are very unlikely to ever materialize.
>>> >
>>> > -Joe
>>> >
>>> > > (I find atleast_1d and atleast_2d to be very helpful for handling
>>> > > inputs, as Ben noted; I'm skeptical as to the value of atleast_3d
>>> > > and atleast_nd.)
>>> > >
>>> > > Eric
>>> > >