Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-29 Thread Chris Barker
On Sat, Jan 27, 2018 at 8:50 PM, Allan Haldane 
wrote:

> On 01/26/2018 06:01 PM, josef.p...@gmail.com wrote:
>
>> I thought recarrays were pretty cool back in the day, but pandas is
>> a much better option.
>>
>> So I pretty much only use structured arrays for data exchange with C
>> code
>>
>> My impression is that this turns into a "deprecate recarrays and
>> support recfunctions" issue.
>>
>>

> *should* we have any dataframe-like functionality in numpy?
>
> We get requests every once in a while about how to sort rows, or about
> adding a "groupby" function. I myself have used recarrays in a
> dataframe-like way, when I wanted a quick multiple-array object that
> supported numpy indexing. So there is some demand to have minimal
> "dataframe-like" behavior in numpy itself.
>
> recarrays play part of this role currently, though imperfectly due to
> padding and cache issues. I think I'm comfortable with supporting some
> minor use of structured/recarrays as dataframe-like, with a warning in docs
> that the user should really look at pandas/xarray, and that structured
> arrays are primarily for data exchange.
>
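(As a concrete illustration of the "sort rows" and minimal "groupby" requests above -- the data and field names here are made up for the example:)

```python
import numpy as np

# A small structured array: one record per row, named fields as columns.
people = np.array(
    [("carol", 32), ("alice", 25), ("bob", 40)],
    dtype=[("name", "U10"), ("age", "i4")],
)

# Sorting whole rows by a field is built in via the order argument.
by_age = np.sort(people, order="age")
print(by_age["name"])  # ['alice' 'carol' 'bob']

# A minimal "groupby"-style sum, keyed on age decade, via np.unique.
keys, inverse = np.unique(people["age"] // 10, return_inverse=True)
sums = np.zeros(len(keys))
np.add.at(sums, inverse, people["age"])
print(keys, sums)  # [2 3 4] [25. 32. 40.]
```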

Well, I think we should either:

Deprecate recarrays -- i.e., explicitly not support DataFrame-like
functionality in numpy, keeping only the data-exchange functionality as
the maintained use case.

or

Properly support it -- which doesn't mean re-implementing Pandas or xarray,
but would mean addressing bug-like issues such as not dealing properly
with padding.

Personally, I don't need/want it enough to contribute, but if someone does,
great.

This reminds me a bit of the old numpy.matrix issue -- it was ALMOST there,
but not quite, with issues, and there was essentially no overlap between
the people that wanted it and the people that had the time and skills to
really make it work.

(If we want to dream, maybe one day we should make a minimal multiple-array
> container class. I imagine it would look pretty similar to recarray, but
> stored as a set of arrays instead of a structured array. But maybe
> recarrays are good enough, and let's not reimplement pandas either.)
>

Exactly -- we really don't need to re-implement Pandas

(except its CSV-reading capability :-) )

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-29 Thread Eric Wieser
I think that there's a lot of confusion going around about recarrays vs
structured arrays.

[`recarray`](
https://github.com/numpy/numpy/blob/v1.13.0/numpy/core/records.py) is a
thin wrapper around structured arrays that provides:
* Attribute access to fields as `arr.field` in addition to the normal
`arr['field']`
* Automatic datatype-guessing for nested lists of tuples (which needs a
little work, but seems like a justifiable feature)
* An undocumented `field` method that behaves like the 1.14 indexing
behavior (!)

Meanwhile, `recfunctions` is a collection of functions that work on normal
structured arrays - so it is misleadingly named.
The only link to recarrays is that most of the functions have a
`asrecarray` parameter which applies `.view(recarray)` to the result.
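(A short demonstration of that relationship, as a sketch against the numpy API described above:)

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# A plain structured array.
a = np.array([(1, 2.0), (3, 4.0)], dtype=[("x", "i4"), ("y", "f8")])
print(a["y"])  # normal field indexing -> [2. 4.]

# A recarray is the same data viewed through a thin wrapper that
# additionally allows attribute access.
r = a.view(np.recarray)
print(r.y)  # [2. 4.]

# recfunctions operate on plain structured arrays; asrecarray merely
# views the result as a recarray at the end.
b = rfn.append_fields(a, "z", [10.0, 20.0], usemask=False, asrecarray=True)
print(b.z)  # [10. 20.]
```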

> deprecate recarrays

Given how thin an abstraction they are over structured arrays, I don't
think you mean this.
Are you advocating for deprecating structured arrays entirely, or just
deprecating recfunctions?

Eric

On Mon, 29 Jan 2018 at 09:39 Chris Barker  wrote:

> [Chris's message, quoted in full above; trimmed]


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-29 Thread josef . pktd
On Mon, Jan 29, 2018 at 1:22 PM, Eric Wieser 
wrote:

> I think that there's a lot of confusion going around about recarrays vs
> structured arrays.
>
> [...]
>
> Are you advocating for deprecating structured arrays entirely, or just
> deprecating recfunctions?
>

First, statsmodels is in the pandas camp for dataframes, so I don't have
any invested interest in recarrays/structured dtypes anymore.

What I meant was that structured dtypes with implicit (hidden?) padding
become unintuitive for the recarray/dataframe use case. (At least I won't
try to update my intuition to allow for extra things in there that are not
specified by the main structured dtype.) Also, the dataframe_like usage of
structured dtypes doesn't seem to be under much consideration anymore.

So, my **impression** is that the recent changes make the
recarray/dataframe usecase for structured dtypes more difficult.

Given that there is pandas, xarray, dask and more, numpy could as well drop
any pretense of supporting dataframe_likes. Or, adjust the recfunctions so
we can still work dataframe_like with structured
dtypes/recarrays/recfunctions.


Josef




Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-29 Thread Stefan van der Walt

On Mon, 29 Jan 2018 14:10:56 -0500, josef.p...@gmail.com wrote:

> Given that there is pandas, xarray, dask and more, numpy could as well drop
> any pretense of supporting dataframe_likes. Or, adjust the recfunctions so
> we can still work dataframe_like with structured
> dtypes/recarrays/recfunctions.


I haven't been following the duckarray discussion carefully, but could
this be an opportunity for a dataframe protocol, so that we can have
libraries ingest structured arrays, record arrays, pandas dataframes,
etc. without too much specialized code?
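(No such protocol exists yet; a purely hypothetical duck-typing sketch, with the hook name invented here, might look like:)

```python
import numpy as np

def get_columns(obj):
    """Return a {name: 1-D array} mapping for anything 'dataframe-like'.

    The __dataframe_columns__ hook is invented for this sketch; structured
    arrays are handled directly via their dtype.names.
    """
    if hasattr(obj, "__dataframe_columns__"):  # hypothetical protocol hook
        return obj.__dataframe_columns__()
    if isinstance(obj, np.ndarray) and obj.dtype.names:
        return {name: obj[name] for name in obj.dtype.names}
    raise TypeError("object is not dataframe-like")

a = np.array([(1, 2.0)], dtype=[("x", "i4"), ("y", "f8")])
cols = get_columns(a)
print(sorted(cols))  # ['x', 'y']
```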

Stéfan
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-29 Thread josef . pktd
On Mon, Jan 29, 2018 at 2:55 PM, Stefan van der Walt 
wrote:

> On Mon, 29 Jan 2018 14:10:56 -0500, josef.p...@gmail.com wrote:
>
>> Given that there is pandas, xarray, dask and more, numpy could as well
>> drop
>> any pretense of supporting dataframe_likes. Or, adjust the recfunctions so
>> we can still work dataframe_like with structured
>> dtypes/recarrays/recfunctions.
>>
>
> I haven't been following the duckarray discussion carefully, but could
> this be an opportunity for a dataframe protocol, so that we can have
> libraries ingest structured arrays, record arrays, pandas dataframes,
> etc. without too much specialized code?
>

AFAIU while not being in the data handling area, pandas defines the
interface and other libraries provide pandas compatible interfaces or
implementations.

statsmodels currently still has recarray support and usage. In some
interfaces we support pandas, recarrays and plain arrays, or anything where
asarray works correctly.

But recarrays became messy to support, one rewrite of some functions last
year converts recarrays to pandas, does the manipulation and then converts
back to recarrays.
Also we need to adjust our recarray usage with new numpy versions. But
there is no real benefit because I doubt that statsmodels still has any
recarray/structured dtype users. So, we only have to remove our own uses in
the datasets and unit tests.

Josef





Re: [Numpy-discussion] Moving NumPy's PRNG Forward

2018-01-29 Thread Pierre de Buyl
Hello,

On Sat, Jan 27, 2018 at 09:28:54AM +0900, Robert Kern wrote:
> On Sat, Jan 27, 2018 at 1:14 AM, Kevin Sheppard
>  wrote:
> >
> > In terms of what is needed, I think that the underlying PRNG should
> be swappable.  This will provide a simple mechanism to allow certain
> types of advancement while easily providing backward compat.  In the
> current design this is very hard and requires compiling many nearly
> identical copies of RandomState. In pseudocode something like
> >
> > standard_normal(prng)
> >
> > where prng is a basic class that retains the PRNG state and has a
> small set of core random number generators that belong to the
> underlying PRNG -- probably something like int32, int64, double, and
> possibly int53. I am not advocating explicitly passing the PRNG as an
> argument, but having generators which can take any suitable PRNG would
> add a lot of flexibility in terms of taking advantage of improvements
> in the underlying PRNGs (see, e.g., xoroshiro128/xorshift1024).  The
> "small" core PRNG would have responsibility over state and streams.
> The remainder of the module would transform the underlying PRNG into
> the required distributions.
> (edit: after writing the following verbiage, I realize it can be summed
> up with more respect to your suggestion: yes, we should do this design,
> but we don't need to and shouldn't give up on a class with distribution
> methods.)
> Once the core PRNG C API is in place, I don't think we necessarily need
> to move away from a class structure per se, though it becomes an
> option.
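(Kevin's standard_normal(prng) pseudocode quoted above might be sketched in plain Python roughly as follows -- the class and method names are illustrative only, not a proposed numpy API:)

```python
import math
import numpy as np

class CorePRNG:
    """Minimal 'core' PRNG: owns the state, exposes only raw draws."""
    def __init__(self, seed=0):
        self._state = np.random.RandomState(seed)  # stand-in backend
    def next_double(self):
        return self._state.random_sample()

def standard_normal(prng):
    """A distribution built on any object providing next_double()."""
    # Box-Muller transform; 1 - u1 keeps the log() argument nonzero.
    u1, u2 = prng.next_double(), prng.next_double()
    return math.sqrt(-2.0 * math.log(1.0 - u1)) * math.cos(2.0 * math.pi * u2)

draw = standard_normal(CorePRNG(seed=42))
```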

(Sorry for cutting so much, I have a short question)

My typical use case for the C API of NumPy's random features is that I start
coding in pure Python and then switch to Cython. I have at least twice in the
past resorted to include "randomkit.h" and use that directly. My last work
actually implements a Python/Cython interface for rngs, see
http://threefry.readthedocs.io/using_from_cython.html

The goal is to use exactly the same backend in Python and Cython, with a cimport
and a few cdefs the only changes needed for a first port to Cython.

Is this type of feature in discussion or in project for the future of
numpy.random?

Regards,

Pierre



Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-29 Thread Benjamin Root
I <3 structured arrays. I love the fact that I can access data by row and
then by fieldname, or vice versa. There are times when I need to pass just
a column into a function, and there are times when I need to process things
row by row. Yes, pandas is nice if you want the specialized indexing
features, but it becomes a bear to deal with if all you want is normal
indexing, or even the ability to easily loop over the dataset.
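(The row-vs-column access described above is plain numpy indexing on a structured array; a quick sketch:)

```python
import numpy as np

data = np.array(
    [(1, 2.0, "a"), (3, 4.0, "b")],
    dtype=[("i", "i4"), ("v", "f8"), ("tag", "U1")],
)

# By column: pass a single field into any function expecting an array.
print(data["v"].sum())  # 6.0

# By row: index by position, or loop over records.
row = data[1]
print(row["i"], row["tag"])  # 3 b
```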

Cheers!
Ben Root

On Mon, Jan 29, 2018 at 3:24 PM,  wrote:

> [quoted thread trimmed; see the messages above]
>


[Numpy-discussion] f2py bug in numpy v1.12.0 and above?

2018-01-29 Thread Solbrig,Jeremy
I have a suite of fortran code that I compile with f2py and use as a plugin to 
a python package.  I am using Python v2.7 from Anaconda.  When compiled using 
numpy v1.11.3 or lower, everything works fine, but if I upgrade to any more 
recent version I begin running into a runtime error.  Presumably I am just not 
linking something that needs to be linked, but I'm not sure what library that 
would be.


The error:

ImportError: .../libstddev_kernel.so: undefined symbol: 
_gfortran_stop_numeric_f08

Where libstddev_kernel.so is built using this command:
f2py --fcompiler=gnu95 --quiet --opt="-O3" -L. -L/usr/lib  -I. -I.../include 
-I/usr/include  -m libstddev_kernel -c stddev_kernel/stddev_kernel.f90 config.o


Is there something additional I should be linking to within numpy or my 
anaconda installation?


Thank you,

Jeremy


Jeremy Solbrig

Research Associate

Cooperative Institute for Research in the Atmosphere

Colorado State University

jeremy.solb...@colostate.edu

(970) 491-8805


Re: [Numpy-discussion] f2py bug in numpy v1.12.0 and above?

2018-01-29 Thread Matthew Brett
Hi,

On Mon, Jan 29, 2018 at 10:16 AM, Solbrig,Jeremy
 wrote:
> [Jeremy's message, quoted in full above; trimmed]

Do you get the same problem with a pip install of numpy?   If not,
this may be an Anaconda packaging bug rather than a numpy bug ...

Cheers,

Matthew


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-29 Thread josef . pktd
On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root  wrote:

> I <3 structured arrays. I love the fact that I can access data by row and
> then by fieldname, or vice versa. There are times when I need to pass just
> a column into a function, and there are times when I need to process things
> row by row. Yes, pandas is nice if you want the specialized indexing
> features, but it becomes a bear to deal with if all you want is normal
> indexing, or even the ability to easily loop over the dataset.
>

I don't think there is any doubt that structured arrays -- arrays with
structured dtypes -- are a useful container. The question is whether they
should be something more, or the foundation for something more.

For example, computing a mean, or a reduce operation, over numeric elements
("columns"): before padded views it was possible to index by selecting the
relevant "columns" and view them as a standard array. With padded views that
breaks, and AFAICS there is no way in numpy 1.14.0 to compute a mean of some
"columns". (I don't have numpy 1.14 to try or to find a workaround, like
maybe looping over all relevant columns.)
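(A sketch of that loop-over-fields workaround -- single-field indexing is unaffected by the 1.14 changes, so this works either way; the field names are illustrative:)

```python
import numpy as np

a = np.array(
    [(1, 2.0, 3.0), (4, 5.0, 6.0)],
    dtype=[("id", "i4"), ("x", "f8"), ("y", "f8")],
)

# Mean over selected "columns", one field at a time, avoiding any
# reliance on viewing a multi-field selection as a plain 2-D array.
fields = ["x", "y"]
col_stack = np.column_stack([a[name] for name in fields])
print(col_stack.mean(axis=1))  # per-row mean over x and y -> [2.5 5.5]
print(col_stack.mean(axis=0))  # per-column means -> [3.5 4.5]
```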

Josef







Re: [Numpy-discussion] f2py bug in numpy v1.12.0 and above?

2018-01-29 Thread Andrew Nelson
Something similar was mentioned at
https://github.com/scipy/scipy/issues/8325.


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-29 Thread Allan Haldane
On 01/29/2018 04:02 PM, josef.p...@gmail.com wrote:
> [...]
> For example, computing a mean, or reduce operation, over numeric elements
> ("columns"). Before padded views it was possible to index by selecting
> the relevant "columns" and view them as standard array. With padded
> views that breaks and AFAICS, there is no way in numpy 1.14.0 to compute
> a mean of some "columns".

Just to clarify, structured types have always had padding bytes, that
isn't new.

What *is* new (which we are pushing to 1.15, I think) is that it may be
somewhat more common to end up with padding than before, and only if you
are specifically using multi-field indexing, which is a fairly
specialized case.

I think recfunctions already account properly for padding bytes. Except
for the bug in #8100, which we will fix, padding-bytes in recarrays are
more or less invisible to a non-expert who only cares about
dataframe-like behavior.

In other words, padding is no obstacle at all to computing a mean over a
column, and single-field indexing in 1.15 behaves identically to before.
The only thing that will change in 1.15 is multi-field indexing, and it
has never been possible to compute a mean (or any binary operation) over
multiple fields at once.
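(A small check of that distinction -- single-field access vs. multi-field indexing; the multi-field case is the release-dependent part:)

```python
import numpy as np

a = np.array(
    [(1, 2.0, 3.0), (4, 5.0, 6.0)],
    dtype=[("id", "i4"), ("x", "f8"), ("y", "f8")],
)

# Single-field indexing returns an ordinary ndarray view, so reductions
# such as mean work exactly as before, padding or not.
assert a["x"].mean() == 3.5

# Multi-field indexing yields a structured result whose itemsize (padding)
# may differ between numpy versions, but per-field access on it is stable.
sub = a[["x", "y"]]
assert sub["y"].mean() == 4.5
```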

Allan


Re: [Numpy-discussion] Moving NumPy's PRNG Forward

2018-01-29 Thread Robert Kern
On Tue, Jan 30, 2018 at 5:39 AM, Pierre de Buyl <
pierre.deb...@chem.kuleuven.be> wrote:
>
> Hello,
>
> On Sat, Jan 27, 2018 at 09:28:54AM +0900, Robert Kern wrote:
> > [design discussion quoted in full in Pierre's message above; trimmed]
>
> (Sorry for cutting so much, I have a short question)
>
> My typical use case for the C API of NumPy's random features is that I
> start coding in pure Python and then switch to Cython. I have at least
> twice in the past resorted to include "randomkit.h" and use that directly.
> My last work actually implements a Python/Cython interface for rngs, see
> http://threefry.readthedocs.io/using_from_cython.html
>
> The goal is to use exactly the same backend in Python and Cython, with a
> cimport and a few cdefs the only changes needed for a first port to Cython.
>
> Is this type of feature in discussion or in project for the future of
> numpy.random?

Sort of, but not really. For sure, once we've made the decisions that let
us move forward to a new design, we'll have Cython implementations that can
be used natively from Cython as well as Python without code changes. *But*
it's not going to be an automatic speedup like your threefry library
allows. You designed that API such that each of the methods returns a
single scalar, so all you need to do is declare your functions `cpdef` and
provide a `.pxd`. Our methods return either a scalar or an array depending
on the arguments, so the methods will be declared to return `object`, and
you will pay the overhead costs for checking the arguments and such. We're
not going to change that Python API; we're only considering dropping
stream-compatibility, not source-compatibility.

I would like to make sure that we do expose a C/Cython API to the
distribution functions (i.e. that only draw a single variate and return a
scalar), but it's not likely to look exactly like the Python API. There
might be clever tricks that we can do to minimize the amount of changes
that one needs to do, though, if you are only drawing single variates at a
time (e.g. an agent-based simulation) and you want to make it go faster by
moving to Cython. For example, maybe we collect all of the single-variate
C-implemented methods into a single object sitting as an attribute on the
`Distributions` object.

cdef class DistributionsCAPI:
    cdef double normal(self, double loc, double scale)
    cdef double uniform(self, double low, double high)

cdef class Distributions:
    cdef DistributionsCAPI capi
    cpdef object normal(self, loc, scale, size=None):
        if size is None and np.isscalar(loc) and np.isscalar(scale):
            return self.capi.normal(loc, scale)
        else:
            ...  # Make an array

prng = Distributions(...)
# From Python:
x = prng.normal(mean, std)
# From Cython:
cdef double x = prng.capi.normal(mean, std)

But we need to make some higher-level decisions first before we can get
down to this level of design. Please do jump in and remind us of this use
case once we do get down to actual work on the new API design. Thanks!

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] f2py bug in numpy v1.12.0 and above?

2018-01-29 Thread Matthew Brett
Hi,

On Mon, Jan 29, 2018 at 1:02 PM, Andrew Nelson  wrote:
> Something similar was mentioned at
> https://github.com/scipy/scipy/issues/8325.

Yes, that one was also for Anaconda...

Cheers,

Matthew


Re: [Numpy-discussion] Moving NumPy's PRNG Forward

2018-01-29 Thread Kevin Sheppard
I agree with pretty much everything you wrote, Robert.  I didn't have quite
the right framing, but the generic class that takes a low-level core PRNG
sounds like the right design, and this should make user-generated
distributions easier to develop.  I was thinking along these lines, inspired
by the SciPy changes that use a LowLevelCallable, e.g.,

https://ilovesymposia.com/2017/03/12/scipys-new-lowlevelcallable-is-a-game-changer/


This might also allow users to extend the core PRNGs using something like
Numba JIT classes as an alternative.

Another area that needs thought is how to correctly spawn generators in
multiprocess applications. This might be most easily addressed by providing
a guide to a good method, rather than the arbitrary approach used now.
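As a sketch of the problem, the ad-hoc pattern in use today looks something like the following (the offset-seed scheme and the helper name are hypothetical, and are exactly the kind of arbitrary choice a guide would replace):

```python
import numpy as np

def make_worker_rngs(n_workers, base_seed=12345):
    # One RandomState per worker process.  Seeding with base_seed + i is
    # an arbitrary scheme: nothing guarantees that streams seeded with
    # nearby integers are statistically independent.
    return [np.random.RandomState(base_seed + i) for i in range(n_workers)]

rngs = make_worker_rngs(4)
draws = [rng.standard_normal(3) for rng in rngs]  # what each worker would compute
```

The draws are reproducible per worker, but the independence of the streams rests on the PRNG's seeding behaviour rather than on any API guarantee.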

Kevin





On Sat, Jan 27, 2018 at 5:03 PM  wrote:

>
> Today's Topics:
>
>1. Re: Moving NumPy's PRNG Forward (Robert Kern)
>2. Re: Using np.frombuffer and cffi.buffer on array of C structs
>   (problem with struct member padding) (Joe)
>
>
> --
>
> Message: 1
> Date: Sat, 27 Jan 2018 09:28:54 +0900
> From: Robert Kern 
> To: Discussion of Numerical Python 
> Subject: Re: [Numpy-discussion] Moving NumPy's PRNG Forward
>
> On Sat, Jan 27, 2018 at 1:14 AM, Kevin Sheppard <
> kevin.k.shepp...@gmail.com>
> wrote:
> >
> > I am a firm believer that the current situation is not sustainable.
> There are a lot of improvements that can practically be incorporated.
> While many of these are performance related, there are also improvements in
> accuracy over some ranges of parameters that cannot be incorporated. I also
> think that perfect stream reproducibility is a bit of a myth across
> versions since this really would require identical OS, compiler and
> possibly CPU for some of the generators that produce floats.
> >
> > I believe there is a case for separating the random generator from core
> NumPy.  Some points that favor becoming a subproject:
> >
> > 1. It is a pure consumer of the NumPy API.  Other parts of the API do not
> depend on random.
> > 2. A stand-alone package could be installed alongside many different
> versions of core NumPy, which would reduce the pressure on freezing the
> stream.
>
> Removing numpy.random (or freezing it as deprecated legacy while all PRNG
> development moves elsewhere) is probably a non-starter. It's too used for
> us not to provide something. That said, we can (and ought to) make it much
> easier for external packages to provide PRNG capabilities (core PRNGs and
> distributions) that interoperate with the core functionality that numpy
> provides. I'm also happy to place a high barrier on adding more
> distributions to numpy.random once that is in place.
>
> Specifically, core uniform PRNGs should have a small common C API that
> distribution functions can use. This might just be a struct with an opaque
> `void*` state pointer and then 2 function pointers for drawing a uint64
> (whole range) and a double in [0,1) from the state. It's important to
> expose our core uniform PRNGs as a C API because there has been a desire to
> interoperate at that level, using the same PRNG state inside C or Fortran
> or GPU code. If that's in place, then people can write new efficient
> distribution functions in C that use this small C API agnostic to the core
> PRNG algorithm. It also makes it easy to implement new core PRNGs that the
> distribution functions provided by numpy.random can use.
>
> > In terms of what is needed, I think that the underlying PRNG should be
> swappable.  This will provide a simple mechanism to allow certain types of
> advancement while easily providing backward compat.  In the current design
> this is very hard and requires compiling many nearly identical copies of
> RandomState. In pseudocode something like
> >
> > standard_normal(prng)
> >
> > where prng is a basic class that retains the PRNG state and has a small
> set of core random number generators that belong to the underlying PRNG --
> probably something like int32, int64, double, and possibly int53. I am not
> advocating explicitly passing the PRNG as an argument, but having
> generators which can take any suitable PRNG would add a lot of flexibility
> in terms of taking advantage of improvements in the underlying PRNGs (see,
> e.g., xoroshiro128/xorshift1024).  The "small" core P

Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-29 Thread josef . pktd
On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane 
wrote:

> On 01/29/2018 04:02 PM, josef.p...@gmail.com wrote:
> >
> >
> > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root wrote:
> >
> > I <3 structured arrays. I love the fact that I can access data by
> > row and then by fieldname, or vice versa. There are times when I
> > need to pass just a column into a function, and there are times when
> > I need to process things row by row. Yes, pandas is nice if you want
> > the specialized indexing features, but it becomes a bear to deal
> > with if all you want is normal indexing, or even the ability to
> > easily loop over the dataset.
> >
> >
> > I don't think there is a doubt that structured arrays, arrays with
> > structured dtypes, are a useful container. The question is whether they
> > should be more or the foundation for more.
> >
> > For example, computing a mean, or reduce operation, over numeric element
> > ("columns"). Before padded views it was possible to index by selecting
> > the relevant "columns" and view them as standard array. With padded
> > views that breaks and AFAICS, there is no way in numpy 1.14.0 to compute
> > a mean of some "columns". (I don't have numpy 1.14 to try or find a
> > workaround, like maybe looping over all relevant columns.)
> >
> > Josef
>
> Just to clarify, structured types have always had padding bytes, that
> isn't new.
>
> What *is* new (which we are pushing to 1.15, I think) is that it may be
> somewhat more common to end up with padding than before, and only if you
> are specifically using multi-field indexing, which is a fairly
> specialized case.
>
> I think recfunctions already account properly for padding bytes. Except
> for the bug in #8100, which we will fix, padding-bytes in recarrays are
> more or less invisible to a non-expert who only cares about
> dataframe-like behavior.
>
> In other words, padding is no obstacle at all to computing a mean over a
> column, and single-field indexes in 1.15 behave identically as before.
> The only thing that will change in 1.15 is multi-field indexing, and it
> has never been possible to compute a mean (or any binary operation) on
> multiple fields.
>

from the example in the other thread
a[['b', 'c']].view(('f8', 2)).mean(0)


(from the statsmodels usecase:
read csv with genfromtext to get recarray or structured array
select/index the numeric columns
view them as standard array
do whatever we can do with standard numpy  arrays
)

Josef


>
> Allan
>
> >
> > Cheers!
> > Ben Root
> >
> > On Mon, Jan 29, 2018 at 3:24 PM,  wrote:
> >
> >
> >
> > On Mon, Jan 29, 2018 at 2:55 PM, Stefan van der Walt wrote:
> >
> > On Mon, 29 Jan 2018 14:10:56 -0500, josef.p...@gmail.com wrote:
> >
> > Given that there is pandas, xarray, dask and more, numpy
> > could as well drop
> > any pretense of supporting dataframe_likes. Or, adjust
> > the recfunctions so
> > we can still work dataframe_like with structured
> > dtypes/recarrays/recfunctions.
> >
> >
> > I haven't been following the duckarray discussion carefully,
> > but could
> > this be an opportunity for a dataframe protocol, so that we
> > can have
> > libraries ingest structured arrays, record arrays, pandas
> > dataframes,
> > etc. without too much specialized code?
> >
> >
> > AFAIU while not being in the data handling area, pandas defines
> > the interface and other libraries provide pandas compatible
> > interfaces or implementations.
> >
> > statsmodels currently still has recarray support and usage. In
> > some interfaces we support pandas, recarrays and plain arrays,
> > or anything where asarray works correctly.
> >
> > But recarrays became messy to support, one rewrite of some
> > functions last year converts recarrays to pandas, does the
> > manipulation and then converts back to recarrays.
> > Also we need to adjust our recarray usage with new numpy
> > versions. But there is no real benefit because I doubt that
> > statsmodels still has any recarray/structured dtype users. So,
> > we only have to remove our own uses in the datasets and unit
> tests.
> >
> > Josef
> >
> >
> >
> >
> > Stéfan
> >
> >
> >
> >
> > _

Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-29 Thread josef . pktd
On Mon, Jan 29, 2018 at 5:50 PM,  wrote:

>
>
> On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane 
> wrote:
>
>> On 01/29/2018 04:02 PM, josef.p...@gmail.com wrote:
>> >
>> >
>> > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root wrote:
>> >
>> > I <3 structured arrays. I love the fact that I can access data by
>> > row and then by fieldname, or vice versa. There are times when I
>> > need to pass just a column into a function, and there are times when
>> > I need to process things row by row. Yes, pandas is nice if you want
>> > the specialized indexing features, but it becomes a bear to deal
>> > with if all you want is normal indexing, or even the ability to
>> > easily loop over the dataset.
>> >
>> >
>> > I don't think there is a doubt that structured arrays, arrays with
>> > structured dtypes, are a useful container. The question is whether they
>> > should be more or the foundation for more.
>> >
>> > For example, computing a mean, or reduce operation, over numeric element
>> > ("columns"). Before padded views it was possible to index by selecting
>> > the relevant "columns" and view them as standard array. With padded
>> > views that breaks and AFAICS, there is no way in numpy 1.14.0 to compute
>> > a mean of some "columns". (I don't have numpy 1.14 to try or find a
>> > workaround, like maybe looping over all relevant columns.)
>> >
>> > Josef
>>
>> Just to clarify, structured types have always had padding bytes, that
>> isn't new.
>>
>> What *is* new (which we are pushing to 1.15, I think) is that it may be
>> somewhat more common to end up with padding than before, and only if you
>> are specifically using multi-field indexing, which is a fairly
>> specialized case.
>>
>> I think recfunctions already account properly for padding bytes. Except
>> for the bug in #8100, which we will fix, padding-bytes in recarrays are
>> more or less invisible to a non-expert who only cares about
>> dataframe-like behavior.
>>
>> In other words, padding is no obstacle at all to computing a mean over a
>> column, and single-field indexes in 1.15 behave identically as before.
>> The only thing that will change in 1.15 is multi-field indexing, and it
>> has never been possible to compute a mean (or any binary operation) on
>> multiple fields.
>>
>
> from the example in the other thread
> a[['b', 'c']].view(('f8', 2)).mean(0)
>
>
> (from the statsmodels usecase:
> read csv with genfromtext to get recarray or structured array
> select/index the numeric columns
> view them as standard array
> do whatever we can do with standard numpy  arrays
> )
>

Or, to phrase it as a question:

How do we get a standard array with homogeneous dtype from the
corresponding elements of a structured dtype in numpy 1.14.0?

Josef




>
> Josef
>
>
>>
>> Allan
>>
>> >
>> > Cheers!
>> > Ben Root
>> >
>> > On Mon, Jan 29, 2018 at 3:24 PM,  wrote:
>> >
>> >
>> >
>> > On Mon, Jan 29, 2018 at 2:55 PM, Stefan van der Walt wrote:
>> >
>> > On Mon, 29 Jan 2018 14:10:56 -0500, josef.p...@gmail.com wrote:
>> >
>> > Given that there is pandas, xarray, dask and more, numpy
>> > could as well drop
>> > any pretense of supporting dataframe_likes. Or, adjust
>> > the recfunctions so
>> > we can still work dataframe_like with structured
>> > dtypes/recarrays/recfunctions.
>> >
>> >
>> > I haven't been following the duckarray discussion carefully,
>> > but could
>> > this be an opportunity for a dataframe protocol, so that we
>> > can have
>> > libraries ingest structured arrays, record arrays, pandas
>> > dataframes,
>> > etc. without too much specialized code?
>> >
>> >
>> > AFAIU while not being in the data handling area, pandas defines
>> > the interface and other libraries provide pandas compatible
>> > interfaces or implementations.
>> >
>> > statsmodels currently still has recarray support and usage. In
>> > some interfaces we support pandas, recarrays and plain arrays,
>> > or anything where asarray works correctly.
>> >
>> > But recarrays became messy to support, one rewrite of some
>> > functions last year converts recarrays to pandas, does the
>> > manipulation and then converts back to recarrays.
>> > Also we need to adjust our recarray usage with new numpy
>> > versions. But there is no real benefit because I doubt that
>> > statsmodels still has any recarray/structured dtype users. So,
>> > we only have to remove our own uses in the datasets and unit
>> tests.
>> >
>> > Josef
>> >
>> >
>> >
>> >
>> > Stéfan
>> >
>> > ___

Re: [Numpy-discussion] f2py bug in numpy v1.12.0 and above?

2018-01-29 Thread Solbrig,Jeremy
I think that I actually may have this solved now.  It seems that I can resolve 
the issue by running "conda install gcc" then linking against the anaconda 
gfortran libraries rather than my system install.  The only problem is that 
installing gcc through anaconda causes anaconda's version of gcc to supersede 
the system version...


Anyway, not a numpy issue it seems, just an Anaconda packaging issue as you 
both said.


From: NumPy-Discussion 
 on behalf of 
Andrew Nelson 
Sent: Monday, January 29, 2018 2:02:02 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] f2py bug in numpy v1.12.0 and above?

Something similar was mentioned at https://github.com/scipy/scipy/issues/8325.



Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-29 Thread Allan Haldane

On 01/29/2018 05:59 PM, josef.p...@gmail.com wrote:



On Mon, Jan 29, 2018 at 5:50 PM,  wrote:




On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane wrote:

On 01/29/2018 04:02 PM, josef.p...@gmail.com wrote:
>
>
> On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root wrote:
>
>     I <3 structured arrays. I love the fact that I can access data by
>     row and then by fieldname, or vice versa. There are times when I
>     need to pass just a column into a function, and there are times 
when
>     I need to process things row by row. Yes, pandas is nice if you 
want
>     the specialized indexing features, but it becomes a bear to deal
>     with if all you want is normal indexing, or even the ability to
>     easily loop over the dataset.
>
>
> I don't think there is a doubt that structured arrays, arrays with
> structured dtypes, are a useful container. The question is whether 
they
> should be more or the foundation for more.
>
> For example, computing a mean, or reduce operation, over numeric 
element
> ("columns"). Before padded views it was possible to index by selecting
> the relevant "columns" and view them as standard array. With padded
> views that breaks and AFAICS, there is no way in numpy 1.14.0 to 
compute
> a mean of some "columns". (I don't have numpy 1.14 to try or find a
> workaround, like maybe looping over all relevant columns.)
>
> Josef

Just to clarify, structured types have always had padding bytes,
that
isn't new.

What *is* new (which we are pushing to 1.15, I think) is that it
may be
somewhat more common to end up with padding than before, and
only if you
are specifically using multi-field indexing, which is a fairly
specialized case.

I think recfunctions already account properly for padding bytes.
Except
for the bug in #8100, which we will fix, padding-bytes in
recarrays are
more or less invisible to a non-expert who only cares about
dataframe-like behavior.

In other words, padding is no obstacle at all to computing a
mean over a
column, and single-field indexes in 1.15 behave identically as
before.
The only thing that will change in 1.15 is multi-field indexing,
and it
has never been possible to compute a mean (or any binary
operation) on
multiple fields.


from the example in the other thread
a[['b', 'c']].view(('f8', 2)).mean(0)


(from the statsmodels usecase:
read csv with genfromtext to get recarray or structured array
select/index the numeric columns
view them as standard array
do whatever we can do with standard numpy  arrays
)


Oh ok, I misunderstood. I see your point: a mean over fields is more 
difficult than before.



Or, to phrase it as a question:

How do we get a standard array with homogeneous dtype from the 
corresponding elements of a structured dtype in numpy 1.14.0?


Josef


The answer may be that "numpy has never had a way to do that",
even if in a few special cases you might hack a workaround using views.

That's what your example seems like to me. It uses an explicit view, 
which is an "expert" feature since views depend on the exact memory 
layout and binary representation of the array. Your example only works 
if the two fields have exactly the same dtype as each other and as the 
final dtype, and evidently breaks if there is byte padding for any reason.


Pandas can do row means without these problems:

>>> pd.DataFrame(np.ones(10, dtype='i8,f8')).mean(axis=0)

Numpy is missing this functionality, so you or whoever wrote that 
example figured out a fragile workaround using views.


I suggest that if we want to allow either means over fields, or
conversion of an n-D structured array to an n+1-D regular ndarray, we
should add a dedicated function to do so in numpy.lib.recfunctions, one
which does not depend on the binary representation of the array.
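A field-by-field copy avoids the view's dependence on memory layout. A minimal sketch of what such a recfunctions helper could look like (the name `fields_to_array` and its signature are hypothetical):

```python
import numpy as np

def fields_to_array(arr, names, dtype=np.float64):
    # Copy each named field into a column of a regular ndarray.  Copying
    # field by field is indifferent to padding bytes and field order in
    # memory, unlike a view.
    out = np.empty(arr.shape + (len(names),), dtype=dtype)
    for i, name in enumerate(names):
        out[..., i] = arr[name]
    return out

a = np.ones(3, dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'f8')])
col_means = fields_to_array(a, ['b', 'c']).mean(0)  # works even with padding
```

The copy costs memory that the view does not, but it works for any mix of field dtypes and any padding.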

Allan



Josef


Allan

>
>     Cheers!
>     Ben Root
>
> On Mon, Jan 29, 2018 at 3:24 PM, josef.p...@gmail.com wrote:
>
>
>
> On Mon, Jan 29, 2018 at 2:55 PM, Stefan van der Walt wrote:
>
>             On Mon, 29 Jan 2018 14:10:56 -0500, josef.p...@gmail.com 


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-29 Thread josef . pktd
On Mon, Jan 29, 2018 at 10:44 PM, Allan Haldane 
wrote:

> On 01/29/2018 05:59 PM, josef.p...@gmail.com wrote:
>
>>
>>
>> On Mon, Jan 29, 2018 at 5:50 PM, josef.p...@gmail.com wrote:
>>
>>
>>
>> On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane wrote:
>>
>> On 01/29/2018 04:02 PM, josef.p...@gmail.com wrote:
>> >
>> >
>> > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root  wrote:
>> >
>> > I <3 structured arrays. I love the fact that I can access
>> data by
>> > row and then by fieldname, or vice versa. There are times
>> when I
>> > need to pass just a column into a function, and there are
>> times when
>> > I need to process things row by row. Yes, pandas is nice if
>> you want
>> > the specialized indexing features, but it becomes a bear to
>> deal
>> > with if all you want is normal indexing, or even the
>> ability to
>> > easily loop over the dataset.
>> >
>> >
>> > I don't think there is a doubt that structured arrays, arrays
>> with
>> > structured dtypes, are a useful container. The question is
>> whether they
>> > should be more or the foundation for more.
>> >
>> > For example, computing a mean, or reduce operation, over
>> numeric element
>> > ("columns"). Before padded views it was possible to index by
>> selecting
>> > the relevant "columns" and view them as standard array. With
>> padded
>> > views that breaks and AFAICS, there is no way in numpy 1.14.0
>> to compute
>> > a mean of some "columns". (I don't have numpy 1.14 to try or
>> find a
>> > workaround, like maybe looping over all relevant columns.)
>> >
>> > Josef
>>
>> Just to clarify, structured types have always had padding bytes,
>> that
>> isn't new.
>>
>> What *is* new (which we are pushing to 1.15, I think) is that it
>> may be
>> somewhat more common to end up with padding than before, and
>> only if you
>> are specifically using multi-field indexing, which is a fairly
>> specialized case.
>>
>> I think recfunctions already account properly for padding bytes.
>> Except
>> for the bug in #8100, which we will fix, padding-bytes in
>> recarrays are
>> more or less invisible to a non-expert who only cares about
>> dataframe-like behavior.
>>
>> In other words, padding is no obstacle at all to computing a
>> mean over a
>> column, and single-field indexes in 1.15 behave identically as
>> before.
>> The only thing that will change in 1.15 is multi-field indexing,
>> and it
>> has never been possible to compute a mean (or any binary
>> operation) on
>> multiple fields.
>>
>>
>> from the example in the other thread
>> a[['b', 'c']].view(('f8', 2)).mean(0)
>>
>>
>> (from the statsmodels usecase:
>> read csv with genfromtext to get recarray or structured array
>> select/index the numeric columns
>> view them as standard array
>> do whatever we can do with standard numpy  arrays
>> )
>>
>
> Oh ok, I misunderstood. I see your point: a mean over fields is more
> difficult than before.
>
> Or, to phrase it as a question:
>>
>> How do we get a standard array with homogeneous dtype from the
>> corresponding elements of a structured dtype in numpy 1.14.0?
>>
>> Josef
>>
>
> The answer may be that "numpy has never had a way to do that",
> even if in a few special cases you might hack a workaround using views.
>
> That's what your example seems like to me. It uses an explicit view, which
> is an "expert" feature since views depend on the exact memory layout and
> binary representation of the array. Your example only works if the two
> fields have exactly the same dtype as each other and as the final dtype,
> and evidently breaks if there is byte padding for any reason.
>
> Pandas can do row means without these problems:
>
> >>> pd.DataFrame(np.ones(10, dtype='i8,f8')).mean(axis=0)
>
> Numpy is missing this functionality, so you or whoever wrote that example
> figured out a fragile workaround using views.
>

Once upon a time (*) this wasn't fragile; it was the only, and the
recommended, way. Because dtypes were low-level with a clear memory layout
that stayed that way, it was easy to check the item size or whatever and get
different views onto it, e.g.
https://mail.scipy.org/pipermail/numpy-discussion/2008-December/039340.html

(*) pre-pandas, pre-Stack Overflow, when discussion happened on the mailing
lists, which for me was roughly 2008 to 2012
but a late thread
https://mail.scipy.org/pipermail/numpy-discussion/2015-October/074014

[Numpy-discussion] Optimisation of matrix multiplication

2018-01-29 Thread Andrew Nelson
Hi all,
I have a matrix multiplication that I'd like to optimize.

I have a matrix `a` (dtype=complex) with shape (N, M, 2, 2). I'd like to do
the following multiplication:

a[:, 0] @ a[:, 1] @ ... @ a[:, M-1]

where the first dimension, N, is element-wise (and hopefully vectorisable)
and M >= 2. So for each row, do M-1 matrix multiplications of 2x2 matrices.
The output should have shape (N, 2, 2).

What would be the best way of going about this?
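For reference, the computation described above can be written directly with the batched `@` operator, which broadcasts the 2x2 products over the leading N axis (a minimal sketch; shapes as in the question, sample values are illustrative):

```python
import numpy as np

N, M = 6, 4
rng = np.random.RandomState(42)
a = rng.rand(N, M, 2, 2) + 1j * rng.rand(N, M, 2, 2)  # dtype=complex

# Chain the M matrices of each row: each step is one batched
# (N, 2, 2) @ (N, 2, 2) product, vectorised over the N rows.
out = a[:, 0]
for i in range(1, M):
    out = out @ a[:, i]

assert out.shape == (N, 2, 2)
```

This does the M-1 multiplications in a loop over M, but each multiplication is a single vectorised call over all N rows.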

-- 
Dr. Andrew Nelson


Re: [Numpy-discussion] Optimisation of matrix multiplication

2018-01-29 Thread Andrew Nelson
On 30 January 2018 at 17:22, Andrew Nelson  wrote:

> Hi all,
> I have a matrix multiplication that I'd like to optimize.
>
> I have a matrix `a` (dtype=complex) with shape (N, M, 2, 2). I'd like to
> do the following multiplication:
>
> a[:, 0] @ a[:, 1] @ ... @ a[:, M-1]
>
> where the first dimension, N, is element wise (and hopefully vectorisable)
> and M>=2. So for each row do M-1 matrix multiplications of 2x2 matrices.
> The output should have shape (N, 2, 2).
>
> What would be the best way of going about this?
>

I should add that at the moment I have 4 (N, M) arrays and am doing the
matrix multiplication in a for loop over `range(0, M)`, in an unrolled
fashion for each of the 4 elements.
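A single step of that unrolled scheme, vectorised over the N rows, might look like this (the array names are hypothetical):

```python
import numpy as np

N = 5
rng = np.random.RandomState(0)
# Four (N,) arrays holding the elements of N 2x2 matrices.
a00, a01, a10, a11 = rng.rand(4, N)
b00, b01, b10, b11 = rng.rand(4, N)

# One unrolled 2x2 matrix multiply, element-wise over the N rows:
c00 = a00 * b00 + a01 * b10
c01 = a00 * b01 + a01 * b11
c10 = a10 * b00 + a11 * b10
c11 = a10 * b01 + a11 * b11
```

Repeating this step in a loop over range(1, M) gives the chained product without ever forming the (N, M, 2, 2) array.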