Re: [Numpy-discussion] Moving NumPy's PRNG Forward

2018-01-26 Thread Robert Kern
On Sat, Jan 27, 2018 at 1:14 AM, Kevin Sheppard 
wrote:
>
> I am a firm believer that the current situation is not sustainable.
There are a lot of improvements that can practically be incorporated.
While many of these are performance related, there are also improvements in
accuracy over some ranges of parameters that cannot be incorporated. I also
think that perfect stream reproducibility is a bit of a myth across
versions since this really would require identical OS, compiler and
possibly CPU for some of the generators that produce floats.
>
> I believe there is a case for separating the random generator from core
NumPy.  Some points that favor becoming a subproject:
>
> 1. It is a pure consumer of NumPy API.  Other parts of the API do not
depend on random.
> 2. A stand-alone package could be installed alongside many different
versions of core NumPy, which would reduce the pressure on freezing the
stream.

Removing numpy.random (or freezing it as deprecated legacy while all PRNG
development moves elsewhere) is probably a non-starter. It's too widely used for
us not to provide something. That said, we can (and ought to) make it much
easier for external packages to provide PRNG capabilities (core PRNGs and
distributions) that interoperate with the core functionality that numpy
provides. I'm also happy to place a high barrier on adding more
distributions to numpy.random once that is in place.

Specifically, core uniform PRNGs should have a small common C API that
distribution functions can use. This might just be a struct with an opaque
`void*` state pointer and then 2 function pointers for drawing a uint64
(whole range) and a double in [0,1) from the state. It's important to
expose our core uniform PRNGs as a C API because there has been a desire to
interoperate at that level, using the same PRNG state inside C or Fortran
or GPU code. If that's in place, then people can write new efficient
distribution functions in C that use this small C API agnostic to the core
PRNG algorithm. It also makes it easy to implement new core PRNGs that the
distribution functions provided by numpy.random can use.
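
A rough Python analogue of the struct described above may make the shape of
the interface easier to see. This is only a sketch under stated assumptions:
the proposal itself is a C API, the names `CorePRNG`, `make_splitmix64` and
`exponential` are made up for illustration, and a SplitMix64-style generator
stands in for a real core PRNG.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable

MASK64 = (1 << 64) - 1


@dataclass
class CorePRNG:
    """Hypothetical analogue of the C struct: opaque state plus two draws."""
    state: Any                            # stands in for the opaque void* state
    next_uint64: Callable[[Any], int]     # integer over the whole uint64 range
    next_double: Callable[[Any], float]   # double in [0, 1)


def _splitmix64_uint64(state):
    # SplitMix64-style step; state is a one-element list so it can be mutated.
    state[0] = (state[0] + 0x9E3779B97F4A7C15) & MASK64
    z = state[0]
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def _splitmix64_double(state):
    # Keep the top 53 bits so the result is a double in [0, 1).
    return (_splitmix64_uint64(state) >> 11) * 2.0 ** -53


def make_splitmix64(seed):
    return CorePRNG([seed & MASK64], _splitmix64_uint64, _splitmix64_double)


def exponential(prng, scale=1.0):
    """Inverse-transform sampling that only touches the two-function API."""
    u = prng.next_double(prng.state)
    return -scale * math.log1p(-u)


prng = make_splitmix64(42)
print(exponential(prng, scale=2.0))
```

`exponential` never looks inside `prng.state`, so any generator that fills in
the same two slots could be passed to it unchanged.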

> In terms of what is needed, I think that the underlying PRNG should be
swappable.  This will provide a simple mechanism to allow certain types of
advancement while easily providing backward compat.  In the current design
this is very hard and requires compiling many nearly identical copies of
RandomState. In pseudocode something like
>
> standard_normal(prng)
>
> where prng is a basic class that retains the PRNG state and has a small
set of core random number generators that belong to the underlying PRNG --
probably something like int32, int64, double, and possibly int53. I am not
advocating explicitly passing the PRNG as an argument, but having
generators which can take any suitable PRNG would add a lot of flexibility
in terms of taking advantage of improvements in the underlying PRNGs (see,
e.g., xoroshiro128/xorshift1024).  The "small" core PRNG would have
responsibility over state and streams.  The remainder of the module would
transform the underlying PRNG into the required distributions.

(edit: after writing the following verbiage, I realize it can be summed up
with more respect to your suggestion: yes, we should do this design, but we
don't need to and shouldn't give up on a class with distribution methods.)

Once the core PRNG C API is in place, I don't think we necessarily need to
move away from a class structure per se, though it becomes an option. We
just separate the core PRNG object from the distribution-providing class.
We don't need to make copies of the distribution-providing class just to
use a new core PRNG. I'm coming around to Nathaniel's suggestion for the
constructor API (though not the distribution-versioning, for reasons I can
get into later).  We have a couple of core uniform PRNG classes like
`MT19937` and `PCG128`. Those have a tiny API, and probably don't have a
lot of unnecessary code clones between them. Their constructors can be
different depending on the different ways they can be instantiated,
depending on the PRNG's features. I'm not sure that they'll have any common
methods besides `__getstate__/__setstate__` and probably a `copy()`. They
will expose their C API as a Python-opaque attribute. They can have
whatever algorithm-dependent methods they need (e.g. to support jumpahead).
I might not even expose to Python the uint64 and U(0,1) double sampling
methods, but maybe so.

Then we have a single `Distributions` class that provides all of the
distributions that we want to support in numpy.random (i.e. what we
currently have on `RandomState` and whatever passes our higher bar in the
future). It takes one of the core PRNG instances as an argument to the
constructor (nominally, at least; we can design factory functions to make
this more convenient).

  prng = Distributions(PCG128(seed))
  x = prng.normal(mean, std)
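
To make the split concrete, here is a small pure-Python sketch of one
distribution class driven by interchangeable core PRNGs. It is not the
proposed implementation: `MT19937Core`, `LCG64Core` and the `next_double`
method are hypothetical names, the stdlib Mersenne Twister and a 64-bit LCG
stand in for real core generators, and `normal` uses the stdlib inverse CDF
purely for brevity.

```python
import random
from statistics import NormalDist

MASK64 = (1 << 64) - 1


class MT19937Core:
    """Hypothetical core PRNG backed by the stdlib Mersenne Twister."""

    def __init__(self, seed):
        self._rng = random.Random(seed)

    def next_double(self):
        return self._rng.random()  # double in [0, 1)


class LCG64Core:
    """Hypothetical core PRNG: a plain 64-bit linear congruential generator."""

    def __init__(self, seed):
        self._state = seed & MASK64

    def next_double(self):
        self._state = (6364136223846793005 * self._state
                       + 1442695040888963407) & MASK64
        return (self._state >> 11) * 2.0 ** -53  # top 53 bits -> [0, 1)


class Distributions:
    """Distribution methods written once, for any core PRNG."""

    def __init__(self, core):
        self._core = core

    def uniform(self, low=0.0, high=1.0):
        return low + (high - low) * self._core.next_double()

    def normal(self, mean=0.0, std=1.0):
        u = self._core.next_double()
        while u == 0.0:  # inv_cdf requires 0 < u < 1
            u = self._core.next_double()
        return mean + std * NormalDist().inv_cdf(u)


# The same Distributions code runs on either core PRNG.
for core in (MT19937Core(12345), LCG64Core(12345)):
    prng = Distributions(core)
    print(type(core).__name__, prng.normal(10.0, 2.0))
```

Adding another core generator in this sketch means writing one small class
with a `next_double` method; `Distributions` itself is not copied or changed.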

If someone wants to write a WELL512 core PRNG, 

Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-26 Thread josef . pktd
On Fri, Jan 26, 2018 at 5:48 PM, Chris Barker  wrote:

> On Fri, Jan 26, 2018 at 2:35 PM, Allan Haldane 
> wrote:
>
>> As I remember, numpy has some fairly convoluted code for array creation
>> which tries to make sense of various nested lists/tuples/ndarray
>> combinations. It makes a difference for structured arrays and object
>> arrays. I don't remember the details right now, but I know in some cases
>> the rule is "If it's a Python list, recurse, otherwise assume it is an
>> object array".
>>
>
> that's at least explainable, and the "try to figure out what the user
> means" array creation is pretty much an impossible problem, so what we've
> got is probably about as good as it can get.
>
>> > These points make me think that instead of a `.totuple` method, this
>> > might be more suitable as a new function in np.lib.recfunctions.
>> >
>> > I don't seem to have that module -- and I'm running 1.14.0 -- is this a
>> > new idea?
>>
>> Sorry, I didn't specify it correctly. It is "numpy.lib.recfunctions".
>>
>
> thanks -- found it.
>
>
>> Also, the functions in that module encourage "pandas-like" use of
>> structured arrays, but I'm not sure they should be used that way. I've
>> been thinking they should be primarily used for binary interfaces
>> with/to numpy, eg to talk to C programs or to read complicated binary
>> files.
>>
>
> that's my use-case. And I agree -- if you really want to do that kind of
> thing, pandas is the way to go.
>
> I thought recarrays were pretty cool back in the day, but pandas is a much
> better option.
>
> So I pretty much only use structured arrays for data exchange with C
> code
>


My impression is that this turns into a "deprecate recarrays and the
supporting recfunctions" issue.

recfunctions and the associated functions from matplotlib.mlab were
explicitly designed for using structured dtypes as dataframe-like objects.

(old question: does numpy have a sort_rows function now without detouring
to structured dtype views?)

Josef




>
> -CHB
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-26 Thread Chris Barker
On Fri, Jan 26, 2018 at 2:35 PM, Allan Haldane 
wrote:

> As I remember, numpy has some fairly convoluted code for array creation
> which tries to make sense of various nested lists/tuples/ndarray
> combinations. It makes a difference for structured arrays and object
> arrays. I don't remember the details right now, but I know in some cases
> the rule is "If it's a Python list, recurse, otherwise assume it is an
> object array".
>

that's at least explainable, and the "try to figure out what the user
means" array creation is pretty much an impossible problem, so what we've
got is probably about as good as it can get.

> > These points make me think that instead of a `.totuple` method, this
> > might be more suitable as a new function in np.lib.recfunctions.
> >
> > I don't seem to have that module -- and I'm running 1.14.0 -- is this a
> > new idea?
>
> Sorry, I didn't specify it correctly. It is "numpy.lib.recfunctions".
>

thanks -- found it.


> Also, the functions in that module encourage "pandas-like" use of
> structured arrays, but I'm not sure they should be used that way. I've
> been thinking they should be primarily used for binary interfaces
> with/to numpy, eg to talk to C programs or to read complicated binary
> files.
>

that's my use-case. And I agree -- if you really want to do that kind of
thing, pandas is the way to go.

I thought recarrays were pretty cool back in the day, but pandas is a much
better option.

So I pretty much only use structured arrays for data exchange with C
code

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-26 Thread Eric Wieser
arr.names should have been arr.dtype.names in that pack_last_axis function
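
With that correction applied, the helper from Allan's message might look
like the sketch below. The `np.ascontiguousarray` call is an addition that is
not in the original function; it is an assumption made here so that `.view`
also works when the input is not contiguous, at the cost of a copy.

```python
import numpy as np


def pack_last_axis(arr, names=None):
    # Structured arrays already have named fields: nothing to pack.
    if arr.dtype.names:
        return arr
    # Assumption added in this sketch: force contiguous memory so that
    # .view() can reinterpret each row as one structured element.
    arr = np.ascontiguousarray(arr)
    names = names or ['f{}'.format(i) for i in range(arr.shape[-1])]
    return arr.view([(n, arr.dtype) for n in names]).squeeze(-1)


uv = np.arange(6).reshape(3, 2)
print(pack_last_axis(uv).tolist())  # [(0, 1), (2, 3), (4, 5)]
```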

Eric



On Fri, 26 Jan 2018 at 12:45 Chris Barker  wrote:

> On Fri, Jan 26, 2018 at 10:48 AM, Allan Haldane 
> wrote:
>
>> > What do folks think about a totuple() method — even before this I’ve
>> > wanted that. But in this case, it seems particularly useful.
>>
>
>
>> Two thoughts:
>>
>
>> 1. `totuple` makes most sense for 2d arrays. But what should it do for
>> 1d or 3+d arrays? I suppose it could make the last dimension a tuple, so
>> 1d arrays would give a list of tuples of size 1.
>>
>
> I was thinking it would be exactly like .tolist() but with tuples -- so
> you'd get tuples all the way down (or is that turtles?)
>
> In this use case, it would have saved me the generator expression:
>
> (tuple(r) for r in arr)
>
> not a huge deal, but it would be nice to not  have to write that, and to
> have the looping be in C with no intermediate array generation.
>
> 2. structured array's .tolist() already returns a list of tuples. If we
>> have a 2d structured array, would it add one more layer of tuples?
>
>
> no -- why? it would return a tuple of tuples instead.
>
>
>> That
>> would raise an exception if read back in by `np.array` with the same
>> dtype.
>>
>
> Hmm -- indeed, if the top-level structure is a tuple, the array
> constructor gets confused:
>
> This works fine -- as it should:
>
>
> In [84]: new_full = np.array(full.tolist(), full.dtype)
>
> But this does not:
>
> In [85]: new_full = np.array(tuple(full.tolist()), full.dtype)
> ---
> ValueError                                Traceback (most recent call last)
>  in ()
> ----> 1 new_full = np.array(tuple(full.tolist()), full.dtype)
>
> ValueError: could not assign tuple of length 4 to structure with 2 fields.
>
> I was hoping it would dig down to the inner structures looking for a match
> to the dtype, rather than looking at the type of the top level. Oh well.
>
> So yeah, not sure where you would go from tuple to list -- probably at the
> bottom level, but that may not always be unambiguous.
>
> These points make me think that instead of a `.totuple` method, this
>> might be more suitable as a new function in np.lib.recfunctions.
>
>
> I don't seem to have that module -- and I'm running 1.14.0 -- is this a
> new idea?
>
>
>> If the
>> goal is to help manipulate structured arrays, that submodule is
>> appropriate since it already has other functions to manipulate fields in
>> similar ways. What about calling it `pack_last_axis`?
>>
>> def pack_last_axis(arr, names=None):
>>     if arr.names:
>>         return arr
>>     names = names or ['f{}'.format(i) for i in range(arr.shape[-1])]
>>     return arr.view([(n, arr.dtype) for n in names]).squeeze(-1)
>>
>> Then you could do:
>>
>> >>> pack_last_axis(uv).tolist()
>>
>> to get a list of tuples.
>>
>
> not sure what idea is here -- in my example, I had a regular 2-d array, so
> no names:
>
> In [90]: pack_last_axis(uv)
> ---
> AttributeError                            Traceback (most recent call last)
>  in ()
> ----> 1 pack_last_axis(uv)
>
>  in pack_last_axis(arr, names)
>       1 def pack_last_axis(arr, names=None):
> ----> 2     if arr.names:
>       3         return arr
>       4     names = names or ['f{}'.format(i) for i in range(arr.shape[-1])]
>       5     return arr.view([(n, arr.dtype) for n in names]).squeeze(-1)
>
> AttributeError: 'numpy.ndarray' object has no attribute 'names'
>
>
> So maybe you meant something like:
>
> In [95]: def pack_last_axis(arr, names=None):
>     ...:     try:
>     ...:         arr.names
>     ...:         return arr
>     ...:     except AttributeError:
>     ...:         names = names or ['f{}'.format(i) for i in range(arr.shape[-1])]
>     ...:         return arr.view([(n, arr.dtype) for n in names]).squeeze(-1)
>
> which does work, but seems like a convoluted way to get tuples!
>
> However, I didn't actually need tuples, I needed something I could pack
> into a structured array, and this does work, without the tolist:
>
> full = np.array(zip(time, pack_last_axis(uv)), dtype=dt)
>
>
> So maybe that is the way to go.
>
> I'm not sure I'd have thought to look for this function, but what can you
> do?
>
> Thanks for your attention to this,
>
> -CHB
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>

Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-26 Thread Chris Barker
On Fri, Jan 26, 2018 at 10:48 AM, Allan Haldane 
wrote:

> > What do folks think about a totuple() method — even before this I’ve
> > wanted that. But in this case, it seems particularly useful.
>


> Two thoughts:
>
> 1. `totuple` makes most sense for 2d arrays. But what should it do for
> 1d or 3+d arrays? I suppose it could make the last dimension a tuple, so
> 1d arrays would give a list of tuples of size 1.
>

I was thinking it would be exactly like .tolist() but with tuples -- so
you'd get tuples all the way down (or is that turtles?)

In this use case, it would have saved me the generator expression:

(tuple(r) for r in arr)

not a huge deal, but it would be nice to not  have to write that, and to
have the looping be in C with no intermediate array generation.
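
For reference, the "exactly like .tolist() but with tuples" behaviour can be
approximated today in pure Python. The helper below is only an illustrative
sketch (`totuple` here is a free function, not an existing ndarray method),
and it loops in Python rather than in C, so it does not give the speed
benefit mentioned above. It works on the nested lists that `arr.tolist()`
returns.

```python
def totuple(nested):
    """Recursively turn nested lists into nested tuples."""
    if isinstance(nested, list):
        return tuple(totuple(item) for item in nested)
    return nested  # scalars (and existing tuples) are returned unchanged


rows = [[1, 2], [3, 4], [5, 6]]  # what .tolist() gives for a 3x2 integer array
print(totuple(rows))             # ((1, 2), (3, 4), (5, 6))
```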

2. structured array's .tolist() already returns a list of tuples. If we
> have a 2d structured array, would it add one more layer of tuples?


no -- why? it would return a tuple of tuples instead.


> That
> would raise an exception if read back in by `np.array` with the same dtype.
>

Hmm -- indeed, if the top-level structure is a tuple, the array constructor
gets confused:

This works fine -- as it should:


In [84]: new_full = np.array(full.tolist(), full.dtype)

But this does not:

In [85]: new_full = np.array(tuple(full.tolist()), full.dtype)
---
ValueError                                Traceback (most recent call last)
 in ()
----> 1 new_full = np.array(tuple(full.tolist()), full.dtype)

ValueError: could not assign tuple of length 4 to structure with 2 fields.

I was hoping it would dig down to the inner structures looking for a match
to the dtype, rather than looking at the type of the top level. Oh well.

So yeah, not sure where you would go from tuple to list -- probably at the
bottom level, but that may not always be unambiguous.

These points make me think that instead of a `.totuple` method, this
> might be more suitable as a new function in np.lib.recfunctions.


I don't seem to have that module -- and I'm running 1.14.0 -- is this a new
idea?


> If the
> goal is to help manipulate structured arrays, that submodule is
> appropriate since it already has other functions to manipulate fields in
> similar ways. What about calling it `pack_last_axis`?
>
> def pack_last_axis(arr, names=None):
>     if arr.names:
>         return arr
>     names = names or ['f{}'.format(i) for i in range(arr.shape[-1])]
>     return arr.view([(n, arr.dtype) for n in names]).squeeze(-1)
>
> Then you could do:
>
> >>> pack_last_axis(uv).tolist()
>
> to get a list of tuples.
>

not sure what idea is here -- in my example, I had a regular 2-d array, so
no names:

In [90]: pack_last_axis(uv)
---
AttributeError                            Traceback (most recent call last)
 in ()
----> 1 pack_last_axis(uv)

 in pack_last_axis(arr, names)
      1 def pack_last_axis(arr, names=None):
----> 2     if arr.names:
      3         return arr
      4     names = names or ['f{}'.format(i) for i in range(arr.shape[-1])]
      5     return arr.view([(n, arr.dtype) for n in names]).squeeze(-1)

AttributeError: 'numpy.ndarray' object has no attribute 'names'


So maybe you meant something like:

In [95]: def pack_last_axis(arr, names=None):
    ...:     try:
    ...:         arr.names
    ...:         return arr
    ...:     except AttributeError:
    ...:         names = names or ['f{}'.format(i) for i in range(arr.shape[-1])]
    ...:         return arr.view([(n, arr.dtype) for n in names]).squeeze(-1)

which does work, but seems like a convoluted way to get tuples!

However, I didn't actually need tuples, I needed something I could pack
into a structured array, and this does work, without the tolist:

full = np.array(zip(time, pack_last_axis(uv)), dtype=dt)


So maybe that is the way to go.

I'm not sure I'd have thought to look for this function, but what can you
do?

Thanks for your attention to this,

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-26 Thread Eric Wieser
Apologies, it seems that I skipped to the end of @ahaldane's remark - we're
on the same page.

On Fri, 26 Jan 2018 at 11:17 Eric Wieser 
wrote:

> Why is the list of tuples a useful thing to have in the first place? If
> the goal is to convert an array into a structured array, you can do that
> far more efficiently with:
>
> def make_tup_dtype(arr):
>     """
>     Attempt to make a type capable of viewing the last axis of an array,
>     even if it is non-contiguous.
>     Unfortunately `.view` doesn't allow us to use this dtype in that case,
>     which needs a patch...
>     """
>     n_fields = arr.shape[-1]
>     step = arr.strides[-1]
>     descr = dict(names=[], formats=[], offsets=[], itemsize=step * n_fields)
>     for i in range(n_fields):
>         descr['names'].append('f{}'.format(i))
>         descr['offsets'].append(step * i)
>         descr['formats'].append(arr.dtype)
>     return np.dtype(descr)
>
> Used as:
>
> >>> arr = np.arange(6).reshape(3, 2)
> >>> arr.view(make_tup_dtype(arr)).squeeze(axis=-1)
> array([(0, 1), (2, 3), (4, 5)],
>   dtype=[('f0', '
> Perhaps this should be provided by recfunctions (or maybe it already is,
> in a less rigid form?)
>
> Eric
>
> On Fri, 26 Jan 2018 at 10:48 Allan Haldane  wrote:
>
>> On 01/25/2018 08:53 PM, Chris Barker - NOAA Federal wrote:
>> >> On Jan 25, 2018, at 4:06 PM, Allan Haldane 
>> wrote:
>> >
>> >>> 1) This is a known change with good reason?
>> >
>> >> . The
>> >> change occurred because the old assignment behavior was dangerous, and
>> >> was not doing what you thought.
>> >
>> > OK, that’s a good reason!
>> >
>> >>> A) improve the error message.
>> >>
>> >> Good idea. I'll see if we can do it for 1.14.1.
>> >
>> > What do folks think about a totuple() method — even before this I’ve
>> > wanted that. But in this case, it seems particularly useful.
>> >
>> > -CHB
>>
>> Two thoughts:
>>
>> 1. `totuple` makes most sense for 2d arrays. But what should it do for
>> 1d or 3+d arrays? I suppose it could make the last dimension a tuple, so
>> 1d arrays would give a list of tuples of size 1.
>>
>> 2. structured array's .tolist() already returns a list of tuples. If we
>> have a 2d structured array, would it add one more layer of tuples? That
>> would raise an exception if read back in by `np.array` with the same
>> dtype.
>>
>> These points make me think that instead of a `.totuple` method, this
>> might be more suitable as a new function in np.lib.recfunctions. If the
>> goal is to help manipulate structured arrays, that submodule is
>> appropriate since it already has other functions to manipulate fields in
>> similar ways. What about calling it `pack_last_axis`?
>>
>> def pack_last_axis(arr, names=None):
>>     if arr.names:
>>         return arr
>>     names = names or ['f{}'.format(i) for i in range(arr.shape[-1])]
>>     return arr.view([(n, arr.dtype) for n in names]).squeeze(-1)
>>
>> Then you could do:
>>
>> >>> pack_last_axis(uv).tolist()
>>
>> to get a list of tuples.
>>
>> Allan


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-26 Thread Eric Wieser
Why is the list of tuples a useful thing to have in the first place? If the
goal is to convert an array into a structured array, you can do that far
more efficiently with:

def make_tup_dtype(arr):
    """
    Attempt to make a type capable of viewing the last axis of an
    array, even if it is non-contiguous.
    Unfortunately `.view` doesn't allow us to use this dtype in that
    case, which needs a patch...
    """
    n_fields = arr.shape[-1]
    step = arr.strides[-1]
    descr = dict(names=[], formats=[], offsets=[], itemsize=step * n_fields)
    for i in range(n_fields):
        descr['names'].append('f{}'.format(i))
        descr['offsets'].append(step * i)
        descr['formats'].append(arr.dtype)
    return np.dtype(descr)

Used as:

>>> arr = np.arange(6).reshape(3, 2)
>>> arr.view(make_tup_dtype(arr)).squeeze(axis=-1)
array([(0, 1), (2, 3), (4, 5)],
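  dtype=[('f0', '

Perhaps this should be provided by recfunctions (or maybe it already is,
in a less rigid form?)

Eric

On Fri, 26 Jan 2018 at 10:48 Allan Haldane wrote: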

> On 01/25/2018 08:53 PM, Chris Barker - NOAA Federal wrote:
> >> On Jan 25, 2018, at 4:06 PM, Allan Haldane 
> wrote:
> >
> >>> 1) This is a known change with good reason?
> >
> >> . The
> >> change occurred because the old assignment behavior was dangerous, and
> >> was not doing what you thought.
> >
> > OK, that’s a good reason!
> >
> >>> A) improve the error message.
> >>
> >> Good idea. I'll see if we can do it for 1.14.1.
> >
> > What do folks think about a totuple() method — even before this I’ve
> > wanted that. But in this case, it seems particularly useful.
> >
> > -CHB
>
> Two thoughts:
>
> 1. `totuple` makes most sense for 2d arrays. But what should it do for
> 1d or 3+d arrays? I suppose it could make the last dimension a tuple, so
> 1d arrays would give a list of tuples of size 1.
>
> 2. structured array's .tolist() already returns a list of tuples. If we
> have a 2d structured array, would it add one more layer of tuples? That
> would raise an exception if read back in by `np.array` with the same dtype.
>
> These points make me think that instead of a `.totuple` method, this
> might be more suitable as a new function in np.lib.recfunctions. If the
> goal is to help manipulate structured arrays, that submodule is
> appropriate since it already has other functions to manipulate fields in
> similar ways. What about calling it `pack_last_axis`?
>
> def pack_last_axis(arr, names=None):
>     if arr.names:
>         return arr
>     names = names or ['f{}'.format(i) for i in range(arr.shape[-1])]
>     return arr.view([(n, arr.dtype) for n in names]).squeeze(-1)
>
> Then you could do:
>
> >>> pack_last_axis(uv).tolist()
>
> to get a list of tuples.
>
> Allan


Re: [Numpy-discussion] Setting custom dtypes and 1.14

2018-01-26 Thread Allan Haldane
On 01/25/2018 08:53 PM, Chris Barker - NOAA Federal wrote:
>> On Jan 25, 2018, at 4:06 PM, Allan Haldane  wrote:
> 
>>> 1) This is a known change with good reason?
> 
>> . The
>> change occurred because the old assignment behavior was dangerous, and
>> was not doing what you thought.
> 
> OK, that’s a good reason!
> 
>>> A) improve the error message.
>>
>> Good idea. I'll see if we can do it for 1.14.1.
> 
> What do folks think about a totuple() method — even before this I’ve
> wanted that. But in this case, it seems particularly useful.
> 
> -CHB

Two thoughts:

1. `totuple` makes most sense for 2d arrays. But what should it do for
1d or 3+d arrays? I suppose it could make the last dimension a tuple, so
1d arrays would give a list of tuples of size 1.

2. structured array's .tolist() already returns a list of tuples. If we
have a 2d structured array, would it add one more layer of tuples? That
would raise an exception if read back in by `np.array` with the same dtype.

These points make me think that instead of a `.totuple` method, this
might be more suitable as a new function in np.lib.recfunctions. If the
goal is to help manipulate structured arrays, that submodule is
appropriate since it already has other functions to manipulate fields in
similar ways. What about calling it `pack_last_axis`?

def pack_last_axis(arr, names=None):
    if arr.names:
        return arr
    names = names or ['f{}'.format(i) for i in range(arr.shape[-1])]
    return arr.view([(n, arr.dtype) for n in names]).squeeze(-1)

Then you could do:

>>> pack_last_axis(uv).tolist()

to get a list of tuples.

Allan


Re: [Numpy-discussion] Moving NumPy's PRNG Forward

2018-01-26 Thread Kevin Sheppard
I am a firm believer that the current situation is not sustainable.  There
are a lot of improvements that can practically be incorporated.  While many
of these are performance related, there are also improvements in accuracy
over some ranges of parameters that cannot be incorporated. I also think
that perfect stream reproducibility is a bit of a myth across versions
since this really would require identical OS, compiler and possibly CPU for
some of the generators that produce floats.

I believe there is a case for separating the random generator from core
NumPy.  Some points that favor becoming a subproject:

1. It is a pure consumer of NumPy API.  Other parts of the API do not depend
on random.
2. A stand-alone package could be installed alongside many different
versions of core NumPy, which would reduce the pressure on freezing the
stream.

In terms of what is needed, I think that the underlying PRNG should be
swappable.  This will provide a simple mechanism to allow certain types of
advancement while easily providing backward compat.  In the current design
this is very hard and requires compiling many nearly identical copies of
RandomState. In pseudocode something like

standard_normal(prng)

where prng is a basic class that retains the PRNG state and has a small set
of core random number generators that belong to the underlying PRNG --
probably something like int32, int64, double, and possibly int53. I am not
advocating explicitly passing the PRNG as an argument, but having
generators which can take any suitable PRNG would add a lot of flexibility
in terms of taking advantage of improvements in the underlying PRNGs (see,
e.g., xoroshiro128/xorshift1024).  The "small" core PRNG would have
responsibility over state and streams.  The remainder of the module would
transform the underlying PRNG into the required distributions.

This would also simplify making improvements, since old versions could be
saved or improved versions could be added to the API.  For example,

from numpy.random import standard_normal, prng  # Preferred versions
standard_normal(prng)  # Ziggurat
from numpy.random.legacy import standard_normal_bm, mt19937  # legacy generators
standard_normal_bm(mt19937)  # Box-Muller
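
A runnable toy version of the pseudocode above, offered as a sketch rather
than as the proposed API: `CorePRNG` and its `double()` method are
hypothetical, the stdlib Mersenne Twister stands in for the core generator,
and the Marsaglia polar method stands in for Ziggurat as the "preferred"
algorithm because it is short enough to show here.

```python
import math
import random


class CorePRNG:
    """Hypothetical core PRNG: owns the state, exposes only raw draws."""

    def __init__(self, seed):
        self._rng = random.Random(seed)  # stdlib Mersenne Twister as stand-in

    def double(self):
        return self._rng.random()  # double in [0, 1)


def standard_normal_bm(prng):
    """'Legacy' version kept for reproducibility: Box-Muller transform."""
    u1 = 1.0 - prng.double()  # shift to (0, 1] so log() stays finite
    u2 = prng.double()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def standard_normal(prng):
    """'Preferred' version: Marsaglia polar method (stand-in for Ziggurat)."""
    while True:
        x = 2.0 * prng.double() - 1.0
        y = 2.0 * prng.double() - 1.0
        s = x * x + y * y
        if 0.0 < s < 1.0:  # accept only points inside the unit circle
            return x * math.sqrt(-2.0 * math.log(s) / s)


prng = CorePRNG(2018)
print(standard_normal(prng), standard_normal_bm(prng))
```

Both functions consume the same `prng` object, so the old algorithm can stay
available under its own name while a newer one becomes the default.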

Kevin