Re: [Numpy-discussion] locking np.random.Generator in a cython nogil context?

2020-12-17 Thread Evgeni Burovski
On Tue, Dec 15, 2020 at 1:00 AM Robert Kern wrote:
>
> On Mon, Dec 14, 2020 at 3:27 PM Evgeni Burovski wrote:
>>
>> 
>>
>> > I also think that the lock only matters for Multithreaded code not 
>> > Multiprocess.  I believe the latter pickles and unpickles any Generator 
>> > object (and the underlying BitGenerator) and so each process has its own 
>> > version.  Note that when multiprocessing the recommended procedure is to 
>> > use spawn() to generate a sequence of BitGenerators and to use a distinct 
>> > BitGenerator in each process. If you do this then you are free from the 
>> > lock.
>>
>> Thanks. Just to confirm: does using SeedSequence spawn_key arg
>> generate distinct BitGenerators? As in
>>
>> cdef class Wrapper():
>>     def __init__(self, seed):
>>         entropy, num = seed
>>         py_gen = PCG64(SeedSequence(entropy, spawn_key=(spawn_key,)))
>>         self.rng = py_gen.capsule.PyCapsule_GetPointer(capsule, "BitGenerator")  # <--- this
>>
>> cdef Wrapper rng_0 = Wrapper(seed=(123, 0))
>> cdef Wrapper rng_1 = Wrapper(seed=(123, 1))
>>
>> And then, do these two objects have distinct BitGenerators?
>
>
> The code you wrote doesn't work (`spawn_key` is never assigned). I can guess 
> what you meant to write, though, and yes, you would get distinct 
> `BitGenerator`s. However, I do not recommend using `spawn_key` explicitly. 
> The `SeedSequence.spawn()` method internally keeps track of how many children 
> it has spawned and uses that to construct the `spawn_key`s for its subsequent 
> children. If you play around with making your own `spawn_key`s, then the 
> parent `SeedSequence(entropy)` might spawn identical `SeedSequence`s to the 
> ones you constructed.
>
> If you don't want to use the `spawn()` API to construct the separate 
> `SeedSequence`s but still want to incorporate some per-process information 
> into the seeds (e.g. the 0 and 1 in your example), then note that a tuple of 
> integers is a valid value for the `entropy` argument. You can have the first 
> item be the same (i.e. per-run information) and the second item be a 
> per-process ID or counter.
>
> cdef class Wrapper():
>     def __init__(self, seed):
>         py_gen = PCG64(SeedSequence(seed))
>         self.rng = py_gen.capsule.PyCapsule_GetPointer(capsule, "BitGenerator")
>
> cdef Wrapper rng_0 = Wrapper(seed=(123, 0))
> cdef Wrapper rng_1 = Wrapper(seed=(123, 1))


Thanks Robert!

I indeed typo'd the spawn_key, and indeed the intention is exactly to
include a worker_id into a seed to make sure each worker gets a
separate stream.

The use of the spawn_key was --- as I now finally realize --- a
misunderstanding of your and Kevin's previous replies in
https://mail.python.org/pipermail/numpy-discussion/2020-July/080833.html

So I'm moving my project to use the `SeedSequence((base_seed,
worker_id))` API --- thanks!
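
For reference, a minimal pure-Python sketch of that pattern. The names
`base_seed` and `make_rng` are illustrative only and do not come from the
project's code; the sketch just assumes a tuple of non-negative integers is
passed as `entropy`, as Robert describes.

```python
import numpy as np
from numpy.random import PCG64, Generator, SeedSequence

base_seed = 123  # per-run value shared by all workers (illustrative)

def make_rng(worker_id):
    # A tuple of non-negative integers is a valid `entropy` argument, so the
    # worker id can be mixed into the seed without touching spawn_key.
    return Generator(PCG64(SeedSequence((base_seed, worker_id))))

rng_0 = make_rng(0)
rng_1 = make_rng(1)

# Each worker draws from its own stream.
draws_0 = rng_0.integers(0, 2**32, size=4)
draws_1 = rng_1.integers(0, 2**32, size=4)
```

Rebuilding a generator with the same `(base_seed, worker_id)` pair reproduces
the same stream, so a run can be replayed per worker.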

Just as a side note, this is not very prominent in the docs, and I'm
ready to volunteer to send a doc PR --- I'm only not sure which part
of the docs, and would appreciate a pointer.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] locking np.random.Generator in a cython nogil context?

2020-12-17 Thread Matti Picus



On 12/17/20 11:47 AM, Evgeni Burovski wrote:

> Just as a side note, this is not very prominent in the docs, and I'm
> ready to volunteer to send a doc PR --- I'm only not sure which part
> of the docs, and would appreciate a pointer.


Maybe here

https://numpy.org/devdocs/reference/random/bit_generators/index.html#seeding-and-entropy

which is here in the sources

https://github.com/numpy/numpy/blob/master/doc/source/reference/random/bit_generators/index.rst#seeding-and-entropy


And/or in the SeedSequence docstring documentation

https://numpy.org/devdocs/reference/random/bit_generators/generated/numpy.random.SeedSequence.html#numpy.random.SeedSequence

which is here in the sources

https://github.com/numpy/numpy/blob/master/numpy/random/bit_generator.pyx#L255


Matti





Re: [Numpy-discussion] datetime64: Remove deprecation warning when constructing with timezone

2020-12-17 Thread k1o0
Noam Yorav-Raphael wrote
> The solution is simple, and is what datetime64 used to do before the
> change - have a type that just represents a moment in time. It's not
> "in UTC" - it just stores the number of seconds that passed since an
> agreed moment in time (which is usually 1970-01-01 02:00+0200, which is
> more commonly referred to as 1970-01-01 00:00Z - it's the exact same
> moment).

I agree with this.  I understand the issue of parsing arbitrary timestamps
with incomplete information; however, it's not clear to me why it has become
more difficult to work with ISO 8601 timestamps.  For example, we use
numpy.genfromtxt to load an array with UTC offset timestamps, e.g.
`2020-08-19T12:42:57.7903616-04:00`.  Loading this array used to take
0.0352s with no converter; it now takes 0.8615s with the following
converter:

>>> import dateutil.parser
>>> from datetime import timezone
>>> lambda x: dateutil.parser.parse(x).astimezone(timezone.utc).replace(tzinfo=None)

That's a huge performance hit to do something that should be considered a
standard operation, namely loading ISO compliant data.  There may be more
efficient converters out there but it seems strange to employ an external
function to remove precision from an ISO datatype.  As an aside, with or
without the converter, numpy.genfromtxt is consistently faster than
numpy.loadtxt, despite the documentation stating otherwise.

I feel there's a lack of guidance in the documentation on this issue.  In
most threads I've encountered on this the first recommendation is to use
pandas.  The most effective way to crack a nut should not be to use a
sledgehammer.  The purpose of introducing standards should be to make these
sorts of operations trivial and efficient.  Perhaps I'm missing the solution
here...
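
One possible lighter-weight converter, as a sketch only: it assumes every
timestamp ends in a fixed-width `+HH:MM` or `-HH:MM` offset, and the helper
name `iso_to_utc_datetime64` is made up for illustration. It strips the
offset, lets numpy parse the remaining naive timestamp, and then shifts the
result to UTC. I have not timed it against the dateutil version.

```python
import numpy as np

def iso_to_utc_datetime64(stamp):
    """Convert 'YYYY-MM-DDTHH:MM:SS[.fff]+HH:MM' to a naive UTC datetime64.

    Assumes the last six characters are always a sign plus HH:MM.
    """
    local, sign, offset = stamp[:-6], stamp[-6], stamp[-5:]
    hours, minutes = int(offset[:2]), int(offset[3:])
    delta = np.timedelta64(hours * 60 + minutes, "m")
    local_time = np.datetime64(local)  # no offset in the string, so no warning
    # Local time = UTC + offset, so UTC = local time - offset.
    return local_time - delta if sign == "+" else local_time + delta

utc = iso_to_utc_datetime64("2020-08-19T12:42:57.7903616-04:00")
```

The fractional seconds are kept at whatever precision numpy infers from the
string, so nothing is truncated on the way in.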





Re: [Numpy-discussion] locking np.random.Generator in a cython nogil context?

2020-12-17 Thread Evgeni Burovski
On Thu, Dec 17, 2020 at 1:01 PM Matti Picus wrote:
>
>
> On 12/17/20 11:47 AM, Evgeni Burovski wrote:
> > Just as a side note, this is not very prominent in the docs, and I'm
> > ready to volunteer to send a doc PR --- I'm only not sure which part
> > of the docs, and would appreciate a pointer.
>
> Maybe here
>
> https://numpy.org/devdocs/reference/random/bit_generators/index.html#seeding-and-entropy
>
> which is here in the sources
>
> https://github.com/numpy/numpy/blob/master/doc/source/reference/random/bit_generators/index.rst#seeding-and-entropy
>
>
> And/or in the SeedSequence docstring documentation
>
> https://numpy.org/devdocs/reference/random/bit_generators/generated/numpy.random.SeedSequence.html#numpy.random.SeedSequence
>
> which is here in the sources
>
> https://github.com/numpy/numpy/blob/master/numpy/random/bit_generator.pyx#L255


Here's the PR, https://github.com/numpy/numpy/pull/18014

Two minor comments, both OT for the PR:

1. The recommendation to seed the generators from the OS --- I've been
bitten by exactly this once. That was a rather exotic combination of a
vendor RNG and a batch queueing system, and some of my runs did end up
with identical random streams. Given that the recommendation is what
it is, it probably means that experience is a singular point and it no
longer happens with modern generators.

2. Robert's comment that `SeedSequence(..., spawn_key=(num,))`  is not
equivalent to `SeedSequence(...).spawn(num)[num]` and that the former
is not recommended. I'm not questioning the recommendation, but then
__repr__ seems to suggest the equivalence:

In [2]: from numpy.random import PCG64, SeedSequence

In [3]: base_seq = SeedSequence(1234)

In [4]: base_seq.spawn(8)
Out[4]:
[SeedSequence(
 entropy=1234,
 spawn_key=(0,),
 ),



Evgeni


Re: [Numpy-discussion] locking np.random.Generator in a cython nogil context?

2020-12-17 Thread Robert Kern
On Thu, Dec 17, 2020 at 9:56 AM Evgeni Burovski wrote:

> On Thu, Dec 17, 2020 at 1:01 PM Matti Picus wrote:
> >
> >
> > On 12/17/20 11:47 AM, Evgeni Burovski wrote:
> > > Just as a side note, this is not very prominent in the docs, and I'm
> > > ready to volunteer to send a doc PR --- I'm only not sure which part
> > > of the docs, and would appreciate a pointer.
> >
> > Maybe here
> >
> > https://numpy.org/devdocs/reference/random/bit_generators/index.html#seeding-and-entropy
> >
> > which is here in the sources
> >
> > https://github.com/numpy/numpy/blob/master/doc/source/reference/random/bit_generators/index.rst#seeding-and-entropy
> >
> >
> > And/or in the SeedSequence docstring documentation
> >
> > https://numpy.org/devdocs/reference/random/bit_generators/generated/numpy.random.SeedSequence.html#numpy.random.SeedSequence
> >
> > which is here in the sources
> >
> > https://github.com/numpy/numpy/blob/master/numpy/random/bit_generator.pyx#L255
>
>
> Here's the PR, https://github.com/numpy/numpy/pull/18014
>
> Two minor comments, both OT for the PR:
>
> 1. The recommendation to seed the generators from the OS --- I've been
> bitten by exactly this once. That was a rather exotic combination of a
> vendor RNG and a batch queueing system, and some of my runs did end up
> with identical random streams. Given that the recommendation is what
> it is, it probably means that experience is a singular point and it no
> longer happens with modern generators.
>

I suspect the vendor RNG was rolling its own entropy using time. We use
`secrets.randbits()`, which ultimately uses the best cryptographic
entropy source available. And if there is no cryptographic entropy source
available, I think we fail hard instead of falling back to less reliable
things like time. I'm not entirely sure that's a feature, but it is safe!
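
A small sketch of what that looks like from the user's side, assuming only
the public `SeedSequence` API: with no argument the entropy comes from the
OS, and the value that was drawn is kept on the `entropy` attribute, so it
can be logged and used to replay the run. The variable names are
illustrative.

```python
from numpy.random import SeedSequence

ss = SeedSequence()        # no argument: fresh entropy is drawn from the OS
recorded = ss.entropy      # a large Python int; log it to reproduce the run

# Seeding again with the recorded value gives the same seed material.
replay = SeedSequence(recorded)
same_state = (ss.generate_state(4) == replay.generate_state(4)).all()
```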


> 2. Robert's comment that `SeedSequence(..., spawn_key=(num,))`  is not
> equivalent to `SeedSequence(...).spawn(num)[num]` and that the former
> is not recommended. I'm not questioning the recommendation, but then
> __repr__ seems to suggest the equivalence:
>

I was saying that they were equivalent. That's precisely why it's not
recommended: it's too easy to do both and get identical streams
inadvertently.
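
A minimal sketch of the collision, assuming nothing beyond the public
`SeedSequence` API: a child from `spawn()` and a hand-built `SeedSequence`
with the same `spawn_key` produce identical seed material.

```python
from numpy.random import SeedSequence

parent = SeedSequence(1234)
child = parent.spawn(1)[0]                    # first child, spawn_key == (0,)
manual = SeedSequence(1234, spawn_key=(0,))   # hand-built with the same key

# Identical entropy and spawn_key give identical state, hence identical streams.
same = (child.generate_state(4) == manual.generate_state(4)).all()
```

So code that mixes explicit `spawn_key`s with `spawn()` on the same entropy
can end up with two workers sharing one stream.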

-- 
Robert Kern