Re: [Numpy-discussion] locking np.random.Generator in a cython nogil context?
On Tue, Dec 15, 2020 at 1:00 AM Robert Kern wrote:
>
> On Mon, Dec 14, 2020 at 3:27 PM Evgeni Burovski wrote:
>>
>> > I also think that the lock only matters for multithreaded code, not
>> > multiprocess. I believe the latter pickles and unpickles any Generator
>> > object (and the underlying BitGenerator), and so each process has its
>> > own version. Note that when multiprocessing, the recommended procedure
>> > is to use spawn() to generate a sequence of BitGenerators and to use a
>> > distinct BitGenerator in each process. If you do this then you are free
>> > from the lock.
>>
>> Thanks. Just to confirm: does using the SeedSequence spawn_key arg
>> generate distinct BitGenerators? As in
>>
>> cdef class Wrapper():
>>     def __init__(self, seed):
>>         entropy, num = seed
>>         py_gen = PCG64(SeedSequence(entropy, spawn_key=(spawn_key,)))
>>         self.rng = py_gen.capsule.PyCapsule_GetPointer(capsule, "BitGenerator")  # <--- this
>>
>> cdef Wrapper rng_0 = Wrapper(seed=(123, 0))
>> cdef Wrapper rng_1 = Wrapper(seed=(123, 1))
>>
>> And then, do these two objects have distinct BitGenerators?
>
> The code you wrote doesn't work (`spawn_key` is never assigned). I can
> guess what you meant to write, though, and yes, you would get distinct
> `BitGenerator`s. However, I do not recommend using `spawn_key` explicitly.
> The `SeedSequence.spawn()` method internally keeps track of how many
> children it has spawned and uses that to construct the `spawn_key`s for
> its subsequent children. If you play around with making your own
> `spawn_key`s, then the parent `SeedSequence(entropy)` might spawn
> identical `SeedSequence`s to the ones you constructed.
>
> If you don't want to use the `spawn()` API to construct the separate
> `SeedSequence`s but still want to incorporate some per-process information
> into the seeds (e.g. the 0 and 1 in your example), then note that a tuple
> of integers is a valid value for the `entropy` argument. You can have the
> first item be the same (i.e. per-run information) and the second item be a
> per-process ID or counter.
>
> cdef class Wrapper():
>     def __init__(self, seed):
>         py_gen = PCG64(SeedSequence(seed))
>         self.rng = py_gen.capsule.PyCapsule_GetPointer(capsule, "BitGenerator")
>
> cdef Wrapper rng_0 = Wrapper(seed=(123, 0))
> cdef Wrapper rng_1 = Wrapper(seed=(123, 1))

Thanks Robert!

I indeed typo'd the spawn_key, and the intention is indeed exactly to
include a worker_id in the seed to make sure each worker gets a separate
stream. The use of spawn_key was --- as I now finally realize --- a
misunderstanding of your and Kevin's previous replies in
https://mail.python.org/pipermail/numpy-discussion/2020-July/080833.html

So I'm moving my project to the `SeedSequence((base_seed, worker_id))` API
--- thanks!

Just as a side note, this is not very prominent in the docs, and I'm ready
to volunteer to send a doc PR --- I'm only not sure which part of the docs,
and would appreciate a pointer.

___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
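[Editor's note: the per-worker seeding scheme Robert recommends above can be sketched in pure Python; `n_workers` and the final draw are illustrative, not from the thread.]

```python
from numpy.random import PCG64, Generator, SeedSequence

base_seed = 123   # per-run information, shared by all workers
n_workers = 4     # illustrative worker count

# A tuple of integers is a valid `entropy` argument: the first item is the
# per-run seed, the second a per-worker ID, giving each worker its own
# SeedSequence and hence its own independent stream.
streams = [Generator(PCG64(SeedSequence((base_seed, worker_id))))
           for worker_id in range(n_workers)]

# Each worker draws from its own stream.
draws = [g.random() for g in streams]
```

The same `(base_seed, worker_id)` tuple can then be passed into each process and fed to `SeedSequence` there, which is what the Wrapper class above does.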
Re: [Numpy-discussion] locking np.random.Generator in a cython nogil context?
On 12/17/20 11:47 AM, Evgeni Burovski wrote:
> Just as a side note, this is not very prominent in the docs, and I'm
> ready to volunteer to send a doc PR --- I'm only not sure which part of
> the docs, and would appreciate a pointer.

Maybe here

https://numpy.org/devdocs/reference/random/bit_generators/index.html#seeding-and-entropy

which is here in the sources

https://github.com/numpy/numpy/blob/master/doc/source/reference/random/bit_generators/index.rst#seeding-and-entropy

And/or in the SeedSequence docstring documentation

https://numpy.org/devdocs/reference/random/bit_generators/generated/numpy.random.SeedSequence.html#numpy.random.SeedSequence

which is here in the sources

https://github.com/numpy/numpy/blob/master/numpy/random/bit_generator.pyx#L255

Matti
Re: [Numpy-discussion] datetime64: Remove deprecation warning when constructing with timezone
Noam Yorav-Raphael wrote:
> The solution is simple, and is what datetime64 used to do before the
> change - have a type that just represents a moment in time. It's not "in
> UTC" - it just stores the number of seconds that have passed since an
> agreed moment in time (which is usually 1970-01-01 02:00+0200, more
> commonly referred to as 1970-01-01 00:00Z - it's the exact same moment).

I agree with this. I understand the issue of parsing arbitrary timestamps
with incomplete information, but it's not clear to me why it has become
more difficult to work with ISO 8601 timestamps. For example, we use
numpy.genfromtxt to load an array of UTC-offset timestamps, e.g.
`2020-08-19T12:42:57.7903616-04:00`. Where loading this array took 0.0352s
without having to convert, it now takes 0.8615s with the following
converter:

    lambda x: dateutil.parser.parse(x).astimezone(timezone.utc).replace(tzinfo=None)

That's a huge performance hit for something that should be considered a
standard operation, namely loading ISO-compliant data. There may be more
efficient converters out there, but it seems strange to have to employ an
external function to remove precision from an ISO datatype.

As an aside, with or without the converter, numpy.genfromtxt is
consistently faster than numpy.loadtxt, despite the documentation stating
otherwise.

I feel there's a lack of guidance in the documentation on this issue. In
most threads I've encountered on this, the first recommendation is to use
pandas. The most effective way to crack a nut should not be to use a
sledgehammer. The purpose of introducing standards should be to make these
sorts of operations trivial and efficient. Perhaps I'm missing the solution
here...
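[Editor's note: a stdlib-only converter along the lines discussed above can be sketched without dateutil, which is typically much faster. The file contents and the `[us]` unit below are illustrative; note that `datetime.fromisoformat` only accepts 7-digit fractional seconds like the sample timestamp above on Python 3.11+, so this sketch uses 6 digits.]

```python
import io
from datetime import datetime, timezone

import numpy as np

def to_naive_utc(s):
    # Parse an ISO 8601 timestamp carrying a UTC offset, convert to UTC,
    # then drop the tzinfo -- the same operation as the dateutil converter.
    dt = datetime.fromisoformat(s)
    return dt.astimezone(timezone.utc).replace(tzinfo=None)

data = io.StringIO(
    "2020-08-19T12:42:57.790361-04:00\n"
    "2020-08-19T13:00:00.000000-04:00\n"
)
# encoding="utf-8" ensures the converter receives str rather than bytes
arr = np.genfromtxt(data, dtype="datetime64[us]",
                    converters={0: to_naive_utc}, encoding="utf-8")
```

Since `fromisoformat` parses only a fixed format rather than guessing like `dateutil.parser.parse`, it avoids most of the per-row overhead.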
Re: [Numpy-discussion] locking np.random.Generator in a cython nogil context?
On Thu, Dec 17, 2020 at 1:01 PM Matti Picus wrote:
>
> On 12/17/20 11:47 AM, Evgeni Burovski wrote:
> > Just as a side note, this is not very prominent in the docs, and I'm
> > ready to volunteer to send a doc PR --- I'm only not sure which part
> > of the docs, and would appreciate a pointer.
>
> Maybe here
>
> https://numpy.org/devdocs/reference/random/bit_generators/index.html#seeding-and-entropy
>
> which is here in the sources
>
> https://github.com/numpy/numpy/blob/master/doc/source/reference/random/bit_generators/index.rst#seeding-and-entropy
>
> And/or in the SeedSequence docstring documentation
>
> https://numpy.org/devdocs/reference/random/bit_generators/generated/numpy.random.SeedSequence.html#numpy.random.SeedSequence
>
> which is here in the sources
>
> https://github.com/numpy/numpy/blob/master/numpy/random/bit_generator.pyx#L255

Here's the PR: https://github.com/numpy/numpy/pull/18014

Two minor comments, both OT for the PR:

1. The recommendation to seed the generators from the OS --- I've been
bitten by exactly this once. That was a rather exotic combination of a
vendor RNG and a batch queueing system, and some of my runs did end up with
identical random streams. Given that the recommendation is what it is, that
experience is probably a singular point and it no longer happens with
modern generators.

2. Robert's comment that `SeedSequence(..., spawn_key=(num,))` is not
equivalent to `SeedSequence(...).spawn(num)[num]` and that the former is
not recommended. I'm not questioning the recommendation, but then __repr__
seems to suggest the equivalence:

In [2]: from numpy.random import PCG64, SeedSequence

In [3]: base_seq = SeedSequence(1234)

In [4]: base_seq.spawn(8)
Out[4]:
[SeedSequence(
     entropy=1234,
     spawn_key=(0,),
 ),

Evgeni
Re: [Numpy-discussion] locking np.random.Generator in a cython nogil context?
On Thu, Dec 17, 2020 at 9:56 AM Evgeni Burovski wrote:
>
> Here's the PR, https://github.com/numpy/numpy/pull/18014
>
> Two minor comments, both OT for the PR:
>
> 1. The recommendation to seed the generators from the OS --- I've been
> bitten by exactly this once. That was a rather exotic combination of a
> vendor RNG and a batch queueing system, and some of my runs did end up
> with identical random streams. Given that the recommendation is what it
> is, it probably means that experience is a singular point and it no
> longer happens with modern generators.

I suspect the vendor RNG was rolling its own entropy using time. We use
`secrets.getrandbits()`, which ultimately uses the best cryptographic
entropy source available. And if there is no cryptographic entropy source
available, I think we fail hard instead of falling back to less reliable
things like time. I'm not entirely sure that's a feature, but it is safe!

> 2. Robert's comment that `SeedSequence(..., spawn_key=(num,))` is not
> equivalent to `SeedSequence(...).spawn(num)[num]` and that the former is
> not recommended. I'm not questioning the recommendation, but then
> __repr__ seems to suggest the equivalence:

I was saying that they were equivalent. That's precisely why it's not
recommended: it's too easy to do both and get identical streams
inadvertently.

--
Robert Kern
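[Editor's note: the equivalence Robert describes is easy to demonstrate. A minimal sketch, reusing the seed 1234 from Evgeni's example; `generate_state(4)` is just an arbitrary probe of the derived state.]

```python
import numpy as np
from numpy.random import PCG64, Generator, SeedSequence

base = SeedSequence(1234)
spawned_child = base.spawn(1)[0]                   # spawn_key becomes (0,)
manual_child = SeedSequence(1234, spawn_key=(0,))  # hand-built spawn_key

# The two SeedSequences derive identical states, hence identical streams --
# exactly the inadvertent collision Robert warns about.
state_a = spawned_child.generate_state(4)
state_b = manual_child.generate_state(4)

draw_a = Generator(PCG64(spawned_child)).random()
draw_b = Generator(PCG64(manual_child)).random()
```

This is why mixing hand-built `spawn_key`s with the `spawn()` API is discouraged: the parent has no record of the manually constructed child and may later spawn its twin.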