On Sat, Jul 4, 2020 at 1:03 PM Evgeni Burovski <evgeny.burovs...@gmail.com> wrote:
> Thanks Kevin, thanks Robert, this is very helpful!
>
> I'd strongly agree with Matti that your explanations could/should make
> it to the docs. Maybe it's something for the GSoD.
>
> While we're on the subject, one comment and two (hopefully last) questions:
>
> 1. My two cents w.r.t. the `np.random.simple_seed()` function Robert
> mentioned: I personally would find it way more confusing than a clear
> explanation + example in the docs. I'd ask myself what's "simple"
> here, click through to the source of this `simple_seed`, find out that
> it's a docstring and a two-liner, and just copy-paste the latter into
> my user code. Again, just FWIW.

Noted.

> 2. What would be a preferred way of spelling out "give me the N-th
> spawned child SeedSequence"?
> The use case is that I prepare (human-readable) input files once and
> run a number of computational jobs in separate OS processes. From what
> Kevin said, I can of course give each worker a pair of (entropy,
> worker_id) and then each of them does at startup
>
> > parent_seq = SeedSequence(entropy)
> > this_sequence = parent_seq.spawn(worker_id + 1)[worker_id]
>
> Is this a recommended way, or is there a better API? Or does the
> number of spawned children need to be known beforehand?
> I'd much rather avoid serialization/deserialization if possible.

Assuming that `worker_id` starts at 0:

    this_sequence = SeedSequence(entropy, spawn_key=(worker_id,))

> 3. Is there a way of telling the number of draws a generator did?
>
> The use case is to checkpoint the number of draws and `.advance` the
> bit generator when resuming from the checkpoint. (The runs are longer
> than the batch queue limits).

There are computations you can do on the internal state of PCG64 and Philox to get this information, but not in general, no. I do recommend serializing the Generator or BitGenerator (or at least the BitGenerator's .state property, which is a nice JSONable dict for PCG64) for checkpointing purposes.
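As a rough illustration of the checkpointing approach, here is a minimal sketch that round-trips the PCG64 `.state` dict through JSON and resumes the stream. The seed value and the variable names (`checkpoint`, `restored_bg`, `resumed`) are made up for the example, and it assumes the default PCG64 bit generator; other bit generators may hold arrays in their state and need different handling.

```python
import json

import numpy as np

# Hypothetical seed, chosen only for this example.
rng = np.random.Generator(np.random.PCG64(12345))
rng.standard_normal(1000)  # stand-in for the work done before the checkpoint

# The PCG64 state is a plain dict of ints and strings, so json can store it.
checkpoint = json.dumps(rng.bit_generator.state)

# What the original stream produces right after the checkpoint.
expected = rng.random(5)

# On resume: build a fresh PCG64 and overwrite its state with the saved one.
restored_bg = np.random.PCG64()
restored_bg.state = json.loads(checkpoint)
resumed = np.random.Generator(restored_bg)
actual = resumed.random(5)

# The resumed generator continues exactly where the original left off.
print(np.array_equal(expected, actual))
```

In a real job the JSON string would be written next to the other checkpoint data and read back in the new process.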
Among other things, there is a cached uint32 (kept for when an odd number of uint32s has been drawn) that you might need to handle. The state of the default PCG64 is much smaller than that of MT19937. Serializing the state is less work and more reliable than computing that distance and storing both the original seed and the distance.

--
Robert Kern
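For the `spawn_key` spelling from question 2, the following sketch checks that building a child directly with `spawn_key=(worker_id,)` yields the same stream as taking the `worker_id`-th element of `parent.spawn(...)`. The entropy value and worker count are hypothetical; each worker only needs the shared `entropy` and its own 0-based `worker_id`.

```python
import numpy as np
from numpy.random import SeedSequence

entropy = 20200704  # hypothetical shared entropy, e.g. read from an input file
n_workers = 4       # hypothetical number of workers

# Spelling 1: spawn all children from one parent.
parent = SeedSequence(entropy)
spawned = parent.spawn(n_workers)

# Spelling 2: each worker constructs its own child from (entropy, worker_id).
direct = [SeedSequence(entropy, spawn_key=(worker_id,))
          for worker_id in range(n_workers)]

# Both spellings produce identical seed material for every worker.
matches = [np.array_equal(a.generate_state(4), b.generate_state(4))
           for a, b in zip(spawned, direct)]
print(all(matches))

# A worker would then seed its generator from its own child sequence.
rng = np.random.default_rng(direct[2])
```

With the second spelling, the number of workers does not need to be known in advance, and nothing has to be serialized beyond the two integers.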
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion