Thank you again, Robert. I am using NamedTuple for my keys, which are also keys in a dictionary. Each key is unique (a tuple of a distinct int and an enum), so I think the risk of producing a duplicate hash is not present, but I could, as always, be wrong :) For positive ints I followed this tip: https://stackoverflow.com/questions/18766535/positive-integer-from-python-hash-function , and did:

    def stronghash(key: ComponentId):
        return ctypes.c_size_t(hash(key)).value

Since I will be using each process/random sample several times, and keeping all of them in memory at once is not feasible (dimensionality), I did the following:

    self._rng = default_rng(cs)
    self._state = dict(self._rng.bit_generator.state)
    #
    def scenarios(self) -> npt.NDArray[np.float64]:
        self._rng.bit_generator.state = self._state
        ....
        return ....

Would you consider this bad practice, or an OK solution?

In Norway we have a saying which translates directly as: "He asked for the finger... and took the whole arm".

Best,
Stig

On Fri, Aug 27, 2021 at 17:01, Robert Kern <robert.k...@gmail.com> wrote:

> joblib is a library that uses clever caching of function call results to
> make the development of certain kinds of data-heavy computational pipelines
> easier. In order to derive the key to be used to check the cache, joblib
> has to look at the arguments passed to the function, which may
> involve usually-nonhashable things like large numpy arrays.
>
> https://joblib.readthedocs.io/en/latest/
>
> So they constructed joblib.hash() which basically takes the arguments,
> pickles them into a bytestring (with some implementation details), then
> computes an MD5 hash on that. It's probably overkill for your keys, but
> it's easily available and quite generic. It returns a hex-encoded string of
> the 128-bit MD5 hash. `int(..., 16)` will convert that to a non-negative
> (almost-certainly positive!) integer that can be fed into SeedSequence.
>
> On Fri, Aug 27, 2021 at 5:03 AM Stig Korsnes <stigkors...@gmail.com>
> wrote:
>
>> Thank you Robert!
>> This scheme fits perfectly into what I'm trying to accomplish! :) The
>> "smooshing" of ints by supplying a list of ints had eluded me. Thank you
>> also for the pointer about built-in hash(). I would not be able to rely on
>> it anyway, because it does not return strictly positive ints which
>> SeedSequence requires.
>> If you have a minute to spare: Could you briefly
>> explain "int(joblib.hash(key), 16)"
>> <https://joblib.readthedocs.io/en/latest/generated/joblib.hash.html>,
>> and would this always return non-negative integers?
>> Thanks again!
>>
>> On Thu, Aug 26, 2021 at 22:59, Robert Kern <robert.k...@gmail.com> wrote:
>>
>>> On Thu, Aug 26, 2021 at 2:22 PM Stig Korsnes <stigkors...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> Is there a way to uniquely spawn child seeds?
>>>> I'm doing Monte Carlo analysis, where I have n random processes, each
>>>> with their own generator.
>>>> All process models instantiate a generator with default_rng(), i.e.
>>>> ss = SeedSequence(), cs = ss.spawn(n), and using cs[i] for process i.
>>>> Now, the problem I'm facing is that the results for an individual
>>>> process depend on the order of process initialization and on the
>>>> number of processes used.
>>>> However, if I could spawn children with a unique identifier, I would be
>>>> able to reproduce my individual results without having to pickle/log
>>>> states. For example, all my models have an id (tuple) field which is
>>>> hashable.
>>>> If I had the ability to SeedSequence(x).spawn([objects]) where objects
>>>> support hash(object), I would have reproducibility for all my processes. I
>>>> could do without the spawning, but then I would probably lose independence
>>>> when I do multiproc? Is there a way to achieve my goal in the current
>>>> version 1.21 of numpy?
>>>>
>>>
>>> I would probably not rely on `hash()` as it is only intended to be
>>> pretty good at getting distinct values from distinct inputs. If you can
>>> combine the tuple objects into a string of bytes in a reliable,
>>> collision-free way and use one of the cryptographic hashes to get them down
>>> to a 128-bit number, that'd be ideal. `int(joblib.hash(key), 16)`
>>> <https://joblib.readthedocs.io/en/latest/generated/joblib.hash.html>
>>> should do nicely. You can combine that with your main process's seed
>>> easily.
>>> SeedSequence can take arbitrary amounts of integer data and smoosh
>>> them all together. The spawning functionality builds off of that, but you
>>> can also just manually pass in lists of integers.
>>>
>>> Let's call that function `stronghash()`. Let's call your main process
>>> seed number `seed` (this is the thing that the user can set on the
>>> command-line or something you get from `secrets.randbits(128)` if you need
>>> a fresh one). Let's call the unique tuple `key`. You can build the
>>> `SeedSequence` for each job according to the `key` like so:
>>>
>>>     root_ss = SeedSequence(seed)
>>>     for key, data in jobs:
>>>         child_ss = SeedSequence([stronghash(key), seed])
>>>         submit_job(key, data, seed=child_ss)
>>>
>>> Now each job will get its own unique stream regardless of the order the
>>> job is assigned. When the user reruns it with the same root `seed`, they
>>> will get the same results. When the user chooses a different `seed`, they
>>> will get another set of results (this is why you don't want to just use
>>> `SeedSequence(stronghash(key))` all by itself).
>>>
>>> I put the job-specific seed data ahead of the main program's seed to be
>>> on the super-safe side. The spawning mechanism will append integers to the
>>> end, so there's a super-tiny chance somewhere down a long line of
>>> `root_ss.spawn()`s that there would be a collision (and I mean
>>> super-extra-tiny). But best practices cost nothing.
>>>
>>> I hope that helps and is not too confusing!
>>>
>>> --
>>> Robert Kern
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>>
>
> --
> Robert Kern
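
For reference, here is a minimal sketch of how the pieces discussed in this thread might fit together: a run-stable `stronghash()`, a per-key `SeedSequence`, and a generator whose state is rewound before each draw. Everything not in the messages above is an assumption made for illustration: the `Kind` enum, the `ComponentId` field names, the `Process` class, the use of `standard_normal`, and in particular the use of `hashlib.blake2b` on an explicit string encoding of the key. That last choice stands in for `joblib.hash()` only to keep the sketch free of extra dependencies. It also avoids built-in `hash()`, which Python randomizes per process for strings by default; as far as I know plain `Enum` members hash by name, so a key containing one may not hash identically across runs.

```python
import copy
import hashlib
from enum import Enum
from typing import NamedTuple

import numpy as np
from numpy.random import SeedSequence, default_rng


class Kind(Enum):  # hypothetical enum standing in for the real one
    PRICE = 1
    VOLUME = 2


class ComponentId(NamedTuple):  # field names are assumptions
    index: int
    kind: Kind


def stronghash(key: ComponentId) -> int:
    """Return a non-negative 128-bit integer that is stable across runs."""
    # Serialize the key explicitly instead of relying on built-in hash().
    text = f"{key.index}:{type(key.kind).__name__}.{key.kind.name}"
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=16).digest()
    return int.from_bytes(digest, "big")


class Process:  # hypothetical process model
    def __init__(self, key: ComponentId, seed: int, n_samples: int) -> None:
        # Job-specific data first, main seed last, as suggested in the thread.
        child_ss = SeedSequence([stronghash(key), seed])
        self._rng = default_rng(child_ss)
        # Snapshot the initial state so the same stream can be replayed.
        self._state = copy.deepcopy(self._rng.bit_generator.state)
        self._n_samples = n_samples

    def scenarios(self) -> np.ndarray:
        # Rewind to the snapshot, then regenerate instead of caching samples.
        self._rng.bit_generator.state = self._state
        return self._rng.standard_normal(self._n_samples)


if __name__ == "__main__":
    seed = 12345  # main program seed, e.g. taken from the command line
    p = Process(ComponentId(3, Kind.PRICE), seed, n_samples=4)
    print(np.array_equal(p.scenarios(), p.scenarios()))
```

Under these assumptions, each `Process` draws the same samples on every call to `scenarios()`, independent of how many other processes exist or the order in which they were created. Keeping `child_ss` and calling `default_rng(child_ss)` again on each call would likely work as well, since constructing a generator from a `SeedSequence` is deterministic; the snapshot approach simply mirrors the code in the message above.
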
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion