On Wed, May 18, 2016 at 1:14 AM, Nathaniel Smith <n...@pobox.com> wrote:
>
> On Tue, May 17, 2016 at 10:41 AM, Robert Kern <robert.k...@gmail.com>
wrote:
> > On Tue, May 17, 2016 at 6:24 PM, Nathaniel Smith <n...@pobox.com> wrote:
> >>
> >> On May 17, 2016 1:50 AM, "Robert Kern" <robert.k...@gmail.com> wrote:
> >> >
> >> [...]
> >> > What you want is a function that returns many RandomState objects
that
> >> > are hopefully spread around the MT19937 space enough that they are
> >> > essentially independent (in the absence of true jumpahead). The
better
> >> > implementation of such a function would look something like this:
> >> >
> >> > def spread_out_prngs(n, root_prng=None):
> >> >     if root_prng is None:
> >> >         root_prng = np.random
> >> >     elif not isinstance(root_prng, np.random.RandomState):
> >> >         root_prng = np.random.RandomState(root_prng)
> >> >     sprouted_prngs = []
> >> >     for i in range(n):
> >> >         seed_array = root_prng.randint(1<<32, size=624)  #
> >> > dtype=np.uint32 under 1.11
> >> >         sprouted_prngs.append(np.random.RandomState(seed_array))
> >> >     return spourted_prngs
> >>
> >> Maybe a nice way to encapsulate this in the RandomState interface
would be
> >> a method RandomState.random_state() that generates and returns a new
child
> >> RandomState.
> >
> > I disagree. This is a workaround in the absence of proper jumpahead or
> > guaranteed-independent streams. I would not encourage it.
> >
> >> > Internally, this generates seed arrays of about the size of the
MT19937
> >> > state so make sure that you can access more of the state space. That
will at
> >> > least make the chance of collision tiny. And it can be easily
rewritten to
> >> > take advantage of one of the newer PRNGs that have true independent
streams:
> >> >
> >> >   https://github.com/bashtage/ng-numpy-randomstate
> >>
> >> ... But unfortunately I'm not sure how to make my interface suggestion
> >> above work on top of one of these RNGs, because for
RandomState.random_state
> >> you really want a tree of independent RNGs and the fancy new PRNGs only
> >> provide a single flat namespace :-/. And even more annoyingly, the
tree API
> >> is actually a nicer API, because with a flat namespace you have to
know up
> >> front about all possible RNGs your code will use, which is an
unfortunate
> >> global coupling that makes it difficult to compose programs out of
> >> independent pieces, while the RandomState.random_state approach
composes
> >> beautifully. Maybe there's some clever way to allocate a 64-bit
namespace to
> >> make it look tree-like? I'm not sure 64 bits is really enough...
> >
> > MT19937 doesn't have a "tree" any more than the others. It's the same
flat
> > state space. You are just getting the illusion of a tree by hoping that
you
> > never collide. You ought to think about precisely the same global
coupling
> > issues with MT19937 as you do with guaranteed-independent streams.
> > Hope-and-prayer isn't really a substitute for properly engineering your
> > problem. It's just a moral hazard to promote this method to the main
API.
>
> Nonsense.
>
> If your definition of "hope and prayer" includes assuming that we
> won't encounter a random collision in a 2**19937 state space, then
> literally all engineering is hope-and-prayer. A collision could
> happen, but if it does it's overwhelmingly more likely to happen
> because of a flaw in the mathematical analysis, or a bug in the
> implementation, or because random quantum fluctuations caused you and
> your program to suddenly be transported to a parallel world where 1 +
> 1 = 1, than that you just got unlucky with your random state. And all
> of these hazards apply equally to both MT19937 and more modern PRNGs.

Granted.

> ...anyway, the real reason I'm a bit grumpy is because there are solid
> engineering reasons why users *want* this API,

I remain unconvinced on this mark. Grumpily.

> so whether or not it
> turns out to be possible I think we should at least be allowed to have
> a discussion about whether there's some way to give it to them.

I'm not shutting down discussion of the option. I *implemented* the option.
I think that discussing whether it should be part of the main API is
premature. There probably ought to be a paper or three out there supporting
its safety and utility first. Let the utility function version flourish
first.

> It's
> not even 100% out of the question that we conclude that existing PRNGs
> are buggy because they don't take this use case into account -- it
> would be far from the first time that numpy found itself going beyond
> the limits of older numerical tools that weren't designed to build the
> kind of large composable systems that numpy gets used for.
>
> MT19937's state space is large enough that you could explicitly encode
> a "tree seed" into it, even if you don't trust the laws of probability
> -- e.g., you start with a RandomState with id [], then its children
> have id [0], [1], [2], ..., and their children have ids [0, 0], [0,
> 1], ..., [1, 0], ..., and you write these into the state (probably
> after sending them through some bit-mixing permutation), to guarantee
> non-collision.

Sure. Not entirely sure if that can be done without preallocating the
branching factor or depth, but I'm sure there's some fancy combinatoric
representation of an unbounded tree that could be exploited here. It seems
likely to me that such a thing could be done with the stream IDs of PRNGs
that support that.

> Or if you do trust the laws of probability, then the
> randomly-sample-a-PRNG approach is not 100% out of the question even
> for more modern PRNGs.

Tread carefully here. PRNGs are not RNGs. Using the root PRNG to generate N
new seeds for the same PRNG does not necessarily generate N good,
uncorrelated streams with high probability. Overlap is not the only factor
in play. Long-term correlations happen even in the case where the N streams
do not overlap, and the structure of the root PRNG could well generate N
such correlated seeds.

http://www.adv-radio-sci.net/12/75/2014/

See section 4.4. It does look like one can get good MT19937 streams with
sequential integer seeds, when expanded by the standard sub-PRNG that we
use for that purpose (yay!). However, if you use a different (but otherwise
good-quality and general purpose) sub-PRNG to expand the key, you get
increasing numbers of failures as you increase the number of parallel
streams. It is not explored in the paper exactly why this is the case, but
I suspect that the similarity of the second sub-PRNG algorithm to that of
the MT itself is part of the problem.

It is *likely* that the scheme I implemented will be okay (our array-seed
expansion algorithm is similar to the integer-seed expansion and *probably*
insulates the sprouted MTs from their MT generating source), but that
remains to be demonstrated.

--
Robert Kern
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion

Reply via email to