On 11/05/14 05:58, Nick Sabalausky via Digitalmars-d wrote:
The seed doesn't need to be compromised for synchronized RNGs to fubar security.

Before we go onto the detail of discussion, thanks very much for the extensive explanation. I was slightly worried that my previous email might have come across as dismissive of your (completely understandable) concerns. I'm actually quite keen that we can find a mutually agreeable solution :-)

It's bad enough just to accidentally generate the same byte sequence for
multiple things:

struct CryptoRng { ulong internalState; ...etc... }
CryptoRng myRand;

// Oops! Duplicates internal state!
ubyte[] getRandomJunk(CryptoRng rand) {
     return rand.take(64).array();
}

// Oops! These are all identical! Passwords may as well be unsalted!
auto saltForStevensPassword =  getRandomJunk(myRand);
auto saltForSallysPassword =  getRandomJunk(myRand);
auto saltForGregsPassword =  getRandomJunk(myRand);

// Oops! Also identical!
auto oneTimeURL1 =  "https://verify-email/a@a.a/"~getRandomJunk(myRand);
auto oneTimeURL2 =  "https://verify-email/b@b.b/"~getRandomJunk(myRand);
auto oneTimeURL3 =  "https://verify-email/c@c.c/"~getRandomJunk(myRand);

May as well have just defined getRandomJunk as a constant ;)

Sure, but this is a consequence of two things (i) CryptoRng is a value type and (ii) it gets passed by value, not by reference.

In your example, obviously one can fix the problem by having the function declared instead as

    ubyte[] getRandomJunk(ref CryptoRng rand) { ... }

but I suspect you'd say (and I would agree) that this is inadequate: it's relying on programmer virtue to ensure the correct behaviour, and sooner or later someone will forget that "ref". (It will also not handle other cases, such as an entity that needs to internally store the RNG, as we've discussed many times on the list with reference to e.g. RandomSample or RandomCover.)

Obviously one _can_ solve the problem by the internal state variables of the RNG being static, but I'd suggest to you that RNG-as-reference-type (which doesn't necessarily need to mean "class") solves that particular problem without the constraints that static internal variables have.

Maybe not *too* terrible for randomized alien invader attack patterns (unless it
makes the game boring and doesn't sell), but a fairly major failure for security
purposes.

There's another issue, too:

First a little background: With Hash_DRBG (and I would assume other crypto RNGs
as well), the RNG is periodically reseeded, both automatically and on-demand.
This reseeding is *not* like normal non-crypto reseeding: With non-crypto
reseeding, you're fully resetting the internal state to a certain specific value
and thus resetting the existing entropy, replacing it with the fresh new
entropy. But with Hash_DRBG, reseeding only *adds* additional fresh entropy -
nothing gets reset. Any existing entropy still remains. So reseeding two
different Hash_DRBGs with the *same* reseed data *still* results in two
completely different internal states. This is deliberately defined by the NIST
spec.

There are reasons for that, but one noteworthy side-consequence is this: If all
your Hash_DRBG's share the same internal state, then anytime one is reseeded
with fresh entropy, they *all* benefit from the fresh entropy, for free.

And as a (maybe smaller) additional benefit, if different parts of your program
are using an RNG independently of each other, but the RNGs internally share the
same state, then *each* component contributes additional (semi-)non-determinism
to the pool. That part's true of other RNGs too, but the effect is compounded
further by Hash_DRBG which updates the internal state differently depending on
how much data is requested at each, umm...at each request.

This is a very interesting point. My understanding is that in many (most?) programs, each individual thread does typically operate using a single RNG instance anyway. The question for me is whether that's a design choice of the developer or something that is an unavoidable outcome of the RNG design.

For example, it's readily possible to just instantiate a single RNG (crypto or otherwise, as required) at the start of the program and pass that around: if it's a reference type (as I would argue it should be) then the effects you desire will still take place.

Alternatively, one can also implement a thread-global RNG instance which functions will use. This is already done with rndGen, which is effectively a singleton pattern. There's no reason why one couldn't also define a cryptoGen which works in a similar way, just using an instance of a cryto RNG where rndGen uses a Mersenne Twister:
https://github.com/D-Programming-Language/phobos/blob/d2f2d34d52f89dc9854069b13f52523965a7107e/std/random.d#L1161-L1174
https://github.com/WebDrake/std.random2/blob/a071d8d182eb28516198bb292a0b45f86f4425fe/std/random2/generator.d#L60-L81

The benefit of doing it this way, as opposed to static internal variables, is that you aren't then constrained to have only one single, effectively thread-global instance of _every_ RNG you create.

However...the mentions of Minecraft did raise one point that I'll concede: If
you have a multiplayer environment (or more generally, a large chunk of shared
multi-computer data) that's randomly-generated, then a very careful usage of a
shared seed and RNG algorithm could drastically reduce the network bandwidth
requirements.

You may also recall earlier games like Elite which were able to "store" whole universes on a single 5 1/4" floppy by virtue of auto-generating that universe from a repeatable random sequence. It was amazing what could be done on the old BBC Micro :-)

I do agree strongly that there's a major difference between "relying on
deterministic sequence from an RNG" and "principle of design".

However, I also maintain that the "principle of design" case demands you should
*NOT* get the same results from RNGs unless you *specifically intend* to be
"relying on deterministic sequence from an RNG". After all, they *are* supposed
to be "random number generators". That's their primary function.

I do believe strongly in the same principles of encapsulation that you do. But
the thing is, the usual rules of proper encapsulated design are deliberately and
fundamentally predicated on the assumption that you *want* reliably predictable
deterministic behavior. The vast majority of the time, that's a correct
assumption - but the whole *point* of an RNG is to *not* be deterministic. Or at
least to get as close to non-determinism as reasonably possible within a
fundamentally deterministic system.

Since the basic goals of RNGs are in fundamental disagreement with the
underlying assumptions of usual "well encapsulated" design, that calls into
question the applicability of usual encapsulation rules to RNGs.

At the risk of being presumptuous, I think you do agree on some level. Isn't
that the whole reason you've proposed that RNGs change from structs to classes?
Because you don't want identical sequences from RNGs unless *specifically*
intended, right? The only parts of that I've intended to question are:

I completely agree that _unintended_ identical RNG sequences are a very bad thing that must be avoided. I think that we're in disagreement only about how best to achieve that in a sane/safe/uncomplicated way. With luck we will find something that works for both of us :-)

What I feel is that static internal variables go too far in the opposite direction: they make it impossible to have even _intentionally_ identical RNG sequences. There's no way, for example, to .save or .dup an RNG instance with this design.

However, the effect of a common and reliable source of randomness, used by all parts of the program, can be implemented instead by a thread-global singleton instance of an RNG as described above. This seems to me to be a better solution as it gives you what you achieve using static variables, but without preventing you from instantiating as many other independent RNG instances as you like.

A. Does that really necessitate moving to classes?

No, the important distinction is value vs. reference types, not structs v. classes. One could implement the RNGs as struct-based reference types.

I wrote a class-based implementation because I wanted to see how that came out to various struct-based approaches monarch_dodra and I had tried. I do think that it offers much value in terms of simplicity and elegance of design, but there may be other costs that make it untenable. I am not sure the correct choice will be apparent until the current work on memory allocation settles down.

B. Are there any legitimate use-cases for "specifically relying on deterministic
sequence from an RNG"?

Well, when I was writing large-scale scientific simulations, it was certainly useful to be able to re-run them with the same random sequence. Given the number of random variates I was generating in the process, it was not really practical to store the sequence on memory or disk. The same is true of many circumstances where one is writing a program that relies on randomness, but wants to test it reliably.

However, fundamentally this comes down to a design point. Pseudo-random RNGs are _always_ deterministic in their operation once seeded. The difference we're really talking about here is whether the source of randomness used by different parts of a program is thread-global or not.

And so, the question becomes rather, "Is it legitimate to enforce a thread-global source of randomness in all circumstances?" I'd say the answer to that is a definite no.

Bear in mind that one nasty consequence of using static internal variables to force thread-global behaviour is that it means that the presence or absence of unittests might affect the actual running program. Consider:

    struct SomeRNG { private static InternalState _internals; ... }

    unittest
    {
        SomeRNG rng;
        rng.seed(KnownSeed);
        // verify that SomeRNG produces expected sequence of variates
    }

Now when you start your _actual_ program, the internal state of SomeRNG will already have been set by the unittests. Obviously if you are doing things properly you will scrub this by re-seeding the RNG at the start of your program proper, but the risk is clearly still there.

For B, the above example of "reducing bandwidth/storage requirements for a large
set of shared random-generated data" is making me question my former stance of
"No, there's no legitimate use-case". If B falls (and I suppose it maybe it
does), then A does too, and classes would indeed be necessary.

Yes, I think that is one very good example of a use-case, which is probably one shared between a bunch of different applications in practice (scientific simulation, games, probably other stuff too).

As I said before, I don't think it mandates _classes_, but it does mandate reference-type RNGs that are properly encapsulated.

But that's all for non-crypto RNGs. For crypto-RNGs, determinism is *never*
anything more than a concession forced by the deterministic nature of the
machines themselves. FWIW, this does mean  crypto-RNGs can go either struct or
class, since the internal states really should be tied together anyway (in order
to prevent their *outputs* from being tied together).

What do you think about the global-singleton-instance pattern as a way of achieving what you want?

I think I will stop my reply here for now, because I think that your response to this question (and the explanations leading up to it) probably determines how I will respond to your later remarks. What are your thoughts?

Best wishes,

    -- Joe

Reply via email to