I'm not expert on this, but there are various tests described in this paper which compares different normal random number generation techniques:
http://www.cse.cuhk.edu.hk/%7Ephwl/mt/public/archives/papers/grng_acmcs07.pdf

I'll implement the tests described in this paper.

... and the topic is also discussed in this paper:
http://www.doornik.com/research/ziggurat.pdf

I don't have sufficient knowhow to really feel confident about how to test these things though, and part of me feels it might be worth at some point approaching some stats experts and asking them to help define a test suite for D's random number functionality.

Maybe there are testsuites already written for this.

Well, as I defined the function, the UniformRNG input has a default value of rndGen, so if you call it just as normal(mean, sigma); it should work as intended. But if there's some factor which means it would be better to define a separate function which doesn't receive the RNG as input, I can do that.

I don't see any downside to this.

I think it probably depends on the details of the technique. E.g. with Box-Muller you cache 2 random values, and you need to re-generate them for every pair of normal random variates the user requests, so there's no problem with having it initialized in opCall on the first call to the engine.

In general I favour that approach of initializing in opCall because it means that you don't have to ever pass anything to the constructor and also if I have a simulation which uses a lot of random numbers, its output won't be changed simply by creating an instance of a normal random number generator -- only when I start actually _using_ that generator in place of some other source of randomness.

I have only been thinking about the Ziggurat algorithm, but you are right, it does depend on the details of the technique. For Box-Muller (and other engines that cache samples) it only makes sense to compute the first samples in the opCall. But for the Ziggurat algorithm, tables that must be computed before you can start sampling aren't changed during sampling and computing the tables doesn't require any additional arguments. So it makes the most sense for those tables to be initialized in the struct's constructor in the struct based API.

In my previous post I was talking about initializing static instances of the engine used in the normal() function. The advantage of initializing in a static constructor is that you don't need an additional check every time the normal() function is called. But because we will also have a struct based API, that will not require such checks (at least not for all engines), this isn't really that important. So we can also initialize the global engine instance in a call to normal(), if this simplifies things.

Did you get the code I attached to my original post on this topic? Is there something extra that you need?

I didn't notice the attachments, sorry (but I see them now). It seems it will be easy enough to write NormalZigguratEngine, there's just one thing that I think should be added to the Normal struct. Engines should have a way to do their initialization in Normal's constructor (where possible), so I think something like this:

static if(is(typeof(_engine.initialize())))
    _engine.initialize();

should be added to Normal's constructor. Or maybe

_engine = NormalRandomNumberEngine();

so that a static opCall can be used.

I'm going to turn that into patches for Phobos which I'll put up on GitHub in the next days, so we can pull and push and test/write together as needed.

Maybe we should also have a separate file in a separate branch for tests. There will probably be a lot of code needed to test this well and the tests could take a long time to run, so I don't think they should go into unit test blocks (we can put some simpler tests in unit test blocks later). It may also be useful to use external libraries such as dstats for tests.

Reply via email to