Re: 1st draft of complete class-based std.random successor

Andrea Fontana Thu, 20 Mar 2014 12:07:04 -0700

On Wednesday, 19 March 2014 at 23:49:41 UTC, Joseph Rushton Wakeling wrote:

Hello all,
As some of you may already know, monarch_dodra and I have spent quite a lot of time over the last year discussing the state of std.random. To cut a long story short, there are significant problems that arise because the current RNGs are value types rather than reference types. We had quite a lot of back and forth on different design ideas, with a lot of helpful input from others in the community, but at the end of the day there are really only two broad approaches: create structs that implement reference semantics internally, or use classes. So, as an exercise, I decided to create a class-based std.random.
The preliminary (but comprehensive) results of this are now available here:
https://github.com/WebDrake/std.random2
Besides re-implementing random number generators as classes rather than structs, the new code splits std.random2 into a package of several different modules:
   * std.random2.generator, pseudo-random number generators;
   * std.random2.device, non-deterministic random sources;
   * std.random2.distribution, random distributions such as uniform, normal, etc.;
   * std.random2.adaptor, random "adaptors" such as randomShuffle, randomSample, etc.;
   * std.random2.traits, RNG-specific traits such as isUniformRNG and isSeedable.
A package.d file groups them together so one can still import them all together via "import std.random2". I've also taken the liberty of following the new guideline to place import statements as locally as possible; it was striking how easy and clean this made things, and it should be easy to port that particular change back to std.random.
The new package implements all of the functions, templates and range objects from std.random except for the old std.random.uniformDistribution, whose name I have cannibalized for better purposes. Some have been updated: the MersenneTwisterEngine has been tweaked to match the corresponding code from Boost.Random, and this in turn has allowed the definition of a 64-bit Mersenne Twister (Mt19937_64) and an alternative 32-bit one (Mt11213b).
There are also a number of entirely new entries. std.random2.distribution contains not just existing functions such as dice and uniform, but also range-based random distribution classes UniformDistribution, NormalDistribution and DiscreteDistribution; the last of these is effectively a range-based version of dice, and is based on Chris Cain's excellent work here: https://github.com/D-Programming-Language/phobos/pull/1702
The principal weak point in terms of functionality is std.random2.device, where the implemented random devices (based on Posix /dev/random and /dev/urandom) are really very primitive and just there to illustrate the principle. However, since their API is pretty simple (they're just input ranges with min and max defined), there should be plenty of opportunity to improve and extend the internals in future. Advice and patches are welcome for everything, but particularly here :-)
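Since a device is described here as nothing more than an input range with min and max, a minimal sketch of that shape in D may help. The class name `DeviceSource`, the one-value-per-read buffering and the exception on a short read are assumptions for illustration, not the std.random2 implementation; the only library facility it relies on is `std.stdio.File.rawRead` from Phobos.

```d
import std.range.primitives : isInputRange;
import std.stdio : File;

/// Hypothetical sketch (not the std.random2 code): a non-deterministic
/// source exposed as an infinite input range of uint with min and max,
/// reading raw bytes from a Posix device file.
final class DeviceSource
{
    enum uint min = uint.min;
    enum uint max = uint.max;
    enum bool empty = false;   // a device never runs out

    private File file;
    private uint[1] buffer;    // one value per read, for simplicity only

    this(string path = "/dev/urandom")
    {
        file = File(path, "rb");
        popFront();            // load the first value so front is valid
    }

    @property uint front() const { return buffer[0]; }

    void popFront()
    {
        // rawRead returns the slice of the buffer that was actually filled.
        auto got = file.rawRead(buffer[]);
        if (got.length != buffer.length)
            throw new Exception("short read from random device");
    }
}

static assert(isInputRange!DeviceSource);
```

Passing "/dev/random" to the constructor selects the other device. A less primitive version would read larger blocks per call and handle platforms that lack these files.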
What's become quite apparent in the course of writing this package is how much more natural it is for ranges implementing randomness to be class objects. The basic fact that another range can store a copy of an RNG internally without creating a copy-by-value is merely the start: for example, in the case of the class implementation of RandomSample, we no longer need to have complications like,
    @property auto ref front()
    {
        assert(!empty);
        // The first sample point must be determined here to avoid
        // having it always correspond to the first element of the
        // input. The rest of the sample points are determined each
        // time we call popFront().
        if (_skip == Skip.None)
        {
            initializeFront();
        }
        return _input.front;
    }
that were necessary to avoid bugs like https://d.puremagic.com/issues/show_bug.cgi?id=7936; because the class-based implementation copies by reference, we can just initialize everything in the constructor. Similarly, issues like https://d.puremagic.com/issues/show_bug.cgi?id=7067 and https://d.puremagic.com/issues/show_bug.cgi?id=8247 just vanish.
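A minimal sketch of the value-versus-reference difference, assuming a hypothetical class wrapper `RefMt19937` around the existing struct `Mt19937` from std.random, and a hypothetical `Consumer` that stores its RNG internally the way RandomSample does. With the struct, two consumers each get a private copy and replay the same number; with the class, they share one state and draw successive numbers.

```d
import std.random : Mt19937;

/// Hypothetical class wrapper: one shared Mt19937 state behind a reference.
final class RefMt19937
{
    private Mt19937 engine;

    this(uint seed) { engine.seed(seed); }

    enum bool empty = false;
    @property uint front() { return engine.front; }
    void popFront() { engine.popFront(); }
}

/// Hypothetical consumer that stores its RNG internally, as RandomSample does.
struct Consumer(RNG)
{
    RNG rng;

    uint next()
    {
        auto value = rng.front;
        rng.popFront();
        return value;
    }
}

unittest
{
    // Struct RNG: each Consumer holds its own copy, so both replay one value.
    auto gen = Mt19937(42);
    auto c1 = Consumer!Mt19937(gen);
    auto c2 = Consumer!Mt19937(gen);
    assert(c1.next() == c2.next());

    // Class RNG: both Consumers share one state and draw successive values.
    auto sharedGen = new RefMt19937(42);
    auto d1 = Consumer!RefMt19937(sharedGen);
    auto d2 = Consumer!RefMt19937(sharedGen);
    immutable first = d1.next();
    immutable second = d2.next();
    auto expected = Mt19937(42);
    assert(first == expected.front);
    expected.popFront();
    assert(second == expected.front);
}
```

The struct case is the copy-by-value behaviour behind bugs like 7936; the class case is what lets a class-based RandomSample hold its RNG without duplicating its state.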
Obvious caveats about the approach include the fact that classes need to be new'd, and questions over whether allocation on the heap might create speed issues. The benchmarks I've run (code available in the repo) seem to suggest that at least the latter is not a worry, but these are obviously things that need to be considered. My own feeling is that ultimately it is a responsibility of the language to offer nice ways to allocate classes without necessarily relying on new or the GC.
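One facility that already exists in Phobos for getting a class instance without `new` or the GC is `std.typecons.scoped`, which constructs the object inside a struct in the current stack frame. The sketch below is only an illustration under assumptions: `ClassMt19937` and `sumFirst` are hypothetical names, and the wrapper simply forwards to the struct `Mt19937` from std.random.

```d
import std.random : Mt19937;
import std.typecons : scoped;

/// Hypothetical minimal class RNG, used only for this illustration.
final class ClassMt19937
{
    private Mt19937 engine;
    this(uint seed) { engine.seed(seed); }
    enum bool empty = false;
    @property uint front() { return engine.front; }
    void popFront() { engine.popFront(); }
}

/// Sums the first n outputs using a class RNG that lives in this
/// function's stack frame rather than on the GC heap.
ulong sumFirst(uint seed, size_t n)
{
    // scoped!T builds the object in place inside a struct wrapper and
    // destroys it when the wrapper goes out of scope, so the class
    // reference must not escape this function.
    auto gen = scoped!ClassMt19937(seed);
    ulong total = 0;
    foreach (_; 0 .. n)
    {
        total += gen.front;   // forwarded to the class via alias this
        gen.popFront();
    }
    return total;
}
```

The catch is lifetime: a range that stores the reference (as RandomSample would) must not outlive the scoped object, so this only helps when all use of the RNG is confined to one scope.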
A few remarks on design and other factors:
   * The new range objects have been implemented as final classes for speed purposes. However, I tried another approach where the RNG class templates were abstract classes, and the individual parameterizations were final-class subclasses of those rather than aliases. This was noticeably slower. My OO-fu is not really sufficient to explain this, so if anybody can offer a reason, I'd be happy to learn it.
   * A design question I considered but have not yet pursued: since at least two functions require passing the RNG as the first parameter (dice and discreteDistribution), perhaps this should be made a general design pattern for everything? It would make it harder to adapt code using the existing std.random, but would create a useful uniformity.
   * I would have liked to ensure that every random distribution had both a range- and a function-based version. However, I came to the conclusion that solely function-based versions should be avoided if either (i) the function would need to maintain internal state between calls, or (ii) the function would need to allocate memory per call. The first is why, for example, NormalDistribution exists only as a class/range. The second might in principle raise some objections to dice, but as dice seems to be a reasonably standard function, I kept it in.
   * It might be good to implement helper functions for the individual RNGs (e.g. just as RandomSample has a randomSample helper function to deliver instances, so Mt19937 could have a corresponding mt19937 helper function returning Mt19937 instances seeded in line with the helper function's parameters).
   * Those with long memories may recall that when I originally wrote up my NormalDistribution code, it was written to allow various "normal engines" to be plugged in; mine was Box-Muller, but jerro also contributed a Ziggurat-based engine. This could still be provided here, although my own inclination is that it's probably best for Phobos to provide one single good-for-general-purpose-use implementation.
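To illustrate point (i) together with the RNG-first convention, here is a minimal sketch of a Box-Muller normal range in D. `BoxMullerNormal` and `boxMullerNormal` are hypothetical names, not the std.random2 NormalDistribution API. Box-Muller turns two uniform variates into two normal variates, so the second one must be cached between calls, which is exactly the internal state a free function would have to hide. As an assumption of this sketch, the RNG is borrowed through a pointer to a std.random struct RNG; with class-based RNGs it would simply be a class reference.

```d
import std.math : cos, log, sin, sqrt, PI;
import std.random : uniform;

/// Hypothetical Box-Muller normal range (sketch only).
final class BoxMullerNormal(RNG)
{
    private RNG* rng;              // borrowed; the caller's RNG must outlive this
    private double mean, stddev;
    private double current, spare;
    private bool haveSpare;

    this(ref RNG generator, double mean, double stddev)
    {
        rng = &generator;
        this.mean = mean;
        this.stddev = stddev;
        popFront();                // prime front in the constructor
    }

    enum bool empty = false;

    @property double front() const { return current; }

    void popFront()
    {
        if (haveSpare)             // use the cached second variate
        {
            current = mean + stddev * spare;
            haveSpare = false;
            return;
        }
        // "(]" excludes 0.0 so that log(u1) stays finite.
        immutable u1 = uniform!"(]"(0.0, 1.0, *rng);
        immutable u2 = uniform(0.0, 1.0, *rng);
        immutable r = sqrt(-2.0 * log(u1));
        current = mean + stddev * r * cos(2.0 * PI * u2);
        spare = r * sin(2.0 * PI * u2);
        haveSpare = true;
    }
}

/// Hypothetical helper taking the RNG as its first parameter.
auto boxMullerNormal(RNG)(ref RNG generator, double mean = 0.0, double stddev = 1.0)
{
    return new BoxMullerNormal!RNG(generator, mean, stddev);
}
```

Usage would look like `auto normal = boxMullerNormal(gen, 5.0, 2.0);` followed by ordinary range iteration over `normal`.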

Known issues:
   * While every bugfix I've made in the course of implementing this package has been propagated back to std.random where possible, this package is missing some of the more recent improvements to std.random by other people (e.g. I think it's missing Chris Cain's update to integer-based uniform()).
   * The unittest coverage is overall pretty damn good, but there are weak spots in std.random2.distribution and std.random2.device. Some of the "unittests" in these cases are no more than basic developer sanity checks that print results to the console, and need to be replaced by well-defined, silent-unless-failed alternatives.
   * Some of the .save functions are implemented with the help of rather odd private constructors; it would probably be much better to redo these in terms of public this(typeof(this)) constructors.
   * The random devices _really_ need to be better. Consider the current versions as placeholders ... :-)
Finally, a note on authorship: since this is still based very substantially on std.random, I've made an effort to check git logs and ensure that authors and copyright records (and dates of contribution) are correct. My general principle here has been that listed authors should only include those who've made a substantial contribution (i.e. whole functions, large numbers of unittests, ...), not just various 1-line tweaks. But if anyone has any objection to any of the names, dates or other credits given, or if anybody would like their name removed (!), just let me know.
I owe a great debt of gratitude to many people here on the forums, and monarch_dodra in particular, for a huge amount of useful discussion, advice and feedback that has made its way into the current code. Thank you all for your time, thoughts, ideas and patience.
Anyway, please feel free to review, destroy and otherwise do fun stuff with this module. I hope that some of you will find it immediately useful, but please note that feedback and advice may result in breaking changes -- this is intended to wind up in Phobos, so it really needs to be perfect when it does so. Let's review it really well and make it happen!
Thanks and best wishes,

    -- Joe


It should be std.pseudorandom (except for /dev/random) :)

Still no CMWC RNG... IMO CMWC should replace MT as the default RNG. Faster. Looooonger period. More passed tests (if I'm right, MT didn't pass TestU01). And it is parametric, so you can choose between a faster result and a longer period.


http://en.wikipedia.org/wiki/Multiply-with-carry#Complementary-multiply-with-carry_generators
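To make the suggestion concrete, here is a minimal sketch in D of Marsaglia's lag-4096 complementary-multiply-with-carry generator, shaped as a uniform RNG range in the style of std.random. The class name `Cmwc4096`, the LCG used to fill the lag table and the choice of initial carry are assumptions for illustration; the constants a = 18782 and r = 0xfffffffe follow Marsaglia's published C version as recalled here and should be checked against the linked article before relying on them. Nothing here tests the speed, period or TestU01 claims.

```d
import std.random : isUniformRNG;

/// Hypothetical CMWC4096 sketch; not part of std.random or std.random2.
final class Cmwc4096
{
    enum uint lag = 4096;          // power of two, so a mask can wrap the index
    enum ulong a = 18_782;         // multiplier
    enum uint r = 0xffff_fffe;     // b - 1, with base b = 2^32 - 1

    enum bool isUniformRandom = true;
    enum uint min = 0;
    enum uint max = uint.max;
    enum bool empty = false;

    private uint[lag] q;
    private uint c;                // carry, started below a
    private uint i;
    private uint current;

    this(uint value) { seed(value); }

    void seed(uint s)
    {
        // Fill the lag table with a simple LCG; a real design would seed more carefully.
        foreach (ref v; q)
        {
            s = s * 1_664_525u + 1_013_904_223u;
            v = s;
        }
        c = s % cast(uint) a;
        i = lag - 1;
        popFront();
    }

    @property uint front() const { return current; }

    void popFront()
    {
        i = (i + 1) & (lag - 1);
        immutable ulong t = a * q[i] + c;
        c = cast(uint)(t >> 32);
        uint x = cast(uint) t + c;  // reduce modulo 2^32 - 1 ...
        if (x < c) { ++x; ++c; }    // ... folding any overflow back in
        q[i] = r - x;               // the "complementary" step
        current = q[i];
    }
}

static assert(isUniformRNG!Cmwc4096);
```

Because it satisfies isUniformRNG, a generator like this can be passed anywhere std.random expects a uniform RNG; the lag could be made a template parameter to trade state size against period, which is the parametric aspect mentioned above.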
