On Wednesday, 19 March 2014 at 23:49:41 UTC, Joseph Rushton
Wakeling wrote:
Hello all,
As some of you may already know, monarch_dodra and I have spent
quite a lot of time over the last year discussing the state of
std.random. To cut a long story short, there are significant
problems that arise because the current RNGs are value types
rather than reference types. We had quite a lot of back and
forth on different design ideas, with a lot of helpful input
from others in the community, but at the end of the day there
are really only two broad approaches: create structs that
implement reference semantics internally, or use classes. So,
as an exercise, I decided to create a class-based std.random.
The preliminary (but comprehensive) results of this are now
available here:
https://github.com/WebDrake/std.random2
Besides re-implementing random number generators as classes
rather than structs, the new code splits std.random2 into a
package of several different modules:
* std.random2.generator, pseudo-random number generators;
* std.random2.device, non-deterministic random sources;
* std.random2.distribution, random distributions such as
uniform,
normal, etc.;
* std.random2.adaptor, random "adaptors" such as
randomShuffle,
randomSample, etc.
* std.random2.traits, RNG-specific traits such as
isUniformRNG
and isSeedable.
A package.d file groups them together so one can still import
all together via "import std.random2". I've also taken the
liberty of following the new guideline to place import
statements as locally as possible; it was striking how easy and
clean this made things, and it should be easy to port that
particular change back to std.random.
The new package implements all of the functions, templates and
range objects from std.random except for the old
std.random.uniformDistribution, whose name I have cannibalized
for better purposes. Some have been updated: the
MersenneTwisterEngine has been tweaked to match the
corresponding code from Boost.Random, and this in turn has
allowed the definition of a 64-bit Mersenne Twister
(Mt19937_64) and an alternative 32-bit one (Mt11213b).
There are also a number of entirely new entries.
std.random2.distribution contains not just existing functions
such as dice and uniform, but also range-based random
distribution classes UniformDistribution, NormalDistribution
and DiscreteDistribution; the last of these is effectively a
range-based version of dice, and is based on Chris Cain's
excellent work here:
https://github.com/D-Programming-Language/phobos/pull/1702
The principal weak point in terms of functionality is
std.random2.device, where the implemented random devices (based
on Posix' /std/random and /std/urandom) are really very
primitive and just there to illustrate the principle. However,
since their API is pretty simple (they're just input ranges
with min and max defined) there should be plenty of opportunity
to improve and extend the internals in future. Advice and
patches are welcome for everything, but particularly here :-)
What's become quite apparent in the course of writing this
package is how much more natural it is for ranges implementing
randomness to be class objects. The basic fact that another
range can store a copy of an RNG internally without creating a
copy-by-value is merely the start: for example, in the case of
the class implementation of RandomSample, we no longer need to
have complications like,
@property auto ref front()
{
assert(!empty);
// The first sample point must be determined here to
avoid
// having it always correspond to the first element of
the
// input. The rest of the sample points are determined
each
// time we call popFront().
if (_skip == Skip.None)
{
initializeFront();
}
return _input.front;
}
that were necessary to avoid bugs like
https://d.puremagic.com/issues/show_bug.cgi?id=7936; because
the class-based implementation copies by reference, we can just
initialize everything in the constructor. Similarly, issues
like https://d.puremagic.com/issues/show_bug.cgi?id=7067 and
https://d.puremagic.com/issues/show_bug.cgi?id=8247 just vanish.
Obvious caveats about the approach include the fact that
classes need to be new'd, and questions over whether allocation
on the heap might create speed issues. The benchmarks I've run
(code available in the repo) seem to suggest that at least the
latter is not a worry, but these are obviously things that need
to be considered. My own feeling is that ultimately it is a
responsibility of the language to offer nice ways to allocate
classes without necessarily relying on new or the GC.
A few remarks on design and other factors:
* The new range objects have been implemented as final
classes for
speed purposes. However, I tried another approach where
the RNG
class templates were abstract classes, and the individual
parameterizations were final-class subclasses of those
rather
than aliases. This was noticeably slower. My OO-fu is
not really
sufficient to explain this, so if anybody can offer a
reason, I'd
be happy to learn it.
* A design question I considered but have not yet pursued:
since at
least two functions require passing the RNG as the first
parameter
(dice and discreteDistribution), perhaps this should be
made a
general design pattern for everything? It would make it
harder to
adapt code using the existing std.random but would create
a useful
uniformity.
* I would have liked to ensure that every random
distribution had
both a range- and function-based version. However, I came
to the
conclusion that solely function-based versions should be
avoided
if either (i) the function would need to maintain internal
state
between calls, or (ii) the function would need to allocate
memory
per call. The first is why for example NormalDistribution
exists
only as a class/range. The second might in principle
raise some
objections to dice, but as dice seems to be a reasonably
standard
function, I kept it in.
* It might be good to implement helper functions for the
individual
RNGs (e.g. just as RandomSample has a randomSample helper
function
to deliver instances, so Mt19937 could have a corresponding
mt19937 helper function returning Mt19937 instances seeded
in line
with helper function parameters).
* Those with long memories may recall that when I originally
wrote
up my NormalDistribution code, it was written to allow
various
"normal engines" to be plugged in; mine was Box-Muller,
but jerro
also contributed a Ziggurat-based engine. This could
still be
provided here, although my own inclination is that it's
probably
best for Phobos to provide one single
good-for-general-purpose-use
implementation.
Known issues:
* While every bugfix I've made in the course of implementing
this
package has been propagated back to std.random where
possible,
this package is missing some of the more recent
improvements to
std.random by other people (e.g. I think it's missing
Chris Cain's
update to integer-based uniform()).
* The unittest coverage is overall pretty damn good, but
there are
weak spots in std.random.distribution and
std.random2.device.
Some of the "unittests" in these cases are no more than
basic
developer sanity checks that print results to console, and
need
to be replaced by well-defined, silent-unless-failed
alternatives.
* Some of the .save functions are implemented with the help
of rather
odd private constructors; it would probably be much better
to redo
these in terms of public this(typeof(this)) constructors.
* The random devices _really_ need to be better. Consider
the current
versions as placeholders ... :-)
Finally, a note on authorship: since this is still based very
substantially on std.random, I've made an effort to check git
logs and ensure that authors and copyright records (and dates
of contribution) are correct. My general principle here has
been that listed authors should only include those who've made
a substantial contribution (i.e. whole functions, large numbers
of unittests, ...), not just various 1-line tweaks. But if
anyone has any objection to any of the names, dates or other
credits given, or if anybody would like their name removed (!),
just let me know.
I owe a great debt of gratitude to many people here on the
forums, and monarch_dodra in particular, for a huge amount of
useful discussion, advice and feedback that has made its way
into the current code. Thank you all for your time, thoughts,
ideas and patience.
Anyway, please feel free to review, destroy and otherwise do
fun stuff with this module. I hope that some of you will find
it immediately useful, but please note that feedback and advice
may result in breaking changes -- this is intended to wind up
in Phobos, so it really needs to be perfect when it does so.
Let's review it really well and make it happen!
Thanks and best wishes,
-- Joe
It should be std.pseudorandom (except for /dev/random) :)
Still no cmwc rng... IMO cmwc should replace mt as default RNG.
Faster. Looooonger period. More passed tests (if i'm right MT
didn't pass testu01). And it is parametric to get faster result
or longer period.
http://en.wikipedia.org/wiki/Multiply-with-carry#Complementary-multiply-with-carry_generators