> On 12 Mar 2019, at 16:58, Gilles Sadowski <gillese...@gmail.com> wrote: > > [Replying to the ML.] > > Le mar. 12 mars 2019 à 13:00, Alex Herbert <alex.d.herb...@gmail.com > <mailto:alex.d.herb...@gmail.com>> a écrit : >> >> >> On 11/03/2019 23:54, Gilles Sadowski wrote: >>> Le ven. 8 mars 2019 à 18:12, Alex Herbert <alex.d.herb...@gmail.com> a >>> écrit : >>>> Hi, >>>> >>>> I'd like to start on improving the methods in rng-simple to create the >>>> RandomSource. >>>> >>>> There are a few PRs outstanding but I think the discussion on this list >>>> was that the new XoShiRo generators are OK. So I would like to merge the >>>> following PRs: >>>> >>>> RNG-82: Add XorShift1024StarPhi generator >>> See other thread. >>> >>>> RNG-70: Add new XoShiRo generators >>> +1 >> >> Ah. Most of these generators have more than one constructor (see below >> on the discussion for the change to the SplitMix64 generator). These are >> utility constructors given the small size of the seed. e.g. for a seed >> of length 4: >> >> public XoShiRo256StarStar(long[] seed) >> >> public XoShiRo256StarStar(long seed0, long seed1, long seed2, long seed3) >> >> I did not see this as an issue. I added unit tests to show that the same >> input data in either the array or the primitives produces the same >> generator sequence. The primitive constructor does not have to do array >> checking and filling when the array is not the correct length. The input >> is directly used to set the state. It is a cleaner constructor, but it >> does not fit the model required by the ProviderBuilder which would like >> an object seed. >> >> I am going to rewrite the ProviderBuilder internals to allow the size of >> the seed to be known and to remove the use of reflection to create a >> source. In this instance the primitive constructor would be better. That >> is a motivating reason for the change to the SplitMix64 too. > > I did not look again at the code yet; but if you can work around > the issue (as I recalled it) without impacting the public API, then > fine I guess. :-) > >> >>> >>>> Then use master to set a baseline for the construction speed. This can >>>> be used to compare against the latest code to see how much the changes >>>> have improved construction speed. Given the work will have a big effect >>>> on generators with small state it would be good to have the new >>>> generators in the baseline. >>> IIUC, "big effect" only when constructing and throwing quickly very >>> many objects. >>> The doc should make it clear that it's not the "nominal" use-case. >> >> Yes. There is not a document for this at the moment anyway (i.e. the >> results of the construction benchmark). Should a section be added to the >> user guide? > > Benchmarks can certainly go in the "Performance" section. > >> This can outline that a generator should be picked for its >> intended purpose: output datatype required (int, double, etc) and number >> of samples (period). For short lived generators with low samples then >> construction speed is a factor. > > All good as long as the entry point for application developers to > create RNGs is "RandomSource". > >> >> This is a feature that will probably most benefit the >> ThreadLocalRandomSource where short-lifetime generators are likely to be >> required in a multi-threaded environment. > > ??? > IIUC, "ThreadLocalRandomSource" will store the longest-living RNG > instances!
But only per thread. A new thread will get a new RNG which has to be seeded again. But if you have an application with a few long term threads then the ThreadLocalRandomSource can be used to store and retrieve any RNG of choice without having to write extra code to do that yourself. > >> >> However the benchmark shows that once you are generating 1,000,000 >> samples then the construction cost of even the TwoCmres is much less >> than the sampling time. >> >>> >>>> I note that the only tags in git are for releases. So the baseline can >>>> be just a commit Id and I'll add it to the benchmark Jira RNG-72 for >>>> reference. > > I'm still wondering about applications that require short-lived > generators... > It doesn't seem to ever be a good idea to do that. There was > an issue caused by that in [Digester], fixed only in the latest > release… Short lived generators for example for array shuffling of small arrays that will be done only once. ThreadLocalRandomSource solves this by just giving the thread the current one, or a new one if the thread has never used one before. In the later case a low construction cost is good, especially if the thread is also short lived. For a shuffle of 100 items then a SplitMix64 will do a good enough job and can be built very fast. For simulations then you should be using longer period generators. If doing a multithreaded simulation I use a new RNG per Runnable put into an ExecutorService. However the Runnable can now use ThreadLocalRandomSource.current(…) to get the RNG. So for a fixed thread pool there potentially can be less generators as they get reused, and the Runnable does not have to be written to do lots of work in a loop with the same RNG. Parallel simulations is a case where the split() and jump() functionality would be better still to avoid sequence overlap. > >>>> >>>> There are also some other PRs to be discussed: >>>> >>>> RNG-81: NumberFactory to sample all rationals between 0 and 1. >>>> >>>> This one changes to the implementation in Open JDK for float and double >>>> generation. I think this is OK to merge. >>> +1 >>> [No promise of reproducibility for "derived" types.] >> OK. I'll put that in. >>> >>>> It won't effect the baseline >>>> but is good to get it into the next release. >>>> >>>> >>>> RNG-78: Added a ThreadLocalRandomSource. >>>> >>>> This works as a proof of concept. But it is all new files and so I'm >>>> happy to leave that until it is decided exactly if and where to put it. >>> Would there be a better place than in package "o.a.c.rng.simple"? >> It was either in simple or examples. But I would like to see it as a >> feature in use so I would leave it in simple > > Sure, it's a useful feature to help users avoid bad usage. > >>>> RNG-76: Added primitive long constructor to SplitMix64 >>>> >>>> This is something that will effect the benchmark so I'll merge it after. >>>> It is a trivial change with a noticeable performance gain by avoiding >>>> auto-boxing of the long seed. >>> IIRC, there is some developer doc that refer to having a single >>> constructor. But I do not recall the reason. >> >> I did not know. I cannot find that explained in the project docs. I >> looked in site/xdoc/developers.xml and a poor regex search through the rest. > > I think I meant this[1]: > The "provider" must specify one way for setting the seed. For a given > seed, the generated sequence must always be the same. > >> Maybe the reason was to prevent ambiguous construction, e.g. with a >> non-parameterised constructor. >> >> An exception was made for TwoCmres since it needs more arguments. That >> is understandable. >> >> Could this rule be relaxed if the same data passed to different >> constructors results in the same generated sequence? > > I guess so. > IIRC, the limitation was due to using commons code in order to > instantiate the appropriate RNG (core) class. > >> E.g. the unit tests >> I added for the new XoShiRo generators test this. > > Thanks for all this work, > Gilles I’m happy to do it. I am just adding to the library from the perspective I have as end user. Hopefully the changes will benefit other users as well as me. > > [1] > http://commons.apache.org/proper/commons-rng/commons-rng-core/apidocs/org/apache/commons/rng/core/source32/package-summary.html > > <http://commons.apache.org/proper/commons-rng/commons-rng-core/apidocs/org/apache/commons/rng/core/source32/package-summary.html> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > <mailto:dev-unsubscr...@commons.apache.org> > For additional commands, e-mail: dev-h...@commons.apache.org > <mailto:dev-h...@commons.apache.org>