Le mer. 13 mars 2019 à 00:19, Alex Herbert <alex.d.herb...@gmail.com> a écrit : > > > > > On 12 Mar 2019, at 16:58, Gilles Sadowski <gillese...@gmail.com> wrote: > > > > [Replying to the ML.] > > > > Le mar. 12 mars 2019 à 13:00, Alex Herbert <alex.d.herb...@gmail.com > > <mailto:alex.d.herb...@gmail.com>> a écrit : > >> > >> > >> On 11/03/2019 23:54, Gilles Sadowski wrote: > >>> Le ven. 8 mars 2019 à 18:12, Alex Herbert <alex.d.herb...@gmail.com> a > >>> écrit : > >>>> Hi, > >>>> > >>>> I'd like to start on improving the methods in rng-simple to create the > >>>> RandomSource. > >>>> > >>>> There are a few PRs outstanding but I think the discussion on this list > >>>> was that the new XoShiRo generators are OK. So I would like to merge the > >>>> following PRs: > >>>> > >>>> RNG-82: Add XorShift1024StarPhi generator > >>> See other thread. > >>> > >>>> RNG-70: Add new XoShiRo generators > >>> +1 > >> > >> Ah. Most of these generators have more than one constructor (see below > >> on the discussion for the change to the SplitMix64 generator). These are > >> utility constructors given the small size of the seed. e.g. for a seed > >> of length 4: > >> > >> public XoShiRo256StarStar(long[] seed) > >> > >> public XoShiRo256StarStar(long seed0, long seed1, long seed2, long seed3) > >> > >> I did not see this as an issue. I added unit tests to show that the same > >> input data in either the array or the primitives produces the same > >> generator sequence. The primitive constructor does not have to do array > >> checking and filling when the array is not the correct length. The input > >> is directly used to set the state. It is a cleaner constructor, but it > >> does not fit the model required by the ProviderBuilder which would like > >> an object seed. > >> > >> I am going to rewrite the ProviderBuilder internals to allow the size of > >> the seed to be known and to remove the use of reflection to create a > >> source. In this instance the primitive constructor would be better. That > >> is a motivating reason for the change to the SplitMix64 too. > > > > I did not look again at the code yet; but if you can work around > > the issue (as I recalled it) without impacting the public API, then > > fine I guess. :-) > > > >> > >>> > >>>> Then use master to set a baseline for the construction speed. This can > >>>> be used to compare against the latest code to see how much the changes > >>>> have improved construction speed. Given the work will have a big effect > >>>> on generators with small state it would be good to have the new > >>>> generators in the baseline. > >>> IIUC, "big effect" only when constructing and throwing quickly very > >>> many objects. > >>> The doc should make it clear that it's not the "nominal" use-case. > >> > >> Yes. There is not a document for this at the moment anyway (i.e. the > >> results of the construction benchmark). Should a section be added to the > >> user guide? > > > > Benchmarks can certainly go in the "Performance" section. > > > >> This can outline that a generator should be picked for its > >> intended purpose: output datatype required (int, double, etc) and number > >> of samples (period). For short lived generators with low samples then > >> construction speed is a factor. > > > > All good as long as the entry point for application developers to > > create RNGs is "RandomSource". > > > >> > >> This is a feature that will probably most benefit the > >> ThreadLocalRandomSource where short-lifetime generators are likely to be > >> required in a multi-threaded environment. > > > > ??? > > IIUC, "ThreadLocalRandomSource" will store the longest-living RNG > > instances! > > But only per thread. A new thread will get a new RNG which has to be seeded > again. But if you have an application with a few long term threads then the > ThreadLocalRandomSource can be used to store and retrieve any RNG of choice > without having to write extra code to do that yourself.
The latter is what I think is the "normal" usage. > > > > >> > >> However the benchmark shows that once you are generating 1,000,000 > >> samples then the construction cost of even the TwoCmres is much less > >> than the sampling time. > >> > >>> > >>>> I note that the only tags in git are for releases. So the baseline can > >>>> be just a commit Id and I'll add it to the benchmark Jira RNG-72 for > >>>> reference. > > > > I'm still wondering about applications that require short-lived > > generators... > > It doesn't seem to ever be a good idea to do that. There was > > an issue caused by that in [Digester], fixed only in the latest > > release… > > Short lived generators for example for array shuffling of small arrays that > will be done only once. If it's only *once*, then the absolute time of the whole operation (creation + sorting) will be swamped in the total execution time of the application. ;-) And if it's done repeatedly, I still think that reusing the RNG is a better approach (i.e. define a "Shuffler" class that will store a long-lived RNG instance). > > ThreadLocalRandomSource solves this by just giving the thread the current > one, or a new one if the thread has never used one before. In the later case > a low construction cost is good, especially if the thread is also short lived. AFAIK, a short-lived thread is a waste of resources; better use an executor, as you indicate below. > For a shuffle of 100 items then a SplitMix64 will do a good enough job and > can be built very fast. > > For simulations then you should be using longer period generators. If doing a > multithreaded simulation I use a new RNG per Runnable put into an > ExecutorService. However the Runnable can now use > ThreadLocalRandomSource.current(…) to get the RNG. So for a fixed thread pool > there potentially can be less generators as they get reused, and the Runnable > does not have to be written to do lots of work in a loop with the same RNG. +1 :-) > > Parallel simulations is a case where the split() and jump() functionality > would be better still to avoid sequence overlap. Or use different generators, if the number of threads is small. [Hence the question with "XorShift1024Star".] Gilles > > > >>>> > >>>> There are also some other PRs to be discussed: > >>>> > >>>> RNG-81: NumberFactory to sample all rationals between 0 and 1. > >>>> > >>>> This one changes to the implementation in Open JDK for float and double > >>>> generation. I think this is OK to merge. > >>> +1 > >>> [No promise of reproducibility for "derived" types.] > >> OK. I'll put that in. > >>> > >>>> It won't effect the baseline > >>>> but is good to get it into the next release. > >>>> > >>>> > >>>> RNG-78: Added a ThreadLocalRandomSource. > >>>> > >>>> This works as a proof of concept. But it is all new files and so I'm > >>>> happy to leave that until it is decided exactly if and where to put it. > >>> Would there be a better place than in package "o.a.c.rng.simple"? > >> It was either in simple or examples. But I would like to see it as a > >> feature in use so I would leave it in simple > > > > Sure, it's a useful feature to help users avoid bad usage. > > > >>>> RNG-76: Added primitive long constructor to SplitMix64 > >>>> > >>>> This is something that will effect the benchmark so I'll merge it after. > >>>> It is a trivial change with a noticeable performance gain by avoiding > >>>> auto-boxing of the long seed. > >>> IIRC, there is some developer doc that refer to having a single > >>> constructor. But I do not recall the reason. > >> > >> I did not know. I cannot find that explained in the project docs. I > >> looked in site/xdoc/developers.xml and a poor regex search through the > >> rest. > > > > I think I meant this[1]: > > The "provider" must specify one way for setting the seed. For a given > > seed, the generated sequence must always be the same. > > > >> Maybe the reason was to prevent ambiguous construction, e.g. with a > >> non-parameterised constructor. > >> > >> An exception was made for TwoCmres since it needs more arguments. That > >> is understandable. > >> > >> Could this rule be relaxed if the same data passed to different > >> constructors results in the same generated sequence? > > > > I guess so. > > IIRC, the limitation was due to using commons code in order to > > instantiate the appropriate RNG (core) class. > > > >> E.g. the unit tests > >> I added for the new XoShiRo generators test this. > > > > Thanks for all this work, > > Gilles > > I’m happy to do it. I am just adding to the library from the perspective I > have as end user. Hopefully the changes will benefit other users as well as > me. > > > > > [1] > > http://commons.apache.org/proper/commons-rng/commons-rng-core/apidocs/org/apache/commons/rng/core/source32/package-summary.html > > > > <http://commons.apache.org/proper/commons-rng/commons-rng-core/apidocs/org/apache/commons/rng/core/source32/package-summary.html> > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org