Le mer. 13 mars 2019 à 00:19, Alex Herbert <alex.d.herb...@gmail.com> a écrit :
>
>
>
> > On 12 Mar 2019, at 16:58, Gilles Sadowski <gillese...@gmail.com> wrote:
> >
> > [Replying to the ML.]
> >
> > Le mar. 12 mars 2019 à 13:00, Alex Herbert <alex.d.herb...@gmail.com 
> > <mailto:alex.d.herb...@gmail.com>> a écrit :
> >>
> >>
> >> On 11/03/2019 23:54, Gilles Sadowski wrote:
> >>> Le ven. 8 mars 2019 à 18:12, Alex Herbert <alex.d.herb...@gmail.com> a 
> >>> écrit :
> >>>> Hi,
> >>>>
> >>>> I'd like to start on improving the methods in rng-simple to create the
> >>>> RandomSource.
> >>>>
> >>>> There are a few PRs outstanding but I think the discussion on this list
> >>>> was that the new XoShiRo generators are OK. So I would like to merge the
> >>>> following PRs:
> >>>>
> >>>> RNG-82: Add XorShift1024StarPhi generator
> >>> See other thread.
> >>>
> >>>> RNG-70: Add new XoShiRo generators
> >>> +1
> >>
> >> Ah. Most of these generators have more than one constructor (see below
> >> on the discussion for the change to the SplitMix64 generator). These are
> >> utility constructors given the small size of the seed. e.g. for a seed
> >> of length 4:
> >>
> >> public XoShiRo256StarStar(long[] seed)
> >>
> >> public XoShiRo256StarStar(long seed0, long seed1, long seed2, long seed3)
> >>
> >> I did not see this as an issue. I added unit tests to show that the same
> >> input data in either the array or the primitives produces the same
> >> generator sequence. The primitive constructor does not have to do array
> >> checking and filling when the array is not the correct length. The input
> >> is directly used to set the state. It is a cleaner constructor, but it
> >> does not fit the model required by the ProviderBuilder which would like
> >> an object seed.
> >>
> >> I am going to rewrite the ProviderBuilder internals to allow the size of
> >> the seed to be known and to remove the use of reflection to create a
> >> source. In this instance the primitive constructor would be better. That
> >> is a motivating reason for the change to the SplitMix64 too.
> >
> > I did not look again at the code yet; but if you can work around
> > the issue (as I recalled it) without impacting the public API, then
> > fine I guess. :-)
> >
> >>
> >>>
> >>>> Then use master to set a baseline for the construction speed. This can
> >>>> be used to compare against the latest code to see how much the changes
> >>>> have improved construction speed. Given the work will have a big effect
> >>>> on generators with small state it would be good to have the new
> >>>> generators in the baseline.
> >>> IIUC, "big effect" only when constructing and throwing quickly very
> >>> many objects.
> >>> The doc should make it clear that it's not the "nominal" use-case.
> >>
> >> Yes. There is not a document for this at the moment anyway (i.e. the
> >> results of the construction benchmark). Should a section be added to the
> >> user guide?
> >
> > Benchmarks can certainly go in the "Performance" section.
> >
> >> This can outline that a generator should be picked for its
> >> intended purpose: output datatype required (int, double, etc) and number
> >> of samples (period). For short lived generators with low samples then
> >> construction speed is a factor.
> >
> > All good as long as the entry point for application developers to
> > create RNGs is "RandomSource".
> >
> >>
> >> This is a feature that will probably most benefit the
> >> ThreadLocalRandomSource where short-lifetime generators are likely to be
> >> required in a multi-threaded environment.
> >
> > ???
> > IIUC, "ThreadLocalRandomSource" will store the longest-living RNG
> > instances!
>
> But only per thread. A new thread will get a new RNG which has to be seeded 
> again. But if you have an application with a few long term threads then the 
> ThreadLocalRandomSource can be used to store and retrieve any RNG of choice 
> without having to write extra code to do that yourself.

The latter is what I think is the "normal" usage.

>
> >
> >>
> >> However the benchmark shows that once you are generating 1,000,000
> >> samples then the construction cost of even the TwoCmres is much less
> >> than the sampling time.
> >>
> >>>
> >>>> I note that the only tags in git are for releases. So the baseline can
> >>>> be just a commit Id and I'll add it to the benchmark Jira RNG-72 for
> >>>> reference.
> >
> > I'm still wondering about applications that require short-lived
> > generators...
> > It doesn't seem to ever be a good idea to do that.  There was
> > an issue caused by that in [Digester], fixed only in the latest
> > release…
>
> Short lived generators for example for array shuffling of small arrays that 
> will be done only once.

If it's only *once*, then the absolute time of the whole operation
(creation + sorting)
will be swamped in the total execution time of the application. ;-)
And if it's done repeatedly, I still think that reusing the RNG is a
better approach (i.e.
define a "Shuffler" class that will store a long-lived RNG instance).

>
> ThreadLocalRandomSource solves this by just giving the thread the current 
> one, or a new one if the thread has never used one before. In the later case 
> a low construction cost is good, especially if the thread is also short lived.

AFAIK, a short-lived thread is a waste of resources; better use an
executor, as you
indicate below.

> For a shuffle of 100 items then a SplitMix64 will do a good enough job and 
> can be built very fast.
>
> For simulations then you should be using longer period generators. If doing a 
> multithreaded simulation I use a new RNG per Runnable put into an 
> ExecutorService. However the Runnable can now use 
> ThreadLocalRandomSource.current(…) to get the RNG. So for a fixed thread pool 
> there potentially can be less generators as they get reused, and the Runnable 
> does not have to be written to do lots of work in a loop with the same RNG.

+1 :-)

>
> Parallel simulations is a case where the split() and jump() functionality 
> would be better still to avoid sequence overlap.

Or use different generators, if the number of threads is small.
[Hence the question with "XorShift1024Star".]

Gilles

> >
> >>>>
> >>>> There are also some other PRs to be discussed:
> >>>>
> >>>> RNG-81: NumberFactory to sample all rationals between 0 and 1.
> >>>>
> >>>> This one changes to the implementation in Open JDK for float and double
> >>>> generation. I think this is OK to merge.
> >>> +1
> >>> [No promise of reproducibility for "derived" types.]
> >> OK. I'll put that in.
> >>>
> >>>> It won't effect the baseline
> >>>> but is good to get it into the next release.
> >>>>
> >>>>
> >>>> RNG-78: Added a ThreadLocalRandomSource.
> >>>>
> >>>> This works as a proof of concept. But it is all new files and so I'm
> >>>> happy to leave that until it is decided exactly if and where to put it.
> >>> Would there be a better place than in package "o.a.c.rng.simple"?
> >> It was either in simple or examples. But I would like to see it as a
> >> feature in use so I would leave it in simple
> >
> > Sure, it's a useful feature to help users avoid bad usage.
> >
> >>>> RNG-76: Added primitive long constructor to SplitMix64
> >>>>
> >>>> This is something that will effect the benchmark so I'll merge it after.
> >>>> It is a trivial change with a noticeable performance gain by avoiding
> >>>> auto-boxing of the long seed.
> >>> IIRC, there is some developer doc that refer to having a single
> >>> constructor.  But I do not recall the reason.
> >>
> >> I did not know. I cannot find that explained in the project docs. I
> >> looked in site/xdoc/developers.xml and a poor regex search through the 
> >> rest.
> >
> > I think I meant this[1]:
> > The "provider" must specify one way for setting the seed. For a given
> > seed, the generated sequence must always be the same.
> >
> >> Maybe the reason was to prevent ambiguous construction, e.g. with a
> >> non-parameterised constructor.
> >>
> >> An exception was made for TwoCmres since it needs more arguments. That
> >> is understandable.
> >>
> >> Could this rule be relaxed if the same data passed to different
> >> constructors results in the same generated sequence?
> >
> > I guess so.
> > IIRC, the limitation was due to using commons code in order to
> > instantiate the appropriate RNG (core) class.
> >
> >> E.g. the unit tests
> >> I added for the new XoShiRo generators test this.
> >
> > Thanks for all this work,
> > Gilles
>
> I’m happy to do it. I am just adding to the library from the perspective I 
> have as end user. Hopefully the changes will benefit other users as well as 
> me.
>
> >
> > [1] 
> > http://commons.apache.org/proper/commons-rng/commons-rng-core/apidocs/org/apache/commons/rng/core/source32/package-summary.html
> >  
> > <http://commons.apache.org/proper/commons-rng/commons-rng-core/apidocs/org/apache/commons/rng/core/source32/package-summary.html>
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to