Hi.

Le lun. 10 juin 2019 à 18:33, Alex Herbert <alex.d.herb...@gmail.com> a écrit :
>
>
> On 10/06/2019 17:18, Gilles Sadowski wrote:
> > Le lun. 10 juin 2019 à 17:56, Alex Herbert <alex.d.herb...@gmail.com> a 
> > écrit :
> >>
> >> On 10/06/2019 16:34, Gilles Sadowski wrote:
> >>> Hello.
> >>>
> >>> Le lun. 10 juin 2019 à 17:17, Alex Herbert <alex.d.herb...@gmail.com> a 
> >>> écrit :
> >>>> On 10/06/2019 15:31, Gilles Sadowski wrote:
> >>>>>>> P.S. Thinking of releasing 1.3?
> >>>>>> Not yet. I think there are a few outstanding items that work together
> >>>>>> for the multi-threaded focus of the new code and the new generators:
> >>>>> Sure but some of them could be postponed, if just to RERO.
> >>>>>
> >>>>>> - RNG-98: LongJumpable (easy)
> >>>>>>
> >>>>>> - RNG-102: SharedStateSampler (lots of easy work)
> >>>>>>
> >>>>>> - RNG-106: XorShiRo generators require non-zero input seeds
> >>>>>>
> >>>>>> (I'm still thinking about the best way to do this. The Jira ticket
> >>>>>> suggests a speed test to at least know the implications of different 
> >>>>>> ideas.)
> >>>>> This is only when using the "SeedFactory" (?).  [Otherwise, it's the
> >>>>> user's responsibility to construct an appropriate seed.]
> >>>>>
> >>>>> Couldn't we just check that the output of the internal generator is not
> >>>>> all zero (and call it again if it is)?
> >>>> Yes. The worse case scenario is a 1 in 2^64 collision rate with zero.
> >>>> All other generators have larger state sizes. So this would be fine. An
> >>>> alternative would be to set a single bit to non zero. This throws away 1
> >>>> bit of randomness from the seed and will always work without any
> >>>> recursion. But it makes the seed worse. The ideas are in the header for
> >>>> this Jira ticket:
> >>>>
> >>>> https://issues.apache.org/jira/browse/RNG-106
> >>>>
> >>>> I'll fix this soon.
> >>>>
> >>>> The other item I did not mention is outcome from RNG-104. This seems to
> >>>> indicate that using System.identityHashCode(new Object()) is not as good
> >>>> a mixer as a ThreadLocal random generator, both for speed and also
> >>>> quality. I'm currently testing Well44497b ^ SplitMix in BigCrush but I
> >>>> think this should replace the identity hash code method.
> >>> Didn't you also suggest to use XOR_SHIFT_1024_PHI (given the
> >>> large enough period, better speed score on BigCrush)?
> >> Yes. I'm still thinking about this. My initial calculations were based
> >> on the length of time it would take to sample all the seeds from the
> >> generator. Below we consider the number of possible seeds required and
> >> produced:
> >>
> >> The Well4497b seed size is 1391 of ints so there are (2^32)^1391
> >> possible seeds of random bits, or 2^1423.
> >>
> >> The XorShift1024 period is (2^1024 -1) of longs so there are (2^64)^1024
> >> random bits output before repeat (ideally as the period may be shorter).
> >> So the bit output is max 2^1088 before repeat.
> >>
> >> Assuming the output from XorShift1024 is truly random-per-bit it cannot
> >> output enough unique seeds to cover all those required by the Well44497b
> >> generator.
> >>
> >> But currently we seed using a maximum of 128 values and leave the rest
> >> to the self-seeding routine in the base generator. For a long array this
> >> is (2^64)^128 = 2^192, and only 2^160 for int arrays. So the
> >> XorShift1024 generator can output enough bits to create all the seeds
> >> that the SeedFactory currently produces.
> > I'd think (roughly) that's a good enough compromise for a
> > "convenience" routine. [Stringent requirements can be met
> > explicitly otherwise.]
>
> Sorry. I've done that wrong:
>
> (a^b)^c = a^(b^c) not a^(b+c)
>
> Well44497b seed size is 1391 of ints so there are 32*1391 (44512) bits
> in the seed, and so 2^44512 possible seeds.
>
> XorShift1024 period is (2^1024 -1) of longs so there are 64*2^1024
> random bits output before repeat, or 2^6 * 2^1024 = 2^1030.
>
> Not enough.
>
> The max seed size is long[128] which is 64*128 bits in the seed, and so
> 2^8192 possible seeds.
>
> Still not enough.
>
> So even though it would take 2^969 years to repeat the period of a
> XorShift1024 generator if sampled 10 billion time a second [1], it
> cannot produce every possible seed currently required.
>
> So this is a dilemma. Choose a generator that can theoretically output
> all the seeds required, even though you could never use them, or choose
> a faster generator that can still output more seeds than you could
> possibly use.

How about using both?
Generate the first element of the seed array (or the single "long" seed)
from Well44497b and the rest from XorShift1024?

Gilles

>
> [1]
> https://issues.apache.org/jira/browse/RNG-104?focusedCommentId=16857619&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16857619
>
> >
> >> So a switch to using XorShift1024 would satisfy current requirements
> >> and, given it passes BigCrush, would remove the requirement to mix the
> >> output with a second generator. (IIUC the purpose is to improve
> >> randomness of Well44497.)
> > Is it really necessary to have the utmost randomness for generating
> > seeds?  [It seems that the (rare) correlations of a seed used by some
> > RNG instance with a seed used by some other instance will be "diluted"
> > in the different sequences generated by the two instances.]
> >
> > Whatever, replacing with XorShift1024 will gain on that too, and
> > as you note, keep the SeedFactory simpler (no mix).
> >
> > Regards,
> > Gilles
> >
> >>> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to