On Tue, Feb 15, 2022 at 19:03, Tim Düsterhus <t...@bastelstu.be> wrote:

> Hi
>
> On 2/15/22 04:58, Go Kudo wrote:
> >> Regarding "unintuitive": I disagree. I find it unintuitive that there are
> >> some RNG sequences that I can't access when providing a seed.
> >
> > This is also the case for RNG implementations in many other languages. For
> > example, Java also uses long (64-bit) as the seed value of the argument for
> > Math.
> >
> > https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Random.html#%3Cinit%3E(long)
>
> java.util.Random is a LCG with only 48 Bits of state. A single 64-bit
> signed long is sufficient to represent the state.
>
> > On the other hand, some languages have access to the complete internal
> > state. Python, for example, accepts bytes or bytearrays.
> >
> > https://docs.python.org/3/library/random.html#random.seed
> >
> > However, making strings available in PHP may lead to incorrect usage.
> >
> > I think we can safely do this by making the seed argument accept both int
> > and string, and only using it as the internal state if string is specified
> > and it's 128-bits long.
>
> That's a solution that would work for me.
>
> >> 1. Would you expect those two 'var_dump' calls to result in the same
> >> output?
> >
> > Added support for the __debugInfo() magic method.
> >
> > https://github.com/php/php-src/pull/8094/commits/78efd2bd1e0ac5db48c272b364a615a5611e8caa
>
> Don't forget to update the RFC accordingly. It would probably be helpful
> if you would put the full class stubs into the RFC. I find that easier
> to understand than a list of methods.
>
> >> generate() should return raw bytes instead of a number (as I suggested
> >> before).
> >
> > I don't think this is a very good idea.
> >
> > The RNG is a random number generator and should probably not be generating
> > strings.
>
> I'd say that the 'number' part in RNG is not technically accurate. All
> RNGs are effectively generators for a random sequence of bits. The
> number part is just an interpretation of that random sequence of bits
> (e.g. 64 of them).
>
> > Of course, I am aware that strings represent binary sequences in PHP.
> > However, this is not user-friendly.
> >
> > The generation of a binary string is a barrier when trying to implement
> > some kind of operation using numeric computation.
>
> I believe the average user of the RNG API would use the Randomizer
> class, instead of the raw generators, thus they would not come in
> contact with the raw bytes coming from the generator.
>
> However by getting PHP integers out of the generator it is much harder
> for me to process the raw bits and bytes, if that's something I need for
> my use case.
>
> As an example, suppose I want to implement the following in userland.
> When getting raw bytes:
> - For Randomizer::getBytes() I can just concatenate the raw bytes.
> - For a random uint16BE I can grab 2 bytes and call unpack('n', $bytes)
>
> If I get random 64 Bit integers then:
> - For Randomizer::getBytes() I need to use pack() and I'm not even sure
> whether I need to use 'q', 'Q', 'J', or 'P' to receive an unbiased result.
> - For uint16BE I can use "& 0xFFFF", but would waste 48 Bits, unless I
> also perform bit shifting to access the other bytes. But then there's
> also the same signedness issue.
>
> Interpreting numbers as bytes and vice versa in C / C++ is very easy.
> However in PHP userland I believe the bytes -> numbers direction is
> easy-ish. The numbers -> bytes direction is full of edge cases.
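
The two directions discussed above can be sketched in userland roughly as
follows. This is only an illustration, assuming a 64-bit PHP build; the
function names are hypothetical and not part of the RFC or of php-src.

```php
<?php
declare(strict_types=1);

// Direction 1: raw bytes -> number.
// Reads exactly two bytes as a big-endian unsigned 16-bit integer.
function uint16be_from_bytes(string $bytes): int
{
    // unpack() returns a 1-indexed array for unnamed format elements.
    return unpack('n', $bytes)[1];
}

// Direction 2: 64-bit integer -> raw bytes.
// 'P' writes 64 bits in little-endian order. For a negative PHP int the
// two's-complement bit pattern is written out unchanged.
function bytes_from_int64(int $value): string
{
    return pack('P', $value);
}

// Splits one 64-bit integer into four 16-bit values so no bits are wasted.
// PHP's >> is an arithmetic shift, so every result is masked to drop the
// sign-extension bits that appear for negative inputs.
function uint16s_from_int64(int $value): array
{
    $out = [];
    for ($shift = 0; $shift < 64; $shift += 16) {
        $out[] = ($value >> $shift) & 0xFFFF;
    }
    return $out;
}
```

For instance, uint16be_from_bytes("\x12\x34") yields 4660 (0x1234), while
uint16s_from_int64(-1) yields four times 65535 only because of the mask;
without it each shifted value would still be -1.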
>
> > If you want to deal with the problem of generated size, it would be more
> > appropriate to define a method such as getGenerateSize() in the interface.
> > Even in this case, generation widths greater than PHP_INT_SIZE cannot be
> > supported, but generation widths greater than 64-bit are not very useful in
> > the first place.
> >
> >> The 'Randomizer' object should buffer unused bytes internally and only
> >> call generate() if the internal buffer is drained.
> >
> > Likewise, I think this is not a good idea. Buffering reintroduces the
> > problem of complex state management, which has been made so easy. The user
> > will always have to worry about the buffering size of the Randomizer.
>
> Unfortunately you did not answer the primary question. The ones you
> answered were just follow-up conclusions from the answer I would give:
>
>       var_dump(\bin2hex($r1->getBytes(8)));
>       var_dump(\bin2hex($r2->getBytes(4)) . \bin2hex($r2->getBytes(4)));
>
> As a user: Would you expect those two 'var_dump' calls to result in the
> same output?
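
To make the question concrete, here is a minimal sketch of how the answer
depends on buffering. Everything in it is hypothetical: ToyGenerator is a
plain counter standing in for a generator that emits 8 bytes per call, and
the two randomizer classes are illustrations, not the classes proposed in
the RFC. It assumes PHP 8.0+ on a 64-bit build.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a generator: emits 8 deterministic bytes per
// call. It is a counter, not a real RNG, so the output is easy to follow.
final class ToyGenerator
{
    private int $counter = 0;

    public function generate(): string
    {
        return pack('P', ++$this->counter);
    }
}

// Throws away whatever part of a generate() result it does not need.
final class UnbufferedRandomizer
{
    public function __construct(private ToyGenerator $generator) {}

    public function getBytes(int $length): string
    {
        $result = '';
        while (strlen($result) < $length) {
            $result .= $this->generator->generate();
        }
        return substr($result, 0, $length);
    }
}

// Keeps leftover bytes and serves them before calling generate() again.
final class BufferedRandomizer
{
    private string $buffer = '';

    public function __construct(private ToyGenerator $generator) {}

    public function getBytes(int $length): string
    {
        while (strlen($this->buffer) < $length) {
            $this->buffer .= $this->generator->generate();
        }
        $result = substr($this->buffer, 0, $length);
        $this->buffer = substr($this->buffer, $length);
        return $result;
    }
}

// Same comparison as the two var_dump calls, for each variant.
$u1 = new UnbufferedRandomizer(new ToyGenerator());
$u2 = new UnbufferedRandomizer(new ToyGenerator());
var_dump(bin2hex($u1->getBytes(8)) === (bin2hex($u2->getBytes(4)) . bin2hex($u2->getBytes(4))));

$b1 = new BufferedRandomizer(new ToyGenerator());
$b2 = new BufferedRandomizer(new ToyGenerator());
var_dump(bin2hex($b1->getBytes(8)) === (bin2hex($b2->getBytes(4)) . bin2hex($b2->getBytes(4))));
```

With this toy generator the unbuffered comparison is false, because the
second getBytes(4) call consumes a fresh generate() result, and the
buffered comparison is true.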
>
> >> Why xorshift instead of xoshiro / xoroshiro?
> >
> > The XorShift128Plus algorithm is still in use in major browsers and is dead
> > in a good way.
>
> I believe that the underlying RNG in web browsers is considered an
> implementation detail, no?
>
> For PHP this would be part of the API surface and would need to be
> maintained indefinitely. Certainly it would make sense to use the latest
> and greatest RNG, instead of something that is outdated when it's first
> shipped, no?
>
> > Also, in our local testing, SplitMix64 + XorShift128Plus performed well in
> > terms of performance and random number quality, so I don't think it is
> > necessary to choose a different algorithm.
> >
> > If this RFC passes, it will be easier to add algorithms in the future. If a
> > new algorithm is needed, it can be implemented immediately.
>
> Best regards
> Tim Düsterhus
>

> java.util.Random is a LCG with only 48 Bits of state. A single 64-bit
> signed long is sufficient to represent the state.

Sorry about that. Java was not affected by this problem.

First, I have updated the RFC to reflect the latest state.

https://wiki.php.net/rfc/rng_extension

I need some time to think about the current issue. I understand its
usefulness, but I feel uncomfortable with the fact that the NumberGenerator
generates a string.

I also wonder about the point of changing the RNG from XorShift128Plus. There
are a number of derived implementations; which RNG do you think is more
suitable?

Regards,
Go Kudo
