On Tue, Feb 15, 2022 at 19:03, Tim Düsterhus <t...@bastelstu.be> wrote:
> Hi
>
> On 2/15/22 04:58, Go Kudo wrote:
> >> Regarding "unintuitive": I disagree. I find it unintuitive that there are
> >> some RNG sequences that I can't access when providing a seed.
> >
> > This is also the case for RNG implementations in many other languages. For
> > example, Java also uses a long (64-bit) as the seed argument for
> > java.util.Random.
> >
> > https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Random.html#%3Cinit%3E(long)
>
> java.util.Random is an LCG with only 48 bits of state. A single 64-bit
> signed long is sufficient to represent the state.
>
> > On the other hand, some languages give access to the complete internal
> > state. Python, for example, accepts bytes or bytearrays.
> >
> > https://docs.python.org/3/library/random.html#random.seed
> >
> > However, accepting strings in PHP may lead to incorrect usage.
> >
> > I think we can do this safely by making the seed argument accept both int
> > and string, and only using it as the internal state if a string is
> > specified and it is 128 bits long.
>
> That's a solution that would work for me.
>
> >> 1. Would you expect those two 'var_dump' calls to result in the same
> >> output?
> >
> > I added support for the __debugInfo() magic method.
> >
> > https://github.com/php/php-src/pull/8094/commits/78efd2bd1e0ac5db48c272b364a615a5611e8caa
>
> Don't forget to update the RFC accordingly. It would probably be helpful
> if you put the full class stubs into the RFC. I find that easier to
> understand than a list of methods.
>
> >> generate() should return raw bytes instead of a number (as I suggested
> >> before).
> >
> > I don't think this is a very good idea.
> >
> > The RNG is a random number generator and should probably not be
> > generating strings.
>
> I'd say that the 'number' part in RNG is not technically accurate. All
> RNGs are effectively generators for a random sequence of bits.
> The 'number' part is just an interpretation of that random sequence of
> bits (e.g. 64 of them).
>
> > Of course, I am aware that strings represent binary sequences in PHP.
> > However, this is not user-friendly.
> >
> > Generating a binary string is a barrier when trying to implement some
> > kind of operation using numeric computation.
>
> I believe the average user of the RNG API would use the Randomizer
> class instead of the raw generators, so they would never come in
> contact with the raw bytes coming from the generator.
>
> However, getting PHP integers out of the generator makes it much harder
> for me to process the raw bits and bytes, if that's something I need for
> my use case.
>
> As an example, suppose I want to implement the following in userland.
> With raw bytes:
> - For Randomizer::getBytes() I can just concatenate the raw bytes.
> - For a random uint16BE I can grab 2 bytes and call unpack('n', $bytes).
>
> With random 64-bit integers:
> - For Randomizer::getBytes() I need to use pack(), and I'm not even sure
>   whether I need 'q', 'Q', 'J' or 'P' to get an unbiased result.
> - For a uint16BE I can use "& 0xFFFF", but that wastes 48 bits unless I
>   also perform bit shifting to access the other bytes. But then there's
>   the same signedness issue again.
>
> Interpreting numbers as bytes and vice versa is very easy in C / C++.
> In PHP userland, however, I believe the bytes -> numbers direction is
> easy-ish, while the numbers -> bytes direction is full of edge cases.
>
> > If you want to deal with the problem of the generated size, it would be
> > more appropriate to define a method such as getGenerateSize() in the
> > interface. Even then, generation widths greater than PHP_INT_SIZE cannot
> > be supported, but widths greater than 64 bits are not very useful in the
> > first place.
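Tim's two userland recipes can be sketched in PHP. This is a minimal illustration only, assuming a 64-bit build (PHP_INT_SIZE === 8) and modeling a generator as a plain callable; the function names are hypothetical and are not part of the RFC's API.

```php
<?php
// Minimal sketch; assumes 64-bit PHP (PHP_INT_SIZE === 8).
// Generators are modeled as plain callables; all function names are hypothetical.

// Byte-returning generator: getBytes() is plain concatenation.
function bytesFromByteGenerator(callable $generate, int $length): string
{
    $out = '';
    while (strlen($out) < $length) {
        $out .= $generate(); // each call yields a raw binary string
    }
    return substr($out, 0, $length);
}

// Byte-returning generator: a uint16BE is just two bytes through unpack('n').
function uint16BEFromBytes(string $twoBytes): int
{
    return unpack('n', $twoBytes)[1];
}

// Int-returning generator: every value has to be converted with pack().
function bytesFromIntGenerator(callable $generate, int $length): string
{
    $out = '';
    while (strlen($out) < $length) {
        // 'J' = 64-bit unsigned, big-endian. pack() writes the raw two's
        // complement bits, so nothing is lost for negative ints, but the
        // caller still has to pick the format code and byte order deliberately.
        $out .= pack('J', $generate());
    }
    return substr($out, 0, $length);
}

// Int-returning generator: all four 16-bit words of one value, high to low.
function uint16sFromInt(int $n): array
{
    $words = [];
    for ($shift = 48; $shift >= 0; $shift -= 16) {
        // >> is an arithmetic shift, so for negative $n the vacated high bits
        // are copies of the sign bit; the 0xFFFF mask discards them.
        $words[] = ($n >> $shift) & 0xFFFF;
    }
    return $words;
}
```

The byte-oriented helpers need no format codes at all. The int-oriented ones are also correct, but only because the format code, the byte order and the sign mask were each chosen with care, which is the kind of edge case Tim is pointing at.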
> >
> >> The 'Randomizer' object should buffer unused bytes internally and only
> >> call generate() if the internal buffer is drained.
> >
> > Likewise, I think this is not a good idea. Buffering reintroduces the
> > problem of complex state management, which the current design has made
> > so simple. The user would always have to worry about the buffer size of
> > the Randomizer.
>
> Unfortunately you did not answer the primary question. The ones you
> answered were just follow-up conclusions from the answer I would give:
>
> var_dump(\bin2hex($r1->getBytes(8)));
> var_dump(\bin2hex($r2->getBytes(4)) . \bin2hex($r2->getBytes(4)));
>
> As a user: Would you expect those two 'var_dump' calls to result in the
> same output?
>
> >> Why xorshift instead of xoshiro / xoroshiro?
> >
> > The XorShift128Plus algorithm is still in use in major browsers and is
> > mature and battle-tested.
>
> I believe that the underlying RNG in web browsers is considered an
> implementation detail, no?
>
> For PHP this would be part of the API surface and would need to be
> maintained indefinitely. Surely it would make sense to use the latest
> and greatest RNG instead of something that is already outdated when it
> first ships, no?
>
> > Also, in our local testing, SplitMix64 + XorShift128Plus performed well
> > in terms of both performance and random number quality, so I don't think
> > it is necessary to choose a different algorithm.
> >
> > If this RFC passes, it will be easier to add algorithms in the future.
> > If a new algorithm is needed, it can be implemented immediately.
>
> Best regards
> Tim Düsterhus
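The var_dump question above can be made concrete with a toy engine. This sketch assumes an engine that returns exactly 8 bytes per generate() call; UnbufferedRandomizer, BufferedRandomizer and counterEngine() are hypothetical names for illustration, not the classes proposed in the RFC.

```php
<?php
// Hypothetical sketch; class and function names are illustrative, not the RFC's API.
// An "engine" is any callable that returns a fixed-size raw binary string per call.

final class UnbufferedRandomizer
{
    /** @var callable */
    private $generate;

    public function __construct(callable $generate) { $this->generate = $generate; }

    public function getBytes(int $length): string
    {
        $out = '';
        while (strlen($out) < $length) {
            $out .= ($this->generate)();
        }
        // Surplus bytes from the last generate() call are discarded.
        return substr($out, 0, $length);
    }
}

final class BufferedRandomizer
{
    /** @var callable */
    private $generate;
    private string $buffer = '';

    public function __construct(callable $generate) { $this->generate = $generate; }

    public function getBytes(int $length): string
    {
        // Only call generate() once the leftover bytes are exhausted.
        while (strlen($this->buffer) < $length) {
            $this->buffer .= ($this->generate)();
        }
        $out = substr($this->buffer, 0, $length);
        $this->buffer = substr($this->buffer, $length); // keep the unused tail
        return $out;
    }
}

// Deterministic stand-in for an engine: 8 bytes per call, counting up from 0x00.
function counterEngine(): callable
{
    $state = 0;
    return function () use (&$state): string {
        $block = '';
        for ($i = 0; $i < 8; $i++) {
            $block .= chr($state++ & 0xFF);
        }
        return $block;
    };
}

$r1 = new UnbufferedRandomizer(counterEngine());
$r2 = new UnbufferedRandomizer(counterEngine());
var_dump(bin2hex($r1->getBytes(8)));                             // string(16) "0001020304050607"
var_dump(bin2hex($r2->getBytes(4)) . bin2hex($r2->getBytes(4))); // string(16) "0001020308090a0b"

$b1 = new BufferedRandomizer(counterEngine());
$b2 = new BufferedRandomizer(counterEngine());
var_dump(bin2hex($b1->getBytes(8)));                             // string(16) "0001020304050607"
var_dump(bin2hex($b2->getBytes(4)) . bin2hex($b2->getBytes(4))); // string(16) "0001020304050607"
```

Without a buffer, the surplus bytes of every generate() call are thrown away, so the result of getBytes() depends on how a request is split. The buffered variant removes that dependence at the cost of one extra piece of state, which is the trade-off Go Kudo objects to.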
> java.util.Random is an LCG with only 48 bits of state. A single 64-bit
> signed long is sufficient to represent the state.

Sorry about that. Java is not affected by this problem.

First, I have updated the RFC to reflect the latest status:
https://wiki.php.net/rfc/rng_extension

I need some time to think about the current issue. I understand the usefulness, but I am uncomfortable with the NumberGenerator generating a string.

I am also unsure about switching the RNG away from XorShift128Plus. There are a number of derived implementations; which RNG do you think is most suitable?

Regards,
Go Kudo