[Bug libstdc++/66302] Wrong output sequence of double precision uniform C++ RNG distribution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66302

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed |Added
   --------------------------------
   Resolution      |FIXED   |INVALID

--- Comment #7 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Oops, wrong resolution.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66302

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed     |Added
   -------------------------------------
   Status          |UNCONFIRMED |RESOLVED
   Resolution      |---         |FIXED

--- Comment #6 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Not a bug then.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66302

--- Comment #5 from Andrey Kolesov <andrey.kolesov at intel dot com> ---
OK, we understand your points. There are clearly two approaches:

1) provide the maximum number of random bits in all precisions, but do not
   preserve sequences across precisions;
2) provide a reasonable number of random bits, and preserve sequences across
   all precisions.

Both approaches have their own customers. The first is applicable to
fine-grained accuracy test suites, for example. The second is more suited to
Monte Carlo simulations, finance, general data analytics, etc. Our team chose
the second approach when designing MKL VSL, since we have many requests and
strict requirements from important customers (mostly FSI) to generate the same
sequence on all CPUs and in all precisions. Moreover, a number of them simply
refused solutions where the random value sequence could differ between systems
or between precision environments. That is a really important feature.

Regarding accuracy: our experience from customer communication says that ~32
random mantissa bits is quite enough for most statistical applications. In
that case the difference between the rounded and exact random value is about
10^(-8). During Monte Carlo simulations the generated random values are
transformed by various math operations, and most evaluated parameters have a
1/sqrt(N) statistical error, where N is the number of generated random values.
With N = 10^10 the simulation accuracy is about 10^(-5), so we would not even
see the ~10^(-8) generator error. This means the extra accuracy of a double
precision generator is almost useless for such applications. At the same time,
we understand that customers sometimes need the full accuracy of high
precision generators; for that case MKL provides uniform bits generator
versions with a raw output type, for the customer's own scaling.
Ideally it would be nice to parametrize the distributions by accuracy to
satisfy different customer needs, but that would require an update to the
standard. As for this issue: we agree to close it. Thanks to all.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66302

--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Andrey Kolesov from comment #0)
> The double precision uniform distribution of the C++ random number
> generators from libstdc++ produces a sequence which is significantly
> different from the floating point and integer (direct engine) generators.
> The double precision sequence contains only every second (odd: 1,3,5,7...)
> element of the float and integer sequences. Generally, generator output
> shouldn't depend on the output data type, up to precision bounds.

Where does it say that in the standard? Your code says:

  /* All three sequences expected to be equal up to precision bounds */

Where does the standard say you should expect that?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66302

--- Comment #2 from Andrey Kolesov <andrey.kolesov at intel dot com> ---
(In reply to Jonathan Wakely from comment #1)
> (In reply to Andrey Kolesov from comment #0)
> > The double precision uniform distribution of the C++ random number
> > generators from libstdc++ produces a sequence which is significantly
> > different from the floating point and integer (direct engine) generators.
> > The double precision sequence contains only every second (odd: 1,3,5,7...)
> > element of the float and integer sequences. Generally, generator output
> > shouldn't depend on the output data type, up to precision bounds.
>
> Where does it say that in the standard? Your code says:
>
>   /* All three sequences expected to be equal up to precision bounds */
>
> Where does the standard say you should expect that?

Right, the C++ standard says that the algorithms for producing each of the
specified distributions are implementation-defined (25.8.1.3). The standard
has strict requirements on the engines, which must satisfy fixed recurrences
(for example, for the rand0 LCG: x[i+1] = (a*x[i] + c) mod m), but not on the
distributions built on top of those engines. Formally it is not a bug, I
agree; you may close the issue.

From the perspective of a data scientist or an analytic application developer,
however, the way the double precision output of the uniform distribution
generator is produced is questionable. Consider the following scenario: a data
scientist designs a stochastic model and uses an RNG for model-based Monte
Carlo simulations. To tune the parameters of the model, he or she fixes a seed
and, say, a single precision random number sequence. During tuning, the
researcher realizes that single precision is not sufficient for the modeling
goals and needs to switch to a double precision sequence produced with the
same RNG and seed. With the C++ RNGs, however, switching to double precision
will result in different values of the parameters.

You can imagine the amount of effort necessary to understand what went wrong
with the model, the tuning, and the simulations. Pseudo-random generators are,
after all, deterministic algorithms (much like other math functions such as
sin and exp) which produce sequences that merely look random. But (float)sin(x1)
is always equal to (double)sin(x1) up to precision. We can expect the same
behavior from RNGs, though the standard doesn't guarantee it.

Our team is responsible for the statistical features, including the random
number generators, in Intel(R) Math Kernel Library. Intel(R) MKL RNGs were
designed with multiple requirements in mind, including that the double and
single precision versions of the same distribution, relying on the same
algorithm and a fixed seed, produce sequences that agree up to precision.

Does it make sense? And does it make sense to approach the C++ Standard WG to
get their perspective and understand whether this specific behavior of the
generators should be clearly described in the standard?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66302

--- Comment #3 from Marc Glisse <glisse at gcc dot gnu.org> ---
> Does it make sense?

So you expect the random generator for float to throw away half of the random
bits it gets from the engine, just for this questionable benefit? And actually
75%, so that it also matches __float128? That seems wrong to me.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66302

--- Comment #4 from Jonathan Wakely <redi at gcc dot gnu.org> ---
I'll just note that the libc++ implementation has the same behaviour. The
precise numbers are different (probably due to a slightly different
implementation of uniform_real_distribution) but the pattern seen when
comparing float and double output is the same.