[Bug libstdc++/66302] Wrong output sequence of double precision uniform C++ RNG distribution

2015-05-28 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66302

Jonathan Wakely redi at gcc dot gnu.org changed:

           What    |Removed     |Added
----------------------------------------------------------------------------
     Resolution    |FIXED       |INVALID

--- Comment #7 from Jonathan Wakely redi at gcc dot gnu.org ---
Oops, wrong resolution.


[Bug libstdc++/66302] Wrong output sequence of double precision uniform C++ RNG distribution

2015-05-28 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66302

Jonathan Wakely redi at gcc dot gnu.org changed:

           What    |Removed     |Added
----------------------------------------------------------------------------
         Status    |UNCONFIRMED |RESOLVED
     Resolution    |---         |FIXED

--- Comment #6 from Jonathan Wakely redi at gcc dot gnu.org ---
Not a bug then.


[Bug libstdc++/66302] Wrong output sequence of double precision uniform C++ RNG distribution

2015-05-28 Thread andrey.kolesov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66302

--- Comment #5 from Andrey Kolesov andrey.kolesov at intel dot com ---
OK, we understand your points. There are two possible approaches:
1) provide the maximum number of random bits at every precision, without
preserving sequences across precisions;
2) provide a reasonable number of random bits while preserving the same
sequence across all precisions.

Both approaches have their own customers. The first suits fine-grained
accuracy test suites, for example. The second is better for Monte Carlo
simulations, finance, general data analytics, etc.
Our team chose the second approach when designing MKL VSL, since we receive
many requests and strict requirements from important customers (mostly FSI)
to generate the same sequence on all CPUs and at all precisions. A number of
them have outright rejected solutions in which the random value sequence
could differ between systems or between precision environments. That is a
really important feature.
As for accuracy: our experience from customer communication is that ~32
random mantissa bits are quite enough for most statistical applications. In
that case the difference between the rounded and the exact random value is
about 10^(-8). During Monte Carlo simulations the generated random values
are transformed by various math operations, and the parameter being
estimated typically carries a statistical error of 1/sqrt(N), where N is the
number of generated random values. With N = 10^10 the simulation accuracy is
about 10^(-5), so the ~10^(-8) generator error is not even visible. This
means that the extra accuracy of a double precision generator is almost
useless for such applications.
At the same time, we understand that customers sometimes need the full
accuracy of high precision generators. For that case MKL provides
uniform-bits generator versions with a raw output type, so customers can do
their own scaling. Ideally it would be nice to parametrize the distributions
by accuracy to satisfy different customer needs, but that would require an
update to the standard.
As for this issue: we agree to close it.
Thanks to all.
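
As an illustration of approach 2 (a hypothetical sketch, not MKL VSL code):
a distribution that always consumes exactly one 32-bit engine word per
variate, so float and double callers see the same sequence up to rounding.

    #include <cstdint>
    #include <iostream>
    #include <random>

    // Sketch: one 32-bit draw per variate regardless of precision, so the
    // double sequence equals the float sequence up to float rounding.
    template <typename Real, typename Engine>
    Real uniform01_fixed_bits(Engine& g)
    {
        std::uint32_t w = static_cast<std::uint32_t>(g());
        return static_cast<Real>(w) / 4294967296.0;  // w / 2^32, in [0, 1)
    }

    int main()
    {
        std::mt19937 gf(42), gd(42);
        for (int i = 0; i < 5; ++i)
            std::cout << uniform01_fixed_bits<float>(gf) << '\t'
                      << uniform01_fixed_bits<double>(gd) << '\n';
    }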


[Bug libstdc++/66302] Wrong output sequence of double precision uniform C++ RNG distribution

2015-05-27 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66302

--- Comment #1 from Jonathan Wakely redi at gcc dot gnu.org ---
(In reply to Andrey Kolesov from comment #0)
> Double precision uniform distribution of C++ random number generators from
> libstdc++ produces a sequence which is significantly different from the
> floating point and integer (direct engine) ones.
> The double precision sequence contains only every second (odd: 1,3,5,7...)
> element of the float and integer sequences. Generally, generator output
> shouldn't depend on the output data type, up to precision bounds.

Where does it say that in the standard?

Your code says:

/* All three sequences expected to be equal up to precision bounds */

Where does the standard say you should expect that?


[Bug libstdc++/66302] Wrong output sequence of double precision uniform C++ RNG distribution

2015-05-27 Thread andrey.kolesov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66302

--- Comment #2 from Andrey Kolesov andrey.kolesov at intel dot com ---
(In reply to Jonathan Wakely from comment #1)
> (In reply to Andrey Kolesov from comment #0)
> > Double precision uniform distribution of C++ random number generators
> > from libstdc++ produces a sequence which is significantly different from
> > the floating point and integer (direct engine) ones.
> > The double precision sequence contains only every second (odd:
> > 1,3,5,7...) element of the float and integer sequences. Generally,
> > generator output shouldn't depend on the output data type, up to
> > precision bounds.
>
> Where does it say that in the standard?
>
> Your code says:
>
> /* All three sequences expected to be equal up to precision bounds */
>
> Where does the standard say you should expect that?

Right, the C++ standard says that "The algorithms for producing each of the
specified distributions are implementation-defined" (25.8.1.3).
The standard has strict requirements that the engines satisfy their defining
recurrences (for example, for the minstd_rand0 LCG: x[i+1] = (a * x[i] + c)
mod m), but not for the distributions built on top of these engines.
Formally it is not a bug, I agree; you may close the issue.
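
As a quick check of that engine requirement (a sketch, assuming the
standard's parameters for std::minstd_rand0, i.e. a = 16807, c = 0,
m = 2^31 - 1, default seed 1):

    #include <cstdint>
    #include <iostream>
    #include <random>

    int main()
    {
        std::minstd_rand0 eng;  // specified as x[i+1] = 16807*x[i] mod (2^31 - 1)
        std::uint64_t x = 1;    // default seed
        for (int i = 0; i < 10; ++i) {
            x = (16807 * x) % 2147483647ULL;  // the LCG recurrence by hand
            std::uint64_t e = eng();          // the library engine
            std::cout << x << '\t' << e
                      << (x == e ? "\tmatch" : "\tMISMATCH") << '\n';
        }
    }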

From the perspective of a data scientist or an analytics application
developer, however, the way the double precision output of the uniform
distribution is produced is questionable.
Consider the following scenario: a data scientist designs a stochastic model
and uses an RNG for model-based Monte Carlo simulations.
To tune the parameters of the model, he or she fixes a seed and, say, a
single precision random number sequence.
During tuning, the researcher finds that single precision is not sufficient
for the modeling goals and needs to switch to the double precision sequence
produced with the same RNG and seed.
However, switching to double precision with the C++ RNGs results in
different parameter values. You can imagine the amount of effort needed to
understand what went wrong with the model, the tuning, and the simulations.

Pseudorandom generators are deterministic algorithms (much like other math
functions: sin, exp, ...) that produce sequences which merely look random.
But (float)sin(x1) is always equal to (double)sin(x1) up to precision, and
one could expect the same behavior from RNGs, though the standard doesn't
guarantee it.
Our team is responsible for the statistical features of the Intel(R) Math
Kernel Library, including its random number generators. Intel(R) MKL RNGs
were designed with multiple requirements in mind, including that the double
and single precision versions of the same distribution, using the same
algorithm and a fixed seed, produce sequences that agree up to precision.

Does it make sense?
Does it make sense to approach the C++ standard working group for their
perspective on whether this behavior of the generators should be clearly
described in the standard?
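
(A minimal reproduction of the divergence under discussion; the exact values
are implementation-specific per 25.8.1.3, and with libstdc++ each double
variate consumes two 32-bit mt19937 words, of which the second dominates,
consistent with the every-second-element pattern reported above:)

    #include <iomanip>
    #include <iostream>
    #include <random>

    int main()
    {
        // Two identically seeded engines; only the requested precision differs.
        std::mt19937 gf(777), gd(777);
        std::uniform_real_distribution<float>  df(0.0f, 1.0f);
        std::uniform_real_distribution<double> dd(0.0, 1.0);

        std::cout << std::setprecision(10);
        for (int i = 0; i < 8; ++i)
            std::cout << "float[" << i << "]  = " << df(gf) << '\n';
        for (int i = 0; i < 4; ++i)
            std::cout << "double[" << i << "] = " << dd(gd) << '\n';
        // Observed here: double[i] tracks float[2*i + 1] up to precision.
    }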

[Bug libstdc++/66302] Wrong output sequence of double precision uniform C++ RNG distribution

2015-05-27 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66302

--- Comment #3 from Marc Glisse glisse at gcc dot gnu.org ---
> Does it make sense?

So you expect the random generator for float to throw away half of the
random bits it gets from the engine, just for this questionable benefit? And
actually 75%, so that it also matches __float128? That seems wrong to me.
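
(The percentages follow from how many engine calls generate_canonical makes
per variate: per [rand.util.canonical], k = max(1, ceil(b / log2(R))), where
b is the number of mantissa bits requested and R is the engine's range. A
sketch of that arithmetic, assuming a 32-bit engine such as std::mt19937:)

    #include <algorithm>
    #include <cmath>
    #include <iostream>
    #include <limits>

    // Engine calls per variate for a 32-bit engine (log2(R) = 32).
    int calls_per_variate(int mantissa_bits)
    {
        return std::max(1, (int)std::ceil(mantissa_bits / 32.0));
    }

    int main()
    {
        std::cout
          << "float:      " << calls_per_variate(std::numeric_limits<float>::digits)  << '\n' // 24 -> 1
          << "double:     " << calls_per_variate(std::numeric_limits<double>::digits) << '\n' // 53 -> 2
          << "__float128: " << calls_per_variate(113) << '\n';                                // 113 -> 4
    }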


[Bug libstdc++/66302] Wrong output sequence of double precision uniform C++ RNG distribution

2015-05-27 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66302

--- Comment #4 from Jonathan Wakely redi at gcc dot gnu.org ---
I'll just note that the libc++ implementation has the same behaviour. The
precise numbers are different (probably due to a slightly different
implementation of uniform_real_distribution) but the pattern seen when
comparing float and double output is the same.