Re: doc: Mention rand and srand limitations

2023-11-11 Thread Bruno Haible
Jeffrey Walton wrote:
> > The attached program tests the MT-safety. Results: OK on all platforms, 
> > except
> > ...
> 
> If rand() and friends do not need to be MT-safe, then does it even need a 
> test?

I wrote the test (but did not commit it into gnulib) because rand() is MT-safe
on GNU. If I made the incorrect assumption that it then would be MT-safe on all
platforms, other people may make the same assumption. Therefore it's worth
documenting the issue: it's a portability issue.

> I think the two tests of interest are:
> 
> (1) srand() produces a repeatable stream

I posted a test case for the Cygwin people here:
  
Since multithreaded use of rand() is better avoided (and random() used
instead), I'm not adding a workaround to this Cygwin bug to Gnulib.

> (2) rand() and friends produce a stream that is uniformly distributed

It is sufficiently well-known that rand() has a poor quality. ISO C 23
has an explicit reminder:
  "Recommended practice
   There are no guarantees as to the quality of the random sequence
   produced and some implementations are known to produce sequences
   with distressingly non-random low-order bits. Applications with
   particular requirements should use a generator that is known to be
   sufficient for their needs."

What I remember from earlier reading (decades ago) is that the two or three
lowest bits of the result of rand() are not equidistributed and therefore
should better be discarded; which is what I'm doing in the Gnulib code.

> For (2), compressibility is a poor man's randomness test.

I don't think good compressibility is equivalent to poor quality as a
random sequence. Good compressibility certainly implies poor quality.
But I conjecture that the opposite inference is not true: That you can
have poor quality pseudo-random generators that don't compress well.
Poor quality of pseudo-random numbers can mean
  - a small cycle length, or
  - not well equi-distributed.
Knuth [1] gives examples of pseudo-random number generators which look
good when you look at their distribution in 2 dimensions, but look bad
when considered in 3 dimensions, IIRC.

> You could also use Maurer's Universal Statistical Test for Random Bit
> Generators from Journal of Cryptology, 1992. Maurer's test is a test
> for physical generators. But a quality pseudo random number generator
> will follow a uniform distribution, and it should be indistinguishable
> from a physical generator. I know they are indistinguishable for a
> cryptographically secure pseudo random number generator.

Such a test would be useful for someone who needs good-quality pseudo-
random numbers _and_ uses an unknown implementation. But
  - rand() is known to be poor-quality,
  - *rand48 has a fixed implementation [2], therefore it's pointless
to want to test its quality.

Bruno

[1] Donald Knuth: The Art Of Computer Programming, vol II:
Seminumerical Algorithms
[2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/drand48.html






Re: doc: Mention rand and srand limitations

2023-11-10 Thread Bruno Haible
I wrote:
> Results: OK on all platforms, except
> 
> FreeBSD 13.2  Expected value #7504 not found in multithreaded results. (4 
> failures among 100 runs)
> macOS 12.5Expected value #31 not found in multithreaded results.
> Solaris 10Expected value #95981 not found in multithreaded results.
> Solaris 11.4  Expected value #55372 not found in multithreaded results.
> OpenBSD 7.4   rand() is non-deterministic.
> Cygwin 3.4.6  rand() is non-deterministic.

And also:

musl libc 3.18  Expected value #13730 not found in multithreaded results. (3 
failures among 100 runs)
NetBSD 9.3  Expected value #751169 not found in multithreaded results. (1 
failure among 1000 runs)
AIX 7.1 Expected value #952507 not found in multithreaded results. (96 
failures among 100 runs)

Bruno






Re: doc: Mention rand and srand limitations

2023-11-10 Thread Jeffrey Walton
Hi Bruno,

On Fri, Nov 10, 2023 at 2:03 PM Bruno Haible  wrote:>
> rand() is not required to be multithread-safe. So, it's not a bug when rand()
> is not MT-safe on some platforms; it's merely a portability problem worth
> documenting.
>
> The attached program tests the MT-safety. Results: OK on all platforms, except
>
> FreeBSD 13.2  Expected value #7504 not found in multithreaded results. (4 
> failures among 100 runs)
> macOS 12.5Expected value #31 not found in multithreaded results.
> Solaris 10Expected value #95981 not found in multithreaded results.
> Solaris 11.4  Expected value #55372 not found in multithreaded results.
> OpenBSD 7.4   rand() is non-deterministic.
> Cygwin 3.4.6  rand() is non-deterministic.
>
> The latter statement regarding Cygwin is not correct. Looking into the Cygwin
> source code, it's clear that the problem is that srand() has no effect on
> other thread - which is a bug since it violates ISO C 23.
>
> On Haiku, rand() is MT-safe while random() is not.
>
>
> 2023-11-10  Bruno Haible  
>
> doc: Mention rand and srand limitations.
> * doc/posix-functions/rand.texi: Mention multithread-safety problem.
> * doc/posix-functions/srand.texi: Mention a Cygwin bug.
>
> diff --git a/doc/posix-functions/rand.texi b/doc/posix-functions/rand.texi
> index 2f36a548a7..48b0bc5758 100644
> --- a/doc/posix-functions/rand.texi
> +++ b/doc/posix-functions/rand.texi
> @@ -18,4 +18,7 @@
>  @item
>  This function is only defined as an inline function on some platforms:
>  Android 4.4.
> +@item
> +This function is not multithread-safe on some platforms:
> +macOS 12.5, FreeBSD 13.2, Solaris 11.4.
>  @end itemize
> diff --git a/doc/posix-functions/srand.texi b/doc/posix-functions/srand.texi
> index 3e8f4b429b..710a5a1270 100644
> --- a/doc/posix-functions/srand.texi
> +++ b/doc/posix-functions/srand.texi
> @@ -15,4 +15,8 @@
>  @item
>  This function is only defined as an inline function on some platforms:
>  Android 4.4.
> +@item
> +This function has no effect on @code{rand} invocations in other threads
> +on some platforms:
> +Cygwin 3.4.6.
>  @end itemize

If rand() and friends do not need to be MT-safe, then does it even need a test?

I think the two tests of interest are:

(1) srand() produces a repeatable stream
(2) rand() and friends produce a stream that is uniformly distributed

For (2), compressibility is a poor man's randomness test. 32 bytes
should not be compressible by an algorithm like zlib or zip. Set up a
loop to generate 255 or 1024 sets of 32-bytes, compress each set of
32-bytes, and ensure most of them are not compressible.

You could also use Maurer's Universal Statistical Test for Random Bit
Generators from Journal of Cryptology, 1992. Maurer's test is a test
for physical generators. But a quality pseudo random number generator
will follow a uniform distribution, and it should be indistinguishable
from a physical generator. I know they are indistinguishable for a
cryptographically secure pseudo random number generator.

While rand() is insecure (likely a LCG), it is supposed to produce a
stream that is uniformly distributed. It does not matter that you can
reverse the stream/recover the coefficents and predict the next
generator output. What matters is the uniform distribution, and
non-compressibility of the stream.

Jeff



doc: Mention rand and srand limitations

2023-11-10 Thread Bruno Haible
rand() is not required to be multithread-safe. So, it's not a bug when rand()
is not MT-safe on some platforms; it's merely a portability problem worth
documenting.

The attached program tests the MT-safety. Results: OK on all platforms, except

FreeBSD 13.2  Expected value #7504 not found in multithreaded results. (4 
failures among 100 runs)
macOS 12.5Expected value #31 not found in multithreaded results.
Solaris 10Expected value #95981 not found in multithreaded results.
Solaris 11.4  Expected value #55372 not found in multithreaded results.
OpenBSD 7.4   rand() is non-deterministic.
Cygwin 3.4.6  rand() is non-deterministic.

The latter statement regarding Cygwin is not correct. Looking into the Cygwin
source code, it's clear that the problem is that srand() has no effect on
other thread - which is a bug since it violates ISO C 23.

On Haiku, rand() is MT-safe while random() is not.


2023-11-10  Bruno Haible  

    doc: Mention rand and srand limitations.
* doc/posix-functions/rand.texi: Mention multithread-safety problem.
* doc/posix-functions/srand.texi: Mention a Cygwin bug.

diff --git a/doc/posix-functions/rand.texi b/doc/posix-functions/rand.texi
index 2f36a548a7..48b0bc5758 100644
--- a/doc/posix-functions/rand.texi
+++ b/doc/posix-functions/rand.texi
@@ -18,4 +18,7 @@
 @item
 This function is only defined as an inline function on some platforms:
 Android 4.4.
+@item
+This function is not multithread-safe on some platforms:
+macOS 12.5, FreeBSD 13.2, Solaris 11.4.
 @end itemize
diff --git a/doc/posix-functions/srand.texi b/doc/posix-functions/srand.texi
index 3e8f4b429b..710a5a1270 100644
--- a/doc/posix-functions/srand.texi
+++ b/doc/posix-functions/srand.texi
@@ -15,4 +15,8 @@
 @item
 This function is only defined as an inline function on some platforms:
 Android 4.4.
+@item
+This function has no effect on @code{rand} invocations in other threads
+on some platforms:
+Cygwin 3.4.6.
 @end itemize
/* Multithread-safety test for rand().
   Copyright (C) 2023 Free Software Foundation, Inc.

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation, either version 3 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */

/* Written by Bruno Haible , 2023.  */

/* Whether to help the scheduler through explicit yield().
   Uncomment this to see if the operating system has a fair scheduler.  */
#define EXPLICIT_YIELD 1

/* Number of simultaneous threads.  */
#define THREAD_COUNT 4

/* Number of rand() invocations operations performed in each thread.
   This value is chosen so that the unit test terminates quickly.
   To reliably determine whether a rand() implementation is multithread-safe,
   set REPEAT_COUNT to 100 and run the test 100 times:
 $ for i in `seq 100`; do ./test-random-mt; done
 */
#define REPEAT_COUNT 100

/* Specification.  */
#include 

#include 
#include 
#include 

#if EXPLICIT_YIELD
# include 
# define yield() sched_yield ()
#else
# define yield()
#endif

/* This test runs REPEAT_COUNT invocations of rand() in each thread and stores
   the result, then compares the first REPEAT_COUNT among these
 THREAD_COUNT * REPEAT_COUNT
   random numbers against a precomputed sequence with the same seed.  */

static void *
random_invocator_thread (void *arg)
{
  int *storage = (int *) arg;
  int repeat;

  for (repeat = 0; repeat < REPEAT_COUNT; repeat++)
{
  storage[repeat] = rand ();
  yield ();
}

  return NULL;
}

int
main ()
{
  unsigned int seed = 19891109;

  /* First, get the expected sequence of rand() results.  */
  srand (seed);
  int *expected = (int *) malloc (REPEAT_COUNT * sizeof (int));
  assert (expected != NULL);
  {
int repeat;
for (repeat = 0; repeat < REPEAT_COUNT; repeat++)
  expected[repeat] = rand ();
  }

  /* Then, run REPEAT_COUNT invocations of rand() each, in THREAD_COUNT
 separate threads.  */
  pthread_t threads[THREAD_COUNT];
  int *thread_results[THREAD_COUNT];
  srand (seed);
  {
int i;
for (i = 0; i < THREAD_COUNT; i++)
  {
thread_results[i] = (int *) malloc (REPEAT_COUNT * sizeof (int));
assert (thread_results[i] != NULL);
  }
for (i = 0; i < THREAD_COUNT; i++)
  assert (pthread_create (&threads[i], NULL, random_invocator_thread, thread_results[i]) == 0);
  }

  /* Wait for the threads to terminate.  */
  {
int i;
for (i = 0; i < THREAD_COUNT; i++)
  assert (pthread_join (threads[i], N