On Tue, 30 Jan 2024 at 12:47, Aleksander Alekseev
<aleksan...@timescale.com> wrote:
>
> Maybe I'm missing something but I'm not sure if I understand what this
> test tests particularly:
>
> ```
> -- There should be no triple duplicates in 1000 full-range 32-bit random()
> -- values.  (Each of the C(1000, 3) choices of triplets from the 1000 values
> -- has a probability of 1/(2^32)^2 of being a triple duplicate, so the
> -- average number of triple duplicates is 1000 * 999 * 998 / 6 / 2^64, which
> -- is roughly 9e-12.)
> SELECT r, count(*)
> FROM (SELECT random(-2147483648, 2147483647) r
>       FROM generate_series(1, 1000)) ss
> GROUP BY r HAVING count(*) > 2;
> ```
>
> The intent seems to be to check the fact that random numbers are
> distributed evenly. If this is the case I think the test is wrong. The
> sequence of numbers 100, 100, 100, 100, 100 is as random as 99, 8, 4,
> 12, 45 and every particular sequence has low probability. All in all
> personally I would argue that this is a meaningless test that just
> fails with a low probability. Same for the tests that follow below.
>

I'm following the same approach used to test the existing random
functions, and the idea is the same. For example, this existing test:

-- There should be no duplicates in 1000 random() values.
-- (Assuming 52 random bits in the float8 results, we could
-- take as many as 3000 values and still have less than 1e-9 chance
-- of failure, per https://en.wikipedia.org/wiki/Birthday_problem)
SELECT r, count(*)
FROM (SELECT random() r FROM generate_series(1, 1000)) ss
GROUP BY r HAVING count(*) > 1;

If the underlying PRNG were non-uniform, or the method of reduction to
the required range was flawed in some way that reduced the number of
actual possible return values, then the probability of duplicates
would be increased. A non-uniform distribution would probably be
caught by the KS tests, but uniform gaps in the possible outputs might
not be, so I think this test still has value.

> The proper way of testing PRNG would be to call setseed() and compare
> return values with expected ones. I don't mind testing the proposed
> invariants but they should do this after calling setseed(). Currently
> the patch places the tests right before the call.
>

There are also new tests of that nature, following the call to
setseed(0.5). They're useful for a quick visual check of the results,
and confirming the expected number of digits after the decimal point
in the numeric case. However, I think those tests are insufficient on
their own.

Regards,
Dean


Reply via email to