On Tue, 30 Jan 2024 at 12:47, Aleksander Alekseev <aleksan...@timescale.com> wrote: > > Maybe I'm missing something but I'm not sure if I understand what this > test tests particularly: > > ``` > -- There should be no triple duplicates in 1000 full-range 32-bit random() > -- values. (Each of the C(1000, 3) choices of triplets from the 1000 values > -- has a probability of 1/(2^32)^2 of being a triple duplicate, so the > -- average number of triple duplicates is 1000 * 999 * 998 / 6 / 2^64, which > -- is roughly 9e-12.) > SELECT r, count(*) > FROM (SELECT random(-2147483648, 2147483647) r > FROM generate_series(1, 1000)) ss > GROUP BY r HAVING count(*) > 2; > ``` > > The intent seems to be to check the fact that random numbers are > distributed evenly. If this is the case I think the test is wrong. The > sequence of numbers 100, 100, 100, 100, 100 is as random as 99, 8, 4, > 12, 45 and every particular sequence has low probability. All in all > personally I would argue that this is a meaningless test that just > fails with a low probability. Same for the tests that follow below. >
I'm following the same approach used to test the existing random functions, and the idea is the same. For example, this existing test: -- There should be no duplicates in 1000 random() values. -- (Assuming 52 random bits in the float8 results, we could -- take as many as 3000 values and still have less than 1e-9 chance -- of failure, per https://en.wikipedia.org/wiki/Birthday_problem) SELECT r, count(*) FROM (SELECT random() r FROM generate_series(1, 1000)) ss GROUP BY r HAVING count(*) > 1; If the underlying PRNG were non-uniform, or the method of reduction to the required range was flawed in some way that reduced the number of actual possible return values, then the probability of duplicates would be increased. A non-uniform distribution would probably be caught by the KS tests, but uniform gaps in the possible outputs might not be, so I think this test still has value. > The proper way of testing PRNG would be to call setseed() and compare > return values with expected ones. I don't mind testing the proposed > invariants but they should do this after calling setseed(). Currently > the patch places the tests right before the call. > There are also new tests of that nature, following the call to setseed(0.5). They're useful for a quick visual check of the results, and confirming the expected number of digits after the decimal point in the numeric case. However, I think those tests are insufficient on their own. Regards, Dean