Re: Functions to return random numbers in a given range
On Tue, 26 Mar 2024 at 06:57, Dean Rasheed wrote: > > Based on the reviews so far, I think this is ready for commit, so > unless anyone objects, I will do so in a day or so. > Committed. Thanks for the reviews. Regards, Dean
Re: Functions to return random numbers in a given range
On Tue, 27 Feb 2024 at 17:33, Dean Rasheed wrote: > > On Sat, 24 Feb 2024 at 17:10, Tomas Vondra > > > > I did a quick review and a little bit of testing on the patch today. I > > think it's a good/useful idea, and I think the code is ready to go (the > > code is certainly much cleaner than anything I'd written ...). > Based on the reviews so far, I think this is ready for commit, so unless anyone objects, I will do so in a day or so. As a quick summary, this adds a new file: src/backend/utils/adt/pseudorandomfuncs.c which contains SQL-callable functions that access a single shared pseudorandom number generator, whose state is private to that file. Currently the functions are: random() returns double precision [moved from float.c] random(min integer, max integer) returns integer [new] random(min bigint, max bigint) returns bigint [new] random(min numeric, max numeric) returns numeric [new] random_normal() returns double precision [moved from float.c] setseed(seed double precision) returns void [moved from float.c] It's possible that functions to return other random distributions or other datatypes might get added in the future, but I have no plans to do so at the moment. Regards, Dean
Re: Functions to return random numbers in a given range
On Sat, 24 Feb 2024 at 17:10, Tomas Vondra wrote: > > Hi Dean, > > I did a quick review and a little bit of testing on the patch today. I > think it's a good/useful idea, and I think the code is ready to go (the > code is certainly much cleaner than anything I'd written ...). > Thanks for reviewing! > I do have one minor comments regarding the docs - it refers to "random > functions" in a couple places, which sounds to me as if it was talking > about some functions arbitrarily taken from some list, although it > clearly means "functions generating random numbers". (I realize this > might be just due to me not being native speaker.) > Yes, I think you're right, that wording was a bit clumsy. Attached is an update that's hopefully a bit better. > Did you think about adding more functions generating either other types > of data distributions (now we have uniform and normal), or random data > for other data types (I often need random strings, for example)? > > Of course, I'm not saying this patch needs to do that. But perhaps it > might affect how we name stuff to make it "extensible". > I don't have any plans to add more random functions, but I did think about it from that perspective. Currently we have "random" and "random_normal", so the natural extension would be "random_${distribution}" for other data distributions, with "uniform" as the default distribution, if omitted. For different result datatypes, it ought to be mostly possible to determine the result type from the arguments. There might be some exceptions, like maybe "random_bytes(length)" to generate a byte array, but I think that would be OK. Regards, Dean From b1a63ecce667377435dc16fc262509bff2355b29 Mon Sep 17 00:00:00 2001 From: Dean Rasheed Date: Fri, 25 Aug 2023 10:42:38 +0100 Subject: [PATCH v3] Add random-number-in-range functions. This adds 3 functions: random(min int, max int) returns int random(min bigint, max bigint) returns bigint random(min numeric, max numeric) returns numeric Each returns a random number in the range [min, max]. In the numeric case, the result scale is Max(scale(min), scale(max)). --- doc/src/sgml/func.sgml| 43 ++- src/backend/utils/adt/Makefile| 1 + src/backend/utils/adt/float.c | 95 -- src/backend/utils/adt/meson.build | 1 + src/backend/utils/adt/numeric.c | 219 + src/backend/utils/adt/pseudorandomfuncs.c | 185 +++ src/common/pg_prng.c | 36 +++ src/include/catalog/pg_proc.dat | 12 + src/include/common/pg_prng.h | 1 + src/include/utils/numeric.h | 4 + src/test/regress/expected/random.out | 360 ++ src/test/regress/sql/random.sql | 164 ++ 12 files changed, 1021 insertions(+), 100 deletions(-) create mode 100644 src/backend/utils/adt/pseudorandomfuncs.c diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index e5fa82c161..e39e569fb6 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1862,6 +1862,39 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in + + + + random + +random ( min integer, max integer ) +integer + + +random ( min bigint, max bigint ) +bigint + + +random ( min numeric, max numeric ) +numeric + + +Returns a random value in the range +min = x = max. +For type numeric, the result will have the same number of +fractional decimal digits as min or +max, whichever has more. + + +random(1, 10) +7 + + +random(-0.499, 0.499) +0.347 + + + @@ -1906,19 +1939,19 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in - The random() function uses a deterministic - pseudo-random number generator. + The random() and random_normal() + functions listed in use a + deterministic pseudo-random number generator. It is fast but not suitable for cryptographic applications; see the module for a more secure alternative. If setseed() is called, the series of results of - subsequent random() calls in the current session + subsequent calls to these functions in the current session can be repeated by re-issuing setseed() with the same argument. Without any prior setseed() call in the same - session, the first random() call obtains a seed + session, the first call to any of these functions obtains a seed from a platform-dependent source of random bits. - These remarks hold equally for random_normal(). diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 199eae525d..610ccf2f79 100644 --- a/src/backend/utils/adt/Makefile +++
Re: Functions to return random numbers in a given range
Hi Dean, I did a quick review and a little bit of testing on the patch today. I think it's a good/useful idea, and I think the code is ready to go (the code is certainly much cleaner than anything I'd written ...). I do have one minor comments regarding the docs - it refers to "random functions" in a couple places, which sounds to me as if it was talking about some functions arbitrarily taken from some list, although it clearly means "functions generating random numbers". (I realize this might be just due to me not being native speaker.) Did you think about adding more functions generating either other types of data distributions (now we have uniform and normal), or random data for other data types (I often need random strings, for example)? Of course, I'm not saying this patch needs to do that. But perhaps it might affect how we name stuff to make it "extensible". regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Functions to return random numbers in a given range
On Tue, 30 Jan 2024 at 12:47, Aleksander Alekseev wrote: > > Maybe I'm missing something but I'm not sure if I understand what this > test tests particularly: > > ``` > -- There should be no triple duplicates in 1000 full-range 32-bit random() > -- values. (Each of the C(1000, 3) choices of triplets from the 1000 values > -- has a probability of 1/(2^32)^2 of being a triple duplicate, so the > -- average number of triple duplicates is 1000 * 999 * 998 / 6 / 2^64, which > -- is roughly 9e-12.) > SELECT r, count(*) > FROM (SELECT random(-2147483648, 2147483647) r > FROM generate_series(1, 1000)) ss > GROUP BY r HAVING count(*) > 2; > ``` > > The intent seems to be to check the fact that random numbers are > distributed evenly. If this is the case I think the test is wrong. The > sequence of numbers 100, 100, 100, 100, 100 is as random as 99, 8, 4, > 12, 45 and every particular sequence has low probability. All in all > personally I would argue that this is a meaningless test that just > fails with a low probability. Same for the tests that follow below. > I'm following the same approach used to test the existing random functions, and the idea is the same. For example, this existing test: -- There should be no duplicates in 1000 random() values. -- (Assuming 52 random bits in the float8 results, we could -- take as many as 3000 values and still have less than 1e-9 chance -- of failure, per https://en.wikipedia.org/wiki/Birthday_problem) SELECT r, count(*) FROM (SELECT random() r FROM generate_series(1, 1000)) ss GROUP BY r HAVING count(*) > 1; If the underlying PRNG were non-uniform, or the method of reduction to the required range was flawed in some way that reduced the number of actual possible return values, then the probability of duplicates would be increased. A non-uniform distribution would probably be caught by the KS tests, but uniform gaps in the possible outputs might not be, so I think this test still has value. > The proper way of testing PRNG would be to call setseed() and compare > return values with expected ones. I don't mind testing the proposed > invariants but they should do this after calling setseed(). Currently > the patch places the tests right before the call. > There are also new tests of that nature, following the call to setseed(0.5). They're useful for a quick visual check of the results, and confirming the expected number of digits after the decimal point in the numeric case. However, I think those tests are insufficient on their own. Regards, Dean
Re: Functions to return random numbers in a given range
Hi, > Interestingly, the cfbot didn't pick up on the fact that it needed > rebasing. Anyway, the copyright years in the new file's header comment > needed updating, so here is a rebase doing that. Maybe I'm missing something but I'm not sure if I understand what this test tests particularly: ``` -- There should be no triple duplicates in 1000 full-range 32-bit random() -- values. (Each of the C(1000, 3) choices of triplets from the 1000 values -- has a probability of 1/(2^32)^2 of being a triple duplicate, so the -- average number of triple duplicates is 1000 * 999 * 998 / 6 / 2^64, which -- is roughly 9e-12.) SELECT r, count(*) FROM (SELECT random(-2147483648, 2147483647) r FROM generate_series(1, 1000)) ss GROUP BY r HAVING count(*) > 2; ``` The intent seems to be to check the fact that random numbers are distributed evenly. If this is the case I think the test is wrong. The sequence of numbers 100, 100, 100, 100, 100 is as random as 99, 8, 4, 12, 45 and every particular sequence has low probability. All in all personally I would argue that this is a meaningless test that just fails with a low probability. Same for the tests that follow below. The proper way of testing PRNG would be to call setseed() and compare return values with expected ones. I don't mind testing the proposed invariants but they should do this after calling setseed(). Currently the patch places the tests right before the call. -- Best regards, Aleksander Alekseev
Re: Functions to return random numbers in a given range
On Fri, 26 Jan 2024 at 20:44, David Zhang wrote: > > Thank you for the patch. > > I applied this patch manually to the master branch, resolving a conflict > in `numeric.h`. It successfully passed both `make check` and `make > check-world`. > Thanks for testing. Interestingly, the cfbot didn't pick up on the fact that it needed rebasing. Anyway, the copyright years in the new file's header comment needed updating, so here is a rebase doing that. Regards, Dean From 15d0ba981ff03eca7143726fe7512adf00ee3a84 Mon Sep 17 00:00:00 2001 From: Dean Rasheed Date: Fri, 25 Aug 2023 10:42:38 +0100 Subject: [PATCH v2] Add random-number-in-range functions. This adds 3 functions: random(min int, max int) returns int random(min bigint, max bigint) returns bigint random(min numeric, max numeric) returns numeric Each returns a random number in the range [min, max]. In the numeric case, the result scale is Max(scale(min), scale(max)). --- doc/src/sgml/func.sgml| 39 ++- src/backend/utils/adt/Makefile| 1 + src/backend/utils/adt/float.c | 95 -- src/backend/utils/adt/meson.build | 1 + src/backend/utils/adt/numeric.c | 219 + src/backend/utils/adt/pseudorandomfuncs.c | 185 +++ src/common/pg_prng.c | 36 +++ src/include/catalog/pg_proc.dat | 12 + src/include/common/pg_prng.h | 1 + src/include/utils/numeric.h | 4 + src/test/regress/expected/random.out | 360 ++ src/test/regress/sql/random.sql | 164 ++ 12 files changed, 1017 insertions(+), 100 deletions(-) create mode 100644 src/backend/utils/adt/pseudorandomfuncs.c diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 6788ba8ef4..6d76fb5853 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1862,6 +1862,36 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in + + + + random + +random ( min integer, max integer ) +integer + + +random ( min bigint, max bigint ) +bigint + + +random ( min numeric, max numeric ) +numeric + + +Return a random value in the range +min = x = max. + + +random(1, 10) +7 + + +random(-0.499, 0.499) +0.347 + + + @@ -1906,19 +1936,18 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in - The random() function uses a deterministic - pseudo-random number generator. + The random functions listed in + use a deterministic pseudo-random number generator. It is fast but not suitable for cryptographic applications; see the module for a more secure alternative. If setseed() is called, the series of results of - subsequent random() calls in the current session + subsequent calls to these random functions in the current session can be repeated by re-issuing setseed() with the same argument. Without any prior setseed() call in the same - session, the first random() call obtains a seed + session, the first call to any of these random functions obtains a seed from a platform-dependent source of random bits. - These remarks hold equally for random_normal(). diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 199eae525d..610ccf2f79 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -82,6 +82,7 @@ OBJS = \ pg_lsn.o \ pg_upgrade_support.o \ pgstatfuncs.o \ + pseudorandomfuncs.o \ pseudotypes.o \ quote.o \ rangetypes.o \ diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c index 901edcc896..cbbb8aecaf 100644 --- a/src/backend/utils/adt/float.c +++ b/src/backend/utils/adt/float.c @@ -21,10 +21,8 @@ #include "catalog/pg_type.h" #include "common/int.h" -#include "common/pg_prng.h" #include "common/shortest_dec.h" #include "libpq/pqformat.h" -#include "miscadmin.h" #include "utils/array.h" #include "utils/float.h" #include "utils/fmgrprotos.h" @@ -64,10 +62,6 @@ float8 degree_c_sixty = 60.0; float8 degree_c_one_half = 0.5; float8 degree_c_one = 1.0; -/* State for drandom() and setseed() */ -static bool drandom_seed_set = false; -static pg_prng_state drandom_seed; - /* Local function prototypes */ static double sind_q1(double x); static double cosd_q1(double x); @@ -2785,95 +2779,6 @@ derfc(PG_FUNCTION_ARGS) } -/* == RANDOM FUNCTIONS == */ - - -/* - * initialize_drandom_seed - initialize drandom_seed if not yet done - */ -static void -initialize_drandom_seed(void) -{ - /* Initialize random seed, if not done yet in this process */ - if (unlikely(!drandom_seed_set)) - { - /* - * If possible, initialize the
Re: Functions to return random numbers in a given range
On Thu, 28 Dec 2023 at 07:34, jian he wrote: > > Your patch works. > performance is the best amount for other options in [0]. > I don't have deep knowledge about which one is more random. > Thanks for testing. > Currently we have to explicitly mention the lower and upper bound. > but can we do this: > just give me an int, int means the int data type can be represented. > or just give me a random bigint. > but for numeric, the full numeric values that can be represented are very big. > > Maybe we can use the special value null to achieve this > like use > select random(NULL::int,null) > to represent a random int in the full range of integers values can be > represented. > Hmm, I don't particularly like that idea. It seems pretty ugly. Now that we support literal integers in hex, with underscores, it's relatively easy to pass INT_MIN/MAX as arguments to these functions, if that's what you need. I think if we were going to have a shorthand for getting full-range random integers, it would probably be better to introduce separate no-arg functions for that. I'm not really sure if that's a sufficiently common use case to justify the effort though. Regards, Dean
Re: Functions to return random numbers in a given range
Thank you for the patch. I applied this patch manually to the master branch, resolving a conflict in `numeric.h`. It successfully passed both `make check` and `make check-world`. Best regards, David
Re: Functions to return random numbers in a given range
On Fri, Dec 22, 2023 at 1:07 AM Dean Rasheed wrote: > > Attached is a patch that adds 3 SQL-callable functions to return > random integer/numeric values chosen uniformly from a given range: > > random(min int, max int) returns int > random(min bigint, max bigint) returns bigint > random(min numeric, max numeric) returns numeric > > The return value is in the range [min, max], and in the numeric case, > the result scale equals Max(scale(min), scale(max)), so it can be used > to generate large random integers, as well as decimals. > > The goal is to provide simple, easy-to-use functions that operate > correctly over arbitrary ranges, which is trickier than it might seem > using the existing random() function. The main advantages are: > > 1. Support for arbitrary bounds (provided that max >= min). A SQL or > PL/pgSQL implementation based on the existing random() function can > suffer from integer overflow if the difference max-min is too large. > Your patch works. performance is the best amount for other options in [0]. I don't have deep knowledge about which one is more random. Currently we have to explicitly mention the lower and upper bound. but can we do this: just give me an int, int means the int data type can be represented. or just give me a random bigint. but for numeric, the full numeric values that can be represented are very big. Maybe we can use the special value null to achieve this like use select random(NULL::int,null) to represent a random int in the full range of integers values can be represented. Do you think it makes sense? [0] https://www.postgresql.org/message-id/CAEG8a3LcYXjNU1f2bxMm9c6ThQsPoTcvYO_kOnifx3aGXkbgPw%40mail.gmail.com
Re: Functions to return random numbers in a given range
Hi čt 21. 12. 2023 v 18:06 odesílatel Dean Rasheed napsal: > Attached is a patch that adds 3 SQL-callable functions to return > random integer/numeric values chosen uniformly from a given range: > > random(min int, max int) returns int > random(min bigint, max bigint) returns bigint > random(min numeric, max numeric) returns numeric > The return value is in the range [min, max], and in the numeric case, > the result scale equals Max(scale(min), scale(max)), so it can be used > to generate large random integers, as well as decimals. > > The goal is to provide simple, easy-to-use functions that operate > correctly over arbitrary ranges, which is trickier than it might seem > using the existing random() function. The main advantages are: > > 1. Support for arbitrary bounds (provided that max >= min). A SQL or > PL/pgSQL implementation based on the existing random() function can > suffer from integer overflow if the difference max-min is too large. > > 2. Uniform results over the full range. It's easy to overlook the fact > that in a naive implementation doing something like > "((max-min)*random()+min)::int", the endpoint values will be half as > likely as any other value, since casting to integer rounds to nearest. > > 3. Makes better use of the underlying PRNG, not limited to the 52-bits > of double precision values. > > 4. Simpler and more efficient generation of random numeric values. > This is something I have commonly wanted in the past, and have usually > resorted to hacks involving multiple calls to random() to build > strings of digits, which is horribly slow, and messy. > > The implementation moves the existing random functions to a new source > file, so the new functions all share a common PRNG state with the > existing random functions, and that state is kept private to that > file. > +1 Regards Pavel > Regards, > Dean >
Functions to return random numbers in a given range
Attached is a patch that adds 3 SQL-callable functions to return random integer/numeric values chosen uniformly from a given range: random(min int, max int) returns int random(min bigint, max bigint) returns bigint random(min numeric, max numeric) returns numeric The return value is in the range [min, max], and in the numeric case, the result scale equals Max(scale(min), scale(max)), so it can be used to generate large random integers, as well as decimals. The goal is to provide simple, easy-to-use functions that operate correctly over arbitrary ranges, which is trickier than it might seem using the existing random() function. The main advantages are: 1. Support for arbitrary bounds (provided that max >= min). A SQL or PL/pgSQL implementation based on the existing random() function can suffer from integer overflow if the difference max-min is too large. 2. Uniform results over the full range. It's easy to overlook the fact that in a naive implementation doing something like "((max-min)*random()+min)::int", the endpoint values will be half as likely as any other value, since casting to integer rounds to nearest. 3. Makes better use of the underlying PRNG, not limited to the 52-bits of double precision values. 4. Simpler and more efficient generation of random numeric values. This is something I have commonly wanted in the past, and have usually resorted to hacks involving multiple calls to random() to build strings of digits, which is horribly slow, and messy. The implementation moves the existing random functions to a new source file, so the new functions all share a common PRNG state with the existing random functions, and that state is kept private to that file. Regards, Dean From 0b7015668387c337114adb4b3c24fe1d8053bf9c Mon Sep 17 00:00:00 2001 From: Dean Rasheed Date: Fri, 25 Aug 2023 10:42:38 +0100 Subject: [PATCH v1] Add random-number-in-range functions. This adds 3 functions: random(min int, max int) returns int random(min bigint, max bigint) returns bigint random(min numeric, max numeric) returns numeric Each returns a random number in the range [min, max]. In the numeric case, the result scale is Max(scale(min), scale(max)). --- doc/src/sgml/func.sgml| 39 ++- src/backend/utils/adt/Makefile| 1 + src/backend/utils/adt/float.c | 95 -- src/backend/utils/adt/meson.build | 1 + src/backend/utils/adt/numeric.c | 219 + src/backend/utils/adt/pseudorandomfuncs.c | 185 +++ src/common/pg_prng.c | 36 +++ src/include/catalog/pg_proc.dat | 12 + src/include/common/pg_prng.h | 1 + src/include/utils/numeric.h | 4 + src/test/regress/expected/random.out | 360 ++ src/test/regress/sql/random.sql | 164 ++ 12 files changed, 1017 insertions(+), 100 deletions(-) create mode 100644 src/backend/utils/adt/pseudorandomfuncs.c diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 20da3ed033..b0b65d81dc 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1862,6 +1862,36 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in + + + + random + +random ( min integer, max integer ) +integer + + +random ( min bigint, max bigint ) +bigint + + +random ( min numeric, max numeric ) +numeric + + +Return a random value in the range +min = x = max. + + +random(1, 10) +7 + + +random(-0.499, 0.499) +0.347 + + + @@ -1906,19 +1936,18 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in - The random() function uses a deterministic - pseudo-random number generator. + The random functions listed in + use a deterministic pseudo-random number generator. It is fast but not suitable for cryptographic applications; see the module for a more secure alternative. If setseed() is called, the series of results of - subsequent random() calls in the current session + subsequent calls to these random functions in the current session can be repeated by re-issuing setseed() with the same argument. Without any prior setseed() call in the same - session, the first random() call obtains a seed + session, the first call to any of these random functions obtains a seed from a platform-dependent source of random bits. - These remarks hold equally for random_normal(). diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 199eae525d..610ccf2f79 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -82,6 +82,7 @@ OBJS = \ pg_lsn.o \ pg_upgrade_support.o \