Re: [HACKERS] gaussian distribution pgbench -- splits v4

Fabien COELHO Fri, 01 Aug 2014 00:28:27 -0700


Hello,

Version one is "k' = 1 + (a * k + b) modulo n" with "a" prime withrespect to "n", "n" being the number of keys. This is nearly possible,but for the modulo operator which is currently missing, and that I'mplanning to submit for this very reason, but probably another time.
That's pretty crude,

Yep. It is very simple, it is much better than nothing, and for a databasetest is may be "good enough".

although I don't object to a modulo operator. It would be nice to beable to use a truly random permutation, which is not hard to generatebut probably requires O(n) storage, likely a problem for large scalefactors.

That is indeed the actual issue in my mind. I was thinking of permutationswith a formula, which are not so easy to find and may end-up looking like"(a*k+b)%n" anyway. I had the same issue for generating random data for aschema (see http://www.coelho.net/datafiller.html).

Maybe somebody who knows more math than I do (like you, probably!) cancome up with something more clever.

I can certainly suggest other formula, but that does not mean beautifulcode, thus would probably be rejected. I'll see.

An alternative to this whole process may be to hash/modulo a non uniformrandom value.


      id = 1 + hash(some-random()) % n

But the hashing changes the distribution as it adds collisions, so I haveto think about how to be able to control the distribution in that case,and what hash function to use.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] gaussian distribution pgbench -- splits v4

Reply via email to