Re: [HACKERS] [WIP] Zipfian distribution in pgbench
On Fri, Jul 7, 2017 at 12:45 AM Alik Khilazhev wrote: > PostgreSQL shows very bad results in YCSB Workload A (50% SELECT and 50% > UPDATE of random row by PK) on benchmarking with big number of clients using > Zipfian distribution. MySQL also has decline but it is not significant as it > is in PostgreSQL. MongoDB does not have decline at all. And if pgbench would > have Zipfian distribution random number generator, everyone will be able to > make research on this topic without using YCSB. I've been thinking about this again, in light of my recent work to improve the B-Tree code by making all entries have fully unique keys, adding suffix truncation, and making better choices about split points [1]. Somebody posted a patch that fixed the issue by making the heavyweight lock manager lock a value rather than a heap TID within the last year. I can't find that thread right now -- perhaps someone can point it out now. I suspect that my patch will have a similar effect to this other patch I'm thinking of with the Zipfian workload, though it will do so without touching anything outside of the B-Tree code, and without changing the user-visible semantics of locking. Can we possibly find a way to rerun this Zipfian experiment/test case? I bet Alexander's recent B-Tree patch will also help. Here is why I suspect my patch will help with the Zipfian workload, particularly on a multi-socket machine: The logic for following downlinks in Postgres is that downlinks are a lower bound, but we still cannot follow them when the key is equal to our insertion scankey -- we must go left, even when we have all keys filled out. This means that we sometimes end up holding an exclusive buffer lock on "the first leaf page the value could be on", even though that page doesn't have any such values (they're all in later pages, to the left). So we accidentally lock-out insertions of the value 1 when we insert the value 2, and of the value 2 when we insert the value 3, and of the value 3 when we insert the value 4, and so on down the line. This is the worse possible thing for the Zipfian test case, I think, since most of the contention is artificial. I think that my patch may ameliorate the problem, because: * We make the tie-breaker heap TID attribute have DESC sort order, so generally speaking contention starts on the leftmost page among leaf pages full of duplicates -- for reads, and for writes. * The patch works very hard to make sure that large groups of duplicates are not spread across pages. There would be no mixing of leaf pages containing the value 1 and the value 2 with the Zipfian test case, for example. Old duplicates would be packed into full pages. * We can invent a fake lower-than-any-real-value heap TID in the insertion scankey in certain cases that don't have one. This way, the scan key as a whole is *not* equal to pivot tuples that are at least equal on non-heap-TID attributes, so we *can* follow a downlink that is "equal" to our scankey, provided it has a truncated-away heap TID attribute value. In short, the artificial "false sharing" that we see in the B-Tree for the Zipfian test case may go away. There will be *no* contention between an inserter (or really UPDATE-er) that inserts the value 2 with concurrent inserters of the value 1. All of the ways in which that was likely to happen are each fixed by the patch. There is still buffer lock contention on heap pages, and the contention of waiting for a row lock. Maybe the busy waiting will automatically be fixed, though, since you have one point of contention for each of the really hot values at the far left of the leaf level. [1] https://commitfest.postgresql.org/20/1787/ -- Peter Geoghegan
Re: [HACKERS] [WIP] Zipfian distribution in pgbench
Thank you for all, pushed Fabien COELHO wrote: Note that the patch may interact with other patches which add functions to pgbench, so might need a rebase depending on the order in which the patch are applied. Attached a minor rebase after 16827d4424. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/
Re: [HACKERS] [WIP] Zipfian distribution in pgbench
Note that the patch may interact with other patches which add functions to pgbench, so might need a rebase depending on the order in which the patch are applied. Attached a minor rebase after 16827d4424. -- Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index f6e93c3..f119ae6 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -1093,6 +1093,14 @@ pgbench options d an integer between 1 and 10 + random_zipfian(lb, ub, parameter) + integer + Zipfian-distributed random integer in [lb, ub], + see below + random_zipfian(1, 10, 1.5) + an integer between 1 and 10 + + sqrt(x) double square root @@ -1173,6 +1181,27 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) / of the Box-Muller transform. + + + random_zipfian generates an approximated bounded zipfian + distribution. For parameter in (0, 1), an + approximated algorithm is taken from + "Quickly Generating Billion-Record Synthetic Databases", + Jim Gray et al, SIGMOD 1994. For parameter + in (1, 1000), a rejection method is used, based on + "Non-Uniform Random Variate Generation", Luc Devroye, p. 550-551, + Springer 1986. The distribution is not defined when the parameter's + value is 1.0. The drawing performance is poor for parameter values + close and above 1.0 and on a small range. + + + parameter + defines how skewed the distribution is. The larger the parameter, the more + frequently values to the beginning of the interval are drawn. + The closer to 0 parameter is, + the flatter (more uniform) the access distribution. + + diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y index b3a2d9b..25d5ad4 100644 --- a/src/bin/pgbench/exprparse.y +++ b/src/bin/pgbench/exprparse.y @@ -191,6 +191,9 @@ static const struct { "random_exponential", 3, PGBENCH_RANDOM_EXPONENTIAL }, + { + "random_zipfian", 3, PGBENCH_RANDOM_ZIPFIAN + }, /* keep as last array element */ { NULL, 0, 0 diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index bd96eae..7ce6f60 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -95,7 +95,10 @@ static int pthread_join(pthread_t th, void **thread_return); #define LOG_STEP_SECONDS 5 /* seconds between log messages */ #define DEFAULT_NXACTS 10 /* default nxacts */ +#define ZIPF_CACHE_SIZE 15 /* cache cells number */ + #define MIN_GAUSSIAN_PARAM 2.0 /* minimum parameter for gauss */ +#define MAX_ZIPFIAN_PARAM 1000 /* maximum parameter for zipfian */ int nxacts = 0; /* number of transactions per client */ int duration = 0; /* duration in seconds */ @@ -331,6 +334,35 @@ typedef struct } CState; /* + * Cache cell for zipfian_random call + */ +typedef struct +{ + /* cell keys */ + double s;/* s - parameter of zipfan_random function */ + int64 n;/* number of elements in range (max - min + 1) */ + + double harmonicn; /* generalizedHarmonicNumber(n, s) */ + double alpha; + double beta; + double eta; + + uint64 last_used; /* last used logical time */ +} ZipfCell; + +/* + * Zipf cache for zeta values + */ +typedef struct +{ + uint64 current; /* counter for LRU cache replacement algorithm */ + + int nb_cells; /* number of filled cells */ + int overflowCount; /* number of cache overflows */ + ZipfCell cells[ZIPF_CACHE_SIZE]; +} ZipfCache; + +/* * Thread state */ typedef struct @@ -342,6 +374,8 @@ typedef struct unsigned short random_state[3]; /* separate randomness for each thread */ int64 throttle_trigger; /* previous/next throttling (us) */ FILE *logfile; /* where to log, or NULL */ + ZipfCache zipf_cache; /* for thread-safe zipfian random number + * generation */ /* per thread collected stats */ instr_time start_time; /* thread start time */ @@ -746,6 +780,137 @@ getPoissonRand(TState *thread, int64 center) return (int64) (-log(uniform) * ((double) center) + 0.5); } +/* helper function for getZipfianRand */ +static double +generalizedHarmonicNumber(int64 n, double s) +{ + int i; + double ans = 0.0; + + for (i = n; i > 1; i--) + ans += pow(i, -s); + return ans + 1.0; +} + +/* set harmonicn and other parameters to cache cell */ +static void +zipfSetCacheCell(ZipfCell * cell, int64 n, double s) +{ + double harmonic2; + + cell->n = n; + cell->s = s; + + harmonic2 = generalizedHarmonicNumber(2, s); + cell->harmonicn = generalizedHarmonicNumber(n, s); + + cell->alpha = 1.0 / (1.0 - s); + cell->beta = pow(0.5, s); + cell->eta = (1.0 - pow(2.0 / n, 1.0 - s)) / (1.0 - harmonic2 / cell->harmonicn); +} + +/* + * search for cache cell with keys (n, s) + * and create new cell if it does not exist + */ +static ZipfCell * +zipfFindOrCreateCacheCell(ZipfCache * cache, int64 n, double s) +{ + int i, +
Re: [HACKERS] [WIP] Zipfian distribution in pgbench
Hello Fabien, Sorry for not respondingfor long time. > Two typos: > - "usinng" -> "using" > - "a rejection method used" -> "a rejection method is used" > > I'm not sure of "least_recently_used_i", this naming style is not used in > pgbench. "least_recently_used" would be ok. > > "..nb_cells.. != ZIPF_CACHE_SIZE", ISTM that "<" is more logical, > even if the result is the same? Fixed. > Maybe the cache overflow could be counted, to allow for a possible warning > message in the final report? Also done. But one question: Should it be done by printing to stdout or stderr ? > When/if the pgbench tap test patch get through, coverage tests should > be added. I have added few tests and I am not sure if they OK or not. It was my first experience with perl and tests in PostgreSQL. pgbench-zipf-09v.patch Description: Binary data — Thanks and Regards, Alik Khilazhev Postgres Professional: http://www.postgrespro.com The Russian Postgres Company