Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2018-10-08 Thread Peter Geoghegan
On Fri, Jul 7, 2017 at 12:45 AM Alik Khilazhev
 wrote:
> PostgreSQL shows very bad results in YCSB Workload A (50% SELECT and 50% 
> UPDATE of random row by PK) on benchmarking with big number of clients using 
> Zipfian distribution. MySQL also has decline but it is not significant as it 
> is in PostgreSQL. MongoDB does not have decline at all. And if pgbench would 
> have Zipfian distribution random number generator, everyone will be able to 
> make research on this topic without using YCSB.

I've been thinking about this again, in light of my recent work to
improve the B-Tree code by making all entries have fully unique keys,
adding suffix truncation, and making better choices about split points
[1].

Somebody posted a patch that fixed the issue by making the heavyweight
lock manager lock a value rather than a heap TID within the last year.
I can't find that thread right now -- perhaps someone can point it out
now. I suspect that my patch will have a similar effect to this other
patch I'm thinking of with the Zipfian workload, though it will do so
without touching anything outside of the B-Tree code, and without
changing the user-visible semantics of locking.

Can we possibly find a way to rerun this Zipfian experiment/test case?
I bet Alexander's recent B-Tree patch will also help.

Here is why I suspect my patch will help with the Zipfian workload,
particularly on a multi-socket machine:

The logic for following downlinks in Postgres is that downlinks are a
lower bound, but we still cannot follow them when the key is equal to
our insertion scankey -- we must go left, even when we have all keys
filled out. This means that we sometimes end up holding an exclusive
buffer lock on "the first leaf page the value could be on", even
though that page doesn't have any such values (they're all in later
pages, to the left). So we accidentally lock-out insertions of the
value 1 when we insert the value 2, and of the value 2 when we insert
the value 3, and of the value 3 when we insert the value 4, and so on
down the line. This is the worse possible thing for the Zipfian test
case, I think, since most of the contention is artificial.

I think that my patch may ameliorate the problem, because:

* We make the tie-breaker heap TID attribute have DESC sort order, so
generally speaking contention starts on the leftmost page among leaf
pages full of duplicates -- for reads, and for writes.

* The patch works very hard to make sure that large groups of
duplicates are not spread across pages. There would be no mixing of
leaf pages containing the value 1 and the value 2 with the Zipfian
test case, for example. Old duplicates would be packed into full
pages.

* We can invent a fake lower-than-any-real-value heap TID in the
insertion scankey in certain cases that don't have one. This way, the
scan key as a whole is *not* equal to pivot tuples that are at least
equal on non-heap-TID attributes, so we *can* follow a downlink that
is "equal" to our scankey, provided it has a truncated-away heap TID
attribute value.

In short, the artificial "false sharing" that we see in the B-Tree for
the Zipfian test case may go away. There will be *no* contention
between an inserter (or really UPDATE-er) that inserts the value 2
with concurrent inserters of the value 1. All of the ways in which
that was likely to happen are each fixed by the patch. There is still
buffer lock contention on heap pages, and the contention of waiting
for a row lock. Maybe the busy waiting will automatically be fixed,
though, since you have one point of contention for each of the really
hot values at the far left of the leaf level.

[1] https://commitfest.postgresql.org/20/1787/
--
Peter Geoghegan



Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-12-14 Thread Teodor Sigaev

Thank you for all, pushed

Fabien COELHO wrote:


Note that the patch may interact with other patches which add functions to 
pgbench, so might need a rebase depending on the order in which the patch are 
applied.


Attached a minor rebase after 16827d4424.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-11-21 Thread Fabien COELHO


Note that the patch may interact with other patches which add functions to 
pgbench, so might need a rebase depending on the order in which the patch are 
applied.


Attached a minor rebase after 16827d4424.

--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index f6e93c3..f119ae6 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1093,6 +1093,14 @@ pgbench  options  d
an integer between 1 and 10
   
   
+   random_zipfian(lb, ub, parameter)
+   integer
+   Zipfian-distributed random integer in [lb, ub],
+  see below
+   random_zipfian(1, 10, 1.5)
+   an integer between 1 and 10
+  
+  
sqrt(x)
double
square root
@@ -1173,6 +1181,27 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
   of the Box-Muller transform.
  
 
+
+ 
+  random_zipfian generates an approximated bounded zipfian
+  distribution. For parameter in (0, 1), an
+  approximated algorithm is taken from
+  "Quickly Generating Billion-Record Synthetic Databases",
+  Jim Gray et al, SIGMOD 1994. For parameter
+  in (1, 1000), a rejection method is used, based on
+  "Non-Uniform Random Variate Generation", Luc Devroye, p. 550-551,
+  Springer 1986. The distribution is not defined when the parameter's
+  value is 1.0. The drawing performance is poor for parameter values
+  close and above 1.0 and on a small range.
+ 
+ 
+  parameter
+  defines how skewed the distribution is. The larger the parameter, the more
+  frequently values to the beginning of the interval are drawn.
+  The closer to 0 parameter is,
+  the flatter (more uniform) the access distribution.
+ 
+

 
   
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index b3a2d9b..25d5ad4 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -191,6 +191,9 @@ static const struct
 	{
 		"random_exponential", 3, PGBENCH_RANDOM_EXPONENTIAL
 	},
+	{
+		"random_zipfian", 3, PGBENCH_RANDOM_ZIPFIAN
+	},
 	/* keep as last array element */
 	{
 		NULL, 0, 0
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index bd96eae..7ce6f60 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -95,7 +95,10 @@ static int	pthread_join(pthread_t th, void **thread_return);
 #define LOG_STEP_SECONDS	5	/* seconds between log messages */
 #define DEFAULT_NXACTS	10		/* default nxacts */
 
+#define ZIPF_CACHE_SIZE	15		/* cache cells number */
+
 #define MIN_GAUSSIAN_PARAM		2.0 /* minimum parameter for gauss */
+#define MAX_ZIPFIAN_PARAM		1000	/* maximum parameter for zipfian */
 
 int			nxacts = 0;			/* number of transactions per client */
 int			duration = 0;		/* duration in seconds */
@@ -331,6 +334,35 @@ typedef struct
 } CState;
 
 /*
+ * Cache cell for zipfian_random call
+ */
+typedef struct
+{
+	/* cell keys */
+	double		s;/* s - parameter of zipfan_random function */
+	int64		n;/* number of elements in range (max - min + 1) */
+
+	double		harmonicn;		/* generalizedHarmonicNumber(n, s) */
+	double		alpha;
+	double		beta;
+	double		eta;
+
+	uint64		last_used;		/* last used logical time */
+} ZipfCell;
+
+/*
+ * Zipf cache for zeta values
+ */
+typedef struct
+{
+	uint64		current;		/* counter for LRU cache replacement algorithm */
+
+	int			nb_cells;		/* number of filled cells */
+	int			overflowCount;	/* number of cache overflows */
+	ZipfCell	cells[ZIPF_CACHE_SIZE];
+} ZipfCache;
+
+/*
  * Thread state
  */
 typedef struct
@@ -342,6 +374,8 @@ typedef struct
 	unsigned short random_state[3]; /* separate randomness for each thread */
 	int64		throttle_trigger;	/* previous/next throttling (us) */
 	FILE	   *logfile;		/* where to log, or NULL */
+	ZipfCache	zipf_cache;		/* for thread-safe  zipfian random number
+ * generation */
 
 	/* per thread collected stats */
 	instr_time	start_time;		/* thread start time */
@@ -746,6 +780,137 @@ getPoissonRand(TState *thread, int64 center)
 	return (int64) (-log(uniform) * ((double) center) + 0.5);
 }
 
+/* helper function for getZipfianRand */
+static double
+generalizedHarmonicNumber(int64 n, double s)
+{
+	int			i;
+	double		ans = 0.0;
+
+	for (i = n; i > 1; i--)
+		ans += pow(i, -s);
+	return ans + 1.0;
+}
+
+/* set harmonicn and other parameters to cache cell */
+static void
+zipfSetCacheCell(ZipfCell * cell, int64 n, double s)
+{
+	double		harmonic2;
+
+	cell->n = n;
+	cell->s = s;
+
+	harmonic2 = generalizedHarmonicNumber(2, s);
+	cell->harmonicn = generalizedHarmonicNumber(n, s);
+
+	cell->alpha = 1.0 / (1.0 - s);
+	cell->beta = pow(0.5, s);
+	cell->eta = (1.0 - pow(2.0 / n, 1.0 - s)) / (1.0 - harmonic2 / cell->harmonicn);
+}
+
+/*
+ * search for cache cell with keys (n, s)
+ * and create new cell if it does not exist
+ */
+static ZipfCell *
+zipfFindOrCreateCacheCell(ZipfCache * cache, int64 n, double s)
+{
+	int			i,
+	

Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-11-19 Thread Alik Khilazhev
Hello Fabien,

Sorry for not respondingfor long time. 

> Two typos:
> - "usinng" -> "using"
> - "a rejection method used" -> "a rejection method is used"
> 
> I'm not sure of "least_recently_used_i", this naming style is not used in 
> pgbench. "least_recently_used" would be ok.
> 
> "..nb_cells.. != ZIPF_CACHE_SIZE", ISTM that "<" is more logical,
> even if the result is the same?

Fixed.

> Maybe the cache overflow could be counted, to allow for a possible warning 
> message in the final report?
Also done. But one question: Should it be done by printing to stdout or stderr ?

> When/if the pgbench tap test patch get through, coverage tests should
> be added.

I have added few tests and I am not sure if they OK or not. It was my first 
experience with perl and tests in PostgreSQL. 


pgbench-zipf-09v.patch
Description: Binary data

—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company