Hello Fabien-san,

I have checked your v13 patch and tested the new exponential distribution
generating algorithm. It works fine, with less or no overhead compared to the
previous version.
Great work! And I agree with your proposal.

I'm also interested in your "decile percents" output, shown below:

> [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=20
> ~
> decile percents: 86.5% 11.7% 1.6% 0.2% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
> ~
> [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=10
> ~
> decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
> ~
> [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=5
> ~
> decile percents: 39.6% 24.0% 14.6% 8.8% 5.4% 3.3% 2.0% 1.2% 0.7% 0.4%
> ~

I think it makes the effect of the exponential parameter easy to understand,
and I agree with adding it. So I have created the same decile percents output
for the gaussian distribution.
Here are some examples.

> [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --gaussian=20
> ~
> decile percents: 0.0% 0.0% 0.0% 0.0% 50.0% 50.0% 0.0% 0.0% 0.0% 0.0%
> ~
> [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --gaussian=10
> ~
> decile percents: 0.0% 0.0% 0.0% 2.3% 47.7% 47.7% 2.3% 0.0% 0.0% 0.0%
> ~
> [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --gaussian=5
> ~
> decile percents: 0.0% 0.1% 2.1% 13.6% 34.1% 34.1% 13.6% 2.1% 0.1% 0.0%

I think this is easier to understand than before. The decile percents add up
to 100% (up to rounding of the printed values).


However, I don't like the "highest/lowest percentage" output, because users
may confuse it with the decile percentages, and nobody can understand these
digits at first sight.

Here is an example with exponential=5:
> [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=5
> ~
> decile percents: 39.6% 24.0% 14.6% 8.8% 5.4% 3.3% 2.0% 1.2% 0.7% 0.4%
> highest/lowest percent of the range: 4.9% 0.0%
> ~

I could not understand "4.9%, 0.0%" when I first saw it.
Only after checking the source code did I understand it :( It's not a good design...
#Why does this parameter use 100?
So I'd like to remove it, if you agree. It would be simpler.

The attached patch is the fixed version; please check it.
#Of course, the World Cup is being held now, so I'm not in a hurry at all.

Best regards,
-- 
Mitsumasa KONDO
*** a/contrib/pgbench/pgbench.c
--- b/contrib/pgbench/pgbench.c
***************
*** 41,46 ****
--- 41,47 ----
  #include <math.h>
  #include <signal.h>
  #include <sys/time.h>
+ #include <assert.h>
  #ifdef HAVE_SYS_SELECT_H
  #include <sys/select.h>
  #endif
***************
*** 98,103 **** static int	pthread_join(pthread_t th, void **thread_return);
--- 99,106 ----
  #define LOG_STEP_SECONDS	5	/* seconds between log messages */
  #define DEFAULT_NXACTS	10		/* default nxacts */
  
+ #define MIN_GAUSSIAN_THRESHOLD		2.0	/* minimum threshold for gaussian */
+ 
  int			nxacts = 0;			/* number of transactions per client */
  int			duration = 0;		/* duration in seconds */
  
***************
*** 171,176 **** bool		is_connect;			/* establish connection for each transaction */
--- 174,187 ----
  bool		is_latencies;		/* report per-command latencies */
  int			main_pid;			/* main process id used in log filename */
  
+ /* gaussian distribution tests: */
+ double		stdev_threshold;   /* standard deviation threshold */
+ bool		use_gaussian = false;
+ 
+ /* exponential distribution tests: */
+ double		exp_threshold;   /* threshold for exponential */
+ bool		use_exponential = false;
+ 
  char	   *pghost = "";
  char	   *pgport = "";
  char	   *login = NULL;
***************
*** 332,337 **** static char *select_only = {
--- 343,430 ----
  	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
  };
  
+ /* --exponential case */
+ static char *exponential_tpc_b = {
+ 	"\\set nbranches " CppAsString2(nbranches) " * :scale\n"
+ 	"\\set ntellers " CppAsString2(ntellers) " * :scale\n"
+ 	"\\set naccounts " CppAsString2(naccounts) " * :scale\n"
+ 	"\\setrandom aid 1 :naccounts exponential :exp_threshold\n"
+ 	"\\setrandom bid 1 :nbranches\n"
+ 	"\\setrandom tid 1 :ntellers\n"
+ 	"\\setrandom delta -5000 5000\n"
+ 	"BEGIN;\n"
+ 	"UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;\n"
+ 	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+ 	"UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;\n"
+ 	"UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;\n"
+ 	"INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);\n"
+ 	"END;\n"
+ };
+ 
+ /* --exponential with -N case */
+ static char *exponential_simple_update = {
+ 	"\\set nbranches " CppAsString2(nbranches) " * :scale\n"
+ 	"\\set ntellers " CppAsString2(ntellers) " * :scale\n"
+ 	"\\set naccounts " CppAsString2(naccounts) " * :scale\n"
+ 	"\\setrandom aid 1 :naccounts exponential :exp_threshold\n"
+ 	"\\setrandom bid 1 :nbranches\n"
+ 	"\\setrandom tid 1 :ntellers\n"
+ 	"\\setrandom delta -5000 5000\n"
+ 	"BEGIN;\n"
+ 	"UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;\n"
+ 	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+ 	"INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);\n"
+ 	"END;\n"
+ };
+ 
+ /* --exponential with -S case */
+ static char *exponential_select_only = {
+ 	"\\set naccounts " CppAsString2(naccounts) " * :scale\n"
+ 	"\\setrandom aid 1 :naccounts exponential :exp_threshold\n"
+ 	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+ };
+ 
+ /* --gaussian case */
+ static char *gaussian_tpc_b = {
+ 	"\\set nbranches " CppAsString2(nbranches) " * :scale\n"
+ 	"\\set ntellers " CppAsString2(ntellers) " * :scale\n"
+ 	"\\set naccounts " CppAsString2(naccounts) " * :scale\n"
+ 	"\\setrandom aid 1 :naccounts gaussian :stdev_threshold\n"
+ 	"\\setrandom bid 1 :nbranches\n"
+ 	"\\setrandom tid 1 :ntellers\n"
+ 	"\\setrandom delta -5000 5000\n"
+ 	"BEGIN;\n"
+ 	"UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;\n"
+ 	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+ 	"UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;\n"
+ 	"UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;\n"
+ 	"INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);\n"
+ 	"END;\n"
+ };
+ 
+ /* --gaussian with -N case */
+ static char *gaussian_simple_update = {
+ 	"\\set nbranches " CppAsString2(nbranches) " * :scale\n"
+ 	"\\set ntellers " CppAsString2(ntellers) " * :scale\n"
+ 	"\\set naccounts " CppAsString2(naccounts) " * :scale\n"
+ 	"\\setrandom aid 1 :naccounts gaussian :stdev_threshold\n"
+ 	"\\setrandom bid 1 :nbranches\n"
+ 	"\\setrandom tid 1 :ntellers\n"
+ 	"\\setrandom delta -5000 5000\n"
+ 	"BEGIN;\n"
+ 	"UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;\n"
+ 	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+ 	"INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);\n"
+ 	"END;\n"
+ };
+ 
+ /* --gaussian with -S case */
+ static char *gaussian_select_only = {
+ 	"\\set naccounts " CppAsString2(naccounts) " * :scale\n"
+ 	"\\setrandom aid 1 :naccounts gaussian :stdev_threshold\n"
+ 	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+ };
+ 
  /* Function prototypes */
  static void setalarm(int seconds);
  static void *threadRun(void *arg);
***************
*** 375,380 **** usage(void)
--- 468,475 ----
  		   "  -v, --vacuum-all         vacuum all four standard tables before tests\n"
  		   "  --aggregate-interval=NUM aggregate data over NUM seconds\n"
  		   "  --sampling-rate=NUM      fraction of transactions to log (e.g. 0.01 for 1%%)\n"
+ 		   "  --exponential=NUM        exponential distribution with NUM threshold parameter\n"
+ 		   "  --gaussian=NUM           gaussian distribution with NUM threshold parameter\n"
  		   "\nCommon options:\n"
  		   "  -d, --debug              print debugging output\n"
  	  "  -h, --host=HOSTNAME      database server host or socket directory\n"
***************
*** 471,476 **** getrand(TState *thread, int64 min, int64 max)
--- 566,641 ----
  	return min + (int64) ((max - min + 1) * pg_erand48(thread->random_state));
  }
  
+ /* 
+  * random number generator: exponential distribution from min to max inclusive.
+  * The threshold is chosen so that the probability density at the max cut-off
+  * value is exp(-exp_threshold) times the density at min.
+  */
+ static int64
+ getExponentialrand(TState *thread, int64 min, int64 max, double exp_threshold)
+ {
+ 	double cut, uniform, rand;
+ 	assert(exp_threshold > 0.0);
+ 	cut = exp(-exp_threshold);
+ 	/* erand in [0, 1), uniform in (0, 1] */
+ 	uniform = 1.0 - pg_erand48(thread->random_state);
+ 	/*
+ 	 * inner expression in (cut, 1] (if exp_threshold > 0),
+ 	 * rand in [0, 1)
+ 	 */
+ 	assert((1.0 - cut) != 0.0);
+ 	rand = - log(cut + (1.0 - cut) * uniform) / exp_threshold;
+ 	/* return int64 random number between min and max */
+ 	return min + (int64)((max - min + 1) * rand);
+ }
+ 
+ /* random number generator: gaussian distribution from min to max inclusive */
+ static int64
+ getGaussianrand(TState *thread, int64 min, int64 max, double stdev_threshold)
+ {
+ 	double		stdev;
+ 	double		rand;
+ 
+ 	/*
+ 	 * Get user specified random number from this loop, with
+ 	 * -stdev_threshold <= stdev < stdev_threshold
+ 	 *
+ 	 * This loop is executed until the number is in the expected range.
+ 	 *
+ 	 * As the minimum threshold is 2.0, the probability of looping is low:
+ 	 * a draw is rejected when stdev < -threshold or stdev >= threshold, which
+ 	 * for a standard normal variable has probability 2 * (1 - Phi(threshold)):
+ 	 * about 4.55% in the worst case (threshold 2.0). For a 5.0 threshold
+ 	 * value, the looping probability is only about 5.7e-7.
+ 	 */
+ 	do
+ 	{
+ 		/*
+ 		 * pg_erand48 generates [0,1), but for the basic version of the
+ 		 * Box-Muller transform the two uniformly distributed random numbers
+ 		 * are expected in (0, 1] (see http://en.wikipedia.org/wiki/Box_muller)
+ 		 */
+ 		double rand1 = 1.0 - pg_erand48(thread->random_state);
+ 		double rand2 = 1.0 - pg_erand48(thread->random_state);
+ 
+ 		/* Box-Muller basic form transform */
+ 		double var_sqrt = sqrt(-2.0 * log(rand1));
+ 		stdev = var_sqrt * sin(2.0 * M_PI * rand2);
+ 
+ 		/* 
+ 		 * We could use the cos() value instead, but a bias might be induced if
+ 		 * the previous value failed the test. To be on the safe side, draw again.
+ 		 */
+ 	}
+ 	while (stdev < -stdev_threshold || stdev >= stdev_threshold);
+ 
+ 	/* stdev is in [-threshold, threshold), normalization to [0,1) */
+ 	rand = (stdev + stdev_threshold) / (stdev_threshold * 2.0);
+ 
+ 	/* return int64 random number between min and max */
+ 	return min + (int64)((max - min + 1) * rand);
+ }
+ 
  /* call PQexec() and exit() on failure */
  static void
  executeStatement(PGconn *con, const char *sql)
***************
*** 1319,1324 **** top:
--- 1484,1490 ----
  			char	   *var;
  			int64		min,
  						max;
+ 			double		threshold = 0;
  			char		res[64];
  
  			if (*argv[2] == ':')
***************
*** 1364,1374 **** top:
  			}
  
  			/*
! 			 * getrand() needs to be able to subtract max from min and add one
! 			 * to the result without overflowing.  Since we know max > min, we
! 			 * can detect overflow just by checking for a negative result. But
! 			 * we must check both that the subtraction doesn't overflow, and
! 			 * that adding one to the result doesn't overflow either.
  			 */
  			if (max - min < 0 || (max - min) + 1 < 0)
  			{
--- 1530,1540 ----
  			}
  
  			/*
! 			 * The random number generation functions need to be able to subtract
! 			 * max from min and add one to the result without overflowing. Since
! 			 * we know max > min, we can detect overflow just by checking for a
! 			 * negative result. But we must check both that the subtraction doesn't
! 			 * overflow, and that adding one to the result doesn't overflow either.
  			 */
  			if (max - min < 0 || (max - min) + 1 < 0)
  			{
***************
*** 1377,1386 **** top:
  				return true;
  			}
  
  #ifdef DEBUG
! 			printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getrand(thread, min, max));
  #endif
! 			snprintf(res, sizeof(res), INT64_FORMAT, getrand(thread, min, max));
  
  			if (!putVariable(st, argv[0], argv[1], res))
  			{
--- 1543,1605 ----
  				return true;
  			}
  
+ 			if (argc == 4) /* uniform */
+ 			{
  #ifdef DEBUG
! 				printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getrand(thread, min, max));
  #endif
! 				snprintf(res, sizeof(res), INT64_FORMAT, getrand(thread, min, max));
! 			}
! 			else if ((pg_strcasecmp(argv[4], "gaussian") == 0) ||
! 				 (pg_strcasecmp(argv[4], "exponential") == 0))
! 			{
! 				if (*argv[5] == ':')
! 				{
! 					if ((var = getVariable(st, argv[5] + 1)) == NULL)
! 					{
! 						fprintf(stderr, "%s: invalid threshold number %s\n", argv[0], argv[5]);
! 						st->ecnt++;
! 						return true;
! 					}
! 					threshold = strtod(var, NULL);
! 				}
! 				else
! 					threshold = strtod(argv[5], NULL);
! 
! 				if (pg_strcasecmp(argv[4], "gaussian") == 0)
! 				{
! 					if (threshold < MIN_GAUSSIAN_THRESHOLD)
! 					{
! 						fprintf(stderr, "%s: gaussian threshold must be at least %f\n", argv[5], MIN_GAUSSIAN_THRESHOLD);
! 						st->ecnt++;
! 						return true;
! 					}
! #ifdef DEBUG
! 					printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getGaussianrand(thread, min, max, threshold));
! #endif
! 					snprintf(res, sizeof(res), INT64_FORMAT, getGaussianrand(thread, min, max, threshold));
! 				}
! 				else if (pg_strcasecmp(argv[4], "exponential") == 0)
! 				{
! 					if (threshold <= 0.0)
! 					{
! 						fprintf(stderr, "%s: exponential threshold must be strictly positive\n", argv[5]);
! 						st->ecnt++;
! 						return true;
! 					}
! #ifdef DEBUG
! 					printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getExponentialrand(thread, min, max, threshold));
! #endif
! 					snprintf(res, sizeof(res), INT64_FORMAT, getExponentialrand(thread, min, max, threshold));
! 				}
! 			}
! 			else /* uniform with extra arguments */
! 			{
! #ifdef DEBUG
! 				printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getrand(thread, min, max));
! #endif
! 				snprintf(res, sizeof(res), INT64_FORMAT, getrand(thread, min, max));
! 			}
  
  			if (!putVariable(st, argv[0], argv[1], res))
  			{
***************
*** 1920,1928 **** process_commands(char *buf)
  				exit(1);
  			}
  
! 			for (j = 4; j < my_commands->argc; j++)
! 				fprintf(stderr, "%s: extra argument \"%s\" ignored\n",
! 						my_commands->argv[0], my_commands->argv[j]);
  		}
  		else if (pg_strcasecmp(my_commands->argv[0], "set") == 0)
  		{
--- 2139,2172 ----
  				exit(1);
  			}
  
! 			if (my_commands->argc == 4) /* uniform */
! 			{
! 				/* nothing to do */
! 			}
! 			else if ((pg_strcasecmp(my_commands->argv[4], "gaussian") == 0) ||
! 				 (pg_strcasecmp(my_commands->argv[4], "exponential") == 0))
! 			{
! 				if (my_commands->argc < 6)
! 				{
! 					fprintf(stderr, "%s(%s): missing argument\n", my_commands->argv[0], my_commands->argv[4]);
! 					exit(1);
! 				}
! 
! 				for (j = 6; j < my_commands->argc; j++)
! 					fprintf(stderr, "%s(%s): extra argument \"%s\" ignored\n",
! 							my_commands->argv[0], my_commands->argv[4], my_commands->argv[j]);
! 			}
! 			else /* uniform with extra argument */
! 			{
! 				int arg_pos = 4;
! 
! 				if (pg_strcasecmp(my_commands->argv[4], "uniform") == 0)
! 					arg_pos++;
! 
! 				for (j = arg_pos; j < my_commands->argc; j++)
! 					fprintf(stderr, "%s(uniform): extra argument \"%s\" ignored\n",
! 							my_commands->argv[0], my_commands->argv[j]);
! 			}
  		}
  		else if (pg_strcasecmp(my_commands->argv[0], "set") == 0)
  		{
***************
*** 2178,2183 **** process_builtin(char *tb)
--- 2422,2439 ----
  	return my_commands;
  }
  
+ /* 
+  * compute the probability of the truncated exponential random generation
+  * to draw values in the i-th slot of the range.
+  */
+ static double exponentialProbability(int i, int slots, double threshold)
+ {
+ 	assert(1 <= i && i <= slots);
+ 	return (exp(- threshold * (i - 1) / slots) - exp(- threshold * i / slots)) /
+ 		(1.0 - exp(- threshold));
+ }
+ 
+ 
  /* print out results */
  static void
  printResults(int ttype, int64 normal_xacts, int nclients,
***************
*** 2197,2212 **** printResults(int ttype, int64 normal_xacts, int nclients,
  						(INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads));
  
  	if (ttype == 0)
! 		s = "TPC-B (sort of)";
  	else if (ttype == 2)
! 		s = "Update only pgbench_accounts";
  	else if (ttype == 1)
! 		s = "SELECT only";
  	else
  		s = "Custom query";
  
  	printf("transaction type: %s\n", s);
  	printf("scaling factor: %d\n", scale);
  	printf("query mode: %s\n", QUERYMODE[querymode]);
  	printf("number of clients: %d\n", nclients);
  	printf("number of threads: %d\n", nthreads);
--- 2453,2521 ----
  						(INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads));
  
  	if (ttype == 0)
! 	{
! 		if (use_gaussian)
! 			s = "Gaussian distribution TPC-B (sort of)";
! 		else if (use_exponential)
! 			s = "Exponential distribution TPC-B (sort of)";
! 		else
! 			s = "TPC-B (sort of)";
! 	}
  	else if (ttype == 2)
! 	{
! 		if (use_gaussian)
! 			s = "Gaussian distribution update only pgbench_accounts";
! 		else if (use_exponential)
! 			s = "Exponential distribution update only pgbench_accounts";
! 		else
! 			s = "Update only pgbench_accounts";
! 	}
  	else if (ttype == 1)
! 	{
! 		if (use_gaussian)
! 			s = "Gaussian distribution SELECT only";
! 		else if (use_exponential)
! 			s = "Exponential distribution SELECT only";
! 		else
! 			s = "SELECT only";
! 	}
  	else
  		s = "Custom query";
  
  	printf("transaction type: %s\n", s);
  	printf("scaling factor: %d\n", scale);
+ 
+ 	/* output in gaussian distribution benchmark */
+ 	if (use_gaussian)
+ 	{
+ 		int i;
+ 		printf("standard deviation threshold: %.5f\n", stdev_threshold);
+ 		printf("decile percents:");
+ 		for (i = 2; i <= 20; i = i + 2)
+ 			printf(" %.1f%%", (double) 50 * (erf (stdev_threshold * (1 - 0.1 * (i - 2)) / sqrt(2.0)) -
+ 				erf (stdev_threshold * (1 - 0.1 * i) / sqrt(2.0))) /
+ 				erf (stdev_threshold / sqrt(2.0)));
+ 		printf("\n");
+ //		printf("access probability of top 20%%, 10%% and 5%% records: %.5f %.5f %.5f\n",
+ //			(double) ((erf (stdev_threshold * 0.2 / sqrt(2.0))) / (erf (stdev_threshold / sqrt(2.0)))),
+ //			(double) ((erf (stdev_threshold * 0.1 / sqrt(2.0))) / (erf (stdev_threshold / sqrt(2.0)))),
+ //			(double) ((erf (stdev_threshold * 0.05 / sqrt(2.0))) / (erf (stdev_threshold / sqrt(2.0)))));
+ 	}
+ 	/* output in exponential distribution benchmark */
+ 	else if (use_exponential)
+ 	{
+ 		int i;
+ 		printf("exponential threshold: %.5f\n", exp_threshold);
+ 		printf("decile percents:");
+ 		for (i = 1; i <= 10; i++)
+ 			printf(" %.1f%%",
+ 				   100.0 * exponentialProbability(i, 10, exp_threshold));
+ 		printf("\n");
+ 		printf("highest/lowest percent of the range: %.1f%% %.1f%%\n",
+ 			   100.0 * exponentialProbability(1, 100, exp_threshold),
+ 			   100.0 * exponentialProbability(100, 100, exp_threshold));
+ 	}
+ 
  	printf("query mode: %s\n", QUERYMODE[querymode]);
  	printf("number of clients: %d\n", nclients);
  	printf("number of threads: %d\n", nthreads);
***************
*** 2337,2342 **** main(int argc, char **argv)
--- 2646,2653 ----
  		{"unlogged-tables", no_argument, &unlogged_tables, 1},
  		{"sampling-rate", required_argument, NULL, 4},
  		{"aggregate-interval", required_argument, NULL, 5},
+ 		{"gaussian", required_argument, NULL, 6},
+ 		{"exponential", required_argument, NULL, 7},
  		{"rate", required_argument, NULL, 'R'},
  		{NULL, 0, NULL, 0}
  	};
***************
*** 2617,2622 **** main(int argc, char **argv)
--- 2928,2952 ----
  				}
  #endif
  				break;
+ 			case 6:
+ 				use_gaussian = true;
+ 				stdev_threshold = atof(optarg);
+ 				if (stdev_threshold < MIN_GAUSSIAN_THRESHOLD)
+ 				{
+ 					fprintf(stderr, "--gaussian=NUM must be at least %f: %f\n",
+ 							MIN_GAUSSIAN_THRESHOLD, stdev_threshold);
+ 					exit(1);
+ 				}
+ 				break;
+ 			case 7:
+ 				use_exponential = true;
+ 				exp_threshold = atof(optarg);
+ 				if (exp_threshold <= 0.0)
+ 				{
+ 					fprintf(stderr, "--exponential=NUM must be greater than 0.0\n");
+ 					exit(1);
+ 				}
+ 				break;
  			default:
  				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
  				exit(1);
***************
*** 2814,2819 **** main(int argc, char **argv)
--- 3144,3171 ----
  		}
  	}
  
+ 	/* set :stdev_threshold variable */
+ 	if (getVariable(&state[0], "stdev_threshold") == NULL)
+ 	{
+ 		snprintf(val, sizeof(val), "%lf", stdev_threshold);
+ 		for (i = 0; i < nclients; i++)
+ 		{
+ 			if (!putVariable(&state[i], "startup", "stdev_threshold", val))
+ 				exit(1);
+ 		}
+ 	}
+ 
+ 	/* set :exp_threshold variable */
+ 	if (getVariable(&state[0], "exp_threshold") == NULL)
+ 	{
+ 		snprintf(val, sizeof(val), "%lf", exp_threshold);
+ 		for (i = 0; i < nclients; i++)
+ 		{
+ 			if (!putVariable(&state[i], "startup", "exp_threshold", val))
+ 				exit(1);
+ 		}
+ 	}
+ 
  	if (!is_no_vacuum)
  	{
  		fprintf(stderr, "starting vacuum...");
***************
*** 2839,2855 **** main(int argc, char **argv)
  	switch (ttype)
  	{
  		case 0:
! 			sql_files[0] = process_builtin(tpc_b);
  			num_files = 1;
  			break;
  
  		case 1:
! 			sql_files[0] = process_builtin(select_only);
  			num_files = 1;
  			break;
  
  		case 2:
! 			sql_files[0] = process_builtin(simple_update);
  			num_files = 1;
  			break;
  
--- 3191,3222 ----
  	switch (ttype)
  	{
  		case 0:
! 			if (use_gaussian)
! 				sql_files[0] = process_builtin(gaussian_tpc_b);
! 			else if (use_exponential)
! 				sql_files[0] = process_builtin(exponential_tpc_b);
! 			else
! 				sql_files[0] = process_builtin(tpc_b);
  			num_files = 1;
  			break;
  
  		case 1:
! 			if (use_gaussian)
! 				sql_files[0] = process_builtin(gaussian_select_only);
! 			else if (use_exponential)
! 				sql_files[0] = process_builtin(exponential_select_only);
! 			else
! 				sql_files[0] = process_builtin(select_only);
  			num_files = 1;
  			break;
  
  		case 2:
! 			if (use_gaussian)
! 				sql_files[0] = process_builtin(gaussian_simple_update);
! 			else if (use_exponential)
! 				sql_files[0] = process_builtin(exponential_simple_update);
! 			else
! 				sql_files[0] = process_builtin(simple_update);
  			num_files = 1;
  			break;
  
*** a/doc/src/sgml/pgbench.sgml
--- b/doc/src/sgml/pgbench.sgml
***************
*** 307,312 **** pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
--- 307,327 ----
       </varlistentry>
  
       <varlistentry>
+       <term><option>--exponential=</option><replaceable>threshold</></term>
+       <listitem>
+        <para>
+          Run the pgbench test with an exponential distribution, using this threshold parameter.
+          The threshold controls the distribution of access frequency on the
+          <structname>pgbench_accounts</> table.
+          See the <literal>\setrandom</> documentation below for details about
+          the impact of the threshold value.
+          When set, this option applies to all test variants (<option>-N</> for
+          skipping updates, or <option>-S</> for selects).
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
        <term><option>-f</option> <replaceable>filename</></term>
        <term><option>--file=</option><replaceable>filename</></term>
        <listitem>
***************
*** 320,325 **** pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
--- 335,355 ----
       </varlistentry>
  
       <varlistentry>
+       <term><option>--gaussian=</option><replaceable>threshold</></term>
+       <listitem>
+        <para>
+          Run the pgbench test with a gaussian distribution, using this threshold parameter.
+          The threshold controls the distribution of access frequency on the
+          <structname>pgbench_accounts</> table.
+          See the <literal>\setrandom</> documentation below for details about
+          the impact of the threshold value.
+          When set, this option applies to all test variants (<option>-N</> for
+          skipping updates, or <option>-S</> for selects).
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry>
        <term><option>-j</option> <replaceable>threads</></term>
        <term><option>--jobs=</option><replaceable>threads</></term>
        <listitem>
***************
*** 748,755 **** pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
  
     <varlistentry>
      <term>
!      <literal>\setrandom <replaceable>varname</> <replaceable>min</> <replaceable>max</></literal>
!     </term>
  
      <listitem>
       <para>
--- 778,785 ----
  
     <varlistentry>
      <term>
!      <literal>\setrandom <replaceable>varname</> <replaceable>min</> <replaceable>max</> [ uniform | [ { gaussian | exponential } <replaceable>threshold</> ] ]</literal>
!      </term>
  
      <listitem>
       <para>
***************
*** 761,769 **** pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
       </para>
  
       <para>
        Example:
  <programlisting>
! \setrandom aid 1 :naccounts
  </programlisting></para>
      </listitem>
     </varlistentry>
--- 791,834 ----
       </para>
  
       <para>
+       By default, values are drawn from a uniform distribution. The gaussian
+       and exponential options change the distribution; the mandatory
+       <replaceable>threshold</> value (a double) controls its actual shape.
+      </para>
+ 
+      <para>
+       With the gaussian option, values are drawn from a normal distribution
+       centered on the middle of the interval and truncated at
+       <replaceable>threshold</> standard deviations on each side.
+       The larger the <replaceable>threshold</>, the more often values close to
+       the middle of the interval are drawn, and the less often values close
+       to the <replaceable>min</> and <replaceable>max</> bounds are drawn;
+       the smaller the threshold, the smoother the access pattern distribution.
+       The minimum threshold is 2.0 for performance reasons.
+      </para>
+ 
+      <para>
+       With the exponential option, the <replaceable>threshold</> parameter
+       controls the distribution by truncating an exponential distribution at
+       a specific value, and then projecting onto integers between the bounds.
+       To be precise, the <replaceable>threshold</> is chosen so that the
+       probability density of the exponential distribution at the
+       <replaceable>max</> cut-off value is exp(-threshold) times the density
+       at the <replaceable>min</> value.
+       Intuitively, the larger the threshold, the more frequently values close to
+       <replaceable>min</> are accessed, and the less frequently values close to
+       <replaceable>max</> are accessed.
+       As a crude approximation, the most frequent 1% of the values are drawn
+       about <replaceable>threshold</>% of the time.
+       The closer to 0.0 the threshold, the flatter (more uniform) the access
+       distribution.
+       The threshold value must be strictly positive with the exponential option.
+      </para>
+ 
+      <para>
        Example:
  <programlisting>
! \setrandom aid 1 :naccounts gaussian 5.0
  </programlisting></para>
      </listitem>
     </varlistentry>
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)