Please find attached 2 patches, which are a split of the patch discussed
in this thread.
(A) add gaussian & exponential options to pgbench \setrandom
the patch includes sql test files.
There is no change in the *code* since the previously reviewed
submissions, so I do not think that it needs another review on that
account.
However, I have (yet again) reworked the *documentation* (for Andres Freund
& Robert Haas); in particular, both descriptions now follow the same
structure (introduction, formula, intuition, rule of thumb and
constraint). I have differentiated the concept from the option by putting
the latter in <literal> tags, and added links to the corresponding
Wikipedia pages.
Please bear in mind that:
1. English is not my native language.
2. this is not easy reading... this is maths, to read slowly:-)
3. wordsmithing contributions are welcome.
I somewhat assume that a user interested in gaussian & exponential
distributions knows a little bit about probability...
(B) add pgbench test variants with gauss & exponential.
I have reworked the patch so as to avoid copy-pasting the 3 test cases, as
requested by Andres Freund, so this is new, although quite simple, code.
I have also added explanations in the documentation about how to interpret
the "decile" outputs, so as to hopefully address Robert Haas's comments.
--
Fabien.
diff --git a/contrib/pgbench/README b/contrib/pgbench/README
new file mode 100644
index 0000000..6881256
--- /dev/null
+++ b/contrib/pgbench/README
@@ -0,0 +1,5 @@
+# gaussian and exponential tests
+# with XXX as "expo" or "gauss"
+psql test < test-init.sql
+./pgbench -M prepared -f test-XXX-run.sql -t 1000000 -P 1 -n test
+psql test < test-XXX-check.sql
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 4aa8a50..a80c0a5 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -41,6 +41,7 @@
#include <math.h>
#include <signal.h>
#include <sys/time.h>
+#include <assert.h>
#ifdef HAVE_SYS_SELECT_H
#include <sys/select.h>
#endif
@@ -98,6 +99,8 @@ static int pthread_join(pthread_t th, void **thread_return);
#define LOG_STEP_SECONDS 5 /* seconds between log messages */
#define DEFAULT_NXACTS 10 /* default nxacts */
+#define MIN_GAUSSIAN_THRESHOLD 2.0 /* minimum threshold for gauss */
+
int nxacts = 0; /* number of transactions per client */
int duration = 0; /* duration in seconds */
@@ -471,6 +474,76 @@ getrand(TState *thread, int64 min, int64 max)
return min + (int64) ((max - min + 1) * pg_erand48(thread->random_state));
}
+/*
+ * random number generator: exponential distribution from min to max inclusive.
+ * the threshold is so that the density of probability for the last cut-off max
+ * value is exp(-exp_threshold).
+ */
+static int64
+getExponentialrand(TState *thread, int64 min, int64 max, double exp_threshold)
+{
+ double cut, uniform, rand;
+ assert(exp_threshold > 0.0);
+ cut = exp(-exp_threshold);
+ /* erand in [0, 1), uniform in (0, 1] */
+ uniform = 1.0 - pg_erand48(thread->random_state);
+ /*
+ * inner expresion in (cut, 1] (if exp_threshold > 0),
+ * rand in [0, 1)
+ */
+ assert((1.0 - cut) != 0.0);
+ rand = - log(cut + (1.0 - cut) * uniform) / exp_threshold;
+ /* return int64 random number within between min and max */
+ return min + (int64)((max - min + 1) * rand);
+}
+
+/* random number generator: gaussian distribution from min to max inclusive */
+static int64
+getGaussianrand(TState *thread, int64 min, int64 max, double stdev_threshold)
+{
+ double stdev;
+ double rand;
+
+ /*
+ * Get user specified random number from this loop, with
+ * -stdev_threshold < stdev <= stdev_threshold
+ *
+ * This loop is executed until the number is in the expected range.
+ *
+ * As the minimum threshold is 2.0, the probability of looping is low:
+ * sqrt(-2 ln(r)) <= 2 => r >= e^{-2} ~ 0.135, then when taking the average
+ * sinus multiplier as 2/pi, we have a 8.6% looping probability in the
+ * worst case. For a 5.0 threshold value, the looping probability
+ * is about e^{-5} * 2 / pi ~ 0.43%.
+ */
+ do
+ {
+ /*
+ * pg_erand48 generates [0,1), but for the basic version of the
+ * Box-Muller transform the two uniformly distributed random numbers
+ * are expected in (0, 1] (see http://en.wikipedia.org/wiki/Box_muller)
+ */
+ double rand1 = 1.0 - pg_erand48(thread->random_state);
+ double rand2 = 1.0 - pg_erand48(thread->random_state);
+
+ /* Box-Muller basic form transform */
+ double var_sqrt = sqrt(-2.0 * log(rand1));
+ stdev = var_sqrt * sin(2.0 * M_PI * rand2);
+
+ /*
+ * we may try with cos, but there may be a bias induced if the previous
+ * value fails the test? To be on the safe side, let us try over.
+ */
+ }
+ while (stdev < -stdev_threshold || stdev >= stdev_threshold);
+
+ /* stdev is in [-threshold, threshold), normalization to [0,1) */
+ rand = (stdev + stdev_threshold) / (stdev_threshold * 2.0);
+
+ /* return int64 random number within between min and max */
+ return min + (int64)((max - min + 1) * rand);
+}
+
/* call PQexec() and exit() on failure */
static void
executeStatement(PGconn *con, const char *sql)
@@ -1319,6 +1392,7 @@ top:
char *var;
int64 min,
max;
+ double threshold = 0;
char res[64];
if (*argv[2] == ':')
@@ -1364,11 +1438,11 @@ top:
}
/*
- * getrand() needs to be able to subtract max from min and add one
- * to the result without overflowing. Since we know max > min, we
- * can detect overflow just by checking for a negative result. But
- * we must check both that the subtraction doesn't overflow, and
- * that adding one to the result doesn't overflow either.
+ * Generate random number functions need to be able to subtract
+ * max from min and add one to the result without overflowing.
+ * Since we know max > min, we can detect overflow just by checking
+ * for a negative result. But we must check both that the subtraction
+ * doesn't overflow, and that adding one to the result doesn't overflow either.
*/
if (max - min < 0 || (max - min) + 1 < 0)
{
@@ -1377,10 +1451,63 @@ top:
return true;
}
+ if (argc == 4) /* uniform */
+ {
#ifdef DEBUG
- printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getrand(thread, min, max));
+ printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getrand(thread, min, max));
#endif
- snprintf(res, sizeof(res), INT64_FORMAT, getrand(thread, min, max));
+ snprintf(res, sizeof(res), INT64_FORMAT, getrand(thread, min, max));
+ }
+ else if ((pg_strcasecmp(argv[4], "gaussian") == 0) ||
+ (pg_strcasecmp(argv[4], "exponential") == 0))
+ {
+ if (*argv[5] == ':')
+ {
+ if ((var = getVariable(st, argv[5] + 1)) == NULL)
+ {
+ fprintf(stderr, "%s: invalid threshold number %s\n", argv[0], argv[5]);
+ st->ecnt++;
+ return true;
+ }
+ threshold = strtod(var, NULL);
+ }
+ else
+ threshold = strtod(argv[5], NULL);
+
+ if (pg_strcasecmp(argv[4], "gaussian") == 0)
+ {
+ if (threshold < MIN_GAUSSIAN_THRESHOLD)
+ {
+ fprintf(stderr, "%s: gaussian threshold must be at least %f\n", argv[5], MIN_GAUSSIAN_THRESHOLD);
+ st->ecnt++;
+ return true;
+ }
+#ifdef DEBUG
+ printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getGaussianrand(thread, min, max, threshold));
+#endif
+ snprintf(res, sizeof(res), INT64_FORMAT, getGaussianrand(thread, min, max, threshold));
+ }
+ else if (pg_strcasecmp(argv[4], "exponential") == 0)
+ {
+ if (threshold <= 0.0)
+ {
+ fprintf(stderr, "%s: exponential threshold must be strictly positive\n", argv[5]);
+ st->ecnt++;
+ return true;
+ }
+#ifdef DEBUG
+ printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getExponentialrand(thread, min, max, threshold));
+#endif
+ snprintf(res, sizeof(res), INT64_FORMAT, getExponentialrand(thread, min, max, threshold));
+ }
+ }
+ else /* uniform with extra arguments */
+ {
+#ifdef DEBUG
+ printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getrand(thread, min, max));
+#endif
+ snprintf(res, sizeof(res), INT64_FORMAT, getrand(thread, min, max));
+ }
if (!putVariable(st, argv[0], argv[1], res))
{
@@ -1920,9 +2047,34 @@ process_commands(char *buf)
exit(1);
}
- for (j = 4; j < my_commands->argc; j++)
- fprintf(stderr, "%s: extra argument \"%s\" ignored\n",
- my_commands->argv[0], my_commands->argv[j]);
+ if (my_commands->argc == 4 ) /* uniform */
+ {
+ /* nothing to do */
+ }
+ else if ((pg_strcasecmp(my_commands->argv[4], "gaussian") == 0) ||
+ (pg_strcasecmp(my_commands->argv[4], "exponential") == 0))
+ {
+ if (my_commands->argc < 6)
+ {
+ fprintf(stderr, "%s(%s): missing argument\n", my_commands->argv[0], my_commands->argv[4]);
+ exit(1);
+ }
+
+ for (j = 6; j < my_commands->argc; j++)
+ fprintf(stderr, "%s(%s): extra argument \"%s\" ignored\n",
+ my_commands->argv[0], my_commands->argv[4], my_commands->argv[j]);
+ }
+ else /* uniform with extra argument */
+ {
+ int arg_pos = 4;
+
+ if (pg_strcasecmp(my_commands->argv[4], "uniform") == 0)
+ arg_pos++;
+
+ for (j = arg_pos; j < my_commands->argc; j++)
+ fprintf(stderr, "%s(uniform): extra argument \"%s\" ignored\n",
+ my_commands->argv[0], my_commands->argv[j]);
+ }
}
else if (pg_strcasecmp(my_commands->argv[0], "set") == 0)
{
diff --git a/contrib/pgbench/test-expo-check.sql b/contrib/pgbench/test-expo-check.sql
new file mode 100644
index 0000000..fbf35fd
--- /dev/null
+++ b/contrib/pgbench/test-expo-check.sql
@@ -0,0 +1,14 @@
+-- val, min, max, threshold
+CREATE OR REPLACE FUNCTION
+expoProba(INTEGER, INTEGER, INTEGER, DOUBLE PRECISION)
+RETURNS DOUBLE PRECISION IMMUTABLE STRICT AS $$
+ SELECT (exp(-$4*($1-$2)/($3-$2+1)) - exp(-$4*($1-$2+1)/($3-$2+1))) /
+ (1.0 - exp(-$4));
+$$ LANGUAGE SQL;
+
+SELECT SUM(cnt) FROM pgbench_dist;
+
+SELECT id, 1.0*cnt/SUM(cnt) OVER(), expoProba(id, 0, 99, 10.0)
+FROM pgbench_dist
+ORDER BY id;
+
diff --git a/contrib/pgbench/test-expo-run.sql b/contrib/pgbench/test-expo-run.sql
new file mode 100644
index 0000000..1d476bc
--- /dev/null
+++ b/contrib/pgbench/test-expo-run.sql
@@ -0,0 +1,2 @@
+\setrandom id 0 99 exponential 10.0
+UPDATE pgbench_dist SET cnt=cnt+1 WHERE id = :id;
diff --git a/contrib/pgbench/test-gauss-check.sql b/contrib/pgbench/test-gauss-check.sql
new file mode 100644
index 0000000..7d56117
--- /dev/null
+++ b/contrib/pgbench/test-gauss-check.sql
@@ -0,0 +1,57 @@
+-- approximation with a maximal error of 1.2e-07, according to
+-- https://en.wikipedia.org/wiki/Error_function#Numerical_approximation
+CREATE OR REPLACE FUNCTION erf(x DOUBLE PRECISION)
+RETURNS DOUBLE PRECISION IMMUTABLE STRICT AS $$
+DECLARE
+ t DOUBLE PRECISION := 1.0 / ( 1.0 + 0.5 * ABS(x));
+ tau DOUBLE PRECISION;
+BEGIN
+ IF ABS(x) >= 6.0 THEN
+ -- avoid underflow error
+ tau := 0.0;
+ ELSE
+ -- use approximation
+ tau := t * exp(-x*x - 1.26551223
+ + t * (1.00002368
+ + t * (0.37409196
+ + t * (0.09678418
+ + t * (-0.18628806
+ + t * (0.27886807
+ + t * (-1.13520398
+ + t * (1.48851587
+ + t * (-0.82215223
+ + t * 0.17087277)))))))));
+ END IF;
+ IF x >= 0 THEN
+ RETURN 1.0 - tau;
+ ELSE
+ RETURN tau - 1.0;
+ END IF;
+END;
+$$ LANGUAGE plpgsql;
+
+CREATE OR REPLACE FUNCTION PHI(DOUBLE PRECISION)
+RETURNS DOUBLE PRECISION IMMUTABLE STRICT AS $$
+ SELECT 0.5 * ( 1.0 + erf( $1 / SQRT(2.0) ) );
+$$ LANGUAGE SQL;
+
+CREATE OR REPLACE FUNCTION
+gaussianProba(i INTEGER, mini INTEGER, maxi INTEGER, threshold DOUBLE PRECISION)
+RETURNS DOUBLE PRECISION IMMUTABLE STRICT AS $$
+DECLARE
+ extent DOUBLE PRECISION;
+ mu DOUBLE PRECISION;
+BEGIN
+ extent := maxi - mini + 1.0;
+ mu := 0.5 * (maxi + mini);
+ RETURN (PHI(2.0 * threshold * (i - mu + 0.5) / extent) -
+ PHI(2.0 * threshold * (i - mu - 0.5) / extent))
+ -- truncated gaussian
+ / ( 2.0 * PHI(threshold) - 1.0 );
+END;
+$$ LANGUAGE plpgsql;
+
+SELECT SUM(cnt) FROM pgbench_dist;
+SELECT id, 1.0*cnt/SUM(cnt) OVER(), gaussianProba(id, 0, 99, 2.0)
+FROM pgbench_dist
+ORDER BY id;
diff --git a/contrib/pgbench/test-gauss-run.sql b/contrib/pgbench/test-gauss-run.sql
new file mode 100644
index 0000000..984a3b4
--- /dev/null
+++ b/contrib/pgbench/test-gauss-run.sql
@@ -0,0 +1,2 @@
+\setrandom id 0 99 gaussian 2.0
+UPDATE pgbench_dist SET cnt=cnt+1 WHERE id = :id;
diff --git a/contrib/pgbench/test-init.sql b/contrib/pgbench/test-init.sql
new file mode 100644
index 0000000..84f7cc9
--- /dev/null
+++ b/contrib/pgbench/test-init.sql
@@ -0,0 +1,4 @@
+DROP TABLE IF EXISTS pgbench_dist;
+CREATE UNLOGGED TABLE pgbench_dist(id SERIAL PRIMARY KEY, cnt INTEGER NOT NULL DEFAULT 0);
+INSERT INTO pgbench_dist(id, cnt)
+ SELECT i, 0 FROM generate_series(0, 99) AS i;
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index f264c24..d6c49d4 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -748,8 +748,8 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
<varlistentry>
<term>
- <literal>\setrandom <replaceable>varname</> <replaceable>min</> <replaceable>max</></literal>
- </term>
+ <literal>\setrandom <replaceable>varname</> <replaceable>min</> <replaceable>max</> [ uniform | [ { gaussian | exponential } <replaceable>threshold</> ] ]</literal>
+ </term>
<listitem>
<para>
@@ -761,9 +761,75 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
</para>
<para>
+ The default random distribution is <literal>uniform</>, that is all
+ values in the range are drawn with equal probability.
+ The <literal>gaussian</> and <literal>exponential</> options change
+ this default, and take a mandatory <replaceable>threshold</> double
+ value that controls the actual distribution.
+ </para>
+
+ <para>
+ <!-- introduction -->
+ With the <literal>gaussian</> option, the interval is mapped onto a
+ standard <ulink url="http://en.wikipedia.org/wiki/Normal_distribution">normal distribution</ulink>
+ (the classical bell-shaped gaussian curve) truncated at
+ <literal>-threshold</> on the left and <literal>+threshold</>
+ on the right.
+ <!-- formula -->
+ To be precise, if <literal>PHI(x)</> is the cumulative distribution
+ function of the standard normal distribution, with mean <literal>mu</>
+ defined as <literal>(max + min) / 2.0</>, then value <replaceable>i</>
+ between <replaceable>min</> and <replaceable>max</> inclusive is drawn
+ with probability:
+ <literal>
+ (PHI(2.0 * threshold * (i - mu + 0.5) / (max - min + 1)) -
+ PHI(2.0 * threshold * (i - mu - 0.5) / (max - min + 1))) /
+ (2.0 * PHI(threshold) - 1.0)
+ </>
+ <!-- intuition -->
+ The larger the <replaceable>threshold</>, the more frequently values
+ close to the middle of the interval are drawn, and the less frequently
+ values close to the <replaceable>min</> and <replaceable>max</> bounds.
+ <!-- rule of thumb -->
+ With a gaussian distribution, about 67% of values are drawn from
+ the middle <literal>1.0 / threshold</> and 95% in the middle
+ <literal>2.0 / threshold</>.
+ <!-- constraint -->
+ The minimum <replaceable>threshold</> is 2.0 for performance of
+ the Box-Muller transform.
+ </para>
+
+ <para>
+ <!-- introduction -->
+ With the <literal>exponential</> option, the <replaceable>threshold</>
+ parameter controls the distribution by truncating a quickly-decreasing
+ <ulink url="http://en.wikipedia.org/wiki/Exponential_distribution">exponential distribution</ulink>
+ at <replaceable>threshold</>, and then projecting onto integers between
+ the bounds.
+ <!-- formula -->
+ To be precise, value <replaceable>i</> between <replaceable>min</> and
+ <replaceable>max</> inclusive is drawn with probability:
+ <literal>(exp(-threshold*(i-min)/(max+1-min)) -
+ exp(-threshold*(i+1-min)/(max+1-min))) / (1.0 - exp(-threshold))</>.
+ <!-- intuition -->
+ Intuitively, the larger the <replaceable>threshold</>, the more
+ frequently values close to <replaceable>min</> are accessed, and the
+ less frequently values close to <replaceable>max</> are accessed.
+ The closer to 0 the threshold, the flatter (more uniform) the access
+ distribution.
+ <!-- rule of thumb -->
+ A crude approximation of the distribution is that the most frequent 1%
+ values in the range, close to <replaceable>min</>, are drawn
+ <replaceable>threshold</>% of the time.
+ <!-- constraint -->
+ The <replaceable>threshold</> value must be strictly positive with the
+ <literal>exponential</> option.
+ </para>
+
+ <para>
Example:
<programlisting>
-\setrandom aid 1 :naccounts
+\setrandom aid 1 :naccounts gaussian 5.0
</programlisting></para>
</listitem>
</varlistentry>
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index a80c0a5..6622d5b 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -174,6 +174,11 @@ bool is_connect; /* establish connection for each transaction */
bool is_latencies; /* report per-command latencies */
int main_pid; /* main process id used in log filename */
+/* gaussian/exponential distribution tests */
+double threshold; /* threshold for gaussian or exponential */
+bool use_gaussian = false;
+bool use_exponential = false;
+
char *pghost = "";
char *pgport = "";
char *login = NULL;
@@ -295,11 +300,11 @@ static int num_commands = 0; /* total number of Command structs */
static int debug = 0; /* debug flag */
/* default scenario */
-static char *tpc_b = {
+static char *tpc_b_fmt = {
"\\set nbranches " CppAsString2(nbranches) " * :scale\n"
"\\set ntellers " CppAsString2(ntellers) " * :scale\n"
"\\set naccounts " CppAsString2(naccounts) " * :scale\n"
- "\\setrandom aid 1 :naccounts\n"
+ "\\setrandom aid 1 :naccounts%s\n"
"\\setrandom bid 1 :nbranches\n"
"\\setrandom tid 1 :ntellers\n"
"\\setrandom delta -5000 5000\n"
@@ -313,11 +318,11 @@ static char *tpc_b = {
};
/* -N case */
-static char *simple_update = {
+static char *simple_update_fmt = {
"\\set nbranches " CppAsString2(nbranches) " * :scale\n"
"\\set ntellers " CppAsString2(ntellers) " * :scale\n"
"\\set naccounts " CppAsString2(naccounts) " * :scale\n"
- "\\setrandom aid 1 :naccounts\n"
+ "\\setrandom aid 1 :naccounts%s\n"
"\\setrandom bid 1 :nbranches\n"
"\\setrandom tid 1 :ntellers\n"
"\\setrandom delta -5000 5000\n"
@@ -329,9 +334,9 @@ static char *simple_update = {
};
/* -S case */
-static char *select_only = {
+static char *select_only_fmt = {
"\\set naccounts " CppAsString2(naccounts) " * :scale\n"
- "\\setrandom aid 1 :naccounts\n"
+ "\\setrandom aid 1 :naccounts%s\n"
"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
};
@@ -378,6 +383,8 @@ usage(void)
" -v, --vacuum-all vacuum all four standard tables before tests\n"
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
" --sampling-rate=NUM fraction of transactions to log (e.g. 0.01 for 1%%)\n"
+ " --exponential=NUM exponential distribution with NUM threshold parameter\n"
+ " --gaussian=NUM gaussian distribution with NUM threshold parameter\n"
"\nCommon options:\n"
" -d, --debug print debugging output\n"
" -h, --host=HOSTNAME database server host or socket directory\n"
@@ -477,36 +484,36 @@ getrand(TState *thread, int64 min, int64 max)
/*
* random number generator: exponential distribution from min to max inclusive.
* the threshold is so that the density of probability for the last cut-off max
- * value is exp(-exp_threshold).
+ * value is exp(-threshold).
*/
static int64
-getExponentialrand(TState *thread, int64 min, int64 max, double exp_threshold)
+getExponentialrand(TState *thread, int64 min, int64 max, double threshold)
{
double cut, uniform, rand;
- assert(exp_threshold > 0.0);
- cut = exp(-exp_threshold);
+ assert(threshold > 0.0);
+ cut = exp(-threshold);
/* erand in [0, 1), uniform in (0, 1] */
uniform = 1.0 - pg_erand48(thread->random_state);
/*
- * inner expresion in (cut, 1] (if exp_threshold > 0),
+ * inner expression in (cut, 1] (if threshold > 0),
* rand in [0, 1)
*/
assert((1.0 - cut) != 0.0);
- rand = - log(cut + (1.0 - cut) * uniform) / exp_threshold;
+ rand = - log(cut + (1.0 - cut) * uniform) / threshold;
/* return int64 random number within between min and max */
return min + (int64)((max - min + 1) * rand);
}
/* random number generator: gaussian distribution from min to max inclusive */
static int64
-getGaussianrand(TState *thread, int64 min, int64 max, double stdev_threshold)
+getGaussianrand(TState *thread, int64 min, int64 max, double threshold)
{
double stdev;
double rand;
/*
* Get user specified random number from this loop, with
- * -stdev_threshold < stdev <= stdev_threshold
+ * -threshold < stdev <= threshold
*
* This loop is executed until the number is in the expected range.
*
@@ -535,10 +542,10 @@ getGaussianrand(TState *thread, int64 min, int64 max, double stdev_threshold)
* value fails the test? To be on the safe side, let us try over.
*/
}
- while (stdev < -stdev_threshold || stdev >= stdev_threshold);
+ while (stdev < -threshold || stdev >= threshold);
/* stdev is in [-threshold, threshold), normalization to [0,1) */
- rand = (stdev + stdev_threshold) / (stdev_threshold * 2.0);
+ rand = (stdev + threshold) / (threshold * 2.0);
/* return int64 random number within between min and max */
return min + (int64)((max - min + 1) * rand);
@@ -2330,6 +2337,18 @@ process_builtin(char *tb)
return my_commands;
}
+/*
+ * compute the probability of the truncated exponential random generation
+ * to draw values in the i-th slot of the range.
+ */
+static double exponentialProbability(int i, int slots, double threshold)
+{
+ assert(1 <= i && i <= slots);
+ return (exp(- threshold * (i - 1) / slots) - exp(- threshold * i / slots)) /
+ (1.0 - exp(- threshold));
+}
+
+
/* print out results */
static void
printResults(int ttype, int64 normal_xacts, int nclients,
@@ -2341,7 +2360,7 @@ printResults(int ttype, int64 normal_xacts, int nclients,
double time_include,
tps_include,
tps_exclude;
- char *s;
+ char *s, *d;
time_include = INSTR_TIME_GET_DOUBLE(total_time);
tps_include = normal_xacts / time_include;
@@ -2357,8 +2376,45 @@ printResults(int ttype, int64 normal_xacts, int nclients,
else
s = "Custom query";
- printf("transaction type: %s\n", s);
+ if (use_gaussian)
+ d = "Gaussian distribution ";
+ else if (use_exponential)
+ d = "Exponential distribution ";
+ else
+ d = ""; /* default uniform case */
+
+ printf("transaction type: %s%s\n", d, s);
printf("scaling factor: %d\n", scale);
+
+ /* output in gaussian distribution benchmark */
+ if (use_gaussian)
+ {
+ int i;
+ printf("pgbench_account's aid selected with a truncated gaussian distribution\n");
+ printf("standard deviation threshold: %.5f\n", threshold);
+ printf("decile percents:");
+ for (i = 2; i <= 20; i = i + 2)
+ printf(" %.1f%%", (double) 50 * (erf (threshold * (1 - 0.1 * (i - 2)) / sqrt(2.0)) -
+ erf (threshold * (1 - 0.1 * i) / sqrt(2.0))) /
+ erf (threshold / sqrt(2.0)));
+ printf("\n");
+ }
+ /* output in exponential distribution benchmark */
+ else if (use_exponential)
+ {
+ int i;
+ printf("pgbench_account's aid selected with a truncated exponential distribution\n");
+ printf("exponential threshold: %.5f\n", threshold);
+ printf("decile percents:");
+ for (i = 1; i <= 10; i++)
+ printf(" %.1f%%",
+ 100.0 * exponentialProbability(i, 10, threshold));
+ printf("\n");
+ printf("probability of first/last percent of the range: %.1f%% %.1f%%\n",
+ 100.0 * exponentialProbability(1, 100, threshold),
+ 100.0 * exponentialProbability(100, 100, threshold));
+ }
+
printf("query mode: %s\n", QUERYMODE[querymode]);
printf("number of clients: %d\n", nclients);
printf("number of threads: %d\n", nthreads);
@@ -2489,6 +2545,8 @@ main(int argc, char **argv)
{"unlogged-tables", no_argument, &unlogged_tables, 1},
{"sampling-rate", required_argument, NULL, 4},
{"aggregate-interval", required_argument, NULL, 5},
+ {"gaussian", required_argument, NULL, 6},
+ {"exponential", required_argument, NULL, 7},
{"rate", required_argument, NULL, 'R'},
{NULL, 0, NULL, 0}
};
@@ -2769,6 +2827,25 @@ main(int argc, char **argv)
}
#endif
break;
+ case 6:
+ use_gaussian = true;
+ threshold = atof(optarg);
+ if(threshold < MIN_GAUSSIAN_THRESHOLD)
+ {
+ fprintf(stderr, "--gaussian=NUM must be at least %f: %f\n",
+ MIN_GAUSSIAN_THRESHOLD, threshold);
+ exit(1);
+ }
+ break;
+ case 7:
+ use_exponential = true;
+ threshold = atof(optarg);
+ if(threshold <= 0.0)
+ {
+ fprintf(stderr, "--exponential=NUM must be more than 0.0\n");
+ exit(1);
+ }
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
@@ -2966,6 +3043,17 @@ main(int argc, char **argv)
}
}
+ /* set :threshold variable */
+ if(getVariable(&state[0], "threshold") == NULL)
+ {
+ snprintf(val, sizeof(val), "%lf", threshold);
+ for (i = 0; i < nclients; i++)
+ {
+ if (!putVariable(&state[i], "startup", "threshold", val))
+ exit(1);
+ }
+ }
+
if (!is_no_vacuum)
{
fprintf(stderr, "starting vacuum...");
@@ -2988,25 +3076,24 @@ main(int argc, char **argv)
srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
/* process builtin SQL scripts */
- switch (ttype)
- {
- case 0:
- sql_files[0] = process_builtin(tpc_b);
- num_files = 1;
- break;
-
- case 1:
- sql_files[0] = process_builtin(select_only);
- num_files = 1;
- break;
-
- case 2:
- sql_files[0] = process_builtin(simple_update);
- num_files = 1;
- break;
-
- default:
- break;
+ if (ttype < 3)
+ {
+ char *fmt, *distribution, *queries;
+ int ret;
+ fmt = (ttype == 0)? tpc_b_fmt:
+ (ttype == 1)? select_only_fmt:
+ (ttype == 2)? simple_update_fmt: NULL;
+ assert(fmt != NULL);
+ distribution =
+ use_gaussian? " gaussian :threshold":
+ use_exponential? " exponential :threshold":
+ "" /* default uniform case */ ;
+ queries = pg_malloc(strlen(fmt) + strlen(distribution) + 1);
+ ret = sprintf(queries, fmt, distribution);
+ assert(ret >= 0);
+ sql_files[0] = process_builtin(queries);
+ num_files = 1;
+ pg_free(queries);
}
/* set up thread data structures */
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index d6c49d4..d217f90 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -307,6 +307,49 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
</varlistentry>
<varlistentry>
+ <term><option>--exponential=</option><replaceable>threshold</></term>
+ <listitem>
+ <para>
+ Run exponential distribution pgbench test using this threshold parameter.
+ The threshold controls the distribution of access frequency on the
+ <structname>pgbench_accounts</> table.
+ See the <literal>\setrandom</> documentation below for details about
+ the impact of the threshold value.
+ When set, this option applies to all test variants (<option>-N</> for
+ skipping updates, or <option>-S</> for selects).
+ </para>
+
+ <para>
+ When run, the output is expanded to show the distribution
+ depending on the <replaceable>threshold</> value:
+
+<screen>
+...
+pgbench_account's aid selected with a truncated exponential distribution
+exponential threshold: 5.00000
+decile percents: 39.6% 24.0% 14.6% 8.8% 5.4% 3.3% 2.0% 1.2% 0.7% 0.4%
+probability of first/last percent of the range: 4.9% 0.0%
+...
+</screen>
+
+ The figures are to be interpreted as follows.
+ If the scaling factor is 10, there are 1,000,000 accounts in
+ <literal>pgbench_accounts</>.
+ The first decile, with <literal>aid</> from 1 to 100,000, is
+ drawn 39.6% of the time, that is about 4 times more than average.
+ The second decile, from 100,001 to 200,000 is drawn 24.0% of the time,
+ that is 2.4 times more than average.
+ Up to the last decile, from 900,001 to 1,000,000, which is drawn
+ 0.4% of the time, well below average.
+ Moreover, the first percent of the range, that is <literal>aid</>
+ from 1 to 10,000, is drawn 4.9% of the time, that is 4.9 times more
+ than average, and the last percent, with <literal>aid</>
+ from 990,001 to 1,000,000, is drawn less than 0.1% of the time.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-f</option> <replaceable>filename</></term>
<term><option>--file=</option><replaceable>filename</></term>
<listitem>
@@ -320,6 +363,44 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
</varlistentry>
<varlistentry>
+ <term><option>--gaussian=</option><replaceable>threshold</></term>
+ <listitem>
+ <para>
+ Run gaussian distribution pgbench test using this threshold parameter.
+ The threshold controls the distribution of access frequency on the
+ <structname>pgbench_accounts</> table.
+ See the <literal>\setrandom</> documentation below for details about
+ the impact of the threshold value.
+ When set, this option applies to all test variants (<option>-N</> for
+ skipping updates, or <option>-S</> for selects).
+ </para>
+
+ <para>
+ When run, the output is expanded to show the distribution
+ depending on the <replaceable>threshold</> value:
+
+<screen>
+...
+pgbench_account's aid selected with a truncated gaussian distribution
+standard deviation threshold: 5.00000
+decile percents: 0.0% 0.1% 2.1% 13.6% 34.1% 34.1% 13.6% 2.1% 0.1% 0.0%
+...
+</screen>
+
+ The figures are to be interpreted as follows.
+ If the scaling factor is 10, there are 1,000,000 accounts in
+ <literal>pgbench_accounts</>.
+ The first decile, with <literal>aid</> from 1 to 100,000, is
+ drawn less than 0.1% of the time.
+ The second, from 100,001 to 200,000 is drawn about 0.1% of the time...
+ up to the fifth decile, from 400,001 to 500,000, which
+ is drawn 34.1% of the time, about 3.4 times more than average,
+ and then the gaussian curve is symmetric for the last five deciles.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-j</option> <replaceable>threads</></term>
<term><option>--jobs=</option><replaceable>threads</></term>
<listitem>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)