Hello Amit,

  - use search_path to find at most one pgbench_accounts
    It still uses left join because I still think that it is appropriate.
    I added a lateral to avoid repeating the array_position call
    to manage the search_path, and use explicit pg_catalog everywhere.

It would be good if you can add some more comments to explain the
intent of query.

Indeed, I put too few comments on the query.

+ if (ps == NULL)
+ partition_method = PART_NONE;

When can we expect ps as NULL?  If this is not a valid case, then
probably and Assert would be better.

No, ps is really NULL if there is no partitioning, because of the LEFT JOIN and pg_partitioned_table is just empty in that case.

The last else where there is an unexpected entry is different, see comments about v11 below.

+ else if (PQntuples(res) == 0)
+ {
+ /* no pgbench_accounts found, builtin script should fail later */
+ partition_method = PART_NONE;
+ partitions = -1;

If we don't find pgbench_accounts, let's give error here itself rather
than later unless you have a valid case in mind.

I thought of it, but decided not to: Someone could add a builtin script which does not use pgbench_accounts, or a parallel running script could create a table dynamically, whatever, so I prefer the error to be raised by the script itself, rather than deciding that it will fail before even trying.

+ /*
+ * Partition information. Assume no partitioning on any failure, so as
+ * to avoid failing on an older version.
+ */
..
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ /* probably an older version, coldly assume no partitioning */
+ partition_method = PART_NONE;
+ partitions = 0;
+ }

So, here we are silently absorbing the error when pgbench is executed
against older server version which doesn't support partitioning.

Yes, exactly.

If that is the case, then I think if user gives --partitions for the old server version, it will also give an error?

Yes, on -i it will fail because the syntax will not be recognized.

It is not clear in documentation whether we support or not using pgbench with older server versions.

Indeed. We more or less do in practice. Command "psql" works back to 8 AFAICR, and pgbench as well.

I guess it didn't matter, but with this feature, it can matter. Do we need to document this?

This has been discussed in the past, and the conclusion was that it was not worth the effort. We just try not to break things if it is avoidable. On this regard, the patch slightly changes FILLFACTOR output, which is removed if the value is 100 (%) as it is the default, which means that table creation would work on very very old version which did not support fillfactor, unless you specify a lower percentage.

Attached v11:

 - add quite a few comments on the pg_catalog query

 - reverts the partitions >= 1 test; If some new partition method is
   added that pgbench does not know about, the failure mode will be that
   nothing is printed rather than printing something strange like
   "method none with 2 partitions".

--
Fabien.
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index c857aa3cba..e3a0abb4c7 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -306,6 +306,31 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--partitions=<replaceable>NUM</replaceable></option></term>
+      <listitem>
+       <para>
+        Create a partitioned <literal>pgbench_accounts</literal> table with
+        <replaceable>NUM</replaceable> partitions of nearly equal size for
+        the scaled number of accounts.
+        Default is <literal>0</literal>, meaning no partitioning.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--partition-method=<replaceable>NAME</replaceable></option></term>
+      <listitem>
+       <para>
+        Create a partitioned <literal>pgbench_accounts</literal> table with
+        <replaceable>NAME</replaceable> method.
+        Expected values are <literal>range</literal> or <literal>hash</literal>.
+        This option requires that <option>--partitions</option> is set to non-zero.
+        If unspecified, default is <literal>range</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>--tablespace=<replaceable>tablespace</replaceable></option></term>
       <listitem>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ed7652bfbf..c07ed42bbb 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -186,6 +186,15 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
+/* partitioning for pgbench_accounts table, 0 for no partitioning, -1 for bad */
+static int 		partitions = 0;
+
+typedef enum { PART_NONE, PART_RANGE, PART_HASH }
+  partition_method_t;
+
+static partition_method_t partition_method = PART_NONE;
+static const char *PARTITION_METHOD[] = { "none", "range", "hash" };
+
 /* random seed used to initialize base_random_sequence */
 int64		random_seed = -1;
 
@@ -617,6 +626,9 @@ usage(void)
 		   "  --foreign-keys           create foreign key constraints between tables\n"
 		   "  --index-tablespace=TABLESPACE\n"
 		   "                           create indexes in the specified tablespace\n"
+		   "  --partitions=NUM         partition pgbench_accounts in NUM parts (default: 0)\n"
+		   "  --partition-method=(range|hash)\n"
+		   "                           partition pgbench_accounts with this method (default: range)\n"
 		   "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tables        create tables as unlogged tables\n"
 		   "\nOptions to select what to run:\n"
@@ -3601,6 +3613,17 @@ initDropTables(PGconn *con)
 					 "pgbench_tellers");
 }
 
+/*
+ * add fillfactor percent option if not 100.
+ */
+static void
+append_fillfactor(char *opts, int len)
+{
+	if (fillfactor < 100)
+		snprintf(opts + strlen(opts), len - strlen(opts),
+				 " with (fillfactor=%d)", fillfactor);
+}
+
 /*
  * Create pgbench's standard tables
  */
@@ -3664,9 +3687,15 @@ initCreateTables(PGconn *con)
 
 		/* Construct new create table statement. */
 		opts[0] = '\0';
-		if (ddl->declare_fillfactor)
+
+		/* Partition pgbench_accounts table */
+		if (partitions >= 1 && strcmp(ddl->table, "pgbench_accounts") == 0)
 			snprintf(opts + strlen(opts), sizeof(opts) - strlen(opts),
-					 " with (fillfactor=%d)", fillfactor);
+					 " partition by %s (aid)", PARTITION_METHOD[partition_method]);
+		else if (ddl->declare_fillfactor)
+			/* fillfactor is only expected on actual tables */
+			append_fillfactor(opts, sizeof(opts));
+
 		if (tablespace != NULL)
 		{
 			char	   *escape_tablespace;
@@ -3686,6 +3715,57 @@ initCreateTables(PGconn *con)
 
 		executeStatement(con, buffer);
 	}
+
+	/* if needed, pgbench_accounts partitions must be created manually */
+	if (partitions >= 1)
+	{
+		char		ff[64];
+
+		ff[0] = '\0';
+		append_fillfactor(ff, sizeof(ff));
+
+		fprintf(stderr, "creating %d partitions...\n", partitions);
+
+		for (int p = 1; p <= partitions; p++)
+		{
+			char		query[256];
+
+			if (partition_method == PART_RANGE)
+			{
+				int64		part_size = (naccounts * (int64) scale + partitions - 1) / partitions;
+				char		minvalue[32], maxvalue[32];
+
+				/* For RANGE, we use open-ended partitions at the beginning and end */
+				if (p == 1)
+					sprintf(minvalue, "minvalue");
+				else
+					sprintf(minvalue, INT64_FORMAT, (p - 1) * part_size + 1);
+
+				if (p < partitions)
+					sprintf(maxvalue, INT64_FORMAT, p * part_size + 1);
+				else
+					sprintf(maxvalue, "maxvalue");
+
+				snprintf(query, sizeof(query),
+						 "create%s table pgbench_accounts_%d\n"
+						 "  partition of pgbench_accounts\n"
+						 "  for values from (%s) to (%s)%s\n",
+						 unlogged_tables ? " unlogged" : "", p,
+						 minvalue, maxvalue, ff);
+			}
+			else if (partition_method == PART_HASH)
+				snprintf(query, sizeof(query),
+						 "create%s table pgbench_accounts_%d\n"
+						 "  partition of pgbench_accounts\n"
+						 "  for values with (modulus %d, remainder %d)%s\n",
+						 unlogged_tables ? " unlogged" : "", p,
+						 partitions, p-1, ff);
+			else /* cannot get there */
+				Assert(0);
+
+			executeStatement(con, query);
+		}
+	}
 }
 
 /*
@@ -4919,6 +4999,10 @@ printResults(StatsData *total, instr_time total_time,
 	printf("transaction type: %s\n",
 		   num_scripts == 1 ? sql_script[0].desc : "multiple scripts");
 	printf("scaling factor: %d\n", scale);
+	/* only print partitioning information if some partitioning was detected */
+	if (partition_method != PART_NONE)
+		printf("partition method: %s\npartitions: %d\n",
+				PARTITION_METHOD[partition_method], partitions);
 	printf("query mode: %s\n", QUERYMODE[querymode]);
 	printf("number of clients: %d\n", nclients);
 	printf("number of threads: %d\n", nthreads);
@@ -5126,6 +5210,8 @@ main(int argc, char **argv)
 		{"foreign-keys", no_argument, NULL, 8},
 		{"random-seed", required_argument, NULL, 9},
 		{"show-script", required_argument, NULL, 10},
+		{"partitions", required_argument, NULL, 11},
+		{"partition-method", required_argument, NULL, 12},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -5486,6 +5572,29 @@ main(int argc, char **argv)
 					exit(0);
 				}
 				break;
+			case 11:			/* partitions */
+				initialization_option_set = true;
+				partitions = atoi(optarg);
+				if (partitions < 0)
+				{
+					fprintf(stderr, "invalid number of partitions: \"%s\"\n",
+							optarg);
+					exit(1);
+				}
+				break;
+			case 12:			/* partition-method */
+				initialization_option_set = true;
+				if (pg_strcasecmp(optarg, "range") == 0)
+					partition_method = PART_RANGE;
+				else if (pg_strcasecmp(optarg, "hash") == 0)
+					partition_method = PART_HASH;
+				else
+				{
+					fprintf(stderr, "invalid partition method, expecting \"range\" or \"hash\","
+							" got: \"%s\"\n", optarg);
+					exit(1);
+				}
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -5567,6 +5676,16 @@ main(int argc, char **argv)
 			exit(1);
 		}
 
+		if (partitions == 0 && partition_method != PART_NONE)
+		{
+			fprintf(stderr, "--partition-method requires greater than zero --partitions\n");
+			exit(1);
+		}
+
+		/* set default method */
+		if (partitions > 0 && partition_method == PART_NONE)
+			partition_method = PART_RANGE;
+
 		if (initialize_steps == NULL)
 			initialize_steps = pg_strdup(DEFAULT_INIT_STEPS);
 
@@ -5756,6 +5875,61 @@ main(int argc, char **argv)
 			fprintf(stderr,
 					"scale option ignored, using count from pgbench_branches table (%d)\n",
 					scale);
+
+		/*
+		 * Gather partition information from pg_catalog.
+		 *
+		 * We Assume no partitioning on any failure, so as to avoid failing
+		 * on an older version.
+		 */
+		res = PQexec(con,
+					 "select o.n, p.partstrat, pg_catalog.count(p.partrelid) "
+					 /* for all tables */
+					 "from pg_catalog.pg_class as c "
+					 /* get the schema corresponding to the previous table */
+					 "join pg_catalog.pg_namespace as n on (n.oid = c.relnamespace) "
+					 /* get this schema order in search_path */
+					 "cross join lateral (select pg_catalog.array_position(pg_catalog.current_schemas(true), n.nspname)) as o(n) "
+					 /* check whether it is partitionned */
+					 "left join pg_catalog.pg_partitioned_table as p on (p.partrelid = c.oid) "
+					 /* fetch actual partitions which inherits the main table */
+					 "left join pg_catalog.pg_inherits as i on (c.oid = i.inhparent) "
+					 /* check table name and that schema was in search_path */
+					 "where c.relname = 'pgbench_accounts' and o.n is not null "
+					 /* count partitions, possibly 0 */
+					 "group by 1, 2 "
+					 /* and only keep the first encountered */
+					 "order by 1 asc "
+					 "limit 1");
+		if (PQresultStatus(res) != PGRES_TUPLES_OK)
+		{
+			/* probably an older version, coldly assume no partitioning */
+			partition_method = PART_NONE;
+			partitions = 0;
+		}
+		else if (PQntuples(res) == 0)
+		{
+			/* no pgbench_accounts found, builtin script should fail later */
+			partition_method = PART_NONE;
+			partitions = -1;
+		}
+		else
+		{
+			/* PQntupes(res) == 1: normal case, extract the partition status */
+			char *ps = PQgetvalue(res, 0, 1);
+
+			if (ps == NULL)
+				partition_method = PART_NONE;
+			else if (strcmp(ps, "r") == 0)
+				partition_method = PART_RANGE;
+			else if (strcmp(ps, "h") == 0)
+				partition_method = PART_HASH;
+			else /* unexpected partitioning method, ignore it... */
+				partition_method = PART_NONE;
+
+			partitions = atoi(PQgetvalue(res, 0, 2));
+		}
+		PQclear(res);
 	}
 
 	/*
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index b82d3f65c4..fb0f6b677d 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -58,6 +58,17 @@ sub pgbench
 	return;
 }
 
+# tablespace for testing
+my $ts = $node->basedir . '/regress_pgbench_tap_1_ts_dir';
+mkdir $ts or die "cannot create directory $ts";
+my $ets = TestLib::perl2host($ts);
+# add needed escaping!
+$ets =~ s/'/''/;
+
+$node->safe_psql('postgres',
+	"CREATE TABLESPACE regress_pgbench_tap_1_ts LOCATION '$ets';"
+);
+
 # Test concurrent OID generation via pg_enum_oid_index.  This indirectly
 # exercises LWLock and spinlock concurrency.
 my $labels = join ',', map { "'l$_'" } 1 .. 1000;
@@ -100,12 +111,13 @@ pgbench(
 
 # Again, with all possible options
 pgbench(
-	'--initialize --init-steps=dtpvg --scale=1 --unlogged-tables --fillfactor=98 --foreign-keys --quiet --tablespace=pg_default --index-tablespace=pg_default',
+	'--initialize --init-steps=dtpvg --scale=1 --unlogged-tables --fillfactor=98 --foreign-keys --quiet --tablespace=regress_pgbench_tap_1_ts --index-tablespace=regress_pgbench_tap_1_ts --partitions=2 --partition-method=hash',
 	0,
 	[qr{^$}i],
 	[
 		qr{dropping old tables},
 		qr{creating tables},
+		qr{creating 2 partitions},
 		qr{vacuuming},
 		qr{creating primary keys},
 		qr{creating foreign keys},
@@ -116,12 +128,13 @@ pgbench(
 
 # Test interaction of --init-steps with legacy step-selection options
 pgbench(
-	'--initialize --init-steps=dtpvgvv --no-vacuum --foreign-keys --unlogged-tables',
+	'--initialize --init-steps=dtpvgvv --no-vacuum --foreign-keys --unlogged-tables --partitions=3',
 	0,
 	[qr{^$}],
 	[
 		qr{dropping old tables},
 		qr{creating tables},
+		qr{creating 3 partitions},
 		qr{creating primary keys},
 		qr{.* of .* tuples \(.*\) done},
 		qr{creating foreign keys},
@@ -909,6 +922,8 @@ pgbench(
 check_pgbench_logs($bdir, '001_pgbench_log_3', 1, 10, 10,
 	qr{^\d \d{1,2} \d+ \d \d+ \d+$});
 
+$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
+
 # done
 $node->stop;
 done_testing();
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index f7fa18418b..1e9542af3f 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -157,6 +157,13 @@ my @options = (
 			qr{error while setting random seed from --random-seed option}
 		]
 	],
+	[ 'bad partition type', '-i --partition-method=BAD', [qr{"range"}, qr{"hash"}, qr{"BAD"}] ],
+	[ 'bad partition number', '-i --partitions -1', [ qr{invalid number of partitions: "-1"} ] ],
+	[
+		'partition method without partitioning',
+		'-i --partition-method=hash',
+		[ qr{partition-method requires greater than zero --partitions} ]
+	],
 
 	# logging sub-options
 	[

Reply via email to