On 11/30/23 18:47, Matthias van de Meent wrote:
> ...
>
> I just ran some more tests in less favorable environments, and it
> looks like I hit a bug:
> 
> % SET max_parallel_workers = 0;
> % CREATE INDEX ... USING brin (...);
> ERROR:  cannot update tuples during a parallel operation
> 
> Fix attached in 0002.

Yeah, that's a bug, thanks for the fix. Just jumping to a "cleanup"
label seems a bit cleaner (if that can be said about using goto), so I
tweaked the patch to do that instead.
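
Roughly this shape, to show what I mean (just a sketch, not the exact
hunk - both backout paths funnel into one label that leaves parallel
mode before the serial build updates the index):

    /* sketch only - not the exact code from the patch */
    if (pcxt->seg == NULL)
        goto cleanup;           /* no DSM segment available */

    /* ... set up shared state, launch workers ... */

    if (pcxt->nworkers_launched == 0)
        goto cleanup;           /* workers requested, but none launched */

    /* ... parallel build continues as usual ... */

    return;

cleanup:
    /* leave parallel mode before the serial build updates the index */
    if (IsMVCCSnapshot(snapshot))
        UnregisterSnapshot(snapshot);
    DestroyParallelContext(pcxt);
    ExitParallelMode();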

> In 0003 I add the mentioned backfilling of empty ranges at the end of
> the table. I added it for both normal and parallel index builds, as
> normal builds apparently also didn't yet have this yet.
> 

Right. I was thinking about doing that too, but you beat me to it. I
don't want to bury this in the main patch adding parallel builds, as
it's not really related to parallel CREATE INDEX. And it'd be weird to
have this for parallel builds first, so I rebased it as 0001.

As for the backfilling, I think we need to simplify the code a bit. We
have three places doing essentially the same thing (one for serial
builds, two for parallel builds). That's unnecessarily verbose, and
makes it harder to understand the code. But more importantly, the three
places are not doing exactly the same thing - some increment the current
range before the loop body, some do it at the end of the loop, etc. I
got confused by this
multiple times.

So 0004 simplifies this - the backfilling is done by a function called
from all the places. The main complexity is in ensuring all three places
have the same concept of how to specify the range (of ranges) to fill.

Note: The serial build may have two such places too, but the main loop
in brinbuildCallback() does it range by range. It's a bit less efficient,
as it can't easily use the pre-built empty tuple, but that's fine IMO.
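
To show the shape of the helper I have in mind (a sketch only - the
actual function in 0004 may differ in signature and details):

    /*
     * Sketch only - insert empty BRIN tuples for all page ranges starting
     * at blkno, up to (but not including) endBlkno.
     */
    static void
    brin_fill_empty_ranges(BrinBuildState *state,
                           BlockNumber blkno, BlockNumber endBlkno)
    {
        BrinMemTuple *dtuple;
        BrinTuple  *emptyTuple;
        Size        emptySize;

        if (blkno >= endBlkno)
            return;

        /* build the "empty range" tuple once, then only update the block */
        dtuple = brin_new_memtuple(state->bs_bdesc);
        emptyTuple = brin_form_tuple(state->bs_bdesc, blkno, dtuple, &emptySize);

        while (blkno < endBlkno)
        {
            emptyTuple->bt_blkno = blkno;
            brin_doinsert(state->bs_irel, state->bs_pagesPerRange,
                          state->bs_rmAccess, &state->bs_currentInsertBuf,
                          emptyTuple->bt_blkno, emptyTuple, emptySize);

            blkno += state->bs_pagesPerRange;
        }
    }

With something like that, the serial build, the leader's gap-filling
loop and the final "tail" loop only differ in the (blkno, endBlkno)
pair they pass in.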


skipping the last page range?
-----------------------------

I noticed you explicitly skipped backfilling the empty tuple for the
last page range. Can you explain why? I suspect the idea was that user
activity would trigger building the tuple once that page range is
filled, but we don't really know whether the table will receive any
changes. It might easily be just a static table, in which case the last
range would remain unsummarized. If this is the right thing to do, the
serial build should probably do that too ...

But I don't think that's the correct thing to do - I think CREATE INDEX
is expected to always build a complete index, so my version always
builds index tuples for all table pages.


BlockNumber overflows
---------------------

The one thing I'm not quite sure about is whether this handles
overflows/underflows correctly. I mean, imagine you have a huge table
that's almost 0xFFFFFFFF blocks, pages_per_range is prime, and the last
range ends less than pages_per_range from 0xFFFFFFFF. Then this

    blkno += pages_per_range;

can overflow, and might start inserting index tuples again (so we'd end
up with a duplicate).

I do think the current patch does this correctly, but AFAICS this is a
pre-existing issue ...
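
For example, a guard along these lines (just a sketch of the kind of
check I mean) would stop the loop before the increment can wrap:

    /* sketch: bail out before "blkno += pages_per_range" could wrap around */
    if (blkno > MaxBlockNumber - pages_per_range)
        break;

    blkno += pages_per_range;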

Anyway, while working on this / stress-testing it, I realized there's a
bug in how we allocate the emptyTuple. It's allocated lazily, but that
can easily happen in the per-range context we introduced last week. It
needs to be allocated in a context covering the whole index build.

I think the best way to do that is per 0006, i.e. allocate it in the
BrinBuildState, along with the appropriate memory context.
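
In other words, something roughly like this (simplified sketch with
made-up field names; see 0006 for the actual code):

    /* sketch only - lazily build the empty tuple in a long-lived context */
    if (state->bs_emptyTuple == NULL)
    {
        /* allocate in a context covering the whole index build */
        MemoryContext oldcxt = MemoryContextSwitchTo(state->bs_context);
        BrinMemTuple *dtuple = brin_new_memtuple(state->bs_bdesc);

        state->bs_emptyTuple = brin_form_tuple(state->bs_bdesc, blkno, dtuple,
                                               &state->bs_emptyTupleLen);
        MemoryContextSwitchTo(oldcxt);
    }
    else
    {
        /* we already have the tuple, just update the block number */
        state->bs_emptyTuple->bt_blkno = blkno;
    }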

Obviously, all of this (0002-0006) should be squashed into a single
commit; I only keep it separate to make it clearer what changed.


stress-testing script
---------------------

I'm also attaching the bash script I use to stress test this - it's just
a loop that creates a somewhat random table (different number of rows,
distinct values, ...), maybe deletes some of it, creates an index
(possibly partial), and then does various checks on it (number of
ranges, queries against the table, etc.). It's somewhat primitive, but
it turned out to be very good at triggering bugs in BlockNumber
arithmetic, emptyTuple allocations, etc.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From db38b4c3ff1c8ac3a451da67bde41f125a9922f0 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.von...@postgresql.org>
Date: Sat, 2 Dec 2023 17:55:10 +0100
Subject: [PATCH v8 1/6] backfill pages in serial build

---
 src/backend/access/brin/brin.c | 35 +++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 4f2dfdd17b9..b4bedbdc53c 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -51,6 +51,7 @@ typedef struct BrinBuildState
 	Relation	bs_irel;
 	int			bs_numtuples;
 	Buffer		bs_currentInsertBuf;
+	BlockNumber	bs_tablePages;
 	BlockNumber bs_pagesPerRange;
 	BlockNumber bs_currRangeStart;
 	BrinRevmap *bs_rmAccess;
@@ -82,7 +83,9 @@ typedef struct BrinOpaque
 #define BRIN_ALL_BLOCKRANGES	InvalidBlockNumber
 
 static BrinBuildState *initialize_brin_buildstate(Relation idxRel,
-												  BrinRevmap *revmap, BlockNumber pagesPerRange);
+												  BrinRevmap *revmap,
+												  BlockNumber pagesPerRange,
+												  BlockNumber tablePages);
 static BrinInsertState *initialize_brin_insertstate(Relation idxRel, IndexInfo *indexInfo);
 static void terminate_brin_buildstate(BrinBuildState *state);
 static void brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
@@ -886,7 +889,8 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	BrinRevmap *revmap;
 	BrinBuildState *state;
 	Buffer		meta;
-	BlockNumber pagesPerRange;
+	BlockNumber pagesPerRange,
+				tablePages;
 
 	/*
 	 * We expect to be called exactly once for any index relation.
@@ -933,7 +937,8 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	 * Initialize our state, including the deformed tuple state.
 	 */
 	revmap = brinRevmapInitialize(index, &pagesPerRange);
-	state = initialize_brin_buildstate(index, revmap, pagesPerRange);
+	tablePages = RelationGetNumberOfBlocks(heap);
+	state = initialize_brin_buildstate(index, revmap, pagesPerRange, tablePages);
 
 	/*
 	 * Now scan the relation.  No syncscan allowed here because we want the
@@ -945,6 +950,24 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	/* process the final batch */
 	form_and_insert_tuple(state);
 
+	/* XXX shouldn't this happen in the brinbuildCallback? */
+	state->bs_currRangeStart += state->bs_pagesPerRange;
+	/*
+	 * Backfill the final ranges with empty data.
+	 *
+	 * This saves us from doing what amounts to full table scans when the
+	 * index is built on stupid index quals like WHERE (nonnull_column IS
+	 * NULL).
+	 */
+	while (state->bs_currRangeStart + state->bs_pagesPerRange - 1 < state->bs_tablePages)
+	{
+		brin_memtuple_initialize(state->bs_dtuple, state->bs_bdesc);
+
+		form_and_insert_tuple(state);
+
+		state->bs_currRangeStart += state->bs_pagesPerRange;
+	}
+
 	/* release resources */
 	idxtuples = state->bs_numtuples;
 	brinRevmapTerminate(state->bs_rmAccess);
@@ -1358,7 +1381,7 @@ brinGetStats(Relation index, BrinStatsData *stats)
  */
 static BrinBuildState *
 initialize_brin_buildstate(Relation idxRel, BrinRevmap *revmap,
-						   BlockNumber pagesPerRange)
+						   BlockNumber pagesPerRange, BlockNumber tablePages)
 {
 	BrinBuildState *state;
 
@@ -1368,6 +1391,7 @@ initialize_brin_buildstate(Relation idxRel, BrinRevmap *revmap,
 	state->bs_numtuples = 0;
 	state->bs_currentInsertBuf = InvalidBuffer;
 	state->bs_pagesPerRange = pagesPerRange;
+	state->bs_tablePages = tablePages;
 	state->bs_currRangeStart = 0;
 	state->bs_rmAccess = revmap;
 	state->bs_bdesc = brin_build_desc(idxRel);
@@ -1612,7 +1636,8 @@ brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
 				/* first time through */
 				Assert(!indexInfo);
 				state = initialize_brin_buildstate(index, revmap,
-												   pagesPerRange);
+												   pagesPerRange,
+												   InvalidBlockNumber);
 				indexInfo = BuildIndexInfo(index);
 			}
 			summarize_range(indexInfo, state, heapRel, startBlk, heapNumBlocks);
-- 
2.41.0

From 206f3420b5b406573e6a040ae2954854a0404bf8 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postg...@gmail.com>
Date: Wed, 29 Nov 2023 14:35:10 +0100
Subject: [PATCH v8 2/6] Allow BRIN to build its index in parallel

---
 contrib/bloom/blutils.c                       |   1 +
 doc/src/sgml/indexam.sgml                     |   7 +
 src/backend/access/brin/brin.c                | 916 +++++++++++++++++-
 src/backend/access/gin/ginutil.c              |   1 +
 src/backend/access/gist/gist.c                |   1 +
 src/backend/access/hash/hash.c                |   1 +
 src/backend/access/nbtree/nbtree.c            |   1 +
 src/backend/access/spgist/spgutils.c          |   1 +
 src/backend/access/transam/parallel.c         |   4 +
 src/backend/catalog/index.c                   |   2 +-
 src/backend/utils/sort/tuplesortvariants.c    | 207 ++++
 src/include/access/amapi.h                    |   2 +
 src/include/access/brin.h                     |   3 +
 src/include/utils/tuplesort.h                 |  11 +
 .../modules/dummy_index_am/dummy_index_am.c   |   1 +
 src/tools/pgindent/typedefs.list              |   5 +
 16 files changed, 1147 insertions(+), 17 deletions(-)

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 4830cb3fee6..a781c5d98d6 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -122,6 +122,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->amclusterable = false;
 	amroutine->ampredlocks = false;
 	amroutine->amcanparallel = false;
+	amroutine->amcanbuildparallel = false;
 	amroutine->amcaninclude = false;
 	amroutine->amusemaintenanceworkmem = false;
 	amroutine->amparallelvacuumoptions =
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index f107c43d6a6..cc4135e3940 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -123,6 +123,8 @@ typedef struct IndexAmRoutine
     bool        ampredlocks;
     /* does AM support parallel scan? */
     bool        amcanparallel;
+    /* does AM support parallel build? */
+    bool        amcanbuildparallel;
     /* does AM support columns included with clause INCLUDE? */
     bool        amcaninclude;
     /* does AM use maintenance_work_mem? */
@@ -286,6 +288,11 @@ ambuild (Relation heapRelation,
    and compute the keys that need to be inserted into the index.
    The function must return a palloc'd struct containing statistics about
    the new index.
+   The <structfield>amcanbuildparallel</structfield> flag indicates whether
+   the access method supports parallel index builds. When set to <literal>true</literal>,
+   the system will attempt to allocate parallel workers for the build.
+   Access methods supporting only non-parallel index builds should leave
+   this flag set to <literal>false</literal>.
   </para>
 
   <para>
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index b4bedbdc53c..edf2daad0c3 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -33,6 +33,7 @@
 #include "postmaster/autovacuum.h"
 #include "storage/bufmgr.h"
 #include "storage/freespace.h"
+#include "tcop/tcopprot.h"		/* pgrminclude ignore */
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/datum.h"
@@ -40,7 +41,119 @@
 #include "utils/index_selfuncs.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
+#include "utils/tuplesort.h"
 
+/* Magic numbers for parallel state sharing */
+#define PARALLEL_KEY_BRIN_SHARED		UINT64CONST(0xB000000000000001)
+#define PARALLEL_KEY_TUPLESORT			UINT64CONST(0xB000000000000002)
+#define PARALLEL_KEY_QUERY_TEXT			UINT64CONST(0xB000000000000003)
+#define PARALLEL_KEY_WAL_USAGE			UINT64CONST(0xB000000000000004)
+#define PARALLEL_KEY_BUFFER_USAGE		UINT64CONST(0xB000000000000005)
+
+/*
+ * Status record for spooling/sorting phase.
+ */
+typedef struct BrinSpool
+{
+	Tuplesortstate *sortstate;	/* state data for tuplesort.c */
+	Relation	heap;
+	Relation	index;
+} BrinSpool;
+
+/*
+ * Status for index builds performed in parallel.  This is allocated in a
+ * dynamic shared memory segment.
+ */
+typedef struct BrinShared
+{
+	/*
+	 * These fields are not modified during the build.  They primarily exist
+	 * for the benefit of worker processes that need to create state
+	 * corresponding to that used by the leader.
+	 */
+	Oid			heaprelid;
+	Oid			indexrelid;
+	bool		isconcurrent;
+	BlockNumber pagesPerRange;
+	int			scantuplesortstates;
+
+	/*
+	 * workersdonecv is used to monitor the progress of workers.  All parallel
+	 * participants must indicate that they are done before leader can use
+	 * results built by the workers (and before leader can write the data into
+	 * the index).
+	 */
+	ConditionVariable workersdonecv;
+
+	/*
+	 * mutex protects all following fields.
+	 *
+	 * These fields contain status information of interest to BRIN index
+	 * builds that must work just the same when an index is built in parallel.
+	 */
+	slock_t		mutex;
+
+	/*
+	 * Mutable state that is maintained by workers, and reported back to
+	 * leader at end of the scans.
+	 *
+	 * nparticipantsdone is number of worker processes finished.
+	 *
+	 * reltuples is the total number of input heap tuples.
+	 *
+	 * indtuples is the total number of tuples that made it into the index.
+	 */
+	int			nparticipantsdone;
+	double		reltuples;
+	double		indtuples;
+
+	/*
+	 * ParallelTableScanDescData data follows. Can't directly embed here, as
+	 * implementations of the parallel table scan desc interface might need
+	 * stronger alignment.
+	 */
+} BrinShared;
+
+/*
+ * Return pointer to a BrinShared's parallel table scan.
+ *
+ * c.f. shm_toc_allocate as to why BUFFERALIGN is used, rather than just
+ * MAXALIGN.
+ */
+#define ParallelTableScanFromBrinShared(shared) \
+	(ParallelTableScanDesc) ((char *) (shared) + BUFFERALIGN(sizeof(BrinShared)))
+
+/*
+ * Status for leader in parallel index build.
+ */
+typedef struct BrinLeader
+{
+	/* parallel context itself */
+	ParallelContext *pcxt;
+
+	/*
+	 * nparticipanttuplesorts is the exact number of worker processes
+	 * successfully launched, plus one leader process if it participates as a
+	 * worker (only DISABLE_LEADER_PARTICIPATION builds avoid leader
+	 * participating as a worker).
+	 */
+	int			nparticipanttuplesorts;
+
+	/*
+	 * Leader process convenience pointers to shared state (leader avoids TOC
+	 * lookups).
+	 *
+	 * brinshared is the shared state for entire build.  sharedsort is the
+	 * shared, tuplesort-managed state passed to each process tuplesort.
+	 * snapshot is the snapshot used by the scan iff an MVCC snapshot is
+	 * required.
+	 */
+	BrinShared *brinshared;
+	Sharedsort *sharedsort;
+	Snapshot	snapshot;
+	WalUsage   *walusage;
+	BufferUsage *bufferusage;
+} BrinLeader;
 
 /*
  * We use a BrinBuildState during initial construction of a BRIN index.
@@ -49,7 +162,8 @@
 typedef struct BrinBuildState
 {
 	Relation	bs_irel;
-	int			bs_numtuples;
+	double		bs_numtuples;
+	double		bs_reltuples;
 	Buffer		bs_currentInsertBuf;
 	BlockNumber	bs_tablePages;
 	BlockNumber bs_pagesPerRange;
@@ -57,6 +171,15 @@ typedef struct BrinBuildState
 	BrinRevmap *bs_rmAccess;
 	BrinDesc   *bs_bdesc;
 	BrinMemTuple *bs_dtuple;
+
+	/*
+	 * bs_leader is only present when a parallel index build is performed, and
+	 * only in the leader process (worker processes build their own
+	 * BrinBuildState, without a leader pointer).
+	 */
+	BrinLeader *bs_leader;
+	int			bs_worker_id;
+	BrinSpool  *bs_spool;
 } BrinBuildState;
 
 /*
@@ -91,6 +214,7 @@ static void terminate_brin_buildstate(BrinBuildState *state);
 static void brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
 						  bool include_partial, double *numSummarized, double *numExisting);
 static void form_and_insert_tuple(BrinBuildState *state);
+static void form_and_spill_tuple(BrinBuildState *state);
 static void union_tuples(BrinDesc *bdesc, BrinMemTuple *a,
 						 BrinTuple *b);
 static void brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy);
@@ -98,6 +222,20 @@ static bool add_values_to_range(Relation idxRel, BrinDesc *bdesc,
 								BrinMemTuple *dtup, const Datum *values, const bool *nulls);
 static bool check_null_keys(BrinValues *bval, ScanKey *nullkeys, int nnullkeys);
 
+/* parallel index builds */
+static void _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
+								 bool isconcurrent, int request);
+static void _brin_end_parallel(BrinLeader *btleader, BrinBuildState *state);
+static Size _brin_parallel_estimate_shared(Relation heap, Snapshot snapshot);
+static void _brin_leader_participate_as_worker(BrinBuildState *buildstate,
+											   Relation heap, Relation index);
+static void _brin_parallel_scan_and_build(BrinBuildState *buildstate,
+										  BrinSpool *brinspool,
+										  BrinShared *brinshared,
+										  Sharedsort *sharedsort,
+										  Relation heap, Relation index,
+										  int sortmem, bool progress);
+
 /*
  * BRIN handler function: return IndexAmRoutine with access method parameters
  * and callbacks.
@@ -122,6 +260,7 @@ brinhandler(PG_FUNCTION_ARGS)
 	amroutine->amclusterable = false;
 	amroutine->ampredlocks = false;
 	amroutine->amcanparallel = false;
+	amroutine->amcanbuildparallel = true;
 	amroutine->amcaninclude = false;
 	amroutine->amusemaintenanceworkmem = false;
 	amroutine->amsummarizing = true;
@@ -877,6 +1016,63 @@ brinbuildCallback(Relation index,
 							   values, isnull);
 }
 
+/*
+ * A version of the callback, used by parallel index builds. The main difference
+ * is that instead of writing the BRIN tuples into the index, we write them into
+ * a shared tuplesort, and leave the insertion up to the leader (which may
+ * reorder them a bit etc.). The callback also does not generate empty ranges;
+ * those may be added by the leader when merging results from workers.
+ */
+static void
+brinbuildCallbackParallel(Relation index,
+						  ItemPointer tid,
+						  Datum *values,
+						  bool *isnull,
+						  bool tupleIsAlive,
+						  void *brstate)
+{
+	BrinBuildState *state = (BrinBuildState *) brstate;
+	BlockNumber thisblock;
+
+	thisblock = ItemPointerGetBlockNumber(tid);
+
+	/*
+	 * If we're in a block that belongs to a future range, summarize what
+	 * we've got and start afresh.  Note the scan might have skipped many
+	 * pages, if they were devoid of live tuples; we do not create empty BRIN
+	 * ranges here - the leader is responsible for filling them in.
+	 */
+	if (thisblock > state->bs_currRangeStart + state->bs_pagesPerRange - 1)
+	{
+
+		BRIN_elog((DEBUG2,
+				   "brinbuildCallback: completed a range: %u--%u",
+				   state->bs_currRangeStart,
+				   state->bs_currRangeStart + state->bs_pagesPerRange));
+
+		/* create the index tuple and write it into the tuplesort */
+		form_and_spill_tuple(state);
+
+		/*
+		 * Set state to correspond to the next range (for this block).
+		 *
+		 * This skips ranges that are either empty (and so we don't get any
+		 * tuples to summarize), or processed by other workers. We can't
+		 * differentiate those cases here easily, so we leave it up to the
+		 * leader to fill empty ranges where needed.
+		 */
+		state->bs_currRangeStart
+			= state->bs_pagesPerRange * (thisblock / state->bs_pagesPerRange);
+
+		/* re-initialize state for it */
+		brin_memtuple_initialize(state->bs_dtuple, state->bs_bdesc);
+	}
+
+	/* Accumulate the current tuple into the running state */
+	(void) add_values_to_range(index, state->bs_bdesc, state->bs_dtuple,
+							   values, isnull);
+}
+
 /*
  * brinbuild() -- build a new BRIN index.
  */
@@ -940,36 +1136,106 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	tablePages = RelationGetNumberOfBlocks(heap);
 	state = initialize_brin_buildstate(index, revmap, pagesPerRange, tablePages);
 
+	state->bs_spool = (BrinSpool *) palloc0(sizeof(BrinSpool));
+	state->bs_spool->heap = heap;
+	state->bs_spool->index = index;
+
 	/*
-	 * Now scan the relation.  No syncscan allowed here because we want the
-	 * heap blocks in physical order.
+	 * Attempt to launch parallel worker scan when required
+	 *
+	 * XXX plan_create_index_workers makes the number of workers dependent on
+	 * maintenance_work_mem, requiring 32MB for each worker. That makes sense
+	 * for btree, but not for BRIN, which can do away with much less memory.
+	 * So maybe make that somehow less strict, optionally?
 	 */
-	reltuples = table_index_build_scan(heap, index, indexInfo, false, true,
-									   brinbuildCallback, (void *) state, NULL);
-
-	/* process the final batch */
-	form_and_insert_tuple(state);
+	if (indexInfo->ii_ParallelWorkers > 0)
+		_brin_begin_parallel(state, heap, index, indexInfo->ii_Concurrent,
+							 indexInfo->ii_ParallelWorkers);
 
-	/* XXX shouldn't this happen in the brinbuildCallback? */
-	state->bs_currRangeStart += state->bs_pagesPerRange;
 	/*
-	 * Backfill the final ranges with empty data.
+	 * Now scan the relation.  No syncscan allowed here because we want the
+	 * heap blocks in physical order.
 	 *
-	 * This saves us from doing what amounts to full table scans when the
-	 * index is built on stupid index quals like WHERE (nonnull_column IS
-	 * NULL).
+	 * If a parallel build was requested and at least one worker process was
+	 * successfully launched, set up the coordination state.
 	 */
-	while (state->bs_currRangeStart + state->bs_pagesPerRange - 1 < state->bs_tablePages)
+	if (state->bs_leader)
 	{
-		brin_memtuple_initialize(state->bs_dtuple, state->bs_bdesc);
+		SortCoordinate coordinate;
 
+		coordinate = (SortCoordinate) palloc0(sizeof(SortCoordinateData));
+		coordinate->isWorker = false;
+		coordinate->nParticipants =
+			state->bs_leader->nparticipanttuplesorts;
+		coordinate->sharedsort = state->bs_leader->sharedsort;
+
+
+		/*
+		 * Begin serial/leader tuplesort.
+		 *
+		 * In cases where parallelism is involved, the leader receives the
+		 * same share of maintenance_work_mem as a serial sort (it is
+		 * generally treated in the same way as a serial sort once we return).
+		 * Parallel worker Tuplesortstates will have received only a fraction
+		 * of maintenance_work_mem, though.
+		 *
+		 * We rely on the lifetime of the Leader Tuplesortstate almost not
+		 * overlapping with any worker Tuplesortstate's lifetime.  There may
+		 * be some small overlap, but that's okay because we rely on leader
+		 * Tuplesortstate only allocating a small, fixed amount of memory
+		 * here. When its tuplesort_performsort() is called (by our caller),
+		 * and significant amounts of memory are likely to be used, all
+		 * workers must have already freed almost all memory held by their
+		 * Tuplesortstates (they are about to go away completely, too).  The
+		 * overall effect is that maintenance_work_mem always represents an
+		 * absolute high watermark on the amount of memory used by a CREATE
+		 * INDEX operation, regardless of the use of parallelism or any other
+		 * factor.
+		 */
+		state->bs_spool->sortstate =
+			tuplesort_begin_index_brin(heap, index,
+									   maintenance_work_mem, coordinate,
+									   TUPLESORT_NONE);
+
+		/*
+		 * In parallel mode, wait for workers to complete, and then read all
+		 * tuples from the shared tuplesort and insert them into the index.
+		 */
+		_brin_end_parallel(state->bs_leader, state);
+	}
+	else						/* no parallel index build */
+	{
+		reltuples = table_index_build_scan(heap, index, indexInfo, false, true,
+										   brinbuildCallback, (void *) state, NULL);
+
+		/* process the final batch */
 		form_and_insert_tuple(state);
 
+		/* XXX shouldn't this happen in the brinbuildCallback? */
 		state->bs_currRangeStart += state->bs_pagesPerRange;
+		/*
+		 * Backfill the final ranges with empty data.
+		 *
+		 * This saves us from doing what amounts to full table scans when the
+		 * index is built on stupid index quals like WHERE (nonnull_column IS
+		 * NULL).
+		 */
+		while (state->bs_currRangeStart + state->bs_pagesPerRange - 1 < state->bs_tablePages)
+		{
+			brin_memtuple_initialize(state->bs_dtuple, state->bs_bdesc);
+
+			form_and_insert_tuple(state);
+
+			state->bs_currRangeStart += state->bs_pagesPerRange;
+		}
+
+		/* track the number of relation tuples */
+		state->bs_reltuples = reltuples;
 	}
 
 	/* release resources */
 	idxtuples = state->bs_numtuples;
+	reltuples = state->bs_reltuples;
 	brinRevmapTerminate(state->bs_rmAccess);
 	terminate_brin_buildstate(state);
 
@@ -1389,6 +1655,7 @@ initialize_brin_buildstate(Relation idxRel, BrinRevmap *revmap,
 
 	state->bs_irel = idxRel;
 	state->bs_numtuples = 0;
+	state->bs_reltuples = 0;
 	state->bs_currentInsertBuf = InvalidBuffer;
 	state->bs_pagesPerRange = pagesPerRange;
 	state->bs_tablePages = tablePages;
@@ -1396,6 +1663,9 @@ initialize_brin_buildstate(Relation idxRel, BrinRevmap *revmap,
 	state->bs_rmAccess = revmap;
 	state->bs_bdesc = brin_build_desc(idxRel);
 	state->bs_dtuple = brin_new_memtuple(state->bs_bdesc);
+	state->bs_leader = NULL;
+	state->bs_worker_id = 0;
+	state->bs_spool = NULL;
 
 	return state;
 }
@@ -1688,6 +1958,32 @@ form_and_insert_tuple(BrinBuildState *state)
 	pfree(tup);
 }
 
+/*
+ * Given a deformed tuple in the build state, convert it into the on-disk
+ * format and write it to a (shared) tuplesort (the leader will insert it
+ * into the index later).
+ */
+static void
+form_and_spill_tuple(BrinBuildState *state)
+{
+	BrinTuple  *tup;
+	Size		size;
+
+	/* don't insert empty tuples in parallel build */
+	if (state->bs_dtuple->bt_empty_range)
+		return;
+
+	tup = brin_form_tuple(state->bs_bdesc, state->bs_currRangeStart,
+						  state->bs_dtuple, &size);
+
+	/* write the BRIN tuple to the tuplesort */
+	tuplesort_putbrintuple(state->bs_spool->sortstate, tup, size);
+
+	state->bs_numtuples++;
+
+	pfree(tup);
+}
+
 /*
  * Given two deformed tuples, adjust the first one so that it's consistent
  * with the summary values in both.
@@ -2007,3 +2303,591 @@ check_null_keys(BrinValues *bval, ScanKey *nullkeys, int nnullkeys)
 
 	return true;
 }
+
+static void
+_brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
+					 bool isconcurrent, int request)
+{
+	ParallelContext *pcxt;
+	int			scantuplesortstates;
+	Snapshot	snapshot;
+	Size		estbrinshared;
+	Size		estsort;
+	BrinShared *brinshared;
+	Sharedsort *sharedsort;
+	BrinLeader *brinleader = (BrinLeader *) palloc0(sizeof(BrinLeader));
+	WalUsage   *walusage;
+	BufferUsage *bufferusage;
+	bool		leaderparticipates = true;
+	int			querylen;
+
+#ifdef DISABLE_LEADER_PARTICIPATION
+	leaderparticipates = false;
+#endif
+
+	/*
+	 * Enter parallel mode, and create context for parallel build of brin
+	 * index
+	 */
+	EnterParallelMode();
+	Assert(request > 0);
+	pcxt = CreateParallelContext("postgres", "_brin_parallel_build_main",
+								 request);
+
+	scantuplesortstates = leaderparticipates ? request + 1 : request;
+
+	/*
+	 * Prepare for scan of the base relation.  In a normal index build, we use
+	 * SnapshotAny because we must retrieve all tuples and do our own time
+	 * qual checks (because we have to index RECENTLY_DEAD tuples).  In a
+	 * concurrent build, we take a regular MVCC snapshot and index whatever's
+	 * live according to that.
+	 */
+	if (!isconcurrent)
+		snapshot = SnapshotAny;
+	else
+		snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+	/*
+	 * Estimate size for our own PARALLEL_KEY_BRIN_SHARED workspace.
+	 */
+	estbrinshared = _brin_parallel_estimate_shared(heap, snapshot);
+	shm_toc_estimate_chunk(&pcxt->estimator, estbrinshared);
+	estsort = tuplesort_estimate_shared(scantuplesortstates);
+	shm_toc_estimate_chunk(&pcxt->estimator, estsort);
+
+	shm_toc_estimate_keys(&pcxt->estimator, 2);
+
+	/*
+	 * Estimate space for WalUsage and BufferUsage -- PARALLEL_KEY_WAL_USAGE
+	 * and PARALLEL_KEY_BUFFER_USAGE.
+	 *
+	 * If there are no extensions loaded that care, we could skip this.  We
+	 * have no way of knowing whether anyone's looking at pgWalUsage or
+	 * pgBufferUsage, so do it unconditionally.
+	 */
+	shm_toc_estimate_chunk(&pcxt->estimator,
+						   mul_size(sizeof(WalUsage), pcxt->nworkers));
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+	shm_toc_estimate_chunk(&pcxt->estimator,
+						   mul_size(sizeof(BufferUsage), pcxt->nworkers));
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+	/* Finally, estimate PARALLEL_KEY_QUERY_TEXT space */
+	if (debug_query_string)
+	{
+		querylen = strlen(debug_query_string);
+		shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
+		shm_toc_estimate_keys(&pcxt->estimator, 1);
+	}
+	else
+		querylen = 0;			/* keep compiler quiet */
+
+	/* Everyone's had a chance to ask for space, so now create the DSM */
+	InitializeParallelDSM(pcxt);
+
+	/* If no DSM segment was available, back out (do serial build) */
+	if (pcxt->seg == NULL)
+	{
+		if (IsMVCCSnapshot(snapshot))
+			UnregisterSnapshot(snapshot);
+		DestroyParallelContext(pcxt);
+		ExitParallelMode();
+		return;
+	}
+
+	/* Store shared build state, for which we reserved space */
+	brinshared = (BrinShared *) shm_toc_allocate(pcxt->toc, estbrinshared);
+	/* Initialize immutable state */
+	brinshared->heaprelid = RelationGetRelid(heap);
+	brinshared->indexrelid = RelationGetRelid(index);
+	brinshared->isconcurrent = isconcurrent;
+	brinshared->scantuplesortstates = scantuplesortstates;
+	brinshared->pagesPerRange = buildstate->bs_pagesPerRange;
+	ConditionVariableInit(&brinshared->workersdonecv);
+	SpinLockInit(&brinshared->mutex);
+
+	/* Initialize mutable state */
+	brinshared->nparticipantsdone = 0;
+	brinshared->reltuples = 0.0;
+	brinshared->indtuples = 0.0;
+
+	table_parallelscan_initialize(heap,
+								  ParallelTableScanFromBrinShared(brinshared),
+								  snapshot);
+
+	/*
+	 * Store shared tuplesort-private state, for which we reserved space.
+	 * Then, initialize opaque state using tuplesort routine.
+	 */
+	sharedsort = (Sharedsort *) shm_toc_allocate(pcxt->toc, estsort);
+	tuplesort_initialize_shared(sharedsort, scantuplesortstates,
+								pcxt->seg);
+
+	/*
+	 * Store both shared states in the TOC, so that the worker processes can
+	 * look them up.
+	 */
+	shm_toc_insert(pcxt->toc, PARALLEL_KEY_BRIN_SHARED, brinshared);
+	shm_toc_insert(pcxt->toc, PARALLEL_KEY_TUPLESORT, sharedsort);
+
+	/* Store query string for workers */
+	if (debug_query_string)
+	{
+		char	   *sharedquery;
+
+		sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
+		memcpy(sharedquery, debug_query_string, querylen + 1);
+		shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, sharedquery);
+	}
+
+	/*
+	 * Allocate space for each worker's WalUsage and BufferUsage; no need to
+	 * initialize.
+	 */
+	walusage = shm_toc_allocate(pcxt->toc,
+								mul_size(sizeof(WalUsage), pcxt->nworkers));
+	shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage);
+	bufferusage = shm_toc_allocate(pcxt->toc,
+								   mul_size(sizeof(BufferUsage), pcxt->nworkers));
+	shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufferusage);
+
+	/* Launch workers, saving status for leader/caller */
+	LaunchParallelWorkers(pcxt);
+	brinleader->pcxt = pcxt;
+	brinleader->nparticipanttuplesorts = pcxt->nworkers_launched;
+	if (leaderparticipates)
+		brinleader->nparticipanttuplesorts++;
+	brinleader->brinshared = brinshared;
+	brinleader->sharedsort = sharedsort;
+	brinleader->snapshot = snapshot;
+	brinleader->walusage = walusage;
+	brinleader->bufferusage = bufferusage;
+
+	/* If no workers were successfully launched, back out (do serial build) */
+	if (pcxt->nworkers_launched == 0)
+	{
+		_brin_end_parallel(brinleader, NULL);
+		return;
+	}
+
+	/* Save leader state now that it's clear build will be parallel */
+	buildstate->bs_leader = brinleader;
+
+	/* Join heap scan ourselves */
+	if (leaderparticipates)
+		_brin_leader_participate_as_worker(buildstate, heap, index);
+
+	/*
+	 * Caller needs to wait for all launched workers when we return.  Make
+	 * sure that the failure-to-start case will not hang forever.
+	 */
+	WaitForParallelWorkersToAttach(pcxt);
+}
+
+/*
+ * Shut down workers, destroy parallel context, and end parallel mode.
+ */
+static void
+_brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state)
+{
+	int			i;
+	BrinTuple  *btup;
+	BrinMemTuple *memtuple = NULL;
+	Size		tuplen;
+	BrinShared *brinshared = brinleader->brinshared;
+	BlockNumber prevblkno = InvalidBlockNumber;
+	BrinTuple  *emptyTuple = NULL;
+	Size		emptySize;
+	BrinSpool  *spool;
+	MemoryContext rangeCxt,
+				oldCxt;
+
+	/* Shutdown worker processes */
+	WaitForParallelWorkersToFinish(brinleader->pcxt);
+
+	if (!state)
+		return;
+
+	/* copy the data into the leader state (we have to wait for the workers) */
+	state->bs_reltuples = brinshared->reltuples;
+	state->bs_numtuples = brinshared->indtuples;
+
+	/* do the actual sort in the leader */
+	spool = state->bs_spool;
+	tuplesort_performsort(spool->sortstate);
+
+	/*
+	 * Initialize BrinMemTuple we'll use to union summaries from workers (in
+	 * case they happened to produce parts of the same page range).
+	 */
+	memtuple = brin_new_memtuple(state->bs_bdesc);
+
+	/*
+	 * Create a memory context we'll reset to combine results for a single
+	 * page range (received from the workers). We don't expect a huge number
+	 * of overlaps under regular circumstances (because for large tables the
+	 * chunk size is likely larger than the BRIN page range), but it can
+	 * happen, and the union functions may do all kinds of stuff. So we better
+	 * reset the context once in a while.
+	 */
+	rangeCxt = AllocSetContextCreate(CurrentMemoryContext,
+									 "brin union",
+									 ALLOCSET_DEFAULT_SIZES);
+	oldCxt = MemoryContextSwitchTo(rangeCxt);
+
+	/*
+	 * Read the BRIN tuples from the shared tuplesort, sorted by block number.
+	 * That probably gives us an index that is cheaper to scan, thanks to
+	 * mostly getting data from the same index page as before.
+	 */
+	while ((btup = tuplesort_getbrintuple(spool->sortstate, &tuplen, true)) != NULL)
+	{
+		/* Ranges should be multiples of pages_per_range for the index. */
+		Assert(btup->bt_blkno % brinshared->pagesPerRange == 0);
+
+		/*
+		 * Do we need to union summaries for the same page range?
+		 *
+		 * If this is the first brin tuple we read, then just deform it into
+		 * the memtuple, and continue with the next one from tuplesort. We
+		 * however may need to insert empty summaries into the index.
+		 *
+		 * If it's the same block as the last we saw, we simply union the brin
+		 * tuple into it, and we're done - we don't even need to insert empty
+		 * ranges, because that was done earlier when we saw the first brin
+		 * tuple (for this range).
+		 *
+		 * Finally, if it's not the first brin tuple, and it's not the same
+		 * page range, we need to do the insert and then deform the tuple into
+		 * the memtuple. Then we'll insert empty ranges before the new brin
+		 * tuple, if needed.
+		 */
+		if (prevblkno == InvalidBlockNumber)
+		{
+			/* First brin tuple, just deform it into the memtuple. */
+			memtuple = brin_deform_tuple(state->bs_bdesc, btup, memtuple);
+
+			/* continue to insert empty pages before thisblock */
+		}
+		else if (memtuple->bt_blkno == btup->bt_blkno)
+		{
+			/*
+			 * Not the first brin tuple, but same page range as the previous
+			 * one, so we can merge it into the memtuple.
+			 */
+			union_tuples(state->bs_bdesc, memtuple, btup);
+			continue;
+		}
+		else
+		{
+			BrinTuple  *tmp;
+			Size		len;
+
+			/*
+			 * We got brin tuple for a different page range, so form a brin
+			 * tuple from the memtuple, insert it, and re-init the memtuple
+			 * from the new brin tuple.
+			 */
+			tmp = brin_form_tuple(state->bs_bdesc, memtuple->bt_blkno,
+								  memtuple, &len);
+
+			brin_doinsert(state->bs_irel, state->bs_pagesPerRange, state->bs_rmAccess,
+						  &state->bs_currentInsertBuf, tmp->bt_blkno, tmp, len);
+
+			/*
+			 * Reset the per-output-range context. This frees all the memory
+			 * possibly allocated by the union functions, and also the BRIN
+			 * tuple we just formed and inserted.
+			 */
+			MemoryContextReset(rangeCxt);
+
+			memtuple = brin_deform_tuple(state->bs_bdesc, btup, memtuple);
+
+			/* continue to insert empty pages before thisblock */
+		}
+
+		/* Fill empty ranges for all ranges missing in the tuplesort. */
+		prevblkno = (prevblkno == InvalidBlockNumber) ? 0 : prevblkno;
+		while (prevblkno + state->bs_pagesPerRange < btup->bt_blkno)
+		{
+			/* the missing range */
+			prevblkno += state->bs_pagesPerRange;
+
+			/* Did we already build the empty range? If not, do it now. */
+			if (emptyTuple == NULL)
+			{
+				BrinMemTuple *dtuple = brin_new_memtuple(state->bs_bdesc);
+
+				emptyTuple = brin_form_tuple(state->bs_bdesc, prevblkno, dtuple, &emptySize);
+			}
+			else
+			{
+				/* we already have an "empty range" tuple, just set the block */
+				emptyTuple->bt_blkno = prevblkno;
+			}
+
+			brin_doinsert(state->bs_irel, state->bs_pagesPerRange, state->bs_rmAccess,
+						  &state->bs_currentInsertBuf,
+						  emptyTuple->bt_blkno, emptyTuple, emptySize);
+		}
+
+		prevblkno = btup->bt_blkno;
+	}
+
+	tuplesort_end(spool->sortstate);
+
+	/* Fill empty ranges at the end, for all ranges missing in the tuplesort. */
+	prevblkno = (prevblkno == InvalidBlockNumber) ? 0 : prevblkno;
+	while (prevblkno + state->bs_pagesPerRange < memtuple->bt_blkno)
+	{
+		/* the missing range */
+		prevblkno += state->bs_pagesPerRange;
+
+		/* Did we already build the empty range? If not, do it now. */
+		if (emptyTuple == NULL)
+		{
+			BrinMemTuple *dtuple = brin_new_memtuple(state->bs_bdesc);
+
+			emptyTuple = brin_form_tuple(state->bs_bdesc, prevblkno, dtuple, &emptySize);
+		}
+		else
+		{
+			/* we already have an "empty range" tuple, just set the block */
+			emptyTuple->bt_blkno = prevblkno;
+		}
+
+		brin_doinsert(state->bs_irel, state->bs_pagesPerRange, state->bs_rmAccess,
+					  &state->bs_currentInsertBuf,
+					  emptyTuple->bt_blkno, emptyTuple, emptySize);
+	}
+
+	/* Fill the BRIN tuple for the last page range. */
+	if (prevblkno != InvalidBlockNumber)
+	{
+		BrinTuple  *tmp;
+		Size		len;
+
+		tmp = brin_form_tuple(state->bs_bdesc, memtuple->bt_blkno,
+							  memtuple, &len);
+
+		brin_doinsert(state->bs_irel, state->bs_pagesPerRange, state->bs_rmAccess,
+					  &state->bs_currentInsertBuf, tmp->bt_blkno, tmp, len);
+
+		pfree(tmp);
+	}
+
+	/*
+	 * Switch back to the original memory context, and destroy the one we
+	 * created to isolate the union_tuple calls.
+	 */
+	MemoryContextSwitchTo(oldCxt);
+	MemoryContextDelete(rangeCxt);
+
+	/*
+	 * Next, accumulate WAL usage.  (This must wait for the workers to finish,
+	 * or we might get incomplete data.)
+	 */
+	for (i = 0; i < brinleader->pcxt->nworkers_launched; i++)
+		InstrAccumParallelQuery(&brinleader->bufferusage[i], &brinleader->walusage[i]);
+
+	/* Free last reference to MVCC snapshot, if one was used */
+	if (IsMVCCSnapshot(brinleader->snapshot))
+		UnregisterSnapshot(brinleader->snapshot);
+	DestroyParallelContext(brinleader->pcxt);
+	ExitParallelMode();
+}
+
+/*
+ * Returns size of shared memory required to store state for a parallel
+ * brin index build based on the snapshot its parallel scan will use.
+ */
+static Size
+_brin_parallel_estimate_shared(Relation heap, Snapshot snapshot)
+{
+	/* c.f. shm_toc_allocate as to why BUFFERALIGN is used */
+	return add_size(BUFFERALIGN(sizeof(BrinShared)),
+					table_parallelscan_estimate(heap, snapshot));
+}
+
+/*
+ * Within leader, participate as a parallel worker.
+ */
+static void
+_brin_leader_participate_as_worker(BrinBuildState *buildstate, Relation heap, Relation index)
+{
+	BrinLeader *brinleader = buildstate->bs_leader;
+	int			sortmem;
+
+	/* Allocate memory and initialize private spool */
+	buildstate->bs_spool = (BrinSpool *) palloc0(sizeof(BrinSpool));
+	buildstate->bs_spool->heap = heap;
+	buildstate->bs_spool->index = index;
+
+	/*
+	 * Might as well use reliable figure when doling out maintenance_work_mem
+	 * (when requested number of workers were not launched, this will be
+	 * somewhat higher than it is for other workers).
+	 */
+	sortmem = maintenance_work_mem / brinleader->nparticipanttuplesorts;
+
+	/* Perform work common to all participants */
+	_brin_parallel_scan_and_build(buildstate, buildstate->bs_spool, brinleader->brinshared,
+								  brinleader->sharedsort, heap, index, sortmem, true);
+}
+
+/*
+ * Perform a worker's portion of a parallel sort.
+ *
+ * This generates a tuplesort for the passed-in brinspool (unlike btree,
+ * there is no need for a second spool here).  All other spool fields
+ * should already be set when this is called.
+ *
+ * sortmem is the amount of working memory to use within each worker,
+ * expressed in KBs.
+ *
+ * When this returns, workers are done, and need only release resources.
+ */
+static void
+_brin_parallel_scan_and_build(BrinBuildState *state, BrinSpool *brinspool,
+							  BrinShared *brinshared, Sharedsort *sharedsort,
+							  Relation heap, Relation index, int sortmem,
+							  bool progress)
+{
+	SortCoordinate coordinate;
+	TableScanDesc scan;
+	double		reltuples;
+	IndexInfo  *indexInfo;
+
+	/* Initialize local tuplesort coordination state */
+	coordinate = palloc0(sizeof(SortCoordinateData));
+	coordinate->isWorker = true;
+	coordinate->nParticipants = -1;
+	coordinate->sharedsort = sharedsort;
+
+	/* Begin "partial" tuplesort */
+	brinspool->sortstate = tuplesort_begin_index_brin(brinspool->heap,
+													  brinspool->index,
+													  sortmem, coordinate,
+													  TUPLESORT_NONE);
+
+	/* Join parallel scan */
+	indexInfo = BuildIndexInfo(index);
+	indexInfo->ii_Concurrent = brinshared->isconcurrent;
+
+	scan = table_beginscan_parallel(heap,
+									ParallelTableScanFromBrinShared(brinshared));
+
+	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
+									   brinbuildCallbackParallel, state, scan);
+
+	/* insert the last item */
+	form_and_spill_tuple(state);
+
+	/* sort the BRIN ranges built by this worker */
+	tuplesort_performsort(brinspool->sortstate);
+
+	state->bs_reltuples += reltuples;
+
+	/*
+	 * Done.  Record ambuild statistics.
+	 */
+	SpinLockAcquire(&brinshared->mutex);
+	brinshared->nparticipantsdone++;
+	brinshared->reltuples += state->bs_reltuples;
+	brinshared->indtuples += state->bs_numtuples;
+	SpinLockRelease(&brinshared->mutex);
+
+	/* Notify leader */
+	ConditionVariableSignal(&brinshared->workersdonecv);
+
+	tuplesort_end(brinspool->sortstate);
+}
+
+/*
+ * Perform work within a launched parallel process.
+ */
+void
+_brin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
+{
+	char	   *sharedquery;
+	BrinShared *brinshared;
+	Sharedsort *sharedsort;
+	BrinBuildState *buildstate;
+	Relation	heapRel;
+	Relation	indexRel;
+	LOCKMODE	heapLockmode;
+	LOCKMODE	indexLockmode;
+	WalUsage   *walusage;
+	BufferUsage *bufferusage;
+	int			sortmem;
+
+	/*
+	 * The only possible status flag that can be set to the parallel worker is
+	 * PROC_IN_SAFE_IC.
+	 */
+	Assert((MyProc->statusFlags == 0) ||
+		   (MyProc->statusFlags == PROC_IN_SAFE_IC));
+
+	/* Set debug_query_string for individual workers first */
+	sharedquery = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, true);
+	debug_query_string = sharedquery;
+
+	/* Report the query string from leader */
+	pgstat_report_activity(STATE_RUNNING, debug_query_string);
+
+	/* Look up brin shared state */
+	brinshared = shm_toc_lookup(toc, PARALLEL_KEY_BRIN_SHARED, false);
+
+	/* Open relations using lock modes known to be obtained by index.c */
+	if (!brinshared->isconcurrent)
+	{
+		heapLockmode = ShareLock;
+		indexLockmode = AccessExclusiveLock;
+	}
+	else
+	{
+		heapLockmode = ShareUpdateExclusiveLock;
+		indexLockmode = RowExclusiveLock;
+	}
+
+	/* Open relations within worker */
+	heapRel = table_open(brinshared->heaprelid, heapLockmode);
+	indexRel = index_open(brinshared->indexrelid, indexLockmode);
+
+	buildstate = initialize_brin_buildstate(indexRel, NULL,
+											brinshared->pagesPerRange,
+											InvalidBlockNumber);
+
+	/* Initialize worker's own spool */
+	buildstate->bs_spool = (BrinSpool *) palloc0(sizeof(BrinSpool));
+	buildstate->bs_spool->heap = heapRel;
+	buildstate->bs_spool->index = indexRel;
+
+	/* Look up shared state private to tuplesort.c */
+	sharedsort = shm_toc_lookup(toc, PARALLEL_KEY_TUPLESORT, false);
+	tuplesort_attach_shared(sharedsort, seg);
+
+	/* Prepare to track buffer usage during parallel execution */
+	InstrStartParallelQuery();
+
+	/*
+	 * Might as well use reliable figure when doling out maintenance_work_mem
+	 * (when requested number of workers were not launched, this will be
+	 * somewhat higher than it is for other workers).
+	 */
+	sortmem = maintenance_work_mem / brinshared->scantuplesortstates;
+
+	_brin_parallel_scan_and_build(buildstate, buildstate->bs_spool,
+								  brinshared, sharedsort,
+								  heapRel, indexRel, sortmem, false);
+
+	/* Report WAL/buffer usage during parallel execution */
+	bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
+	walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
+	InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
+						  &walusage[ParallelWorkerNumber]);
+
+	index_close(indexRel, indexLockmode);
+	table_close(heapRel, heapLockmode);
+}
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index a875c5d3d7a..9b1a0ac345d 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -54,6 +54,7 @@ ginhandler(PG_FUNCTION_ARGS)
 	amroutine->amclusterable = false;
 	amroutine->ampredlocks = true;
 	amroutine->amcanparallel = false;
+	amroutine->amcanbuildparallel = false;
 	amroutine->amcaninclude = false;
 	amroutine->amusemaintenanceworkmem = true;
 	amroutine->amsummarizing = false;
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 9a1bf8f66cb..e052ba8bda2 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -76,6 +76,7 @@ gisthandler(PG_FUNCTION_ARGS)
 	amroutine->amclusterable = true;
 	amroutine->ampredlocks = true;
 	amroutine->amcanparallel = false;
+	amroutine->amcanbuildparallel = false;
 	amroutine->amcaninclude = true;
 	amroutine->amusemaintenanceworkmem = false;
 	amroutine->amsummarizing = false;
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 6443ff21bda..905519692c6 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -73,6 +73,7 @@ hashhandler(PG_FUNCTION_ARGS)
 	amroutine->amclusterable = false;
 	amroutine->ampredlocks = true;
 	amroutine->amcanparallel = false;
+	amroutine->amcanbuildparallel = false;
 	amroutine->amcaninclude = false;
 	amroutine->amusemaintenanceworkmem = false;
 	amroutine->amsummarizing = false;
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 0930f9b37e3..6c8cd93fa0a 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -112,6 +112,7 @@ bthandler(PG_FUNCTION_ARGS)
 	amroutine->amclusterable = true;
 	amroutine->ampredlocks = true;
 	amroutine->amcanparallel = true;
+	amroutine->amcanbuildparallel = true;
 	amroutine->amcaninclude = true;
 	amroutine->amusemaintenanceworkmem = false;
 	amroutine->amsummarizing = false;
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 30c00876a56..fd4b6157101 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -60,6 +60,7 @@ spghandler(PG_FUNCTION_ARGS)
 	amroutine->amclusterable = false;
 	amroutine->ampredlocks = false;
 	amroutine->amcanparallel = false;
+	amroutine->amcanbuildparallel = false;
 	amroutine->amcaninclude = true;
 	amroutine->amusemaintenanceworkmem = false;
 	amroutine->amsummarizing = false;
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 194a1207be6..d78314062e0 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -14,6 +14,7 @@
 
 #include "postgres.h"
 
+#include "access/brin.h"
 #include "access/nbtree.h"
 #include "access/parallel.h"
 #include "access/session.h"
@@ -145,6 +146,9 @@ static const struct
 	{
 		"_bt_parallel_build_main", _bt_parallel_build_main
 	},
+	{
+		"_brin_parallel_build_main", _brin_parallel_build_main
+	},
 	{
 		"parallel_vacuum_main", parallel_vacuum_main
 	}
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 143fae01ebd..40abbaf476b 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2982,7 +2982,7 @@ index_build(Relation heapRelation,
 	 * Note that planner considers parallel safety for us.
 	 */
 	if (parallel && IsNormalProcessingMode() &&
-		indexRelation->rd_rel->relam == BTREE_AM_OID)
+		indexRelation->rd_indam->amcanbuildparallel)
 		indexInfo->ii_ParallelWorkers =
 			plan_create_index_workers(RelationGetRelid(heapRelation),
 									  RelationGetRelid(indexRelation));
diff --git a/src/backend/utils/sort/tuplesortvariants.c b/src/backend/utils/sort/tuplesortvariants.c
index 2cd508e5130..90fc605f1ca 100644
--- a/src/backend/utils/sort/tuplesortvariants.c
+++ b/src/backend/utils/sort/tuplesortvariants.c
@@ -19,6 +19,7 @@
 
 #include "postgres.h"
 
+#include "access/brin_tuple.h"
 #include "access/hash.h"
 #include "access/htup_details.h"
 #include "access/nbtree.h"
@@ -43,6 +44,8 @@ static void removeabbrev_cluster(Tuplesortstate *state, SortTuple *stups,
 								 int count);
 static void removeabbrev_index(Tuplesortstate *state, SortTuple *stups,
 							   int count);
+static void removeabbrev_index_brin(Tuplesortstate *state, SortTuple *stups,
+									int count);
 static void removeabbrev_datum(Tuplesortstate *state, SortTuple *stups,
 							   int count);
 static int	comparetup_heap(const SortTuple *a, const SortTuple *b,
@@ -69,10 +72,16 @@ static int	comparetup_index_hash(const SortTuple *a, const SortTuple *b,
 								  Tuplesortstate *state);
 static int	comparetup_index_hash_tiebreak(const SortTuple *a, const SortTuple *b,
 										   Tuplesortstate *state);
+static int	comparetup_index_brin(const SortTuple *a, const SortTuple *b,
+								  Tuplesortstate *state);
 static void writetup_index(Tuplesortstate *state, LogicalTape *tape,
 						   SortTuple *stup);
 static void readtup_index(Tuplesortstate *state, SortTuple *stup,
 						  LogicalTape *tape, unsigned int len);
+static void writetup_index_brin(Tuplesortstate *state, LogicalTape *tape,
+								SortTuple *stup);
+static void readtup_index_brin(Tuplesortstate *state, SortTuple *stup,
+							   LogicalTape *tape, unsigned int len);
 static int	comparetup_datum(const SortTuple *a, const SortTuple *b,
 							 Tuplesortstate *state);
 static int	comparetup_datum_tiebreak(const SortTuple *a, const SortTuple *b,
@@ -128,6 +137,16 @@ typedef struct
 	uint32		max_buckets;
 } TuplesortIndexHashArg;
 
+/*
+ * Data structure pointed to by "TuplesortPublic.arg" for the index_brin subcase.
+ */
+typedef struct
+{
+	TuplesortIndexArg index;
+
+	/* XXX do we need something here? */
+} TuplesortIndexBrinArg;
+
 /*
  * Data struture pointed by "TuplesortPublic.arg" for the Datum case.
  * Set by tuplesort_begin_datum and used only by the DatumTuple routines.
@@ -140,6 +159,21 @@ typedef struct
 	int			datumTypeLen;
 } TuplesortDatumArg;
 
+/*
+ * Computing BrinTuple size with only the tuple is difficult, so we want to track
+ * the length referenced by the SortTuple. That's what BrinSortTuple is meant
+ * to do - it's essentially a BrinTuple prefixed by its length.
+ */
+typedef struct BrinSortTuple
+{
+	Size		tuplen;
+	BrinTuple	tuple;
+} BrinSortTuple;
+
+/* Size of the BrinSortTuple, given length of the BrinTuple. */
+#define BRINSORTTUPLE_SIZE(len)		(offsetof(BrinSortTuple, tuple) + (len))
+
+
 Tuplesortstate *
 tuplesort_begin_heap(TupleDesc tupDesc,
 					 int nkeys, AttrNumber *attNums,
@@ -527,6 +561,47 @@ tuplesort_begin_index_gist(Relation heapRel,
 	return state;
 }
 
+Tuplesortstate *
+tuplesort_begin_index_brin(Relation heapRel,
+						   Relation indexRel,
+						   int workMem,
+						   SortCoordinate coordinate,
+						   int sortopt)
+{
+	Tuplesortstate *state = tuplesort_begin_common(workMem, coordinate,
+												   sortopt);
+	TuplesortPublic *base = TuplesortstateGetPublic(state);
+	MemoryContext oldcontext;
+	TuplesortIndexBrinArg *arg;
+
+	oldcontext = MemoryContextSwitchTo(base->maincontext);
+	arg = (TuplesortIndexBrinArg *) palloc(sizeof(TuplesortIndexBrinArg));
+
+#ifdef TRACE_SORT
+	if (trace_sort)
+		elog(LOG,
+			 "begin index sort: workMem = %d, randomAccess = %c",
+			 workMem,
+			 sortopt & TUPLESORT_RANDOMACCESS ? 't' : 'f');
+#endif
+
+	base->nKeys = 1;			/* Only one sort column, the block number */
+
+	base->removeabbrev = removeabbrev_index_brin;
+	base->comparetup = comparetup_index_brin;
+	base->writetup = writetup_index_brin;
+	base->readtup = readtup_index_brin;
+	base->haveDatum1 = true;
+	base->arg = arg;
+
+	arg->index.heapRel = heapRel;
+	arg->index.indexRel = indexRel;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
 Tuplesortstate *
 tuplesort_begin_datum(Oid datumType, Oid sortOperator, Oid sortCollation,
 					  bool nullsFirstFlag, int workMem,
@@ -707,6 +782,35 @@ tuplesort_putindextuplevalues(Tuplesortstate *state, Relation rel,
 							  !stup.isnull1);
 }
 
+/*
+ * Collect one BRIN tuple while collecting input data for sort.
+ */
+void
+tuplesort_putbrintuple(Tuplesortstate *state, BrinTuple *tuple, Size size)
+{
+	SortTuple	stup;
+	BrinSortTuple *bstup;
+	TuplesortPublic *base = TuplesortstateGetPublic(state);
+	MemoryContext oldcontext = MemoryContextSwitchTo(base->tuplecontext);
+
+	/* allocate space for the whole BRIN sort tuple */
+	bstup = palloc(BRINSORTTUPLE_SIZE(size));
+
+	bstup->tuplen = size;
+	memcpy(&bstup->tuple, tuple, size);
+
+	stup.tuple = bstup;
+	stup.datum1 = tuple->bt_blkno;
+	stup.isnull1 = false;
+
+	tuplesort_puttuple_common(state, &stup,
+							  base->sortKeys &&
+							  base->sortKeys->abbrev_converter &&
+							  !stup.isnull1);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
 /*
  * Accept one Datum while collecting input data for sort.
  *
@@ -850,6 +954,35 @@ tuplesort_getindextuple(Tuplesortstate *state, bool forward)
 	return (IndexTuple) stup.tuple;
 }
 
+/*
+ * Fetch the next BRIN tuple in either forward or back direction.
+ * Returns NULL if no more tuples.  Returned tuple belongs to tuplesort memory
+ * context, and must not be freed by caller.  Caller may not rely on tuple
+ * remaining valid after any further manipulation of tuplesort.
+ */
+BrinTuple *
+tuplesort_getbrintuple(Tuplesortstate *state, Size *len, bool forward)
+{
+	TuplesortPublic *base = TuplesortstateGetPublic(state);
+	MemoryContext oldcontext = MemoryContextSwitchTo(base->sortcontext);
+	SortTuple	stup;
+	BrinSortTuple *btup;
+
+	if (!tuplesort_gettuple_common(state, forward, &stup))
+		stup.tuple = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	if (!stup.tuple)
+		return NULL;
+
+	btup = (BrinSortTuple *) stup.tuple;
+
+	*len = btup->tuplen;
+
+	return &btup->tuple;
+}
+
 /*
  * Fetch the next Datum in either forward or back direction.
  * Returns false if no more datums.
@@ -1564,6 +1697,80 @@ readtup_index(Tuplesortstate *state, SortTuple *stup,
 								 &stup->isnull1);
 }
 
+/*
+ * Routines specialized for BrinTuple case
+ */
+
+static void
+removeabbrev_index_brin(Tuplesortstate *state, SortTuple *stups, int count)
+{
+	int			i;
+
+	for (i = 0; i < count; i++)
+	{
+		BrinSortTuple *tuple;
+
+		tuple = stups[i].tuple;
+		stups[i].datum1 = tuple->tuple.bt_blkno;
+	}
+}
+
+static int
+comparetup_index_brin(const SortTuple *a, const SortTuple *b,
+					  Tuplesortstate *state)
+{
+	Assert(TuplesortstateGetPublic(state)->haveDatum1);
+
+	if (DatumGetUInt32(a->datum1) > DatumGetUInt32(b->datum1))
+		return 1;
+
+	if (DatumGetUInt32(a->datum1) < DatumGetUInt32(b->datum1))
+		return -1;
+
+	/* equal block numbers, e.g. tuples for the same range from different workers */
+	return 0;
+}
+
+static void
+writetup_index_brin(Tuplesortstate *state, LogicalTape *tape, SortTuple *stup)
+{
+	TuplesortPublic *base = TuplesortstateGetPublic(state);
+	BrinSortTuple *tuple = (BrinSortTuple *) stup->tuple;
+	unsigned int tuplen = tuple->tuplen;
+
+	tuplen = tuplen + sizeof(tuplen);
+	LogicalTapeWrite(tape, &tuplen, sizeof(tuplen));
+	LogicalTapeWrite(tape, &tuple->tuple, tuple->tuplen);
+	if (base->sortopt & TUPLESORT_RANDOMACCESS) /* need trailing length word? */
+		LogicalTapeWrite(tape, &tuplen, sizeof(tuplen));
+}
+
+static void
+readtup_index_brin(Tuplesortstate *state, SortTuple *stup,
+				   LogicalTape *tape, unsigned int len)
+{
+	BrinSortTuple *tuple;
+	TuplesortPublic *base = TuplesortstateGetPublic(state);
+	unsigned int tuplen = len - sizeof(unsigned int);
+
+	/*
+	 * Allocate space for the BRIN sort tuple, which is BrinTuple with an
+	 * extra length field.
+	 */
+	tuple = (BrinSortTuple *) tuplesort_readtup_alloc(state,
+													  BRINSORTTUPLE_SIZE(tuplen));
+
+	tuple->tuplen = tuplen;
+
+	LogicalTapeReadExact(tape, &tuple->tuple, tuplen);
+	if (base->sortopt & TUPLESORT_RANDOMACCESS) /* need trailing length word? */
+		LogicalTapeReadExact(tape, &tuplen, sizeof(tuplen));
+	stup->tuple = (void *) tuple;
+
+	/* set up first-column key value, which is block number */
+	stup->datum1 = tuple->tuple.bt_blkno;
+}
+
 /*
  * Routines specialized for DatumTuple case
  */
diff --git a/src/include/access/amapi.h b/src/include/access/amapi.h
index 244459587fc..df85ae3aace 100644
--- a/src/include/access/amapi.h
+++ b/src/include/access/amapi.h
@@ -243,6 +243,8 @@ typedef struct IndexAmRoutine
 	bool		ampredlocks;
 	/* does AM support parallel scan? */
 	bool		amcanparallel;
+	/* does AM support parallel build? */
+	bool		amcanbuildparallel;
 	/* does AM support columns included with clause INCLUDE? */
 	bool		amcaninclude;
 	/* does AM use maintenance_work_mem? */
diff --git a/src/include/access/brin.h b/src/include/access/brin.h
index ed66f1b3d51..3451ecb211f 100644
--- a/src/include/access/brin.h
+++ b/src/include/access/brin.h
@@ -11,6 +11,7 @@
 #define BRIN_H
 
 #include "nodes/execnodes.h"
+#include "storage/shm_toc.h"
 #include "utils/relcache.h"
 
 
@@ -52,4 +53,6 @@ typedef struct BrinStatsData
 
 extern void brinGetStats(Relation index, BrinStatsData *stats);
 
+extern void _brin_parallel_build_main(dsm_segment *seg, shm_toc *toc);
+
 #endif							/* BRIN_H */
diff --git a/src/include/utils/tuplesort.h b/src/include/utils/tuplesort.h
index 9ed2de76cd6..357eb35311d 100644
--- a/src/include/utils/tuplesort.h
+++ b/src/include/utils/tuplesort.h
@@ -21,6 +21,7 @@
 #ifndef TUPLESORT_H
 #define TUPLESORT_H
 
+#include "access/brin_tuple.h"
 #include "access/itup.h"
 #include "executor/tuptable.h"
 #include "storage/dsm.h"
@@ -282,6 +283,9 @@ typedef struct
  * The "index_hash" API is similar to index_btree, but the tuples are
  * actually sorted by their hash codes not the raw data.
  *
+ * The "index_brin" API is similar to index_btree, but the tuples are
+ * BrinTuple and are sorted by their block number not the raw data.
+ *
  * Parallel sort callers are required to coordinate multiple tuplesort states
  * in a leader process and one or more worker processes.  The leader process
  * must launch workers, and have each perform an independent "partial"
@@ -426,6 +430,10 @@ extern Tuplesortstate *tuplesort_begin_index_gist(Relation heapRel,
 												  Relation indexRel,
 												  int workMem, SortCoordinate coordinate,
 												  int sortopt);
+extern Tuplesortstate *tuplesort_begin_index_brin(Relation heapRel,
+												  Relation indexRel,
+												  int workMem, SortCoordinate coordinate,
+												  int sortopt);
 extern Tuplesortstate *tuplesort_begin_datum(Oid datumType,
 											 Oid sortOperator, Oid sortCollation,
 											 bool nullsFirstFlag,
@@ -438,6 +446,7 @@ extern void tuplesort_putheaptuple(Tuplesortstate *state, HeapTuple tup);
 extern void tuplesort_putindextuplevalues(Tuplesortstate *state,
 										  Relation rel, ItemPointer self,
 										  const Datum *values, const bool *isnull);
+extern void tuplesort_putbrintuple(Tuplesortstate *state, BrinTuple *tup, Size len);
 extern void tuplesort_putdatum(Tuplesortstate *state, Datum val,
 							   bool isNull);
 
@@ -445,6 +454,8 @@ extern bool tuplesort_gettupleslot(Tuplesortstate *state, bool forward,
 								   bool copy, TupleTableSlot *slot, Datum *abbrev);
 extern HeapTuple tuplesort_getheaptuple(Tuplesortstate *state, bool forward);
 extern IndexTuple tuplesort_getindextuple(Tuplesortstate *state, bool forward);
+extern BrinTuple *tuplesort_getbrintuple(Tuplesortstate *state, Size *len,
+										 bool forward);
 extern bool tuplesort_getdatum(Tuplesortstate *state, bool forward, bool copy,
 							   Datum *val, bool *isNull, Datum *abbrev);
 
diff --git a/src/test/modules/dummy_index_am/dummy_index_am.c b/src/test/modules/dummy_index_am/dummy_index_am.c
index cbdae7ab7a5..eaa0c483b7e 100644
--- a/src/test/modules/dummy_index_am/dummy_index_am.c
+++ b/src/test/modules/dummy_index_am/dummy_index_am.c
@@ -294,6 +294,7 @@ dihandler(PG_FUNCTION_ARGS)
 	amroutine->amclusterable = false;
 	amroutine->ampredlocks = false;
 	amroutine->amcanparallel = false;
+	amroutine->amcanbuildparallel = false;
 	amroutine->amcaninclude = false;
 	amroutine->amusemaintenanceworkmem = false;
 	amroutine->amsummarizing = false;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d659adbfd6c..8d94ef16623 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -297,13 +297,17 @@ BpChar
 BrinBuildState
 BrinDesc
 BrinInsertState
+BrinLeader
 BrinMemTuple
 BrinMetaPageData
 BrinOpaque
 BrinOpcInfo
 BrinOptions
 BrinRevmap
+BrinShared
+BrinSortTuple
 BrinSpecialSpace
+BrinSpool
 BrinStatsData
 BrinTuple
 BrinValues
@@ -2882,6 +2886,7 @@ TupleTableSlotOps
 TuplesortClusterArg
 TuplesortDatumArg
 TuplesortIndexArg
+TuplesortIndexBrinArg
 TuplesortIndexBTreeArg
 TuplesortIndexHashArg
 TuplesortInstrumentation
-- 
2.41.0
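
For reference, the intended call sequence for the new tuplesort entry points mirrors the existing index variants. Below is a minimal sketch of a hypothetical caller (sort_brin_tuples and consume_tuple are placeholders, not part of the patch); only the tuplesort_* calls and TUPLESORT_NONE come from the patch / existing backend code:

    /* hypothetical caller; only the tuplesort_* API is real */
    #include "postgres.h"

    #include "access/brin_tuple.h"
    #include "miscadmin.h"
    #include "utils/rel.h"
    #include "utils/tuplesort.h"

    static void
    sort_brin_tuples(Relation heapRel, Relation indexRel,
                     BrinTuple **tuples, Size *sizes, int ntuples,
                     void (*consume_tuple) (BrinTuple *tup, Size len))
    {
        Tuplesortstate *sortstate;
        BrinTuple  *btup;
        Size        tuplen;

        /* serial sort, forward-only reads, so no TUPLESORT_RANDOMACCESS */
        sortstate = tuplesort_begin_index_brin(heapRel, indexRel,
                                               maintenance_work_mem, NULL,
                                               TUPLESORT_NONE);

        /* feed the BRIN tuples; they get sorted by bt_blkno */
        for (int i = 0; i < ntuples; i++)
            tuplesort_putbrintuple(sortstate, tuples[i], sizes[i]);

        tuplesort_performsort(sortstate);

        /* read back in block-number order; tuples belong to the tuplesort */
        while ((btup = tuplesort_getbrintuple(sortstate, &tuplen, true)) != NULL)
            consume_tuple(btup, tuplen);

        tuplesort_end(sortstate);
    }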

From 2242c6719c81552cf7bceaedcdbde759600d0973 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postg...@gmail.com>
Date: Thu, 30 Nov 2023 18:19:40 +0100
Subject: [PATCH v8 3/6] BRIN: Exit parallel mode when not starting parallel
 create index

---
 src/backend/access/brin/brin.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index edf2daad0c3..bc519825f2e 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2506,8 +2506,12 @@ _brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state)
 	/* Shutdown worker processes */
 	WaitForParallelWorkersToFinish(brinleader->pcxt);
 
+	/*
+	 * If we didn't actually launch workers, we still have to make sure to exit
+	 * parallel mode.
+	 */
 	if (!state)
-		return;
+		goto cleanup;
 
 	/* copy the data into leader state (we have to wait for the workers ) */
 	state->bs_reltuples = brinshared->reltuples;
@@ -2691,6 +2695,8 @@ _brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state)
 	for (i = 0; i < brinleader->pcxt->nworkers_launched; i++)
 		InstrAccumParallelQuery(&brinleader->bufferusage[i], &brinleader->walusage[i]);
 
+cleanup:
+
 	/* Free last reference to MVCC snapshot, if one was used */
 	if (IsMVCCSnapshot(brinleader->snapshot))
 		UnregisterSnapshot(brinleader->snapshot);
-- 
2.41.0
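
The shape of the fix is just the usual early-exit-via-cleanup-label idiom; here is a standalone sketch of that shape (plain C, placeholder names, not the actual brin.c symbols):

    #include <stdbool.h>
    #include <stdio.h>

    static void
    release_shared_resources(void)
    {
        /*
         * Stands in for unregistering the snapshot, destroying the parallel
         * context and exiting parallel mode.
         */
        printf("shared resources released\n");
    }

    static void
    end_build(bool have_leader_state)
    {
        if (!have_leader_state)
            goto cleanup;       /* skip the per-state work ... */

        printf("merging worker results into leader state\n");

    cleanup:
        /* ... but always release what was set up before returning */
        release_shared_resources();
    }

    int
    main(void)
    {
        end_build(false);
        end_build(true);
        return 0;
    }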

From 27675f10b328fc6d80331dfa99a1b8e3309a90ff Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postg...@gmail.com>
Date: Thu, 30 Nov 2023 18:12:16 +0100
Subject: [PATCH v8 4/6] BRIN: Backfill empty ranges

If we don't, a scan using an index qual like WHERE (practically_nonnull IS NULL)
effectively degrades into a full table scan, because the ranges missing from
the index have to be assumed to match.

The issue is fixed for both normal and parallel index creation.
---
 src/backend/access/brin/brin.c | 96 ++++++++++++++++++++--------------
 1 file changed, 58 insertions(+), 38 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index bc519825f2e..00b437d9c1f 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -221,6 +221,8 @@ static void brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy);
 static bool add_values_to_range(Relation idxRel, BrinDesc *bdesc,
 								BrinMemTuple *dtup, const Datum *values, const bool *nulls);
 static bool check_null_keys(BrinValues *bval, ScanKey *nullkeys, int nnullkeys);
+static BrinTuple *brin_init_empty_tuple(BrinBuildState *state, BlockNumber blkno,
+										BrinTuple *emptyTuple, Size *emptySize);
 
 /* parallel index builds */
 static void _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
@@ -2619,17 +2621,7 @@ _brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state)
 			prevblkno += state->bs_pagesPerRange;
 
 			/* Did we already build the empty range? If not, do it now. */
-			if (emptyTuple == NULL)
-			{
-				BrinMemTuple *dtuple = brin_new_memtuple(state->bs_bdesc);
-
-				emptyTuple = brin_form_tuple(state->bs_bdesc, prevblkno, dtuple, &emptySize);
-			}
-			else
-			{
-				/* we already have am "empty range" tuple, just set the block */
-				emptyTuple->bt_blkno = prevblkno;
-			}
+			emptyTuple = brin_init_empty_tuple(state, prevblkno, emptyTuple, &emptySize);
 
 			brin_doinsert(state->bs_irel, state->bs_pagesPerRange, state->bs_rmAccess,
 						  &state->bs_currentInsertBuf,
@@ -2641,32 +2633,7 @@ _brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state)
 
 	tuplesort_end(spool->sortstate);
 
-	/* Fill empty ranges at the end, for all ranges missing in the tuplesort. */
-	prevblkno = (prevblkno == InvalidBlockNumber) ? 0 : prevblkno;
-	while (prevblkno + state->bs_pagesPerRange < memtuple->bt_blkno)
-	{
-		/* the missing range */
-		prevblkno += state->bs_pagesPerRange;
-
-		/* Did we already build the empty range? If not, do it now. */
-		if (emptyTuple == NULL)
-		{
-			BrinMemTuple *dtuple = brin_new_memtuple(state->bs_bdesc);
-
-			emptyTuple = brin_form_tuple(state->bs_bdesc, prevblkno, dtuple, &emptySize);
-		}
-		else
-		{
-			/* we already have am "empty range" tuple, just set the block */
-			emptyTuple->bt_blkno = prevblkno;
-		}
-
-		brin_doinsert(state->bs_irel, state->bs_pagesPerRange, state->bs_rmAccess,
-					  &state->bs_currentInsertBuf,
-					  emptyTuple->bt_blkno, emptyTuple, emptySize);
-	}
-
-	/* Fill the BRIN tuple for the last page range. */
+	/* Fill the BRIN tuple for the last page range with data. */
 	if (prevblkno != InvalidBlockNumber)
 	{
 		BrinTuple  *tmp;
@@ -2682,7 +2649,32 @@ _brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state)
 	}
 
 	/*
-	 * Switch back to the originam memory context, and destroy the one we
+	 * Fill empty ranges at the end, for all ranges missing in the tuplesort.
+	 *
+	 * Starting from here, prevblkno is the to-be-inserted range's start block
+	 * number. Note that we don't fill in the relation's last page range.
+	 */
+	if (prevblkno == InvalidBlockNumber)
+		prevblkno = 0;
+	else
+		prevblkno += state->bs_pagesPerRange;
+
+	while (prevblkno + state->bs_pagesPerRange < state->bs_tablePages)
+	{
+		/* Did we already build the empty range? If not, do it now. */
+		emptyTuple = brin_init_empty_tuple(state, prevblkno, emptyTuple, &emptySize);
+
+		/* Insert the missing range */
+		brin_doinsert(state->bs_irel, state->bs_pagesPerRange, state->bs_rmAccess,
+					  &state->bs_currentInsertBuf,
+					  emptyTuple->bt_blkno, emptyTuple, emptySize);
+
+		/* ... and update to the next range's block number */
+		prevblkno += state->bs_pagesPerRange;
+	}
+
+	/*
+	 * Switch back to the original memory context, and destroy the one we
 	 * created to isolate the union_tuple calls.
 	 */
 	MemoryContextSwitchTo(oldCxt);
@@ -2897,3 +2889,31 @@ _brin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
 	index_close(indexRel, indexLockmode);
 	table_close(heapRel, heapLockmode);
 }
+
+/*
+ * brin_init_empty_tuple
+ *		Maybe initialize a BRIN tuple representing empty range.
+ *
+ * If emptyTuple is NULL, initializes new tuple representing empty range at
+ * block blkno. Otherwise the tuple is reused, and only the bt_blkno field
+ * is updated.
+ */
+static BrinTuple *
+brin_init_empty_tuple(BrinBuildState *state, BlockNumber blkno,
+					  BrinTuple *emptyTuple, Size *emptySize)
+{
+	/* Did we already build the empty range? If not, do it now. */
+	if (emptyTuple == NULL)
+	{
+		BrinMemTuple *dtuple = brin_new_memtuple(state->bs_bdesc);
+
+		emptyTuple = brin_form_tuple(state->bs_bdesc, blkno, dtuple, emptySize);
+	}
+	else
+	{
+		/* we already have an "empty range" tuple, just set the block */
+		emptyTuple->bt_blkno = blkno;
+	}
+
+	return emptyTuple;
+}
-- 
2.41.0
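
The noteworthy part of the new helper is that the (comparatively expensive) empty tuple is formed once and then only relabeled with a new block number. A standalone sketch of that caching pattern, using made-up placeholder types rather than the brin.c structures:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* placeholder stand-in for a summary tuple keyed by a range's start block */
    typedef struct DemoTuple
    {
        unsigned    start_block;
        char        payload[32];
    } DemoTuple;

    /*
     * Build the "empty summary" only on the first call; later calls just
     * relabel the cached copy with the new start block.  (Error handling
     * omitted for brevity.)
     */
    static DemoTuple *
    get_empty_tuple(DemoTuple **cached, unsigned start_block)
    {
        if (*cached == NULL)
        {
            *cached = malloc(sizeof(DemoTuple));
            strcpy((*cached)->payload, "empty summary");
        }
        (*cached)->start_block = start_block;
        return *cached;
    }

    int
    main(void)
    {
        DemoTuple  *cached = NULL;

        for (unsigned blkno = 0; blkno < 32; blkno += 8)
        {
            DemoTuple  *tup = get_empty_tuple(&cached, blkno);

            printf("range at block %u: %s\n", tup->start_block, tup->payload);
        }

        free(cached);
        return 0;
    }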

From 7411a50e5789886f257934df74e89466773cbcdd Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.von...@postgresql.org>
Date: Sat, 2 Dec 2023 20:02:09 +0100
Subject: [PATCH v8 5/6] simplify filling empty ranges

---
 src/backend/access/brin/brin.c | 151 +++++++++++++++++++++------------
 1 file changed, 97 insertions(+), 54 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 00b437d9c1f..84bce1220a0 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -165,7 +165,7 @@ typedef struct BrinBuildState
 	double		bs_numtuples;
 	double		bs_reltuples;
 	Buffer		bs_currentInsertBuf;
-	BlockNumber	bs_tablePages;
+	BlockNumber	bs_maxRangeStart;
 	BlockNumber bs_pagesPerRange;
 	BlockNumber bs_currRangeStart;
 	BrinRevmap *bs_rmAccess;
@@ -223,6 +223,10 @@ static bool add_values_to_range(Relation idxRel, BrinDesc *bdesc,
 static bool check_null_keys(BrinValues *bval, ScanKey *nullkeys, int nnullkeys);
 static BrinTuple *brin_init_empty_tuple(BrinBuildState *state, BlockNumber blkno,
 										BrinTuple *emptyTuple, Size *emptySize);
+static void brin_fill_empty_ranges(BrinBuildState *state,
+								   BlockNumber prevRange, BlockNumber maxRange,
+								   BrinTuple **emptyTuple, Size *emptySize);
+static BlockNumber brin_next_range(BrinBuildState *state, BlockNumber blkno);
 
 /* parallel index builds */
 static void _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
@@ -1087,8 +1091,7 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	BrinRevmap *revmap;
 	BrinBuildState *state;
 	Buffer		meta;
-	BlockNumber pagesPerRange,
-				tablePages;
+	BlockNumber pagesPerRange;
 
 	/*
 	 * We expect to be called exactly once for any index relation.
@@ -1135,8 +1138,8 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	 * Initialize our state, including the deformed tuple state.
 	 */
 	revmap = brinRevmapInitialize(index, &pagesPerRange);
-	tablePages = RelationGetNumberOfBlocks(heap);
-	state = initialize_brin_buildstate(index, revmap, pagesPerRange, tablePages);
+	state = initialize_brin_buildstate(index, revmap, pagesPerRange,
+									   RelationGetNumberOfBlocks(heap));
 
 	state->bs_spool = (BrinSpool *) palloc0(sizeof(BrinSpool));
 	state->bs_spool->heap = heap;
@@ -1207,14 +1210,19 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	}
 	else						/* no parallel index build */
 	{
+		BrinTuple  *emptyTuple = NULL;
+		Size		emptySize;
+
 		reltuples = table_index_build_scan(heap, index, indexInfo, false, true,
 										   brinbuildCallback, (void *) state, NULL);
 
-		/* process the final batch */
+		/*
+		 * Process the final batch.  XXX Note this does not update
+		 * state->bs_currRangeStart, i.e. it stays set to the last range
+		 * added to the index.
+		 */
 		form_and_insert_tuple(state);
 
-		/* XXX shouldn't this happen in the brinbuildCallback? */
-		state->bs_currRangeStart += state->bs_pagesPerRange;
 		/*
 		 * Backfill the final ranges with empty data.
 		 *
@@ -1222,14 +1230,10 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 		 * index is built on stupid index quals like WHERE (nonnull_column IS
 		 * NULL).
 		 */
-		while (state->bs_currRangeStart + state->bs_pagesPerRange - 1 < state->bs_tablePages)
-		{
-			brin_memtuple_initialize(state->bs_dtuple, state->bs_bdesc);
-
-			form_and_insert_tuple(state);
-
-			state->bs_currRangeStart += state->bs_pagesPerRange;
-		}
+		brin_fill_empty_ranges(state,
+							   state->bs_currRangeStart,
+							   state->bs_maxRangeStart,
+							   &emptyTuple, &emptySize);
 
 		/* track the number of relation tuples */
 		state->bs_reltuples = reltuples;
@@ -1660,7 +1664,6 @@ initialize_brin_buildstate(Relation idxRel, BrinRevmap *revmap,
 	state->bs_reltuples = 0;
 	state->bs_currentInsertBuf = InvalidBuffer;
 	state->bs_pagesPerRange = pagesPerRange;
-	state->bs_tablePages = tablePages;
 	state->bs_currRangeStart = 0;
 	state->bs_rmAccess = revmap;
 	state->bs_bdesc = brin_build_desc(idxRel);
@@ -1669,6 +1672,19 @@ initialize_brin_buildstate(Relation idxRel, BrinRevmap *revmap,
 	state->bs_worker_id = 0;
 	state->bs_spool = NULL;
 
+	/*
+	 * Calculate the start of the last page range. Page numbers are 0-based,
+	 * so to get the index of the last page we need to subtract one. Then the
+	 * integer division gives us the proper 0-based range index.
+	 */
+	state->bs_maxRangeStart = ((tablePages - 1) / pagesPerRange) * pagesPerRange;
+
+	/*
+	 * But we actually need the start of the range right after the last one
+	 * (an exclusive bound), or InvalidBlockNumber if that would overflow.
+	 */
+	state->bs_maxRangeStart = brin_next_range(state, state->bs_maxRangeStart);
+
 	return state;
 }
 
@@ -2614,19 +2630,8 @@ _brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state)
 		}
 
 		/* Fill empty ranges for all ranges missing in the tuplesort. */
-		prevblkno = (prevblkno == InvalidBlockNumber) ? 0 : prevblkno;
-		while (prevblkno + state->bs_pagesPerRange < btup->bt_blkno)
-		{
-			/* the missing range */
-			prevblkno += state->bs_pagesPerRange;
-
-			/* Did we already build the empty range? If not, do it now. */
-			emptyTuple = brin_init_empty_tuple(state, prevblkno, emptyTuple, &emptySize);
-
-			brin_doinsert(state->bs_irel, state->bs_pagesPerRange, state->bs_rmAccess,
-						  &state->bs_currentInsertBuf,
-						  emptyTuple->bt_blkno, emptyTuple, emptySize);
-		}
+		brin_fill_empty_ranges(state, prevblkno, btup->bt_blkno,
+							   &emptyTuple, &emptySize);
 
 		prevblkno = btup->bt_blkno;
 	}
@@ -2648,30 +2653,9 @@ _brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state)
 		pfree(tmp);
 	}
 
-	/*
-	 * Fill empty ranges at the end, for all ranges missing in the tuplesort.
-	 *
-	 * Starting from here, prevblkno is the to-be-inserted range's start block
-	 * number. Note that we don't fill in the relation's last page range.
-	 */
-	if (prevblkno == InvalidBlockNumber)
-		prevblkno = 0;
-	else
-		prevblkno += state->bs_pagesPerRange;
-
-	while (prevblkno + state->bs_pagesPerRange < state->bs_tablePages)
-	{
-		/* Did we already build the empty range? If not, do it now. */
-		emptyTuple = brin_init_empty_tuple(state, prevblkno, emptyTuple, &emptySize);
-
-		/* Insert the missing range */
-		brin_doinsert(state->bs_irel, state->bs_pagesPerRange, state->bs_rmAccess,
-					  &state->bs_currentInsertBuf,
-					  emptyTuple->bt_blkno, emptyTuple, emptySize);
-
-		/* ... and update to the next range's block number */
-		prevblkno += state->bs_pagesPerRange;
-	}
+	/* Fill empty ranges at the end, for all ranges missing in the tuplesort. */
+	brin_fill_empty_ranges(state, prevblkno, state->bs_maxRangeStart,
+						   &emptyTuple, &emptySize);
 
 	/*
 	 * Switch back to the original memory context, and destroy the one we
@@ -2917,3 +2901,62 @@ brin_init_empty_tuple(BrinBuildState *state, BlockNumber blkno,
 
 	return emptyTuple;
 }
+
+/*
+ * brin_fill_empty_ranges
+ *		Add BRIN index tuples representing empty page ranges.
+ *
+ * prevRange/maxRange determine for which page ranges to add the empty
+ * summaries, and both bounds are exclusive. That is, only ranges starting at
+ * blkno for which (prevRange < blkno < maxRange) will be added to the index.
+ *
+ * Both values may be InvalidBlockNumber. For prevRange this means there is
+ * no previous range, so the first range inserted should be for blkno=0. When
+ * maxRange is InvalidBlockNumber, it means the table is large enough for
+ * blkno to overflow.
+ *
+ * The empty tuple is built only once when needed, and then kept and reused
+ * for all future calls.
+ */
+static void
+brin_fill_empty_ranges(BrinBuildState *state,
+					   BlockNumber prevRange, BlockNumber maxRange,
+					   BrinTuple **emptyTuple, Size *emptySize)
+{
+	BlockNumber	blkno;
+
+	/*
+	 * If we already summarized some ranges, we need to start with the next one.
+	 * Otherwise we need to start from the first range of the table.
+	 */
+	blkno = (prevRange == InvalidBlockNumber) ? 0 : brin_next_range(state, prevRange);
+
+	/*
+	 * Generate empty ranges until we hit the next non-empty range or summarize
+	 * the last range of the table.
+	 */
+	while (blkno < maxRange)
+	{
+		/* Did we already build the empty tuple? If not, do it now. */
+		*emptyTuple = brin_init_empty_tuple(state, blkno, *emptyTuple, emptySize);
+
+		brin_doinsert(state->bs_irel, state->bs_pagesPerRange, state->bs_rmAccess,
+					  &state->bs_currentInsertBuf,
+					  (*emptyTuple)->bt_blkno, *emptyTuple, *emptySize);
+
+		/* try next page range */
+		blkno = brin_next_range(state, blkno);
+	}
+}
+
+static BlockNumber
+brin_next_range(BrinBuildState *state, BlockNumber blkno)
+{
+	BlockNumber	ret = (blkno + state->bs_pagesPerRange);
+
+	/* overflow */
+	if (ret < blkno)
+		ret = InvalidBlockNumber;
+
+	return ret;
+}
-- 
2.41.0
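
To make the exclusive-bound semantics and the overflow handling concrete, here is a standalone sketch (plain C, with made-up table geometry; next_range mimics brin_next_range) that just prints the range start blocks the helper would backfill:

    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t BlockNumber;
    #define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

    /* same idea as brin_next_range(): advance by one range, detect wraparound */
    static BlockNumber
    next_range(BlockNumber blkno, BlockNumber pages_per_range)
    {
        BlockNumber ret = blkno + pages_per_range;

        return (ret < blkno) ? InvalidBlockNumber : ret;
    }

    int
    main(void)
    {
        BlockNumber pages_per_range = 4;
        BlockNumber prev_range = 8;     /* last range already summarized */
        BlockNumber max_range = 24;     /* exclusive bound (bs_maxRangeStart) */
        BlockNumber blkno;

        /* both bounds exclusive: fill ranges with prev_range < blkno < max_range */
        blkno = (prev_range == InvalidBlockNumber)
            ? 0 : next_range(prev_range, pages_per_range);

        while (blkno < max_range)
        {
            printf("would insert empty summary for range at block %u\n",
                   (unsigned) blkno);
            blkno = next_range(blkno, pages_per_range);
        }

        return 0;
    }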

From b93f75f8954c2e6b1170a61a52fae35cb9dbcf95 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.von...@postgresql.org>
Date: Sun, 3 Dec 2023 12:59:25 +0100
Subject: [PATCH v8 6/6] memcontext emptyTuple

---
 src/backend/access/brin/brin.c | 60 ++++++++++++++++++----------------
 1 file changed, 31 insertions(+), 29 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 84bce1220a0..af0453dbdbc 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -172,6 +172,10 @@ typedef struct BrinBuildState
 	BrinDesc   *bs_bdesc;
 	BrinMemTuple *bs_dtuple;
 
+	BrinTuple  *bs_emptyTuple;
+	Size		bs_emptyTupleLen;
+	MemoryContext bs_context;
+
 	/*
 	 * bs_leader is only present when a parallel index build is performed, and
 	 * only in the leader process. (Actually, only the leader process has a
@@ -221,11 +225,8 @@ static void brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy);
 static bool add_values_to_range(Relation idxRel, BrinDesc *bdesc,
 								BrinMemTuple *dtup, const Datum *values, const bool *nulls);
 static bool check_null_keys(BrinValues *bval, ScanKey *nullkeys, int nnullkeys);
-static BrinTuple *brin_init_empty_tuple(BrinBuildState *state, BlockNumber blkno,
-										BrinTuple *emptyTuple, Size *emptySize);
 static void brin_fill_empty_ranges(BrinBuildState *state,
-								   BlockNumber prevRange, BlockNumber maxRange,
-								   BrinTuple **emptyTuple, Size *emptySize);
+								   BlockNumber prevRange, BlockNumber maxRange);
 static BlockNumber brin_next_range(BrinBuildState *state, BlockNumber blkno);
 
 /* parallel index builds */
@@ -1210,9 +1211,6 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	}
 	else						/* no parallel index build */
 	{
-		BrinTuple  *emptyTuple = NULL;
-		Size		emptySize;
-
 		reltuples = table_index_build_scan(heap, index, indexInfo, false, true,
 										   brinbuildCallback, (void *) state, NULL);
 
@@ -1232,8 +1230,7 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 		 */
 		brin_fill_empty_ranges(state,
 							   state->bs_currRangeStart,
-							   state->bs_maxRangeStart,
-							   &emptyTuple, &emptySize);
+							   state->bs_maxRangeStart);
 
 		/* track the number of relation tuples */
 		state->bs_reltuples = reltuples;
@@ -1671,6 +1668,9 @@ initialize_brin_buildstate(Relation idxRel, BrinRevmap *revmap,
 	state->bs_leader = NULL;
 	state->bs_worker_id = 0;
 	state->bs_spool = NULL;
+	state->bs_context = CurrentMemoryContext;
+	state->bs_emptyTuple = NULL;
+	state->bs_emptyTupleLen = 0;
 
 	/*
 	 * Calculate the start of the last page range. Page numbers are 0-based,
@@ -2515,8 +2515,6 @@ _brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state)
 	Size		tuplen;
 	BrinShared *brinshared = brinleader->brinshared;
 	BlockNumber prevblkno = InvalidBlockNumber;
-	BrinTuple  *emptyTuple = NULL;
-	Size		emptySize;
 	BrinSpool  *spool;
 	MemoryContext rangeCxt,
 				oldCxt;
@@ -2630,8 +2628,7 @@ _brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state)
 		}
 
 		/* Fill empty ranges for all ranges missing in the tuplesort. */
-		brin_fill_empty_ranges(state, prevblkno, btup->bt_blkno,
-							   &emptyTuple, &emptySize);
+		brin_fill_empty_ranges(state, prevblkno, btup->bt_blkno);
 
 		prevblkno = btup->bt_blkno;
 	}
@@ -2654,8 +2651,7 @@ _brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state)
 	}
 
 	/* Fill empty ranges at the end, for all ranges missing in the tuplesort. */
-	brin_fill_empty_ranges(state, prevblkno, state->bs_maxRangeStart,
-						   &emptyTuple, &emptySize);
+	brin_fill_empty_ranges(state, prevblkno, state->bs_maxRangeStart);
 
 	/*
 	 * Switch back to the original memory context, and destroy the one we
@@ -2875,31 +2871,38 @@ _brin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
 }
 
 /*
- * brin_init_empty_tuple
+ * brin_build_empty_tuple
  *		Maybe initialize a BRIN tuple representing empty range.
  *
  * If emptyTuple is NULL, initializes new tuple representing empty range at
  * block blkno. Otherwise the tuple is reused, and only the bt_blkno field
  * is updated.
  */
-static BrinTuple *
-brin_init_empty_tuple(BrinBuildState *state, BlockNumber blkno,
-					  BrinTuple *emptyTuple, Size *emptySize)
+static void
+brin_build_empty_tuple(BrinBuildState *state, BlockNumber blkno)
 {
 	/* Did we already build the empty range? If not, do it now. */
-	if (emptyTuple == NULL)
+	if (state->bs_emptyTuple == NULL)
 	{
-		BrinMemTuple *dtuple = brin_new_memtuple(state->bs_bdesc);
+		MemoryContext	oldcxt;
+		BrinMemTuple   *dtuple = brin_new_memtuple(state->bs_bdesc);
+
+		/*
+		 * Make sure to allocate the tuple in a context that lasts for the
+		 * whole index build.
+		 */
+		oldcxt = MemoryContextSwitchTo(state->bs_context);
 
-		emptyTuple = brin_form_tuple(state->bs_bdesc, blkno, dtuple, emptySize);
+		state->bs_emptyTuple = brin_form_tuple(state->bs_bdesc, blkno, dtuple,
+											   &state->bs_emptyTupleLen);
+
+		MemoryContextSwitchTo(oldcxt);
 	}
 	else
 	{
 		/* we already have an "empty range" tuple, just set the block */
-		emptyTuple->bt_blkno = blkno;
+		state->bs_emptyTuple->bt_blkno = blkno;
 	}
-
-	return emptyTuple;
 }
 
 /*
@@ -2920,8 +2923,7 @@ brin_init_empty_tuple(BrinBuildState *state, BlockNumber blkno,
  */
 static void
 brin_fill_empty_ranges(BrinBuildState *state,
-					   BlockNumber prevRange, BlockNumber maxRange,
-					   BrinTuple **emptyTuple, Size *emptySize)
+					   BlockNumber prevRange, BlockNumber maxRange)
 {
 	BlockNumber	blkno;
 
@@ -2938,11 +2940,11 @@ brin_fill_empty_ranges(BrinBuildState *state,
 	while (blkno < maxRange)
 	{
 		/* Did we already build the empty tuple? If not, do it now. */
-		*emptyTuple = brin_init_empty_tuple(state, blkno, *emptyTuple, emptySize);
+		brin_build_empty_tuple(state, blkno);
 
 		brin_doinsert(state->bs_irel, state->bs_pagesPerRange, state->bs_rmAccess,
 					  &state->bs_currentInsertBuf,
-					  (*emptyTuple)->bt_blkno, *emptyTuple, *emptySize);
+					  blkno, state->bs_emptyTuple, state->bs_emptyTupleLen);
 
 		/* try next page range */
 		blkno = brin_next_range(state, blkno);
-- 
2.41.0
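
The fix boils down to the standard switch-allocate-switch-back idiom for caching something beyond the lifetime of the currently active (here: per-range) context. A generic backend-style sketch of that idiom, with placeholder struct and field names rather than the BrinBuildState ones:

    #include "postgres.h"

    #include "utils/memutils.h"

    /* placeholder state; build_cxt must outlive the short-lived contexts */
    typedef struct DemoBuildState
    {
        MemoryContext build_cxt;    /* context covering the whole build */
        char       *cached;         /* lazily built, reused on every call */
    } DemoBuildState;

    static char *
    get_cached_buffer(DemoBuildState *state, Size len)
    {
        if (state->cached == NULL)
        {
            /*
             * Allocate in the long-lived context, not in whatever
             * short-lived context happens to be current at this point.
             */
            MemoryContext oldcxt = MemoryContextSwitchTo(state->build_cxt);

            state->cached = palloc0(len);
            MemoryContextSwitchTo(oldcxt);
        }

        return state->cached;
    }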

Attachment: run.sh