Re: Yet another fast GiST build

Heikki Linnakangas Wed, 09 Sep 2020 08:40:12 -0700

On 09/09/2020 15:20, Darafei "Komяpa" Praliaskouski wrote:

On Wed, Sep 9, 2020 at 3:09 PM Heikki Linnakangas <[email protected]> wrote:

Come to think of it, the point z-order comparator could benefit a lot
from key abbreviation, too. You could do the point -> zorder conversion
in the abbreviation routine.


That's how it works in PostGIS, only that we moved to more
effecient Hilbert curve:
https://github.com/postgis/postgis/blob/54399b9f6b0f02e8db9444f9f042b8d4ca6d4fa4/postgis/lwgeom_btree.c#L171


Thanks, that's interesting.

I implemented the abbreviated keys for the point opclass, too, andnoticed that the patch as it was never used it. I reworked the patch sothat tuplesort_begin_index_gist() is responsible for looking up thesortsupport function, like tuplesort_begin_index_btree() does, and usesabbreviation when possible.

I think this is pretty much ready for commit now. I'll do a bit moretesting (do we have regression test coverage for this?), also on aSIZEOF_DATUM==4 system since the abbreviation works differently withthat, and push if nothing new comes up. And clarify the documentationand/or comments that the sortsupport function sees "compressed" values.

I wonder if we could use sorting to also speed up building tsvectorindexes? The values stored there are bit signatures, what would be agood sort order for those?


- Heikki

>From 3a4d9c14631ae54b983d75433e0286f3dfedf432 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <[email protected]>
Date: Wed, 9 Sep 2020 18:33:18 +0300
Subject: [PATCH v17 1/1] Add sort support for point gist_point_sortsupport

Implement GiST build using sort support

We use special sorting function provided by opclass to approximate
GiST tree with B-tree-like structure. This approach allows to
radically reduce build time in some cases.

Author: Andrey Borodin
Reviewed-by: Pavel Borisov, Thomas Munro
Discussion: https://www.postgresql.org/message-id/1A36620E-CAD8-4267-9067-FB31385E7C0D%40yandex-team.ru
---
 doc/src/sgml/gist.sgml                  |  66 ++++
 src/backend/access/gist/gistbuild.c     | 499 ++++++++++++++++++++----
 src/backend/access/gist/gistproc.c      | 224 +++++++++++
 src/backend/access/gist/gistutil.c      |  53 ++-
 src/backend/access/gist/gistvalidate.c  |   6 +-
 src/backend/access/transam/xloginsert.c |  57 +++
 src/backend/utils/sort/sortsupport.c    |  34 ++
 src/backend/utils/sort/tuplesort.c      |  57 +++
 src/include/access/gist.h               |   3 +-
 src/include/access/gist_private.h       |   3 +
 src/include/access/xloginsert.h         |   2 +
 src/include/catalog/catversion.h        |   1 +
 src/include/catalog/pg_amproc.dat       |   2 +
 src/include/catalog/pg_proc.dat         |   3 +
 src/include/utils/sortsupport.h         |   1 +
 src/include/utils/tuplesort.h           |   4 +
 16 files changed, 912 insertions(+), 103 deletions(-)

diff --git a/doc/src/sgml/gist.sgml b/doc/src/sgml/gist.sgml
index f9226e7a35c..b049094c811 100644
--- a/doc/src/sgml/gist.sgml
+++ b/doc/src/sgml/gist.sgml
@@ -259,6 +259,8 @@ CREATE INDEX ON my_table USING GIST (my_inet_column inet_ops);
    <function>compress</function> method is omitted. The optional tenth method
    <function>options</function> is needed if the operator class provides
    the user-specified parameters.
+   The <function>sortsupport</function> method is also optional and is used to speed up
+   building a <acronym>GiST</acronym> index.
  </para>
 
  <variablelist>
@@ -1065,6 +1067,70 @@ my_compress(PG_FUNCTION_ARGS)
       </para>
      </listitem>
     </varlistentry>
+
+    <varlistentry>
+     <term><function>sortsupport</function></term>
+     <listitem>
+      <para>
+       Returns a comparator function to sort data in a way that preserves
+       locality. It is used by <command>CREATE INDEX</command> and
+       <command>REINDEX</command>. The quality of the created index depends on
+       how well the sort order determined by the comparator routine preserves
+       locality of the inputs.
+      </para>
+      <para>
+       The <function>sortsupport</function> method is optional. If it is not
+       provided, <command>CREATE INDEX</command> builds the index by inserting
+       each tuple to the tree using the <function>penalty</function> and
+       <function>picksplit</function> functions, which is much slower.
+      </para>
+
+      <para>
+       The <acronym>SQL</acronym> declaration of the function must look like
+       this:
+
+<programlisting>
+CREATE OR REPLACE FUNCTION my_sortsupport(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+</programlisting>
+
+       The argument is a pointer to a <structname>SortSupport</structname>
+       struct. At a minimum, the function must fill in its comparator field,
+       the full API is defined in
+       <filename>src/include/utils/sortsupport.h</filename>.
+       </para>
+
+       <para>
+        The matching code in the C module could then follow this skeleton:
+
+<programlisting>
+PG_FUNCTION_INFO_V1(my_sortsupport);
+
+static int
+my_fastcmp(Datum x, Datum y, SortSupport ssup)
+{
+  /* establish order between x and y by computing some sorting value z */
+
+  int z1 = ComputeSpatialCode(x);
+  int z2 = ComputeSpatialCode(y);
+
+  return z1 == z2 ? 0 : z1 > z2 ? 1 : -1;
+}
+
+Datum
+my_sortsupport(PG_FUNCTION_ARGS)
+{
+  SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
+
+  ssup->comparator = my_fastcmp;
+  PG_RETURN_VOID();
+}
+</programlisting>
+      </para>
+     </listitem>
+    </varlistentry>
   </variablelist>
 
   <para>
diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index 671b5e9186f..826de23e41e 100644
--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -4,6 +4,24 @@
  *	  build algorithm for GiST indexes implementation.
  *
  *
+ * There are two different strategies:
+ *
+ * 1. Sort all input tuples, pack the tuple into GiST pages in the sorted
+ *    order. This builds the index from the bottom up, similar to how the
+ *    B-tree build works.
+ *
+ * 2. Start with an empty index, and insert all tuples one by one.
+ *
+ * The sorted method is used if the operator classes for all columns have
+ * a 'sortsupport' defined. Otherwise, we resort to the second strategy.
+ *
+ * The second strategy can optionally use buffers at different levels of
+ * the tree to reduce I/O, see "Buffering build algorithm" in the README
+ * for a more detailed explanation. It initially calls insert over and over,
+ * but switches to the buffered algorithm after a certain number of tuples
+ * (unless buffering mode is disabled).
+ *
+
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
@@ -28,6 +46,7 @@
 #include "storage/smgr.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
+#include "utils/tuplesort.h"
 
 /* Step of index tuples for check whether to switch to buffering build mode */
 #define BUFFERING_MODE_SWITCH_CHECK_STEP 256
@@ -40,8 +59,14 @@
  */
 #define BUFFERING_MODE_TUPLE_SIZE_STATS_TARGET 4096
 
+/*
+ * Strategy used to build the index. It can change between the
+ * GIST_BUFFERING_* modes on the fly, but if the Sorted method is used,
+ * that needs to be decided up-front and cannot be changed afterwards.
+ */
 typedef enum
 {
+	GIST_SORTED_BUILD,			/* bottom-up build by sorting */
 	GIST_BUFFERING_DISABLED,	/* in regular build mode and aren't going to
 								 * switch */
 	GIST_BUFFERING_AUTO,		/* in regular build mode, but will switch to
@@ -51,7 +76,7 @@ typedef enum
 								 * before switching to the buffering build
 								 * mode */
 	GIST_BUFFERING_ACTIVE		/* in buffering build mode */
-} GistBufferingMode;
+} GistBuildMode;
 
 /* Working state for gistbuild and its callback */
 typedef struct
@@ -60,23 +85,57 @@ typedef struct
 	Relation	heaprel;
 	GISTSTATE  *giststate;
 
-	int64		indtuples;		/* number of tuples indexed */
-	int64		indtuplesSize;	/* total size of all indexed tuples */
-
 	Size		freespace;		/* amount of free space to leave on pages */
 
+	GistBuildMode buildMode;
+
+	int64		indtuples;		/* number of tuples indexed */
+
 	/*
 	 * Extra data structures used during a buffering build. 'gfbb' contains
 	 * information related to managing the build buffers. 'parentMap' is a
 	 * lookup table of the parent of each internal page.
 	 */
+	int64		indtuplesSize;	/* total size of all indexed tuples */
 	GISTBuildBuffers *gfbb;
 	HTAB	   *parentMap;
 
-	GistBufferingMode bufferingMode;
+	/*
+	 * Extra data structures used during a sorting build.
+	 */
+	Tuplesortstate *sortstate;	/* state data for tuplesort.c */
+
+	BlockNumber pages_allocated;
+	BlockNumber pages_written;
+
+	int			ready_num_pages;
+	BlockNumber ready_blknos[XLR_MAX_BLOCK_ID];
+	Page		ready_pages[XLR_MAX_BLOCK_ID];
 } GISTBuildState;
 
+/*
+ * In sorted build, we use a stack of these structs, one for each level,
+ * to pack the tuples in pages.
+ */
+typedef struct GistSortedBuildPageState
+{
+	Page		page;
+	struct GistSortedBuildPageState *parent;	/* Upper level, if any */
+} GistSortedBuildPageState;
+
 /* prototypes for private functions */
+
+static void gistSortedBuildCallback(Relation index, ItemPointer tid,
+									Datum *values, bool *isnull,
+									bool tupleIsAlive, void *state);
+static void gist_indexsortbuild(GISTBuildState *state);
+static void gist_indexsortbuild_pagestate_add(GISTBuildState *state,
+											  GistSortedBuildPageState *pagestate,
+											  IndexTuple itup);
+static void gist_indexsortbuild_pagestate_flush(GISTBuildState *state,
+												GistSortedBuildPageState *pagestate);
+static void gist_indexsortbuild_flush_ready_pages(GISTBuildState *state);
+
 static void gistInitBuffering(GISTBuildState *buildstate);
 static int	calculatePagesPerBuffer(GISTBuildState *buildstate, int levelStep);
 static void gistBuildCallback(Relation index,
@@ -107,10 +166,9 @@ static void gistMemorizeParent(GISTBuildState *buildstate, BlockNumber child,
 static void gistMemorizeAllDownlinks(GISTBuildState *buildstate, Buffer parent);
 static BlockNumber gistGetParent(GISTBuildState *buildstate, BlockNumber child);
 
+
 /*
- * Main entry point to GiST index build. Initially calls insert over and over,
- * but switches to more efficient buffering build algorithm after a certain
- * number of tuples (unless buffering mode is disabled).
+ * Main entry point to GiST index build.
  */
 IndexBuildResult *
 gistbuild(Relation heap, Relation index, IndexInfo *indexInfo)
@@ -118,124 +176,397 @@ gistbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	IndexBuildResult *result;
 	double		reltuples;
 	GISTBuildState buildstate;
-	Buffer		buffer;
-	Page		page;
 	MemoryContext oldcxt = CurrentMemoryContext;
 	int			fillfactor;
+	Oid			SortSupportFnOids[INDEX_MAX_KEYS];
+	bool		hasallsortsupports;
+	int			keyscount = IndexRelationGetNumberOfKeyAttributes(index);
+	GiSTOptions *options = NULL;
+
+	/*
+	 * We expect to be called exactly once for any index relation. If that's
+	 * not the case, big trouble's what we have.
+	 */
+	if (RelationGetNumberOfBlocks(index) != 0)
+		elog(ERROR, "index \"%s\" already contains data",
+			 RelationGetRelationName(index));
+
+	if (index->rd_options)
+		options = (GiSTOptions *) index->rd_options;
 
 	buildstate.indexrel = index;
 	buildstate.heaprel = heap;
+	buildstate.sortstate = NULL;
+	buildstate.giststate = initGISTstate(index);
 
-	if (index->rd_options)
+	/*
+	 * Create a temporary memory context that is reset once for each tuple
+	 * processed.  (Note: we don't bother to make this a child of the
+	 * giststate's scanCxt, so we have to delete it separately at the end.)
+	 */
+	buildstate.giststate->tempCxt = createTempGistContext();
+
+	/*
+	 * Choose build strategy. If all keys support sorting, do that. Otherwise
+	 * the default strategy is switch to buffering mode when the index grows
+	 * too large to fit in cache.
+	 */
+	hasallsortsupports = true;
+	for (int i = 0; i < keyscount; i++)
 	{
-		/* Get buffering mode from the options string */
-		GiSTOptions *options = (GiSTOptions *) index->rd_options;
+		SortSupportFnOids[i] = index_getprocid(index, i + 1, GIST_SORTSUPPORT_PROC);
+		if (!OidIsValid(SortSupportFnOids[i]))
+		{
+			hasallsortsupports = false;
+			break;
+		}
+	}
 
+	if (hasallsortsupports)
+	{
+		buildstate.buildMode = GIST_SORTED_BUILD;
+	}
+	else if (options)
+	{
 		if (options->buffering_mode == GIST_OPTION_BUFFERING_ON)
-			buildstate.bufferingMode = GIST_BUFFERING_STATS;
+			buildstate.buildMode = GIST_BUFFERING_STATS;
 		else if (options->buffering_mode == GIST_OPTION_BUFFERING_OFF)
-			buildstate.bufferingMode = GIST_BUFFERING_DISABLED;
+			buildstate.buildMode = GIST_BUFFERING_DISABLED;
 		else
-			buildstate.bufferingMode = GIST_BUFFERING_AUTO;
-
-		fillfactor = options->fillfactor;
+			buildstate.buildMode = GIST_BUFFERING_AUTO;
 	}
 	else
 	{
-		/*
-		 * By default, switch to buffering mode when the index grows too large
-		 * to fit in cache.
-		 */
-		buildstate.bufferingMode = GIST_BUFFERING_AUTO;
-		fillfactor = GIST_DEFAULT_FILLFACTOR;
+		buildstate.buildMode = GIST_BUFFERING_AUTO;
 	}
-	/* Calculate target amount of free space to leave on pages */
+
+	/*
+	 * Calculate target amount of free space to leave on pages.
+	 */
+	fillfactor = options ? options->fillfactor : GIST_DEFAULT_FILLFACTOR;
 	buildstate.freespace = BLCKSZ * (100 - fillfactor) / 100;
 
 	/*
-	 * We expect to be called exactly once for any index relation. If that's
-	 * not the case, big trouble's what we have.
+	 * Build the index using the chosen strategy.
 	 */
-	if (RelationGetNumberOfBlocks(index) != 0)
-		elog(ERROR, "index \"%s\" already contains data",
-			 RelationGetRelationName(index));
+	buildstate.indtuples = 0;
+	buildstate.indtuplesSize = 0;
 
-	/* no locking is needed */
-	buildstate.giststate = initGISTstate(index);
+	if (buildstate.buildMode == GIST_SORTED_BUILD)
+	{
+		/*
+		 * Sort all data, build the index from bottom up.
+		 */
+		buildstate.sortstate = tuplesort_begin_index_gist(heap,
+														  index,
+														  maintenance_work_mem,
+														  NULL,
+														  false);
+
+		/* Scan the table, adding all tuples to the tuplesort */
+		reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
+										   gistSortedBuildCallback,
+										   (void *) &buildstate, NULL);
+
+		/*
+		 * Perform the sort and build index pages.
+		 */
+		tuplesort_performsort(buildstate.sortstate);
+
+		gist_indexsortbuild(&buildstate);
+
+		tuplesort_end(buildstate.sortstate);
+	}
+	else
+	{
+		/*
+		 * Initialize an empty index and insert all tuples, possibly using
+		 * buffers on intermediate levels.
+		 */
+		Buffer		buffer;
+		Page		page;
+
+		/* initialize the root page */
+		buffer = gistNewBuffer(index);
+		Assert(BufferGetBlockNumber(buffer) == GIST_ROOT_BLKNO);
+		page = BufferGetPage(buffer);
+
+		START_CRIT_SECTION();
+
+		GISTInitBuffer(buffer, F_LEAF);
+
+		MarkBufferDirty(buffer);
+		PageSetLSN(page, GistBuildLSN);
+
+		UnlockReleaseBuffer(buffer);
+
+		END_CRIT_SECTION();
+
+		/* Scan the table, inserting all the tuples to the index. */
+		reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
+										   gistBuildCallback,
+										   (void *) &buildstate, NULL);
+
+		/*
+		 * If buffering was used, flush out all the tuples that are still in
+		 * the buffers.
+		 */
+		if (buildstate.buildMode == GIST_BUFFERING_ACTIVE)
+		{
+			elog(DEBUG1, "all tuples processed, emptying buffers");
+			gistEmptyAllBuffers(&buildstate);
+			gistFreeBuildBuffers(buildstate.gfbb);
+		}
+
+		/*
+		 * We didn't write WAL records as we built the index, so if
+		 * WAL-logging is required, write all pages to the WAL now.
+		 */
+		if (RelationNeedsWAL(index))
+		{
+			log_newpage_range(index, MAIN_FORKNUM,
+							  0, RelationGetNumberOfBlocks(index),
+							  true);
+		}
+	}
+
+	/* okay, all heap tuples are indexed */
+	MemoryContextSwitchTo(oldcxt);
+	MemoryContextDelete(buildstate.giststate->tempCxt);
+
+	freeGISTstate(buildstate.giststate);
 
 	/*
-	 * Create a temporary memory context that is reset once for each tuple
-	 * processed.  (Note: we don't bother to make this a child of the
-	 * giststate's scanCxt, so we have to delete it separately at the end.)
+	 * Return statistics
 	 */
-	buildstate.giststate->tempCxt = createTempGistContext();
+	result = (IndexBuildResult *) palloc(sizeof(IndexBuildResult));
 
-	/* initialize the root page */
-	buffer = gistNewBuffer(index);
-	Assert(BufferGetBlockNumber(buffer) == GIST_ROOT_BLKNO);
-	page = BufferGetPage(buffer);
+	result->heap_tuples = reltuples;
+	result->index_tuples = (double) buildstate.indtuples;
+
+	return result;
+}
+
+/*-------------------------------------------------------------------------
+ * Routines for sorted build
+ *-------------------------------------------------------------------------
+ */
+
+/*
+ * Per-tuple callback for table_index_build_scan.
+ */
+static void
+gistSortedBuildCallback(Relation index,
+						ItemPointer tid,
+						Datum *values,
+						bool *isnull,
+						bool tupleIsAlive,
+						void *state)
+{
+	GISTBuildState *buildstate = (GISTBuildState *) state;
+	MemoryContext oldCtx;
+	Datum		compressed_values[INDEX_MAX_KEYS];
 
-	START_CRIT_SECTION();
+	oldCtx = MemoryContextSwitchTo(buildstate->giststate->tempCxt);
 
-	GISTInitBuffer(buffer, F_LEAF);
+	/* Form an index tuple and point it at the heap tuple */
+	gistCompressValues(buildstate->giststate, index,
+					   values, isnull,
+					   true, compressed_values);
 
-	MarkBufferDirty(buffer);
-	PageSetLSN(page, GistBuildLSN);
+	tuplesort_putindextuplevalues(buildstate->sortstate,
+								  buildstate->indexrel,
+								  tid,
+								  compressed_values, isnull);
 
-	UnlockReleaseBuffer(buffer);
+	MemoryContextSwitchTo(oldCtx);
+	MemoryContextReset(buildstate->giststate->tempCxt);
 
-	END_CRIT_SECTION();
+	/* Update tuple count. */
+	buildstate->indtuples += 1;
+}
 
-	/* build the index */
-	buildstate.indtuples = 0;
-	buildstate.indtuplesSize = 0;
+/*
+ * Build GiST index from bottom up from pre-sorted tuples.
+ */
+static void
+gist_indexsortbuild(GISTBuildState *state)
+{
+	IndexTuple	itup;
+	GistSortedBuildPageState *leafstate;
+	GistSortedBuildPageState *pagestate;
+	Page		page;
+
+	state->pages_allocated = 0;
+	state->pages_written = 0;
+	state->ready_num_pages = 0;
 
 	/*
-	 * Do the heap scan.
+	 * Write an empty page as a placeholder for the root page. It will be
+	 * replaced with the real root page at the end.
 	 */
-	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
-									   gistBuildCallback,
-									   (void *) &buildstate, NULL);
+	page = palloc0(BLCKSZ);
+	smgrextend(state->indexrel->rd_smgr, MAIN_FORKNUM, GIST_ROOT_BLKNO,
+			   page, true);
+	state->pages_allocated++;
+	state->pages_written++;
+
+	/* Allocate a scratch page in memory to collect the tuples */
+	leafstate = palloc(sizeof(GistSortedBuildPageState));
+	leafstate->page = page;
+	leafstate->parent = NULL;
+	gistinitpage(page, F_LEAF);
 
 	/*
-	 * If buffering was used, flush out all the tuples that are still in the
-	 * buffers.
+	 * Fill index pages with tuples in the sorted order.
 	 */
-	if (buildstate.bufferingMode == GIST_BUFFERING_ACTIVE)
+	while ((itup = tuplesort_getindextuple(state->sortstate, true)) != NULL)
 	{
-		elog(DEBUG1, "all tuples processed, emptying buffers");
-		gistEmptyAllBuffers(&buildstate);
-		gistFreeBuildBuffers(buildstate.gfbb);
+		gist_indexsortbuild_pagestate_add(state, leafstate, itup);
 	}
 
-	/* okay, all heap tuples are indexed */
-	MemoryContextSwitchTo(oldcxt);
-	MemoryContextDelete(buildstate.giststate->tempCxt);
-
-	freeGISTstate(buildstate.giststate);
-
 	/*
-	 * We didn't write WAL records as we built the index, so if WAL-logging is
-	 * required, write all pages to the WAL now.
+	 * Write out the partially full non-root pages.
+	 *
+	 * Keep in mind that flush can build a new root.
 	 */
-	if (RelationNeedsWAL(index))
+	pagestate = leafstate;
+	while (pagestate->parent != NULL)
 	{
-		log_newpage_range(index, MAIN_FORKNUM,
-						  0, RelationGetNumberOfBlocks(index),
-						  true);
+		GistSortedBuildPageState *parent;
+
+		gist_indexsortbuild_pagestate_flush(state, pagestate);
+		parent = pagestate->parent;
+		pfree(pagestate->page);
+		pfree(pagestate);
+		pagestate = parent;
 	}
 
+	gist_indexsortbuild_flush_ready_pages(state);
+
+	/* Write out the root */
+	smgrwrite(state->indexrel->rd_smgr, MAIN_FORKNUM, GIST_ROOT_BLKNO,
+			  pagestate->page, true);
+	if (RelationNeedsWAL(state->indexrel))
+		log_newpage(&state->indexrel->rd_node, MAIN_FORKNUM, GIST_ROOT_BLKNO,
+					pagestate->page, true);
+
+	pfree(pagestate->page);
+	pfree(pagestate);
+}
+
+/*
+ * Add tuple to a page. If the pages is full, write it out and re-initialize
+ * a new page first.
+ */
+static void
+gist_indexsortbuild_pagestate_add(GISTBuildState *state,
+								  GistSortedBuildPageState * pagestate,
+								  IndexTuple itup)
+{
+	Page		page = pagestate->page;
+
+	/* Does the tuple fit? If not, flush */
+	if (PageGetFreeSpace(page) < IndexTupleSize(itup) + sizeof(ItemIdData) + state->freespace)
+		gist_indexsortbuild_pagestate_flush(state, pagestate);
+
+	gistfillbuffer(page, &itup, 1, InvalidOffsetNumber);
+}
+
+static void
+gist_indexsortbuild_pagestate_flush(GISTBuildState *state,
+									GistSortedBuildPageState * pagestate)
+{
+	GistSortedBuildPageState *parent;
+	IndexTuple *itvec;
+	IndexTuple	union_tuple;
+	int			vect_len;
+	bool		isleaf;
+	BlockNumber blkno;
+
+	if (state->ready_num_pages == XLR_MAX_BLOCK_ID)
+		gist_indexsortbuild_flush_ready_pages(state);
+
 	/*
-	 * Return statistics
+	 * The page is now complete. Assign a block number to it, and add it to
+	 * the list of finished pages. (We don't write it out immediately, because
+	 * we want to WAL-log the pages in batches.)
 	 */
-	result = (IndexBuildResult *) palloc(sizeof(IndexBuildResult));
+	blkno = state->pages_allocated++;
+	state->ready_blknos[state->ready_num_pages] = blkno;
+	state->ready_pages[state->ready_num_pages] = pagestate->page;
+	state->ready_num_pages++;
 
-	result->heap_tuples = reltuples;
-	result->index_tuples = (double) buildstate.indtuples;
+	/* check once per page */
+	CHECK_FOR_INTERRUPTS();
 
-	return result;
+	isleaf = GistPageIsLeaf(pagestate->page);
+
+	/*
+	 * Form a downlink tuple to represent all the tuples on the page.
+	 */
+	itvec = gistextractpage(pagestate->page, &vect_len);
+	union_tuple = gistunion(state->indexrel, itvec, vect_len,
+							state->giststate);
+	ItemPointerSetBlockNumber(&(union_tuple->t_tid), blkno);
+	pfree(itvec);
+
+	/*
+	 * Insert the downlink to the parent page. If this was the root, create a
+	 * new page as the parent, which becomes the new root.
+	 */
+	parent = pagestate->parent;
+	if (parent == NULL)
+	{
+		parent = palloc(sizeof(GistSortedBuildPageState));
+		parent->page = (Page) palloc(BLCKSZ);
+		parent->parent = NULL;
+		gistinitpage(parent->page, 0);
+
+		pagestate->parent = parent;
+	}
+	gist_indexsortbuild_pagestate_add(state, parent, union_tuple);
+	pfree(union_tuple);
+
+	/* Re-initialize the page buffer for next page on this level. */
+	pagestate->page = palloc(BLCKSZ);
+	gistinitpage(pagestate->page, isleaf ? F_LEAF : 0);
 }
 
+static void
+gist_indexsortbuild_flush_ready_pages(GISTBuildState *state)
+{
+	if (state->ready_num_pages == 0)
+		return;
+
+	for (int i = 0; i < state->ready_num_pages; i++)
+	{
+		/*
+		 * currently, the blocks must be buffered in order. Otherwise we
+		 * should do a similar smgrextend/ smgrwrite dance as in nbtsort.c
+		 */
+		if (state->ready_blknos[i] != state->pages_written)
+			elog(ERROR, "unexpected block number to flush GiST sorting build");
+
+		smgrextend(state->indexrel->rd_smgr,
+				   MAIN_FORKNUM,
+				   state->pages_written++,
+				   state->ready_pages[i],
+				   true);
+	}
+
+	if (RelationNeedsWAL(state->indexrel))
+		log_newpages(&state->indexrel->rd_node, MAIN_FORKNUM, state->ready_num_pages,
+					 state->ready_blknos, state->ready_pages, true);
+	state->ready_num_pages = 0;
+}
+
+
+/*-------------------------------------------------------------------------
+ * Routines for non-sorted build
+ *-------------------------------------------------------------------------
+ */
+
 /*
  * Attempt to switch to buffering mode.
  *
@@ -375,7 +706,7 @@ gistInitBuffering(GISTBuildState *buildstate)
 	if (levelStep <= 0)
 	{
 		elog(DEBUG1, "failed to switch to buffered GiST build");
-		buildstate->bufferingMode = GIST_BUFFERING_DISABLED;
+		buildstate->buildMode = GIST_BUFFERING_DISABLED;
 		return;
 	}
 
@@ -392,7 +723,7 @@ gistInitBuffering(GISTBuildState *buildstate)
 
 	gistInitParentMap(buildstate);
 
-	buildstate->bufferingMode = GIST_BUFFERING_ACTIVE;
+	buildstate->buildMode = GIST_BUFFERING_ACTIVE;
 
 	elog(DEBUG1, "switched to buffered GiST build; level step = %d, pagesPerBuffer = %d",
 		 levelStep, pagesPerBuffer);
@@ -453,10 +784,12 @@ gistBuildCallback(Relation index,
 	oldCtx = MemoryContextSwitchTo(buildstate->giststate->tempCxt);
 
 	/* form an index tuple and point it at the heap tuple */
-	itup = gistFormTuple(buildstate->giststate, index, values, isnull, true);
+	itup = gistFormTuple(buildstate->giststate, index,
+						 values, isnull,
+						 true);
 	itup->t_tid = *tid;
 
-	if (buildstate->bufferingMode == GIST_BUFFERING_ACTIVE)
+	if (buildstate->buildMode == GIST_BUFFERING_ACTIVE)
 	{
 		/* We have buffers, so use them. */
 		gistBufferingBuildInsert(buildstate, itup);
@@ -478,7 +811,7 @@ gistBuildCallback(Relation index,
 	MemoryContextSwitchTo(oldCtx);
 	MemoryContextReset(buildstate->giststate->tempCxt);
 
-	if (buildstate->bufferingMode == GIST_BUFFERING_ACTIVE &&
+	if (buildstate->buildMode == GIST_BUFFERING_ACTIVE &&
 		buildstate->indtuples % BUFFERING_MODE_TUPLE_SIZE_STATS_TARGET == 0)
 	{
 		/* Adjust the target buffer size now */
@@ -493,10 +826,10 @@ gistBuildCallback(Relation index,
 	 * To avoid excessive calls to smgrnblocks(), only check this every
 	 * BUFFERING_MODE_SWITCH_CHECK_STEP index tuples
 	 */
-	if ((buildstate->bufferingMode == GIST_BUFFERING_AUTO &&
+	if ((buildstate->buildMode == GIST_BUFFERING_AUTO &&
 		 buildstate->indtuples % BUFFERING_MODE_SWITCH_CHECK_STEP == 0 &&
 		 effective_cache_size < smgrnblocks(index->rd_smgr, MAIN_FORKNUM)) ||
-		(buildstate->bufferingMode == GIST_BUFFERING_STATS &&
+		(buildstate->buildMode == GIST_BUFFERING_STATS &&
 		 buildstate->indtuples >= BUFFERING_MODE_TUPLE_SIZE_STATS_TARGET))
 	{
 		/*
diff --git a/src/backend/access/gist/gistproc.c b/src/backend/access/gist/gistproc.c
index 9ace64c3c4a..ce078426526 100644
--- a/src/backend/access/gist/gistproc.c
+++ b/src/backend/access/gist/gistproc.c
@@ -24,12 +24,17 @@
 #include "utils/builtins.h"
 #include "utils/float.h"
 #include "utils/geo_decls.h"
+#include "utils/sortsupport.h"
 
 
 static bool gist_box_leaf_consistent(BOX *key, BOX *query,
 									 StrategyNumber strategy);
 static bool rtree_internal_consistent(BOX *key, BOX *query,
 									  StrategyNumber strategy);
+static uint64 part_bits32_by2(uint32 x);
+static uint32 ieee_float32_to_uint32(float f);
+static uint64 point_zorder_internal(Point *p);
+
 
 /* Minimum accepted ratio of split */
 #define LIMIT_RATIO 0.3
@@ -1540,3 +1545,222 @@ gist_poly_distance(PG_FUNCTION_ARGS)
 
 	PG_RETURN_FLOAT8(distance);
 }
+
+/* Z-order routines */
+
+/*
+ * Compute Z-order for a point
+ *
+ * Map a two-dimensional point to a single integer, in a way that preserves
+ * locality. Points that are close in the two-dimensional space are mapped to
+ * integer that are not far from each other. We do that by interleaving the
+ * bits in the X and Y components, this is called a Z-order or Morton Code.
+ *
+ * A Morton Code is normally defined only for integers, but the X and Y values
+ * of a point are floating point. We expect floats to be in IEEE format, and
+ * the sort order of IEEE floats is mostly correlated to the binary sort order
+ * of the bits reinterpreted as an int.  It isn't in some special cases, but
+ * for this use case we don't really care about that, we're just trying to
+ * encourage locality.
+ */
+static uint64
+point_zorder_internal(Point *p)
+{
+	uint32		x = ieee_float32_to_uint32(p->x);
+	uint32		y = ieee_float32_to_uint32(p->y);
+
+	/* Interleave the bits */
+	return part_bits32_by2(x) | (part_bits32_by2(y) << 1);
+}
+
+/* Interleave 32 bits with zeroes */
+static uint64
+part_bits32_by2(uint32 x)
+{
+	uint64		n = x;
+
+	n = (n | (n << 16)) & UINT64CONST(0x0000FFFF0000FFFF);
+	n = (n | (n << 8)) & UINT64CONST(0x00FF00FF00FF00FF);
+	n = (n | (n << 4)) & UINT64CONST(0x0F0F0F0F0F0F0F0F);
+	n = (n | (n << 2)) & UINT64CONST(0x3333333333333333);
+	n = (n | (n << 1)) & UINT64CONST(0x5555555555555555);
+
+	return n;
+}
+
+/*
+ * Convert a 32-bit IEEE float to uint32 in a way that preserves the ordering.
+ */
+static uint32
+ieee_float32_to_uint32(float f)
+{
+	/*----
+	 *
+	 * IEEE 754 floating point format
+	 * ------------------------------
+	 *
+	 * IEEE 754 floating point numbers have this format:
+	 *
+	 *   exponent (8 bits)
+	 *   |
+	 * s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm
+	 * |          |
+	 * sign       mantissa (23 bits)
+	 *
+	 * Infinity has all bits in the exponent set and the mantissa is
+	 * all-zeros. Negative infinity is the same but with the sign bit set.
+	 *
+	 * NaNs are represented with all bits in the exponent set, and the least
+	 * significant bit in the mantissa also set. The rest of the mantissa bits
+	 * can be used to distinguish different kinds of NaNs.
+	 *
+	 * The IEEE format has the nice property that when you take the bit
+	 * representation and interpret it as an integer, the order is preserved,
+	 * except for the sign. That holds for the +-Infinity values too.
+	 *
+	 * Mapping to uint32
+	 * -----------------
+	 *
+	 * In order to have a smooth transition from negative to positive numbers,
+	 * we map floats to unsigned integers like this:
+	 *
+	 * x < 0 to range 0-7FFFFFFF
+	 * x = 0 to value 8000000 (both positive and negative zero)
+	 * x > 0 to range 8000001-FFFFFFFF
+	 *
+	 * We don't care to distinguish different kind of NaNs, so they are all
+	 * mapped to the same arbitrary value, FFFFFFFF. Because of the IEEE bit
+	 * representation of NaNs, there aren't any non-NaN values that would be
+	 * mapped to FFFFFFFF. In fact, there is a range of unused values on both
+	 * ends of the uint32 space.
+	 */
+	if (isnan(f))
+		return 0xFFFFFFFF;
+	else
+	{
+		union
+		{
+			float		f;
+			uint32		i;
+		}			u;
+
+		u.f = f;
+
+		/* Check the sign bit */
+		if ((u.i & 0x80000000) != 0)
+		{
+			/*
+			 * Map the negative value to range 0-7FFFFFFF. This flips the sign
+			 * bit to 0 in the same instruction.
+			 */
+			Assert(f < 0);
+			u.i ^= 0xFFFFFFFF;
+		}
+		else
+		{
+			/* Map the positive value (or 0) to range 80000000-FFFFFFFF */
+			u.i |= 0x80000000;
+		}
+
+		return u.i;
+	}
+}
+
+/*
+ * Compare the Z-order of points.
+ */
+static int
+gist_bbox_zorder_cmp(Datum a, Datum b, SortSupport ssup)
+{
+	Point	   *p1 = &(DatumGetBoxP(a)->low);
+	Point	   *p2 = &(DatumGetBoxP(b)->low);
+	uint64		z1 = point_zorder_internal(p1);
+	uint64		z2 = point_zorder_internal(p2);
+
+	if (z1 > z2)
+		return 1;
+	else if (z1 < z2)
+		return -1;
+	else
+		return 0;
+}
+
+/*
+ * Abbreviated version of Z-order comparison.
+ *
+ * The abbreviated format is a Z-order value computed from the two 32-bit
+ * floats. If SIZEOF_DATUM == 8, the 64-bit Z-order value fits fully in the
+ * abbreviated format, so the "full comparator" can just return 0. This
+ * assumes that the sort code always calls the abbreviated comparator first,
+ * so that if the full comparator is called, we already know that the
+ * abbreviated comparator deemed the values equal. If SIZEOF_DATUM == 4, we
+ * store the upper half of the Z-order value, and the tie-breaker computes all
+ * the bits.
+ */
+static Datum
+gist_bbox_zorder_abbrev_convert(Datum original, SortSupport ssup)
+{
+	Point	   *p = &(DatumGetBoxP(original)->low);
+	uint64		z;
+
+	z = point_zorder_internal(p);
+
+	PG_RETURN_UINT64(z);
+}
+
+static int
+gist_bbox_zorder_cmp_abbrev(Datum a, Datum b, SortSupport ssup)
+{
+	if (z1 > z2)
+		return 1;
+	else if (z1 < z2)
+		return -1;
+	else
+		return 0;
+}
+
+static int
+gist_bbox_zorder_full_cmp(Datum a, Datum b, SortSupport ssup)
+{
+	/*
+	 * On 64-bit systems, the abbreviation is not lossy, so if the
+	 * abbreviated comparator returned 0, the values really are equal.
+	 */
+#if SIZEOF_DATUM == 8
+	return 0;
+#else
+	return gist_bbox_zorder_cmp(a, b, ssup);
+#endif
+}
+
+/*
+ * We never consider aborting the abbreviation. (Should we,
+ * when SIZEOF_DATUM == 4?)
+ */
+static bool
+gist_bbox_zorder_abbrev_abort(int memtupcount, SortSupport ssup)
+{
+	return false;
+}
+
+/*
+ * Sort support routine for fast GiST index build by sorting.
+ */
+Datum
+gist_point_sortsupport(PG_FUNCTION_ARGS)
+{
+	SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
+
+	if (ssup->abbreviate)
+	{
+		ssup->comparator = gist_bbox_zorder_cmp_abbrev;
+		ssup->abbrev_converter = gist_bbox_zorder_abbrev_convert;
+		ssup->abbrev_abort = gist_bbox_zorder_abbrev_abort;
+		ssup->abbrev_full_comparator = gist_bbox_zorder_full_cmp;
+	}
+	else
+	{
+		ssup->comparator = gist_bbox_zorder_cmp;
+	}
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index 0516059e3dd..615b5ade233 100644
--- a/src/backend/access/gist/gistutil.c
+++ b/src/backend/access/gist/gistutil.c
@@ -572,12 +572,31 @@ gistdentryinit(GISTSTATE *giststate, int nkey, GISTENTRY *e,
 
 IndexTuple
 gistFormTuple(GISTSTATE *giststate, Relation r,
-			  Datum attdata[], bool isnull[], bool isleaf)
+			  Datum *attdata, bool *isnull, bool isleaf)
 {
 	Datum		compatt[INDEX_MAX_KEYS];
-	int			i;
 	IndexTuple	res;
 
+	gistCompressValues(giststate, r, attdata, isnull, isleaf, compatt);
+
+	res = index_form_tuple(isleaf ? giststate->leafTupdesc :
+						   giststate->nonLeafTupdesc,
+						   compatt, isnull);
+
+	/*
+	 * The offset number on tuples on internal pages is unused. For historical
+	 * reasons, it is set to 0xffff.
+	 */
+	ItemPointerSetOffsetNumber(&(res->t_tid), 0xffff);
+	return res;
+}
+
+void
+gistCompressValues(GISTSTATE *giststate, Relation r,
+				   Datum *attdata, bool *isnull, bool isleaf, Datum *compatt)
+{
+	int			i;
+
 	/*
 	 * Call the compress method on each attribute.
 	 */
@@ -617,17 +636,6 @@ gistFormTuple(GISTSTATE *giststate, Relation r,
 				compatt[i] = attdata[i];
 		}
 	}
-
-	res = index_form_tuple(isleaf ? giststate->leafTupdesc :
-						   giststate->nonLeafTupdesc,
-						   compatt, isnull);
-
-	/*
-	 * The offset number on tuples on internal pages is unused. For historical
-	 * reasons, it is set to 0xffff.
-	 */
-	ItemPointerSetOffsetNumber(&(res->t_tid), 0xffff);
-	return res;
 }
 
 /*
@@ -745,14 +753,11 @@ gistpenalty(GISTSTATE *giststate, int attno,
  * Initialize a new index page
  */
 void
-GISTInitBuffer(Buffer b, uint32 f)
+gistinitpage(Page page, uint32 f)
 {
 	GISTPageOpaque opaque;
-	Page		page;
-	Size		pageSize;
+	Size		pageSize = BLCKSZ;
 
-	pageSize = BufferGetPageSize(b);
-	page = BufferGetPage(b);
 	PageInit(page, pageSize, sizeof(GISTPageOpaqueData));
 
 	opaque = GistPageGetOpaque(page);
@@ -763,6 +768,18 @@ GISTInitBuffer(Buffer b, uint32 f)
 	opaque->gist_page_id = GIST_PAGE_ID;
 }
 
+/*
+ * Initialize a new index buffer
+ */
+void
+GISTInitBuffer(Buffer b, uint32 f)
+{
+	Page		page;
+
+	page = BufferGetPage(b);
+	gistinitpage(page, f);
+}
+
 /*
  * Verify that a freshly-read page looks sane.
  */
diff --git a/src/backend/access/gist/gistvalidate.c b/src/backend/access/gist/gistvalidate.c
index 2b9ab693be1..8a14620fab2 100644
--- a/src/backend/access/gist/gistvalidate.c
+++ b/src/backend/access/gist/gistvalidate.c
@@ -143,6 +143,10 @@ gistvalidate(Oid opclassoid)
 			case GIST_OPTIONS_PROC:
 				ok = check_amoptsproc_signature(procform->amproc);
 				break;
+			case GIST_SORTSUPPORT_PROC:
+				ok = check_amproc_signature(procform->amproc, VOIDOID, true,
+											1, 1, INTERNALOID);
+				break;
 			default:
 				ereport(INFO,
 						(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
@@ -263,7 +267,7 @@ gistvalidate(Oid opclassoid)
 			continue;			/* got it */
 		if (i == GIST_DISTANCE_PROC || i == GIST_FETCH_PROC ||
 			i == GIST_COMPRESS_PROC || i == GIST_DECOMPRESS_PROC ||
-			i == GIST_OPTIONS_PROC)
+			i == GIST_OPTIONS_PROC  || i == GIST_SORTSUPPORT_PROC)
 			continue;			/* optional methods */
 		ereport(INFO,
 				(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index c526bb19281..1f0e4e01e69 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -1019,6 +1019,63 @@ log_newpage(RelFileNode *rnode, ForkNumber forkNum, BlockNumber blkno,
 	return recptr;
 }
 
+/*
+ * Like log_newpage(), but allows logging multiple pages in one operation.
+ * It is more efficient than calling log_newpage() for each page separately,
+ * because we can write multiple pages in a single WAL record.
+ */
+void
+log_newpages(RelFileNode *rnode, ForkNumber forkNum, int num_pages,
+			 BlockNumber *blknos, Page *pages, bool page_std)
+{
+	int			flags;
+	XLogRecPtr	recptr;
+	int			i;
+	int			j;
+
+	flags = REGBUF_FORCE_IMAGE;
+	if (page_std)
+		flags |= REGBUF_STANDARD;
+
+	/*
+	 * Iterate over all the pages. They are collected into batches of
+	 * XLR_MAX_BLOCK_ID pages, and a single WAL-record is written for each
+	 * batch.
+	 */
+	XLogEnsureRecordSpace(XLR_MAX_BLOCK_ID - 1, 0);
+
+	i = 0;
+	while (i < num_pages)
+	{
+		int			batch_start = i;
+		int			nbatch;
+
+		XLogBeginInsert();
+
+		nbatch = 0;
+		while (nbatch < XLR_MAX_BLOCK_ID && i < num_pages)
+		{
+			XLogRegisterBlock(nbatch, rnode, forkNum, blknos[i], pages[i], flags);
+			i++;
+			nbatch++;
+		}
+
+		recptr = XLogInsert(RM_XLOG_ID, XLOG_FPI);
+
+		for (j = batch_start; j < i; j++)
+		{
+			/*
+			 * The page may be uninitialized. If so, we can't set the LSN because that
+			 * would corrupt the page.
+			 */
+			if (!PageIsNew(pages[j]))
+			{
+				PageSetLSN(pages[j], recptr);
+			}
+		}
+	}
+}
+
 /*
  * Write a WAL record containing a full image of a page.
  *
diff --git a/src/backend/utils/sort/sortsupport.c b/src/backend/utils/sort/sortsupport.c
index fcfe6e831a1..94a9de317e9 100644
--- a/src/backend/utils/sort/sortsupport.c
+++ b/src/backend/utils/sort/sortsupport.c
@@ -15,6 +15,7 @@
 
 #include "postgres.h"
 
+#include "access/gist.h"
 #include "access/nbtree.h"
 #include "catalog/pg_am.h"
 #include "fmgr.h"
@@ -175,3 +176,36 @@ PrepareSortSupportFromIndexRel(Relation indexRel, int16 strategy,
 
 	FinishSortSupportFunction(opfamily, opcintype, ssup);
 }
+
+/*
+ * Fill in SortSupport given an GiST index relation
+ *
+ * Caller must previously have zeroed the SortSupportData structure and then
+ * filled in ssup_cxt, ssup_attno, ssup_collation, and ssup_nulls_first.  This
+ * will fill in ssup_reverse (always false for GiST index build), as well as
+ * the comparator function pointer.
+ */
+void
+PrepareSortSupportFromGistIndexRel(Relation indexRel, SortSupport ssup)
+{
+	Oid			opfamily = indexRel->rd_opfamily[ssup->ssup_attno - 1];
+	Oid			opcintype = indexRel->rd_opcintype[ssup->ssup_attno - 1];
+	Oid			sortSupportFunction;
+
+	Assert(ssup->comparator == NULL);
+
+	if (indexRel->rd_rel->relam != GIST_AM_OID)
+		elog(ERROR, "unexpected non-gist AM: %u", indexRel->rd_rel->relam);
+	ssup->ssup_reverse = false;
+
+	/*
+	 * Look up the sort support function. This is simpler than for B-tree
+	 * indexes because we don't support the old-style btree comparators.
+	 */
+	sortSupportFunction = get_opfamily_proc(opfamily, opcintype, opcintype,
+											GIST_SORTSUPPORT_PROC);
+	if (!OidIsValid(sortSupportFunction))
+		elog(ERROR, "missing support function %d(%u,%u) in opfamily %u",
+			 GIST_SORTSUPPORT_PROC, opcintype, opcintype, opfamily);
+	OidFunctionCall1(sortSupportFunction, PointerGetDatum(ssup));
+}
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 3c49476483b..fb4ac4d2f9f 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1167,6 +1167,63 @@ tuplesort_begin_index_hash(Relation heapRel,
 	return state;
 }
 
+Tuplesortstate *
+tuplesort_begin_index_gist(Relation heapRel,
+						   Relation indexRel,
+						   int workMem,
+						   SortCoordinate coordinate,
+						   bool randomAccess)
+{
+	Tuplesortstate *state = tuplesort_begin_common(workMem, coordinate,
+												   randomAccess);
+	MemoryContext oldcontext;
+	int			i;
+
+	oldcontext = MemoryContextSwitchTo(state->sortcontext);
+
+#ifdef TRACE_SORT
+	if (trace_sort)
+		elog(LOG,
+			 "begin index sort: workMem = %d, randomAccess = %c",
+			 workMem, randomAccess ? 't' : 'f');
+#endif
+
+	state->nKeys = IndexRelationGetNumberOfKeyAttributes(indexRel);
+
+	state->comparetup = comparetup_index_btree;
+	state->copytup = copytup_index;
+	state->writetup = writetup_index;
+	state->readtup = readtup_index;
+
+	state->heapRel = heapRel;
+	state->indexRel = indexRel;
+
+	/* Prepare SortSupport data for each column */
+	state->sortKeys = (SortSupport) palloc0(state->nKeys *
+											sizeof(SortSupportData));
+
+	for (i = 0; i < state->nKeys; i++)
+	{
+		SortSupport sortKey = state->sortKeys + i;
+
+		sortKey->ssup_cxt = CurrentMemoryContext;
+		sortKey->ssup_collation = indexRel->rd_indcollation[i];
+		sortKey->ssup_nulls_first = false;
+		sortKey->ssup_attno = i + 1;
+		/* Convey if abbreviation optimization is applicable in principle */
+		sortKey->abbreviate = (i == 0);
+
+		AssertState(sortKey->ssup_attno != 0);
+
+		/* Look for a sort support function */
+		PrepareSortSupportFromGistIndexRel(indexRel, sortKey);
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
 Tuplesortstate *
 tuplesort_begin_datum(Oid datumType, Oid sortOperator, Oid sortCollation,
 					  bool nullsFirstFlag, int workMem,
diff --git a/src/include/access/gist.h b/src/include/access/gist.h
index 4994351697c..4f6dae9a76b 100644
--- a/src/include/access/gist.h
+++ b/src/include/access/gist.h
@@ -37,7 +37,8 @@
 #define GIST_DISTANCE_PROC				8
 #define GIST_FETCH_PROC					9
 #define GIST_OPTIONS_PROC				10
-#define GISTNProcs						10
+#define GIST_SORTSUPPORT_PROC			11
+#define GISTNProcs					11
 
 /*
  * Page opaque data in a GiST index page.
diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
index 02e985549f6..b68c01a5f24 100644
--- a/src/include/access/gist_private.h
+++ b/src/include/access/gist_private.h
@@ -501,12 +501,15 @@ extern IndexTuple gistgetadjusted(Relation r,
 								  GISTSTATE *giststate);
 extern IndexTuple gistFormTuple(GISTSTATE *giststate,
 								Relation r, Datum *attdata, bool *isnull, bool isleaf);
+extern void gistCompressValues(GISTSTATE *giststate, Relation r,
+							   Datum *attdata, bool *isnull, bool isleaf, Datum *compatt);
 
 extern OffsetNumber gistchoose(Relation r, Page p,
 							   IndexTuple it,
 							   GISTSTATE *giststate);
 
 extern void GISTInitBuffer(Buffer b, uint32 f);
+extern void gistinitpage(Page page, uint32 f);
 extern void gistdentryinit(GISTSTATE *giststate, int nkey, GISTENTRY *e,
 						   Datum k, Relation r, Page pg, OffsetNumber o,
 						   bool l, bool isNull);
diff --git a/src/include/access/xloginsert.h b/src/include/access/xloginsert.h
index 63df25ae90f..674c376053e 100644
--- a/src/include/access/xloginsert.h
+++ b/src/include/access/xloginsert.h
@@ -54,6 +54,8 @@ extern bool XLogCheckBufferNeedsBackup(Buffer buffer);
 
 extern XLogRecPtr log_newpage(RelFileNode *rnode, ForkNumber forkNum,
 							  BlockNumber blk, char *page, bool page_std);
+extern void log_newpages(RelFileNode *rnode, ForkNumber forkNum, int num_pages,
+							   BlockNumber *blknos, char **pages, bool page_std);
 extern XLogRecPtr log_newpage_buffer(Buffer buffer, bool page_std);
 extern void log_newpage_range(Relation rel, ForkNumber forkNum,
 							  BlockNumber startblk, BlockNumber endblk, bool page_std);
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 0bbe0a122af..06ddb1f16b4 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,7 @@
  */
 
 /*							yyyymmddN */
+/* FIXME: bump this before pushing! */
 #define CATALOG_VERSION_NO	202009031
 
 #endif
diff --git a/src/include/catalog/pg_amproc.dat b/src/include/catalog/pg_amproc.dat
index 37b580883fc..a8e0c4ff8a5 100644
--- a/src/include/catalog/pg_amproc.dat
+++ b/src/include/catalog/pg_amproc.dat
@@ -480,6 +480,8 @@
   amproc => 'gist_point_distance' },
 { amprocfamily => 'gist/point_ops', amproclefttype => 'point',
   amprocrighttype => 'point', amprocnum => '9', amproc => 'gist_point_fetch' },
+{ amprocfamily => 'gist/point_ops', amproclefttype => 'point',
+  amprocrighttype => 'point', amprocnum => '11', amproc => 'gist_point_sortsupport' },
 { amprocfamily => 'gist/box_ops', amproclefttype => 'box',
   amprocrighttype => 'box', amprocnum => '1', amproc => 'gist_box_consistent' },
 { amprocfamily => 'gist/box_ops', amproclefttype => 'box',
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 687509ba926..96d7efd4270 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8062,6 +8062,9 @@
   proname => 'gist_poly_distance', prorettype => 'float8',
   proargtypes => 'internal polygon int2 oid internal',
   prosrc => 'gist_poly_distance' },
+{ oid => '3435', descr => 'sort support',
+  proname => 'gist_point_sortsupport', prorettype => 'void',
+  proargtypes => 'internal', prosrc => 'gist_point_sortsupport' },
 
 # GIN array support
 { oid => '2743', descr => 'GIN array support',
diff --git a/src/include/utils/sortsupport.h b/src/include/utils/sortsupport.h
index 264aec820b1..fb262c6e8d4 100644
--- a/src/include/utils/sortsupport.h
+++ b/src/include/utils/sortsupport.h
@@ -272,5 +272,6 @@ extern void PrepareSortSupportComparisonShim(Oid cmpFunc, SortSupport ssup);
 extern void PrepareSortSupportFromOrderingOp(Oid orderingOp, SortSupport ssup);
 extern void PrepareSortSupportFromIndexRel(Relation indexRel, int16 strategy,
 										   SortSupport ssup);
+extern void PrepareSortSupportFromGistIndexRel(Relation indexRel, SortSupport ssup);
 
 #endif							/* SORTSUPPORT_H */
diff --git a/src/include/utils/tuplesort.h b/src/include/utils/tuplesort.h
index 9e76666fe94..c69b36e209a 100644
--- a/src/include/utils/tuplesort.h
+++ b/src/include/utils/tuplesort.h
@@ -217,6 +217,10 @@ extern Tuplesortstate *tuplesort_begin_index_hash(Relation heapRel,
 												  uint32 max_buckets,
 												  int workMem, SortCoordinate coordinate,
 												  bool randomAccess);
+extern Tuplesortstate *tuplesort_begin_index_gist(Relation heapRel,
+												  Relation indexRel,
+												  int workMem, SortCoordinate coordinate,
+												  bool randomAccess);
 extern Tuplesortstate *tuplesort_begin_datum(Oid datumType,
 											 Oid sortOperator, Oid sortCollation,
 											 bool nullsFirstFlag,
-- 
2.20.1

Re: Yet another fast GiST build

Reply via email to