Our docs for GiST indexes say the compress function is only used for
internal pages, not leaf pages, but actually it is used everywhere.
Here are two patches to clean things up.

You can see that we store compressed values with the pageinspect
extension. For instance, multiranges are compressed to ranges. Here
they are in leaf pages:

[v19devel:5432][314069] postgres=# create table mr (id int4multirange);
CREATE TABLE
[v19devel:5432][314069] postgres=# create index idx_mr on mr using gist (id);
CREATE INDEX
[v19devel:5432][314069] postgres=# insert into mr values ('{[1,2)}'),
('{[2,3)}');
INSERT 0 2
[v19devel:5432][314069] postgres=# select * from
gist_page_opaque_info(get_raw_page('idx_mr', 0));
    lsn     |    nsn     | rightlink  | flags
------------+------------+------------+--------
 0/01917320 | 0/00000000 | 4294967295 | {leaf}
(1 row)

[v19devel:5432][314069] postgres=# select * from
gist_page_items(get_raw_page('idx_mr', 0), 'idx_mr');
 itemoffset | ctid  | itemlen | dead |      keys
------------+-------+---------+------+----------------
          1 | (0,1) |      24 | f    | (id)=("[1,2)")
          2 | (0,2) |      24 | f    | (id)=("[2,3)")

Similarly, btree_gist stores gbtreekey entries in leaf pages:

[v19devel:5432][314069] postgres=# create table i (id int);
CREATE TABLE
[v19devel:5432][314069] postgres=# create index idx_i on i using gist (id);
CREATE INDEX
[v19devel:5432][314069] postgres=# insert into i values (1), (2), (3);
INSERT 0 3
[v19devel:5432][314069] postgres=# select * from
gist_page_items(get_raw_page('idx_i', 0), 'idx_i');
ERROR:  cannot display a value of type gbtreekey?
[v19devel:5432][314069] postgres=# select * from
gist_page_items_bytea(get_raw_page('idx_i', 0));
 itemoffset | ctid  | itemlen | dead |              key_data
------------+-------+---------+------+------------------------------------
          1 | (0,1) |      16 | f    | \x00000000010010000100000001000000
          2 | (0,2) |      16 | f    | \x00000000020010000200000002000000
          3 | (0,3) |      16 | f    | \x00000000030010000300000003000000

I think this error goes back to the second GiST paper referenced in
access/gist/README, titled "Concurrency and Recovery in Generalized
Search Trees" (from 1997). On page 2 it says that internal pages store
the predicate and leaf pages store the key. (The original 1995 paper
doesn't differentiate like that though.) Since our README has a list
of ways that our implementation diverges from the research, I added a
note there as well.

I've also supplied a patch to clarify that there are two papers. The
old wording is a bit confusing:

> GiST stands for Generalized Search Tree. It was introduced in the seminal 
> paper
> "Generalized Search Trees for Database Systems", 1995, Joseph M. Hellerstein,
> Jeffrey F. Naughton, Avi Pfeffer:
>
>     http://www.sai.msu.su/~megera/postgres/gist/papers/gist.ps
>     https://dsf.berkeley.edu/papers/sigmod97-gist.pdf

Clarifying the two papers helps me call out the leaf page difference
in the main patch.

Yours,

-- 
Paul              ~{:-)
[email protected]
From 42b8df1f669fedc8328524173f1799379c80bab4 Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" <[email protected]>
Date: Fri, 16 Jan 2026 12:31:06 -0800
Subject: [PATCH v1 2/2] Correct GiST documentation about compressed values in
 leaf pages

Our docs stated that the GiST compress support function is only used to store
values in internal pages, not leaf pages. But actually we use compress for all
pages (if supplied). This error may go back to the 1997 paper "Concurrency and
Recovery in Generalized Search Trees", which states that internal pages should
hold a predicate and leaf pages should hold the key itself (page 2). But the
original paper from 1995 has the predicate everywhere. I also updated the README
to call out this difference.
---
 doc/src/sgml/gist.sgml         | 5 ++---
 src/backend/access/gist/README | 3 +++
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/gist.sgml b/doc/src/sgml/gist.sgml
index 1871f742721..c527757a7ef 100644
--- a/doc/src/sgml/gist.sgml
+++ b/doc/src/sgml/gist.sgml
@@ -273,9 +273,8 @@ CREATE INDEX ON my_table USING GIST (my_inet_column inet_ops);
    index will depend on the <function>penalty</function> and <function>picksplit</function>
    methods.
    Two optional methods are <function>compress</function> and
-   <function>decompress</function>, which allow an index to have internal tree data of
-   a different type than the data it indexes. The leaves are to be of the
-   indexed data type, while the other tree nodes can be of any C struct (but
+   <function>decompress</function>, which allow an index to store keys of
+   a different type than the data it indexes. The index tuples can be of any C struct (but
    you still have to follow <productname>PostgreSQL</productname> data type rules here,
    see about <literal>varlena</literal> for variable sized data). If the tree's
    internal data type exists at the SQL level, the <literal>STORAGE</literal> option
diff --git a/src/backend/access/gist/README b/src/backend/access/gist/README
index 52b14119ff6..c8fc749dcf2 100644
--- a/src/backend/access/gist/README
+++ b/src/backend/access/gist/README
@@ -59,6 +59,9 @@ The original algorithms were modified in several ways:
   it is now a single-pass algorithm.
 * Since the papers were theoretical, some details were omitted and we
   had to find out ourself how to solve some specific problems.
+* We store the (potentially-compressed) predicate at all node levels. The
+  1997 paper above (but not the 1995 one) states that leaf pages should store
+  the original key.
 
 Because of the above reasons, we have revised the interaction of GiST
 core and PostgreSQL WAL system. Moreover, we encountered (and solved)
-- 
2.47.3

From a0abaf597877135e77de923d17eed4b40face8c3 Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" <[email protected]>
Date: Fri, 16 Jan 2026 12:38:12 -0800
Subject: [PATCH v1 1/2] Clarify GiST README references to the original
 research papers

The README text gives the name of the original 1995 paper, followed by two
bullet points, one to the 1995 paper then another to a 1997 paper. This commit
adds the 1997 paper details.
---
 src/backend/access/gist/README | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/gist/README b/src/backend/access/gist/README
index 76e0e11f228..52b14119ff6 100644
--- a/src/backend/access/gist/README
+++ b/src/backend/access/gist/README
@@ -10,9 +10,13 @@ GiST stands for Generalized Search Tree. It was introduced in the seminal paper
 Jeffrey F. Naughton, Avi Pfeffer:
 
     http://www.sai.msu.su/~megera/postgres/gist/papers/gist.ps
+
+Concurrency support was described in "Concurrency and Recovery in Generalized
+Search Trees", 1997, Marcel Kornacker, C. Mohan, Joseph M. Hellerstein:
+
     https://dsf.berkeley.edu/papers/sigmod97-gist.pdf
 
-and implemented by J. Hellerstein and P. Aoki in an early version of
+GiST was implemented by J. Hellerstein and P. Aoki in an early version of
 PostgreSQL (more details are available from The GiST Indexing Project
 at Berkeley at http://gist.cs.berkeley.edu/). As a "university"
 project it had a limited number of features and was in rare use.
-- 
2.47.3

Reply via email to