GiST index on INT8, possible bug in query planner?

Jan Behrens Mon, 03 Dec 2018 08:09:59 -0800

Dear colleagues,

I have developed two indices using PostgreSQL's awesome GiST support,
one of them available here:


http://www.public-software-group.org/pgLatLon

(which is a lightweight and MIT-licensed alternative to PostGIS for
certain simple tasks involving geographic coordinates on the WGS-84
spheroid)


Recently I had the requirement of creating a multi-column index on
an integer in the first column and a custom data type in the second
column of the index. Since integers are not supported by GiST indices
by default, I used the btree_gist extension by Teodor Sigaev,
Oleg Bartunov, Janko Richter, and Paul Jungwirth, see:
https://www.postgresql.org/docs/10/btree-gist.html

However, the GiST index seems not to work as expected by me when
64-bit integers are involved. I tried to create a minimal
proof-of-concept to demonstrate this. Consider the following setup:

CREATE EXTENSION btree_gist;

CREATE TABLE test4_btree (id SERIAL4, ctx INT4);
CREATE TABLE test8_btree (id SERIAL4, ctx INT8);
CREATE TABLE test4_gist (id SERIAL4, ctx INT4);
CREATE TABLE test8_gist (id SERIAL4, ctx INT8);

I create multi-column indices on all four tables, with "ctx" as primary
and "id" as secondary column:

CREATE INDEX ON test4_btree (ctx, id);
CREATE INDEX ON test8_btree (ctx, id);
CREATE INDEX ON test4_gist USING gist (ctx, id);
CREATE INDEX ON test8_gist USING gist (ctx, id);

Now we add some data:

INSERT INTO test4_btree (ctx) SELECT floor(random()*100)+1 FROM 
generate_series(1, 10000);
INSERT INTO test8_btree (ctx) SELECT floor(random()*100)+1 FROM 
generate_series(1, 10000);
INSERT INTO test4_gist (ctx) SELECT floor(random()*100)+1 FROM 
generate_series(1, 10000);
INSERT INTO test8_gist (ctx) SELECT floor(random()*100)+1 FROM 
generate_series(1, 10000);

Only the tables directly using the B-tree index ("test4_btree" and
"test8_btree") and the table where "ctx" is 32-bit wide seem to work
properly:

EXPLAIN SELECT * FROM test4_btree WHERE ctx = 1 AND id = 2;
-- uses Index Cond: ((ctx = 1) AND (id = 2))

EXPLAIN SELECT * FROM test8_btree WHERE ctx = 1 AND id = 2;
-- uses Index Cond: ((ctx = 1) AND (id = 2))

EXPLAIN SELECT * FROM test4_gist WHERE ctx = 1 AND id = 2;
-- uses Index Cond: ((ctx = 1) AND (id = 2))

EXPLAIN SELECT * FROM test8_gist WHERE ctx = 1 AND id = 2;
-- uses Index Cond: (id = 2)

The query planning for the select on table "test8_gist" does not
include "ctx" in the "Index Cond".


To verify that the above problem isn't just an optimization because of
a low row count, I created a larger example with different values:

CREATE EXTENSION btree_gist;

CREATE TABLE test4_btree (ctx INT4, src INT4);
CREATE TABLE test8_btree (ctx INT8, src INT4);
CREATE TABLE test4_gist (ctx INT4, src INT4);
CREATE TABLE test8_gist (ctx INT8, src INT4);

CREATE INDEX ON test4_btree (ctx, src);
CREATE INDEX ON test8_btree (ctx, src);
CREATE INDEX ON test4_gist USING gist (ctx, src);
CREATE INDEX ON test8_gist USING gist (ctx, src);

INSERT INTO test4_btree SELECT floor(random()*10000)+1, floor(random()*2)+1 
FROM generate_series(1, 1000000);
INSERT INTO test8_btree SELECT floor(random()*10000)+1, floor(random()*2)+1 
FROM generate_series(1, 1000000);
INSERT INTO test4_gist SELECT floor(random()*10000)+1, floor(random()*2)+1 FROM 
generate_series(1, 1000000);
INSERT INTO test8_gist SELECT floor(random()*10000)+1, floor(random()*2)+1 FROM 
generate_series(1, 1000000);

EXPLAIN SELECT count(1) FROM test4_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test8_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test4_gist WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test8_gist WHERE ctx = 1 AND src = 2;
-- uses Index Cond: (src = 2)

ANALYZE;

EXPLAIN SELECT count(1) FROM test4_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test8_btree WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test4_gist WHERE ctx = 1 AND src = 2;
-- uses Index Cond: ((ctx = 1) AND (src = 2))

EXPLAIN SELECT count(1) FROM test8_gist WHERE ctx = 1 AND src = 2;
-- does not use Index Cond at all, but Filter: ((ctx = 1) AND (src = 2))

SELECT count(1) FROM test4_btree WHERE ctx = 1 AND src = 2;  -- fast
SELECT count(1) FROM test8_btree WHERE ctx = 1 AND src = 2;  -- fast
SELECT count(1) FROM test4_gist WHERE ctx = 1 AND src = 2;  -- fast
SELECT count(1) FROM test8_gist WHERE ctx = 1 AND src = 2;  -- slow!

The query on "test8_gist" is significantly slower than in all other
three cases.


I wonder if this is a bug in the query planner, in the GiST facilities
of PostgreSQL, a problem of the "btree_gist" extension, or something
else? Can anyone help me?


Kind regards,
Jan Behrens


-- 
Public Software Group e. V.
Johannisstr. 12, 10117 Berlin, Germany

www.public-software-group.org
[email protected]

eingetragen in das Vereinregister
des Amtsgerichtes Charlottenburg
Registernummer: VR 28873 B

Vorstände (einzelvertretungsberechtigt):
Jan Behrens
Axel Kistner
Andreas Nitsche
Björn Swierczek

GiST index on INT8, possible bug in query planner?

Reply via email to