Dear colleagues, I have developed two indices using PostgreSQL's awesome GiST support, one of them available here:
http://www.public-software-group.org/pgLatLon (which is a lightweight and MIT-licensed alternative to PostGIS for certain simple tasks involving geographic coordinates on the WGS-84 spheroid) Recently I had the requirement of creating a multi-column index on an integer in the first column and a custom data type in the second column of the index. Since integers are not supported by GiST indices by default, I used the btree_gist extension by Teodor Sigaev, Oleg Bartunov, Janko Richter, and Paul Jungwirth, see: https://www.postgresql.org/docs/10/btree-gist.html However, the GiST index seems not to work as expected by me when 64-bit integers are involved. I tried to create a minimal proof-of-concept to demonstrate this. Consider the following setup: CREATE EXTENSION btree_gist; CREATE TABLE test4_btree (id SERIAL4, ctx INT4); CREATE TABLE test8_btree (id SERIAL4, ctx INT8); CREATE TABLE test4_gist (id SERIAL4, ctx INT4); CREATE TABLE test8_gist (id SERIAL4, ctx INT8); I create multi-column indices on all four tables, with "ctx" as primary and "id" as secondary column: CREATE INDEX ON test4_btree (ctx, id); CREATE INDEX ON test8_btree (ctx, id); CREATE INDEX ON test4_gist USING gist (ctx, id); CREATE INDEX ON test8_gist USING gist (ctx, id); Now we add some data: INSERT INTO test4_btree (ctx) SELECT floor(random()*100)+1 FROM generate_series(1, 10000); INSERT INTO test8_btree (ctx) SELECT floor(random()*100)+1 FROM generate_series(1, 10000); INSERT INTO test4_gist (ctx) SELECT floor(random()*100)+1 FROM generate_series(1, 10000); INSERT INTO test8_gist (ctx) SELECT floor(random()*100)+1 FROM generate_series(1, 10000); Only the tables directly using the B-tree index ("test4_btree" and "test8_btree") and the table where "ctx" is 32-bit wide seem to work properly: EXPLAIN SELECT * FROM test4_btree WHERE ctx = 1 AND id = 2; -- uses Index Cond: ((ctx = 1) AND (id = 2)) EXPLAIN SELECT * FROM test8_btree WHERE ctx = 1 AND id = 2; -- uses Index Cond: ((ctx = 1) AND (id = 2)) EXPLAIN SELECT * FROM test4_gist WHERE ctx = 1 AND id = 2; -- uses Index Cond: ((ctx = 1) AND (id = 2)) EXPLAIN SELECT * FROM test8_gist WHERE ctx = 1 AND id = 2; -- uses Index Cond: (id = 2) The query planning for the select on table "test8_gist" does not include "ctx" in the "Index Cond". To verify that the above problem isn't just an optimization because of a low row count, I created a larger example with different values: CREATE EXTENSION btree_gist; CREATE TABLE test4_btree (ctx INT4, src INT4); CREATE TABLE test8_btree (ctx INT8, src INT4); CREATE TABLE test4_gist (ctx INT4, src INT4); CREATE TABLE test8_gist (ctx INT8, src INT4); CREATE INDEX ON test4_btree (ctx, src); CREATE INDEX ON test8_btree (ctx, src); CREATE INDEX ON test4_gist USING gist (ctx, src); CREATE INDEX ON test8_gist USING gist (ctx, src); INSERT INTO test4_btree SELECT floor(random()*10000)+1, floor(random()*2)+1 FROM generate_series(1, 1000000); INSERT INTO test8_btree SELECT floor(random()*10000)+1, floor(random()*2)+1 FROM generate_series(1, 1000000); INSERT INTO test4_gist SELECT floor(random()*10000)+1, floor(random()*2)+1 FROM generate_series(1, 1000000); INSERT INTO test8_gist SELECT floor(random()*10000)+1, floor(random()*2)+1 FROM generate_series(1, 1000000); EXPLAIN SELECT count(1) FROM test4_btree WHERE ctx = 1 AND src = 2; -- uses Index Cond: ((ctx = 1) AND (src = 2)) EXPLAIN SELECT count(1) FROM test8_btree WHERE ctx = 1 AND src = 2; -- uses Index Cond: ((ctx = 1) AND (src = 2)) EXPLAIN SELECT count(1) FROM test4_gist WHERE ctx = 1 AND src = 2; -- uses Index Cond: ((ctx = 1) AND (src = 2)) EXPLAIN SELECT count(1) FROM test8_gist WHERE ctx = 1 AND src = 2; -- uses Index Cond: (src = 2) ANALYZE; EXPLAIN SELECT count(1) FROM test4_btree WHERE ctx = 1 AND src = 2; -- uses Index Cond: ((ctx = 1) AND (src = 2)) EXPLAIN SELECT count(1) FROM test8_btree WHERE ctx = 1 AND src = 2; -- uses Index Cond: ((ctx = 1) AND (src = 2)) EXPLAIN SELECT count(1) FROM test4_gist WHERE ctx = 1 AND src = 2; -- uses Index Cond: ((ctx = 1) AND (src = 2)) EXPLAIN SELECT count(1) FROM test8_gist WHERE ctx = 1 AND src = 2; -- does not use Index Cond at all, but Filter: ((ctx = 1) AND (src = 2)) SELECT count(1) FROM test4_btree WHERE ctx = 1 AND src = 2; -- fast SELECT count(1) FROM test8_btree WHERE ctx = 1 AND src = 2; -- fast SELECT count(1) FROM test4_gist WHERE ctx = 1 AND src = 2; -- fast SELECT count(1) FROM test8_gist WHERE ctx = 1 AND src = 2; -- slow! The query on "test8_gist" is significantly slower than in all other three cases. I wonder if this is a bug in the query planner, in the GiST facilities of PostgreSQL, a problem of the "btree_gist" extension, or something else? Can anyone help me? Kind regards, Jan Behrens -- Public Software Group e. V. Johannisstr. 12, 10117 Berlin, Germany www.public-software-group.org vorst...@public-software-group.org eingetragen in das Vereinregister des Amtsgerichtes Charlottenburg Registernummer: VR 28873 B Vorstände (einzelvertretungsberechtigt): Jan Behrens Axel Kistner Andreas Nitsche Björn Swierczek