Hi, hackers! Currently, GiST stores each attribute in a compressed form. That is, each time attribute is written it's calling compress function, and when the attribute is accessed the decompress functions is called. Some types can't get any advantage out of this technique since the context of one value is not enough for seeding effective compression algorithm. And we have some of the compress functions like this Datum gist_box_compress(PG_FUNCTION_ARGS) { PG_RETURN_POINTER(PG_GETARG_POINTER(0)); }
https://github.com/postgres/postgres/blob/master/src/backend/access/gist/gistproc.c#L195 https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/rangetypes_gist.c#L221 https://github.com/postgres/postgres/blob/master/contrib/seg/seg.c#L255 https://github.com/postgres/postgres/blob/master/contrib/cube/cube.c#L384 Maybe we should make compress\decompress functions optional? Also, this brings some bit of performance. For the attached test I observe 6% faster GiST build and 4% faster scans. Not a big deal, but something. How do you think? In some cases, there are strange things in the code of compress\decompress. E.g. cube's decompress function is detoasting entry twice, to be very sure :) Best regards, Andrey Borodin, Octonica.
CREATE EXTENSION IF NOT EXISTS CUBE ; DROP TABLE IF EXISTS TESTTABLE ; DROP OPERATOR CLASS IF EXISTS gist_cube_ops_nocompress USING GIST; CREATE OPERATOR CLASS gist_cube_ops_nocompress FOR TYPE CUBE USING GIST AS OPERATOR 3 && , OPERATOR 6 = , OPERATOR 7 @> , OPERATOR 8 <@ , OPERATOR 13 @ , OPERATOR 14 ~ , FUNCTION 1 g_cube_consistent (INTERNAL, CUBE, SMALLINT, OID, INTERNAL), FUNCTION 2 g_cube_union (INTERNAL, INTERNAL), --we do not have functions 3 and for (compress and decompress) FUNCTION 5 g_cube_penalty (INTERNAL, INTERNAL, INTERNAL), FUNCTION 6 g_cube_picksplit (INTERNAL, INTERNAL), FUNCTION 7 g_cube_same (CUBE, CUBE, INTERNAL), FUNCTION 8 g_cube_distance (INTERNAL, CUBE, SMALLINT, OID, INTERNAL), FUNCTION 9 g_cube_decompress (INTERNAL);--fetch function, not for compression BEGIN; SELECT SETSEED(0.5); CREATE TABLE TESTTABLE AS SELECT CUBE(RANDOM()) C FROM GENERATE_SERIES(1,500000) I; \timing --testing time to create compressed index CREATE INDEX COMPRESSED ON TESTTABLE USING GIST(C gist_cube_ops); --testing time to create uncompressed index CREATE INDEX UNCOMPRESSED ON TESTTABLE USING GIST(C gist_cube_ops_nocompress); \timing set enable_bitmapscan = false; UPDATE PG_INDEX SET INDISVALID = FALSE WHERE INDEXRELID = 'UNCOMPRESSED'::REGCLASS; UPDATE PG_INDEX SET INDISVALID = TRUE WHERE INDEXRELID = 'COMPRESSED'::REGCLASS; EXPLAIN ANALYZE select (SELECT COUNT(*) FROM TESTTABLE WHERE C <@ CUBE(-b,1)) from generate_series(1,50) b; EXPLAIN ANALYZE select (SELECT COUNT(*) FROM TESTTABLE WHERE C <@ CUBE(-b,1)) from generate_series(1,50) b; UPDATE PG_INDEX SET INDISVALID = TRUE WHERE INDEXRELID = 'UNCOMPRESSED'::REGCLASS; UPDATE PG_INDEX SET INDISVALID = FALSE WHERE INDEXRELID = 'COMPRESSED'::REGCLASS; EXPLAIN ANALYZE select (SELECT COUNT(*) FROM TESTTABLE WHERE C <@ CUBE(-b,1)) from generate_series(1,50) b; EXPLAIN ANALYZE select (SELECT COUNT(*) FROM TESTTABLE WHERE C <@ CUBE(-b,1)) from generate_series(1,50) b; COMMIT;
0001-Allow-uncompressed-GiST.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers