Hi, hackers!

Currently, GiST stores each attribute in a compressed form.
That is, each time attribute is written it's calling compress
function, and when the attribute is accessed the decompress functions
is called.
Some types can't get any advantage out of this technique since the
context of one value is not enough for seeding effective compression
algorithm.
And we have some of the compress functions like this
Datum
gist_box_compress(PG_FUNCTION_ARGS)
{
PG_RETURN_POINTER(PG_GETARG_POINTER(0));
}

https://github.com/postgres/postgres/blob/master/src/backend/access/gist/gistproc.c#L195
https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/rangetypes_gist.c#L221
https://github.com/postgres/postgres/blob/master/contrib/seg/seg.c#L255
https://github.com/postgres/postgres/blob/master/contrib/cube/cube.c#L384

Maybe we should make compress\decompress functions optional?
Also, this brings some bit of performance.
For the attached test I observe 6% faster GiST build and 4% faster scans.
Not a big deal, but something. How do you think?

In some cases, there are strange things in the code of
compress\decompress. E.g. cube's decompress function is detoasting
entry twice, to be very sure :)



Best regards, Andrey Borodin, Octonica.
CREATE EXTENSION IF NOT EXISTS CUBE ;

DROP TABLE IF EXISTS TESTTABLE ;
DROP OPERATOR CLASS IF EXISTS gist_cube_ops_nocompress USING GIST;

CREATE OPERATOR CLASS gist_cube_ops_nocompress
    FOR TYPE CUBE USING GIST AS
        OPERATOR        3       && ,
        OPERATOR        6       = ,
        OPERATOR        7       @> ,
        OPERATOR        8       <@ ,
        OPERATOR        13      @ ,
        OPERATOR        14      ~ ,

        FUNCTION        1       g_cube_consistent (INTERNAL, CUBE, SMALLINT, 
OID, INTERNAL),
        FUNCTION        2       g_cube_union (INTERNAL, INTERNAL),
        --we do not have functions 3 and for (compress and decompress)
        FUNCTION        5       g_cube_penalty (INTERNAL, INTERNAL, INTERNAL),
        FUNCTION        6       g_cube_picksplit (INTERNAL, INTERNAL),
        FUNCTION        7       g_cube_same (CUBE, CUBE, INTERNAL),
        FUNCTION        8       g_cube_distance (INTERNAL, CUBE, SMALLINT, OID, 
INTERNAL),
        FUNCTION        9       g_cube_decompress (INTERNAL);--fetch function, 
not for compression



BEGIN;
SELECT SETSEED(0.5);

CREATE TABLE TESTTABLE AS SELECT CUBE(RANDOM()) C FROM 
GENERATE_SERIES(1,500000) I;

\timing

--testing time to create compressed index
CREATE INDEX COMPRESSED ON TESTTABLE USING GIST(C gist_cube_ops);
--testing time to create uncompressed index
CREATE INDEX UNCOMPRESSED ON TESTTABLE USING GIST(C gist_cube_ops_nocompress);

\timing
set enable_bitmapscan = false;

UPDATE PG_INDEX SET INDISVALID = FALSE WHERE INDEXRELID = 
'UNCOMPRESSED'::REGCLASS;
UPDATE PG_INDEX SET INDISVALID = TRUE WHERE INDEXRELID = 'COMPRESSED'::REGCLASS;

EXPLAIN ANALYZE select (SELECT COUNT(*) FROM TESTTABLE WHERE C <@ CUBE(-b,1)) 
from generate_series(1,50) b;
EXPLAIN ANALYZE select (SELECT COUNT(*) FROM TESTTABLE WHERE C <@ CUBE(-b,1)) 
from generate_series(1,50) b;

UPDATE PG_INDEX SET INDISVALID = TRUE WHERE INDEXRELID = 
'UNCOMPRESSED'::REGCLASS;
UPDATE PG_INDEX SET INDISVALID = FALSE WHERE INDEXRELID = 
'COMPRESSED'::REGCLASS;

EXPLAIN ANALYZE select (SELECT COUNT(*) FROM TESTTABLE WHERE C <@ CUBE(-b,1)) 
from generate_series(1,50) b;
EXPLAIN ANALYZE select (SELECT COUNT(*) FROM TESTTABLE WHERE C <@ CUBE(-b,1)) 
from generate_series(1,50) b;

COMMIT;



Attachment: 0001-Allow-uncompressed-GiST.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to