[HACKERS] issues with range types, btree_gist and constraints

Tomas Vondra Fri, 01 Feb 2013 15:40:30 -0800

Hi,

I'm having trouble with range types and btree_gist. After some experimenting, I believe it's caused by a bug in how btree_gist handles text columns. All of this is on a freshly compiled 9.2.2.

I'm trying to achieve almost exactly what's described in the second example on

http://www.postgresql.org/docs/9.2/interactive/rangetypes.html#RANGETYPES-CONSTRAINT

i.e. I'm maintaining a list of ranges for each ID, except that I'm using text instead of integers for the ID. So the table looks like this:

=========================================================================================
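-- btree_gist provides the GiST operator class needed for "id WITH =" on a text column
CREATE EXTENSION btree_gist;

CREATE TABLE test (
    id        TEXT,
    validity  TSRANGE NOT NULL DEFAULT tsrange('-infinity'::timestamp, 'infinity'::timestamp),
    CONSTRAINT test_validity_check EXCLUDE USING GIST (id WITH =, validity WITH &&)
);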
=========================================================================================

This table is repeatedly filled with new versions of the data (which were removed from the demo for the sake of simplicity), so I've defined a trigger that checks whether there's an existing record with an overlapping range for the same ID, and splits the range accordingly.

Each record starts with validity=[-infinity, infinity). On the first update this would be split into [-infinity, now()) and [now(), infinity), and so on. This is what the following trigger should do:

=========================================================================================
CREATE OR REPLACE FUNCTION test_close() RETURNS trigger AS $$
BEGIN

    -- close the previous record (set upper bound of the range)
    UPDATE test SET validity = tsrange(lower(validity), now()::timestamp)
     WHERE id = NEW.id AND (upper(validity) = 'infinity'::timestamp);

    -- if there was a preceding record, set the lower bound (otherwise use unbounded range)
    IF FOUND THEN
        NEW.validity := tsrange(now()::timestamp, 'infinity'::timestamp);
    END IF;

    RETURN NEW;

END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER test_close BEFORE INSERT ON test FOR EACH ROW EXECUTE PROCEDURE test_close();

=========================================================================================

To generate the sample data, do this:

=========================================================================================
    echo "SimpleTestString" > /tmp/data.csv
    for f in `seq 1 20000`; do
        echo $f > /tmp/x.log;
        md5sum /tmp/x.log | awk '{print $1}' >> /tmp/data.csv;
    done;
=========================================================================================
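For reference, the same file can probably be produced without the temporary /tmp/x.log file. This is a hypothetical sketch, not the script used for the report; it assumes GNU md5sum (as the original loop does) and takes the output path as an optional first argument. printf '%s\n' writes the same bytes as `echo $f > /tmp/x.log`, so the hashes should match the original loop's output. The final check confirms that all IDs are distinct, i.e. each COPY inserts exactly one new version per ID, so any overlap must come from the trigger/constraint path rather than from the data.

```shell
#!/bin/sh
# Hypothetical variant of the generator above (not part of the original
# report): same output format, but no temporary /tmp/x.log file.
# Assumes GNU md5sum is available, as the original loop does.
out=${1:-/tmp/data.csv}

# mixed-case string that seems to trigger the problem
echo "SimpleTestString" > "$out"

# printf '%s\n' emits the same bytes as `echo $f > /tmp/x.log`, so each
# line is the md5 of the number plus a trailing newline; md5sum prints
# "<hash>  -" for stdin, and a single awk keeps only the hash column
for f in $(seq 1 20000); do
    printf '%s\n' "$f" | md5sum
done | awk '{print $1}' >> "$out"

# sanity check: all IDs should be distinct, so each COPY adds exactly
# one new version per ID
dups=$(sort "$out" | uniq -d | wc -l)
echo "lines: $(wc -l < "$out"), duplicate ids: $dups"
```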

The first line (a combination of upper- and lower-case letters) is what seems to trigger the behavior. Now load the file into the table repeatedly, and you'll eventually get this error:


=========================================================================================
db=# copy test(id) from '/tmp/data.csv';
COPY 10001
db=# copy test(id) from '/tmp/data.csv';
COPY 10001
db=# copy test(id) from '/tmp/data.csv';
ERROR:  conflicting key value violates exclusion constraint "test_validity_check"
DETAIL:  Key (id, validity)=(SimpleTestString, ["2013-02-01 23:32:04.329975",infinity)) conflicts with existing key (id, validity)=(SimpleTestString, [-infinity,infinity)).
CONTEXT:  COPY test, line 1: "SimpleTestString"
=========================================================================================

The number of COPY executions needed varies. What's even stranger is the result of this select once it fails:

=========================================================================================
test=# select * from test where id = 'SimpleTestString';
        id        |       validity
------------------+----------------------
 SimpleTestString | [-infinity,infinity)
 SimpleTestString | [-infinity,infinity)
(2 rows)
=========================================================================================

Yup, there are two overlapping ranges for the same ID. Moreover, after disabling bitmap and index scans, the COPY takes much longer but works just fine (including the trigger).

Creating a plain b-tree index on the "ID" column seems to fix that too.

That leads me to believe this is a bug in the GiST indexing, and the variations are probably caused by the index scan kicking in after one of the COPY executions (after reaching some threshold). I'm using en_US.UTF-8 for the database.

When I replace "infinity" with a plain NULL (in both the table and the trigger), it fails too, but in a slightly different way. For example, I'm seeing this after the failure:


=========================================================================================
test=# select * from test where id = 'SimpleTest';
     id     |            validity
------------+---------------------------------
 SimpleTest | (,"2013-02-02 00:07:07.038324")
(1 row)

test=# set enable_bitmapscan to off;
SET
test=# set enable_indexscan to off;
SET
test=# select * from test where id = 'SimpleTest';
     id     |            validity
------------+---------------------------------
 SimpleTest | (,"2013-02-02 00:07:07.038324")
 SimpleTest | ["2013-02-02 00:07:07.038324",)
(2 rows)
=========================================================================================

I've been unable to reproduce this using a generated sample, so I've prepared sample scripts and CSV files

  1) with-infinity.sql + sample-1.csv (this is the case described above)
  2) with-nulls.sql + sample-2.csv (this is the NULL version)

available for download at http://www.fuzzy.cz/tmp/samples.tgz (~1MB).

kind regards
Tomas


--
Sent via pgsql-hackers mailing list
