On 01/21/2014 04:02 AM, Tomas Vondra wrote:
On 20.1.2014 19:30, Heikki Linnakangas wrote:

Attached is yet another version, with more bugs fixed and more
comments added and updated. I would appreciate some heavy testing of
this patch now. If you could re-run the tests you've been using,
that would be great. I've tested the WAL replay by replicating GIN
operations over streaming replication. That doesn't guarantee it's
correct, but it's a good smoke test.

I gave it a try - the OOM error seems to be gone, but now I get this:

    PANIC:  cannot insert duplicate items to GIN index page

This only happens when building the index incrementally (i.e. using a
sequence of INSERT statements into a table with a GIN index). When I
create a new index on a table (already containing the same dataset) it
works just fine.

Also, I tried to reproduce the issue by running a simple plpgsql loop
(instead of a complex python script):

DO LANGUAGE plpgsql $$
DECLARE
     r tsvector;
BEGIN
     FOR r IN SELECT body_tsvector FROM data_table LOOP
         INSERT INTO idx_table (body_tsvector) VALUES (r);
     END LOOP;
END$$;

where data_table is the table with imported data (the same data I
mentioned in the post about OOM errors), and idx_table is an empty
table with a GIN index. And indeed it fails, but only if I run the block
in multiple sessions in parallel.

Oh, I see what's going on. I had assumed that duplicate insertions into the posting tree could never happen, but that's dead wrong. The fast insertion mechanism depends on a duplicate insertion being a no-op.
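
To illustrate what "a duplicate insertion does nothing" means, here is a minimal sketch. It is not the actual GIN code or the fix in the patch: ItemPtr, item_cmp and merge_items are hypothetical names made up for this example, and a flat sorted array stands in for a posting tree page. The idea is only that merging new items into an existing sorted list should keep one copy of an item that is already there, rather than treating it as an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a heap item pointer: (block number, offset). */
typedef struct {
    uint32_t block;
    uint16_t offset;
} ItemPtr;

/* Three-way comparison: order by block first, then by offset. */
static int item_cmp(ItemPtr a, ItemPtr b)
{
    if (a.block != b.block)
        return a.block < b.block ? -1 : 1;
    if (a.offset != b.offset)
        return a.offset < b.offset ? -1 : 1;
    return 0;
}

/*
 * Merge two sorted, duplicate-free arrays into out.
 * An item present in both inputs is written once, so inserting an item
 * that already exists leaves the result unchanged instead of failing.
 * out must have room for n_old + n_add items. Returns the item count.
 */
static size_t merge_items(const ItemPtr *old, size_t n_old,
                          const ItemPtr *add, size_t n_add,
                          ItemPtr *out, size_t out_cap)
{
    size_t i = 0, j = 0, n = 0;

    assert(out_cap >= n_old + n_add);

    while (i < n_old && j < n_add) {
        int c = item_cmp(old[i], add[j]);

        if (c < 0) {
            out[n++] = old[i++];
        } else if (c > 0) {
            out[n++] = add[j++];
        } else {
            /* Duplicate: keep one copy and advance both inputs. */
            out[n++] = old[i++];
            j++;
        }
    }
    /* Copy whatever remains of either input. */
    while (i < n_old)
        out[n++] = old[i++];
    while (j < n_add)
        out[n++] = add[j++];

    return n;
}
```

With this shape, two sessions that both try to add the same item end up with the same list regardless of who goes first, which is the property the fast insertion path relies on.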

Will fix, thanks for the testing!

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
