On 01/21/2014 04:02 AM, Tomas Vondra wrote:
> On 20.1.2014 19:30, Heikki Linnakangas wrote:
>> Attached is yet another version, with more bugs fixed and more
>> comments added and updated. I would appreciate some heavy testing of
>> this patch now. If you could re-run the tests you've been using,
>> that would be great. I've tested the WAL replay by replicating GIN
>> operations over streaming replication. That doesn't guarantee it's
>> correct, but it's a good smoke test.
>
> I gave it a try. The OOM error seems to be gone, but now I get this:
>
>   PANIC: cannot insert duplicate items to GIN index page
>
> This only happens when building the index incrementally (i.e. using a
> sequence of INSERT statements into a table with a GIN index). When I
> create a new index on a table already containing the same dataset, it
> works just fine.
>
> Also, I tried to reproduce the issue by running a simple PL/pgSQL loop
> (instead of a complex Python script):
>
> DO LANGUAGE plpgsql $$
> DECLARE
>     r tsvector;
> BEGIN
>     FOR r IN SELECT body_tsvector FROM data_table LOOP
>         INSERT INTO idx_table (body_tsvector) VALUES (r);
>     END LOOP;
> END$$;
>
> where data_table is the table with the imported data (the same data I
> mentioned in the post about OOM errors), and idx_table is an empty
> table with a GIN index. And indeed it fails, but only if I run the block
> in multiple sessions in parallel.
Oh, I see what's going on. I had assumed that there cannot be duplicate
insertions into the posting tree, but that's dead wrong. The fast
insertion mechanism depends on a duplicate insertion doing nothing.
Will fix, thanks for the testing!
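To illustrate what "a duplicate insertion doing nothing" means, here is a minimal sketch in Python (not PostgreSQL's C code). It assumes that the same item pointer can legitimately reach the posting tree more than once. One plausible path, assumed here, is several sessions flushing the fast-insert pending list concurrently, which would match the parallel-only failure reported above. The function `merge_posting_list` and the `(block, offset)` tuples are hypothetical stand-ins. The point is that merging new items into a sorted posting list treats an already-present item as a no-op rather than an error.

```python
from typing import List, Tuple

# Hypothetical stand-in for a heap tuple identifier: (block number, offset).
ItemPointer = Tuple[int, int]


def merge_posting_list(existing: List[ItemPointer],
                       new_items: List[ItemPointer]) -> List[ItemPointer]:
    """Merge sorted new_items into the sorted posting list `existing`.

    Items already present are skipped instead of raising an error, so
    inserting the same item twice leaves the list unchanged.
    """
    merged: List[ItemPointer] = []
    i = j = 0
    while i < len(existing) or j < len(new_items):
        if j >= len(new_items) or (i < len(existing) and existing[i] < new_items[j]):
            item = existing[i]
            i += 1
        elif i >= len(existing) or existing[i] > new_items[j]:
            item = new_items[j]
            j += 1
        else:
            # Same item on both sides: a duplicate insertion, keep one copy.
            item = existing[i]
            i += 1
            j += 1
        # Also drop duplicates that occur within a single input list.
        if not merged or merged[-1] != item:
            merged.append(item)
    return merged


posting = [(1, 1), (1, 3), (2, 5)]
once = merge_posting_list(posting, [(1, 3), (2, 1)])
twice = merge_posting_list(once, [(1, 3), (2, 1)])
print(once)           # [(1, 1), (1, 3), (2, 1), (2, 5)]
print(twice == once)  # True
```

Re-running the same insertion is idempotent here. Code that instead assumed duplicates could not occur would have to reject the second insertion, which is the kind of failure the PANIC above reports.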
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)