Hi, hackers!

It seems that if btree index with a unique constraint is corrupted by
duplicates, amcheck now can not catch this. Reindex becomes impossible as
it throws an error but otherwise the index will let the user know that it
is corrupted, and amcheck will tell that the index is clean. So I'd like to
propose a short patch to improve amcheck for checking the unique
constraint. It will output tid's of tuples that are duplicated in the index
(i.e. more than one tid for the same index key is visible) and the user can
easily investigate and delete corresponding table entries.

0001 - is the actual patch, and
0002 - is a temporary hack for testing. It will allow inserting duplicates
in a table even if an index with the exact name "idx" has a unique
constraint (generally it is prohibited to insert). Then a new amcheck will
tell us about these duplicates. It's pity but testing can not be done
automatically, as it needs a core recompile. For testing I'd recommend a
protocol similar to the following:

- Apply patch 0002
- Set autovaccum = off in postgresql.conf



*create table tbl2 (a varchar(50), b varchar(150), c bytea, d
varchar(50));create unique index idx on tbl2(a,b);insert into tbl2 select
i::text::varchar, i::text::varchar, i::text::bytea, i::text::varchar from
generate_series(0,3) as i, generate_series(0,10000) as x;*

So we'll have a generous amount of duplicates in the table and index. Then:

*create extension amcheck;*
*select bt_index_check('idx', true);*

There will be output about each pair of duplicates: index tid's, position
in a posting list (if the index item is deduplicated) and table tid's.

WARNING:  index uniqueness is violated for index "idx": Index tid=(26,6)
posting 218 and posting 219 (point to heap tid=(126,93) and tid=(126,94))
page lsn=0/1B3D420.
WARNING:  index uniqueness is violated for index "idx": Index tid=(26,6)
posting 219 and posting 220 (point to heap tid=(126,94) and tid=(126,95))
page lsn=0/1B3D420.
WARNING:  index uniqueness is violated for index "idx": Index tid=(26,6)
posting 220 and posting 221 (point to heap tid=(126,95) and tid=(126,96))
page lsn=0/1B3D420.
WARNING:  index uniqueness is violated for index "idx": Index tid=(26,6)
posting 221 and tid=(26,7) posting 0 (point to heap tid=(126,96) and
tid=(126,97)) page lsn=0/1B3D420.
WARNING:  index uniqueness is violated for index "idx": Index tid=(26,7)
posting 0 and posting 1 (point to heap tid=(126,97) and tid=(126,98)) page
lsn=0/1B3D420.
WARNING:  index uniqueness is violated for index "idx": Index tid=(26,7)
posting 1 and posting 2 (point to heap tid=(126,98) and tid=(126,99)) page
lsn=0/1B3D420.

Things to notice in the test output:
- cross-page duplicates (when some of them on the one index page and the
other are on the next). (Under patch 0002 they are marked by an additional
message "*INFO:  cross page equal keys"* to catch them among the other)

- If we delete table entries corresponding to some duplicates in between
the other duplicates (vacuum should be off), the message for this entry
should disappear but the other duplicates should be detected as previous.

*delete from tbl2 where ctid::text='(126,94)';*
*select bt_index_check('idx', true);*

WARNING:  index uniqueness is violated for index "idx": Index tid=(26,6)
posting 218 and posting 220 (point to heap tid=(126,93) and tid=(126,95))
page lsn=0/1B3D420.
WARNING:  index uniqueness is violated for index "idx": Index tid=(26,6)
posting 220 and posting 221 (point to heap tid=(126,95) and tid=(126,96))
page lsn=0/1B3D420.
WARNING:  index uniqueness is violated for index "idx": Index tid=(26,6)
posting 221 and tid=(26,7) posting 0 (point to heap tid=(126,96) and
tid=(126,97)) page lsn=0/1B3D420.
WARNING:  index uniqueness is violated for index "idx": Index tid=(26,7)
posting 0 and posting 1 (point to heap tid=(126,97) and tid=(126,98)) page
lsn=0/1B3D420.
WARNING:  index uniqueness is violated for index "idx": Index tid=(26,7)
posting 1 and posting 2 (point to heap tid=(126,98) and tid=(126,99)) page
lsn=0/1B3D420.

Caveat: if the first entry on the next index page has a key equal to the
key on a previous page AND all heap tid's corresponding to this entry are
invisible, currently cross-page check can not detect unique constraint
violation between previous index page entry and 2nd, 3d and next current
index page entries. In this case, there would be a message that recommends
doing VACUUM to remove the invisible entries from the index and repeat the
check. (Generally, it is recommended to do vacuum before the check, but for
the testing purpose I'd recommend turning it off to check the detection of
visible-invisible-visible duplicates scenarios)

Your feedback is very much welcome, as usual.

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com <http://www.postgrespro.com>

Attachment: v1-0002-Ignore-unique-constraint-check-on-btree-insert.-P.patch
Description: Binary data

Attachment: v1-0001-Make-amcheck-checking-UNIQUE-constraint-for-btree.patch
Description: Binary data

Reply via email to