On Tue, Sep 8, 2015 at 2:35 PM, Anastasia Lubennikova <
a.lubennik...@postgrespro.ru> wrote:

>
> Fixed patch is attached.
>
>
The commit of this patch seems to have created a bug in which updated
tuples can disappear from the index, while remaining in the table.

It looks like the bug depends on going through a crash-recovery cycle, but
I am not sure of that yet.

I've looked through the commit diff and don't see anything obviously
wrong.  I notice index tuples are marked dead with only a buffer content
share lock, and the page is defragmented with only a buffer exclusive lock
(as opposed to a super-exclusive buffer clean up lock).  But as far as I
can tell, both of those should be safe on an index.  Also, if that was the
bug, it should happen without crash-recovery.

The test is pretty simple.  I create a 10,000 row table with a
unique-by-construction id column with a btree_gist index on it and a
counter column, and fire single-row updates of the counter for random ids
in high concurrency (8 processes running flat out).  I force the server to
crash frequently with simulated torn-page writes in which md.c writes a
partial page and then PANICs.  Eventually (1 to 3 hours) the updates start
indicating they updated 0 rows.  At that point, a forced table scan will
find the row, but the index doesn't.

Any hints on how to proceed with debugging this?  If I can't get it to
reproduce the problem in the absence of crash-recovery cycles with an
overnight run, then I think my next step will be to run it over hot-standby
and see if WAL replay in the absence of crashes might be broken as well.

Cheers,

Jeff

Reply via email to