Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1

Heikki Linnakangas Fri, 22 Nov 2013 04:24:18 -0800

On 19.11.2013 16:20, Andres Freund wrote:
> On 2013-11-18 23:15:59 +0100, Andres Freund wrote:
>> Afaics it's likely a combination/interaction of bugs and fixes between:
>> * the initial HS code
>> * 5a031a5556ff83b8a9646892715d7fef415b83c3
>> * f44eedc3f0f347a856eea8590730769125964597
>
> Yes, the combination of those is guilty.
>
> Man, this is (to a good part my) bad.
>
>> But that'd mean nobody noticed it during 9.3's beta...
>
> It's fairly hard to reproduce artificially since a) there have to be
> enough transactions starting and committing from the start of the
> checkpoint the standby is starting from to the point it does
> LogStandbySnapshot() to cross a 32768 boundary b) hint bits often save
> the game by not accessing clog at all anymore and thus not noticing the
> corruption.
> I've reproduced the issue by having an INSERT ONLY table that's never
> read from. It's helpful to disable autovacuum.

For the archive, here's what I used to reproduce this. It creates a master and a standby, and also uses an INSERT-only table. To make the bug trigger more easily, it helps to insert sleeps in CreateCheckpoint(), around the LogStandbySnapshot() call.
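
As a side note on the "32768 boundary" mentioned above: it presumably corresponds to a clog page boundary, since with the default 8 kB block size and 2 status bits per transaction one clog page covers 8192 * 4 = 32768 transaction IDs. The following is only a minimal sketch of that arithmetic, not part of the attached script. The helper names clog_page and crosses_clog_page are hypothetical, the default block size is an assumption, and XID wraparound is ignored.

```python
# Sketch of the clog page arithmetic; assumes the default 8 kB block size.
BLCKSZ = 8192                 # bytes per page (PostgreSQL default, assumed here)
CLOG_XACTS_PER_BYTE = 4       # 2 status bits per transaction
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE  # 32768


def clog_page(xid: int) -> int:
    """Return the clog page number that holds the status of xid (hypothetical helper)."""
    return xid // CLOG_XACTS_PER_PAGE


def crosses_clog_page(start_xid: int, end_xid: int) -> bool:
    """True if the two XIDs fall on different clog pages.

    Wraparound is ignored; this only illustrates the boundary condition.
    """
    return clog_page(start_xid) != clog_page(end_xid)
```

For instance, crosses_clog_page(32767, 32768) is true, while two XIDs that both lie within 32768..65535 stay on the same page. This would explain why enough transactions have to run between the start of the checkpoint and the LogStandbySnapshot() call for the problem to show up.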


- Heikki

test-hot-standby-bug.sh
Description: Bourne shell script

Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1