On 19.11.2013 16:20, Andres Freund wrote:
> On 2013-11-18 23:15:59 +0100, Andres Freund wrote:
>> Afaics it's likely a combination/interaction of bugs and fixes between:
>> * the initial HS code
>> * 5a031a5556ff83b8a9646892715d7fef415b83c3
>> * f44eedc3f0f347a856eea8590730769125964597
>
> Yes, the combination of those is guilty.
>
> Man, this is (to a good part my) bad.
>
>> But that'd mean nobody noticed it during 9.3's beta...
>
> It's fairly hard to reproduce artificially since
> a) there have to be enough transactions starting and committing from the
>    start of the checkpoint the standby is starting from to the point it
>    does LogStandbySnapshot() to cross a 32768 boundary
> b) hint bits often save the game by not accessing clog at all anymore
>    and thus not noticing the corruption.
>
> I've reproduced the issue by having an INSERT ONLY table that's never
> read from. It's helpful to disable autovacuum.
For the archive, here's what I used to reproduce this. It creates a master and a standby, and also uses an INSERT-only table. To make it trigger more easily, it helps to insert sleeps in CreateCheckpoint(), around the LogStandbySnapshot() call.
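
To illustrate the "32768 boundary" mentioned in the quoted text: as far as I understand it, that number is the count of transaction status entries on one clog page, assuming the default 8 kB block size and 2 status bits per transaction. The following is only a sketch of that arithmetic, not code from the tree or from the attached script. The macro names mirror the ones I believe clog.c uses; the two helper functions, xid_to_clog_page() and crosses_clog_page(), are hypothetical names made up for this example, and xid wraparound is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default build with 8 kB blocks. */
#define BLCKSZ              8192
/* Assumption: clog stores 2 status bits per transaction, i.e. 4 per byte. */
#define CLOG_BITS_PER_XACT  2
#define CLOG_XACTS_PER_BYTE (8 / CLOG_BITS_PER_XACT)
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)   /* 32768 */

/* Hypothetical helper: which clog page holds the status of this xid. */
static uint32_t
xid_to_clog_page(uint32_t xid)
{
    return xid / CLOG_XACTS_PER_PAGE;
}

/*
 * Hypothetical helper: returns 1 if the xids in [first_xid, last_xid]
 * span more than one clog page, 0 otherwise.  Wraparound is ignored,
 * so first_xid must not be greater than last_xid.
 */
static int
crosses_clog_page(uint32_t first_xid, uint32_t last_xid)
{
    assert(first_xid <= last_xid);
    return xid_to_clog_page(first_xid) != xid_to_clog_page(last_xid);
}
```

With these assumptions, xids 32767 and 32768 fall on different pages, so a run of transactions covering both would cross the boundary, while a run from 100 to 200 would not.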
- Heikki
test-hot-standby-bug.sh
Description: Bourne shell script