On Wed, Mar 28, 2012 at 10:54 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Wed, Mar 28, 2012 at 10:24 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> On Wed, Mar 28, 2012 at 9:48 PM, Marko Kreen <mark...@gmail.com> wrote:
>>> On Fri, Mar 23, 2012 at 08:52:40AM +0000, Simon Riggs wrote:
>>>> Master pg_controldata - OK txid_current_snapshot() - OK
>>>> Standby pg_controldata - OK txid_current_snapshot() - lower value
>>>
>>> On Skytools list is report about master with slaves, but the
>>> lower value appears on master too:
>>>
>>> http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001601.html
>>>
>>> Cc'd original reporter too.
>>
>> Thanks. Am looking.
>
> I can't see how this could happen on the master at all.
>
> On the standby, it can happen if we skip restartpoints for about a
> couple of billion xids. Which would be a problem.
>
> More on this tomorrow.

I've not been able to recreate the problem up till now. But "knowing"
there is a bug helps develop a theory based upon the code.

CreateCheckpoint() increments the epoch on the master at the next
checkpoint after wraparound. (I'd be happier if there was an explicit
link between those two points; there isn't, but I can't yet see a
problem.)

When the standby receives the checkpoint record, it stores the
information in 2 places:
i) directly into ControlFile->checkPointCopy
ii) and then into XLogCtl when a safe restartpoint occurs

It's possible that a safe restartpoint could be delayed. When that
happens, the XLogCtl copy grows stale. If the delay is long enough, the
NextXid counter will increase and will eventually be higher than the
value last stored in XLogCtl, causing the epoch returned by
GetNextXidAndEpoch() to go backwards by 1. If the restartpoint is
delayed even further, the counter would wrap again and the epoch would
increase again by one, then decrease, then increase, given enough time
and/or a very busy server using lots of xids.

At the same time, when we do UpdateControlFile() the other copy of the
epoch is written to the controlfile, so the controlfile shows the
accurate value for the epoch.

So I can explain how we get two different answers from the standby,
and I can explain how the error is very hard to reproduce and
apparently transient. I can't explain anything involving the master.

The key is "what delays the restartpoint?". And the answer there is
probably bugs in index code which purport to see invalid pages when
they probably shouldn't be there. So a REINDEX would likely help.

I'll look at a patch to improve this. Definite bug and will be backpatched.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
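
To make the stale-copy theory above concrete, here is a minimal sketch in C
of the kind of comparison that would produce this behaviour. It is a
simplified model, not the PostgreSQL source: the struct CachedCheckpoint,
the functions derive_epoch() and walk_through_stale_cache(), and the xid
values are all hypothetical. The assumption is that the epoch is derived
from a cached (epoch, xid) pair taken at the last applied checkpoint, and
that a next xid numerically below the cached xid is read as exactly one
wraparound since then.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached checkpoint fields
 * (not PostgreSQL's real shared-memory struct). */
typedef struct {
    uint32_t ckpt_xid_epoch;  /* epoch recorded at the last applied checkpoint */
    uint32_t ckpt_xid;        /* next-xid value recorded at that checkpoint */
} CachedCheckpoint;

/*
 * Derive the epoch for next_xid from the cached checkpoint. next_xid is
 * assumed to be logically later than ckpt_xid, so a numerically smaller
 * value is taken to mean exactly one wraparound since the checkpoint.
 * That assumption only holds while the cache is reasonably fresh.
 */
uint32_t derive_epoch(const CachedCheckpoint *cached, uint32_t next_xid)
{
    uint32_t epoch = cached->ckpt_xid_epoch;

    if (next_xid < cached->ckpt_xid)
        epoch++;
    return epoch;
}

/* Walk through the sequence with a cache that is never refreshed. */
void walk_through_stale_cache(void)
{
    CachedCheckpoint stale = { .ckpt_xid_epoch = 5, .ckpt_xid = 3000000000u };

    /* True epoch 5, xid past the cached value: reported correctly. */
    assert(derive_epoch(&stale, 3500000000u) == 5);
    /* True epoch 6, xid has wrapped once: still reported correctly. */
    assert(derive_epoch(&stale, 1000000000u) == 6);
    /* True epoch 6, xid has passed the cached value again: goes back to 5. */
    assert(derive_epoch(&stale, 3500000000u) == 5);
    /* True epoch 7, xid has wrapped a second time: reports 6, one too low. */
    assert(derive_epoch(&stale, 1000000000u) == 6);
}
```

In this model the reported epoch flips between two values for as long as
the cache stays stale, while a copy refreshed at every checkpoint record
would keep reporting the true epoch. That would match the standby
symptom of pg_controldata being right and txid_current_snapshot() being
lower.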