On Wed, Mar 28, 2012 at 10:54 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Wed, Mar 28, 2012 at 10:24 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> On Wed, Mar 28, 2012 at 9:48 PM, Marko Kreen <mark...@gmail.com> wrote:
>>> On Fri, Mar 23, 2012 at 08:52:40AM +0000, Simon Riggs wrote:
>>>> Master pg_controldata - OK txid_current_snapshot() - OK
>>>> Standby pg_controldata - OK txid_current_snapshot() - lower value
>>>
>>> On Skytools list is report about master with slaves, but the
>>> lower value appears on master too:
>>>
>>>  http://lists.pgfoundry.org/pipermail/skytools-users/2012-March/001601.html
>>>
>>> Cc'd original reporter too.
>>
>> Thanks. Am looking.
>
> I can't see how this could happen on the master at all.
>
> On the standby, it can happen if we skip restartpoints for about a
> couple of billion xids. Which would be a problem.
>
> More on this tomorrow.

I haven't been able to reproduce the problem so far. But "knowing"
there is a bug helps in developing a theory based on the code.

CreateCheckpoint() increments the epoch on the master at the next
checkpoint after wraparound. (I'd be happier if there were an explicit
link between those two points; there isn't one, but I can't yet see a
problem.)

When the standby receives the checkpoint record, it stores the
information in 2 places:
i) directly into ControlFile->checkPointCopy
ii) and then into XLogCtl when a safe restartpoint occurs

It's possible that a safe restartpoint could be delayed. When that
happens, the XLogCtl copy grows stale. If the delay is long enough,
the NextXid counter will keep increasing and will eventually be higher
than the value last saved in XLogCtl, causing the epoch returned by
GetNextXidAndEpoch() to go backwards by 1. If the restartpoint is
delayed even further, NextXid would wrap again and the epoch would
increase by one again, then decrease, then increase, given enough time
and/or a very busy server using lots of xids.
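
A minimal sketch of that failure mode, assuming GetNextXidAndEpoch()
derives the epoch by comparing the current next xid against the saved
checkpoint xid and adding one when the next xid is numerically smaller.
The name infer_epoch and its parameters are made up for illustration;
this is a simplified model, not the PostgreSQL source.

```c
#include <stdint.h>

/*
 * Hypothetical model of the epoch inference (not PostgreSQL code).
 * ckpt_epoch and ckpt_xid stand for the values saved at the last
 * restartpoint; next_xid is the current 32-bit xid counter.
 */
uint32_t infer_epoch(uint32_t ckpt_epoch, uint32_t ckpt_xid, uint32_t next_xid)
{
    uint32_t epoch = ckpt_epoch;

    /*
     * next_xid is assumed to be logically later than ckpt_xid, so a
     * numerically smaller value is taken to mean one wraparound.
     * That assumption only holds while the saved pair is reasonably
     * fresh; with a stale pair the answer can be off.
     */
    if (next_xid < ckpt_xid)
        epoch++;

    return epoch;
}
```

With a saved pair of (epoch 0, xid 3000000000) that is never refreshed,
infer_epoch gives 1 once the next xid has wrapped to 100, but falls back
to 0 once the next xid climbs past 3000000000 again.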

At the same time, when we do UpdateControlFile() the other copy of the
epoch is written to the controlfile, so the controlfile shows the
accurate value for the epoch.

So I can explain how we get two different answers from the standby,
and I can explain how the error is very hard to reproduce and
apparently transient. I can't explain anything involving the master.
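
To illustrate why an epoch that is off by one shows up as a "lower
value" from txid_current_snapshot(), here is a rough sketch, assuming
the 64-bit txid is built by placing the epoch in the high 32 bits and
the xid in the low 32 bits. The helper name make_txid is hypothetical
and the real conversion code may handle more cases.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not PostgreSQL code): combine an epoch and a
 * 32-bit xid into a 64-bit txid, epoch in the high half.
 */
uint64_t make_txid(uint32_t epoch, uint32_t xid)
{
    return ((uint64_t) epoch << 32) | xid;
}
```

Under that assumption, the same xid reported with an epoch one too low
yields a 64-bit value that is smaller by 2^32, even though the epoch
stored in the controlfile is correct.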

The key question is "what delays the restartpoint?". The answer there
is probably bugs in index code which claims to see invalid pages that
probably shouldn't be there. So a REINDEX would likely help.

I'll look at a patch to improve this. This is a definite bug and the
fix will be backpatched.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
