Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

Tom Lane Thu, 06 Feb 2014 15:43:13 -0800

Greg Stark <[email protected]> writes:
> On Thu, Feb 6, 2014 at 11:48 PM, Andres Freund <[email protected]> wrote:
>> That's not necessarily true. If e.g. the buffer mapping would change
>> racily, the resulting write from the bgwriter could very well end up
>> increasing the file size, leaving a hole in between its write and the
>> original size.

> a) the segment isn't sparse and b) there were whole segments full of
> NULs between the end of the tables and the final blocks.

> So the file was definitely extended by Postgres, not the OS and the
> bgwriter passes EXTENSION_FAIL which means it wouldn't create those
> intervening segments.

But ... when InRecovery, md.c will create such segments too.  We had
dismissed that on the grounds that the files would be sparse because
of the way md.c creates them.  However, it is real damn hard to see
how the loop in XLogReadBufferExtended could've accessed a bogus block,
other than hardware misfeasance which I don't believe any more than
you do.  The blkno that's passed to that function came directly out
of a WAL record that's in the private memory of the startup process
and recently passed a CRC check.  You'd have to believe some sort
of asynchronous memory clobber inside the startup process.
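To make the segment arithmetic concrete, here is a simplified model (a sketch, not md.c itself) of the decision _mdfd_getseg faces when the target block lies in a segment file that does not exist yet. It assumes the default BLCKSZ of 8192 and RELSEG_SIZE of 131072 blocks (1 GB segments); all model_* names are hypothetical. Following the discussion above, EXTENSION_FAIL makes the lookup fail outside recovery, while InRecovery every intervening segment gets created. The padding of each short earlier segment is modeled, as an assumption, as a single zero-filled write of that segment's last block, which is why such segments would be expected to come out sparse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed defaults: 8 kB blocks, 131072 blocks per segment (1 GB). */
#define MODEL_BLCKSZ      8192u
#define MODEL_RELSEG_SIZE 131072u

/* Hypothetical stand-ins for md.c's extension behaviors. */
typedef enum { MODEL_EXTENSION_FAIL, MODEL_EXTENSION_CREATE } model_behavior;

/* Segment file number (0, 1, 2, ...) that holds block blkno. */
static uint32_t model_segno(uint32_t blkno)
{
    return blkno / MODEL_RELSEG_SIZE;
}

/* Byte offset of blkno inside its own segment file. */
static uint64_t model_seg_offset(uint32_t blkno)
{
    return (uint64_t)(blkno % MODEL_RELSEG_SIZE) * MODEL_BLCKSZ;
}

/*
 * Offset of the single zero block assumed to be written when padding a
 * short segment out to RELSEG_SIZE: only the last block is written, so
 * everything before it can remain an unallocated hole.
 */
static uint64_t model_pad_write_offset(void)
{
    return (uint64_t)(MODEL_RELSEG_SIZE - 1) * MODEL_BLCKSZ;
}

/*
 * Segments 0 .. existing_segs-1 exist. Returns how many new segment
 * files would be created to reach blkno, or -1 if the lookup fails.
 */
static int model_segments_created(uint32_t blkno, uint32_t existing_segs,
                                  model_behavior behavior, bool in_recovery)
{
    assert(existing_segs >= 1);   /* segment 0 always exists */

    uint32_t target = model_segno(blkno);
    if (target < existing_segs)
        return 0;                 /* target segment already present */

    /* Creation is allowed if the caller asked for it, or in recovery. */
    if (behavior == MODEL_EXTENSION_CREATE || in_recovery)
        return (int)(target - existing_segs + 1);

    return -1;                    /* EXTENSION_FAIL outside recovery */
}
```

In this model a bgwriter-style lookup (EXTENSION_FAIL, not in recovery) for a block five segments past the end fails outright, while the same lookup during replay creates five new segment files, each padded only at its final block.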

On the other hand, if _mdfd_getseg did the deed, there's a whole lot
more space for something funny to have happened, because now we're
talking about a buffer being written in preparation for eviction
from shared buffers, long after WAL replay filled it.

So I'm wondering if there's something wrong with our deduction from
file non-sparseness.  In this connection, Google quickly found me
a report of XFS "losing" the sparse state of a file across multiple
writes:
http://oss.sgi.com/archives/xfs/2011-06/msg00225.html
I wonder whether that bug or a similar one exists in your production
kernel.

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
