Re: [HACKERS] New WAL code dumps core trivially on replay of bad data

Amit Kapila Fri, 17 Aug 2012 22:52:49 -0700

Tom Lane Sent: Saturday, August 18, 2012 7:16 AM

> The startup process's stack trace is


> #0  0x26fd1c in RecordIsValid (record=0x4008d7a0, recptr=80658424, emode=15)
>    at xlog.c:3713
> 3713            COMP_CRC32(crc, XLogRecGetData(record), len);
> (gdb) bt
> #0  0x26fd1c in RecordIsValid (record=0x4008d7a0, recptr=80658424, emode=15)
>    at xlog.c:3713
> #1  0x270690 in ReadRecord (RecPtr=0x7b03bad0, emode=15,
>    fetching_ckpt=0 '\000') at xlog.c:4006

> The current WAL address is 80658424 == 0x04cebff8, that is just 8 bytes
> short of a page boundary, and what RecordIsValid thinks it is dealing
> with is


> so it merrily tries to compute a checksum on a gigabyte worth of data,
> and soon falls off the end of memory.

> In reality, inspection of the WAL file suggests that this is the end of
> valid data and what should have happened is that replay just stopped.
> The xl_len and so forth shown above are just garbage from off the end of
> what was actually read from the file (everything beyond offset 0xcebff8
> in file 4 is in fact zeroes).

> I'm not sure whether this is just a matter of having failed to
> sanity-check that xl_tot_len is at least SizeOfXLogRecord, or whether
> there is a deeper problem with the new design of continuation records
> that makes it impossible to validate records safely.

Earlier, ReadRecord had a check on the total length, made before it
calls RecordIsValid():
     if (record->xl_tot_len < SizeOfXLogRecord + record->xl_len ||
               record->xl_tot_len > SizeOfXLogRecord + record->xl_len +
                         XLR_MAX_BKP_BLOCKS * (sizeof(BkpBlock) + BLCKSZ))

I think the missing total-length check has caused this problem. However,
with the new code this check will have to be different.
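
To illustrate the idea, here is a minimal standalone sketch of that kind of
bounds check. It is not the actual xlog.c code: the struct, the function name
and all four constants are hypothetical stand-ins, with placeholder values
rather than the real SizeOfXLogRecord, sizeof(BkpBlock), XLR_MAX_BKP_BLOCKS
and BLCKSZ. The only point is that the claimed lengths are validated before
anything tries to compute a CRC over them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values, assumed for illustration only; the real ones
 * come from the PostgreSQL headers. */
#define SKETCH_SIZEOF_RECORD    32u    /* stands in for SizeOfXLogRecord */
#define SKETCH_SIZEOF_BKPBLOCK  20u    /* stands in for sizeof(BkpBlock) */
#define SKETCH_MAX_BKP_BLOCKS   4u     /* stands in for XLR_MAX_BKP_BLOCKS */
#define SKETCH_BLCKSZ           8192u  /* stands in for BLCKSZ */

/* Hypothetical, cut-down record header: only the two length fields. */
typedef struct SketchRecordHeader
{
    uint32_t xl_tot_len;    /* total length the header claims */
    uint32_t xl_len;        /* length of the rmgr data it claims */
} SketchRecordHeader;

/*
 * Return true if the two length fields are consistent with each other.
 * The arithmetic is done in 64 bits so that a garbage xl_len cannot
 * wrap around and make a bogus record look plausible.
 */
bool
sketch_lengths_plausible(const SketchRecordHeader *rec)
{
    uint64_t min_len = (uint64_t) SKETCH_SIZEOF_RECORD + rec->xl_len;
    uint64_t max_len = min_len +
        (uint64_t) SKETCH_MAX_BKP_BLOCKS *
        (SKETCH_SIZEOF_BKPBLOCK + SKETCH_BLCKSZ);

    return rec->xl_tot_len >= min_len && rec->xl_tot_len <= max_len;
}
```

With a check of this shape, an all-zeroes header fails at once, because
xl_tot_len is then smaller than the fixed header size, so the caller can
treat it as end of WAL instead of going on to the checksum.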

With Regards,
Amit Kapila.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
