Re: [HACKERS] gcc 4.6 and hot standby
Mark Kirkwood mark.kirkw...@catalyst.net.nz writes: On 09/06/11 06:58, Alex Hunsaker wrote: Yeah :-). However ill note it looks like its the default compiler for fedora 15, ubuntu natty and debian sid. FWIW Ubuntu natty uses gcc 4.5.2, probably just as as well in the light of your findings :-) I've been able to reproduce this on released Fedora 15, and sure enough it is a compiler bug. The problem comes from these fragments of ReadRecord(): ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt) { XLogRecPtrtmpRecPtr = EndRecPtr; if (RecPtr == NULL) RecPtr = tmpRecPtr; targetRecOff = RecPtr-xrecoff % XLOG_BLCKSZ; if (targetRecOff == 0) tmpRecPtr.xrecoff += pageHeaderSize; record = (XLogRecord *) ((char *) readBuf + RecPtr-xrecoff % XLOG_BLCKSZ); gcc 4.6.0 is assuming that the value it first fetches for RecPtr-xrecoff is still usable for computing the record pointer, despite the possible intervening update of tmpRecPtr. That of course leads to record pointing to the wrong place, which results in an incorrect conclusion that we're looking at an invalid record header, which leads to killing and restarting the walreceiver. I haven't traced what happens after that, but apparently in standby mode we'll come back to ReadRecord with a record pointer that's already advanced over the page header, else it'd be an infinite loop. Note that this means that crash recovery, not only standby mode, is broken with this compiler. I've filed a bug report for this: https://bugzilla.redhat.com/show_bug.cgi?id=712480 but I wouldn't care to hold my breath while waiting for a fix to appear upstream, let alone propagate to all the distros already using 4.6.0. I think we need a workaround. The obvious question to ask here is why are we updating tmpRecPtr.xrecoff and not RecPtr-xrecoff? The latter choice would be a lot more readable, since the immediately surrounding code doesn't refer to tmpRecPtr. I think the idea is to guarantee that the caller's referenced record pointer (if any) isn't modified, but if RecPtr is not pointing at tmpRecPtr here, we have got big problems anyway. So I'm tempted to propose this fix: if (targetRecOff == 0) { /* * Can only get here in the continuing-from-prev-page case, because * XRecOffIsValid eliminated the zero-page-offset case otherwise. Need * to skip over the new page's header. */ - tmpRecPtr.xrecoff += pageHeaderSize; + Assert(RecPtr == tmpRecPtr); + RecPtr-xrecoff += pageHeaderSize; targetRecOff = pageHeaderSize; } Another possibility, which might be less ugly, is to delete the above code entirely and handle the page-header case in the RecPtr == NULL branch a few lines above. Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gcc 4.6 and hot standby
On Fri, Jun 10, 2011 at 12:38, Tom Lane t...@sss.pgh.pa.us wrote: I've been able to reproduce this on released Fedora 15, and sure enough it is a compiler bug. The problem comes from these fragments of ReadRecord(): [ ... ] Whoa, awesome. I spent a few more hours comparing the assembly-- then I tried compiling a subset of xlog.c with some educated guesses as to what function might be getting mis-compiled. Obviously my guesses were not educated enough! :-) I've filed a bug report for this: https://bugzilla.redhat.com/show_bug.cgi?id=712480 but I wouldn't care to hold my breath while waiting for a fix to appear upstream, let alone propagate to all the distros already using 4.6.0. I wouldn't hold my breath either. I think we need a workaround. The obvious question to ask here is why are we updating tmpRecPtr.xrecoff and not RecPtr-xrecoff? The latter choice would be a lot more readable, since the immediately surrounding code doesn't refer to tmpRecPtr. I think the idea is to guarantee that the caller's referenced record pointer (if any) isn't modified, but if RecPtr is not pointing at tmpRecPtr here, we have got big problems anyway. Hrm, Couldn't we change all the references to tmpRecPtr to use RecPtr instead? (Except of course where we assign RecPtr = tmpRecPtr); I think that would make the code look a lot less confused. Something like the attached? FYI Im happy to test whatever you fix you propose for a few days on this machine. It gets a fair amount of traffic... hopefully enough to exercise some possible corner cases. *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *** *** 3681,3694 ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt) */ if (XLOG_BLCKSZ - (RecPtr-xrecoff % XLOG_BLCKSZ) SizeOfXLogRecord) { ! NextLogPage(tmpRecPtr); /* We will account for page header size below */ } ! if (tmpRecPtr.xrecoff = XLogFileSize) { ! (tmpRecPtr.xlogid)++; ! tmpRecPtr.xrecoff = 0; } } else --- 3681,3694 */ if (XLOG_BLCKSZ - (RecPtr-xrecoff % XLOG_BLCKSZ) SizeOfXLogRecord) { ! NextLogPage(RecPtr); /* We will account for page header size below */ } ! if (RecPtr-xrecoff = XLogFileSize) { ! (RecPtr-xlogid)++; ! RecPtr-xrecoff = 0; } } else *** *** 3725,3731 retry: * XRecOffIsValid eliminated the zero-page-offset case otherwise. Need * to skip over the new page's header. */ ! tmpRecPtr.xrecoff += pageHeaderSize; targetRecOff = pageHeaderSize; } else if (targetRecOff pageHeaderSize) --- 3725,3731 * XRecOffIsValid eliminated the zero-page-offset case otherwise. Need * to skip over the new page's header. */ ! RecPtr-xrecoff += pageHeaderSize; targetRecOff = pageHeaderSize; } else if (targetRecOff pageHeaderSize) *** a/src/include/access/xlog_internal.h --- b/src/include/access/xlog_internal.h *** *** 154,166 typedef XLogLongPageHeaderData *XLogLongPageHeader; /* Align a record pointer to next page */ #define NextLogPage(recptr) \ do { \ ! if (recptr.xrecoff % XLOG_BLCKSZ != 0) \ ! recptr.xrecoff += \ ! (XLOG_BLCKSZ - recptr.xrecoff % XLOG_BLCKSZ); \ ! if (recptr.xrecoff = XLogFileSize) \ { \ ! (recptr.xlogid)++; \ ! recptr.xrecoff = 0; \ } \ } while (0) --- 154,166 /* Align a record pointer to next page */ #define NextLogPage(recptr) \ do { \ ! if (recptr-xrecoff % XLOG_BLCKSZ != 0) \ ! recptr-xrecoff += \ ! (XLOG_BLCKSZ - recptr-xrecoff % XLOG_BLCKSZ); \ ! if (recptr-xrecoff = XLogFileSize) \ { \ ! (recptr-xlogid)++; \ ! recptr-xrecoff = 0; \ } \ } while (0) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gcc 4.6 and hot standby
Alex Hunsaker bada...@gmail.com writes: On Fri, Jun 10, 2011 at 12:38, Tom Lane t...@sss.pgh.pa.us wrote: I think we need a workaround. My second idea about moving the test up doesn't work, because we can't know the page header size until after we've read the page. But I've verified that the attached patch does make the problem go away on my F15 box. Hrm, Couldn't we change all the references to tmpRecPtr to use RecPtr instead? (Except of course where we assign RecPtr = tmpRecPtr); I think that would make the code look a lot less confused. Something like the attached? Yeah, we could do that too; slightly modified version of your change included in the attached. regards, tom lane diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 5c3ca479fb33fff646e3a7b08b53efea92b9a97f..aa0b0291ee1c7781a36c62e3d89abbc98d3b8499 100644 *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *** ReadRecord(XLogRecPtr *RecPtr, int emode *** 3728,3750 RecPtr = tmpRecPtr; /* ! * Align recptr to next page if no more records can fit on the current ! * page. */ if (XLOG_BLCKSZ - (RecPtr-xrecoff % XLOG_BLCKSZ) SizeOfXLogRecord) ! { ! NextLogPage(tmpRecPtr); ! /* We will account for page header size below */ ! } ! if (tmpRecPtr.xrecoff = XLogFileSize) { ! (tmpRecPtr.xlogid)++; ! tmpRecPtr.xrecoff = 0; } } else { if (!XRecOffIsValid(RecPtr-xrecoff)) ereport(PANIC, (errmsg(invalid record offset at %X/%X, --- 3728,3759 RecPtr = tmpRecPtr; /* ! * RecPtr is pointing to end+1 of the previous WAL record. We must ! * advance it if necessary to where the next record starts. First, ! * align to next page if no more records can fit on the current page. */ if (XLOG_BLCKSZ - (RecPtr-xrecoff % XLOG_BLCKSZ) SizeOfXLogRecord) ! NextLogPage(*RecPtr); ! /* Check for crossing of xlog segment boundary */ ! if (RecPtr-xrecoff = XLogFileSize) { ! (RecPtr-xlogid)++; ! RecPtr-xrecoff = 0; } + + /* + * If at page start, we must skip over the page header. But we can't + * do that until we've read in the page, since the header size is + * variable. + */ } else { + /* + * In this case, the passed-in record pointer should already be + * pointing to a valid record starting position. + */ if (!XRecOffIsValid(RecPtr-xrecoff)) ereport(PANIC, (errmsg(invalid record offset at %X/%X, *** retry: *** 3773,3783 if (targetRecOff == 0) { /* ! * Can only get here in the continuing-from-prev-page case, because ! * XRecOffIsValid eliminated the zero-page-offset case otherwise. Need ! * to skip over the new page's header. */ ! tmpRecPtr.xrecoff += pageHeaderSize; targetRecOff = pageHeaderSize; } else if (targetRecOff pageHeaderSize) --- 3782,3794 if (targetRecOff == 0) { /* ! * At page start, so skip over page header. The Assert checks that ! * we're not scribbling on caller's record pointer; it's OK because we ! * can only get here in the continuing-from-prev-record case, since ! * XRecOffIsValid rejected the zero-page-offset case otherwise. */ ! Assert(RecPtr == tmpRecPtr); ! RecPtr-xrecoff += pageHeaderSize; targetRecOff = pageHeaderSize; } else if (targetRecOff pageHeaderSize) diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h index eeccdce31d076322cc5431117c6705b839b1b162..7e39630c1bf5d7cbf1a721b641a9481069e92816 100644 *** a/src/include/access/xlog_internal.h --- b/src/include/access/xlog_internal.h *** typedef XLogLongPageHeaderData *XLogLong *** 154,166 /* Align a record pointer to next page */ #define NextLogPage(recptr) \ do { \ ! if (recptr.xrecoff % XLOG_BLCKSZ != 0) \ ! recptr.xrecoff += \ ! (XLOG_BLCKSZ - recptr.xrecoff % XLOG_BLCKSZ); \ ! if (recptr.xrecoff = XLogFileSize) \ { \ ! (recptr.xlogid)++; \ ! recptr.xrecoff = 0; \ } \ } while (0) --- 154,166 /* Align a record pointer to next page */ #define NextLogPage(recptr) \ do { \ ! if ((recptr).xrecoff % XLOG_BLCKSZ != 0) \ ! (recptr).xrecoff += \ ! (XLOG_BLCKSZ - (recptr).xrecoff % XLOG_BLCKSZ); \ ! if ((recptr).xrecoff = XLogFileSize) \ { \ ! ((recptr).xlogid)++; \ ! (recptr).xrecoff = 0; \ } \ } while (0) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gcc 4.6 and hot standby
On Fri, Jun 10, 2011 at 14:24, Tom Lane t...@sss.pgh.pa.us wrote: Alex Hunsaker bada...@gmail.com writes: Hrm, Couldn't we change all the references to tmpRecPtr to use RecPtr instead? (Except of course where we assign RecPtr = tmpRecPtr); I think that would make the code look a lot less confused. Something like the attached? Yeah, we could do that too; slightly modified version of your change included in the attached. A cassert enabled build seems to be working, ill leave it running over the weekend... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gcc 4.6 and hot standby
Alex Hunsaker bada...@gmail.com writes: So I've been delaying moving some production boxes over to 9.0.4 from 2011-06-08 11:41:03 MDT [6078]: [1-1] user= FATAL: terminating walreceiver process due to administrator command [ repeats... ] I suppose the next step is to narrow it down to a specific flag -O2 uses... But I thought I would post here first-- maybe someone else has hit this? Or maybe someone has a bright idea on how to narrow this down? Maybe using a prerelease gcc version isn't such a hot idea for production. It's very, very, very difficult to see how the behavior you describe isn't a compiler bug. (Well, I could also believe that something external is repeatedly hitting the walreceiver with a SIGTERM, but it's hard to square that with the behavior changing when you recompile with different -O levels ...) It might be useful to strace the postmaster and walreceiver processes just to see if any signal is actually being sent or received. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gcc 4.6 and hot standby
On Wed, Jun 8, 2011 at 12:12, Alex Hunsaker bada...@gmail.com wrote: So I've been delaying moving some production boxes over to 9.0.4 from 9.0.2 because hot standby fails with: (this is on the hot standby machine that connects to the master) [ ...] 2011-06-08 11:41:03 MDT [6072]: [18-1] user= LOG: invalid record length at 86/E3B32010 2011-06-08 11:41:03 MDT [6078]: [1-1] user= FATAL: terminating walreceiver process due to administrator command [ repeats... ] [...] I then tired various optimization levels on 4.6: -O0: works -O1: works -O2: fails -Os: works So I tracked it down to -fgcse, that is CFLAGS=-O2 -fno-gcse ./configure works. I then took a few guesses and compiled all of postgres with -O2, then manually recompiled xlog.c with -f-no-gcse. that combination seems to work. [ One thing im not sure is why -Os works, I tried -O2 and added all the -fno-XXX options it says -Os adds. I suppose its either they turn off/on other optimizations the man page does not mention, or I guess thats compiler bugs for ya ] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gcc 4.6 and hot standby
On Wed, Jun 8, 2011 at 12:49, Tom Lane t...@sss.pgh.pa.us wrote: Alex Hunsaker bada...@gmail.com writes: So I've been delaying moving some production boxes over to 9.0.4 from 2011-06-08 11:41:03 MDT [6078]: [1-1] user= FATAL: terminating walreceiver process due to administrator command [ repeats... ] I suppose the next step is to narrow it down to a specific flag -O2 uses... But I thought I would post here first-- maybe someone else has hit this? Or maybe someone has a bright idea on how to narrow this down? Maybe using a prerelease gcc version isn't such a hot idea for production. It's very, very, very difficult to see how the behavior you describe isn't a compiler bug. Yeah :-). However ill note it looks like its the default compiler for fedora 15, ubuntu natty and debian sid. It might be useful to strace the postmaster and walreceiver processes just to see if any signal is actually being sent or received. Will do. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gcc 4.6 and hot standby
Alex Hunsaker bada...@gmail.com writes: So I tracked it down to -fgcse, that is CFLAGS=-O2 -fno-gcse ./configure works. I then took a few guesses and compiled all of postgres with -O2, then manually recompiled xlog.c with -f-no-gcse. that combination seems to work. Huh, interesting. So the bug must be lurking somewhere around the logic that deals with failedSources: somehow we're getting to the ShutdownWalRcv call in XLogPageRead. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gcc 4.6 and hot standby
Alex Hunsaker bada...@gmail.com writes: On Wed, Jun 8, 2011 at 12:49, Tom Lane t...@sss.pgh.pa.us wrote: It might be useful to strace the postmaster and walreceiver processes just to see if any signal is actually being sent or received. Find it attached. Well, the trace shows exactly what I thought was happening: each time the startup process hits one of these: 2011-06-08 14:01:22 MDT [27781]: [12-1] user= LOG: invalid record length at 86/F4E82010 it sends a SIGTERM to kill the walreceiver, because it thinks this indicates a walreceiver problem. Then we launch another one and manage to process a few more WAL records, lather rinse repeat. So it's interesting that this only happens with a particular gcc version, because it's not apparent to me why it works properly for anybody. Isn't hitting a zero record length an expected case when we run ahead of the amount of WAL produced by the master? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gcc 4.6 and hot standby
On 09/06/11 06:58, Alex Hunsaker wrote: Yeah :-). However ill note it looks like its the default compiler for fedora 15, ubuntu natty and debian sid. FWIW Ubuntu natty uses gcc 4.5.2, probably just as as well in the light of your findings :-) Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gcc 4.6 and hot standby
On Wed, Jun 8, 2011 at 16:20, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: On 09/06/11 06:58, Alex Hunsaker wrote: Yeah :-). However ill note it looks like its the default compiler for fedora 15, ubuntu natty and debian sid. FWIW Ubuntu natty uses gcc 4.5.2, probably just as as well in the light of your findings :-) Yeah I was just looking at distrowatch, its snapshot natty that uses 4.6.0. ubuntu 11.04 uses 4.5.2 like you said. http://distrowatch.com/table.php?distribution=ubuntu -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gcc 4.6 and hot standby
On Thu, Jun 9, 2011 at 5:31 AM, Tom Lane t...@sss.pgh.pa.us wrote: So it's interesting that this only happens with a particular gcc version, because it's not apparent to me why it works properly for anybody. Isn't hitting a zero record length an expected case when we run ahead of the amount of WAL produced by the master? At least while walreceiver is running, recovery doesn't go ahead of the last receive location. So that's not an expected case. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers