On Tue, Mar 6, 2012 at 1:50 AM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
>> + * An xlog-switch record consumes all the remaining space on the
>> + * WAL segment. We have already reserved it for us, but we still need
>> + * to make sure it's been allocated and zeroed in the WAL buffers so
>> + * that when the caller (or someone else) does XLogWrite(), it can
>> + * really write out all the zeros.
>>
>> Why do we need to write out all the remaining space with zeros? In
>> current master, we don't do that. The recovery code ignores the data
>> following an XLOG_SWITCH record, so I don't think that's required.
>
> It's to keep the logic simpler. Before the patch, an xlog-switch just
> initialized the next page in the WAL buffers to insert to, to be the
> first page in the next segment. With this patch, we rely on a simple
> linear mapping from an XLogRecPtr to the WAL buffer that should contain
> that page (see XLogRecPtrToBufIdx()). Such a mapping is not possible if
> you sometimes skip over pages in the WAL buffers, so we allocate the
> buffers for those empty pages, too. Note that this means that an
> xlog-switch can blow through all your WAL buffers.
>
> We could probably optimize that so that you don't need to actually
> write() and fsync() all the zeros, perhaps by setting a flag on the WAL
> buffer to indicate that it only contains padding for an xlog-switch.
> However, I don't see any easy way to avoid blowing the cache.
>
> I'm thinking that xlog-switching happens so seldom, and typically on a
> fairly idle system, that we don't need to optimize it much. I guess we
> should measure the impact, though.
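Just to check my understanding of that mapping, here is a minimal sketch
of the idea, assuming a flat 64-bit LSN for simplicity (LsnToBufIdx()
below is a hypothetical stand-in for the patch's XLogRecPtrToBufIdx(),
not the real code):

    #include <stdint.h>

    #define XLOG_BLCKSZ 8192        /* WAL page size, as in pg_config.h */

    typedef uint64_t FlatLSN;       /* flat 64-bit LSN, for illustration */

    /*
     * Each WAL page has exactly one slot it can occupy in the buffer
     * ring: its page number modulo the number of buffers.  Finding the
     * buffer for an LSN is then pure arithmetic, with no search and no
     * lock -- but only if no page in the range is ever skipped, which
     * is why an xlog-switch must allocate (and zero) every remaining
     * page of the segment instead of jumping over them.
     */
    static inline int
    LsnToBufIdx(FlatLSN lsn, int wal_buffers)
    {
        return (int) ((lsn / XLOG_BLCKSZ) % wal_buffers);
    }

Under that invariant, any two pages whose LSNs are wal_buffers *
XLOG_BLCKSZ or more apart map to the same slot, which is why a single
xlog-switch can blow through the whole buffer ring, as you note.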
An online backup forces an xlog-switch twice, and it might well be
performed under a certain amount of load. So, to avoid a performance
spike during online backup, I think the optimization that skips the
write() and fsync() of the padding bytes would be helpful.

> On 22.02.2012 03:34, Fujii Masao wrote:
>> When I ran the long-running performance test, I encountered the
>> following panic error.
>>
>> PANIC: could not find WAL buffer for 0/FF000000
>>
>> 0/FF000000 is the xlog file boundary, so the patch seems to handle
>> the xlog file boundary incorrectly. In the patch, the current
>> insertion lsn is advanced by directly incrementing XLogRecPtr.xrecoff
>> as follows. But to handle the xlog file boundary correctly, shouldn't
>> we use XLByteAdvance() for that instead?
>
> Thanks, fixed this, too.
>
> I made the locking a bit more strict in WaitXLogInsertionsToFinish(),
> so that it grabs the insertpos_lck to read nextslot. I previously
> thought that was not necessary, assuming that reading/writing an int32
> is atomic, but I'm afraid there might be memory-ordering issues where
> the CurrPos of the most recent slot has not become visible to other
> backends yet, while the advancing of nextslot has.
>
> That particular issue would be very hard to hit in practice, so I
> don't know if this could explain the recovery failures that Jeff saw.
> I got the test script running (thanks for that, Jeff!), but
> unfortunately have not seen any failures yet (aside from the issue
> with crossing the xlogid boundary), with either this or the older
> version of the patch.
>
> Attached is a new version of the patch.

Thanks for the new patch! (A quick sketch of the xlogid-boundary
advance we discussed is in the P.S. below.) With this new patch, I was
again able to reproduce the assertion failure that I described upthread:
http://archives.postgresql.org/message-id/CAHGQGwGRuNJ%3D_ctXwteNkFkdvMDNFYxFdn0D1cd-CqL0OgNCLg%40mail.gmail.com

$ uname -a
Linux hermes 3.0.0-16-generic #28-Ubuntu SMP Fri Jan 27 17:50:54 UTC
2012 i686 i686 i386 GNU/Linux

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
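P.S. Here is a minimal sketch of the xlog file boundary handling
discussed above, modeled on the two-part XLogRecPtr. The constants
assume the usual 16MB segment size, and AdvanceRecPtr() is a
hypothetical stand-in for the XLByteAdvance() macro, not its actual
definition:

    #include <stdint.h>

    #define XLogSegSize      (16 * 1024 * 1024)              /* 16MB segments */
    #define XLogSegsPerFile  (0xffffffffU / XLogSegSize)     /* 255 */
    #define XLogFileSize     (XLogSegsPerFile * XLogSegSize) /* 0xFF000000 */

    typedef struct XLogRecPtr
    {
        uint32_t    xlogid;     /* log file #, 0 based */
        uint32_t    xrecoff;    /* byte offset within the log file */
    } XLogRecPtr;

    /*
     * Advance an LSN by nbytes, rolling xrecoff over into the next
     * xlogid at the file boundary.  A plain "ptr->xrecoff += nbytes"
     * can produce offsets at or beyond 0xFF000000, which is not a
     * valid position in any log file -- hence the PANIC at 0/FF000000.
     */
    static void
    AdvanceRecPtr(XLogRecPtr *ptr, uint32_t nbytes)
    {
        if (ptr->xrecoff + nbytes >= XLogFileSize)
        {
            ptr->xlogid++;
            ptr->xrecoff = ptr->xrecoff + nbytes - XLogFileSize;
        }
        else
            ptr->xrecoff += nbytes;
    }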