On Thu, 2009-11-12 at 17:03 +0900, Fujii Masao wrote:
> On Thu, Nov 12, 2009 at 4:32 PM, Heikki Linnakangas
> <heikki.linnakan...@enterprisedb.com> wrote:
> > Fujii Masao wrote:
> >> The problem is that fsync needs to be issued too frequently, which would
> >> be harmless in asynchronous replication, but not in synchronous one.
> >> A transaction would have to wait for the primary's and standby's fsync
> >> before returning a "success" to a client.
> >>
> >> So I'm inclined to change the startup process and bgwriter, instead of
> >> walreceiver, so as to fsync the WAL for the WAL rule.
> >
> > Let's keep it simple for now. Just make the walreceiver do the fsync. We
> > can optimize later. For now, we're only going to have async mode anyway.
>
> Okey, I'll do that; the walreceiver issues the fsync for each arrival of
> the WAL records, and the startup process replays only the records already
> fsynced.
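A minimal sketch of the scheme agreed in the quoted exchange, in which the walreceiver fsyncs every arriving chunk and the startup process replays only records that are already fsynced. It is a toy in-memory model, not PostgreSQL code: `WalState`, `walreceiver_on_arrival()` and `startup_replay()` are hypothetical names, LSNs are modelled as plain byte offsets, and write()/fsync() are represented only by updating counters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model, not PostgreSQL's data structures.
 * WAL positions (LSNs) are plain byte offsets into one WAL stream. */
typedef struct {
    uint64_t written;   /* end of WAL written to the standby's WAL file */
    uint64_t flushed;   /* end of WAL known to be fsynced */
    uint64_t replayed;  /* end of WAL applied by the startup process */
    int      fsyncs;    /* number of fsync calls issued so far */
} WalState;

/* Walreceiver: append an arriving chunk of `len` bytes and fsync at once,
 * so every arrival costs one fsync. */
static void walreceiver_on_arrival(WalState *s, uint64_t len)
{
    s->written += len;        /* stands in for write() */
    s->flushed = s->written;  /* stands in for fsync() */
    s->fsyncs++;
}

/* Startup process: replay records in order, but only those ending at or
 * before `flushed`. `rec_end` holds each record's end LSN in ascending
 * order. Returns the number of records replayed by this call. */
static int startup_replay(WalState *s, const uint64_t *rec_end, int nrecs)
{
    int n = 0;
    for (int i = 0; i < nrecs; i++) {
        if (rec_end[i] <= s->replayed)
            continue;              /* already applied earlier */
        if (rec_end[i] > s->flushed)
            break;                 /* not durable yet: wait for next fsync */
        s->replayed = rec_end[i];
        n++;
    }
    /* Invariant of this scheme: replay never runs ahead of durable WAL. */
    assert(s->replayed <= s->flushed);
    return n;
}
```

In this model every arrival costs exactly one fsync, which is the per-arrival cost the reply is concerned about.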
I agree with you, though it took me some time to understand what you said, and at first my reaction was to disagree. I think the responses you got on this are because you dived straight in with a question before explaining the other things around it.

We already have a number of options for how to handle incoming WAL. We can choose to fsync or not when WAL arrives. Choosing *not* to fsync would be the typical choice because it provides reasonable performance; fsyncing after each transaction commit would be worse. In any case, if the WAL receiver does the fsyncs then we will get worse performance. If we reduce the number of fsyncs it does, we just get spiky behaviour around the fsyncs.

If recovery starts reading WAL records that have not been fsynced, then we may need to flush a shared buffer to disk that depends upon a not-yet-fsynced WAL record. Fsyncing WAL after *every* WAL record is going to make performance suck even worse and is completely out of the question. So implementing the fsync-WAL-before-buffer-flush rule during recovery makes much more sense. It's also only a small change in XLogFlush().

Another way of doing this would be to only allow recovery to progress as far as has been fsynced. That seems a more plausible approach, but it would lead to delays if we had a small number of long write transactions. The benefit of streaming is that it potentially allows us to keep recovery as close to real time as possible.

So overall, yes, we need to do as you suggested: implement the WAL rule in recovery. The WAL receiver smoothly does write(), the startup process replays, and we leave the WAL file fsyncs to be performed by the bgwriter.

But I also agree with Heikki. Let's plan to do this later in this release.

-- 
 Simon Riggs           www.2ndQuadrant.com
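The alternative argued for in the reply can be sketched the same way: the WAL receiver only writes, the startup process replays whatever has been written, and the WAL rule is enforced at the moment a dirty buffer is written out. This is again a hypothetical toy model, not PostgreSQL code: `recovery_wal_flush()` only stands in for the idea of making XLogFlush() enforce the rule during recovery and does not reproduce its real logic, and `flush_buffer()` represents a bgwriter or startup-process buffer write.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model, not PostgreSQL code; LSNs are byte offsets. */
typedef struct {
    uint64_t written;   /* end of WAL written (not necessarily fsynced) */
    uint64_t flushed;   /* end of WAL known to be fsynced */
    uint64_t replayed;  /* end of WAL applied by the startup process */
    int      fsyncs;    /* number of fsync calls issued so far */
} WalState;

/* WAL receiver: write() only, no fsync. */
static void walreceiver_write(WalState *s, uint64_t len)
{
    s->written += len;
}

/* Startup process: replay anything already written, fsynced or not.
 * Returns the number of records replayed by this call. */
static int startup_replay(WalState *s, const uint64_t *rec_end, int nrecs)
{
    int n = 0;
    for (int i = 0; i < nrecs; i++) {
        if (rec_end[i] <= s->replayed)
            continue;              /* already applied earlier */
        if (rec_end[i] > s->written)
            break;                 /* not received yet */
        s->replayed = rec_end[i];
        n++;
    }
    return n;
}

/* Hypothetical recovery-time stand-in for the WAL rule: WAL up to `lsn`
 * must be durable before a page stamped with `lsn` goes to disk. */
static void recovery_wal_flush(WalState *s, uint64_t lsn)
{
    if (lsn <= s->flushed)
        return;                    /* already durable: no fsync needed */
    assert(lsn <= s->written);     /* a replayed page never outruns written WAL */
    s->flushed = s->written;       /* one fsync of the file covers all writes */
    s->fsyncs++;
}

/* Writing out a dirty buffer (bgwriter or startup process): apply the
 * WAL rule first; the data page write itself would follow. */
static void flush_buffer(WalState *s, uint64_t page_lsn)
{
    recovery_wal_flush(s, page_lsn);
    /* write(page) would go here */
}
```

In this model, replaying three records costs no fsync at all; the first buffer write-out triggers a single fsync covering everything written so far, and a later buffer whose LSN is already durable costs nothing.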