On Thu, Dec 1, 2011 at 9:58 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> Waiting until the other one completes is how it currently is
> implemented, but is it necessary from a correctness view? It seems
> like the WALWriteLock only needs to protect the write, and not the
> sync (assuming the sync method allows those to be separate actions),
> and that there could be multiple fsync requests from different
> processes pending at the same time without a correctness problem.
I've wondered about that, too. At least on Linux, the overhead of a system call seems to be pretty low: e.g., the ridiculous number of lseek calls we do on a pgbench -S run doesn't seem to create much overhead until the inode mutex starts to become contended, and that problem should be fixed in Linux 3.2. But I'm not sure whether system calls are similarly cheap on all platforms, or even whether that's true on Linux for fsync() in particular.

There's another possible approach here, too: instead of waiting to set hint bits until the commit record hits the disk, we could allow the hint bits to be set immediately, on the condition that we don't write the page out until the commit record hits the disk. Bumping the page LSN would do that, but I think that might be problematic, since setting hint bits isn't WAL-logged. If so, we could possibly fix that by storing a second LSN for the page out of line, e.g. in the buffer descriptor. That might be even faster than speeding up the WAL flush.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
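A minimal C sketch of the out-of-line LSN idea from the reply above, written as a toy model rather than actual PostgreSQL code. Every name here (`Lsn`, `BufferDescSketch`, `hint_lsn`, `wal_flushed_upto`, `wal_flush`, `set_hint_bit_deferred`, `write_buffer`) is hypothetical, and the WAL flush is simulated by advancing a counter. The point it illustrates: a hint bit can be set right away while the commit record's LSN is remembered in the buffer descriptor instead of the page header, and the write-out path then flushes WAL past whichever of the two LSNs is larger.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical types for illustration; not PostgreSQL's real definitions. */
typedef uint64_t Lsn;

typedef struct
{
    Lsn      page_lsn;   /* LSN in the page header (WAL-logged changes only) */
    Lsn      hint_lsn;   /* hypothetical out-of-line LSN in the buffer descriptor */
    uint16_t hint_bits;  /* stand-in for tuple hint bits on the page */
    int      dirty;
} BufferDescSketch;

/* Simulated position up to which WAL is known to be durably on disk. */
static Lsn wal_flushed_upto = 0;

/* Stand-in for a WAL flush: pretend everything up to 'upto' is now synced. */
static void
wal_flush(Lsn upto)
{
    if (upto > wal_flushed_upto)
        wal_flushed_upto = upto;
}

/*
 * Set a hint bit immediately, even if the commit record at commit_lsn is
 * not on disk yet. Rather than bumping page_lsn (the hint bit change is not
 * WAL-logged), remember commit_lsn out of line.
 */
static void
set_hint_bit_deferred(BufferDescSketch *buf, uint16_t bit, Lsn commit_lsn)
{
    buf->hint_bits |= bit;
    buf->dirty = 1;
    if (commit_lsn > buf->hint_lsn)
        buf->hint_lsn = commit_lsn;
}

/*
 * Before the page is written out, WAL must be flushed past both the page
 * LSN and the out-of-line hint LSN.
 */
static void
write_buffer(BufferDescSketch *buf)
{
    Lsn required = buf->page_lsn > buf->hint_lsn ? buf->page_lsn : buf->hint_lsn;

    if (wal_flushed_upto < required)
        wal_flush(required);
    assert(wal_flushed_upto >= required);

    /* ... the real write of the page image would happen here ... */
    buf->dirty = 0;
    buf->hint_lsn = 0;  /* hint bits now on disk; their commit records are too */
}
```

For example, with `page_lsn = 100`, calling `set_hint_bit_deferred(&buf, 0x1, 250)` leaves WAL untouched, and only the later `write_buffer(&buf)` forces a flush to 250. The backend setting the hint bit never waits; the cost moves to eviction or checkpoint time, where a WAL flush is often already needed anyway.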