On Tue, Jan 21, 2014 at 09:20:52PM +0100, Jan Kara wrote:
> > If we're forcing the WAL out to disk because of transaction commit or
> > because we need to write the buffer protected by a certain WAL record
> > only after the WAL hits the platter, then it's fine.  But sometimes
> > we're writing WAL just because we've run out of internal buffer space,
> > and we don't want to block waiting for the write to complete.  Opening
> > the file with O_SYNC deprives us of the ability to control the timing
> > of the sync relative to the timing of the write.
>   O_SYNC has a heavy performance penalty. For ext4 it means an extra fs
> transaction commit whenever there's any metadata changed on the filesystem.
> Since mtime & ctime of files will be changed often, that will be the case
> very often.
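
To make the distinction in the quoted text concrete, here is a minimal
sketch of the two approaches. It is an illustration only, not PostgreSQL's
WAL code: the function names are hypothetical, a Unix-like system that
exposes os.O_SYNC is assumed, and os.fsync is used as a fallback where
os.fdatasync is not available.

```python
import os
import tempfile

# Hypothetical helper: fall back to fsync where fdatasync is missing.
_datasync = getattr(os, "fdatasync", os.fsync)


def write_all(fd, data):
    """Write all of `data`, retrying after short writes."""
    view = memoryview(data)
    while len(view) > 0:
        view = view[os.write(fd, view):]


def write_then_sync_later(path, blocks):
    """Write every block without waiting, then sync once at a chosen point."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for block in blocks:
            write_all(fd, block)   # normally returns once data is in the page cache
        _datasync(fd)              # the caller decides when to wait for the sync
    finally:
        os.close(fd)


def write_with_o_sync(path, blocks):
    """Write every block through an O_SYNC descriptor."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o600)
    try:
        for block in blocks:
            write_all(fd, block)   # each write waits for the sync to complete
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        blocks = [b"x" * 8192, b"y" * 8192]
        write_then_sync_later(os.path.join(tmp, "deferred"), blocks)
        write_with_o_sync(os.path.join(tmp, "o_sync"), blocks)
```

Both functions leave the same bytes on disk; the difference is only where
the caller blocks. With O_SYNC there is no way to issue a write now and
defer the wait until later.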

Also, there is the issue of writes that don't need syncing being synced
because sync is set on the file descriptor.  Here is output from our
pg_test_fsync tool when run on an SSD with a BBU:

        $ pg_test_fsync
        5 seconds per test
        O_DIRECT supported on this platform for open_datasync and open_sync.
        
        Compare file sync methods using one 8kB write:
        (in wal_sync_method preference order, except fdatasync
        is Linux's default)
                open_datasync                                   n/a
                fdatasync                          8424.785 ops/sec     119 usecs/op
                fsync                              7127.072 ops/sec     140 usecs/op
                fsync_writethrough                              n/a
                open_sync                         10548.469 ops/sec      95 usecs/op
        
        Compare file sync methods using two 8kB writes:
        (in wal_sync_method preference order, except fdatasync
        is Linux's default)
                open_datasync                                   n/a
                fdatasync                          4367.375 ops/sec     229 usecs/op
                fsync                              4427.761 ops/sec     226 usecs/op
                fsync_writethrough                              n/a
                open_sync                          4303.564 ops/sec     232 usecs/op
        
        Compare open_sync with different write sizes:
        (This is designed to compare the cost of writing 16kB
        in different write open_sync sizes.)
-->              1 * 16kB open_sync write          4938.711 ops/sec     202 usecs/op
-->              2 *  8kB open_sync writes         4233.897 ops/sec     236 usecs/op
-->              4 *  4kB open_sync writes         2904.710 ops/sec     344 usecs/op
-->              8 *  2kB open_sync writes         1736.720 ops/sec     576 usecs/op
-->             16 *  1kB open_sync writes          935.917 ops/sec    1068 usecs/op
        
        Test if fsync on non-write file descriptor is honored:
        (If the times are similar, fsync() can sync data written
        on a different descriptor.)
                write, fsync, close                7626.783 ops/sec     131 usecs/op
                write, close, fsync                6492.697 ops/sec     154 usecs/op
        
        Non-Sync'ed 8kB writes:
                write                            351517.178 ops/sec       3 usecs/op
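
The marked open_sync lines above show the cost growing as the same 16kB is
split into more writes, because every write on an O_SYNC descriptor is
synced individually. A rough way to reproduce that shape is sketched below.
This is not how pg_test_fsync is implemented (it is a C program and, per
the output above, also uses O_DIRECT here); the function name and the
single-pass timing are assumptions made for illustration, and a Unix-like
system with os.O_SYNC is assumed.

```python
import os
import tempfile
import time

TOTAL_BYTES = 16 * 1024  # same 16kB total as the comparison above


def timed_o_sync_writes(path, chunk_size, total=TOTAL_BYTES):
    """Write `total` bytes in `chunk_size` pieces through an O_SYNC descriptor.

    Returns (number_of_writes, elapsed_seconds).
    """
    if chunk_size <= 0 or total % chunk_size != 0:
        raise ValueError("chunk_size must evenly divide total")
    chunk = b"\0" * chunk_size
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o600)
    writes = 0
    start = time.perf_counter()
    try:
        for _ in range(total // chunk_size):
            written = 0
            while written < chunk_size:      # retry after a short write
                written += os.write(fd, chunk[written:])
            writes += 1                      # one synced write per chunk
    finally:
        os.close(fd)
    return writes, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "o_sync_test")
        for size_kb in (16, 8, 4, 2, 1):
            n, secs = timed_o_sync_writes(path, size_kb * 1024)
            print(f"{n:2d} * {size_kb:2d}kB O_SYNC writes: {secs * 1e6:.0f} usecs")
```

A single pass like this is noisy; pg_test_fsync repeats each test for
several seconds, so absolute numbers from the sketch are not comparable to
the figures above.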

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

