Greg Stark <[EMAIL PROTECTED]> writes:
> Come to think of it I wonder whether there's anything to be gained by
> using smaller files for tables.  Instead of 1G files maybe 256M files or
> something like that to reduce the hit of fsyncing a file.
Actually probably not.  The weak part of our current approach is that we tell the kernel "sync this file", then "sync that file", etc, in a more or less random order.  This leads to a probably non-optimal sequence of disk accesses to complete a checkpoint.  What we would really like is a way to tell the kernel "sync all these files, and let me know when you're done" --- then the kernel and hardware have some shot at scheduling all the writes in an intelligent fashion.

sync_file_range() is not that exactly, but since it lets you request syncing and then go back and wait for the syncs later, we could get the desired effect with two passes over the file list.  (If the file list is longer than our allowed number of open files, though, the extra opens/closes could hurt.)

Smaller files would make the I/O scheduling problem worse, not better.  Indeed, I've been wondering lately if we shouldn't resurrect LET_OS_MANAGE_FILESIZE and make that the default on systems with largefile support.  If nothing else it would cut down on open/close overhead on very large relations.

			regards, tom lane
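[A minimal sketch of the two-pass scheme described above, assuming Linux's sync_file_range() and that every file can stay open across both passes; the file names and fixed-size array are hypothetical, and this is not PostgreSQL code.]

/*
 * Two-pass sync sketch: pass 1 starts writeback on every file without
 * waiting; pass 2 goes back and waits for each one, so the kernel sees
 * the whole set of writes and can schedule them together.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
    /* Hypothetical list of relation segment files to be synced. */
    const char *files[] = {"16384.0", "16384.1", "16385.0"};
    const int   nfiles = 3;
    int         fds[3];

    /* Pass 1: initiate writeback on each file, don't wait. */
    for (int i = 0; i < nfiles; i++)
    {
        fds[i] = open(files[i], O_RDWR);
        if (fds[i] < 0)
        {
            perror("open");
            exit(1);
        }
        /* offset 0, nbytes 0 means "through end of file" */
        if (sync_file_range(fds[i], 0, 0, SYNC_FILE_RANGE_WRITE) != 0)
            perror("sync_file_range (initiate)");
    }

    /* Pass 2: wait for writeback of each file to complete. */
    for (int i = 0; i < nfiles; i++)
    {
        if (sync_file_range(fds[i], 0, 0,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE |
                            SYNC_FILE_RANGE_WAIT_AFTER) != 0)
            perror("sync_file_range (wait)");
        close(fds[i]);
    }
    return 0;
}

[Note that sync_file_range() only initiates and waits for writeback of dirty pages; it does not flush the drive's volatile write cache, so a real checkpoint would still need a durability guarantee on top of this.]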