Greg Stark <[EMAIL PROTECTED]> writes:
> Come to think of it I wonder whether there's anything to be gained by
> using smaller files for tables.  Instead of 1G files maybe 256M files or
> something like that to reduce the hit of fsyncing a file.
Actually probably not.  The weak part of our current approach is that we tell the kernel "sync this file", then "sync that file", etc, in a more or less random order.  This leads to a probably non-optimal sequence of disk accesses to complete a checkpoint.  What we would really like is a way to tell the kernel "sync all these files, and let me know when you're done" --- then the kernel and hardware have some shot at scheduling all the writes in an intelligent fashion.

sync_file_range() is not that exactly, but since it lets you request syncing and then go back and wait for the syncs later, we could get the desired effect with two passes over the file list.  (If the file list is longer than our allowed number of open files, though, the extra opens/closes could hurt.)

Smaller files would make the I/O scheduling problem worse, not better.  Indeed, I've been wondering lately if we shouldn't resurrect LET_OS_MANAGE_FILESIZE and make that the default on systems with largefile support.  If nothing else it would cut down on open/close overhead on very large relations.

			regards, tom lane
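[A minimal sketch of the two-pass scheme described above, assuming Linux's sync_file_range() and that every file can stay open across both passes; the file names and fixed-size array are hypothetical, and this is not PostgreSQL code.]

/*
 * Two-pass sync sketch: pass 1 starts writeback on every file without
 * waiting; pass 2 goes back and waits for each one, so the kernel sees
 * the whole set of writes and can schedule them together.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
    /* Hypothetical list of relation segment files to be synced. */
    const char *files[] = {"16384.0", "16384.1", "16385.0"};
    const int   nfiles = 3;
    int         fds[3];

    /* Pass 1: initiate writeback on each file, don't wait. */
    for (int i = 0; i < nfiles; i++)
    {
        fds[i] = open(files[i], O_RDWR);
        if (fds[i] < 0)
        {
            perror("open");
            exit(1);
        }
        /* offset 0, nbytes 0 means "through end of file" */
        if (sync_file_range(fds[i], 0, 0, SYNC_FILE_RANGE_WRITE) != 0)
            perror("sync_file_range (initiate)");
    }

    /* Pass 2: wait for writeback of each file to complete. */
    for (int i = 0; i < nfiles; i++)
    {
        if (sync_file_range(fds[i], 0, 0,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE |
                            SYNC_FILE_RANGE_WAIT_AFTER) != 0)
            perror("sync_file_range (wait)");
        close(fds[i]);
    }
    return 0;
}

[Note that sync_file_range() only initiates and waits for writeback of dirty pages; it does not flush the drive's volatile write cache, so a real checkpoint would still need a durability guarantee on top of this.]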