Hi,

On 2015-09-10 17:15:26 +0200, Fabien COELHO wrote:
>
> >Thanks for the hints! Two-part v12 attached fixes these.
>
> Here is a v13, which is just a rebase after 1aba62ec.
I'm working on this patch to get it into a state I think would be committable.

In my performance testing, calling PerformFileFlush() only at segment boundaries and in CheckpointWriteDelay() can lead to rather spiky IO - not that surprisingly. The sync in CheckpointWriteDelay() is problematic because it is only triggered while the checkpoint is on schedule, and not when it is behind. My testing seems to show that just adding a limit of 32 buffers to FileAsynchronousFlush() leads to markedly better results.

I wonder if mmap() && msync(MS_ASYNC) isn't a better replacement for sync_file_range(SYNC_FILE_RANGE_WRITE) than posix_fadvise(DONTNEED). It might even be possible to later approximate that on Windows using FlushViewOfFile().

As far as I can see the while (nb_spaces != 0)/NextBufferToWrite() logic doesn't work correctly if tablespaces aren't actually sorted. I'm actually inclined to fix this by simply removing the flag to enable/disable sorting.

Having defined(HAVE_SYNC_FILE_RANGE) || defined(HAVE_POSIX_FADVISE) in so many places looks ugly; I want to push that into the underlying functions. If we add a different flushing approach we shouldn't have to touch several places that don't really care.

I've replaced the NextBufferToWrite() logic with a binaryheap.h heap - seems to work well, with a bit less code actually. I'll post this after some more cleanup & testing.

I've also noticed that the sleeping logic in CheckpointWriteDelay() isn't particularly good. For one, in high throughput workloads the 100ms sleep is too long, leading to bursty IO behaviour. If 1k+ buffers are written out a second, 100ms is a rather long sleep. For another, the fact that we only sleep 100ms when the write rate is low makes the checkpoint finish rather quickly - on a slow disk (say microSD) that can cause unnecessary slowdowns for concurrent activity. ISTM we should calculate the sleep time in a better way. The SIGHUP behaviour is also weird. Anyway, this probably belongs on a new thread.
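
To make the 32-buffer limit mentioned above a bit more concrete, here is a rough sketch of the idea. It is not the code from the patch: PendingFlush, pending_flush_add(), pending_flush_issue() and the callback type are made-up names, and coalescing only adjacent blocks of the same file is an assumption made to keep the example short. On Linux the callback could plausibly wrap sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE); here it only counts, so the logic can be run anywhere.

```c
#include <assert.h>
#include <stddef.h>

#define FLUSH_AFTER 32          /* the limit discussed above */

/* Hypothetical callback type: asks the kernel to start writeback of a range. */
typedef void (*flush_fn) (int fd, long first_block, int nblocks, void *arg);

/* Hypothetical state: one contiguous range of written-but-unflushed blocks. */
typedef struct PendingFlush
{
    int         fd;             /* file of the pending range, -1 if none */
    long        first_block;    /* first block of the pending range */
    int         nblocks;        /* blocks accumulated so far */
    flush_fn    flush;          /* how to actually issue the flush */
    void       *arg;            /* opaque argument passed to flush */
} PendingFlush;

static void
pending_flush_init(PendingFlush *pf, flush_fn flush, void *arg)
{
    assert(flush != NULL);
    pf->fd = -1;
    pf->first_block = 0;
    pf->nblocks = 0;
    pf->flush = flush;
    pf->arg = arg;
}

/* Issue whatever is pending, if anything, and reset the range. */
static void
pending_flush_issue(PendingFlush *pf)
{
    if (pf->nblocks > 0)
        pf->flush(pf->fd, pf->first_block, pf->nblocks, pf->arg);
    pf->fd = -1;
    pf->nblocks = 0;
}

/* Record that 'block' of 'fd' was written; flush once FLUSH_AFTER is reached. */
static void
pending_flush_add(PendingFlush *pf, int fd, long block)
{
    /* A different file or a non-adjacent block ends the current range. */
    if (pf->nblocks > 0 &&
        (fd != pf->fd || block != pf->first_block + pf->nblocks))
        pending_flush_issue(pf);

    if (pf->nblocks == 0)
    {
        pf->fd = fd;
        pf->first_block = block;
    }
    pf->nblocks++;

    /* Never let more than FLUSH_AFTER buffers pile up unflushed. */
    if (pf->nblocks >= FLUSH_AFTER)
        pending_flush_issue(pf);
}

/* Example callback for experimenting: only counts calls and blocks. */
typedef struct FlushStats
{
    int         calls;
    int         blocks;
} FlushStats;

static void
count_flush(int fd, long first_block, int nblocks, void *arg)
{
    FlushStats *st = (FlushStats *) arg;

    (void) fd;
    (void) first_block;
    st->calls++;
    st->blocks += nblocks;
}
```

The point of the cap is that writeback gets kicked off in small, steady batches regardless of whether the checkpoint is on schedule, instead of only at segment boundaries. A caller would still have to call pending_flush_issue() once at the end to push out the remainder.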
Greetings,

Andres Freund