On Sat, Aug 30, 2014 at 8:50 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Andres Freund <and...@2ndquadrant.com> writes:
>> On 2014-08-27 19:23:04 +0300, Heikki Linnakangas wrote:
>>> A long time ago, Itagaki Takahiro wrote a patch sort the buffers and write
>>> them out in order
>>> (http://www.postgresql.org/message-id/flat/20070614153758.6a62.itagaki.takah...@oss.ntt.co.jp).
>>> The performance impact of that was inconclusive, but one thing that it
>>> allows nicely is to interleave the fsyncs, so that you write all the buffers
>>> for one file, then fsync it, then next file and so on.
>
>> ...
>> So, *very* clearly sorting is a benefit.
>
> pg_bench alone doesn't convince me on this. The original thread found
> cases where it was a loss, IIRC; you will need to test many more than
> one scenario to prove the point.

The same objection came up last time I tried to push for sorted
checkpoints. I did not find any reference to where it caused a loss,
nor was I able to come up with a case where writing out in arbitrary
order would be better than writing out in file-sequential order. In
fact, if we ask for low latency, the OS must keep its backlog of dirty
pages small, which eliminates any chance of combining writes that
arrive out of order.

I have a use case where the system continuously loads data into
time-partitioned indexed tables; at every checkpoint all of the indexes
of the latest partition need to be written out. The only way I could
get the writeout to happen with sequential I/O was to set
checkpoint_completion_target to zero and ensure that the OS cache
allows for enough dirty pages to absorb the whole checkpoint. The fsync
that followed obviously did nasty things to latency. Sorted
checkpoints, by contrast, were able to do sequential I/O with
checkpoint spreading and with OS virtual memory settings tuned for low
latency. I can create a benchmark that shows this behavior if you need
data points beyond pgbench's OLTP workload to convince you that sorting
checkpoint writes is a good idea.

I did just come up with a case where plain sorting might cause an
issue. If the writes go to different I/O devices, then naive sorting
will first use one device and then the other, whereas arbitrary-order
writing will load balance between the devices. Assuming that separate
tablespaces are used for separate I/O devices, it should be enough to
just interleave the writes of each tablespace, weighted by the amount
of writes per tablespace.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
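
For illustration only, the weighted interleaving of per-tablespace
writes proposed in the message could look roughly like the sketch
below. It is written in Python rather than PostgreSQL's C and is not
taken from any patch. The function name, the (tablespace, file, block)
tuple layout, and the "fraction of the tablespace written so far"
weighting are all assumptions made for this example.

```python
import heapq
from collections import defaultdict


def interleaved_write_order(dirty_buffers):
    """Order dirty buffers sequentially within each tablespace while
    interleaving tablespaces in proportion to their share of the writes.

    dirty_buffers: iterable of (tablespace, file, block) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # Group the buffers by tablespace (assumed to map to one I/O device).
    per_tablespace = defaultdict(list)
    for tablespace, file, block in dirty_buffers:
        per_tablespace[tablespace].append((file, block))

    streams = []
    for tablespace, buffers in per_tablespace.items():
        buffers.sort()  # file-sequential order within the tablespace
        total = len(buffers)
        # Tag each buffer with the fraction of its tablespace's work that
        # is done once it has been written. The tags increase within a
        # stream, so each stream is already sorted for the merge below.
        streams.append([
            ((i + 0.5) / total, tablespace, file, block)
            for i, (file, block) in enumerate(buffers)
        ])

    # Merging by progress keeps all tablespaces advancing at the same
    # relative rate, so a tablespace with more writes gets more slots.
    return [(ts, f, b) for _, ts, f, b in heapq.merge(*streams)]


if __name__ == "__main__":
    dirty = [
        ("ts_a", 1, 3), ("ts_b", 7, 0), ("ts_a", 1, 0),
        ("ts_a", 2, 5), ("ts_a", 1, 1), ("ts_b", 7, 1),
    ]
    for buf in interleaved_write_order(dirty):
        print(buf)
```

With four dirty buffers in ts_a and two in ts_b, the sketch issues
roughly two ts_a writes per ts_b write, and the writes within each
tablespace stay in (file, block) order.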