Hello, all.

I think there is room for improvement in WAL. Here is a patch for it:

  - Multiple pages are written in one write() if they are contiguous.
  - Add 'open_direct' to wal_sync_method.
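To make the first change concrete, here is a minimal sketch of the idea. It is not the code from xlog.diff: PAGE_SZ and the function names write_all, write_pages_one_by_one and write_pages_coalesced are hypothetical, chosen only for illustration. Both variants return the number of write() calls they issued, so the difference in syscall count is visible.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SZ 8192            /* assumed WAL page size, for illustration */

/*
 * Write len bytes, retrying on EINTR and on short writes.
 * Returns the number of successful write() calls, or -1 on error.
 */
static int
write_all(int fd, const char *buf, size_t len)
{
    int calls = 0;

    assert(buf != NULL);
    while (len > 0)
    {
        ssize_t n = write(fd, buf, len);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t) n;
        calls++;
    }
    return calls;
}

/* One write() per page: at least npages syscalls. */
static int
write_pages_one_by_one(int fd, const char *buf, size_t npages)
{
    int total = 0;

    for (size_t i = 0; i < npages; i++)
    {
        int c = write_all(fd, buf + i * PAGE_SZ, PAGE_SZ);

        if (c < 0)
            return -1;
        total += c;
    }
    return total;
}

/* Contiguous pages handed to the kernel in one request. */
static int
write_pages_coalesced(int fd, const char *buf, size_t npages)
{
    return write_all(fd, buf, npages * PAGE_SZ);
}
```

If the file descriptor is opened with O_SYNC, each write() call waits for its I/O to complete, so the per-page variant waits once per page while the coalesced variant typically waits once per contiguous run.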
The WAL writer currently writes one page per write(). This is inefficient when wal_sync_method is 'open_sync', because the writer waits for I/O completion on each write(). The multipage writer reduces the number of syscalls and improves I/O throughput.

'open_direct' uses O_DIRECT instead of O_SYNC. O_DIRECT implies synchronous writing, so it may behave much like open_sync, but it may also reduce memcpy() calls and save the OS's disk cache memory.

I benchmarked this patch with pgbench. It works well and improved tps by more than 50% on my machine (from about 20.6 to about 33.8 tps; see the result below). WAL seems to be a bottleneck on machines with slow disks.

This patch has not yet been tested enough. I would like it to be reviewed thoroughly and considered for inclusion in PostgreSQL.

There are still many TODOs:
  * Is this logic really correct?
  - O_DIRECT_BUFFER_ALIGN should be determined at run time, not compile time.
  - Consider using writev() instead of write(). Buffers are noncontiguous when the WAL ring buffer wraps around.
  - If wal_sync_method is not open_direct, XLOG_EXTRA_BUFFERS can be 0.

Sincerely,
ITAGAKI Takahiro

-- pgbench result --
$ ./pgbench -s 100 -c 50 -t 400
  - 8.0.0 default + fsync:
      tps = 20.630632 (including connections establishing)
      tps = 20.636768 (excluding connections establishing)
  - multipage-writer + open_direct:
      tps = 33.761917 (including connections establishing)
      tps = 33.778320 (excluding connections establishing)

Environment:
  OS     : Linux kernel 2.6.9
  CPU    : Pentium 4 3GHz
  disk   : ATA 5400rpm (Data and WAL are placed on the same partition.)
  memory : 1GB
  config : shared_buffers=10000, wal_buffers=256, XLOG_SEG_SIZE=256MB, checkpoint_segments=4

---
ITAGAKI Takahiro <[EMAIL PROTECTED]>
NTT Cyber Space Laboratories
Nippon Telegraph and Telephone Corporation.
xlog.diff
Description: Binary data
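
As a rough illustration of the writev() TODO item (not part of xlog.diff): when the pages to flush wrap around the end of the WAL ring buffer, they form two noncontiguous runs, which a single writev() can still submit in one syscall. PAGE_SZ, NBUFFERS and flush_ring are hypothetical names used only for this sketch. It does not use O_DIRECT and, for brevity, does not retry short writes.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define PAGE_SZ  8192           /* assumed WAL page size, for illustration */
#define NBUFFERS 4              /* hypothetical number of pages in the ring */

/*
 * Flush npages pages from the ring, beginning at slot `start` and
 * wrapping to slot 0 if needed.  The run before the wrap and the run
 * after it become two iovec entries of one writev() call.
 * Returns the byte count reported by writev(), or -1 on error.
 */
static ssize_t
flush_ring(int fd, char *ring, size_t start, size_t npages)
{
    struct iovec iov[2];
    int          iovcnt = 1;
    size_t       first;

    assert(ring != NULL);
    assert(start < NBUFFERS && npages >= 1 && npages <= NBUFFERS);

    /* Pages available before the end of the ring. */
    first = NBUFFERS - start;
    if (first > npages)
        first = npages;

    iov[0].iov_base = ring + start * PAGE_SZ;
    iov[0].iov_len = first * PAGE_SZ;

    if (first < npages)
    {
        /* Remaining pages live at the beginning of the ring. */
        iov[1].iov_base = ring;
        iov[1].iov_len = (npages - first) * PAGE_SZ;
        iovcnt = 2;
    }

    return writev(fd, iov, iovcnt);
}
```

For example, with NBUFFERS = 4, flushing 3 pages starting at slot 2 writes slots 2, 3 and 0, in that order, with one syscall instead of two.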