Hi all,
O_DIRECT for WAL writes was discussed at
http://archives.postgresql.org/pgsql-patches/2005-06/msg00064.php
but some items still need to be discussed, so I would like to
re-post it to HACKERS.


Bruce Momjian <pgman@candle.pha.pa.us> wrote:

> I think the conclusion from the discussion is that O_DIRECT is in
> addition to the sync method, rather than in place of it, because
> O_DIRECT doesn't have the same media write guarantees as fsync().  Would
> you update the patch to do and see if there is a performance win?

I tested two combinations,
  - fsync_direct: O_DIRECT+fsync()
  - open_direct: O_DIRECT+O_SYNC
and compared them with O_DIRECT alone on my Linux machine.
The pgbench results still show a performance win:

scale| DBsize | open_sync | fsync=false  | O_DIRECT only| fsync_direct | open_direct
-----+--------+-----------+--------------+--------------+--------------+--------------
  10 |  150MB | 252.6 tps | 263.5(+ 4.3%)| 253.4(+ 0.3%)| 253.6(+ 0.4%)| 253.3(+ 0.3%)
 100 |  1.5GB | 102.7 tps | 117.8(+14.7%)| 147.6(+43.7%)| 148.9(+45.0%)| 150.8(+46.8%)
    60 runs of pgbench -c 10 -t 1000
    on one Pentium4, 1GB mem, 2 ATA disks, Linux 2.6.8

O_DIRECT only, fsync_direct and open_direct perform about the same.
There was a win of about 44% to 47% over open_sync at scale=100, but no
win at scale=10, which is a fully in-memory benchmark.

The following items still need to be discussed:
- Are their names appropriate?
    Simplify to 'direct'?
- Are both fsync_direct and open_direct necessary?
    MySQL seems to use only the O_DIRECT+fsync() combination.
- Is it ok to set the dio buffer alignment to BLCKSZ?
    This is a simple way to choose an alignment that works in many
    environments. If it is not enough, BLCKSZ itself would also be a
    problem for direct io.
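
To make the two combinations and the alignment question concrete, here is
a minimal sketch of what they amount to at the system call level. It is
not the code from the patch. SKETCH_BLCKSZ, alloc_aligned() and
write_block_direct() are names made up for this example, 8192 is only
assumed as the usual BLCKSZ default, and the fallback to buffered I/O when
the filesystem rejects O_DIRECT is my own assumption about how one might
handle that case.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* no O_DIRECT here: sketch degrades to buffered I/O */
#endif

#define SKETCH_BLCKSZ 8192      /* assumption: the usual BLCKSZ default */

/* Allocate a zeroed buffer whose address is a multiple of SKETCH_BLCKSZ. */
static void *
alloc_aligned(size_t len)
{
    void *buf = NULL;

    if (posix_memalign(&buf, SKETCH_BLCKSZ, len) != 0)
        return NULL;
    memset(buf, 0, len);
    return buf;
}

/*
 * Write one aligned block at offset 0 and make it durable.
 *   use_osync = 0: O_DIRECT + fsync()  ("fsync_direct")
 *   use_osync = 1: O_DIRECT | O_SYNC   ("open_direct")
 * Returns 0 on success, -1 on failure.
 */
static int
write_block_direct(const char *path, const void *buf, int use_osync)
{
    int     flags = O_WRONLY | O_CREAT | (use_osync ? O_SYNC : 0);
    int     fd;
    int     rc;
    ssize_t n;

    /* direct I/O requires an aligned user buffer */
    assert((uintptr_t) buf % SKETCH_BLCKSZ == 0);

    fd = open(path, flags | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, flags, 0600);   /* filesystem rejected O_DIRECT */
    if (fd < 0)
        return -1;

    n = write(fd, buf, SKETCH_BLCKSZ);
    rc = (n == SKETCH_BLCKSZ) ? 0 : -1;

    /* fsync_direct still needs an explicit flush; open_direct does not */
    if (rc == 0 && !use_osync && fsync(fd) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

The only difference between the two methods here is whether durability
comes from O_SYNC at open() time or from an fsync() after the write. The
buffer alignment requirement is the same in both cases.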



BTW, IMHO the major benefit of direct io is saving memory. O_DIRECT gives
a hint that the OS should not cache WAL files. Without direct io, the OS
might make an effort to cache WAL files, which will never be used, and
might discard cached data file pages to make room for them.

---
ITAGAKI Takahiro
NTT Cyber Space Laboratories


