[HACKERS] fsync reliability

Simon Riggs Thu, 21 Apr 2011 01:26:46 -0700

Daniel Farina points out to me that the Linux man page for fsync() says
"Calling fsync() does not necessarily ensure that the entry in the
directory containing the file has also reached disk.  For that an
explicit fsync() on a file descriptor for the directory is also needed."
http://www.kernel.org/doc/man-pages/online/pages/man2/fsync.2.html


That caveat does not appear in the corresponding Open Group specification:
http://pubs.opengroup.org/onlinepubs/007908799/xsh/fsync.html

This point appears to have been discussed before:
http://postgresql.1045698.n5.nabble.com/ALTER-DATABASE-SET-TABLESPACE-vs-crash-safety-td1995703.html

Tom said
"We don't try to "fsync the directory" after a normal table create for
instance"

which is fine because we don't need to. In the event of a crash a
missing table would be recreated during crash recovery.

However, that raises the question of what happens with WAL. At present,
we do nothing to ensure that "the entry in the directory containing
the file has also reached disk".
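
For illustration only, here is a minimal sketch of what the man page is
asking for: fsync the file, then fsync a descriptor opened on its
directory. This is not PostgreSQL source; fsync_directory() and
create_file_durably() are hypothetical helper names, and the sketch
assumes a platform (such as Linux) where a directory can be opened
read-only and passed to fsync().

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush a directory's entries by calling fsync()
 * on a descriptor opened on the directory itself.
 * Returns 0 on success, -1 on failure.
 */
static int
fsync_directory(const char *dirpath)
{
    int fd = open(dirpath, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * Hypothetical helper: create a file, write its contents, fsync the
 * file, and then fsync the containing directory so that the new
 * directory entry is on disk as well.
 * Returns 0 on success, -1 on failure.
 */
static int
create_file_durably(const char *dirpath, const char *name,
                    const char *data, size_t len)
{
    char path[1024];
    int fd;
    int n = snprintf(path, sizeof(path), "%s/%s", dirpath, name);

    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Loop because write() may write fewer bytes than requested. */
    while (len > 0)
    {
        ssize_t w = write(fd, data, len);

        if (w <= 0)
        {
            close(fd);
            return -1;
        }
        data += w;
        len -= (size_t) w;
    }

    /* Step 1: the file's own data and metadata. */
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* Step 2: the entry in the directory containing the file. */
    return fsync_directory(dirpath);
}
```

Without step 2, the file contents may be on disk while the name that
points at them is not, which is the case the man page warns about.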

ISTM that we can easily do this, since we preallocate WAL files during
RemoveOldXlogFiles() and rarely extend the number of files.
So it should be straightforward to fsync the pg_xlog directory at the
end of RemoveOldXlogFiles(), which is mostly performed by the bgwriter
anyway.
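
As a rough sketch of that idea (not the real RemoveOldXlogFiles() code):
make all the directory-entry changes first, then issue one fsync() on
the directory at the end. The helper name rename_batch_and_sync_dir()
and its arguments are assumptions made up for this example, and it
relies on renameat() being available.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL source: rename n files inside a
 * single directory, then fsync the directory once so that all of the
 * changed entries are flushed together.
 * Returns 0 if every rename and the final fsync succeeded, else -1.
 */
static int
rename_batch_and_sync_dir(const char *dirpath,
                          const char *const *old_names,
                          const char *const *new_names,
                          int n)
{
    int dfd = open(dirpath, O_RDONLY);
    int rc = 0;
    int i;

    if (dfd < 0)
        return -1;

    for (i = 0; i < n; i++)
    {
        /* Names are resolved relative to the open directory. */
        if (renameat(dfd, old_names[i], dfd, new_names[i]) != 0)
        {
            rc = -1;
            break;
        }
    }

    /*
     * One fsync for the whole batch.  It is issued even after a failed
     * rename so that the renames that did succeed are still flushed.
     */
    if (fsync(dfd) != 0)
        rc = -1;

    close(dfd);
    return rc;
}
```

The cost is a single extra fsync per batch rather than one per file,
which is why doing it at the end of the cleanup pass looks cheap.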

It was also noted that "we've always expected the filesystem to take
care of its own metadata", an assumption that isn't actually stated
anywhere in the docs, AFAIK.

Perhaps this is an irrelevant problem these days, but would it hurt to fix?

Happy to do the patch if we agree.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
