Re: [HACKERS] fsync reliability

Daniel Farina Sun, 24 Apr 2011 18:15:44 -0700

On Thu, Apr 21, 2011 at 1:26 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> Daniel Farina points out to me that the Linux man page for fsync() says
> "Calling fsync() does not necessarily ensure that the entry in the
> directory containing the file has also reached disk.  For that an
> explicit fsync() on a file descriptor for the directory is also
> needed."
> http://www.kernel.org/doc/man-pages/online/pages/man2/fsync.2.html


I'd also like to point out that even on ext(2|3) there is a special
mount option, 'dirsync', and a matching directory attribute (see
'chattr'). It exists mostly for the benefit of the authors of MTAs,
which perform a lot of metadata manipulation, and it makes all
directory metadata changes synchronous to get around non-durable
metadata operations. Even if you use fsync(), a crash between the
rename() and the fsync() will leave you in either the pre-move or the
post-move state: the rename is atomic but not durable. Synchronous
directory modification ensures that the return of rename() coincides
with the durability of the rename itself, or so I would think.
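
To make the window concrete, here is a minimal sketch of the
rename-then-fsync-the-directory pattern that the man page describes.
It is in Python only for brevity; the helper name durable_rename is
hypothetical, and the sketch assumes a Linux-like system where a
directory can be opened read-only and passed to fsync(). It is not
what postgres does, just an illustration of where the crash window
sits.

```python
import os


def durable_rename(src, dst):
    """Rename src to dst, then fsync the directories involved.

    Hypothetical helper for illustration; assumes directories can be
    opened with O_RDONLY and fsync()ed, as on Linux.
    """
    # Flush the file's own data first, so the renamed entry never
    # refers to content that has not reached disk.
    fd = os.open(src, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

    # Atomic, but not yet durable: a crash after this call and before
    # the directory fsync below may leave either the old or new name.
    os.rename(src, dst)

    # fsync every directory whose entries changed (one or two).
    dirs = {os.path.dirname(os.path.abspath(p)) for p in (src, dst)}
    for d in sorted(dirs):
        dfd = os.open(d, os.O_RDONLY)
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)
```

With 'dirsync' (or the directory attribute) in effect, the explicit
directory fsync at the end should in principle be redundant, since
rename() itself would not return until the change is on disk; without
it, the gap between os.rename() and the final fsync is exactly the
non-durable window described above.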

I only found this while researching how to perform a
two-phase commit between postgres and the file system and reading the
kernel source.  I admit, it's a dusty and obscure corner, but it still
seems in use by said MTAs.

Would a reading and exploration of the kernel code at hand perhaps
help resolve this discussion, one way or another?

-- 
fdr

