Re: [HACKERS] fsync reliability

Greg Stark Fri, 22 Apr 2011 07:32:25 -0700

On Thu, Apr 21, 2011 at 4:55 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> The traditional standard is that the filesystem is supposed to take
> care of its own metadata, and even Linux filesystems have pretty much
> figured that out.  I don't really see a need for us to be nursemaiding
> the filesystem.  At most there's a documentation issue here, ie, we
> ought to be more explicit about which filesystems and which mount
> options we recommend.


To be fair, the traditional standard was that filesystem metadata was
written synchronously. That is, the creat/rename/unlink calls didn't
return until the metadata had been written. That was never brilliant but
it was simple. It's unclear to me whether that API was chosen because
anything else was hard to implement, or whether it was implemented that
way because defining the API that way was deemed a good idea. I suspect
it was the former.

As APIs go, buffering metadata operations and reusing fsync on the
directory to block until they're written seems as sane as anything
else. It's a bit of a pain for us to keep track of which files have
been created or deleted in a directory and fsync the directory on
checkpoint, but that's only because we've already gone to special
efforts to keep track of which data is dirty and have done nothing to
keep track of which directories have been dirtied.
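
To make the bookkeeping above concrete, here is a rough sketch in C of
what "remember which directories were dirtied, then fsync them at
checkpoint" could look like. This is not PostgreSQL code: the names
mark_dir_dirty, fsync_dir and sync_dirty_dirs and the fixed-size table
are hypothetical, invented for illustration. The only real calls are the
POSIX open/fsync/close, and it assumes a platform where a directory can
be opened O_RDONLY and passed to fsync (true on Linux, not guaranteed
everywhere).

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_DIRTY_DIRS 64      /* arbitrary limit for this sketch */
#define MAX_PATH_LEN   1024

/* Hypothetical registry of directories whose entries changed
 * (create/rename/unlink) since the last checkpoint. */
static char dirty_dirs[MAX_DIRTY_DIRS][MAX_PATH_LEN];
static int  n_dirty_dirs = 0;

/* Record that a file was created, renamed or deleted in `dir`.
 * Returns 0 on success, -1 if the table is full or the path too long. */
int mark_dir_dirty(const char *dir)
{
    for (int i = 0; i < n_dirty_dirs; i++)
        if (strcmp(dirty_dirs[i], dir) == 0)
            return 0;                       /* already tracked */

    if (n_dirty_dirs == MAX_DIRTY_DIRS || strlen(dir) >= MAX_PATH_LEN)
        return -1;

    strcpy(dirty_dirs[n_dirty_dirs++], dir);
    return 0;
}

/* Block until the directory's buffered metadata is on disk.
 * Returns 0 on success, -1 on failure (errno is set). */
int fsync_dir(const char *dir)
{
    int fd = open(dir, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = fsync(fd);
    close(fd);
    return rc;
}

/* At checkpoint: fsync every directory dirtied since the last one.
 * Returns the number of directories that failed to sync; the table
 * is cleared only when all of them succeeded. */
int sync_dirty_dirs(void)
{
    int failures = 0;

    for (int i = 0; i < n_dirty_dirs; i++) {
        if (fsync_dir(dirty_dirs[i]) != 0) {
            perror(dirty_dirs[i]);
            failures++;
        }
    }
    if (failures == 0)
        n_dirty_dirs = 0;
    return failures;
}
```

The idea would be to call mark_dir_dirty() wherever a file is created,
renamed or unlinked, and sync_dirty_dirs() once per checkpoint after the
data files themselves have been fsynced. A real implementation would
want a proper hash table and would have to decide what a failed
directory fsync means for the checkpoint; the sketch sidesteps both.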


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
