On Thu, Apr 21, 2011 at 4:55 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> The traditional standard is that the filesystem is supposed to take
> care of its own metadata, and even Linux filesystems have pretty much
> figured that out. I don't really see a need for us to be nursemaiding
> the filesystem. At most there's a documentation issue here, ie, we
> ought to be more explicit about which filesystems and which mount
> options we recommend.
To be fair, the traditional standard was that filesystem metadata was written synchronously. That is, the creat/rename/unlink calls didn't return until the metadata change had been written to disk. That was never brilliant, but it was simple. It's unclear to me whether that API was chosen because implementing anything else was hard, or because it was deemed a good idea to define the API that way. I suspect the former.

As APIs go, having metadata operations be buffered and reusing fsync on the directory to block until they're written seems as sane as anything else. It's a bit of a pain for us to keep track of which files have been created or deleted in a directory and to fsync the directory at checkpoint, but that's only because we've already gone to special efforts to track which data is dirty while doing nothing to track which directories have been dirtied.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
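A minimal C sketch of the bookkeeping described above, assuming POSIX semantics where metadata changes (creat/rename/unlink) are buffered and an fsync() on a file descriptor for the containing directory blocks until they reach disk. The names mark_dir_dirty, create_file and checkpoint_sync_dirs, and the fixed-size dirty-directory list, are hypothetical illustrations, not PostgreSQL's actual code.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical bookkeeping: directories whose entries changed (file
 * created, renamed, or unlinked) since the last checkpoint. */
#define MAX_DIRTY_DIRS 64
static char *dirty_dirs[MAX_DIRTY_DIRS];
static int n_dirty_dirs = 0;

/* Remember that `dir` needs an fsync; duplicate paths are ignored. */
static int mark_dir_dirty(const char *dir)
{
    for (int i = 0; i < n_dirty_dirs; i++)
        if (strcmp(dirty_dirs[i], dir) == 0)
            return 0;
    if (n_dirty_dirs == MAX_DIRTY_DIRS)
        return -1;
    char *copy = strdup(dir);
    if (copy == NULL)
        return -1;
    dirty_dirs[n_dirty_dirs++] = copy;
    return 0;
}

/* Create dir/name, write a small payload, and fsync the file's contents.
 * The new directory entry is NOT yet durable: the directory is only
 * recorded as dirty so that a later checkpoint can fsync it. */
static int create_file(const char *dir, const char *name)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t) n >= sizeof path)
        return -1;
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    const char payload[] = "hello\n";
    if (write(fd, payload, sizeof payload - 1) != (ssize_t) (sizeof payload - 1)
        || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;
    return mark_dir_dirty(dir);
}

/* At checkpoint: fsync each dirty directory so the metadata operations
 * in it reach disk. Returns the number of directories synced, or -1 on
 * the first failure (the failing directory stays queued for a retry). */
static int checkpoint_sync_dirs(void)
{
    int synced = 0;
    while (n_dirty_dirs > 0) {
        char *dir = dirty_dirs[n_dirty_dirs - 1];
        int fd = open(dir, O_RDONLY);
        if (fd < 0)
            return -1;
        int rc = fsync(fd);
        close(fd);
        if (rc != 0)
            return -1;
        free(dir);
        n_dirty_dirs--;
        synced++;
    }
    return synced;
}
```

Usage: call create_file() as files are created (a rename or unlink wrapper would likewise call mark_dir_dirty() on the affected directory), then call checkpoint_sync_dirs() once per checkpoint. Two files created in the same directory cost only one directory fsync, which is the point of deferring the work to checkpoint rather than syncing after every metadata change.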