Daniel Farina points out to me that the Linux man page for fsync() says "Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed." http://www.kernel.org/doc/man-pages/online/pages/man2/fsync.2.html
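To make the man page's point concrete, here is a minimal C sketch of what "an explicit fsync() on a file descriptor for the directory" means in practice: write the file, fsync() the file, then open the containing directory and fsync() that descriptor too. The helpers fsync_dir() and write_file_durably() are hypothetical illustrations, not existing PostgreSQL functions. The sketch assumes a POSIX system where a directory opened O_RDONLY may be passed to fsync(). That works on Linux, but some platforms may reject it, so real code might need to tolerate such errors.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: fsync the directory at dirpath so that entries
 * created or renamed in it are durable. Returns 0 on success, -1 on error. */
static int fsync_dir(const char *dirpath)
{
    /* A read-only descriptor is enough to fsync a directory on Linux. */
    int dfd = open(dirpath, O_RDONLY);
    if (dfd < 0)
        return -1;
    if (fsync(dfd) != 0) {
        int saved = errno;
        close(dfd);
        errno = saved;
        return -1;
    }
    return close(dfd);
}

/* Hypothetical helper: create dirpath/name containing len bytes of buf,
 * then make both the file contents and its directory entry durable. */
static int write_file_durably(const char *dirpath, const char *name,
                              const void *buf, size_t len)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", dirpath, name);
    if (n < 0 || (size_t) n >= sizeof path) {
        errno = ENAMETOOLONG;
        return -1;
    }

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Write everything, retrying on short writes and EINTR. */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += w;
        left -= (size_t) w;
    }

    /* Step 1: flush the file's own data and inode. */
    if (fsync(fd) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* Step 2: per the Linux man page, the directory entry needs its own fsync. */
    return fsync_dir(dirpath);
}
```

The same fsync_dir() call on its own is what the proposal for pg_xlog would amount to: one directory fsync after a batch of segment files has been created or renamed, not one per file.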
That phrase does not appear in the Open Group specification: http://pubs.opengroup.org/onlinepubs/007908799/xsh/fsync.html

This point appears to have been discussed before: http://postgresql.1045698.n5.nabble.com/ALTER-DATABASE-SET-TABLESPACE-vs-crash-safety-td1995703.html

Tom said "We don't try to 'fsync the directory' after a normal table create for instance", which is fine because we don't need to: in the event of a crash, a missing table would be recreated during crash recovery.

However, that raises the question of what happens with WAL. At present, we do nothing to ensure that "the entry in the directory containing the file has also reached disk".

ISTM that we can easily do this, since we preallocate WAL files during RemoveOldXlogFiles() and rarely extend the number of files. So it should be straightforward to fsync the pg_xlog directory at the end of RemoveOldXlogFiles(), which is mostly performed by the bgwriter anyway.

It was also noted that "we've always expected the filesystem to take care of its own metadata", which isn't actually stated anywhere in the docs, AFAIK.

Perhaps this is an irrelevant problem these days, but would it hurt to fix? I'm happy to do the patch if we agree.

-- 
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services