On Wed, Jan 22, 2014 at 10:48 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> Yes, individual operations should be, but you cannot be sure whether a
> rename()/unlink() will survive a crash until the directory is
> fsync()ed. So, what is one going to do if the unlink succeeded, but the
> fsync didn't?
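Here is a minimal POSIX sketch of the sequence the quoted text describes. It assumes a Linux-like system where fsync() on a directory file descriptor flushes that directory's entries. The helper name `unlink_and_sync_dir()` is hypothetical and is not PostgreSQL's code. The sketch shows why the question comes up at all: once unlink() has succeeded it cannot be undone, so a failed directory fsync() leaves a removal that is visible now but may not survive an OS crash.

```c
#define _XOPEN_SOURCE 700   /* expose fsync() and mkdtemp() in strict modes */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: remove `path`, then fsync() its parent directory
 * `dir` so that the removal of the directory entry survives an OS crash.
 * Returns 0 on success and -1 on failure, with errno from the failing call. */
static int unlink_and_sync_dir(const char *path, const char *dir)
{
    if (unlink(path) != 0)
        return -1;              /* nothing was removed */

    int fd = open(dir, O_RDONLY);
    if (fd < 0)
        return -1;              /* removed, but durability is unknown */

    int rc = fsync(fd);
    int saved_errno = errno;    /* close() may overwrite errno */
    close(fd);
    if (rc != 0) {
        errno = saved_errno;
        return -1;              /* removed now, yet the entry may reappear
                                   after a crash: the case in question */
    }
    return 0;
}
```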
Well, apparently, one is going to PANIC and reinitialize the system. I presume that upon reinitialization we'll decide that the slot is gone, and thus won't recreate it in shared memory. Of course, if the entire system suffers a hard power failure after that and before the directory is successfully fsync'd, then the slot could reappear on the next startup. That is exactly what would happen anyway if we removed the slot from shared memory after doing the unlink and the system then suffered a hard power failure before the directory contents made it to disk. The only difference is that we also panicked.

In the case of shared buffers, the way we handle fsync failures is by not allowing the system to checkpoint until all of the fsyncs succeed. If there's an OS-level reset before that happens, WAL replay will perform the same buffer modifications over again, and the next checkpoint will again try to flush them to disk and will not complete unless it does. That forms a closed system where we never advance the redo pointer over the covering WAL record until the changes it covers are on the disk. But I don't think this code has any similar interlock; if it does, I missed it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
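The closed loop described above for shared buffers can be illustrated with a toy model in C. Everything in it (`ToyBuffer`, `ToyWal`, `toy_checkpoint()`, and the `flush_ok` flag standing in for an fsync() result) is hypothetical and is not PostgreSQL's checkpointer. It only shows the invariant: the redo pointer advances only after every pending flush has succeeded, so a crash before that point replays the same changes again.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy dirty buffer; flush_ok simulates whether its fsync() would succeed. */
typedef struct {
    bool dirty;
    bool flush_ok;
} ToyBuffer;

/* Toy WAL state: replay after a crash starts at redo_ptr. */
typedef struct {
    uint64_t redo_ptr;
    uint64_t insert_lsn;   /* end of WAL written so far */
} ToyWal;

/* Try to checkpoint. The redo pointer moves to the LSN chosen at the start
 * only if every dirty buffer was flushed; a single failure leaves it where
 * it was, so WAL replay still covers the unflushed change. */
static bool toy_checkpoint(ToyWal *wal, ToyBuffer *bufs, size_t n)
{
    uint64_t new_redo = wal->insert_lsn;  /* chosen before flushing */
    bool all_ok = true;

    for (size_t i = 0; i < n; i++) {
        if (!bufs[i].dirty)
            continue;
        if (bufs[i].flush_ok)
            bufs[i].dirty = false;   /* now safely on disk */
        else
            all_ok = false;          /* stays dirty; retried next checkpoint */
    }
    if (all_ok)
        wal->redo_ptr = new_redo;
    return all_ok;
}
```

The slot-removal path in question has no equivalent of the `all_ok` check: nothing keeps replay from skipping past the point where the directory fsync() failed.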