On Thu, Dec 22, 2016 at 7:00 AM, Michael Paquier <michael.paqu...@gmail.com> wrote: > Hi all, > > 2PC files are created using RecreateTwoPhaseFile() in two places currently: > - at replay on a XLOG_XACT_PREPARE record. > - At checkpoint with CheckPointTwoPhase(). > > Now RecreateTwoPhaseFile() is careful to call pg_fsync() to be sure > that the 2PC files find their way into disk. But one piece is missing: > the parent directory pg_twophase is not fsync'd. At replay this is > more sensitive if there is a PREPARE record followed by a checkpoint > record. If there is a power failure after the checkpoint completes > there is a risk to lose 2PC status files here. > > It seems to me that we really should have CheckPointTwoPhase() call > fsync() on pg_twophase to be sure that no files are lost here. There > is no point to do this operation in RecreateTwoPhaseFile() as if there > are many 2PC transactions to replay performance would be impacted, and > we don't care about the durability of those files until a checkpoint > moves the redo pointer. I have drafted the patch attached to address > this issue.
I agree with this. If no prepared transactions were required to be fsynced CheckPointTwoPhase(), do we want to still fsync the directory? Probably not. May be you want to call fsync_fname(TWOPHASE_DIR, true); if serialized_xacts > 0. The comment just states what has been done earlier in the function, which is documented in the prologue as well. /* * Make sure that the content created is persistent on disk to prevent * data loss in case of an OS crash or a power failure. Each 2PC file * has been already created and flushed to disk individually by * RecreateTwoPhaseFile() using the list of GXACTs available for * normal processing as well as at recovery when replaying individually * each XLOG_XACT_PREPARE record. */ Instead, you may want to just say "If we have flushed any 2PC files, flush the metadata in the pg_twophase directory." Although, I thought that a simple case of creating a persistent table which requires creating a file also would need to flush the directory. I tried to locate the corresponding code to see if it also uses fsync_fname(). I couldn't locate the code. I have looked at mdcreate(), mdpreckpt(), mdpostckpt(). But may be that's not relevant here. > > I am adding that as well to the next CF for consideration. > > Thoughts? > -- > Michael > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers