On Sun, May 01, 2022 at 10:08:53PM +0900, Michael Paquier wrote:
> At the end, switching directly from durable_rename_excl() to
> durable_rename() should be fine for the WAL segment initialization,
> but we could do things a bit more carefully by adding a check on the
> file existence before calling durable_rename() and issue a elog(LOG)
> if a file is found, giving a mean for the WAL recycling to give up
> peacefully as it does now.  Per my analysis, the TLI history file
> created at the end of recovery ought to issue an elog(ERROR).

My only concern with this approach is that it inevitably introduces a race
condition.  In most cases, the file existence check will prevent
overwrites, but it might not always.  Furthermore, we believe that such
overwrites either 1) should not happen (e.g., WAL recycling) or 2) won't
cause problems if they happen (e.g., when the WAL receiver writes the TLI
history file).  Also, these races will be difficult to test, so we won't
know what breaks when they occur.

My instinct is to just let the overwrites happen.  That way, we are more
likely to catch breakage in tests, and we'll have one less race condition
to worry about.  I don't mind asserting that the file doesn't exist when we
don't expect it to, as that might help catch potential problems in
development without affecting behavior in production.  If we do want to
add file existence checks, I think we'd better add a comment about the
potential for race conditions.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


Reply via email to