On Sun, May 01, 2022 at 10:08:53PM +0900, Michael Paquier wrote: > At the end, switching directly from durable_rename_excl() to > durable_rename() should be fine for the WAL segment initialization, > but we could do things a bit more carefully by adding a check on the > file existence before calling durable_rename() and issue a elog(LOG) > if a file is found, giving a mean for the WAL recycling to give up > peacefully as it does now. Per my analysis, the TLI history file > created at the end of recovery ought to issue an elog(ERROR).
My only concern with this approach is that it inevitably introduces a race condition. In most cases, the file existence check will prevent overwrites, but it might not always. Furthermore, we believe that such overwrites either 1) should not happen (e.g., WAL recycling) or 2) won't cause problems if they happen (e.g., when the WAL receiver writes the TLI history file). Also, these races will be difficult to test, so we won't know what breaks when they occur. My instinct is to just let the overwrites happen. That way, we are more likely to catch breakage in tests, and we'll have one less race condition to worry about. I don't mind asserting that the file doesn't exist when we don't expect it to, as that might help catch potential problems in development without affecting behavior in production. If we do want to add file existence checks, I think we'd better add a comment about the potential for race conditions. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com