On Mon, Oct 3, 2022 at 7:29 PM Michael Paquier <mich...@paquier.xyz> wrote: > On Mon, Oct 03, 2022 at 04:03:12PM +1300, Thomas Munro wrote: > > So I think that setting is_lnk = false is good enough here. Do > > you see a hole in it? > > I cannot think on one, on top of my head. Thanks for the > explanation.
Some things I learned while trying to understand how I managed to introduce that bug despite reading and testing: * the code in pgunlink() has comments saying that its retry loop is to handle sharing violations * in fact that retry loop also comes into play for STATUS_DELETE_PENDING * that case is fairly well hidden, because to reach it you need to unlink(pathname) twice! the second call will wait up to 10 seconds for handles to close and then report ENOENT, allowing rmdir(parent) to succeed * I guess this code is relying on that double-unlink to block until the directory is empty? * you wouldn't notice any of this if you were testing on Windows 10 on a desktop/laptop/VM, because it now uses POSIX semantics for unlink on NTFS, so the first unlink truly (synchronously) unlinks (no more STATUS_DELETE_PENDING) * Server 2019, as used on CI, still uses the traditional NT semantics (unlink is asynchronous, when all handles closes) * the fix I proposed has the right effect (I will follow up with tests to demonstrate) I'll post my tests for this and a bunch more things I figured out shortly in a new Windows-filesystem-semantics-omnibus thread.