Hello Kuroda-san,

25.11.2023 18:19, Hayato Kuroda (Fujitsu) wrote:
> Thanks for attaching a program. This helps us to understand the issue.
> I wanted to confirm your env - this failure occurred on Windows Server
> XXXX, right?

I see that behavior on:
Windows 10 Version 1607 (OS Build 14393.0)
Windows Server 2016 Version 1607 (OS Build 14393.0)
Windows Server 2019 Version 1809 (OS Build 17763.1)

But it's not reproduced on:
Windows 10 Version 1809 (OS Build 17763.1) (triple-checked)
Windows Server 2019 Version 1809 (OS Build 17763.592)
Windows 10 Version 22H2 (OS Build 19045.3693)
Windows 11 Version 21H2 (OS Build 22000.613)

So it looks like the failure depends not on the Windows edition, but
rather on its build. For Windows Server 2019 the "good" build is
somewhere between 17763.1 and 17763.592, but for Windows 10 it's between
14393.0 and 17763.1.
(Maybe there was some change related to FILE_DISPOSITION_POSIX_SEMANTICS/
FILE_DISPOSITION_ON_CLOSE implementation; I don't know where to find
information about that change.)

It would also be interesting to know the full OS version/build on drongo
and fairywren.

>> That is, my idea was to try removing a file through renaming it as a fast
>> path (thus avoiding that troublesome state DELETE PENDING), and if that
>> fails, to perform removal as before. Maybe the whole function could be
>> simplified, but I'm not sure about special cases yet.
> I felt that your result showed pgrename() would be delayed more rarely than
> unlink().
> That's why a file with the original name would not exist when a subsequent
> open() was called.

I think that's because unlink() is performed asynchronously on those old
Windows versions, but rename() is always synchronous.
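
To illustrate the idea (this is only a rough sketch, not the PoC patch
itself): the helper name remove_via_rename() and the ".deleted.<pid>"
suffix are made up for this example, and plain rename()/unlink() stand in
for pgrename() and the existing removal logic.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: remove "path" by renaming it out of the way first,
 * so that the original name is free immediately even if the actual
 * deletion is delayed.  Falls back to a plain unlink() if the rename
 * fast path fails.  Returns 0 on success, -1 on failure (errno is set).
 */
static int
remove_via_rename(const char *path)
{
	char		tmppath[1024];
	int			n;

	/* Assumed naming scheme for the temporary name; purely illustrative */
	n = snprintf(tmppath, sizeof(tmppath), "%s.deleted.%ld",
				 path, (long) getpid());
	if (n < 0 || (size_t) n >= sizeof(tmppath))
		return unlink(path);	/* name too long: remove as before */

	if (rename(path, tmppath) == 0)
	{
		/* The original name no longer exists; now drop the renamed file */
		if (unlink(tmppath) == 0 || errno == ENOENT)
			return 0;
		return -1;				/* renamed file is left behind */
	}

	/* Fast path failed: perform removal as before */
	return unlink(path);
}
```

With this shape, a subsequent open() of the original name can't hit the
DELETE PENDING state left by unlink(), as long as the rename succeeded.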

>>> * IIUC, the important point is the latter part, which waits until the status is
>>>     changed. Based on that, can we remove a double rmtree() from
>>> cleanup_output_dirs()?
>>>     They seem to have been added for a similar motivation.
>> I couldn't yet reproduce a failure, which motivated that doubling (IIUC, it
>> was observed in [1]), with c28911750 reverted, so I need more time to
>> research that issue to answer this question.
> Yeah, in the first place, this failure seldom occurred....

I've managed to reproduce that issue (or at least a situation that
manifested similarly) with a sleep added in miscinit.c:
        ereport(IsPostmasterEnvironment ? LOG : NOTICE,
                        (errmsg("database system is shut down")));
+       pg_usleep(500000L);

With this change, I get the same warning as in [1] when running 10
002_pg_upgrade tests in parallel with a minimal olddump (on iterations
33, 46, 8). And with my PoC patch applied, I could see the same warning
as well (on iteration 6).

I believe that's because rename() can't rename a directory containing an
open file, just as unlink() can't remove it.

In the light of the above, I think that the issue in question should be
fixed in accordance with/as a supplement to [2].

[1] https://www.postgresql.org/message-id/20230131172806.GM22427%40telsasoft.com
[2] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com

Best regards,
Alexander

