New submission from Alexey Izbyshev <izbys...@ispras.ru>:

On POSIX-conforming systems, O_APPEND flag for open() must ensure that no 
intervening file modification occurs between changing the file offset and the 
write operation[1]. In effect, two processes that independently opened the same 
file with O_APPEND can't overwrite each other's data. On Windows, however, the 
Microsoft C runtime implements O_APPEND as an lseek(fd, 0, SEEK_END) followed 
by write(), which obviously doesn't provide the above guarantee. This affects 
both os.open() and the builtin open() Python functions, which rely on _wopen() 
from MSVCRT. A demo is attached.

While POSIX O_APPEND doesn't guarantee the absence of partial writes, the 
guarantee of non-overlapping writes alone is still useful in cases such as 
debug logging from multiple processes without file locking or other 
synchronization. Moreover, for local filesystems, partial writes don't really 
occur in practice (barring conditions such as ENOSPC or EIO).

Windows offers two ways to achieve non-overlapping appends:

1. WriteFile()[2] with OVERLAPPED structure with Offset and OffsetHigh set to 
-1. This is essentially per-write O_APPEND.

2. Using a file handle with FILE_APPEND_DATA access right but without 
FILE_WRITE_DATA access right.

While (1) seems easy to add to FileIO, there are some problems:

* os.write(fd) can't use it without caller's help because it has no way to know 
that the fd was opened with O_APPEND (there is no fcntl() in MSVCRT)

* write() from MSVCRT (currently used by FileIO) performs some additional 
remapping of error codes and checks after it calls WriteFile(), so we'd have to 
emulate that behavior or risk breaking compatibility.

I considered to go for (2) by reimplementing _wopen() via CreateFile(), which 
would also allow us to solve a long-standing issue of missing FILE_SHARE_DELETE 
on file handles, but hit several problems:

* the most serious one is rather silly: we need to honor the current umask to 
possibly create a read-only file, but there is no way to query it without 
changing it, which is not thread-safe. Well, actually, I did discover a way: 
_umask_s(), when called with an invalid mask, returns both EINVAL error and the 
current umask. But this behavior directly contradicts MSDN, which claims that 
_umask_s() doesn't modify its second argument on failure[3]. So I'm not willing 
to rely on this until Microsoft fixes their docs.

* os module exposes some MSVCRT-specific flags for use with os.open() (like 
O_TEMPORARY), which a reimplementation would have to support. It seems easy in 
most cases, but there is O_TEXT, which enables some obscure legacy behavior in 
MSVCRT such as removal of a trailing byte 26 (Ctrl-Z) when a file is opened 
with O_RDWR. More generally, it's unclear to me whether os.open() is explicitly 
intended to be a gateway to MSVCRT and thus support all current and future 
flags or is just expected to work similarly to MSVCRT in common cases.

So in the end I decided to let _wopen() create the initial fd as usual, but 
then fix it up via DuplicateHandle() -- see the PR.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html
[2] 
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile
[3] 
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/umask-s?view=msvc-160

----------
components: IO, Windows
files: test.py
messages: 382784
nosy: eryksun, izbyshev, paul.moore, steve.dower, tim.golden, vstinner, 
zach.ware
priority: normal
severity: normal
status: open
title: Support POSIX atomicity guarantee of O_APPEND on Windows
type: enhancement
versions: Python 3.10
Added file: https://bugs.python.org/file49661/test.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42606>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to