Eryk Sun <[email protected]> added the comment:
> Does it make the most sense for us to make .flush() also do an
> implicit .fsync() (when it's actually a file object)?
Standard I/O in the Windows C runtime supports a "c" commit mode that causes
fflush to call _commit() on the underlying fd [1]. Perhaps Python should
support a similar "c" or "s" mode that makes a flush implicitly call fsync /
_commit.
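Until something like that exists, user code can approximate a commit mode with a thin wrapper whose flush() also calls os.fsync (which maps to _commit on Windows). This is only a sketch: the names CommitOnFlush and open_committed are hypothetical and not an existing or proposed Python API.

```python
import os


class CommitOnFlush:
    """Hypothetical wrapper: flush() also commits the file to disk."""

    def __init__(self, fileobj):
        self._f = fileobj

    def write(self, data):
        return self._f.write(data)

    def flush(self):
        self._f.flush()              # drain Python's buffers to the OS
        os.fsync(self._f.fileno())   # ask the OS to commit (_commit on Windows)

    def close(self):
        try:
            if not self._f.closed:
                self.flush()
        finally:
            self._f.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()

    def __getattr__(self, name):
        # Delegate everything else (fileno, closed, name, ...) to the real file.
        return getattr(self._f, name)


def open_committed(path, mode="w", **kwargs):
    """Hypothetical helper: open() whose flush() implies fsync()."""
    return CommitOnFlush(open(path, mode, **kwargs))
```

With this, print('spam', file=f, flush=True) on a file from open_committed would commit on every flush, at the cost of a synchronous write each time.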
But you may not be in control of flushing the file if it's being written to by
a third-party library or application. Calling os.[l]stat works around the
problem, but only with NTFS. It doesn't help with FAT32 / exFAT.
FAT filesystems update the last-write time when the file object is flushed or
closed. Whether this happens depends on the FO_FILE_MODIFIED flag in the file
object or the CCB_FLAG_USER_SET_LAST_WRITE flag (from SetFileTime) in the file
object's context control block (CCB). But opening, and even flushing, a file
doesn't synchronize the context of other opens. Thus one can call os.stat
repeatedly on a file (so this is not just a scandir problem) and observe
st_size changing while st_mtime remains constant:
>>> import os
>>> filepath = 'C:/Mount/TestFat32/test/spam.txt'
>>> f = open(filepath, 'w')
>>> s = os.stat(filepath); s.st_size, s.st_mtime
(0, 1593116028.0)
>>> print('spam', file=f, flush=True)
>>> s = os.stat(filepath); s.st_size, s.st_mtime
(6, 1593116028.0)
The last-write time gets updated by closing or flushing the kernel file object
that was used to write to the file.
>>> os.fsync(f.fileno())
>>> s = os.stat(filepath); s.st_size, s.st_mtime
(6, 1593116044.0)
Another problem is stale entries for NTFS hard links, which can lead to getting
a completely incorrect stat result via os.scandir -- wrong timestamps, wrong
file size, and wrong file attributes.
An NTFS file's MFT record contains its timestamps, size, and attributes in a
$STANDARD_INFORMATION attribute. This reliable information is what os.[l]stat
and os.fstat query. But it gets duplicated in per-link $FILE_NAME attributes
that directories index. The duplicated info for a link gets synchronized to the
standard info when the link is accessed, but other links to the file do not get
updated, and their values may be completely wrong. For example (using the scan
function from my previous post):
>>> filepath1 = 'C:/Mount/TestNtfs/test/spam1.txt'
>>> filepath2 = 'C:/Mount/TestNtfs/test/spam2.txt'
>>> f = open(filepath1, 'w')
>>> os.link(filepath1, filepath2)
>>> s = scan(filepath2).stat(); s.st_size, s.st_mtime
(0, 1593116055.7695396)
>>> print('spam', file=f, flush=True)
>>> s = scan(filepath2).stat(); s.st_size, s.st_mtime
(0, 1593116055.7695396)
>>> os.fsync(f.fileno())
>>> s = scan(filepath2).stat(); s.st_size, s.st_mtime
(0, 1593116055.7695396)
>>> f.close()
>>> s = scan(filepath2).stat(); s.st_size, s.st_mtime
(0, 1593116055.7695396)
As shown, flushing or closing the file object for the "spam1.txt" link is not
reflected in the entry for the "spam2.txt" link. The directory entry for the
link is only updated when the link is accessed:
>>> f = open(filepath2)
>>> s = scan(filepath2).stat(); s.st_size, s.st_mtime
(6, 1593116062.2080283)
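The scan helper used above comes from an earlier message and is not reproduced here. As a sketch only, a stand-in with the same apparent behavior (return the os.DirEntry for a path), plus a check that compares the directory's cached data against os.stat, might look like this. Both function bodies are assumptions for illustration, and is_entry_stale is a hypothetical name.

```python
import os


def scan(path):
    """Stand-in for the helper from the earlier post (assumed behavior):
    return the os.DirEntry for `path` from its parent directory."""
    dirpath, name = os.path.split(os.path.abspath(path))
    with os.scandir(dirpath) as it:
        for entry in it:
            if os.path.normcase(entry.name) == os.path.normcase(name):
                return entry
    raise FileNotFoundError(path)


def is_entry_stale(path):
    """Return True if the directory entry disagrees with os.stat()."""
    cached = scan(path).stat()   # on Windows: data from the directory index
    actual = os.stat(path)       # queries the file itself
    return ((cached.st_size, cached.st_mtime_ns)
            != (actual.st_size, actual.st_mtime_ns))
```

In the hard-link session above, is_entry_stale(filepath2) would be expected to return True until the "spam2.txt" link is opened. On platforms where DirEntry.stat() always makes a system call, it should simply return False.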
---
[1] Linking commode.obj should enable commit mode by default. But it's broken
because __acrt_stdio_parse_mode is buggy. It initializes _stdio_mode to the
global _commode value, but then clobbers that value when setting the required
"r", "w", or "a" open mode.
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue41106>
_______________________________________