Eryk Sun <eryk...@gmail.com> added the comment:

> Does it make the most sense for us to make .flush() also do an 
> implicit .fsync() (when it's actually a file object)?

Standard I/O in the Windows C runtime supports a "c" commit mode that causes 
fflush to call _commit() on the underlying fd [1]. Perhaps Python should 
support a similar "c" or "s" mode that makes a flush implicitly call fsync / 
_commit. 
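
Until such a mode exists, the same effect can be approximated in user code. The following is only a minimal sketch; flush_and_commit is a hypothetical helper name, not an existing Python API, and it assumes the object has a real file descriptor:

```python
import os

def flush_and_commit(f):
    """Flush Python's buffers, then ask the OS to commit the file.

    Hypothetical helper: roughly what a "c" commit mode would do on
    every flush. On Windows, os.fsync calls _commit on the fd.
    """
    f.flush()               # push Python-level buffers down to the OS
    os.fsync(f.fileno())    # commit the OS-level file object to disk
```

Usage would be flush_and_commit(f) wherever the code currently calls f.flush(). It costs a disk commit per call, so it is probably not something to do in a tight loop.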

But you may not be in control of flushing the file if it's being written to by 
a third-party library or application. Calling os.[l]stat works around the 
problem, but only with NTFS. It doesn't help with FAT32 / exFAT.
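
As a rough sketch of that workaround, a scandir loop can stat each entry by path instead of relying on the cached directory data. The name scandir_fresh is made up for illustration, and per the above this should only be expected to help on NTFS:

```python
import os

def scandir_fresh(dirpath):
    """Yield (name, stat_result) pairs, querying each file directly."""
    with os.scandir(dirpath) as entries:
        for entry in entries:
            # os.lstat queries the file itself rather than the cached
            # directory-entry data that entry.stat() may return on Windows.
            yield entry.name, os.lstat(entry.path)
```

This gives up the performance benefit of scandir, since every entry costs an extra system call.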

FAT filesystems update the last-write time when the file object is flushed or 
closed. The update depends on the FO_FILE_MODIFIED flag in the file object or 
the CCB_FLAG_USER_SET_LAST_WRITE flag (from SetFileTime) in the file object's 
context control block (CCB). But opening, and even flushing, a file doesn't 
synchronize the context of other opens. Thus one can call os.stat repeatedly 
on a file (so this isn't just a scandir problem) and observe st_size changing 
while st_mtime remains constant:

    >>> import os
    >>> filepath = 'C:/Mount/TestFat32/test/spam.txt'
    >>> f = open(filepath, 'w')
    >>> s = os.stat(filepath); s.st_size, s.st_mtime
    (0, 1593116028.0)

    >>> print('spam', file=f, flush=True)
    >>> s = os.stat(filepath); s.st_size, s.st_mtime
    (6, 1593116028.0)

The last-write time gets updated by closing or flushing the kernel file object 
that was used to write to the file:

    >>> os.fsync(f.fileno())
    >>> s = os.stat(filepath); s.st_size, s.st_mtime
    (6, 1593116044.0)
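
One consequence for code that polls a file for changes: comparing st_mtime alone can miss writes like the one above. A possible mitigation, sketched here with hypothetical helper names, is to compare the size together with the timestamp. It still won't notice a rewrite that leaves both values unchanged:

```python
import os

def file_signature(path):
    """Return a (size, mtime) pair to compare between polls."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)

def has_changed(path, previous):
    """True if the size or the last-write time differs from `previous`."""
    return file_signature(path) != previous
```

A poller would keep the last signature per path and call has_changed(path, last) on each pass.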

Another problem is stale entries for NTFS hard links, which can lead to getting 
a completely incorrect stat result via os.scandir -- wrong timestamps, wrong 
file size, and wrong file attributes.

An NTFS file's MFT record contains its timestamps, size, and attributes in a 
$STANDARD_INFORMATION attribute. This reliable information is what os.[l]stat 
and os.fstat query. But it gets duplicated in per-link $FILE_NAME attributes 
that directories index. The duplicated info for a link gets synchronized to the 
standard info when the link is accessed, but other links to the file do not get 
updated, and their values may be completely wrong. For example (using the scan 
function from my previous post):

    >>> filepath1 = 'C:/Mount/TestNtfs/test/spam1.txt'
    >>> filepath2 = 'C:/Mount/TestNtfs/test/spam2.txt'
    >>> f = open(filepath1, 'w')
    >>> os.link(filepath1, filepath2)
    >>> s = scan(filepath2).stat(); s.st_size, s.st_mtime
    (0, 1593116055.7695396)

    >>> print('spam', file=f, flush=True)
    >>> s = scan(filepath2).stat(); s.st_size, s.st_mtime
    (0, 1593116055.7695396)

    >>> os.fsync(f.fileno())
    >>> s = scan(filepath2).stat(); s.st_size, s.st_mtime
    (0, 1593116055.7695396)

    >>> f.close()
    >>> s = scan(filepath2).stat(); s.st_size, s.st_mtime
    (0, 1593116055.7695396)

As shown, flushing or closing the file object for the "spam1.txt" link is not 
reflected in the entry for the "spam2.txt" link. The directory entry for the 
link is only updated when the link is accessed:

    >>> f = open(filepath2)
    >>> s = scan(filepath2).stat(); s.st_size, s.st_mtime
    (6, 1593116062.2080283)
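
For anyone reading this message on its own, here is a sketch of the pattern used above. The scan function below is a reconstruction and may differ from the one in the previous post; scan_refreshed is a hypothetical helper that opens the link first, which is what updated the directory entry in the session shown:

```python
import os

def scan(path):
    """Return the os.DirEntry for `path` by scanning its parent directory."""
    dirpath, name = os.path.split(os.path.abspath(path))
    name = os.path.normcase(name)
    with os.scandir(dirpath) as entries:
        for entry in entries:
            if os.path.normcase(entry.name) == name:
                return entry
    raise FileNotFoundError(path)

def scan_refreshed(path):
    """Open and close the link before scanning (regular files only)."""
    # Accessing the link is what synchronized its directory entry above.
    with open(path, 'rb'):
        pass
    return scan(path)
```

This needs read access to the file, and it pays for an open per entry, so calling os.stat on the path is likely the simpler choice when only the stat result is wanted.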

---

[1] Linking commode.obj should enable commit-mode by default. But it's broken 
because __acrt_stdio_parse_mode is buggy. It initializes _stdio_mode to the 
global _commode value, but then it clobbers it when setting the required "r", 
"w", or "a" open mode.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41106>
_______________________________________