On 10/19/20, Steve Dower <steve.do...@python.org> wrote:
> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
>> TLDR: In os.scandir directory entries, atime is always a copy of mtime
>> rather than the actual access time.
>
> Correction - os.stat() updates the access time to _now_, while
> os.scandir() returns the last access time without updating it.

os.stat() shouldn't affect st_atime because it doesn't access the file
data. That has me curious if it can be reproduced.
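
A quick way to try reproducing it is sketched below. The helper name
stat_changes_atime() is made up for this message; it only checks whether
repeated os.stat() calls, with no reads or writes in between, change the
reported st_atime_ns.

```python
import os
import tempfile


def stat_changes_atime(path, calls=3):
    """Return True if os.stat() calls alone change the reported st_atime_ns."""
    before = os.stat(path).st_atime_ns
    for _ in range(calls):
        os.stat(path)  # metadata query only; the file data is never read
    after = os.stat(path).st_atime_ns
    return after != before


if __name__ == "__main__":
    # Create a scratch file so the check doesn't depend on any existing path.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"data")
    try:
        print("os.stat() changed st_atime:", stat_changes_atime(f.name))
    finally:
        os.unlink(f.name)
```

If this prints True on some system, that would be worth reporting along
with the filesystem and Windows version.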

With NTFS in Windows 10, I'd expect the os.stat() st_atime to change
immediately when the file data is read or modified. With other
filesystems, it may not be updated until the kernel file object that
was used to access the file's data is closed.

Note that updating the access time in NTFS can be disabled by the
"NtfsDisableLastAccessUpdate" value in
"HKLM\System\CurrentControlSet\Control\FileSystem". The default value
in Windows 10 should be 0x80000002, which means the value is system
managed and updating the access time is enabled.
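
To check the setting on a given machine, something like the following
should work. The registry read uses the standard winreg module. The
decoding of the low two bits (bit 0 = updates disabled, bit 1 = system
managed) is my assumption based on the values that "fsutil behavior"
reports; only the 0x80000002 case is stated above. The function names
are made up for this example.

```python
import sys

KEY_PATH = r"System\CurrentControlSet\Control\FileSystem"
VALUE_NAME = "NtfsDisableLastAccessUpdate"


def describe_last_access_setting(value):
    """Decode the value into (who manages it, whether updates are enabled).

    Assumption: bit 0 set means updates are disabled and bit 1 set means
    the value is system managed. The high bit (0x80000000) is ignored.
    """
    managed = "system managed" if value & 0x2 else "user managed"
    state = "disabled" if value & 0x1 else "enabled"
    return managed, state


def read_last_access_setting():
    """Return the raw registry value, or None if it can't be read here."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only module, so import it lazily

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    except FileNotFoundError:
        return None
    return value


if __name__ == "__main__":
    raw = read_last_access_setting()
    if raw is None:
        print("setting not available on this system")
    else:
        print(hex(raw), describe_last_access_setting(raw))
```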

If it's only the access time that changes, the directory entry may be
updated with a significant granularity such as hourly or daily. For
NTFS, it's hourly. To confirm this, wait an hour from the current
access time in the directory entry; open the file; read some data; and
close the file. The access time in the directory entry should be
updated.
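
To see the lag directly, compare what the directory entry reports with
what os.stat() reports for the same path. This is only a sketch; the
function name atime_mismatches() is made up for this message. On
Windows, DirEntry.stat(follow_symlinks=False) is served from the
directory listing, while os.stat() queries the file itself.

```python
import os


def atime_mismatches(directory):
    """Return (name, scandir_atime, stat_atime) for entries that disagree."""
    results = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Value duplicated in the directory entry (on Windows).
            cached = entry.stat(follow_symlinks=False).st_atime
            # Value queried from the file itself.
            fresh = os.stat(entry.path, follow_symlinks=False).st_atime
            if cached != fresh:
                results.append((entry.name, cached, fresh))
    return results


if __name__ == "__main__":
    for name, cached, fresh in atime_mismatches("."):
        print(f"{name}: scandir={cached} stat={fresh}")
```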

For details, download the [MS-FSA] PDF [1] and look for all references
to the following sections:

    * 2.1.4.17 Algorithm for Noting That a File Has Been Modified
    * 2.1.4.19 Algorithm for Noting That a File Has Been Accessed
    * 2.1.4.18 Algorithm for Updating Duplicated Information

Also check the tables in Appendix A, which provide the update
granularity of file time stamps (presumably for directory entries) for
common Windows filesystems.

[1] 
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fsa/860b1516-c452-47b4-bdbc-625d344e2041

Going back to my initial message, I can't stress enough that this
problem is at its worst when a file has multiple hardlinks. If a
particular link in a directory wasn't the last link used to access the
file, its duplicated metadata may have the wrong file size, access
time, modify time, and change time (the latter is not reported by
Python). As is, for the current implementation, I'd only rely on the
basic attributes such as whether it's a directory or reparse point
(symlink, mountpoint, etc) when using scandir() to quickly process a
directory. For reliable stat information, call os.stat().
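
As a sketch of that pattern: use the directory entry only to decide
what kind of thing it is, and call os.stat() for anything that depends
on size or time stamps. The function name reliable_file_sizes() is just
for illustration.

```python
import os


def reliable_file_sizes(directory):
    """Map file name -> size, taking sizes from os.stat(), not the DirEntry."""
    sizes = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            # The basic type check comes cheaply from the directory entry.
            if entry.is_file(follow_symlinks=False):
                # The size comes from a real stat call on the file.
                st = os.stat(entry.path, follow_symlinks=False)
                sizes[entry.name] = st.st_size
    return sizes
```

This costs one os.stat() call per regular file, so it gives up part of
the speed advantage of scandir() in exchange for accurate values.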

I do think, however, that os.scandir() can be improved in Windows
without significant performance loss if it calls GetFileAttributesExW
to get st_file_attributes, st_size, st_ctime (create time), st_mtime,
and st_atime. This API call is relatively fast because it doesn't
require opening the file via CreateFileW, which is one of the more
expensive operations in os.stat(). But I haven't tried modifying
scandir() to benchmark it.
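
For anyone who wants to experiment from Python before touching the C
code, here's a rough ctypes sketch of the call. It is not what
scandir() does today and it hasn't been benchmarked. The structure
layout and the GetFileAttributesExW signature follow the Windows SDK;
the helper names and the returned dict are made up for this example.

```python
import ctypes
import sys


class FILETIME(ctypes.Structure):
    _fields_ = [("dwLowDateTime", ctypes.c_uint32),
                ("dwHighDateTime", ctypes.c_uint32)]


class WIN32_FILE_ATTRIBUTE_DATA(ctypes.Structure):
    _fields_ = [("dwFileAttributes", ctypes.c_uint32),
                ("ftCreationTime", FILETIME),
                ("ftLastAccessTime", FILETIME),
                ("ftLastWriteTime", FILETIME),
                ("nFileSizeHigh", ctypes.c_uint32),
                ("nFileSizeLow", ctypes.c_uint32)]


GET_FILE_EX_INFO_STANDARD = 0
# 100 ns intervals between 1601-01-01 (FILETIME epoch) and 1970-01-01.
EPOCH_DIFF_100NS = 116444736000000000


def filetime_to_unix(ft):
    """Convert a FILETIME to seconds since the Unix epoch."""
    ticks = (ft.dwHighDateTime << 32) | ft.dwLowDateTime
    return (ticks - EPOCH_DIFF_100NS) / 10_000_000


def get_file_attributes_ex(path):
    """Query attributes, size, and times by name, without opening the file."""
    if sys.platform != "win32":
        raise OSError("GetFileAttributesExW is only available on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    func = kernel32.GetFileAttributesExW
    func.argtypes = [ctypes.c_wchar_p, ctypes.c_int,
                     ctypes.POINTER(WIN32_FILE_ATTRIBUTE_DATA)]
    func.restype = ctypes.c_int  # BOOL
    data = WIN32_FILE_ATTRIBUTE_DATA()
    if not func(path, GET_FILE_EX_INFO_STANDARD, ctypes.byref(data)):
        raise ctypes.WinError(ctypes.get_last_error())
    return {
        "st_file_attributes": data.dwFileAttributes,
        "st_size": (data.nFileSizeHigh << 32) | data.nFileSizeLow,
        "st_ctime": filetime_to_unix(data.ftCreationTime),
        "st_mtime": filetime_to_unix(data.ftLastWriteTime),
        "st_atime": filetime_to_unix(data.ftLastAccessTime),
    }
```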

Ultimately, I'm waiting for Windows 10 to provide a WinAPI function
that calls the relatively new NTAPI function NtQueryInformationByName
[2] (by name, not by handle!) to get the FileStatInformation, as well
as for this information to be made available by handle via
GetFileInformationByHandleEx. Compared to GetFileAttributesExW, the
FileStatInformation additionally provides the file ID (if implemented
by the filesystem), change time, reparse tag, number of links, and the
effective access of the security context of the caller (i.e. process
or thread access token). The latter is something that we've never
implemented with os.stat(). It's not the same as POSIX
owner-group-other permissions. It would need a new attribute such as
st_effective_access. It could be used to provide a real implementation
of os.access() in Windows.
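
Purely as an illustration of the idea: given an effective access mask
such as the hypothetical st_effective_access would carry, an
os.access()-style check could map the requested mode to the
corresponding file access rights. The FILE_* constants are the standard
Windows access right values; the function name and the choice of which
rights to require for each mode are my assumptions, not an existing
API.

```python
import os

# Standard Windows file access rights.
FILE_READ_DATA = 0x0001
FILE_WRITE_DATA = 0x0002
FILE_EXECUTE = 0x0020


def access_from_effective_access(effective_access, mode):
    """Check an os.access()-style mode against an effective access mask.

    Hypothetical: 'effective_access' stands in for a value that os.stat()
    does not provide today.
    """
    if mode == os.F_OK:
        return True  # existence is implied by having stat information
    required = 0
    if mode & os.R_OK:
        required |= FILE_READ_DATA
    if mode & os.W_OK:
        required |= FILE_WRITE_DATA
    if mode & os.X_OK:
        required |= FILE_EXECUTE
    return effective_access & required == required
```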

[2]
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntqueryinformationbyname
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NPP6GKAEI7SOVA45WTJ222YVEALTF6WO/
Code of Conduct: http://python.org/psf/codeofconduct/
