On 10/19/20, Greg Ewing <greg.ew...@canterbury.ac.nz> wrote:
> On 20/10/20 4:52 am, Gregory P. Smith wrote:
>> Those of us with a traditional posix filesystem background may raise
>> eyeballs at this duplication, seeing a directory as a place that merely
>> maps names to inodes
>
> This is probably a holdover from MS-DOS, where there was no separate
> inode-like structure -- it was all in the directory entry.

DOS implemented a find-first/find-next API (int 21h 4E/4F) that
provided a file's name, attributes, size, and last write time/date. I
think it's clear that the design was influenced by the readily
available contents of a FAT dirent. The Win32 API extended this to
FindFirstFile/FindNextFile, with added support for the long filename,
create and access times, and, in NT 5+, the reparse tag for a reparse
point.

NTFS had to support this metadata in the directory index, because
FindFirstFile/FindNextFile would be too expensive if the filesystem
had to fetch the metadata from the MFT for every matching file in a
listing. It tries to keep the duplicated metadata in sync -- such as
when a file is opened, closed, or manually extended in size, when the
cache is flushed, or when metadata is explicitly set (e.g.
SetFileInformationByHandle: FileBasicInfo). But for performance it
doesn't update the duplicated data every time a file is read from or
written to. And, in particular, if it's just the access time that
changed, it updates the duplicated access time with a one-hour
granularity. (There's also a registry value, as I mentioned
previously, that disables updating access times completely -- in both
the MFT record and the directory index.)

That said, if a file has multiple hardlinks, the current NTFS
implementation for updating duplicated data is totally unreliable. It
only updates the link that was accessed. All other links go stale. We
don't have any reasonable way to special-case this situation because
the directory entry doesn't include the number of links a file has.
The file has to be opened and queried directly, but then one might as
well do a full stat() for every file.

I recommend relying on only the high-level is_dir(), is_file(), and
is_symlink() methods of os.scandir() items to quickly process a
directory. inode() is reliable -- as much as is possible in Windows --
because the implementation gets the full stat info, but check to
ensure it's not 0.
It's based on the file ID, which Windows filesystems aren't required
to support (or reliably support; it's not stable in FAT). NTFS and
ReFS support reliable 64-bit file IDs, and opening by file ID.
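
As a rough illustration of the recommendation above, here is a minimal
sketch that classifies a directory's entries using only the cheap type
checks, falls back to a real os.stat() call when sizes or times matter,
and treats an inode() of 0 as "no file ID". The helper names
(classify_entries, fresh_stat, file_id) are made up for this example
and are not part of the os module.

```python
import os


def classify_entries(path):
    """Group the names in `path` by type using only DirEntry type checks."""
    result = {"dirs": [], "files": [], "symlinks": [], "other": []}
    with os.scandir(path) as entries:
        for entry in entries:
            # Check for a symlink first so that links are never followed.
            if entry.is_symlink():
                result["symlinks"].append(entry.name)
            elif entry.is_dir(follow_symlinks=False):
                result["dirs"].append(entry.name)
            elif entry.is_file(follow_symlinks=False):
                result["files"].append(entry.name)
            else:
                result["other"].append(entry.name)
    for names in result.values():
        names.sort()
    return result


def fresh_stat(entry):
    """Query the file itself instead of trusting entry.stat().

    On Windows, entry.stat() can be served from the directory listing,
    which may be stale (e.g. for a file with multiple hardlinks).
    """
    return os.stat(entry.path, follow_symlinks=False)


def file_id(entry):
    """Return entry.inode(), or None when it is 0 (no usable file ID)."""
    ino = entry.inode()
    return ino if ino != 0 else None
```

Calling classify_entries(".") returns a dict of sorted name lists.
fresh_stat() costs one extra system call per file, so it is probably
only worth using for the entries whose size or timestamps you actually
need.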