New submission from benrg <benrud...@gmail.com>:

(Pure)WindowsPath uses str.lower to fold paths for comparison and hashing. This 
doesn't match the case folding of actual Windows file systems. There exist 
WindowsPath objects that compare and hash equal, but refer to different files. 
For example, the two strings in each of the following pairs

  '\xdf' (sharp S) and '\u1e9e' (capital sharp S)
  '\u01c7' (LJ) and '\u01c8' (Lj)
  '\u0130' (I with dot) and 'i\u0307' (i followed by combining dot)
  'K' and '\u212a' (Kelvin sign)

are equal under str.lower folding but are distinct file names on NTFS volumes 
on my Windows 7 machine. There are hundreds of other such pairs.
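
A minimal sketch for reproducing the str.lower half of this claim on any 
platform. The names PAIRS and folds_equal are made up for illustration and are 
not part of pathlib. The PureWindowsPath comparison is printed only for 
reference, since its result depends on how the installed Python version 
implements folding. Whether the names are distinct on disk still has to be 
checked on a real Windows volume.

```python
from pathlib import PureWindowsPath

# The four pairs quoted in the report above.
PAIRS = [
    ("\xdf", "\u1e9e"),     # sharp S / capital sharp S
    ("\u01c7", "\u01c8"),   # LJ / Lj
    ("\u0130", "i\u0307"),  # I with dot / i followed by combining dot
    ("K", "\u212a"),        # K / Kelvin sign
]


def folds_equal(a, b):
    """Return True if a and b are distinct strings with the same str.lower result."""
    return a != b and a.lower() == b.lower()


for a, b in PAIRS:
    # ascii() keeps the output readable on consoles that cannot print these characters.
    print(ascii(a), ascii(b),
          "str.lower equal:", folds_equal(a, b),
          "PureWindowsPath equal:", PureWindowsPath(a) == PureWindowsPath(b))
```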

I think this is very bad. The reverse (paths that compare unequal but refer to 
the same file) is probably unavoidable and is expected by programmers. But 
paths that compare equal should never refer to different files as far as the 
OS is concerned.

How to fix this:

Unfortunately, there is no correct way to case fold Windows paths. The FAT, 
NTFS, and exFAT drivers on my machine all have different behavior. (The 
examples above work on all three, except for 'K' and '\u212a', which are 
equivalent on FAT volumes.) NTFS stores its case-folding map on each volume in 
the hidden $UpCase file, so even different NTFS volumes on the same machine can 
have different behavior. The contents of $UpCase have changed over time as 
Windows is updated to support new Unicode versions. NTFS and NFS (and possibly 
WebDAV) also support full case sensitivity when used with Interix/SUA and 
Cygwin, though this requires disabling system-wide case insensitivity via the 
registry.

I think that pathlib should either give up on case folding entirely, or should 
fold very conservatively, treating WCHARs as equivalent only if they're 
equivalent on all standard file systems on all supported Windows versions.
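
As a rough sketch of what very conservative folding might look like: the 
helper below folds only ASCII A-Z and leaves every other character untouched. 
The name conservative_fold is hypothetical, and the choice of ASCII letters as 
the safe set is my assumption, not something verified against every file system 
and Windows version. A real implementation would need to establish the actual 
common subset.

```python
import string

# Assumption: ASCII A-Z/a-z are the only characters folded here; everything
# else, including the Kelvin sign and capital sharp S, is kept as-is.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def conservative_fold(path_str):
    """Fold ASCII uppercase letters to lowercase and leave all other characters alone."""
    return path_str.translate(_ASCII_FOLD)


# ASCII-only differences still compare equal after folding.
print(conservative_fold("C:\\Users\\README.TXT") == conservative_fold("c:\\users\\readme.txt"))

# The pairs from the report stay distinct, unlike with str.lower.
print(conservative_fold("K") == conservative_fold("\u212a"))
```

With this approach some paths that refer to the same file will compare unequal, 
which is the direction of error described above as acceptable.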

If pathlib folds case at all, there should be a solution for people who need to 
interoperate with Cygwin or SUA tools on a case-sensitive machine, but I 
suppose they can just use PosixPath.

----------
components: Library (Lib), Windows
messages: 310384
nosy: benrg, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: pathlib.(Pure)WindowsPaths can compare equal but refer to different files
type: security
versions: Python 3.4, Python 3.5, Python 3.6, Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32612>
_______________________________________