Eryk Sun <eryk...@gmail.com> added the comment:

> but suddenly adding "\\?\" to the paths breaks a lot of assumptions.

The unwritten assumption has been that readlink() is reading symlinks that get 
created by CreateSymbolicLinkW, which sets the print name as the DOS path 
that's passed to the call. In this case, readlink() can rely on the print name 
being the intended DOS path. I raised a concern in the case of reading 
junctions. There's no high-level API to create junctions, so we can't assume 
the print name is reliable. PowerShell's new-item doesn't even set a print name 
for junctions. That symlinks also are valid without a print name (in principle; 
I haven't come across it in practice) lends additional weight to always using the 
substitute name.
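
To make the print name vs. substitute name distinction concrete, here is a minimal sketch that pulls both names out of a raw reparse buffer for a symlink or junction. The struct layout follows the documented REPARSE_DATA_BUFFER; the function names, the demo buffer, and the "\??\" to "\\?\" rewrite are illustrative assumptions, not the code under review.

```python
import struct

IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003  # junction
IO_REPARSE_TAG_SYMLINK = 0xA000000C

def parse_reparse_names(buf):
    """Return (substitute_name, print_name) from a raw reparse buffer."""
    # Common header: ULONG tag, USHORT data length, USHORT reserved.
    tag, _data_len, _reserved = struct.unpack_from("<LHH", buf, 0)
    # Both link types then store four USHORT offsets/lengths (in bytes).
    sub_off, sub_len, print_off, print_len = struct.unpack_from("<HHHH", buf, 8)
    if tag == IO_REPARSE_TAG_SYMLINK:
        path_start = 20  # symlinks carry an extra ULONG Flags field
    elif tag == IO_REPARSE_TAG_MOUNT_POINT:
        path_start = 16
    else:
        raise ValueError("not a symlink or junction: %#x" % tag)

    def name(offset, length):
        start = path_start + offset
        return bytes(buf[start:start + length]).decode("utf-16-le")

    return name(sub_off, sub_len), name(print_off, print_len)

def link_target(buf):
    """Hypothetical helper: always prefer the substitute name."""
    substitute, _print_name = parse_reparse_names(buf)
    # Assumption: map the NT "\??\" prefix to the Win32 "\\?\" form.
    if substitute.startswith("\\??\\"):
        return "\\\\?\\" + substitute[4:]
    return substitute

def demo_junction_buffer():
    """Build a junction buffer with an empty print name (illustrative)."""
    sub = "\\??\\C:\\target".encode("utf-16-le")
    path_buffer = sub + b"\0\0" + b"\0\0"  # both names null-terminated
    offsets = struct.pack("<HHHH", 0, len(sub), len(sub) + 2, 0)
    header = struct.pack("<LHH", IO_REPARSE_TAG_MOUNT_POINT,
                         len(offsets) + len(path_buffer), 0)
    return header + offsets + path_buffer
```

With the demo buffer, the print name comes back empty, so only the substitute name gives a usable target.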

Even if we have the DOS path, resolving paths manually is still complicated if 
it's a relative symlink with a reserved name (DOS device; trailing dots or 
spaces) as the final component or if it's a long path. Reparsing a relative 
symlink in the kernel doesn't reserve such names and there's no MAX_PATH limit 
in the kernel. So using readlink() is tricky. Fortunately realpath() in Windows 
doesn't require a resolve loop based on readlink(). The kernel almost always 
knows the final path of an opened file, and we can walk the components from the 
end until we find one that exists.
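
A rough sketch of that walk, assuming a callable that returns the kernel's final path for an existing file and raises OSError otherwise. Here `get_final_path` is a stand-in for something like _getfinalpathname, and `realpath_sketch` is a hypothetical name.

```python
import ntpath

def realpath_sketch(path, get_final_path):
    """Resolve the longest existing prefix, then re-append the rest."""
    tail = ""
    while path:
        try:
            resolved = get_final_path(path)
        except OSError:
            head, name = ntpath.split(path)
            if head == path:
                # Reached a root that cannot be opened; give up.
                break
            path = head
            if name:
                # Remember the component we dropped from the end.
                tail = ntpath.join(name, tail) if tail else name
        else:
            return ntpath.join(resolved, tail) if tail else resolved
    return ntpath.join(path, tail) if tail else path
```

No readlink() loop is involved: the only link resolution happens inside whatever `get_final_path` does when it opens the path.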

> My idea was to GetFinalPathName(path[4:])[4:] and if that fails

An existing file named "spam" would be a false positive for a link that targets 
"spam.". The internal CreateFileW call would open "spam". Also, symlinks allow 
remote paths, and this doesn't handle "\\\\?\\UNC" paths. More generally, a 
link target doesn't have to exist, so being able to access it shouldn't be a 
factor. I see it's also returning the result from _getfinalpathname. readlink() 
doesn't resolve a final, solid path. It just returns the contents of a link, 
which can be a relative or absolute path.

In the proposed implementation of realpath() that I helped on for issue 14094 
(I wasn't aware of the previous work in issue 9949), there's an 
_extended_to_normal function that tries to return a normal path if possible. 
The length of the normal path has to be less than MAX_PATH, and 
_getfullpathname should return the path unchanged. GetFullPathNameW is just 
rule-based processing in user mode; it doesn't touch the file system.
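
The two conditions can be sketched roughly as below. This is an illustration of the idea, not the function from the patch; `getfullpathname` is passed in as a parameter (an assumption made so the sketch runs anywhere), and on Windows it would be the real _getfullpathname.

```python
MAX_PATH = 260

def extended_to_normal(path, getfullpathname):
    """Return a normal DOS path if that is safe, else the extended path."""
    if path.startswith("\\\\?\\UNC\\"):
        normal = "\\\\" + path[8:]   # \\?\UNC\server\share -> \\server\share
    elif path.startswith("\\\\?\\"):
        normal = path[4:]            # \\?\C:\dir -> C:\dir
    else:
        return path                  # not an extended path
    if len(normal) >= MAX_PATH:
        return path                  # too long for a normal path
    try:
        # Reserved names (DOS devices, trailing dots or spaces) get
        # rewritten by the user-mode normalization, so they must stay
        # in extended form to keep their meaning.
        if getfullpathname(normal) != normal:
            return path
    except OSError:
        return path
    return normal
```

A target such as "\\?\C:\spam." therefore keeps its prefix, since the normalized form would silently become "C:\spam".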

I wish we could remove the MAX_PATH limit in this case. I think at startup we 
should try to call RtlAreLongPathsEnabled, even though it's not documented, and 
set a sys flag to indicate whether we can use long paths. Also, support a -X 
option and an environment variable to override automatic detection.
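
A sketch of what that detection might look like using ctypes. The environment variable name is made up for illustration, no such option exists today, and since RtlAreLongPathsEnabled is undocumented, its no-argument, BOOLEAN-returning signature is an assumption here.

```python
import os
import sys

# Hypothetical override variable; not an existing Python setting.
OVERRIDE_VAR = "PYTHONWINLONGPATHS"

def long_paths_enabled(environ=os.environ):
    """Best-effort check for whether long DOS paths can be used."""
    override = environ.get(OVERRIDE_VAR)
    if override is not None:
        # An explicit setting wins over automatic detection.
        return override.strip().lower() in {"1", "true", "yes", "on"}
    if sys.platform != "win32":
        return False  # assumption: the flag only matters on Windows
    import ctypes
    try:
        func = ctypes.WinDLL("ntdll").RtlAreLongPathsEnabled
    except (OSError, AttributeError):
        return False  # older system without the export
    func.restype = ctypes.c_ubyte  # BOOLEAN
    func.argtypes = ()
    return bool(func())
```

The result would be computed once at startup and stored as a flag, rather than queried on every path operation.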

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37834>
_______________________________________