Eryk Sun <eryk...@gmail.com> added the comment:

> os.path.realpath() normalizes paths before resolving links 
> on Windows

Normalizing the input path is required in order to be consistent with the 
Windows file API. OTOH, the target path of a relative symlink gets resolved in 
a POSIX-ly correct manner in the kernel, and ntpath._readlink_deep() doesn't 
ensure this. 
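
As a rough illustration of the lexical step, here's a minimal sketch that uses 
only the pure string functions in ntpath, so it runs on any platform. The path 
is made up, and nothing is looked up on disk. It only shows that ".." 
components get collapsed before any link is examined.

```python
import ntpath

# Hypothetical input path; no file system access takes place.
path = r'C:\work\foo\bar\..\remote'

# normpath() collapses "bar\.." lexically, whether or not "bar" is a link,
# and converts forward slashes to backslashes.
normalized = ntpath.normpath(path)
print(normalized)  # C:\work\foo\remote
print(ntpath.normpath('C:/work/foo/./bar'))  # C:\work\foo\bar
```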

I've attached a prototype of a POSIX-like implementation that recursively 
resolves both the drive and the path. It uses the final path only as a 
shortcut, to normalize volume GUID names to drives and to get the proper 
casing of UNC server and share names. However, it's considerably more work 
than the final-path approach, and more work always has the potential for more 
bugs. I'm providing it for the sake of discussion, or just for people to point 
to it as an example of what not to do... ;-)

Patching up the current implementation would probably involve extending 
_getfinalpathname() to support follow_symlinks=False. Aspects of the POSIX 
implementation would have to be adopted, but I think it can be kept relatively 
simple when integrated with _getfinalpathname(path, follow_symlinks=False). The 
latter also makes it easy to identify a UNC path. That's necessary because 
mountpoints should never be resolved in a UNC path, which the current 
implementation gets wrong.
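
To make the UNC check concrete, here's a small sketch. The helper name 
is_unc_final_path() is hypothetical and isn't part of ntpath. It assumes the 
final path is in the extended "\\?\" form with a DOS volume name, in which a 
network location begins with "\\?\UNC\".

```python
def is_unc_final_path(final_path):
    """Return True if an extended-form final path names a UNC location."""
    # Assumed input forms: \\?\C:\dir\file for a drive, and
    # \\?\UNC\server\share\dir\file for a network share.
    return final_path.upper().startswith('\\\\?\\UNC\\')

print(is_unc_final_path(r'\\?\UNC\baz\spam\dir'))  # True
print(is_unc_final_path(r'\\?\C:\work\foo'))       # False
```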

What this wouldn't support is resolving an inaccessible drive as much as 
possible. Mapped drives are object symlinks that expand to UNC paths that can 
include an arbitrary filepath on a share. Substitute drives by definition 
target an arbitrary filepath, and can even target other substitute and mapped 
drives. A final-path only approach would leave the inaccessible drive in the 
result, along with any symlinks that are internal to the drive.

A final-path approach also can't support targets with rooted paths or ".." 
components that traverse a mountpoint. The final path will be on the 
mountpoint's device, which will change how such relative symlinks resolve. That 
said, rooted symlink targets are almost never seen in Windows, and targets that 
traverse a mountpoint by way of a ".." component should be rare, in principle. 
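
Here's a rough sketch of that difference, using only ntpath string functions. 
The layout is made up: assume C:\mount is a bind mountpoint that targets 
D:\sub, and that C:\mount\link is a symlink. resolve_target() is a 
hypothetical helper, not an existing function.

```python
import ntpath

def resolve_target(link_path, target):
    """Join a symlink target to the directory of the path that names it."""
    return ntpath.normpath(ntpath.join(ntpath.dirname(link_path), target))

# Hypothetical layout: C:\mount is a bind mountpoint that targets D:\sub.
opened = r'C:\mount\link'  # the link by the path that was opened
final = r'D:\sub\link'     # the same link by its final path

# A rooted target resolves against whichever drive the path is on.
print(resolve_target(opened, r'\data'))    # C:\data
print(resolve_target(final, r'\data'))     # D:\data

# A ".." that traverses the mountpoint also ends up on a different drive.
print(resolve_target(opened, r'..\data'))  # C:\data
print(resolve_target(final, r'..\data'))   # D:\data
```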

One problem is the frequent use of bind mountpoints in place of symlinks in 
Windows. In CMD, anyone can create a bind mountpoint via `mklink /j`. Here's a 
fabricated example with a mountpoint (i.e. junction) that's used where a 
symlink would normally be used.

    C:\
        work\
            foo\
                bar [junction -> C:\work\bar]
                remote [symlink -> \\baz\spam]
            bar\
                remote [symlink -> ..\remote]
            remote [symlink -> \\qux\eggs]

C:\work\foo\bar\remote normally resolves as follows:

    C:\work\foo\bar\remote
        -> C:\work\foo\bar + ..\remote
        -> C:\work\foo\remote
        -> \\baz\spam

Assume that \\baz\spam is down, so C:\work\foo\bar\remote can't be strictly 
resolved. If the non-strict algorithm relies on getting the final path of 
C:\work\foo\bar\remote before resolving the target of "remote", then the result 
for this case will be incorrect.

    C:\work\foo\bar\remote
        -> C:\work\bar\remote
        -> C:\work\bar + ..\remote
        -> C:\work\remote
        -> \\qux\eggs
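
The two traces above can be reproduced with a toy model. This is only a 
sketch, not the attached prototype: the dicts stand in for the file system, 
only the last path component is checked for a symlink, matching is 
case-sensitive, and all of the names (JUNCTIONS, SYMLINKS, physical, follow, 
resolve_posixly, resolve_final_first) are made up for illustration.

```python
import ntpath

# Toy model of the fabricated layout above; nothing touches the disk.
JUNCTIONS = {r'C:\work\foo\bar': r'C:\work\bar'}
SYMLINKS = {
    r'C:\work\foo\remote': r'\\baz\spam',
    r'C:\work\bar\remote': r'..\remote',
    r'C:\work\remote': r'\\qux\eggs',
}

def physical(path):
    """Replace a leading junction with its target (one level only)."""
    for junction, target in JUNCTIONS.items():
        if path == junction or path.startswith(junction + '\\'):
            return target + path[len(junction):]
    return path

def follow(path, target):
    """Resolve a link target against the directory of `path`."""
    if ntpath.splitdrive(target)[0]:
        return target  # a target with a drive or UNC share replaces the path
    return ntpath.normpath(ntpath.join(ntpath.dirname(path), target))

def resolve_posixly(path, limit=32):
    """Keep the path as opened; look through junctions only to find links."""
    for _ in range(limit):
        target = SYMLINKS.get(physical(path))
        if target is None:
            return path
        path = follow(path, target)
    raise OSError('too many links')

def resolve_final_first(path, limit=32):
    """Take the final path (junctions resolved) before reading each link."""
    for _ in range(limit):
        path = physical(path)
        target = SYMLINKS.get(path)
        if target is None:
            return path
        path = follow(path, target)
    raise OSError('too many links')

start = r'C:\work\foo\bar\remote'
print(resolve_posixly(start))      # \\baz\spam
print(resolve_final_first(start))  # \\qux\eggs
```

The only difference between the two functions is whether the junction is 
replaced before the relative target "..\remote" is joined to the parent 
directory.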

----------
components: +Windows
nosy: +eryksun, paul.moore, steve.dower, tim.golden, zach.ware
versions:  -Python 3.6, Python 3.7
Added file: https://bugs.python.org/file49984/realpath_posixly.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43936>
_______________________________________