Eryk Sun <eryk...@gmail.com> added the comment:

I'm tentatively reopening this issue for you to consider the following point, 
Steve.

A real path is not always the same as a final path. We can find code that does 
`relpath(realpath(target), realpath(start))` to compute the relative path to 
target for a symlink. The final path can't be relied on for this unless we 
always evaluate the symlink from the final path to `start`. In particular, it 
cannot be relied on if the relative path traverses a junction. 
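On a POSIX system, where realpath() only ever has to resolve symlinks, the idiom can be sketched as below. The helper name `relative_symlink_target` is hypothetical. On Windows the same code only gives a working link if realpath() keeps junctions as hard components, which is the point being argued here.

```python
import os

def relative_symlink_target(target, link):
    """Relative path to store in a symlink at `link` so that it reaches `target`.

    Hypothetical helper: computes relpath(realpath(target), realpath(start)),
    where `start` is the directory that will contain the link.
    """
    start = os.path.dirname(os.path.abspath(link))
    return os.path.relpath(os.path.realpath(target), os.path.realpath(start))
```

When `start` is (or passes through) a symlinked directory, both arguments are resolved, so the relative path is computed from the directory the kernel actually evaluates the link from.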

What code like this needs from a realpath() implementation is a solid (real) 
path, not a final path. In other words, the caller wants a solidified form of 
`start` that can be used to compute the path to a target for a relative 
symlink, but one that works when accessed from `start`, not the final path of 
`start`. Generally this means resolving symlinks in the path, but not mount 
points. That's what Unix realpath() does, but of course there it's simpler 
because the only name surrogate in Unix is a symlink, which is never a mount 
point and never a directory.

Here's an example. In this first case "scripts" is a junction mount point that 
targets "C:/spam/etc/scripts":

    >>> eggs = r'C:\spam\dlls\eggs.dll'
    >>> scripts = r'C:\spam\scripts'

    >>> rel_eggs_right = os.path.relpath(eggs, scripts)
    >>> print(rel_eggs_right)
    ..\dlls\eggs.dll
    >>> os.symlink(rel_eggs_right, 'C:/spam/scripts/eggs_right.dll')
    >>> os.path.exists('C:/spam/scripts/eggs_right.dll')
    True

    >>> scripts_final = os.path._getfinalpathname(scripts)[4:]
    >>> print(scripts_final)
    C:\spam\etc\scripts
    >>> rel_eggs_wrong = os.path.relpath(eggs, scripts_final)
    >>> print(rel_eggs_wrong)
    ..\..\dlls\eggs.dll
    >>> os.symlink(rel_eggs_wrong, 'C:/spam/scripts/eggs_wrong.dll')
    >>> os.path.exists('C:/spam/scripts/eggs_wrong.dll')
    False

If we remove the junction and replace it with a 'soft' symlink that targets the 
same directory, then using the final path works, and using the given path no 
longer works.

    >>> print(os.readlink('C:/spam/scripts'))
    C:\spam\etc\scripts
    >>> scripts_final = os.path._getfinalpathname(scripts)[4:]
    >>> rel_eggs_right_2 = os.path.relpath(eggs, scripts_final)
    >>> print(rel_eggs_right_2)
    ..\..\dlls\eggs.dll
    >>> os.symlink(rel_eggs_right_2, 'C:/spam/scripts/eggs_right_2.dll')
    >>> os.path.exists('C:/spam/scripts/eggs_right_2.dll')
    True

    >>> rel_eggs_wrong_2 = os.path.relpath(eggs, scripts)
    >>> print(rel_eggs_wrong_2)
    ..\dlls\eggs.dll
    >>> os.symlink(rel_eggs_wrong_2, 'C:/spam/scripts/eggs_wrong_2.dll')
    >>> os.path.exists('C:/spam/scripts/eggs_wrong_2.dll')
    False

When the kernel traverses "scripts" as a soft link, it collapses to the target 
(i.e. "C:/spam/etc/scripts"), so our relative path that was computed from the 
final path is right in this case. On the other hand, if "scripts" is a mount 
point (junction), it's a hard (solid) component. It does not collapse to the 
target (the kernel even checks the junction's security descriptor, which it 
does not do for a symlink), so ".." in the relative symlink traverses the 
junction component as if it were an actual directory.
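The difference can be modelled lexically with `ntpath`, which is importable on any platform. In this simplified sketch (the function and parameter names are hypothetical), a relative link stored in a directory that is a soft link is evaluated from that directory's target, while one stored in a junction or an ordinary directory is evaluated from the directory as given. It assumes the soft link's target contains no further reparse points.

```python
import ntpath

def where_relative_link_points(link_dir, link_text, soft_target=None):
    """Model where a relative symlink stored in `link_dir` resolves.

    soft_target: the target of `link_dir` if it is a symlink ("soft"),
    or None if `link_dir` is a junction or an ordinary directory ("hard").
    """
    # A soft component collapses to its target before ".." is applied;
    # a hard component is traversed by ".." like a real directory.
    base = soft_target if soft_target is not None else link_dir
    return ntpath.normpath(ntpath.join(base, link_text))

scripts = r'C:\spam\scripts'
etc_scripts = r'C:\spam\etc\scripts'
print(where_relative_link_points(scripts, r'..\dlls\eggs.dll'))  # C:\spam\dlls\eggs.dll
print(where_relative_link_points(scripts, r'..\..\dlls\eggs.dll', etc_scripts))  # C:\spam\dlls\eggs.dll
```

Swapping the two link texts reproduces the failures above: `..\..\dlls\eggs.dll` through the junction lands at `C:\dlls\eggs.dll`, and `..\dlls\eggs.dll` through the symlink lands at `C:\spam\etc\dlls\eggs.dll`.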

What we need is an implementation of realpath("C:/spam/scripts") that returns 
"C:\\spam\\scripts" when "scripts" is a mount point and returns 
"C:\\spam\\etc\\scripts" when "scripts" is a symlink.

This means we need an implementation of realpath() that looks a lot like 
posixpath.realpath. Generally a mount point should be walked over like a 
directory, just as mount points are handled in Unix. The difference is that a 
mount point in Windows is allowed to target a symlink. (This is a design flaw; 
Unix doesn't allow it.) Since we need to know the target of a junction, we have 
to read its reparse point, and keep following the chain of targets until we 
reach a real directory. As long as a junction targets another junction, it 
remains a hard component. As soon as the chain reaches a symlink, however, it 
becomes a soft component that needs to be resolved. If the junction targets a 
name-surrogate reparse point that we can't read, then our only option is to get 
a final path. Silently returning a final path here would be dysfunctional, so 
we should raise an exception for this case. Code can handle the exception and 
knowingly get a final path instead of a real path.
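The algorithm described above can be sketched against an in-memory model of reparse points so that it runs anywhere. Everything here is hypothetical: `real_path`, `RealPathError`, and the `reparse` mapping, which maps a normcased path to `('junction' | 'symlink', absolute_target)`, or to `None` for a name-surrogate reparse point that can't be read. It assumes absolute link targets and ignores relative symlink targets and the UNC case.

```python
import ntpath

class RealPathError(OSError):
    """Hypothetical: raised when only a final path could be computed."""

def _split(path):
    """Split a normalized absolute path into (root, [components])."""
    drive, rest = ntpath.splitdrive(ntpath.normpath(path))
    return drive + '\\', [p for p in rest.split('\\') if p]

def _junction_is_hard(target, reparse, limit):
    """Follow a junction's chain of targets. True if it ends at a real
    directory (hard component); False if it reaches a symlink (soft)."""
    for _ in range(limit):
        key = ntpath.normcase(target)
        if key not in reparse:
            return True                  # a real directory
        if reparse[key] is None:
            raise RealPathError(f'unreadable name surrogate: {target}')
        kind, target = reparse[key]
        if kind == 'symlink':
            return False
    raise RealPathError('too many levels of reparse points')

def real_path(path, reparse, limit=63):
    """Resolve symlinks in `path`, but walk over junctions like directories."""
    resolved, parts = _split(path)
    hops = 0
    while parts:
        candidate = ntpath.join(resolved, parts.pop(0))
        key = ntpath.normcase(candidate)
        if key not in reparse:
            resolved = candidate         # ordinary file or directory
            continue
        if reparse[key] is None:
            raise RealPathError(f'unreadable name surrogate: {candidate}')
        kind, target = reparse[key]
        if kind == 'junction' and _junction_is_hard(target, reparse, limit):
            resolved = candidate         # a mount point stays in the path
            continue
        # A symlink, or a junction whose chain reaches one, is soft: replace
        # the component with its target and resolve the rest from there.
        hops += 1
        if hops > limit:
            raise RealPathError('too many levels of reparse points')
        resolved, parts = _split(ntpath.join(target, *parts))
    return resolved

junction = {ntpath.normcase(r'C:\spam\scripts'): ('junction', r'C:\spam\etc\scripts')}
symlink = {ntpath.normcase(r'C:\spam\scripts'): ('symlink', r'C:\spam\etc\scripts')}
print(real_path('C:/spam/scripts', junction))  # C:\spam\scripts
print(real_path('C:/spam/scripts', symlink))   # C:\spam\etc\scripts
```

Like posixpath.realpath, this walks the path one component at a time; the only Windows-specific step is checking where a junction's chain ends before deciding whether to keep it.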

This also means we can't reliably compute a real path for a remote path (UNC) 
because we can't manually evaluate the target of a remote junction. A remote 
junction is meaningless to us. If we're evaluating a UNC path and reach a 
junction, we have to give up on a real path and settle for a final path. We can 
get a final path because that lets the kernel in the server talk to our kernel 
to resolve any combination of mount points (handled on the server side) and 
symlinks (handled on our side). This case should also raise an exception. Aware 
code can handle it by getting a final path and taking appropriate measures.
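Caller-side handling of that exception might look like the sketch below. `RealPathError`, `check_junction`, and `real_or_final_path` are hypothetical names, and the real-path and final-path functions are passed in as parameters so the sketch runs without Windows.

```python
import ntpath

class RealPathError(OSError):
    """Hypothetical error: a real path can't be computed for this path."""

def is_unc(path):
    # ntpath.splitdrive() returns '\\server\share' as the drive of a UNC path.
    return ntpath.splitdrive(path)[0].startswith('\\\\')

def check_junction(path_so_far):
    """Hypothetical rule: a junction reached while walking a UNC path is
    evaluated on the server, so a real path can't be computed for it."""
    if is_unc(path_so_far):
        raise RealPathError(f'remote junction in {path_so_far}')

def real_or_final_path(path, real_path, final_path):
    """Prefer a real path; knowingly fall back to a final path if needed.

    Returns (result, is_real) so the caller can take appropriate measures."""
    try:
        return real_path(path), True
    except RealPathError:
        return final_path(path), False
```

On Windows, `final_path` could wrap `os.path._getfinalpathname` as in the session above, but that function is private, and slicing off the first four characters only strips the `\\?\` prefix correctly for drive-letter paths; a UNC final path comes back in the `\\?\UNC\` form.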

----------
status: closed -> open

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue9949>
_______________________________________