[issue42957] os.readlink produces wrong result on windows

2021-04-27 Thread Steve Dower


Change by Steve Dower :


--
pull_requests:  -24360

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue42957] os.readlink produces wrong result on windows

2021-04-27 Thread Ethan Furman


Change by Ethan Furman :


--
pull_requests: +24360
pull_request: https://github.com/python/cpython/pull/25670




[issue42957] os.readlink produces wrong result on windows

2021-04-27 Thread Steve Dower


Change by Steve Dower :


--
pull_requests:  -24359




[issue42957] os.readlink produces wrong result on windows

2021-04-27 Thread Ethan Furman


Change by Ethan Furman :


--
nosy: +ethan.furman
nosy_count: 6.0 -> 7.0
pull_requests: +24359
pull_request: https://github.com/python/cpython/pull/25670




[issue42957] os.readlink produces wrong result on windows

2021-01-22 Thread Steve Dower


Steve Dower  added the comment:

I agree with Eryk (unsurprisingly, we discussed this change *a lot* back when 
it was made ~3 years ago).

os.readlink is the lowest-level API that gives a reliable result.

os.path.realpath is the high-level API that probably does what most users want 
most of the time (and relies on os.readlink being reliable).
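
To illustrate the split between the two APIs, here is a minimal sketch (not
from the original discussion) that builds a two-link chain in a temporary
directory. The helper name compare_readlink_and_realpath and the directory
names are made up for illustration, and the function returns None if the
process is not allowed to create symlinks.

```python
import os
import tempfile


def compare_readlink_and_realpath():
    """Build second -> first -> target and compare the two APIs.

    Returns a pair of booleans:
      (readlink gave exactly one hop, realpath reached the final directory)
    or None when symlinks cannot be created (e.g. missing privilege on Windows).
    """
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target")
        first = os.path.join(tmp, "first")
        second = os.path.join(tmp, "second")
        os.mkdir(target)
        try:
            os.symlink(target, first, target_is_directory=True)
            os.symlink(first, second, target_is_directory=True)
        except (OSError, NotImplementedError):
            return None

        # os.readlink reads only what is stored in the link itself.
        one_hop = os.readlink(second)
        # os.path.realpath keeps resolving until no links remain.
        final = os.path.realpath(second)

        one_hop_is_link = os.path.islink(one_hop)
        final_is_target = (
            os.path.samefile(final, target) and not os.path.islink(final)
        )
        return (one_hop_is_link, final_is_target)
```

The result of os.readlink() is itself still a link here, which is the raw
information a caller needs in order to implement their own resolution policy.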

We can't change either of these to more of a middle-ground without taking away 
the ability for users to implement their own behaviour. So there's no 
behavioural issue here. (The behavioural issue we fixed in 3.8 was that 
realpath didn't resolve anything at all, so we're not reverting back to that.)

If you'd like a new API added to either retrieve the display path or to read a 
link and try to partially normalise the result without resolving any more 
links, please open an issue specifically for that (targeting Python 3.10). You 
probably also want to prototype a wrapper around readlink() to do it, and be 
prepared to have many potential edge cases thrown against it.

--
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed




[issue42957] os.readlink produces wrong result on windows

2021-01-19 Thread Eryk Sun


Eryk Sun  added the comment:

os.readlink() was generalized to support mountpoints (junctions) as well as 
symlinks, and it's more common for mountpoints to lack the print name field in 
the reparse data buffer [1]. For example, PowerShell's New-Item creates 
junctions that only have a substitute path. This is allowed because the 
filesystem protocols only require that the substitute path in the reparse data 
buffer is valid; that's all the system actually uses when resolving the 
reparse point.

The substitute path in the reparse point is a \\?\ prefixed path (actually a 
\??\ NT path, but they're effectively the same for our purposes). This type of 
path is usually called an extended path -- or an extended device path, or 
verbatim path. It's a device path, like the \\.\ prefix, except that (1) it's 
verbatim (i.e. not normalized), (2) its length is never limited to MAX_PATH 
(260) characters, and (3) the Windows file API supports the \\?\ prefix more 
broadly than the \\.\ prefix.
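
As a rough illustration of the path forms mentioned above, the following
sketch labels a path by its prefix using plain string checks. The function
name classify_prefix and the label strings are hypothetical, chosen only for
this example; it does not call any Windows API and does not validate the rest
of the path.

```python
def classify_prefix(path: str) -> str:
    """Return a rough label for the prefix of a Windows path string."""
    if path.startswith("\\??\\"):
        return "nt"            # NT form, as stored in the reparse point
    if path.startswith("\\\\?\\UNC\\"):
        return "extended-unc"  # verbatim form of a UNC path
    if path.startswith("\\\\?\\"):
        return "extended"      # verbatim (extended) device path
    if path.startswith("\\\\.\\"):
        return "device"        # normalized device path
    if path.startswith("\\\\"):
        return "unc"           # ordinary UNC path
    return "plain"             # drive-letter or relative path
```

The more specific prefixes are tested first because "\\?\UNC\" also starts
with "\\?\", and every form except the NT one also starts with "\\".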

You're right that some programs can't grok an extended path. Some can't even 
handle any type of UNC path. I agree that we need a simple way to remove the 
prefix. I just don't agree that removing it should be the default behavior in 
nt.readlink(), which I prefer to keep efficient and free of race conditions.

os.path.realpath() isn't necessarily what you want since it resolves the final 
path. The link may target a path that traverses any number of reparse points 
and mapped drives, so the final path may be completely different from the 
os.readlink() result. We simply need an option to remove the \\?\ or \\?\UNC 
prefix, either always or only when the path doesn't require it. It could be 
implemented by a high-level wrapper function in os.py.
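
A minimal sketch of the "always remove" variant of such an option might look
like the following. The name strip_extended_prefix is hypothetical and is not
an existing function in os.py; the sketch works on strings only and makes no
attempt to decide whether the prefix is actually required.

```python
def strip_extended_prefix(path: str) -> str:
    """Remove a leading \\\\?\\ or \\\\?\\UNC\\ prefix, if present."""
    unc_prefix = "\\\\?\\UNC\\"
    ext_prefix = "\\\\?\\"
    if path.startswith(unc_prefix):
        # \\?\UNC\server\share -> \\server\share
        return "\\\\" + path[len(unc_prefix):]
    if path.startswith(ext_prefix):
        # \\?\C:\dir -> C:\dir
        return path[len(ext_prefix):]
    return path
```

For example, strip_extended_prefix(os.readlink(link)) would give the
conventional form of the link target, at the cost of possibly producing a path
that no longer works for long paths or reserved names.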

---
Reasons the prefix may be required

If the length of the target path exceeds MAX_PATH, then removing the prefix may 
render the path inaccessible if the current process doesn't support long paths 
without it (e.g. Windows 10 without long paths enabled at the system level, or 
any version prior to Windows 10).

Also, reserved DOS device names are only accessible using an extended path. Say 
I have the following "spam" junction:

>>> print(os.readlink('spam'))
\\?\C:\Temp\con

The junction allows accessing the target directory normally:

>>> stat.S_ISDIR(os.stat('spam').st_mode)
True

But look what happens when I try to access the target path without the prefix:

>>> stat.S_ISDIR(os.stat(r'C:\Temp\con').st_mode)
False
>>> stat.S_ISCHR(os.stat(r'C:\Temp\con').st_mode)
True

Instead of the directory that one might expect, it's actually a character 
device!? Let's see what Windows opens:

>>> print(os.path.abspath(r'C:\Temp\con'))
\\.\con

It opens the "CON" device, for console I/O. It turns out that a bunch of names 
are reserved, including NUL, CON, CONIN$, CONOUT$, AUX, PRN, COM<1-9>, and 
LPT<1-9>. They're reserved even with an extension introduced by a colon or dot, 
preceded by zero or more spaces. For example:

>>> print(os.path.abspath(r'C:\Temp\con :whatever'))
\\.\con

Directly accessing such a name in the filesystem requires a verbatim path. For 
example:

>>> stat.S_ISDIR(os.stat(r'\\?\C:\Temp\con').st_mode)
True

Using reserved names is cautioned against, but in the real world we have to be 
defensive. We can't simply remove the prefix and hope for the best.
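
The two conditions described in this section can be sketched as a conservative
check on a target path that has already had its prefix removed. The names
requires_extended_prefix and has_reserved_component are hypothetical, and the
reserved-name pattern covers only the names listed above; treat it as an
assumption-laden sketch rather than a complete rule set.

```python
import re

# MAX_PATH counts the terminating NUL, so 259 characters is the longest
# path that fits without the prefix.
MAX_PATH = 260

# Reserved names from the list above, optionally followed by spaces and an
# "extension" introduced by a dot or colon.
_RESERVED = re.compile(
    r"^(?:nul|con|conin\$|conout\$|aux|prn|com[1-9]|lpt[1-9]) *(?:[.:].*)?$",
    re.IGNORECASE,
)


def has_reserved_component(path: str) -> bool:
    """True if any component looks like a reserved DOS device name.

    Checking every component (not just the last) is deliberately
    conservative: it may keep the prefix more often than necessary.
    """
    return any(_RESERVED.match(part) for part in re.split(r"[\\/]", path))


def requires_extended_prefix(target: str) -> bool:
    """Guess whether 'target' is only reliably accessible as \\\\?\\target."""
    return len(target) >= MAX_PATH or has_reserved_component(target)
```

A caller that wants the "only when the path doesn't require it" behaviour
could remove the prefix and then restore it whenever this check returns True.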

---

[1] 
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/ns-ntifs-_reparse_data_buffer

--




[issue42957] os.readlink produces wrong result on windows

2021-01-19 Thread simon mackenzie


simon mackenzie  added the comment:

I note that os.path.realpath("v1") does produce the right path on Windows.
Maybe that is what you meant. Will that work cross-platform?


--




[issue42957] os.readlink produces wrong result on windows

2021-01-19 Thread simon mackenzie


simon mackenzie  added the comment:

For most people the expectation would be that it returns a path in the same
format as any other path. Furthermore, it seems odd to change the default
behaviour after years in which it worked as expected. I had never heard of
this substitute path before, and it does not work in some circumstances, e.g.
Docker does not recognise it.

Note also that os.path.realpath(os.readlink("v1")) still returns
'\\\\?\\d:\\v1'. There needs to be some way of getting to the path that
everyone actually uses.


--




[issue42957] os.readlink produces wrong result on windows

2021-01-18 Thread Eryk Sun


Eryk Sun  added the comment:

Symlinks and mountpoints (aka junctions) contain two forms of the target path. 
There's a path that's intended for display in the shell, and there's the actual 
substitute path to which the link resolves. os.readlink() was changed to return 
the substitute path because the display form is not mandated by filesystem 
protocols (it's sometimes missing, especially for junctions) and not reliable 
(e.g. the display path may be a long path or contain reserved names such that 
it's not valid without the \\?\ prefix). It was decided to keep the C 
implementation of os.readlink() simple. Whether to retain the \\?\ prefix was 
shifted to high-level functions that consume the result of os.readlink(), such 
as os.path.realpath().

There was a previous issue related to this, in that the shutil module copies 
symlinks via os.readlink() and os.symlink(), which thus copies only the 
substitute path now. The issue was closed as not a bug, but had it been 
resolved with new functionality, I would have preferred to do so with a 
low-level function to copy a reparse point, not by reverting the behavior of 
os.readlink(). I also see no reason against adding an option to readlink() to 
return the display path instead of the substitute path, or to just remove the 
prefix. But I'd vote against making it the default behavior.

--
components: +Library (Lib)
nosy: +eryksun
versions: +Python 3.10




[issue42957] os.readlink produces wrong result on windows

2021-01-18 Thread simon mackenzie


New submission from simon mackenzie :

os.readlink gives the wrong result on Python 3.8 onwards on Windows:

os.readlink("c:/users/simon/v1")
'\\\\?\\d:\\v1'

It should return 'd:\\v1'.

--
components: Windows
messages: 385218
nosy: paul.moore, simon mackenzie, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: os.readlink produces wrong result on windows
type: behavior
versions: Python 3.8, Python 3.9
