[issue23407] os.walk always follows Windows junctions

2021-05-02 Thread Eryk Sun


Eryk Sun  added the comment:

Windows implements filesystem symlinks and mountpoints as name-surrogate 
reparse points. Python 3.8 introduced behavior changes to how reparse points 
are supported, but the stat st_mode value still sets S_IFLNK only for actual 
symlinks, not for mountpoints. This ensures that if os.path.islink() is true, 
it's safe to read its target and copy it via os.readlink() and os.symlink().

A mountpoint is not equivalent to a symlink in a few cases, so it shouldn't 
always be handled the same or copied as a symlink. The major difference is that 
mountpoints in a remote path are evaluated by the server, whereas symlinks in a 
remote path are evaluated by the client. Also, during path parsing, the target 
of a symlink replaces the opened path, but mountpoints are retained in the 
opened path (except if the target path contains a symlink, but that's broken in 
remote paths and should be avoided). This means that relative ".." components 
and rooted paths in a relative symlink target will traverse a mountpoint as if 
it's just a directory in the opened path. That's an important distinction, but 
in practice I'd steer someone away from relying on it, especially if a 
filesystem is mounted in multiple locations (e.g. on both a DOS drive and a 
directory), else resolution of the symlink will depend on which mountpoint is 
used.

It's best to handle mountpoints as if they're symlinks when deleting a tree 
because the way they're implemented as reparse points doesn't prevent loops. 
However, when walking a tree, you may or may not want to traverse a mountpoint. 
If it's traversed, a seen set() can be used to remember previously traversed 
directories, in order to prevent loops. As Steve mentioned, look to the 
implementation of shutil.rmtree() as an example. 
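
A minimal sketch of the seen-set idea, assuming a directory can be identified by the (st_dev, st_ino) pair from os.stat(). The name walk_once is hypothetical, and this is not how os.walk() or shutil.rmtree() is implemented; it only illustrates how a traversal that follows mountpoints and symlinks can avoid looping.

```python
import os


def walk_once(top):
    """Yield each directory under top once, following links and mountpoints."""
    seen = set()          # (st_dev, st_ino) of directories already visited
    stack = [top]
    while stack:
        path = stack.pop()
        try:
            st = os.stat(path)  # follows symlinks and mountpoints
        except OSError:
            continue            # broken link or inaccessible directory
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue            # already traversed via another path: a loop
        seen.add(key)
        yield path
        try:
            with os.scandir(path) as it:
                # is_dir() follows links by default, so linked directories
                # are queued too and filtered by the seen check above.
                subdirs = [entry.path for entry in it if entry.is_dir()]
        except OSError:
            continue
        stack.extend(subdirs)
```

Each directory is reported under whichever path reaches it first, so the result depends on traversal order.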

However, don't look to shutil.copytree() since it's wrong. The is_symlink() 
method of a scandir() entry is only true for an actual symlink, not a 
mountpoint, so the extra check that copytree() does is redundant. I think it 
was left in by mistake when the plan was to handle mountpoints as symlinks. It 
would be nice if we could copy a mountpoint instead of traversing it in 
copytree(), but the private implementation of _winapi.CreateJunction() isn't 
well-behaved and tested enough to be promoted into the standard library as 
something like os.mount().

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23407] os.walk always follows Windows junctions

2021-05-02 Thread Eryk Sun


Change by Eryk Sun :


--
Removed message: https://bugs.python.org/msg389286




[issue23407] os.walk always follows Windows junctions

2021-03-22 Thread Eryk Sun


Eryk Sun  added the comment:

Python 3.8 introduced some behavior changes to how reparse points are 
supported, but generalized support for handling name-surrogate reparse points 
as symlinks was not implemented. Python continues to set S_IFLNK in st_mode 
only for IO_REPARSE_TAG_SYMLINK reparse points. This ensures that if 
os.path.islink() is true, the link can be read and copied exactly via 
os.readlink() and os.symlink(). Otherwise, islink() could be true but 
readlink() will fail or symlink() will be used to mistakenly copy a mountpoint 
as a symlink. 

A mountpoint is not equivalent to a symlink in a few cases. The major 
difference is that mountpoints are evaluated on the server side in a remote 
path, targeting devices on the server, whereas symlinks are evaluated on the 
client side, targeting devices on the client (e.g. its "C:" drive) and are 
subject to the client system's L2R (local to remote), L2L, R2L, and R2R symlink 
policy. Replacing a mountpoint with a symlink means that, at best, the path 
will no longer work when accessed remotely, and at worst the client will allow 
resolving the target locally to something that's dangerously wrong.

Another difference is how the kernel handles mountpoints when opening a path. 
The target of a mountpoint does not replace the previously traversed path 
components in the opened path, whereas the target path of a symlink does 
replace the opened path. The previously traversed path matters when the kernel 
resolves ".." components in the target of a relative symlink. For example, a 
relative symlink that traverses up the tree with ".." components may have been 
tested on a traversed directory, which worked fine. Then later the directory 
was replaced with a mountpoint (junction) for compatibility, which continued to 
work fine. But after a CopyTree() that naively replaces the mountpoint with a 
symlink, the copied relative symlink is either broken, or worse, it resolves to 
a target that's dangerously wrong.

A generalization of the readlink() and symlink() combination could be 
implemented to copy any type of name-surrogate reparse point. If Python had 
something like that, then it could reasonably support any name-surrogate 
reparse point as a "symlink". That's not without problems, considering the 
behavior isn't the same and APIs and other applications may only support 
IO_REPARSE_TAG_SYMLINK in various cases, but sometimes perfect is the enemy of 
good.

That said, os.walk() can still special case mountpoints and other 
name-surrogate reparse points. To support cases like this, the lstat() result 
was extended to include the st_reparse_tag value of name-surrogate reparse 
points. The stat module has the IO_REPARSE_TAG_SYMLINK and 
IO_REPARSE_TAG_MOUNT_POINT constants. A simple function that checks for a 
name-surrogate reparse point could be added as well -- i.e. bool(reparse_tag & 
0x20000000).
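
A rough sketch of such a check and of how a caller could use it to keep os.walk() out of mountpoints. The name-surrogate flag is bit 29 of the tag (0x20000000, as tested by the IsReparseTagNameSurrogate macro in winnt.h). The helper names is_name_surrogate and walk_no_surrogates are hypothetical, and st_reparse_tag is read with getattr() because it only exists on Windows in Python 3.8+.

```python
import os

NAME_SURROGATE_BIT = 0x20000000  # bit 29 of a reparse tag


def is_name_surrogate(reparse_tag):
    """Return True if the reparse tag has the name-surrogate bit set."""
    return bool(reparse_tag & NAME_SURROGATE_BIT)


def walk_no_surrogates(top):
    """Like os.walk(top), but never descends into name-surrogate reparse points."""
    for root, dirs, files in os.walk(top, followlinks=False):
        keep = []
        for name in dirs:
            try:
                st = os.lstat(os.path.join(root, name))
            except OSError:
                continue  # vanished or inaccessible: skip it
            # st_reparse_tag is absent off Windows; treat that as "no tag".
            if not is_name_surrogate(getattr(st, "st_reparse_tag", 0)):
                keep.append(name)
        # Pruning dirs in place stops the top-down walk from recursing
        # into the removed entries.
        dirs[:] = keep
        yield root, dirs, files
```

Pruned entries are dropped from the yielded dirs list entirely; a caller that still wants to see them would need to report them separately.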

---

Using st_reparse_tag to abstract checking the file type is awkward. I wanted to 
support a keyword-only parameter in Windows to expand the 'symlink' domain to 
include all name-surrogate reparse points. This parameter would have been added 
to os.[l]stat(), DirEntry.stat(), DirEntry.is_dir(), and DirEntry.is_file(), as 
well as os.path.islink() and DirEntry.is_symlink(). By default only 
IO_REPARSE_TAG_SYMLINK would have been handled as a symlink. But this idea 
wasn't accepted. Instead, custom checks have to be implemented whenever a 
problem needs the expanded 'symlink' domain.

--
versions: +Python 3.10, Python 3.8, Python 3.9 -Python 3.7




[issue23407] os.walk always follows Windows junctions

2019-11-18 Thread Steve Dower


Steve Dower  added the comment:

At a minimum, it needs to be turned into a GitHub PR.

We've made some significant changes in this area in 3.8, so possibly the best 
available code is now in shutil.rmtree (or shutil.copytree) rather than the 
older patch files.

--




[issue23407] os.walk always follows Windows junctions

2019-11-15 Thread Jim Carroll


Jim Carroll  added the comment:

I can confirm the os.walk() behavior still exists on 3.8. Just curious on the 
status of the patch?

--
nosy: +jamercee




[issue23407] os.walk always follows Windows junctions

2017-03-10 Thread Craig Holmquist

Changes by Craig Holmquist :


Added file: http://bugs.python.org/file46718/issue23407-5.patch




[issue23407] os.walk always follows Windows junctions

2017-01-14 Thread Eryk Sun

Eryk Sun added the comment:

Craig, can you add a patch for issue 29248, including a test based on the "All 
Users" link?

--
dependencies: +os.readlink fails on Windows




[issue23407] os.walk always follows Windows junctions

2017-01-14 Thread Craig Holmquist

Craig Holmquist added the comment:

New patch with spaces instead of tabs

--
Added file: http://bugs.python.org/file46291/issue23407-4.patch




[issue23407] os.walk always follows Windows junctions

2017-01-14 Thread Craig Holmquist

Craig Holmquist added the comment:

Here's a new patch:  now, _Py_attribute_data_to_stat and Py_DeleteFileW will 
just use the IsReparseTagNameSurrogate macro to determine if the file is a 
link, so os.walk etc. will know not to follow them.  os.readlink, however, will 
only work with junctions and symbolic links; otherwise it will raise ValueError 
with "unsupported reparse tag".

This way, there's a basic level of support for all name-surrogate tags, but 
os.readlink only works with the ones whose internal structure is (semi-) 
documented.

--
Added file: http://bugs.python.org/file46289/issue23407-3.patch




[issue23407] os.walk always follows Windows junctions

2017-01-13 Thread Craig Holmquist

Craig Holmquist added the comment:

FWIW, the only name-surrogate tags in the user-mode SDK headers (specifically 
winnt.h) are IO_REPARSE_TAG_MOUNT_POINT and IO_REPARSE_TAG_SYMLINK, as of at 
least the Windows 8.1 SDK.

--




[issue23407] os.walk always follows Windows junctions

2017-01-12 Thread Eryk Sun

Eryk Sun added the comment:

I simply listed the tags that have the name-surrogate bit set out of those 
defined in km\ntifs.h. 

To keep things simple it might be better to only include Microsoft tags (i.e. 
bit 31 is set). That way we don't have to deal with the REPARSE_GUID_DATA_BUFFER 
struct that's used for non-Microsoft reparse points.
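
As an illustration only, the combined test could look like the sketch below. The function name is hypothetical, and the third-party tag value in the usage lines is a made-up example, not a real registered tag.

```python
MICROSOFT_BIT = 0x80000000       # bit 31: tag is owned by Microsoft
NAME_SURROGATE_BIT = 0x20000000  # bit 29: tag is a name surrogate


def is_microsoft_name_surrogate(tag):
    """Return True only if both the Microsoft and name-surrogate bits are set."""
    mask = MICROSOFT_BIT | NAME_SURROGATE_BIT
    return (tag & mask) == mask


# IO_REPARSE_TAG_MOUNT_POINT and IO_REPARSE_TAG_SYMLINK qualify.
print(is_microsoft_name_surrogate(0xA0000003))
print(is_microsoft_name_surrogate(0xA000000C))
# Hypothetical third-party name-surrogate tag: bit 31 is clear.
print(is_microsoft_name_surrogate(0x20000001))
```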

--




[issue23407] os.walk always follows Windows junctions

2017-01-12 Thread Craig Holmquist

Craig Holmquist added the comment:

Can you point me toward any documentation on the additional tags you want to 
support?  Searching for IO_REPARSE_TAG_IIS_CACHE mostly seems to yield header 
files that define it (and nothing at all on MSDN), and the non-Microsoft tags 
just yield a few results each.

(For comparison, the junction and symbolic link tags yield 10K+ results each.)

Junctions are created with each user's home directory so they exist on every 
Windows system, even if the user never explicitly creates them.  The additional 
tags seem like they're far less common and much less well-documented.

--




[issue23407] os.walk always follows Windows junctions

2017-01-12 Thread Eryk Sun

Eryk Sun added the comment:

I opened issue 29248 for the os.readlink bug and issue 29250 for the 
inconsistency between os.path.islink and os.stat.

Handling junctions as links is new behavior, so I've changed this issue to be 
an enhancement for 3.7.

If the notion of a link is generalized to junctions, then maybe it should be 
further generalized to include all name-surrogate reparse tags [1]. Currently 
for Microsoft tags this includes 

IO_REPARSE_TAG_MOUNT_POINT (junctions)
IO_REPARSE_TAG_SYMLINK
IO_REPARSE_TAG_IIS_CACHE

For non-Microsoft tags it includes 

IO_REPARSE_TAG_SOLUTIONSOFT
IO_REPARSE_TAG_OSR_SAMPLE
IO_REPARSE_TAG_QI_TECH_HSM
IO_REPARSE_TAG_MAXISCALE_HSM

The last two are outliers. HSM isn't the kind of immediate, fast access that 
one would expect from a symbolic link. All other HSM tags aren't categorized as 
name surrogates.

[1]: https://msdn.microsoft.com/en-us/library/aa365511

--
type: behavior -> enhancement
versions: +Python 3.7 -Python 3.4, Python 3.5




[issue23407] os.walk always follows Windows junctions

2017-01-11 Thread Eric Fahlgren

Eric Fahlgren added the comment:

> # Junctions are not recognized as links.
> self.assertFalse(os.path.islink(self.junction))

If the above comment is intended as a statement of fact, then it's inconsistent 
with the implementation of Py_DeleteFileW ( 
https://hg.python.org/cpython/file/v3.6.0/Modules/posixmodule.c#l4178 ).

--
nosy: +eric.fahlgren




[issue23407] os.walk always follows Windows junctions

2016-09-25 Thread Craig Holmquist

Craig Holmquist added the comment:

Updated patch with changes to Win32JunctionTests.

--
Added file: http://bugs.python.org/file44824/issue23407-2.patch




[issue23407] os.walk always follows Windows junctions

2016-09-25 Thread Craig Holmquist

Craig Holmquist added the comment:

Actually, it looks like there is already a way to create junctions and a test 
for them in test_os.  However, it includes this line:

# Junctions are not recognized as links.
self.assertFalse(os.path.islink(self.junction))

That suggests the old behavior is intentional--does anyone know why?

--




[issue23407] os.walk always follows Windows junctions

2016-09-25 Thread Craig Holmquist

Craig Holmquist added the comment:

The attached patch changes _Py_attribute_data_to_stat to set S_IFLNK for both 
symlinks and junctions, and changes win_readlink to return the target path for 
junctions (IO_REPARSE_TAG_MOUNT_POINT) as well as symlinks.

I'm not sure what to do as far as adding a test--either Python needs a way to 
create junctions or the test needs to rely on the ones Windows creates by 
default.

Incidentally, the existing win_readlink doesn't always work correctly with 
symbolic links, either (this is from 3.5.2):  

>>> import os
>>> os.readlink(r'C:\Users\All Users')
'\x00\x00f\x00\u0201\x00\x02\x00\x00\x00f\x00\x00\x00'

The problem is that PrintNameOffset is an offset in bytes, so it needs to be 
divided by sizeof(WCHAR) if you're going to add it to a WCHAR pointer 
(https://msdn.microsoft.com/en-us/library/windows/hardware/ff552012(v=vs.85).aspx).
Some links still seem to work correctly because PrintNameOffset is 0.  The 
attached patch fixes this problem also--I wasn't sure if I should open a 
separate issue for it.
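
To illustrate the byte-offset point outside of C, here is a small sketch that builds a synthetic symlink REPARSE_DATA_BUFFER with the struct module and reads the print name back. The helper names are hypothetical and the buffer is constructed in memory rather than fetched from a real link; the layout follows the documented structure (8-byte header, then four USHORT fields and a ULONG Flags field, then PathBuffer).

```python
import struct

IO_REPARSE_TAG_SYMLINK = 0xA000000C
HEADER = "<LHH"      # ReparseTag, ReparseDataLength, Reserved
SYMLINK = "<HHHHL"   # SubstituteName offset/length, PrintName offset/length, Flags


def build_symlink_buffer(substitute, print_name):
    """Build a synthetic symlink reparse buffer (for demonstration only)."""
    sub = substitute.encode("utf-16-le")
    prn = print_name.encode("utf-16-le")
    # Offsets and lengths are stored in bytes, relative to PathBuffer.
    body = struct.pack(SYMLINK, 0, len(sub), len(sub), len(prn), 0) + sub + prn
    return struct.pack(HEADER, IO_REPARSE_TAG_SYMLINK, len(body), 0) + body


def read_print_name(buf):
    """Extract the print name, treating PrintNameOffset as a byte offset."""
    tag, _data_len, _reserved = struct.unpack_from(HEADER, buf, 0)
    if tag != IO_REPARSE_TAG_SYMLINK:
        raise ValueError("unsupported reparse tag")
    fields_at = struct.calcsize(HEADER)
    _sub_off, _sub_len, prn_off, prn_len, _flags = struct.unpack_from(
        SYMLINK, buf, fields_at)
    path_buffer_at = fields_at + struct.calcsize(SYMLINK)
    start = path_buffer_at + prn_off   # byte offset, no scaling needed here
    return buf[start:start + prn_len].decode("utf-16-le")


buf = build_symlink_buffer(r"\??\C:\ProgramData", r"C:\ProgramData")
print(read_print_name(buf))
```

In C the same arithmetic on a WCHAR pointer needs the offset divided by sizeof(WCHAR), because pointer addition there counts elements rather than bytes.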

--
keywords: +patch
Added file: http://bugs.python.org/file44823/issue23407.patch




[issue23407] os.walk always follows Windows junctions

2015-02-07 Thread Craig Holmquist

New submission from Craig Holmquist:

os.walk follows Windows junctions even if followlinks is False:

>>> import os
>>> appdata = os.environ['LOCALAPPDATA']
>>> for root, dirs, files in os.walk(appdata, followlinks=False):
...     print(root)
...

C:\Users\Test\AppData\Local
C:\Users\Test\AppData\Local\Apple
C:\Users\Test\AppData\Local\Apple\Apple Software Update
C:\Users\Test\AppData\Local\Apple Computer
C:\Users\Test\AppData\Local\Apple Computer\iTunes
C:\Users\Test\AppData\Local\Application Data
C:\Users\Test\AppData\Local\Application Data\Apple
C:\Users\Test\AppData\Local\Application Data\Apple\Apple Software Update
C:\Users\Test\AppData\Local\Application Data\Apple Computer
C:\Users\Test\AppData\Local\Application Data\Apple Computer\iTunes
C:\Users\Test\AppData\Local\Application Data\Application Data
C:\Users\Test\AppData\Local\Application Data\Application Data\Apple
C:\Users\Test\AppData\Local\Application Data\Application Data\Apple\Apple Software Update
C:\Users\Test\AppData\Local\Application Data\Application Data\Apple Computer
C:\Users\Test\AppData\Local\Application Data\Application Data\Apple Computer\iTunes
C:\Users\Test\AppData\Local\Application Data\Application Data\Application Data
C:\Users\Test\AppData\Local\Application Data\Application Data\Application Data\Apple
C:\Users\Test\AppData\Local\Application Data\Application Data\Application Data\Apple\Apple Software Update
C:\Users\Test\AppData\Local\Application Data\Application Data\Application Data\Apple Computer
C:\Users\Test\AppData\Local\Application Data\Application Data\Application Data\Apple Computer\iTunes
C:\Users\Test\AppData\Local\Application Data\Application Data\Application Data\Application Data
[...]

For directory symbolic links, os.walk seems to have the correct behavior.  
However, Windows 7 (at least) employs junctions instead of symlinks in 
situations like the default user profile layout, i.e. the Application Data 
junction shown above.

I also noticed that, for junctions, os.path.islink returns False but os.stat 
and os.lstat return different results.
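
One way to observe that mismatch is sketched below. The function name describe_link_state is hypothetical, and comparing the two results with os.path.samestat() is only a heuristic, assuming the platform reports meaningful st_dev and st_ino values.

```python
import os


def describe_link_state(path):
    """Report how islink(), stat() and lstat() see the same path."""
    st = os.stat(path)    # follows links
    lst = os.lstat(path)  # does not follow links
    return {
        "islink": os.path.islink(path),
        # True when stat() and lstat() resolved to different files.
        "stat_differs": not os.path.samestat(st, lst),
        # Only present on Windows with Python 3.8+; 0 elsewhere.
        "reparse_tag": getattr(lst, "st_reparse_tag", 0),
    }
```

For a path like the Application Data junction above, the reported case would correspond to islink being False while stat_differs is True.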

--
components: Library (Lib)
messages: 235531
nosy: craigh
priority: normal
severity: normal
status: open
title: os.walk always follows Windows junctions
type: behavior
versions: Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23407
___



[issue23407] os.walk always follows Windows junctions

2015-02-07 Thread eryksun

eryksun added the comment:

To check for a link on Windows, os.walk calls ntpath.islink, which calls 
os.lstat. Currently the os.lstat implementation only sets S_IFLNK for symbolic 
links. attribute_data_to_stat could also check for junctions 
(IO_REPARSE_TAG_MOUNT_POINT). For consistency, os.readlink should also read 
junctions (rdb->MountPointReparseBuffer).

islink
https://hg.python.org/cpython/file/7b493dbf944b/Lib/ntpath.py#l239

attribute_data_to_stat
https://hg.python.org/cpython/file/7b493dbf944b/Modules/posixmodule.c#l1515

win_readlink
https://hg.python.org/cpython/file/7b493dbf944b/Modules/posixmodule.c#l10056

REPARSE_DATA_BUFFER
https://hg.python.org/cpython/file/7b493dbf944b/Modules/winreparse.h#l11

--
components: +Windows
nosy: +eryksun, steve.dower, tim.golden, zach.ware
versions: +Python 3.5
