Eryk Sun <eryk...@gmail.com> added the comment:

Here are two additional differences between mount points and symlinks:

(1) A mount point in a remote path is always evaluated on the server and 
restricted to devices that are local to the server. So if we handle a mount 
point as if it's a POSIX symlink that works with readlink(), then what are we 
to do with the server's drive "Z:"? Genuine symlinks are evaluated on the 
client, so readlink() always makes sense. (Though if we resolve a symlink 
manually, then we're bypassing the system's remote-to-local (R2L) symlink 
policy.)

(2) A mount point has its own security that's checked in addition to the 
security on the target directory when it's reparsed. In contrast, security set 
on a symlink is not checked when the link is reparsed, which is why icacls.exe 
implicitly resolves a symlink when setting and viewing security unless the /L 
option is used.

>  - if it's a directory junction, call os.stat instead and return that
> (???)

I originally wanted lstat in Windows to traverse mount points by default, as 
it does in Unix (I've since given up on this), because a mount point behaves 
like a hard name grafted into a path. This is important for relative symlinks 
that use ".." components to traverse above their parent directory. The result 
is different from a directory symlink that targets the same path.
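
To make the ".." difference concrete, here is a minimal sketch that uses only 
path arithmetic. The paths, the junction "C:\a\J", and the helper 
resolve_relative_target are hypothetical. It assumes, per the paragraph above, 
that a mount point stays in the path as a name, while a directory symlink is 
replaced by its target before the relative link is evaluated.

```python
import ntpath


def resolve_relative_target(opened_dir, rel_target):
    """Join a relative link target to the directory it was reached through."""
    return ntpath.normpath(ntpath.join(opened_dir, rel_target))


# Hypothetical layout: C:\x\y contains a relative symlink whose target is
# "..\z". C:\x\y is reachable through a mount point or a directory symlink.
real_dir = r"C:\x\y"
mount_point = r"C:\a\J"   # assumed junction that targets C:\x\y
rel_target = r"..\z"

# Mount point: the junction name is kept, so ".." climbs to C:\a.
via_mount_point = resolve_relative_target(mount_point, rel_target)

# Directory symlink: the link is replaced by its target first, so ".."
# climbs to C:\x.
via_dir_symlink = resolve_relative_target(real_dir, rel_target)

print(via_mount_point)
print(via_dir_symlink)
```

This models only the name arithmetic. It doesn't touch the file system.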

A counter-argument (in favor of winlinks) is that a mount point is still 
ultimately a name-surrogate reparse point, so, unlike a hard link, its 
existence doesn't prevent the directory from being deleted. It's left in place 
as a dangling link if the target is deleted or the device is removed from the 
system. Trying to follow it fails with ERROR_PATH_NOT_FOUND or 
ERROR_FILE_NOT_FOUND. 

Also, handling a mount point as a directory by default would require an 
additional parameter because in some cases we need to be able to open a 
junction instead of traversing it, such as to implement shutil.rmtree to behave 
like CMD's `rmdir /s`. 

Unfortunately, realpath() is another place where identifying a mount point is 
required. Ideally we would be able to handle mount points as just 
directories. The problem is that NT allows a mount point to target a symlink, 
something that's not allowed in Unix. Traversing the mount point is effectively 
the same as traversing the symlink. So we have to read the mount-point target, 
and if it's a symlink, we have to read and evaluate it. (Consequently it seems 
that getting the real path for a remote path is an intractable problem when 
mount points are involved. We can only get the final path.)
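
As a toy model of that evaluation order, the sketch below resolves a path 
against an in-memory table of reparse points instead of a real file system. 
The table LINKS, the paths, and the function toy_realpath are all 
hypothetical. The point is only that a junction whose target is a symlink 
needs a second round of evaluation.

```python
import ntpath

# Hypothetical reparse points: path -> (kind, target).
LINKS = {
    r"C:\mount": ("junction", r"C:\link"),   # mount point that targets a symlink
    r"C:\link": ("symlink", r"C:\real"),
}


def toy_realpath(path, links, max_hops=32):
    """Resolve every link in `path` using the in-memory `links` table."""
    path = ntpath.normpath(path)
    for _ in range(max_hops):
        drive, rest = ntpath.splitdrive(path)
        prefix = drive + "\\"
        names = [name for name in rest.split("\\") if name]
        for i, name in enumerate(names):
            prefix = ntpath.join(prefix, name)
            if prefix in links:
                _kind, target = links[prefix]
                # Substitute the target, keep the remaining components,
                # then restart the scan from the beginning.
                path = ntpath.normpath(ntpath.join(
                    ntpath.dirname(prefix), target, *names[i + 1:]))
                break
        else:
            return path  # no component is a link any more
    raise OSError("too many levels of links")


resolved = toy_realpath(r"C:\mount\file.txt", LINKS)
print(resolved)
```

Reading only the junction target would stop at "C:\link\file.txt". The second 
pass is what reaches the real directory.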

---

Even without the addition of a new parameter, we may still want to limit the 
definition of 'link' in Windows lstat to name-surrogate reparse points, i.e. 
winlinks. Reparse points that aren't name surrogates don't behave like links. 
They behave like the file itself, and reparsing may automatically replace the 
reparse point with the real file. Some of them are even directories that have 
the directory bit (28) set in the tag value, which means they're allowed to 
contain other files. (Without the directory tag bit, setting a reparse point on 
a non-empty directory should fail.)

The counter-argument to changing lstat to only open winlinks is that changing 
the meaning of 'link' in lstat is too disruptive to existing software that may 
depend on the old behavior, i.e. opening any reparse point. I think the use 
cases for opening non-links are rare enough that it's not beyond the pale to 
change this behavior in 3.8 or 3.9.
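
For code that wants the narrower definition today, a minimal sketch of a 
winlink test built on os.lstat follows. The helper name is_winlink is 
hypothetical, and it assumes the stat result exposes st_file_attributes and 
st_reparse_tag (the latter is new in 3.8). On other platforms both are 
missing and the helper returns False.

```python
import os
import stat

NAME_SURROGATE_BIT = 0x2000_0000  # bit 29 of the reparse tag


def is_winlink(path):
    """Return True if `path` itself is a name-surrogate reparse point."""
    try:
        st = os.lstat(path)
    except OSError:
        return False
    # Both fields are Windows-only, so default to 0 elsewhere.
    attrs = getattr(st, "st_file_attributes", 0)
    tag = getattr(st, "st_reparse_tag", 0)
    is_reparse_point = bool(attrs & stat.FILE_ATTRIBUTE_REPARSE_POINT)
    return is_reparse_point and bool(tag & NAME_SURROGATE_BIT)
```

A caller could use this in place of os.path.islink() when it wants to treat 
junctions and symlinks alike but leave other reparse points alone.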

> Right, but is that because they deliberately want the junction 
> to be treated like a file? Or because they want it to be treated 
> like the directory is really right there?

For copytree it makes sense to traverse a mount point as a directory. We can't 
reliably copy a mount point. In Unix, even when a volume mount or bind mount 
can be detected, there's no standard way to clone it to a new mount point, and 
even if there were, that would require super-user access. In Windows, we could 
wrap CreateDirectoryExW, which can copy a mount point, but it requires 
administrator access to copy a volume mount point (i.e. 
"\\\\?\\Volume{...}\\"), for which it calls SetVolumeMountPointW in order to 
update the mount-point manager in the kernel. 

We also have a limited ability to create mount points via 
_winapi.CreateJunction, but it's buggy in corner cases and incomplete. It 
suffices for the reason it was added -- testing the ability to delete a 
junction via os.remove(). 

> os.rmdir() already does special things to behave like a junction 
> rather than the real directory, 

This is similar in spirit to Unix, except Unix refuses to delete a mount point. 
For example, if we have a Unix bind mount to a non-empty directory, rmdir() 
fails with EBUSY. On the other hand, rmdir() on the real directory fails with 
ENOTEMPTY. If Unix handled the mount point as if it's just the mounted 
directory, I'd expect the error to be the same. 
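
A minimal sketch of telling the two failures apart by errno follows. The 
helper name describe_rmdir is hypothetical. The EBUSY branch can't be 
exercised without an actual mount, so the demonstration covers only the 
non-empty case. EEXIST is accepted as well because POSIX allows either code 
for a non-empty directory.

```python
import errno
import os
import tempfile


def describe_rmdir(path):
    """Try os.rmdir(path) and describe the outcome."""
    try:
        os.rmdir(path)
    except OSError as e:
        if e.errno == errno.EBUSY:
            return "mount point"
        if e.errno in (errno.ENOTEMPTY, errno.EEXIST):
            return "non-empty directory"
        raise
    return "removed"


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "d")
    os.mkdir(target)
    # Put one file in the directory so that rmdir must fail.
    open(os.path.join(target, "f"), "w").close()
    outcome = describe_rmdir(target)

print(outcome)
```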

It's not particularly special in Windows unless it's a volume mount point. Then 
RemoveDirectoryW tries to call DeleteVolumeMountPointW. This could be a case 
where it would fail to remove a mount point, just like Unix. But the internal 
DeleteVolumeMountPointW call is allowed to fail if the caller doesn't have 
access to update the mount-point manager, in which case it removes the junction 
anyway.

The consequence of failing to update the mount-point manager is that 
GetFinalPathNameByHandleW calls will subsequently return a non-existing path 
for a volume that was mounted only in the deleted folder (i.e. the volume isn't 
also assigned a drive letter). Thus we can't assume the result from 
GetFinalPathNameByHandleW exists. This just pertains to volume mount points, 
which are special to the mount-point manager because it uses them to translate 
a native device path into a canonical DOS path. Bind mount points have no 
special significance to the mount-point manager. 

> the islink/readlink/symlink process is going to be problematic on 
> Windows since most users can't create symlinks. 

Then copying the symlink fails, which I think is better than silently 
transforming the behavior from a mount point to a symlink. Defensive code can 
fall back on physically copying the target file or directory. 
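
A minimal sketch of that defensive fallback follows. The helper name 
copy_link_or_target is hypothetical, and catching every OSError is a 
simplification. Real code would probably check for the missing-privilege 
error specifically.

```python
import os
import shutil


def copy_link_or_target(src, dst):
    """Recreate `src` at `dst` as a symlink, else copy what it points to."""
    if os.path.islink(src):
        try:
            os.symlink(os.readlink(src), dst,
                       target_is_directory=os.path.isdir(src))
            return "link"
        except OSError:
            # For example, the caller lacks the symlink privilege on
            # Windows. Fall through and copy the target instead.
            pass
    if os.path.isdir(src):
        shutil.copytree(src, dst)
    else:
        shutil.copy2(src, dst)
    return "copy"
```

The return value tells the caller which path was taken, so a change from 
link to physical copy is at least visible instead of silent.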

The latter is the default behavior for copytree. It's only an issue if code 
calls copytree(src, dst, symlinks=True). 

However, it's always a concern with shutil.move(), which attempts to move a 
file via os.rename. This fails for a cross-volume rename. Then if islink() is 
true, it falls back on os.symlink(os.readlink(src), real_dst) and 
os.unlink(src). 

(On my own systems, I grant the symlink privilege to the Authenticated Users 
group, which allows symlink creation by standard users and administrators -- 
elevated or not. But in general, a fear of symlinks is warranted, even in Unix.)

> I'm proposing to fix the inconsistency by fixing the flags. Your
> proposal is to fix the inconsistency by generating a new error in 
> unlink()? (Just clarifying.)

unlink() didn't remove junctions prior to 3.5 (see issue 18314). 
Instead of rolling back the change, or conflating the meaning of S_IFLNK, a 
counter-proposal is to harmonize unlink with the proposed change to lstat, i.e. 
to allow removing all name-surrogate directories. A name-surrogate directory 
cannot have children in the directory itself, so allowing it for os.unlink is 
in the spirit of the function, even if doing so is inconsistent with the 
literal specification. 

This restriction is documented in ntifs.h:

    D [bit 28] is the directory bit. When set to 1, indicates that any
    directory with this reparse tag can have children. Has no special
    meaning when used on a non-directory file. Not compatible with the
    name surrogate bit [bit 29].

Regarding the directory bit, the registered tags with this bit are 
IO_REPARSE_TAG_CLOUD*, IO_REPARSE_TAG_WCI_1, and IO_REPARSE_TAG_PROJFS (for 
projected file systems).
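
The two bits can be checked with simple masks. The sketch below is only an 
illustration: the helper names are hypothetical, and the numeric tag values 
are quoted from memory of the SDK headers, so treat them as assumptions and 
verify them against ntifs.h.

```python
NAME_SURROGATE_BIT = 1 << 29  # 0x2000_0000
DIRECTORY_BIT = 1 << 28       # 0x1000_0000

# Tag values are assumptions; verify them against the SDK headers.
TAGS = {
    "IO_REPARSE_TAG_MOUNT_POINT": 0xA0000003,
    "IO_REPARSE_TAG_SYMLINK": 0xA000000C,
    "IO_REPARSE_TAG_CLOUD": 0x9000001A,
    "IO_REPARSE_TAG_WCI_1": 0x90001018,
    "IO_REPARSE_TAG_PROJFS": 0x9000001C,
}


def is_name_surrogate(tag):
    """Mirror the IsReparseTagNameSurrogate macro (bit 29)."""
    return bool(tag & NAME_SURROGATE_BIT)


def has_directory_bit(tag):
    """True if a directory with this tag may have children (bit 28)."""
    return bool(tag & DIRECTORY_BIT)


for name, tag in TAGS.items():
    print(f"{name}: surrogate={is_name_surrogate(tag)}, "
          f"directory={has_directory_bit(tag)}")
```

No tag in the table sets both bits, which matches the "Not compatible" note 
in the header comment.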

> Currently Windows shutil.rmtree traverses into junctions and deletes 
> everything, though it then succeeds to delete the junction. 

That's like Unix mount-point behavior, except Windows allows a volume mount 
point to be deleted (not just a bind mount point), despite negative 
consequences to API functions such as GetFinalPathNameByHandleW if the user 
isn't allowed to update the system database of volume mount points.

An issue here, and with all code that walks a tree (especially destructively), 
is the link behavior of mount points. Bind mount points have the same problem 
in both Unix and Windows. For example, shutil.rmtree will fail to remove a 
mount point that targets a directory that it already removed. It's a different 
OSError in Unix vs Windows (EBUSY vs ENOENT or ERROR_PATH_NOT_FOUND), but an 
error all the same. That in itself is not an argument to handle a junction as a 
symlink, because it's still a mount point that behaves as such, even if someone 
is using it as a symlink. However, it is an argument for special handling of 
winlinks, which would allow the Windows implementation to behave better than 
Unix, IMO, in addition to helping Windows users that are forced to use mount 
points instead of symlinks.

> With my change, rmtree() directly on a junction now raises (could be 
> fixed?) but rmtree on a directory containing a junction will remove 
> the junction without touching the target directory. So I think we're 
> both happy about this one.

Changing rmtree to work on a target directory that claims to be a symlink would 
require special casing Windows in shutil.rmtree. But in general this is a 
problem that affects all code that looks for symlinks, not just code in the 
standard library.

If the meaning of S_IFLNK remains the same, then existing code has the option 
of being upgraded to delete directory winlinks without traversing them, but 
nothing is forced on them. In this case, for example, we could wrap the 
os.scandir call:

    if not _WINDOWS:
        _rmtree_unsafe_scandir = os.scandir
    else:
        import contextlib

        def _rmtree_unsafe_scandir(path):
            # Inspect the path itself, not whatever it targets.
            try:
                st = os.lstat(path)
                attr, tag = st.st_file_attributes, st.st_reparse_tag
            except OSError:
                # Let the os.scandir call below report the error.
                attr = tag = 0
            if (attr & stat.FILE_ATTRIBUTE_DIRECTORY
                  and attr & stat.FILE_ATTRIBUTE_REPARSE_POINT
                  and tag & 0x2000_0000): # IsReparseTagNameSurrogate
                # Directory winlink: report no children, don't traverse.
                return contextlib.nullcontext([])
            else:
                return os.scandir(path)

For a directory winlink, the above _rmtree_unsafe_scandir function returns a 
context manager that yields an empty list, so _rmtree_unsafe skips to 
os.rmdir(path). This reproduces the behavior of CMD's `rmdir /s`, which will 
not traverse any name-surrogate reparse point (it checks the tag for the 
name-surrogate bit) even if the reparse point is the target directory.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37834>
_______________________________________