Eryk Sun <[email protected]> added the comment:
> Okay, I get it now. So we _do_ want to "upgrade" lstat() to stat()
> when it's not a symlink.
I don't see that as a behavior upgrade. It's just an implementation detail.
lstat() is still following its mandate to not follow symlinks -- however you
ultimately define what a "symlink" is in this context in Windows.
> I don't want to add any parameters - I want to have predictable and
> reasonable default behaviour. os.readlink() already exists for
> "open reparse point" behaviour.
I'd appreciate a parameter to always open reparse points, even if a
filter-driver or the I/O manager handles them.
I'm no longer a big fan of mapping "follow_symlinks" to name surrogates (I used
to like this idea a couple years ago), or splitting hairs regarding
volume-mount-point junctions and bind-like junctions (used to like this too a
year ago, because some projects do this, before I thought about the deeper
concerns). But it's not up to me. If follow_symlinks means name surrogates, at
least then lstat can open any reparse point that claims to link to another path
and thus *should* have link-like behavior (hard link or soft link).
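A minimal sketch of the name-surrogate test discussed above, assuming the tag values and the bit tested by the IsReparseTagNameSurrogate() macro in the Windows SDK headers. The helper name is_name_surrogate is hypothetical and is not part of the os module.

```python
# Reparse tag values as defined in the Windows SDK headers.
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003  # junction (including volume mount points)
IO_REPARSE_TAG_SYMLINK = 0xA000000C      # symbolic link
IO_REPARSE_TAG_APPEXECLINK = 0x8000001B  # app execution alias (not a surrogate)

# Bit 29 is the flag that IsReparseTagNameSurrogate() tests.
NAME_SURROGATE_BIT = 0x20000000


def is_name_surrogate(reparse_tag):
    """Return True if the tag claims to link to another path.

    Hypothetical helper for illustration only.
    """
    return bool(reparse_tag & NAME_SURROGATE_BIT)
```

Under this reading, follow_symlinks=False would open symlinks and junctions without following them, while a tag such as the app execution alias would still be reparsed.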
For example, we are able to move, rename, and delete symlinks and junctions
without affecting the target (except that for a junction that's a volume mount
point, Windows will try DeleteVolumeMountPointW, which can have side effects;
failure is ignored and the directory is deleted anyway). This is implemented by
the Windows API opening the reparse point and checking for symlink and junction
tags. It reparses other tags, regardless of whether they're name surrogates,
but I assume name-surrogate reparse points should be implemented by their
owning filter drivers to behave in a similar fashion for actions such as rename
and delete.
While deleting a name-surrogate reparse point should have no effect on the
target, it still might have unintended consequences. For example, it might
revive a 'deleted' file in a VFS for Git repo if we delete the tombstone
reparse point that marks a file that's supposed to be 'deleted'. This might
happen if code checks os.lstat(filename) and decides to delete the file in a
non-standard way that ensures only a reparse point is deleted, e.g.
CreateFileW(filename, ..., FILE_FLAG_DELETE_ON_CLOSE |
FILE_FLAG_OPEN_REPARSE_POINT, NULL), or manually setting the
FileDispositionInfo. (DeleteFileW would fail with a file-not-found error
because it would reparse the tombstone.) Now it's in for a surprise because the
file exists again in the projected filesystem, even though it was just
'deleted'. This is in theory. I haven't experimented with projected file
systems to determine whether they actually allow opening a tombstone reparse
point when using FILE_FLAG_OPEN_REPARSE_POINT. I assume they do,
like any other reparse point, unless there's deeper magic involved here.
The questions for me are whether os.readlink() should also read junctions and
exactly what follow_symlinks means in Windows. We have a complicated story to
tell if follow_symlinks=False (lstat) opens any reparse point or opens just
name-surrogate reparse points, and islink() is made consistent with this, but
then readlink() doesn't work.
If junctions are handled as symlinks, then islink(), readlink(), symlink()
would be used to copy a junction 'link' while copying a tree (e.g.
shutil.copytree with symlinks=True). This would transform junctions into
directory symlinks. In this case, we potentially have a problem that relative
symlinks in the tree no longer target the same files when accessed via a
directory symlink instead of a junction. No one thinks about this problem on
the POSIX side because it would be weird to copy a mountpoint as a symlink. In
POSIX, a mountpoint is always seen as just a directory and always traversed.
> I'm still not convinced that this is what we want to do. I don't
> have a true Linux machine handy to try it out (Python 3.6 and 3.7 on
> WSL behave exactly like the semantics I'm proposing, but that may
> just be because it's the Windows kernel below it).
If you're accessing NT junctions under WSL, in that environment they're always
handled as symlinks. And the result of my "C:/Junction" and "C:/Symlink"
example -- i.e. "/mnt/c/Junction" and "/mnt/c/Symlink" -- is that *both*
behave the same way, which is expected since the WSL environment sees both
as symlinks, but is also fundamentally wrong. In an NT process, they behave
differently, as a mount point (hard name grafting) and a symlink (soft name
grafting). This is a decision in WSL's drvfs file-system driver, and I have to
assume it's intentional.
In a perfect world, a path on the volume should be consistently evaluated,
regardless of whether it's accessed from a WSL or NT process. But it's also a
difficult problem, maybe intractable, if they want to avoid Linux programs
traversing junctions in dangerous operations -- e.g. `rm -rf`. The only name
surrogate that POSIX programs know about is a symlink (so simple). I can see
why they chose to handle junctions as symlinks, as a conservative, safe option,
even if it leads to inconsistencies.
> ismount() is currently not true for junctions. And I can't find any
> reference that says that POSIX symlinks can't point to directories,
Our current ismount() implementation for junctions is based on
GetVolumePathNameW, so it will be true for junctions that use the
"Volume{...}" name to mount the file-system root directory. That's a volume
mount point.
I don't know why someone decided that's the sum total of "mount point" in
Windows. DOS drives and UNC drives can refer to arbitrary file system
directories. They don't have to refer to a file-system root directory. We can
have "W:" -> "\\??\\C:\\Windows", etc.
Per the docs, a mount point for ismount() is a "point in a file system where a
different file system has been mounted". The mounted directory doesn't have to
be the root directory of the file system. I'd relax this definition to include
all "hard" name grafting links to other directories, even within the same file
system. What matters to me is the semantics of how this differs from the soft
name grafting of a symlink.
Note that GetVolumePathNameW is expensive and has bugs with subst drives, which
we're not able to avoid unless someone happens to check the drive root
directory, i.e. "W:/". It will claim that "W:/System32" is a volume path if
"W:" is a subst drive for "C:/Windows". It also has a bug where a drive root is
reported as a mount point even if the drive doesn't exist. Also, it's wrong in not
checking for junctions in UNC paths. SMB supports opening reparse points over
the wire.
If follow_symlinks=False applies to name surrogates, then a junction would be
detectable via os.lstat(filename).st_reparse_tag, which is not only much
cheaper than GetVolumePathNameW, but also more generally correct and consistent
with DOS and UNC drive mount points.
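A minimal sketch of that check, assuming os.lstat() opens the junction itself and exposes st_reparse_tag on Windows. The helper name is_junction is hypothetical. On platforms or Python versions where the attribute is missing, the sketch simply reports False.

```python
import os

# Tag value for junctions, as defined in the Windows SDK headers.
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003


def is_junction(path):
    """Return True if path is a junction, judged by its reparse tag.

    Hypothetical helper: relies on lstat() not following the junction.
    """
    try:
        st = os.lstat(path)
    except (OSError, ValueError):
        # Missing or inaccessible paths are not junctions.
        return False
    # st_reparse_tag is assumed to exist only on Windows; default to 0.
    return getattr(st, "st_reparse_tag", 0) == IO_REPARSE_TAG_MOUNT_POINT
```

This needs one lstat() call and no volume lookup, and the same test applies to a junction reached through a subst or UNC drive.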
> nor any evidence that we suppress symlink-to-directory creation or
> resolution in Python (also tested on WSL)..
S_IFDIR is suppressed for directory symlinks in the stat result. But
os.path.isdir() is supposed to be based on os.stat, and thus follows symlinks.
To that end, our nt._isdir is broken because it assumes GetFileAttributesW is
sufficient. Since we're supposed to follow links, it's not working right for
link targets that don't exist. It should return False in that case.
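A minimal sketch of the behavior described above, assuming isdir() is defined purely in terms of os.stat(). The name isdir_via_stat is hypothetical, and this is not the nt._isdir implementation.

```python
import os
import stat


def isdir_via_stat(path):
    """Return True only if path, after following links, is a directory.

    Hypothetical helper: a link whose target doesn't exist makes os.stat()
    fail, so the result is False in that case.
    """
    try:
        st = os.stat(path)  # follows symlinks by default
    except (OSError, ValueError):
        return False
    return stat.S_ISDIR(st.st_mode)
```

With this definition, a directory symlink whose target has been removed is not reported as a directory, even though the link itself still carries the directory attribute.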
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue37834>
_______________________________________