Eryk Sun <eryk...@gmail.com> added the comment:

> Where "links" are the generic term for the set that includes 
> "reparse point", "symlink", "mount point", "junction", etc.)

Why group all reparse points under the banner of 'link'? If we have a typical 
HSM reparse point (the vast majority of allocated tags), then all operations, 
such as delete and rename, act on the file itself, not simply the reparse 
point. We should be able to delete or rename a link without affecting the 
target. 

In this case, there's also no chance that the reparse point is a surrogate for 
another path on the system, so code that walks paths doesn't have to worry 
about loops with regard to these reparse points. The only practical use case I 
can think of for detecting/opening this type of reparse point is backup 
software that should avoid triggering an HSM recall. For example:

https://www.ibm.com/support/knowledgecenter/en/SSEQVQ_8.1.0/client/r_opt_hsmreparsetag.html

As I've previously suggested (and this is the last time because I'm becoming a 
broken record), lstat() should at least be restricted to opening only 
name-surrogate reparse points, which are supposed to be like links in that 
they target another path in the system. It also has to open reparse points 
that are not handled. 

Personally, I'm only comfortable with opening it up to name surrogates if 
islink() and readlink() are still limited to just Unix-like symlinks that we 
can create via symlink(). Nothing changes there. It's just a restriction of how 
lstat() currently works. The addition of the reparse tag in the stat result 
enables special handling of non-symlink surrogates.

> shutil.copytree(path): Unchanged. (requires a minor fix to 
> continue to recursively copy through junctions (using above test), 
> but not symlinks.)

Everyone else who relies on islink(), readlink(), and symlink() to copy 
symlinks isn't special casing their code to look for junctions or anything else 
we lump under the banner of islink(). They could code defensively if readlink() 
fails for a 'link' that we can't read. But that leaves the problem of 
readlink() succeeding for a junction. That can cause problems if the target is 
passed to os.symlink(), which changes the link from a hard name grafting to a 
soft name grafting.
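
To illustrate the conservative pattern that existing code tends to follow, 
here is a minimal sketch that recreates a link only when islink() is true and 
leaves everything else to the caller. The helper name copy_if_symlink is 
hypothetical and this is not how shutil is implemented; it only shows why a 
junction that passes islink() would silently be recreated as a symlink.

```python
import os


def copy_if_symlink(src, dst):
    """Recreate src at dst only if src is a symlink (hypothetical helper).

    Returns True if a link was created, False if src is not a symlink.
    """
    if not os.path.islink(src):
        # Not something we know how to recreate with os.symlink().
        return False
    target = os.readlink(src)
    # target_is_directory only matters on Windows; it is ignored elsewhere.
    os.symlink(target, dst, target_is_directory=os.path.isdir(src))
    return True
```

If islink() were true for junctions, this helper would turn a hard mount 
point into a soft symlink without any warning.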

Why would we need to read the target of a junction? It's not needed for 
realpath() in Windows. We should only have to resolve symlinks. For example:

    C:/Mount/junction/spam/eggs

             junction -> Z:/bar/baz

We don't have to resolve this as "Z:/bar/baz/spam/eggs", and doing so may even 
be wrong for someone using it to manually resolve a relative symlink. 
"C:/Mount/junction/spam/eggs" is a solid path. In Unix it would not be resolved 
by realpath(). A solid path is needed to figure out how to create a relative 
symlink, or how to manually resolve one for a given path. 

For example, if "foo_link" in "C:/Mount/junction/spam/eggs" targets 
"../../../foo", this refers to "C:/Mount/foo". On the other hand, if the 
junction mount point were replaced by a soft symlink, then 
"C:/Mount/symlink/spam/eggs" is not a solid path. "foo_link" is instead 
evaluated over the target path: "Z:/bar/baz/spam/eggs/foo_link", so the link 
resolves to "Z:/bar/foo".
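
The two cases above can be checked with plain lexical path arithmetic. This 
is only a sketch: resolve_relative is a hypothetical helper, it uses ntpath so 
that it runs on any platform, and it assumes the caller has already decided 
which directory the relative target is evaluated against.

```python
import ntpath


def resolve_relative(link_dir, target):
    """Lexically join a relative link target onto the directory it is
    evaluated against, using Windows path rules (hypothetical helper)."""
    joined = ntpath.join(link_dir, target)
    return ntpath.normpath(joined).replace("\\", "/")


# Junction case: the path through the junction is solid, so the relative
# target is evaluated against the path as written.
print(resolve_relative("C:/Mount/junction/spam/eggs", "../../../foo"))
# C:/Mount/foo

# Symlink case: the target is evaluated over the symlink's own target path.
print(resolve_relative("Z:/bar/baz/spam/eggs", "../../../foo"))
# Z:/bar/foo
```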

IMO, S_IFLNK need not be set for anything other than Unix-like symbolic links. 
We would just need to document that on Windows, lstat opens any link-like 
reparse point that indicates it targets another path on the system, plus any 
reparse point that's not handled, but that islink() is only true for actual 
Unix symlinks that can be created via os.symlink() and read via os.readlink(). 

This preserves how islink() and readlink() currently work, while still leaving 
the door open to fix misbehavior in particular cases. Code, including our own 
code, that needs to look for the broader Windows category of "name surrogate" 
can examine the reparse tag. For convenience we can provide issurrogate() that 
checks lstat(filename).st_reparse_tag & 0x2000_0000. This can be true for 
directories. Also, a surrogate doesn't have to behave like a Unix "soft" 
symlink, i.e. it applies to "hard" mount points. In Unix, issurrogate() could 
just be an alias for islink() since Unix provides only one type of name 
surrogate.
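
A minimal sketch of what such a helper might look like follows. It assumes 
the stat result exposes st_reparse_tag on Windows, and it falls back to the 
S_IFLNK check where that attribute is missing. The helper 
is_name_surrogate_tag is a hypothetical name introduced only for this example.

```python
import os
import stat

NAME_SURROGATE_BIT = 0x2000_0000  # bit 29 of the reparse tag


def is_name_surrogate_tag(tag):
    """Return True if the reparse tag has the name-surrogate bit set."""
    return bool(tag & NAME_SURROGATE_BIT)


def issurrogate(path):
    """Sketch of the proposed issurrogate(); not an existing os.path API."""
    try:
        st = os.lstat(path)
    except (OSError, ValueError):
        return False
    tag = getattr(st, "st_reparse_tag", None)
    if tag is None:
        # Assumption: no reparse tags on this platform, so the only kind of
        # name surrogate is a symlink.
        return stat.S_ISLNK(st.st_mode)
    return is_name_surrogate_tag(tag)
```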

Currently the name surrogate category includes the following tags:

    Microsoft name surrogate (bits 31 and 29)

    IO_REPARSE_TAG_MOUNT_POINT                  0xA0000003
    IO_REPARSE_TAG_SYMLINK                      0xA000000C
    IO_REPARSE_TAG_IIS_CACHE                    0xA0000010
    IO_REPARSE_TAG_GLOBAL_REPARSE               0xA0000019
    IO_REPARSE_TAG_LX_SYMLINK                   0xA000001D
    IO_REPARSE_TAG_WCI_TOMBSTONE                0xA000001F
    IO_REPARSE_TAG_PROJFS_TOMBSTONE             0xA0000022

    Non-Microsoft name surrogate (bit 29)

    IO_REPARSE_TAG_SOLUTIONSOFT                 0x2000000D
    IO_REPARSE_TAG_OSR_SAMPLE                   0x20000017
    IO_REPARSE_TAG_QI_TECH_HSM                  0x2000002F
    IO_REPARSE_TAG_MAXISCALE_HSM                0x20000035
    IO_REPARSE_TAG_ALERTBOOT                    0x2000004C
    IO_REPARSE_TAG_NVIDIA_UNIONFS               0x20000054

IO_REPARSE_TAG_OSR_SAMPLE is used by OSR sample code in their Windows driver 
curriculum, so that one is unlikely to be seen in practice. I don't know 
anything about the other non-Microsoft tags. NVidia's UnionFS looks 
interesting. Using reparse points to merge file systems is probably not the 
most efficient way to handle that problem, but I'm sure the devil is in the 
details there.
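
For reference, the two bits mentioned in the list headings can be decoded 
like this. The sketch only uses tag values copied from the list above; the 
classify function and the TAGS mapping are names made up for the example.

```python
MICROSOFT_BIT = 0x8000_0000       # bit 31: tag is owned by Microsoft
NAME_SURROGATE_BIT = 0x2000_0000  # bit 29: tag is a name surrogate

# A few values copied from the list above.
TAGS = {
    "IO_REPARSE_TAG_MOUNT_POINT": 0xA0000003,
    "IO_REPARSE_TAG_SYMLINK": 0xA000000C,
    "IO_REPARSE_TAG_OSR_SAMPLE": 0x20000017,
    "IO_REPARSE_TAG_NVIDIA_UNIONFS": 0x20000054,
}


def classify(tag):
    """Return (is_microsoft, is_name_surrogate) for a reparse tag."""
    return bool(tag & MICROSOFT_BIT), bool(tag & NAME_SURROGATE_BIT)


for name, tag in TAGS.items():
    is_microsoft, is_surrogate = classify(tag)
    print(f"{name:32} 0x{tag:08X} "
          f"microsoft={is_microsoft} surrogate={is_surrogate}")
```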

> os.unlink(path): unchanged (still removes the junction, not the 
> contents)

Whatever we're calling a link should be capable of being deleted via os.unlink. 
If we apply S_IFLNK, then it won't have S_IFDIR (at least how POSIX code 
expects it), and unlink should work on it. The current state of affairs, in 
which unlink/remove works on a junction that stat() reports as a directory, is 
inconsistent. unlink() isn't specified to remove directories, so nothing that 
it can remove should be reported as a directory.

> shutil.rmtree(path): Will now remove a junction rather than 
> recursively deleting its contents (net improvement, IMHO)

I'd like for it to remove all name-surrogate directories like CMD's `rmdir /s` 
does. In contrast, Unix shutil.rmtree traverses into a mount point, deletes 
everything, and then fails because the directory is mounted and can't be 
removed. That's hideous, IMO.
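
As a rough sketch of the behavior I have in mind (not what shutil.rmtree 
actually does), a recursive delete could remove any name surrogate it meets 
instead of traversing it. The name rmtree_no_follow is hypothetical, the 
sketch assumes st_reparse_tag is available in the lstat() result on Windows, 
and error handling is omitted.

```python
import os
import stat

NAME_SURROGATE_BIT = 0x2000_0000  # bit 29 of the reparse tag


def _is_surrogate(st):
    tag = getattr(st, "st_reparse_tag", None)
    if tag is None:
        # Assumption: without reparse tags, only symlinks are surrogates.
        return stat.S_ISLNK(st.st_mode)
    return bool(tag & NAME_SURROGATE_BIT)


def rmtree_no_follow(path):
    """Delete a tree, removing name surrogates without traversing them."""
    st = os.lstat(path)
    if _is_surrogate(st):
        # Remove the link itself and leave its target alone.
        try:
            os.unlink(path)
        except OSError:
            os.rmdir(path)
        return
    if stat.S_ISDIR(st.st_mode):
        for name in os.listdir(path):
            rmtree_no_follow(os.path.join(path, name))
        os.rmdir(path)
    else:
        os.unlink(path)
```

With this approach the contents of the link target are never touched, which 
matches what `rmdir /s` does for junctions and directory symlinks.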

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37834>
_______________________________________