Re: Remove directory tree without following symlinks
On Sun, Apr 24, 2016 at 5:42 AM, Albert-Jan Roskam wrote:
> Aww, I kinda forgot about that already, but I came across this last
> year [1]. Apparently, shutil.rmtree(very_long_path) failed under Win 7,
> even with the "silly prefix". I believe very_long_path was a
> Python2-str.
>
> [1] https://mail.python.org/pipermail/python-list/2015-June/693156.html

Python 2's str branch of the os functions is implemented on Windows
using the [A]NSI API, such as FindFirstFileA and FindNextFileA to
implement listdir(). Generally the ANSI API is a light wrapper around
the [W]ide-character API. It simply decodes byte strings to UTF-16 and
calls the wide-character function (or a common internal function).
IIRC, in Windows 7, byte strings are decoded using a per-thread buffer
with size MAX_PATH (260), so prefixing the path with "\\?\" won't
help. You have to use the wide-character API.

Windows 10, on the other hand, decodes using a dynamically allocated
buffer, so you can usually get away with using a long byte string. But
not with Python 2 os.listdir(), which uses a stack-allocated
MAX_PATH+5 buffer in the str branch. For example:

Python 2 os.makedirs works:

    >>> path = os.path.normpath('//?/C:/Temp/long/' + 'a' * 255)
    >>> os.makedirs(path)

but os.listdir requires unicode:

    >>> os.listdir(path)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: must be (buffer overflow), not str
    >>> os.listdir(path.decode('mbcs'))
    []

Also, the str branch of listdir appends "/*.*", with a forward slash,
so it's incompatible with the "\\?\" prefix, even for short paths:

    >>> os.listdir(r'\\?\C:\Temp')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: '?\\C:\\Temp/*.*'

> It seems useful if shutil or os.path would automatically prefix paths
> with "\\?\". It is rarely really needed, though. (in my case it was
> needed to copy a bunch of MS Outlook .msg files, which automatically
> get the subject line as the filename, and perhaps the first sentence
> of the mail if the mail has no subject).

I doubt a change like that would get backported to 2.7. Recently there
was a lengthy discussion about adding an __fspath__ protocol to Python
3. Possibly this can be automatically handled in the __fspath__
implementation of pathlib.WindowsPath and the DirEntry type returned
by os.scandir.

--
https://mail.python.org/mailman/listinfo/python-list
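For readers who want to try the automatic prefixing idea, here is a minimal Python 3 sketch, not anything shutil or os.path actually provides. The helper name extended_path is hypothetical. It assumes the caller passes a fully qualified path, and it uses ntpath directly so the string handling can be tried on any platform. In Python 3 every str is already unicode, so the decode step mentioned for Python 2 does not apply.

```python
import ntpath

_PREFIX = '\\\\?\\'            # the "\\?\" extended-length prefix
_UNC_PREFIX = '\\\\?\\UNC\\'   # form used for \\server\share paths


def extended_path(path):
    """Return `path` with the extended-length prefix added (hypothetical helper)."""
    if path.startswith(_PREFIX):
        return path                      # already prefixed, leave it alone
    if not ntpath.isabs(path):
        raise ValueError('extended-length paths must be fully qualified')
    path = ntpath.normpath(path)         # backslashes only, no "." or ".."
    if path.startswith('\\\\'):
        # \\server\share\dir becomes \\?\UNC\server\share\dir
        return _UNC_PREFIX + path[2:]
    return _PREFIX + path
```

On Windows this could be combined with the call shown in the thread, e.g. shutil.rmtree(extended_path(r'Z:\Temp\long')).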
RE: Remove directory tree without following symlinks
> From: eryk...@gmail.com
> Date: Sat, 23 Apr 2016 15:22:35 -0500
> Subject: Re: Remove directory tree without following symlinks
> To: python-list@python.org
>
> On Sat, Apr 23, 2016 at 4:34 AM, Albert-Jan Roskam wrote:
>>
>>> From: eryk...@gmail.com
>>> Date: Fri, 22 Apr 2016 13:28:01 -0500
>>> On Fri, Apr 22, 2016 at 12:39 PM, Albert-Jan Roskam wrote:
>>>> FYI, Just today I found out that shutil.rmtree raises a WindowsError if
>>>> the dir is read-only (or its contents). Using 'ignore_errors' won't help.
>>>> Sure, no error is raised, but the dir is not deleted either! A 'force'
>>>> option would be a nice improvement.
>>>
>>> Use the onerror handler to call os.chmod(path, stat.S_IWRITE). For
>>> example, see pip's rmtree_errorhandler:
>>>
>>> https://github.com/pypa/pip/blob/8.1.1/pip/utils/__init__.py#L105
>>
>> Thanks, that looks useful indeed. I thought about os.chmod, but with
>> os.walk. That seemed expensive. So I used subprocess.call('rmdir "%s" /s /q'
>> % dirname). That's Windows only, of course, but aside from that, is using
>> subprocess less preferable?
>
> I assume you used shell=True in the above call, and not an external
> rmdir.exe. There are security concerns with using the shell if you're
> not in complete control of the command line.
>
> As to performance, cmd's rmdir wins without question, not only because
> it's implemented in C, but also because it uses the stat data from the
> WIN32_FIND_DATA returned by FindFirstFile/FindNextFile to check for
> FILE_ATTRIBUTE_DIRECTORY and FILE_ATTRIBUTE_READONLY.
>
> On the other hand, Python wins when it comes to working with deeply
> nested directories. Paths in cmd are limited to MAX_PATH characters.
> rmdir uses DOS 8.3 short names (i.e. cAlternateFileName in
> WIN32_FIND_DATA), but that could still exceed MAX_PATH for a deeply
> nested tree, or the volume may not even have 8.3 DOS filenames.
> shutil.rmtree allows you to work around the DOS limit by prefixing the
> path with "\\?\". For example:
>
>     >>> subprocess.call(r'rmdir /q/s Z:\Temp\long', shell=True)
>     The path Z:\Temp\long\aa a is too long.
>     0
>     >>> shutil.rmtree(r'\\?\Z:\Temp\long')
>     >>> os.path.exists(r'Z:\Temp\long')
>     False
>
> Using "\\?\" requires a path that's fully qualified, normalized
> (backslash only), and unicode (i.e. decode a Python 2 str).

Aww, I kinda forgot about that already, but I came across this last
year [1]. Apparently, shutil.rmtree(very_long_path) failed under Win 7,
even with the "silly prefix". I believe very_long_path was a
Python2-str.

It seems useful if shutil or os.path would automatically prefix paths
with "\\?\". It is rarely really needed, though. (In my case it was
needed to copy a bunch of MS Outlook .msg files, which automatically
get the subject line as the filename, and perhaps the first sentence
of the mail if the mail has no subject.)

[1] https://mail.python.org/pipermail/python-list/2015-June/693156.html
Re: Remove directory tree without following symlinks
On Sat, Apr 23, 2016, at 12:29, Nobody wrote:
> On Linux, an alternative is to use fchdir() rather than chdir(), which
> changes to a directory specified by an open file descriptor

Of course, then there's also the risk of running out of open file
descriptors. High-quality implementations of rm will fork to deal with
this.
Re: Remove directory tree without following symlinks
On Sat, Apr 23, 2016, at 06:24, Steven D'Aprano wrote:
> "rm -r" gives me a NameError when I run it in my Python script :-)
>
> But seriously, where is that documented? I've read the man page for
> rm, and it doesn't say anything about treatment of symlinks, nor is
> there an option to follow/not follow symlinks. So I never trust rm -r
> unless I know what I'm deleting.

The Unix Standard says "For each entry contained in file, other than dot
or dot-dot, the four steps listed here (1 to 4) shall be taken with the
entry as if it were a file operand. The rm utility shall not traverse
directories by following symbolic links into other parts of the
hierarchy, but shall remove the links themselves."

and

"The rm utility removes symbolic links themselves, not the files they
refer to, as a consequence of the dependence on the unlink()
functionality, per the DESCRIPTION. When removing hierarchies with -r or
-R, the prohibition on following symbolic links has to be made
explicit."

OSX (and I assume other BSDs) says "The rm utility removes symbolic
links, not the files referenced by the links."

I don't know why GNU rm's documentation doesn't say anything about its
treatment of symlinks - maybe it never occurred to anyone at GNU that
someone might think it would do anything else.

(I'd be less inclined to trust Windows' treatment of symlinks,
junctions, and other reparse points without doing some experiments.)
Re: Remove directory tree without following symlinks
On Sat, Apr 23, 2016 at 4:34 AM, Albert-Jan Roskam wrote:
>
>> From: eryk...@gmail.com
>> Date: Fri, 22 Apr 2016 13:28:01 -0500
>> On Fri, Apr 22, 2016 at 12:39 PM, Albert-Jan Roskam wrote:
>> > FYI, Just today I found out that shutil.rmtree raises a WindowsError if
>> > the dir is read-only (or its contents). Using 'ignore_errors' won't help.
>> > Sure, no error is raised, but the dir is not deleted either! A 'force'
>> > option would be a nice improvement.
>>
>> Use the onerror handler to call os.chmod(path, stat.S_IWRITE). For
>> example, see pip's rmtree_errorhandler:
>>
>> https://github.com/pypa/pip/blob/8.1.1/pip/utils/__init__.py#L105
>
> Thanks, that looks useful indeed. I thought about os.chmod, but with
> os.walk. That seemed expensive. So I used subprocess.call('rmdir "%s" /s /q'
> % dirname). That's Windows only, of course, but aside from that, is using
> subprocess less preferable?

I assume you used shell=True in the above call, and not an external
rmdir.exe. There are security concerns with using the shell if you're
not in complete control of the command line.

As to performance, cmd's rmdir wins without question, not only because
it's implemented in C, but also because it uses the stat data from the
WIN32_FIND_DATA returned by FindFirstFile/FindNextFile to check for
FILE_ATTRIBUTE_DIRECTORY and FILE_ATTRIBUTE_READONLY.

On the other hand, Python wins when it comes to working with deeply
nested directories. Paths in cmd are limited to MAX_PATH characters.
rmdir uses DOS 8.3 short names (i.e. cAlternateFileName in
WIN32_FIND_DATA), but that could still exceed MAX_PATH for a deeply
nested tree, or the volume may not even have 8.3 DOS filenames.
shutil.rmtree allows you to work around the DOS limit by prefixing the
path with "\\?\". For example:

    >>> subprocess.call(r'rmdir /q/s Z:\Temp\long', shell=True)
    The path Z:\Temp\long\aa a is too long.
    0
    >>> shutil.rmtree(r'\\?\Z:\Temp\long')
    >>> os.path.exists(r'Z:\Temp\long')
    False

Using "\\?\" requires a path that's fully qualified, normalized
(backslash only), and unicode (i.e. decode a Python 2 str).
Re: Remove directory tree without following symlinks
On Sat, 23 Apr 2016 00:56:33 +1000, Steven D'Aprano wrote:
> I want to remove a directory, including all files and subdirectories under
> it, but without following symlinks. I want the symlinks to be deleted, not
> the files pointed to by those symlinks.

Note that this is non-trivial to do securely, i.e. where an adversary has
write permission on any of the directories involved. Due to the potential
for race conditions between checking whether a name refers to a directory
and recursing into it, the process can be tricked into deleting any
directory tree for which it has the appropriate permissions.

The solution requires:

1. That you always chdir() into each directory and remove entries using
   their plain filename, rather than trying to remove entries from a
   higher-level directory using a relative path.

2. When chdir()ing into each subdirectory, you need to do e.g.:

       st1 = os.stat(".")
       os.chdir(subdir)
       st2 = os.stat("..")
       if st1.st_dev != st2.st_dev or st1.st_ino != st2.st_ino:
           raise SomeKindOfException()

If the test fails, it means that the directory you just chdir()d into
isn't actually a subdirectory of the one you just left, e.g. because the
directory entry was replaced between checking it and chdir()ing into it.

On Linux, an alternative is to use fchdir() rather than chdir(), which
changes to a directory specified by an open file descriptor for that
directory rather than by name. Provided that the directory was open()ed
without any race condition (e.g. using O_NOFOLLOW), subsequent fstat()
and fchdir() calls are guaranteed to use the same directory regardless of
any filesystem changes.
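The descriptor-based idea can be sketched in Python 3 roughly as follows. This is an illustration under assumptions, not a vetted secure implementation. The name remove_tree_fd is hypothetical, it relies on the POSIX-only dir_fd support in the os module, and it swaps fchdir() for dir_fd-relative calls, which serve the same purpose without changing the working directory. As far as I know, shutil.rmtree in Python 3.3 and later takes a comparable approach on platforms where shutil.rmtree.avoids_symlink_attacks is true.

```python
import os
import stat


def remove_tree_fd(name, dir_fd=None):
    """Remove `name` (looked up relative to `dir_fd`) without following symlinks."""
    st = os.lstat(name, dir_fd=dir_fd)        # lstat never follows a link
    if not stat.S_ISDIR(st.st_mode):
        os.unlink(name, dir_fd=dir_fd)        # plain files and symlinks alike
        return
    # O_NOFOLLOW makes open() fail if the entry was swapped for a symlink
    # after the lstat() above; O_DIRECTORY makes it fail for non-directories.
    fd = os.open(name, os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW,
                 dir_fd=dir_fd)
    try:
        # Every entry is resolved relative to the open descriptor, so a
        # later rename of the directory cannot redirect the deletion.
        for entry in os.listdir(fd):
            remove_tree_fd(entry, dir_fd=fd)
    finally:
        os.close(fd)
    os.rmdir(name, dir_fd=dir_fd)
```

One descriptor stays open per directory level while recursing, so a very deep tree can exhaust the process's descriptor limit.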
Re: Remove directory tree without following symlinks
Steven D'Aprano wrote:
> But seriously, where is that documented? I've read the man page for rm,
> and it doesn't say anything about treatment of symlinks

The Linux man page seems to be a bit deficient on this. The BSD version
contains this sentence:

    The rm utility removes symbolic links, not the files referenced by
    the links.

--
Greg
Re: Remove directory tree without following symlinks
On Sat, 23 Apr 2016 06:13 pm, Paul Rubin wrote:
> Steven D'Aprano writes:
>> I want to remove a directory, including all files and subdirectories
>> under it, but without following symlinks. I want the symlinks to be
>> deleted, not the files pointed to by those symlinks.
>
> rm -r shouldn't follow symlinks like you mention.

"rm -r" gives me a NameError when I run it in my Python script :-)

But seriously, where is that documented? I've read the man page for rm,
and it doesn't say anything about treatment of symlinks, nor is there an
option to follow/not follow symlinks. So I never trust rm -r unless I
know what I'm deleting.

--
Steven
RE: Remove directory tree without following symlinks
> From: eryk...@gmail.com
> Date: Fri, 22 Apr 2016 13:28:01 -0500
> Subject: Re: Remove directory tree without following symlinks
> To: python-list@python.org
>
> On Fri, Apr 22, 2016 at 12:39 PM, Albert-Jan Roskam wrote:
> > FYI, Just today I found out that shutil.rmtree raises a WindowsError if
> > the dir is read-only (or its contents). Using 'ignore_errors' won't help.
> > Sure, no error is raised, but the dir is not deleted either! A 'force'
> > option would be a nice improvement.
>
> Use the onerror handler to call os.chmod(path, stat.S_IWRITE). For
> example, see pip's rmtree_errorhandler:
>
> https://github.com/pypa/pip/blob/8.1.1/pip/utils/__init__.py#L105

Thanks, that looks useful indeed. I thought about os.chmod, but with
os.walk. That seemed expensive. So I used subprocess.call('rmdir "%s" /s /q'
% dirname). That's Windows only, of course, but aside from that, is using
subprocess less preferable?

Fun fact: I used it to remove .svn dirs, just like what is mentioned in
the pip comments :-)
Re: Remove directory tree without following symlinks
Steven D'Aprano writes:
> I want to remove a directory, including all files and subdirectories under
> it, but without following symlinks. I want the symlinks to be deleted, not
> the files pointed to by those symlinks.

rm -r shouldn't follow symlinks like you mention.
Re: Remove directory tree without following symlinks
On Fri, Apr 22, 2016 at 12:39 PM, Albert-Jan Roskam wrote:
> FYI, Just today I found out that shutil.rmtree raises a WindowsError if
> the dir is read-only (or its contents). Using 'ignore_errors' won't help.
> Sure, no error is raised, but the dir is not deleted either! A 'force'
> option would be a nice improvement.

Use the onerror handler to call os.chmod(path, stat.S_IWRITE). For
example, see pip's rmtree_errorhandler:

https://github.com/pypa/pip/blob/8.1.1/pip/utils/__init__.py#L105
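A minimal sketch of such a handler follows. It is written for Python 3 and is not a copy of pip's code. The names force_remove and rmtree_force are hypothetical. Besides clearing the read-only bit on the entry itself (what matters on Windows), it also makes the containing directory writable, which is an addition of mine so that the same sketch does something useful on POSIX. Python 3.12 adds an onexc parameter and deprecates onerror; the older spelling is used here to match the thread.

```python
import os
import shutil
import stat


def _make_writable(target):
    """Add the owner-write bit to `target`, leaving symlinks untouched."""
    if not os.path.islink(target):
        mode = stat.S_IMODE(os.stat(target).st_mode)
        os.chmod(target, mode | stat.S_IWRITE)


def force_remove(func, path, exc_info):
    """onerror handler: make the entry writable, then retry the removal once."""
    if func not in (os.unlink, os.remove, os.rmdir):
        raise exc_info[1]                  # not a removal step, give up
    _make_writable(path)                   # read-only file (Windows case)
    _make_writable(os.path.dirname(path) or os.curdir)  # read-only dir (POSIX case)
    func(path)                             # a second failure propagates


def rmtree_force(top):
    """Remove `top`, retrying entries that fail because they are read-only."""
    shutil.rmtree(top, onerror=force_remove)
```

Unlike ignore_errors=True, a failure that the handler cannot fix still raises, so the caller finds out when the tree was not fully removed.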
RE: Remove directory tree without following symlinks
> From: st...@pearwood.info
> Subject: Re: Remove directory tree without following symlinks
> Date: Sat, 23 Apr 2016 03:14:12 +1000
> To: python-list@python.org
>
> On Sat, 23 Apr 2016 01:09 am, Random832 wrote:
>
> > On Fri, Apr 22, 2016, at 10:56, Steven D'Aprano wrote:
> >> What should I use for "remove_tree"? Do I have to write my own, or does a
> >> solution already exist?
> >
> > In the os.walk documentation it provides a simple recipe and also
> > mentions shutil.rmtree
>
> Thanks for that.

FYI, Just today I found out that shutil.rmtree raises a WindowsError if
the dir is read-only (or its contents). Using 'ignore_errors' won't help.
Sure, no error is raised, but the dir is not deleted either! A 'force'
option would be a nice improvement.

> The os.walk recipe is described as a simple version of shutil.rmtree. The
> documentation for rmtree seems lacking to me, but after testing it, it
> appears to work as I want it: it removes symbolic links, it does not follow
> them.
>
> Is anyone else able to confirm that my understanding is correct? If so, the
> documentation should probably be a bit clearer.
>
> --
> Steven
Re: Remove directory tree without following symlinks
On Sat, 23 Apr 2016 01:09 am, Random832 wrote:
> On Fri, Apr 22, 2016, at 10:56, Steven D'Aprano wrote:
>> What should I use for "remove_tree"? Do I have to write my own, or does a
>> solution already exist?
>
> In the os.walk documentation it provides a simple recipe and also
> mentions shutil.rmtree

Thanks for that.

The os.walk recipe is described as a simple version of shutil.rmtree. The
documentation for rmtree seems lacking to me, but after testing it, it
appears to work as I want it: it removes symbolic links, it does not follow
them.

Is anyone else able to confirm that my understanding is correct? If so, the
documentation should probably be a bit clearer.

--
Steven
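One way to check this for yourself is a small throwaway script along these lines (Python 3). It rebuilds the example tree from the original question in a temporary directory. The function names are hypothetical, and as an assumption of this sketch the "surprise" link points at the real parent/ directory, so that following it would visibly destroy e.txt. Creating symlinks on Windows may need extra privileges.

```python
import os
import shutil
import tempfile


def build_example_tree(base):
    """Create the example tree under `base` and return the path of parent/."""
    parent = os.path.join(base, 'parent')
    os.makedirs(os.path.join(parent, 'spam', 'eggs'))
    for rel in ('spam/a.txt', 'spam/b.txt', 'spam/eggs/c.txt',
                'spam/d.txt', 'e.txt'):
        with open(os.path.join(parent, *rel.split('/')), 'w'):
            pass                              # empty placeholder file
    # The link targets parent/ itself (assumption of this sketch).
    os.symlink(parent, os.path.join(parent, 'spam', 'eggs', 'surprise'))
    return parent


def rmtree_leaves_link_target_alone():
    """Run shutil.rmtree on parent/spam and report what is left in parent/."""
    base = tempfile.mkdtemp()
    try:
        parent = build_example_tree(base)
        shutil.rmtree(os.path.join(parent, 'spam'))
        return sorted(os.listdir(parent))
    finally:
        shutil.rmtree(base)                   # clean up the scratch directory


print(rmtree_leaves_link_target_alone())
```

On a POSIX system this prints ['e.txt']: the link inside spam/eggs is removed, and the directory it points to is left alone.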
Re: Remove directory tree without following symlinks
On Fri, Apr 22, 2016, at 10:56, Steven D'Aprano wrote:
> What should I use for "remove_tree"? Do I have to write my own, or does a
> solution already exist?

In the os.walk documentation it provides a simple recipe and also
mentions shutil.rmtree
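A sketch in the spirit of that bottom-up os.walk recipe, extended with an explicit symlink check, might look like this. The name remove_tree_walk is hypothetical, and the islink handling is my addition, not the documented recipe: os.walk does not descend into links by default, but it still lists a link to a directory among the directory names, where os.rmdir would fail on it.

```python
import os


def remove_tree_walk(top):
    """Bottom-up removal of `top`; symlinks are unlinked, never followed."""
    if os.path.islink(top):
        raise OSError('refusing to walk a symbolic link: %r' % top)
    # followlinks defaults to False, so os.walk never descends into links,
    # but a link to a directory is still reported in `dirs`.
    for root, dirs, files in os.walk(top, topdown=False):
        for name in files:
            os.remove(os.path.join(root, name))
        for name in dirs:
            path = os.path.join(root, name)
            if os.path.islink(path):
                os.remove(path)        # remove the link itself
            else:
                os.rmdir(path)         # real directory, already emptied
    os.rmdir(top)
```

This version checks and then acts on path names, so it has the race condition described elsewhere in the thread; it is only suitable where no one else can modify the tree during the walk.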
Remove directory tree without following symlinks
I want to remove a directory, including all files and subdirectories under
it, but without following symlinks. I want the symlinks to be deleted, not
the files pointed to by those symlinks.

E.g. if I have this tree:

parent/
+-- spam/
:   +-- a.txt
:   +-- b.txt
:   +-- eggs/
:   :   +-- c.txt
:   :   +-- surprise -> ../../parent
:   +-- d.txt
+-- e.txt

and I call remove_tree("parent/spam"), I want the result to be:

parent/
+-- e.txt

(Assuming that I have permission to delete all the files and directories.)

What should I use for "remove_tree"? Do I have to write my own, or does a
solution already exist?

--
Steven