Re: Remove directory tree without following symlinks

2016-04-24 Thread eryk sun
On Sun, Apr 24, 2016 at 5:42 AM, Albert-Jan Roskam wrote:
> Aww, I kinda forgot about that already, but I came across this last
> year [1]. Apparently, shutil.rmtree(very_long_path) failed under Win 7,
> even with the "silly prefix". I believe very_long_path was a
> Python2-str.
> [1]
> https://mail.python.org/pipermail/python-list/2015-June/693156.html

Python 2's str branch of the os functions gets implemented on Windows
using the [A]NSI API, such as FindFirstFileA and FindNextFileA to
implement listdir(). Generally the ANSI API is a light wrapper around
the [W]ide-character API. It simply decodes byte strings to UTF-16 and
calls the wide-character function (or a common internal function).

IIRC, in Windows 7, byte strings are decoded using a per-thread buffer
with size MAX_PATH (260), so prefixing the path with "\\?\" won't
help. You have to use the wide-character API. Windows 10, on the other
hand, decodes using a dynamically allocated buffer, so you can usually
get away with using a long byte string. But not with Python 2
os.listdir(), which uses a stack-allocated MAX_PATH+5 buffer in the
str branch. For example:

Python 2 os.makedirs works:

>>> path = os.path.normpath('//?/C:/Temp/long/' + 'a' * 255)
>>> os.makedirs(path)

but os.listdir requires unicode:

>>> os.listdir(path)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be (buffer overflow), not str
>>> os.listdir(path.decode('mbcs'))
[]

Also, the str branch of listdir appends "/*.*", with a forward slash,
so it's incompatible with the "\\?\" prefix, even for short paths:

>>> os.listdir(r'\\?\C:\Temp')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
WindowsError: [Error 123] The filename, directory name, or volume
label syntax is incorrect: '?\\C:\\Temp/*.*'

> It seems useful if shutil or os.path would automatically prefix paths
> with "\\?\". It is rarely really needed, though. (in my case it was
> needed to copy a bunch of MS Outlook .msg files, which automatically
> get the subject line as the filename, and perhaps the first sentence
> of the mail if the mail has no subject).

I doubt a change like that would get backported to 2.7. Recently there
was a lengthy discussion about adding an __fspath__ protocol to Python
3. Possibly this can be automatically handled in the __fspath__
implementation of pathlib.WindowsPath and the DirEntry type returned
by os.scandir.
-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Remove directory tree without following symlinks

2016-04-24 Thread Albert-Jan Roskam

> From: eryk...@gmail.com
> Date: Sat, 23 Apr 2016 15:22:35 -0500
> Subject: Re: Remove directory tree without following symlinks
> To: python-list@python.org
> 
> On Sat, Apr 23, 2016 at 4:34 AM, Albert-Jan Roskam
>  wrote:
>>
>>> From: eryk...@gmail.com
>>> Date: Fri, 22 Apr 2016 13:28:01 -0500
>>> On Fri, Apr 22, 2016 at 12:39 PM, Albert-Jan Roskam
>>>  wrote:
>>>> FYI, Just today I found out that shutil.rmtree raises a WindowsError if
>>>> the dir is read-only (or its contents). Using 'ignore_errors', won't help.
>>>> Sure, no error is raised, but the dir is not deleted either! A 'force'
>>>> option would be a nice improvement.
>>>
>>> Use the onerror handler to call os.chmod(path, stat.S_IWRITE). For
>>> example, see pip's rmtree_errorhandler:
>>>
>>> https://github.com/pypa/pip/blob/8.1.1/pip/utils/__init__.py#L105
>>
>> Thanks, that looks useful indeed. I thought about os.chmod, but with
>> os.walk. That seemed expensive. So I used subprocess.call('rmdir "%s" /s /q'
>> % dirname). That's Windows only, of course, but aside of that, is using
>> subprocess less preferable?
> 
> I assume you used shell=True in the above call, and not an external
> rmdir.exe. There are security concerns with using the shell if you're
> not in complete control of the command line.
> 
> As to performance, cmd's rmdir wins without question, not only because
> it's implemented in C, but also because it uses the stat data from the
> WIN32_FIND_DATA returned by FindFirstFile/FindNextFile to check for
> FILE_ATTRIBUTE_DIRECTORY and FILE_ATTRIBUTE_READONLY.
> 
> On the other hand, Python wins when it comes to working with deeply
> nested directories. Paths in cmd are limited to MAX_PATH characters.
> rmdir uses DOS 8.3 short names (i.e. cAlternateFileName in
> WIN32_FIND_DATA), but that could still exceed MAX_PATH for a deeply
> nested tree, or the volume may not even have 8.3 DOS filenames.
> shutil.rmtree allows you to work around the DOS limit by prefixing the
> path with "\\?\". For example:
> 
>>>> subprocess.call(r'rmdir /q/s Z:\Temp\long', shell=True)
> The path Z:\Temp\long\aa[...]a is too long.
> 0
> 
>>>> shutil.rmtree(r'\\?\Z:\Temp\long')
>>>> os.path.exists(r'Z:\Temp\long')
> False
> 
> Using "\\?\" requires a path that's fully qualified, normalized
> (backslash only), and unicode (i.e. decode a Python 2 str).

Aww, I kinda forgot about that already, but I came across this last year [1].
Apparently, shutil.rmtree(very_long_path) failed under Win 7, even with the
"silly prefix". I believe very_long_path was a Python2-str.

It seems useful if shutil or os.path would automatically prefix paths with
"\\?\". It is rarely really needed, though. (In my case it was needed to copy
a bunch of MS Outlook .msg files, which automatically get the subject line as
the filename, and perhaps the first sentence of the mail if the mail has no
subject.)

[1] https://mail.python.org/pipermail/python-list/2015-June/693156.html

  


Re: Remove directory tree without following symlinks

2016-04-23 Thread Random832
On Sat, Apr 23, 2016, at 12:29, Nobody wrote:
> On Linux, an alternative is to use fchdir() rather than chdir(), which
> changes to a directory specified by an open file descriptor 

Of course, then there's also the risk of running out of open file
descriptors. High-quality implementations of rm will fork to deal with
this.


Re: Remove directory tree without following symlinks

2016-04-23 Thread Random832
On Sat, Apr 23, 2016, at 06:24, Steven D'Aprano wrote:
> "rm -r" gives me a NameError when I run it in my Python script :-)
> 
> But seriously, where is that documented? I've read the man page for
> rm, and it doesn't say anything about treatment of symlinks, nor is
> there an option to follow/not follow symlinks. So I never trust rm -r
> unless I know what I'm deleting.

The Unix Standard says "For each entry contained in file, other than dot
or dot-dot, the four steps listed here (1 to 4) shall be taken with the
entry as if it were a file operand. The rm utility shall not traverse
directories by following symbolic links into other parts of the
hierarchy, but shall remove the links themselves." and "The rm utility
removes symbolic links themselves, not the files they refer to, as a
consequence of the dependence on the unlink() functionality, per the
DESCRIPTION. When removing hierarchies with -r or -R, the prohibition on
following symbolic links has to be made explicit."

OSX (and I assume other BSDs) says "The rm utility removes symbolic
links, not the files referenced by the links."

I don't know why GNU rm's documentation doesn't say anything about its
treatment of symlinks - maybe it never occurred to anyone at GNU that
someone might think it would do anything else.

(I'd be less inclined to trust Windows' treatment of symlinks,
junctions, and other reparse points without doing some experiments.)


Re: Remove directory tree without following symlinks

2016-04-23 Thread eryk sun
On Sat, Apr 23, 2016 at 4:34 AM, Albert-Jan Roskam wrote:
>
>> From: eryk...@gmail.com
>> Date: Fri, 22 Apr 2016 13:28:01 -0500
>> On Fri, Apr 22, 2016 at 12:39 PM, Albert-Jan Roskam
>>  wrote:
>> > FYI, Just today I found out that shutil.rmtree raises a WindowsError if
>> > the dir is read-only (or its contents). Using 'ignore_errors', won't help.
>> > Sure, no error is raised, but the dir is not deleted either! A 'force'
>> > option would be a nice improvement.
>>
>> Use the onerror handler to call os.chmod(path, stat.S_IWRITE). For
>> example, see pip's rmtree_errorhandler:
>>
>> https://github.com/pypa/pip/blob/8.1.1/pip/utils/__init__.py#L105
>
> Thanks, that looks useful indeed. I thought about os.chmod, but with
> os.walk. That seemed expensive. So I used subprocess.call('rmdir "%s" /s /q'
> % dirname). That's Windows only, of course, but aside of that, is using
> subprocess less preferable?

I assume you used shell=True in the above call, and not an external
rmdir.exe. There are security concerns with using the shell if you're
not in complete control of the command line.

As to performance, cmd's rmdir wins without question, not only because
it's implemented in C, but also because it uses the stat data from the
WIN32_FIND_DATA returned by FindFirstFile/FindNextFile to check for
FILE_ATTRIBUTE_DIRECTORY and FILE_ATTRIBUTE_READONLY.
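
To illustrate the point about reusing the directory-listing data, here is a
minimal Python 3 sketch built on os.scandir. The function name
classify_entries is made up for this example. On Windows the DirEntry
methods used here are documented to answer from the data returned by the
directory listing (for entries that are not symlinks), so no extra stat call
per entry should be needed. On POSIX the stat() call does cost one system
call per entry.

```python
import os
import stat


def classify_entries(path):
    """Split the entries of `path` into directories, other entries, and
    read-only entries, using the information cached on each DirEntry."""
    dirs, files, read_only = [], [], []
    with os.scandir(path) as entries:
        for entry in entries:
            # follow_symlinks=False: a symlink to a directory counts as a
            # plain entry, not as a directory to descend into.
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.name)
            else:
                files.append(entry.name)
            mode = entry.stat(follow_symlinks=False).st_mode
            if not mode & stat.S_IWRITE:
                read_only.append(entry.name)
    return sorted(dirs), sorted(files), sorted(read_only)
```

This is only a sketch of the idea, not a description of how cmd's rmdir is
actually written.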

On the other hand, Python wins when it comes to working with deeply
nested directories. Paths in cmd are limited to MAX_PATH characters.
rmdir uses DOS 8.3 short names (i.e. cAlternateFileName in
WIN32_FIND_DATA), but that could still exceed MAX_PATH for a deeply
nested tree, or the volume may not even have 8.3 DOS filenames.
shutil.rmtree allows you to work around the DOS limit by prefixing the
path with "\\?\". For example:

>>> subprocess.call(r'rmdir /q/s Z:\Temp\long', shell=True)
The path Z:\Temp\long\aa[...]a is too long.
0

>>> shutil.rmtree(r'\\?\Z:\Temp\long')
>>> os.path.exists(r'Z:\Temp\long')
False

Using "\\?\" requires a path that's fully qualified, normalized
(backslash only), and unicode (i.e. decode a Python 2 str).
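
As a rough illustration of those three requirements, here is a small Python 3
sketch. The helper name extended_length_path is hypothetical, it is not part
of os.path or shutil. It uses ntpath directly so the string handling is the
same on any platform, it refuses paths that are not fully qualified, and it
assumes the usual "\\?\UNC\" form for network shares. In Python 3 a str is
already unicode, so the decode step from Python 2 is not needed.

```python
import ntpath


def extended_length_path(path):
    """Return `path` with the "\\\\?\\" prefix, or raise ValueError if the
    path is not fully qualified (drive or UNC share plus a root)."""
    if path.startswith('\\\\?\\'):
        return path  # already prefixed, leave untouched
    # normpath converts forward slashes and collapses "." and ".." parts,
    # which the prefixed form will not do for us.
    path = ntpath.normpath(path)
    drive, rest = ntpath.splitdrive(path)
    if not drive or not rest.startswith('\\'):
        raise ValueError('path must be fully qualified: %r' % path)
    if drive.startswith('\\\\'):
        # \\server\share\dir -> \\?\UNC\server\share\dir
        return '\\\\?\\UNC\\' + path[2:]
    return '\\\\?\\' + path
```

For example, extended_length_path('C:/Temp/long') gives r'\\?\C:\Temp\long',
which could then be passed to shutil.rmtree on Windows.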


Re: Remove directory tree without following symlinks

2016-04-23 Thread Nobody
On Sat, 23 Apr 2016 00:56:33 +1000, Steven D'Aprano wrote:

> I want to remove a directory, including all files and subdirectories under
> it, but without following symlinks. I want the symlinks to be deleted, not
> the files pointed to by those symlinks.

Note that this is non-trivial to do securely, i.e. where an adversary has
write permission on any of the directories involved. Due to the potential
for race conditions between checking whether a name refers to a directory
and recursing into it, the process can be tricked into deleting any
directory tree for which it has the appropriate permissions.

The solution requires:

1. That you always chdir() into each directory and remove entries using
their plain filename, rather than trying to remove entries from a
higher-level directory using a relative path.

2. When chdir()ing into each subdirectory, you need to do e.g.:

    st1 = os.stat(".")
    os.chdir(subdir)
    st2 = os.stat("..")
    # ".." must be the directory we just left (same device and inode)
    if st1.st_dev != st2.st_dev or st1.st_ino != st2.st_ino:
        raise SomeKindOfException()

If the test fails, it means that the directory you just chdir()d into
isn't actually a subdirectory of the one you just left, e.g. because the
directory entry was replaced between checking it and chdir()ing into it.

On Linux, an alternative is to use fchdir() rather than chdir(), which
changes to a directory specified by an open file descriptor for that
directory rather than by name. Provided that the directory was open()ed
without any race condition (e.g. using O_NOFOLLOW), subsequent fstat() and
fchdir() calls are guaranteed to use the same directory regardless of any
filesystem changes.
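
A minimal Python 3 sketch of the descriptor-based approach follows. It uses
the dir_fd parameters of os.open, os.unlink and os.rmdir instead of fchdir(),
which gives the same guarantee without changing the working directory. The
function name remove_tree_fd is made up for this example, and the code
assumes a POSIX platform where os.O_NOFOLLOW, os.O_DIRECTORY and dir_fd are
available.

```python
import os


def remove_tree_fd(name, parent_fd=None):
    """Remove directory `name` (relative to `parent_fd`, or to the current
    directory when it is None) without following symlinks."""
    # O_NOFOLLOW makes open() fail if `name` was swapped for a symlink
    # after we listed it; O_DIRECTORY makes it fail for non-directories.
    flags = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW
    fd = os.open(name, flags, dir_fd=parent_fd)
    try:
        with os.scandir(fd) as entries:
            children = [(e.name, e.is_dir(follow_symlinks=False))
                        for e in entries]
        for child, is_dir in children:
            if is_dir:
                remove_tree_fd(child, parent_fd=fd)
            else:
                # Plain name resolved against the open descriptor, so the
                # entry removed is always inside this exact directory.
                os.unlink(child, dir_fd=fd)
    finally:
        os.close(fd)
    os.rmdir(name, dir_fd=parent_fd)
```

This keeps one descriptor open per nesting level, so a very deep tree can
run out of descriptors. shutil.rmtree uses a similar technique where the
platform supports it, which can be checked with
shutil.rmtree.avoids_symlink_attacks.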



Re: Remove directory tree without following symlinks

2016-04-23 Thread Gregory Ewing

Steven D'Aprano wrote:

> But seriously, where is that documented? I've read the man page for rm, and
> it doesn't say anything about treatment of symlinks


The Linux man page seems to be a bit deficient on this.
The BSD version contains this sentence:

 The rm utility removes symbolic links, not the files referenced by the
 links.

--
Greg


Re: Remove directory tree without following symlinks

2016-04-23 Thread Steven D'Aprano
On Sat, 23 Apr 2016 06:13 pm, Paul Rubin wrote:

> Steven D'Aprano  writes:
>> I want to remove a directory, including all files and subdirectories
>> under it, but without following symlinks. I want the symlinks to be
>> deleted, not the files pointed to by those symlinks.
> 
> rm -r shouldn't follow symlinks like you mention.


"rm -r" gives me a NameError when I run it in my Python script :-)

But seriously, where is that documented? I've read the man page for rm, and
it doesn't say anything about treatment of symlinks, nor is there an option
to follow/not follow symlinks. So I never trust rm -r unless I know what
I'm deleting.



-- 
Steven



RE: Remove directory tree without following symlinks

2016-04-23 Thread Albert-Jan Roskam


> From: eryk...@gmail.com
> Date: Fri, 22 Apr 2016 13:28:01 -0500
> Subject: Re: Remove directory tree without following symlinks
> To: python-list@python.org
> 
> On Fri, Apr 22, 2016 at 12:39 PM, Albert-Jan Roskam
>  wrote:
> > FYI, Just today I found out that shutil.rmtree raises a WindowsError if
> > the dir is read-only (or its contents). Using 'ignore_errors', won't help.
> > Sure, no error is raised, but the dir is not deleted either! A 'force'
> > option would be a nice improvement.
> 
> Use the onerror handler to call os.chmod(path, stat.S_IWRITE). For
> example, see pip's rmtree_errorhandler:
> 
> https://github.com/pypa/pip/blob/8.1.1/pip/utils/__init__.py#L105

Thanks, that looks useful indeed. I thought about os.chmod, but with os.walk.
That seemed expensive. So I used subprocess.call('rmdir "%s" /s /q' % dirname).
That's Windows only, of course, but aside of that, is using subprocess less
preferable?

Fun fact: I used it to remove .svn dirs, just like what is mentioned in the
pip comments :-)

  


Re: Remove directory tree without following symlinks

2016-04-23 Thread Paul Rubin
Steven D'Aprano  writes:
> I want to remove a directory, including all files and subdirectories under
> it, but without following symlinks. I want the symlinks to be deleted, not
> the files pointed to by those symlinks.

rm -r shouldn't follow symlinks like you mention.


Re: Remove directory tree without following symlinks

2016-04-22 Thread eryk sun
On Fri, Apr 22, 2016 at 12:39 PM, Albert-Jan Roskam wrote:
> FYI, Just today I found out that shutil.rmtree raises a WindowsError if
> the dir is read-only (or its contents). Using 'ignore_errors', won't help.
> Sure, no error is raised, but the dir is not deleted either! A 'force'
> option would be a nice improvement.

Use the onerror handler to call os.chmod(path, stat.S_IWRITE). For
example, see pip's rmtree_errorhandler:

https://github.com/pypa/pip/blob/8.1.1/pip/utils/__init__.py#L105
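
For reference, a handler along those lines might look like the Python 3
sketch below. It is not pip's code. The names force_rmtree and
_make_writable_and_retry are made up for this example. The handler accepts
the same three positional arguments for both the older onerror parameter and
the onexc parameter added in Python 3.12, and it ignores the third one.

```python
import os
import shutil
import stat
import sys


def _make_writable_and_retry(func, path, _exc):
    """rmtree error handler: clear the read-only bit, then retry."""
    if func in (os.unlink, os.remove, os.rmdir):
        # On Windows this clears the read-only attribute.
        os.chmod(path, stat.S_IWRITE)
        func(path)
    else:
        raise  # re-raise the error rmtree is currently handling


def force_rmtree(path):
    """Like shutil.rmtree, but retry entries that are read-only."""
    # onexc exists from Python 3.12 on; older versions only have onerror.
    handler_arg = 'onexc' if sys.version_info >= (3, 12) else 'onerror'
    shutil.rmtree(path, **{handler_arg: _make_writable_and_retry})
```

If the retry fails as well, the second error propagates to the caller, so
nothing is silently skipped the way it is with ignore_errors.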


RE: Remove directory tree without following symlinks

2016-04-22 Thread Albert-Jan Roskam


> From: st...@pearwood.info
> Subject: Re: Remove directory tree without following symlinks
> Date: Sat, 23 Apr 2016 03:14:12 +1000
> To: python-list@python.org
> 
> On Sat, 23 Apr 2016 01:09 am, Random832 wrote:
> 
> > On Fri, Apr 22, 2016, at 10:56, Steven D'Aprano wrote:
> >> What should I use for "remove_tree"? Do I have to write my own, or does a
> >> solution already exist?
> > 
> > In the os.walk documentation it provides a simple recipe and also
> > mentions shutil.rmtree
> 
> Thanks for that.

FYI, Just today I found out that shutil.rmtree raises a WindowsError if the dir 
is read-only (or its contents). Using 'ignore_errors', won't help. Sure, no 
error is raised, but the dir is not deleted either! A 'force' option would be a 
nice improvement.



> The os.walk recipe is described as a simple version of shutil.rmtree. The
> documentation for rmtree seems lacking to me, but after testing it, it
> appears to work as I want it: it removes symbolic links, it does not follow
> them.
>
> Is anyone else able to confirm that my understanding is correct? If so, the
> documentation should probably be a bit clearer.
> 
> 
> 
> -- 
> Steven
> 


Re: Remove directory tree without following symlinks

2016-04-22 Thread Steven D'Aprano
On Sat, 23 Apr 2016 01:09 am, Random832 wrote:

> On Fri, Apr 22, 2016, at 10:56, Steven D'Aprano wrote:
>> What should I use for "remove_tree"? Do I have to write my own, or does a
>> solution already exist?
> 
> In the os.walk documentation it provides a simple recipe and also
> mentions shutil.rmtree

Thanks for that.

The os.walk recipe is described as a simple version of shutil.rmtree. The
documentation for rmtree seems lacking to me, but after testing it, it
appears to work as I want it: it removes symbolic links, it does not follow
them.

Is anyone else able to confirm that my understanding is correct? If so, the
documentation should probably be a bit clearer.
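
For anyone who wants to repeat the test, here is a small Python 3 sketch
that builds a tree like the one in the original post inside a temporary
directory and then removes "spam" with shutil.rmtree. The function name is
made up, and the "surprise" link here points at the absolute path of
"parent" rather than a relative one. Creating symlinks may need extra
privileges on Windows.

```python
import os
import shutil
import tempfile


def rmtree_leaves_link_target_alone():
    """Build parent/spam/eggs/surprise -> parent, remove parent/spam, and
    return what is left in parent."""
    base = tempfile.mkdtemp()
    parent = os.path.join(base, 'parent')
    eggs = os.path.join(parent, 'spam', 'eggs')
    os.makedirs(eggs)
    for parts in (('e.txt',), ('spam', 'a.txt'), ('spam', 'eggs', 'c.txt')):
        open(os.path.join(parent, *parts), 'w').close()
    # The link leads back up to "parent"; following it would delete e.txt.
    os.symlink(parent, os.path.join(eggs, 'surprise'),
               target_is_directory=True)
    shutil.rmtree(os.path.join(parent, 'spam'))
    remaining = sorted(os.listdir(parent))
    shutil.rmtree(base)  # clean up the temporary directory
    return remaining
```

If rmtree removes the link itself and does not follow it, only e.txt should
remain in "parent".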



-- 
Steven



Re: Remove directory tree without following symlinks

2016-04-22 Thread Random832
On Fri, Apr 22, 2016, at 10:56, Steven D'Aprano wrote:
> What should I use for "remove_tree"? Do I have to write my own, or does a
> solution already exist?

In the os.walk documentation it provides a simple recipe and also
mentions shutil.rmtree


Remove directory tree without following symlinks

2016-04-22 Thread Steven D'Aprano
I want to remove a directory, including all files and subdirectories under
it, but without following symlinks. I want the symlinks to be deleted, not
the files pointed to by those symlinks.

E.g. if I have this tree:


parent/
+-- spam/
:   +-- a.txt
:   +-- b.txt
:   +-- eggs/
:   :   +-- c.txt
:   :   +-- surprise -> ../../parent
:   +-- d.txt
+-- e.txt

and I call remove_tree("parent/spam"), I want the result to be:

parent/
+-- e.txt


(Assuming that I have permission to delete all the files and directories.)

What should I use for "remove_tree"? Do I have to write my own, or does a
solution already exist?




-- 
Steven
