[issue26545] [doc] os.walk is limited by python's recursion limit

2022-03-16 Thread Irit Katriel


Irit Katriel  added the comment:

I agree with Stanley. The documentation for os is clear that recursion is used 
and the documentation for RecursionError links to getrecursionlimit(). This 
seems sufficient.

--
nosy: +iritkatriel
resolution:  -> wont fix
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26545] [doc] os.walk is limited by python's recursion limit

2022-03-02 Thread Stanley


Stanley  added the comment:

I'm not too sure about documenting the recursion limit here. There are a few 
other recursive functions in the os library (such as makedirs()), and if we note 
the recursion limit for os.walk then all of them should be noted too, but that 
doesn't seem quite right to me.

--
nosy: +slateny

___
Python tracker 
<https://bugs.python.org/issue26545>
___



[issue12970] [doc] os.walk() consider some symlinks as dirs instead of non-dirs

2021-11-28 Thread Irit Katriel


Change by Irit Katriel :


--
title: os.walk() consider some symlinks as dirs instead of non-dirs -> [doc] 
os.walk() consider some symlinks as dirs instead of non-dirs
versions: +Python 3.11 -Python 2.7, Python 3.2, Python 3.3

___
Python tracker 
<https://bugs.python.org/issue12970>
___



[issue45564] shutil.rmtree and os.walk are implemented using recursion, fail on deep hierarchies

2021-10-21 Thread Alexander Patrakov


New submission from Alexander Patrakov :

It is possible to create deep directory hierarchies that cannot be removed via 
shutil.rmtree or walked via os.walk, because these functions exceed the 
interpreter recursion limit. This may have security implications for web 
services (e.g. various webdisks) that have to clean up user-created mess or 
walk through it.

[aep@aep-haswell ~]$ mkdir /tmp/badstuff
[aep@aep-haswell ~]$ cd /tmp/badstuff
[aep@aep-haswell badstuff]$ for x in `seq 2048` ; do mkdir $x ; cd $x ; done
[aep@aep-haswell 103]$ cd
[aep@aep-haswell ~]$ python
Python 3.9.7 (default, Oct 10 2021, 15:13:22) 
[GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import shutil
>>> shutil.rmtree('/tmp/badstuff')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/shutil.py", line 726, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/usr/lib/python3.9/shutil.py", line 663, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib/python3.9/shutil.py", line 663, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib/python3.9/shutil.py", line 663, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  [Previous line repeated 992 more times]
  File "/usr/lib/python3.9/shutil.py", line 642, in _rmtree_safe_fd
    fullname = os.path.join(path, entry.name)
  File "/usr/lib/python3.9/posixpath.py", line 77, in join
    sep = _get_sep(a)
  File "/usr/lib/python3.9/posixpath.py", line 42, in _get_sep
    if isinstance(path, bytes):
RecursionError: maximum recursion depth exceeded while calling a Python object
>>> import os
>>> list(os.walk('/tmp/badstuff'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/os.py", line 418, in _walk
    yield from _walk(new_path, topdown, onerror, followlinks)
  File "/usr/lib/python3.9/os.py", line 418, in _walk
    yield from _walk(new_path, topdown, onerror, followlinks)
  File "/usr/lib/python3.9/os.py", line 418, in _walk
    yield from _walk(new_path, topdown, onerror, followlinks)
  [Previous line repeated 993 more times]
  File "/usr/lib/python3.9/os.py", line 412, in _walk
    new_path = join(top, dirname)
  File "/usr/lib/python3.9/posixpath.py", line 77, in join
    sep = _get_sep(a)
  File "/usr/lib/python3.9/posixpath.py", line 42, in _get_sep
    if isinstance(path, bytes):
RecursionError: maximum recursion depth exceeded while calling a Python object
>>>
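
For illustration only, a minimal sketch of how a top-down walk can avoid the 
interpreter recursion limit by keeping its own stack of pending directories. 
The name iter_walk is hypothetical and this is not the code in os.py. Unlike 
os.walk, the sketch lists symlinks to directories under filenames and never 
descends into them, and it silently skips directories it cannot read. Full 
paths longer than the platform limit can still fail with OSError.

```python
import os


def iter_walk(top):
    """Yield (dirpath, dirnames, filenames) top-down without recursion.

    Hypothetical helper: an explicit stack replaces the recursive calls.
    """
    stack = [top]
    while stack:
        current = stack.pop()
        dirnames, filenames = [], []
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # Symlinks are not followed, so they count as non-directories here.
                    if entry.is_dir(follow_symlinks=False):
                        dirnames.append(entry.name)
                    else:
                        filenames.append(entry.name)
        except OSError:
            # Unreadable or overly long paths are skipped in this sketch.
            continue
        yield current, dirnames, filenames
        # Push in reverse so subdirectories are visited in listing order.
        for name in reversed(dirnames):
            stack.append(os.path.join(current, name))
```

The depth that can be walked is then bounded by memory and by path-length 
limits rather than by sys.getrecursionlimit().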

--
components: Library (Lib)
messages: 404687
nosy: Alexander.Patrakov
priority: normal
severity: normal
status: open
title: shutil.rmtree and os.walk are implemented using recursion, fail on deep 
hierarchies
type: crash
versions: Python 3.9

___
Python tracker 
<https://bugs.python.org/issue45564>
___



[issue26545] [doc] os.walk is limited by python's recursion limit

2021-08-18 Thread Ryan Mast (nightlark)


Change by Ryan Mast (nightlark) :


--
nosy: +rmast

___
Python tracker 

___



[issue26545] [doc] os.walk is limited by python's recursion limit

2021-06-22 Thread Irit Katriel


Change by Irit Katriel :


--
keywords: +easy
title: os.walk is limited by python's recursion limit -> [doc] os.walk is 
limited by python's recursion limit
versions: +Python 3.10, Python 3.11, Python 3.9 -Python 3.5

___
Python tracker 
<https://bugs.python.org/issue26545>
___



[issue44008] os.walk and other directory traversal does not handle recursive mounts on Windows

2021-05-02 Thread Eryk Sun


Change by Eryk Sun :


--
resolution:  -> duplicate
stage:  -> resolved
status: open -> closed
superseder:  -> os.walk always follows Windows junctions

___
Python tracker 
<https://bugs.python.org/issue44008>
___



[issue23407] os.walk always follows Windows junctions

2021-05-02 Thread Eryk Sun


Eryk Sun  added the comment:

Windows implements filesystem symlinks and mountpoints as name-surrogate 
reparse points. Python 3.8 introduced behavior changes to how reparse points 
are supported, but the stat st_mode value still sets S_IFLNK only for actual 
symlinks, not for mountpoints. This ensures that if os.path.islink() is true, 
it's safe to read its target and copy it via os.readlink() and os.symlink().

A mountpoint is not equivalent to a symlink in a few cases, so it shouldn't 
always be handled the same or copied as a symlink. The major difference is that 
mountpoints in a remote path are evaluated by the server, whereas symlinks in a 
remote path are evaluated by the client. Also, during path parsing, the target 
of a symlink replaces the opened path, but mountpoints are retained in the 
opened path (except if the target path contains a symlink, but that's broken in 
remote paths and should be avoided). This means that relative ".." components 
and rooted paths in a relative symlink target will traverse a mountpoint as if 
it's just a directory in the opened path. That's an important distinction, but 
in practice I'd steer someone away from relying on it, especially if a 
filesystem is mounted in multiple locations (e.g. on both a DOS drive and a 
directory), else resolution of the symlink will depend on which mountpoint is 
used.

It's best to handle mountpoints as if they're symlinks when deleting a tree 
because the way they're implemented as reparse points doesn't prevent loops. 
However, when walking a tree, you may or may not want to traverse a mountpoint. 
If it's traversed, a seen set() can be used to remember previously traversed 
directories, in order to prevent loops. As Steve mentioned, look to the 
implementation of shutil.rmtree() as an example. 

However, don't look to shutil.copytree() since it's wrong. The is_symlink() 
method of a scandir() entry is only true for an actual symlink, not a 
mountpoint, so the extra check that copytree() does is redundant. I think it 
was left in by mistake when the plan was to handle mountpoints as symlinks. It 
would be nice if we could copy a mountpoint instead of traversing it in 
copytree(), but the private implementation of _winapi.CreateJunction() isn't 
well-behaved and tested enough to be promoted into the standard library as 
something like os.mount().
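
As a rough sketch of the "seen set" idea mentioned above (not code from the 
standard library): the hypothetical helper walk_no_loops wraps os.walk with 
followlinks=True and prunes any directory whose (st_dev, st_ino) pair has 
already been visited. It assumes os.stat() reports a meaningful st_ino for 
directories on the platform in use.

```python
import os


def walk_no_loops(top):
    """Walk top, following links, but visit each real directory only once."""
    st = os.stat(top)
    seen = {(st.st_dev, st.st_ino)}
    for dirpath, dirnames, filenames in os.walk(top, followlinks=True):
        keep = []
        for name in dirnames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # broken link or unreadable entry: do not descend
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                keep.append(name)
        # Pruning in place tells os.walk which subdirectories to enter.
        dirnames[:] = keep
        yield dirpath, dirnames, filenames
```

A directory that is reachable by several routes is reported only under the 
first route found, which is the price paid for guaranteed termination.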

--

___
Python tracker 

___



[issue23407] os.walk always follows Windows junctions

2021-05-02 Thread Eryk Sun


Change by Eryk Sun :


--
Removed message: https://bugs.python.org/msg389286

___
Python tracker 

___



[issue44008] os.walk and other directory traversal does not handle recursive mounts on Windows

2021-05-01 Thread R0b0t1


New submission from R0b0t1 :

Using `os.walk` to traverse a filesystem on Windows does not terminate in the 
case of a recursive mountpoint existing somewhere in the path.

In my case C:\circlemount is linked to C:\, producing paths such as 
C:\circlemount\circlemount\circlemount\circlemount\...

A drive mount point may be set up as follows:

```
diskpart (enters shell)
list volume
select volume ${#}
assign mount=${path}
```

Notably this only happens for Win32 python. Cygwin and MSYS2 pythons as well as 
the pythons distributed with some packages like Inkscape behave properly.

--
components: Windows
messages: 392666
nosy: R0b0t1, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: os.walk and other directory traversal does not handle recursive mounts 
on Windows
type: behavior
versions: Python 3.8

___
Python tracker 
<https://bugs.python.org/issue44008>
___



[issue23407] os.walk always follows Windows junctions

2021-03-22 Thread Eryk Sun


Eryk Sun  added the comment:

Python 3.8 introduced some behavior changes to how reparse points are 
supported, but generalized support for handling name-surrogate reparse points 
as symlinks was not implemented. Python continues to set S_IFLNK in st_mode 
only for IO_REPARSE_TAG_SYMLINK reparse points. This ensures that if 
os.path.islink() is true, the link can be read and copied exactly via 
os.readlink() and os.symlink(). Otherwise, islink() could be true but 
readlink() will fail or symlink() will be used to mistakenly copy a mountpoint 
as a symlink. 

A mountpoint is not equivalent to a symlink in a few cases. The major 
difference is that mountpoints are evaluated on the server side in a remote 
path, targeting devices on the server, whereas symlinks are evaluated on the 
client side, targeting devices on the client (e.g. its "C:" drive) and are 
subject to the client system's L2R (local to remote), L2L, R2L, and R2R symlink 
policy. Replacing a mountpoint with a symlink means that, at best, the path 
will no longer work when accessed remotely, and at worst the client will allow 
resolving the target locally to something that's dangerously wrong.

Another difference is how the kernel handles mountpoints when opening a path. 
The target of a mountpoint does not replace the previously traversed path 
components in the opened path, whereas the target path of a symlink does 
replace the opened path. The previously traversed path matters when the kernel 
resolves ".." components in the target of a relative symlink. For example, a 
relative symlink that traverses up the tree with ".." components may have been 
tested on a traversed directory, which worked fine. Then later the directory 
was replaced with a mountpoint (junction) for compatibility, which continued to 
work fine. But after a CopyTree() that naively replaces the mountpoint with a 
symlink, the copied relative symlink is either broken, or worse, it resolves to 
a target that's dangerously wrong.

A generalization of the readlink() and symlink() combination could be 
implemented to copy any type of name-surrogate reparse point. If Python had 
something like that, then it could reasonably support any name-surrogate 
reparse point as a "symlink". That's not without problems, considering the 
behavior isn't the same and APIs and other applications may only support 
IO_REPARSE_TAG_SYMLINK in various cases, but sometimes perfect is the enemy of 
good.

That said, os.walk() can still special case mountpoints and other 
name-surrogate reparse points. To support cases like this, the lstat() result 
was extended to include the st_reparse_tag value of name-surrogate reparse 
points. The stat module has the IO_REPARSE_TAG_SYMLINK and 
IO_REPARSE_TAG_MOUNT_POINT constants. A simple function that checks for a 
name-surrogate reparse point could be added as well -- i.e. bool(reparse_tag & 
0x20000000).
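
A minimal sketch of such a check, for illustration. The helper names are 
hypothetical. The bit value 0x20000000 is the one tested by the Windows SDK 
macro IsReparseTagNameSurrogate, and the two tag values are spelled out as 
literals here because the stat module only defines the IO_REPARSE_TAG_* 
constants on Windows. On other platforms st_reparse_tag is absent, so the 
sketch treats the tag as 0.

```python
import os

# Bit 29 of a reparse tag marks a name-surrogate reparse point.
NAME_SURROGATE_BIT = 0x20000000

# Literal copies of the Windows tag values, for use on any platform.
IO_REPARSE_TAG_SYMLINK = 0xA000000C
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003


def is_name_surrogate(reparse_tag):
    """Return True if the tag denotes a symlink, junction or similar."""
    return bool(reparse_tag & NAME_SURROGATE_BIT)


def is_link_like(path):
    """Hypothetical check: True for any name-surrogate reparse point."""
    st = os.lstat(path)
    return is_name_surrogate(getattr(st, "st_reparse_tag", 0))
```

A walker could call is_link_like() on each directory entry and decide per 
case whether to descend, instead of relying on os.path.islink() alone.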

---

Using st_reparse_tag to abstract checking the file type is awkward. I wanted to 
support a keyword-only parameter in Windows to expand the 'symlink' domain to 
include all name-surrogate reparse points. This parameter would have been added 
to os.[l]stat(), DirEntry.stat(), DirEntry.is_dir(), and DirEntry.is_file(), as 
well as os.path.islink() and DirEntry.is_symlink(). By default only 
IO_REPARSE_TAG_SYMLINK would have been handled as a symlink. But this idea 
wasn't accepted. Instead, custom checks have to be implemented whenever a 
problem needs the expanded 'symlink' domain.

--
versions: +Python 3.10, Python 3.8, Python 3.9 -Python 3.7

___
Python tracker 
<https://bugs.python.org/issue23407>
___



[issue39567] Add audit for os.walk(), os.fwalk(), Path.glob() and Path.rglob()

2020-10-19 Thread Saiyang Gou


Saiyang Gou  added the comment:

Should we backport this to 3.8? I believe that we should either backport this 
to 3.8 or document that these audit events are new in 3.9.

--
nosy: +gousaiyang

___
Python tracker 

___



[issue40160] documentation example of os.walk should be less destructive

2020-05-24 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

1. There is already a warning before the example.
2. Even if you blindly copy-paste the example it will not work. You have to set 
the top variable.

So I don't see any problem here. You can always shoot yourself in the foot if 
you try hard enough.

--
resolution:  -> not a bug
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___



[issue40160] documentation example of os.walk should be less destructive

2020-05-23 Thread Kyle Stanley


Kyle Stanley  added the comment:

> I made the suggested change to just print the os.remove() statements (instead 
> of executing them) and also removed the 'skip news'.

I think you may have misunderstood the suggestion. 

Specifically, the key part was "I would suggest adding succinct comments or a 
note that very briefly explains how one could see a visual demonstration...". 
This would mean the actual code in the example would be *unchanged*, but with a 
new code comment or separate note after the example that explains how one could 
replace ``os.remove(os.path.join(root, name))`` with 
``print(f"os.remove({os.path.join(root, name)})")`` for a purely visual 
demonstration that doesn't affect any local files.
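
To make the idea concrete, here is a small sketch of such a dry run. The 
function name dry_run_remove is hypothetical and the code is not the example 
from the os.walk documentation. It walks bottom-up and only records what a 
real removal would do.

```python
import os


def dry_run_remove(top):
    """Return the calls a bottom-up delete would make, without making them."""
    planned = []
    for root, dirs, files in os.walk(top, topdown=False):
        for name in files:
            planned.append(f"os.remove({os.path.join(root, name)})")
        for name in dirs:
            planned.append(f"os.rmdir({os.path.join(root, name)})")
    return planned
```

Printing the returned list shows that every directory is emptied before its 
own os.rmdir() entry appears, which is the point of topdown=False.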

--

___
Python tracker 

___



[issue40160] documentation example of os.walk should be less destructive

2020-04-03 Thread John Taylor


John Taylor  added the comment:

I made the suggested change to just print the os.remove() statements (instead 
of executing them) and also removed the 'skip news'.

--

___
Python tracker 

___



[issue40160] documentation example of os.walk should be less destructive

2020-04-03 Thread Kyle Stanley


Kyle Stanley  added the comment:

Serhiy Storchaka wrote:
> I do not think there is a clearer example of topdown=False than recursive 
> remove.
>
> If you think that this example is destructive, consider how destructive 
> any possible example for shutil.rmtree() is!

I concur with Serhiy. If the example is changed to just print out the file and 
directory path, as the PR proposes to do, it seems to defeat the purpose of 
using `topdown=False` (and the code example) in the first place.

If anything, I would suggest adding succinct comments or a note that very 
briefly explains how one could see a visual demonstration of what it does 
without removing any files or directories. For example: 
``print(f"os.remove({os.path.join(root, name)})")``.

--
nosy: +aeros

___
Python tracker 

___



[issue40160] documentation example of os.walk should be less destructive

2020-04-03 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

I do not think there is a clearer example of topdown=False than recursive remove.

If you think that this example is destructive, consider how destructive any 
possible example for shutil.rmtree() is!

--
nosy: +serhiy.storchaka

___
Python tracker 

___



[issue40160] documentation example of os.walk should be less destructive

2020-04-03 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

One possibility is gathering cumulative directory statistics that include 
totals from all descendants (i.e. how many bytes you would free by removing 
the directory with rm -rf).

Outside of aggregating statistics, the normal reason to use topdown=False is 
when paths are being mutated.
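
A sketch of the statistics idea, under the assumption that "size" means the 
sum of st_size over the files below a directory. The name cumulative_sizes is 
hypothetical. Because the walk is bottom-up, every subdirectory total is 
already known when its parent is processed.

```python
import os


def cumulative_sizes(top):
    """Map each directory under top to the total size of the files below it."""
    totals = {}
    for root, dirs, files in os.walk(top, topdown=False):
        size = 0
        for name in files:
            try:
                size += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable: count it as 0
        for name in dirs:
            # Children were visited first; symlinked dirs were never walked.
            size += totals.get(os.path.join(root, name), 0)
        totals[root] = size
    return totals
```

Nothing on disk is modified, yet a top-down walk could not compute these 
totals in a single pass.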

--

___
Python tracker 

___



[issue40160] documentation example of os.walk should be less destructive

2020-04-03 Thread John Taylor


John Taylor  added the comment:

I would prefer an example that does not actually modify the file system.  Is 
there any way this could be achieved, yet still demonstrate why topdown=False 
is necessary?

--

___
Python tracker 

___



[issue40160] documentation example of os.walk should be less destructive

2020-04-02 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

The proposed replacement doesn't succeed in demonstrating why topdown=False is 
necessary.  Consider doing a rename instead of a deletion or print.

--
nosy: +rhettinger

___
Python tracker 

___



[issue40160] documentation example of os.walk should be less destructive

2020-04-02 Thread John Taylor


John Taylor  added the comment:

https://github.com/python/cpython/pull/19313

I have just signed the CLA.

--

___
Python tracker 

___



[issue40160] documentation example of os.walk should be less destructive

2020-04-02 Thread Roundup Robot


Change by Roundup Robot :


--
keywords: +patch
nosy: +python-dev
nosy_count: 2.0 -> 3.0
pull_requests: +18678
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/19313

___
Python tracker 

___



[issue40160] documentation example of os.walk should be less destructive

2020-04-02 Thread John Taylor


New submission from John Taylor :

The example for os.walk should be less destructive.  It currently 
recursively removes all files and directories.  I will be submitting a PR on 
GitHub.

--
assignee: docs@python
components: Documentation
messages: 365625
nosy: docs@python, jftuga
priority: normal
severity: normal
status: open
title: documentation example of os.walk should be less destructive
versions: Python 3.8

___
Python tracker 
<https://bugs.python.org/issue40160>
___



[issue39567] Add audit for os.walk(), os.fwalk(), Path.glob() and Path.rglob()

2020-03-08 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___



[issue39567] Add audit for os.walk(), os.fwalk(), Path.glob() and Path.rglob()

2020-03-08 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:


New changeset db283b32e7580741a8b6b7f27f616cc656634750 by Serhiy Storchaka in 
branch 'master':
bpo-39567: Document audit for os.walk, os.fwalk, Path.glob and Path.rglob. 
(GH-18499)
https://github.com/python/cpython/commit/db283b32e7580741a8b6b7f27f616cc656634750


--

___
Python tracker 
<https://bugs.python.org/issue39567>
___



[issue39567] Add audit for os.walk(), os.fwalk(), Path.glob() and Path.rglob()

2020-02-12 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Oh, I did not know about the audit-event directive. Thanks Karthikeyan.

As for backporting this to 3.8, I left it up to the RM.

--
nosy: +lukasz.langa
resolution: fixed -> 
stage: resolved -> patch review
status: closed -> open

___
Python tracker 

___



[issue39567] Add audit for os.walk(), os.fwalk(), Path.glob() and Path.rglob()

2020-02-12 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
pull_requests: +17873
pull_request: https://github.com/python/cpython/pull/18499

___
Python tracker 

___



[issue39567] Add audit for os.walk(), os.fwalk(), Path.glob() and Path.rglob()

2020-02-12 Thread Steve Dower


Steve Dower  added the comment:

Yes, they should be.

They can also be backported to 3.8 - it was deliberate in the original PEP that 
adding new events is not a breaking change or a new feature.

--

___
Python tracker 

___



[issue39567] Add audit for os.walk(), os.fwalk(), Path.glob() and Path.rglob()

2020-02-12 Thread Karthikeyan Singaravelan


Karthikeyan Singaravelan  added the comment:

Do these new audit events have to be documented?

--
nosy: +xtreak

___
Python tracker 

___



[issue39567] Add audit for os.walk(), os.fwalk(), Path.glob() and Path.rglob()

2020-02-12 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___



[issue39567] Add audit for os.walk(), os.fwalk(), Path.glob() and Path.rglob()

2020-02-12 Thread miss-islington


Change by miss-islington :


--
pull_requests: +17848
pull_request: https://github.com/python/cpython/pull/18478

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39567] Add audit for os.walk(), os.fwalk(), Path.glob() and Path.rglob()

2020-02-12 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:


New changeset f4f445b69306c68a2ba8ce8eb8c6cb3064db5fe7 by Serhiy Storchaka in 
branch 'master':
bpo-39567: Add audit for os.walk(), os.fwalk(), Path.glob() and Path.rglob(). 
(GH-18372)
https://github.com/python/cpython/commit/f4f445b69306c68a2ba8ce8eb8c6cb3064db5fe7


--

___
Python tracker 
<https://bugs.python.org/issue39567>
___



[issue39567] Add audit for os.walk(), os.fwalk(), Path.glob() and Path.rglob()

2020-02-06 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
keywords: +patch
pull_requests: +17748
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/18372

___
Python tracker 

___



[issue39567] Add audit for os.walk(), os.fwalk(), Path.glob() and Path.rglob()

2020-02-06 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
title: Add audit for os.walk, os.fwalk, Path.glob() and Path.rglob() -> Add 
audit for os.walk(), os.fwalk(), Path.glob() and Path.rglob()

___
Python tracker 
<https://bugs.python.org/issue39567>
___



[issue39567] Add audit for os.walk, os.fwalk, Path.glob() and Path.rglob()

2020-02-06 Thread Serhiy Storchaka


New submission from Serhiy Storchaka :

See issue38149.

There is an audit for os.scandir(), but it would be useful to have information 
about higher-level operations.
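
For context, a small sketch of how such events can be observed with 
sys.addaudithook(). The names WATCHED, seen_events and record_traversal are 
hypothetical. On interpreters that predate this change only the os.scandir 
events would show up. Note that an audit hook cannot be removed once it has 
been installed.

```python
import os
import sys

# Event names to record; the higher-level ones are the subject of this issue.
WATCHED = {"os.walk", "os.fwalk", "os.scandir"}
seen_events = []


def record_traversal(event, args):
    # Hooks receive every audit event in the process, so keep this cheap
    # and make sure it never raises.
    if event in WATCHED:
        seen_events.append((event, args))


sys.addaudithook(record_traversal)
```

After a call such as list(os.walk(".")), seen_events holds one os.walk entry 
followed by an os.scandir entry for each directory that was listed.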

--
components: Library (Lib)
messages: 361472
nosy: serhiy.storchaka, steve.dower
priority: normal
severity: normal
status: open
title: Add audit for os.walk, os.fwalk, Path.glob() and Path.rglob()
type: enhancement
versions: Python 3.9

___
Python tracker 
<https://bugs.python.org/issue39567>
___



[issue39246] shutil.rmtree is inefficient because of using os.scandir instead of os.walk

2020-01-08 Thread Felipe A. Hernandez


Felipe A. Hernandez  added the comment:

After some tests, due to the accumulating nature of fwalk, I've just realised 
it's not very safe for big directories, so I'll be closing this issue.

Alternatively, using the fd-based scandir() available in Python 3.7+, together 
with unlink and rmdir calls that take dir_fd, would reduce complexity while 
improving safety.
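
A rough sketch of that alternative, assuming a POSIX platform where 
os.scandir() accepts a file descriptor and os.unlink()/os.rmdir() accept 
dir_fd. The names rmtree_fd and _remove_contents are hypothetical and this is 
not the shutil implementation. The sketch is still recursive, so it remains 
subject to the recursion limit on very deep trees.

```python
import os


def _remove_contents(dir_fd):
    """Delete everything inside the directory open at dir_fd."""
    with os.scandir(dir_fd) as it:
        entries = list(it)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            # O_NOFOLLOW guards against the entry being swapped for a symlink.
            child_fd = os.open(
                entry.name,
                os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW,
                dir_fd=dir_fd,
            )
            try:
                _remove_contents(child_fd)
            finally:
                os.close(child_fd)
            os.rmdir(entry.name, dir_fd=dir_fd)
        else:
            # Regular files and symlinks are unlinked, never followed.
            os.unlink(entry.name, dir_fd=dir_fd)


def rmtree_fd(path):
    """Remove the tree at path using names relative to open directory fds."""
    top_fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW)
    try:
        _remove_contents(top_fd)
    finally:
        os.close(top_fd)
    os.rmdir(path)
```

Only one entry list per level is held in memory at a time, in contrast to 
the accumulation described above for fwalk.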

Sorry for the noise.

--
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

___
Python tracker 

___



[issue39246] shutil.rmtree is inefficient because of using os.scandir instead of os.walk

2020-01-07 Thread Felipe A. Hernandez


New submission from Felipe A. Hernandez :

shutil.rmtree has fd-based symlink replacement protection when iterating with 
scandir (after bpo-28564).

This logic could be greatly simplified by using os.fwalk on supported 
platforms, which already implements a similar (maybe safer) protection.

--
components: Library (Lib)
messages: 359512
nosy: Felipe A. Hernandez
priority: normal
severity: normal
status: open
title: shutil.rmtree is inefficient because of using os.scandir instead of 
os.walk
versions: Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue39246>
___



Re: urllib unqoute providing string mismatch between string found using os.walk (Python3)

2019-12-22 Thread Ben Hearn
On Saturday, 21 December 2019 21:46:43 UTC, Ben Hearn  wrote:
> Hello all,
> 
> I am having a bit of trouble with a string mismatch in a tool I am 
> writing.
> 
> I am comparing a database collection of URL-quoted paths to the paths on the 
> user's drive.
> 
> These 2 paths look identical, one from the drive & the other from an xml url:
> a = '/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! 
> _PromoMix_.wav'
> b = '/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! 
> _PromoMix_.wav'
> 
> But after realising it was failing on them I ran a difflib and these 
> differences popped up.
> 
> import difflib
> print('\n'.join(difflib.ndiff([a], [b])))
> - /Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! 
> _PromoMix_.wav
> ? 
>  ^^
> + /Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! 
> _PromoMix_.wav
> ? 
>  ^
> 
> 
> What am I missing when it comes to unquoting the string, or should I do some 
> other fancy operation on the drive string?
> 
> Cheers,
> 
> Ben

Thanks for the help on this. Normalizing my string fixed the problem, cheers!
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: urllib unqoute providing string mismatch between string found using os.walk (Python3)

2019-12-21 Thread Richard Damon
On 12/21/19 8:25 PM, MRAB wrote:
> On 2019-12-22 00:22, Michael Torrie wrote:
>> On 12/21/19 2:46 PM, Ben Hearn wrote:
>>> These 2 paths look identical, one from the drive & the other from an
>>> xml url:
>>> a = '/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf -
>>> ¡Móchate! _PromoMix_.wav'
>>     ^^
>>> b = '/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate!
>>> _PromoMix_.wav'
>>     ^^
>> They actually are different strings.  The name is spelled
>> differently between the two.  Móchate vs Móchate (the former seems to
>> be the correct spelling according to my inline spell checker).  Is this
>> from your own program? I wonder how it got switched?
>>
> Use the 'ascii' function to see what's different:
>
> >>> ascii(a)
> "'/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf -
> \\xa1Mo\\u0301chate! _PromoMix_.wav'"
> >>> ascii(b)
> "'/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf -
> \\xa1M\\xf3chate! _PromoMix_.wav'"
> >>>

It is a Unicode Normalization issue. A number of characters can be
'spelled' different ways.

ó can be either a single codepoint U+00F3, or it can be the pair of
codepoints, the o and U+0301 (the accent).

If you want to make the strings compare equal then you need to make sure
that you have normalized both strings the same way. I believe that
macOS always converts file names into the NFD form when it uses them
(that is the form the first (a) string is in).
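
A short sketch of that normalization step. The helper same_path and the 
percent-encoded value are illustrative assumptions, since the original XML 
URL is not shown in the thread. The strings are written with escapes so the 
two spellings of the accented letter stay distinguishable.

```python
import unicodedata
from urllib.parse import unquote


def same_path(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same Unicode form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# As the file system returned it: "o" followed by a combining acute accent.
from_disk = "J.Staaf - \u00a1Mo\u0301chate! _PromoMix_.wav"

# As unquoted from a (hypothetical) URL: the single code point U+00F3.
from_xml = unquote("J.Staaf%20-%20%C2%A1M%C3%B3chate%21%20_PromoMix_.wav")

print(from_disk == from_xml)           # the raw strings differ
print(same_path(from_disk, from_xml))  # equal once both are normalized
```

Either NFC or NFD works for the comparison, as long as both sides use the 
same form.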

-- 
Richard Damon



Re: urllib unqoute providing string mismatch between string found using os.walk (Python3)

2019-12-21 Thread MRAB

On 2019-12-22 00:22, Michael Torrie wrote:
> On 12/21/19 2:46 PM, Ben Hearn wrote:
>> These 2 paths look identical, one from the drive & the other from an xml url:
>> a = '/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! 
>> _PromoMix_.wav'
>>     ^^
>> b = '/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! 
>> _PromoMix_.wav'
>>     ^^
> They are actually are different strings.  The name is spelled
> differently between the two.  Móchate vs Móchate (the former seems to
> be the correct spelling according to my inline spell checker).  Is this
> from your own program? I wonder how it got switched?

Use the 'ascii' function to see what's different:

>>> ascii(a)
"'/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - 
\\xa1Mo\\u0301chate! _PromoMix_.wav'"
>>> ascii(b)
"'/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - 
\\xa1M\\xf3chate! _PromoMix_.wav'"
>>>


Re: urllib unqoute providing string mismatch between string found using os.walk (Python3)

2019-12-21 Thread Chris Angelico
On Sun, Dec 22, 2019 at 11:33 AM Michael Torrie  wrote:
>
> On 12/21/19 2:46 PM, Ben Hearn wrote:
> > These 2 paths look identical, one from the drive & the other from an xml 
> > url:
> > a = '/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! 
> > _PromoMix_.wav'
>^^
> > b = '/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! 
> > _PromoMix_.wav'
>^^
> They are actually are different strings.  The name is spelled
> differently between the two.  Móchate vs Móchate (the former seems to
> be the correct spelling according to my inline spell checker).  Is this
> from your own program? I wonder how it got switched?

Then your inline spell checker is flawed, because the two strings
differ only in whether a single "o with acute" or a pair "o" and
"combining acute" is used to represent the accented letter. The word
is the same either way.
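
To see the difference Chris describes without relying on how a terminal
renders the text, each character can be listed with its code point and
Unicode name. This is a small sketch; describe() is a helper name invented
for this example.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

# The same accented letter, written in its two possible spellings.
for label, word in (("composed", "M\u00f3"), ("decomposed", "Mo\u0301")):
    print(label)
    for code, name in describe(word):
        print(" ", code, name)
```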

ChrisA


Re: urllib unqoute providing string mismatch between string found using os.walk (Python3)

2019-12-21 Thread Michael Torrie
On 12/21/19 2:46 PM, Ben Hearn wrote:
> These 2 paths look identical, one from the drive & the other from an xml url:
> a = '/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! 
> _PromoMix_.wav'
   ^^
> b = '/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! 
> _PromoMix_.wav'
   ^^
They are actually different strings.  The name is spelled
differently between the two.  Móchate vs Móchate (the former seems to
be the correct spelling according to my inline spell checker).  Is this
from your own program? I wonder how it got switched?


Re: urllib unqoute providing string mismatch between string found using os.walk (Python3)

2019-12-21 Thread Dan Sommers

On 12/21/19 4:46 PM, Ben Hearn wrote:

> import difflib
> print('\n'.join(difflib.ndiff([a], [b])))
> - /Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! 
> _PromoMix_.wav
> ? ^^
> + /Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! _PromoMix_.wav
> ? ^
>
> What am I missing when it comes to unquoting the string, or should I do some 
> other fancy operation on the drive string?


I'm going to guess that the trailing characters are newline
(U+000A) and/or carriage return (U+000D) characters.  Use
str.strip() to remove them before comparing:

a = a.strip()

Dan


Re: urllib unqoute providing string mismatch between string found using os.walk (Python3)

2019-12-21 Thread Pieter van Oostrum
Ben Hearn  writes:

> Hello all,
>
> I am having a bit of trouble with a string mismatch operation in my tool I am 
> writing.
>
> I am comparing a database collection or url quoted paths to the paths on the 
> users drive.
>
> These 2 paths look identical, one from the drive & the other from an xml url:
> a = '/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! 
> _PromoMix_.wav'
> b = '/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! 
> _PromoMix_.wav'
>
> But after realising it was failing on them I ran a difflib and these 
> differences popped up.
>
> import difflib
> print('\n'.join(difflib.ndiff([a], [b])))
> - /Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! 
> _PromoMix_.wav
> ? ^^
> + /Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! 
> _PromoMix_.wav
> ? ^
>
>
> What am I missing when it comes to unquoting the string, or should I do
> some other fancy operation on the drive string?
>

In [8]: len(a)
Out[8]: 79

In [9]: len(b)
Out[9]: 78

The difference is in the ó. In (b) it is a single character, Unicode 0xF3,
LATIN SMALL LETTER O WITH ACUTE.
In (a) it is composed of the letter o and the accent "́" (Unicode 0x301).
So you would have to do Unicode normalisation before comparing.

For example:

In [16]: from unicodedata import normalize

In [17]: a == b
Out[17]: False

In [18]: normalize('NFC', a) == normalize('NFC', b)
Out[18]: True
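
For the original use case (a path from the drive compared with a path
unquoted from a URL), the two steps can be combined. This is only a sketch:
same_path() is a hypothetical helper name, the sample paths are shortened
stand-ins for the ones in the thread, and NFC is an arbitrary choice of
common form.

```python
from unicodedata import normalize
from urllib.parse import unquote

def same_path(drive_path, quoted_path, form="NFC"):
    """Compare a filesystem path with a URL-quoted path after normalising both."""
    return normalize(form, drive_path) == normalize(form, unquote(quoted_path))

drive = "/Music/Mo\u0301chate.wav"    # decomposed, as a directory listing might return it
quoted = "/Music/M%C3%B3chate.wav"    # percent-encoded UTF-8 of the composed form

print(drive == unquote(quoted))       # plain comparison of the two strings
print(same_path(drive, quoted))       # comparison after normalisation
```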

-- 
Pieter van Oostrum
www: http://pieter.vanoostrum.org/
PGP key: [8DAE142BE17999C4]


urllib unqoute providing string mismatch between string found using os.walk (Python3)

2019-12-21 Thread Ben Hearn
Hello all,

I am having a bit of trouble with a string mismatch in the tool I am 
writing.

I am comparing a database collection of URL-quoted paths to the paths on the 
user's drive.

These 2 paths look identical, one from the drive & the other from an xml url:
a = '/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! 
_PromoMix_.wav'
b = '/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! 
_PromoMix_.wav'

But after realising it was failing on them I ran a difflib and these 
differences popped up.

import difflib
print('\n'.join(difflib.ndiff([a], [b])))
- /Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! 
_PromoMix_.wav
? ^^
+ /Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate! _PromoMix_.wav
? ^


What am I missing when it comes to unquoting the string, or should I do some 
other fancy operation on the drive string?

Cheers,

Ben


[issue23407] os.walk always follows Windows junctions

2019-11-18 Thread Steve Dower


Steve Dower  added the comment:

At a minimum, it needs to be turned into a GitHub PR.

We've made some significant changes in this area in 3.8, so possibly the best 
available code is now in shutil.rmtree (or shutil.copytree) rather than the 
older patch files.

--




[issue23407] os.walk always follows Windows junctions

2019-11-15 Thread Jim Carroll


Jim Carroll  added the comment:

I can confirm the os.walk() behavior still exists on 3.8. Just curious on the 
status of the patch?

--
nosy: +jamercee

___
Python tracker 
<https://bugs.python.org/issue23407>
___



[issue25461] Unclear language (the word ineffective) in the documentation for os.walk

2019-09-10 Thread miss-islington


miss-islington  added the comment:


New changeset d94b762ce824e97c441f9231f0e69ef8f59beeab by Miss Islington (bot) 
in branch '3.7':
closes bpo-25461: Update os.walk() docstring to match the online docs. 
(GH-11836)
https://github.com/python/cpython/commit/d94b762ce824e97c441f9231f0e69ef8f59beeab


--

___
Python tracker 
<https://bugs.python.org/issue25461>
___



[issue25461] Unclear language (the word ineffective) in the documentation for os.walk

2019-09-10 Thread miss-islington


miss-islington  added the comment:


New changeset c43f26eca35f22707a52fb8f3fbfc9340639b58d by Miss Islington (bot) 
in branch '3.8':
closes bpo-25461: Update os.walk() docstring to match the online docs. 
(GH-11836)
https://github.com/python/cpython/commit/c43f26eca35f22707a52fb8f3fbfc9340639b58d


--
nosy: +miss-islington

___
Python tracker 
<https://bugs.python.org/issue25461>
___



[issue25461] Unclear language (the word ineffective) in the documentation for os.walk

2019-09-10 Thread Benjamin Peterson

Benjamin Peterson  added the comment:


New changeset 734f1202a50641eb2c4bfbcd5b75247c1dc99a8f by Benjamin Peterson 
(Bernt Røskar Brenna) in branch 'master':
closes bpo-25461: Update os.walk() docstring to match the online docs. 
(GH-11836)
https://github.com/python/cpython/commit/734f1202a50641eb2c4bfbcd5b75247c1dc99a8f


--
nosy: +benjamin.peterson
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue25461>
___



[issue25461] Unclear language (the word ineffective) in the documentation for os.walk

2019-09-10 Thread miss-islington


Change by miss-islington :


--
pull_requests: +15490
pull_request: https://github.com/python/cpython/pull/15844




[issue25461] Unclear language (the word ineffective) in the documentation for os.walk

2019-09-10 Thread miss-islington


Change by miss-islington :


--
pull_requests: +15489
pull_request: https://github.com/python/cpython/pull/15843




[issue37462] Default starting directory for os.walk()

2019-07-01 Thread Kyle Stanley


Kyle Stanley  added the comment:

Oh okay thanks taleinat. I wasn't aware of the existence of the ideas mailing 
list, I'll have to check that out next time.

--




[issue37462] Default starting directory for os.walk()

2019-07-01 Thread Tal Einat


Tal Einat  added the comment:

Hi Kyle,

First off, thanks for bringing this up, and for taking our feedback so well!

> Before attempting to create any more original issues, I'll work on existing 
> issues and the suggestions of others to develop a better practical 
> understanding of the standards. 

Do not be discouraged that one of the first suggestions you've made has been 
rejected; this is normal and happens all of the time.

In the future, you could also bring up such ideas for discussion on the 
python-ideas mailing list to "test the water" before opening an issue.

> For any future issues, is it more appropriate to leave the issue status on 
> pending until the proposal is approved?

In the future, you don't have to set the status to anything special when 
creating a new issue. The default, "open", is what it should be to begin with.

--
resolution:  -> rejected




[issue37462] Default starting directory for os.walk()

2019-07-01 Thread aeros167


aeros167  added the comment:

For any future issues, is it more appropriate to leave the issue status on 
pending until the proposal is approved? This may not apply if the issue was 
specifically mentioned or requested by a core developer, but it may be better 
to use pending for any original issues that I propose until they have received 
feedback.

--




[issue37462] Default starting directory for os.walk()

2019-07-01 Thread aeros167


aeros167  added the comment:

Oh okay, that's fair enough. I can definitely understand that assigning the 
current directory as the default does not provide a significant change in 
improved functionality, and it is not implicit that os.walk() would use the 
current directory as the default option. Even if this change was approved, it 
is likely that many would never notice a difference. After reading the 
criticisms, I can also understand that the value of this feature would be too 
niche and does not justify the maintenance. It may also lead to some 
unnecessary confusion. 

Thanks for the constructive feedback serhiy and taleinat. I'm quite new to 
making contributions to Python (only a couple of minor merged PRs) and to 
working on open source projects in general. This was my first attempt at 
coming up with an original issue for Python. As a result, I'm not entirely 
certain what qualifies as valuable enough to justify creating an issue. I 
have some basic understanding from other issues that I've looked over, but it 
seems like it might be a bit of a trial-and-error process. 

Before attempting to create any more original issues, I'll work on existing 
issues and the suggestions of others to develop a better practical 
understanding of the standards. 

I'll close this issue and the pull request, but I'll be sure to read over any 
other responses that are made. I definitely appreciate the criticism.

--
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue37462>
___



[issue37462] Default starting directory for os.walk()

2019-07-01 Thread Tal Einat


Tal Einat  added the comment:

I agree with Serhiy: In this case, it seems to me that "explicit is better than 
implicit" should be the guiding principle.

--
nosy: +taleinat




[issue37462] Default starting directory for os.walk()

2019-07-01 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

It is not obvious that the majority of os.walk() calls are made with the 
current working directory. And if you need to walk from the current working 
directory, it is not hard to pass "." explicitly. Even if your program 
contains a lot of os.walk(".") calls, you could not use this feature until you 
drop support for Python 3.8.

So I think the value of this feature is tiny. It is not worth the effort of 
implementing, documenting, reviewing and maintaining it. On the other hand, it 
increases the burden of learning Python.

--
nosy: +serhiy.storchaka

___
Python tracker 
<https://bugs.python.org/issue37462>
___



[issue37462] Default starting directory for os.walk()

2019-07-01 Thread aeros167


aeros167  added the comment:

Created a new PR which sets the default value of top to the current working 
directory. (https://github.com/python/cpython/pull/14497)

--




[issue37462] Default starting directory for os.walk()

2019-07-01 Thread aeros167


Change by aeros167 :


--
keywords: +patch
pull_requests: +14312
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/14497




[issue37462] Default starting directory for os.walk()

2019-07-01 Thread aeros167


New submission from aeros167 :

As a quality of life enhancement, it would be beneficial to set a default 
starting directory for os.walk() to use when one is not specified. I would 
suggest that this be the system's current working directory. 

The implementation would be rather straightforward, and should not cause any 
compatibility issues. In the function definition for walk in os.py, it would 
simply need to be changed from "def walk(top, topdown=True, onerror=None, 
followlinks=False)" to "def walk(top=".", topdown=True, onerror=None, 
followlinks=False)".
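
The same convenience can be had in user code without changing os.walk()
itself. A minimal sketch, in which walk_from_cwd is a hypothetical wrapper
name and not part of the os module:

```python
import os

def walk_from_cwd(top=".", **kwargs):
    """Hypothetical wrapper: like os.walk(), but top defaults to the current directory."""
    # All other keyword arguments (topdown, onerror, followlinks) pass through unchanged.
    return os.walk(top, **kwargs)

# The first tuple yielded always describes the starting directory itself.
dirpath, dirnames, filenames = next(walk_from_cwd())
print(dirpath)
```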

As a separate topic but relevant to os.walk(), the positional argument that 
specifies the starting directory is currently called "top". However, a more 
descriptive name for the argument would probably be "root", "dir", 
"directory", or "start". The term "top" is quite generic and is not used to 
describe a filesystem position. 

Personally, I'm the most in favor of "root" as it describes the highest parent 
node of a tree, and provides an accurate description of its purpose.  However, 
I will hold off on adding a PR for this change until some feedback is added, as 
it is not essential to the initial issue.

Changing the name of the argument could cause compatibility issues if the 
name is specified when calling os.walk(), as in "os.walk(top="dirpath")". 
However, doing so would be quite unconventional. In the majority of common 
use cases, os.walk() is called with only the first argument, so there is no 
need to specify the keyword. Even when more than one argument is used, the 
first is usually specified by position rather than by keyword.

--
components: Library (Lib)
messages: 346961
nosy: aeros167
priority: normal
severity: normal
status: open
title: Default starting directory for os.walk()
type: enhancement
versions: Python 3.9

___
Python tracker 
<https://bugs.python.org/issue37462>
___



[issue36771] Feature Request: An option to os.walk() to return os.DirEntry lists instead of just filenames/dirnames

2019-05-06 Thread CJ Kucera


CJ Kucera  added the comment:

Will do, thanks for the input!

--




[issue36771] Feature Request: An option to os.walk() to return os.DirEntry lists instead of just filenames/dirnames

2019-05-06 Thread Stéphane Wirtel

Stéphane Wirtel  added the comment:

Thank you for your contribution, but as discussed in this issue, we prefer to 
have a new function rather than a new boolean flag.

Here are my suggestions.

* Create another PR in which you add a new function
* Change the title of this issue.

Thank you

--




[issue36771] Feature Request: An option to os.walk() to return os.DirEntry lists instead of just filenames/dirnames

2019-05-06 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

I concur with Stéphane. The boolean option for changing the type of the 
returned value is considered a bad design.

--




[issue36771] Feature Request: An option to os.walk() to return os.DirEntry lists instead of just filenames/dirnames

2019-05-06 Thread Stéphane Wirtel

Stéphane Wirtel  added the comment:

I would like to have Serhiy's advice.

--
nosy: +serhiy.storchaka




[issue36771] Feature Request: An option to os.walk() to return os.DirEntry lists instead of just filenames/dirnames

2019-05-06 Thread Stéphane Wirtel

Stéphane Wirtel  added the comment:

And another solution, you could use the pathlib.Path class for your project.

Have a nice day,

--




[issue36771] Feature Request: An option to os.walk() to return os.DirEntry lists instead of just filenames/dirnames

2019-05-01 Thread CJ Kucera


CJ Kucera  added the comment:

Yeah, I'd wondered that too (re: a separate function) but it seemed like an 
awful lot of duplicated code.  The PR I'd put through just changes the 
datatypes within the `filenames` and `dirnames` lists...  I'd been thinking 
that'd be sufficient since you wouldn't get that without specifying the 
optional boolean, but perhaps not.

--




[issue36771] Feature Request: An option to os.walk() to return os.DirEntry lists instead of just filenames/dirnames

2019-05-01 Thread Stéphane Wirtel

Stéphane Wirtel  added the comment:

Hi,

I think you should create a new function rather than modify the current
os.walk(), because you change the type of the returned value.

We have to avoid this inconsistency for the caller of os.walk(): is it a
list of DirEntry objects or another kind of list?

--
nosy: +matrixise

___
Python tracker 
<https://bugs.python.org/issue36771>
___



[issue36771] Feature Request: An option to os.walk() to return os.DirEntry lists instead of just filenames/dirnames

2019-05-01 Thread CJ Kucera


CJ Kucera  added the comment:

I've started up a Github PR for this, btw, though IMO it's not really in a 
mergeable state yet:

1) I wasn't sure what to do about os.fwalk(), since that *doesn't* already 
generate DirEntry objects, and this change would introduce a small 
inconsistency between the two, functionalitywise.
2) Also wasn't sure what to do about unit tests, though I'll ponder that some 
more once I've got some time later.
3) The actual implementation is pretty trivial, but could perhaps be handled 
differently.

The system's still processing my CLA signature, as well, so there's that too.  
:)

--




[issue36771] Feature Request: An option to os.walk() to return os.DirEntry lists instead of just filenames/dirnames

2019-05-01 Thread Roundup Robot


Change by Roundup Robot :


--
keywords: +patch
pull_requests: +12960
stage:  -> patch review




[issue36771] Feature Request: An option to os.walk() to return os.DirEntry lists instead of just filenames/dirnames

2019-05-01 Thread CJ Kucera


New submission from CJ Kucera :

It'd be nice to have an option to os.walk which would return DirEntry objects 
in its return tuple, as opposed to just the string filenames/dirnames.  (Or 
failing that, an alternate function which does so.)  The function already uses 
os.scandir() internally, so the DirEntry objects already exist -- I assume it'd 
be a pretty easy change.  At the moment, if I want to be efficient and use 
os.scandir() myself, I've got to basically reimplement os.walk(), which seems 
silly since os.walk is already calling scandir itself.
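
For reference, the kind of reimplementation described above might look
roughly like this. It is a simplified sketch, not the patch from the PR:
walk_entries is a hypothetical name, errors from os.scandir() are not
handled, and unlike os.walk() a symlink to a directory is listed with the
files here.

```python
import os

def walk_entries(top):
    """Yield (dirpath, dir_entries, file_entries) with os.DirEntry objects in the lists."""
    dirs, files = [], []
    with os.scandir(top) as it:
        for entry in it:
            # follow_symlinks=False: only real directories are descended into.
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry)
            else:
                files.append(entry)
    yield top, dirs, files
    for entry in dirs:
        yield from walk_entries(entry.path)
```

A caller can then use entry.name, entry.path, or entry.stat() on each item
without a further directory listing.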

--
components: Library (Lib)
messages: 341210
nosy: apocalyptech
priority: normal
severity: normal
status: open
title: Feature Request: An option to os.walk() to return os.DirEntry lists 
instead of just filenames/dirnames
type: enhancement
versions: Python 3.7

___
Python tracker 
<https://bugs.python.org/issue36771>
___



[issue25461] Unclear language (the word ineffective) in the documentation for os.walk

2019-02-13 Thread Roundup Robot


Change by Roundup Robot :


--
pull_requests: +11867




[issue25461] Unclear language (the word ineffective) in the documentation for os.walk

2019-02-12 Thread STINNER Victor


STINNER Victor  added the comment:

> What is the status on this? From this discussion, it looks like @vstinner 
> pushed changes to resolve this. Do we close this? If this still needs a 
> patch, then one of the patches can be reviewed in a PR on GitHub.

It seems like os_walk_doc.patch and issue25461.patch have been written after I 
pushed my changes. Would you be interested to convert these patches into a 
proper PR?

--




[issue25461] Unclear language (the word ineffective) in the documentation for os.walk

2019-02-11 Thread Joannah Nanjekye


Joannah Nanjekye  added the comment:

What is the status on this? From this discussion, it looks like @vstinner 
pushed changes to resolve this. Do we close this? If this still needs a patch, 
then one of the patches can be reviewed in a PR on GitHub.

--
nosy: +nanjekyejoannah




[issue25461] Unclear language (the word ineffective) in the documentation for os.walk

2018-07-26 Thread Jonathan Fine


Change by Jonathan Fine :


--
nosy: +jfine2358




[issue33202] os.walk mentions os.listdir instead of os.scandir

2018-04-02 Thread miss-islington

miss-islington <mariatta.wijaya+miss-isling...@gmail.com> added the comment:


New changeset fa5157e0499f7afdb59e220e3f4fdbf44adb5ac8 by Miss Islington (bot) 
in branch '3.6':
closes bpo-33202: fix os.walk mentioning os.listdir instead of os.scandir 
(GH-6335)
https://github.com/python/cpython/commit/fa5157e0499f7afdb59e220e3f4fdbf44adb5ac8


--

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33202>
___



[issue33202] os.walk mentions os.listdir instead of os.scandir

2018-04-02 Thread miss-islington

miss-islington <mariatta.wijaya+miss-isling...@gmail.com> added the comment:


New changeset f6d1d65803f290dfe14048f17d8125f8093a61ec by Miss Islington (bot) 
in branch '3.7':
closes bpo-33202: fix os.walk mentioning os.listdir instead of os.scandir 
(GH-6335)
https://github.com/python/cpython/commit/f6d1d65803f290dfe14048f17d8125f8093a61ec


--
nosy: +miss-islington

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33202>
___



[issue33202] os.walk mentions os.listdir instead of os.scandir

2018-04-02 Thread miss-islington

Change by miss-islington :


--
pull_requests: +6071




[issue33202] os.walk mentions os.listdir instead of os.scandir

2018-04-02 Thread miss-islington

Change by miss-islington :


--
pull_requests: +6070




[issue33202] os.walk mentions os.listdir instead of os.scandir

2018-04-02 Thread Benjamin Peterson

Benjamin Peterson <benja...@python.org> added the comment:


New changeset badb8948aaa8b669c4a6f675a0bc7d98e188 by Benjamin Peterson 
(Andrés Delfino) in branch 'master':
closes bpo-33202: fix os.walk mentioning os.listdir instead of os.scandir 
(GH-6335)
https://github.com/python/cpython/commit/badb8948aaa8b669c4a6f675a0bc7d98e188


--
nosy: +benjamin.peterson
resolution:  -> fixed
stage:  -> resolved
status: open -> closed

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33202>
___



[issue33202] os.walk mentions os.listdir instead of os.scandir

2018-04-01 Thread Andrés Delfino

New submission from Andrés Delfino <adelf...@gmail.com>:

The documentation states that, for walk, "errors from the listdir() call are 
ignored". That has not been accurate since 3.5, when walk started using 
scandir(). Change the mention of listdir() to scandir().
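
As an illustration of the behaviour being documented: by default os.walk()
ignores such errors, and the onerror argument lets the caller see them. A
small sketch, assuming the path below does not exist on the machine running
it (the path and the remember() helper are made up for this example):

```python
import os

errors = []

def remember(error):
    """Collect the OSError that os.walk() would otherwise ignore."""
    errors.append(error)

# The directory is assumed not to exist, so the listing fails and
# os.walk() reports the error to the callback instead of raising it.
for _ in os.walk("/no/such/directory/for-this-example", onerror=remember):
    pass

print([type(e).__name__ for e in errors])
```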

--
assignee: docs@python
components: Documentation
messages: 314786
nosy: adelfino, docs@python
priority: normal
pull_requests: 6049
severity: normal
status: open
title: os.walk mentions os.listdir instead of os.scandir
versions: Python 3.8

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33202>
___



[issue31214] os.walk has a bug on Windows

2017-08-15 Thread Chris Lovett

Chris Lovett added the comment:

Oh, my bad then. Apologies for the noise in your system.

--




[issue31214] os.walk has a bug on Windows

2017-08-15 Thread R. David Murray

Changes by R. David Murray :


--
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed




[issue31214] os.walk has a bug on Windows

2017-08-15 Thread Peter Otten

Peter Otten added the comment:

Read the documentation of os.walk() again. It already walks the complete 
directory tree starting with src. 

When you invoke it again by calling your copy_dir() method recursively you will 
of course see once more the files and directories in the respective 
subdirectory.
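
In other words, one os.walk() loop with no recursive call is enough. A
sketch of that structure follows; plan_copy is a hypothetical name, and it
only yields what would be created or copied, since the sftp object from the
report is not available here.

```python
import os

def plan_copy(src, dest):
    """Yield ("dir" or "file", source_path, destination_path) for the tree under src.

    os.walk() already descends into every subdirectory, so each directory
    and each file is visited exactly once without any recursive call.
    """
    for dirpath, dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = dest if rel == os.curdir else os.path.join(dest, rel)
        yield "dir", dirpath, target_dir
        for filename in filenames:
            yield "file", os.path.join(dirpath, filename), os.path.join(target_dir, filename)
```

The caller would replace each "dir" step with its own mkdir call and each
"file" step with its own upload or copy call.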

--
nosy: +peter.otten

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue31214>
___



[issue31214] os.walk has a bug on Windows

2017-08-15 Thread Chris Lovett

New submission from Chris Lovett:

When I walk a directory recursively, it is tacking on an additional 
non-existent file from one of the subdirectories.  Here's the code:

def copy_dir(self, src, dest):
    result = sftp.mkdir(dest)
    for dirname, dirnames, filenames in os.walk(src):
        for subdirname in dirnames:
            print("entering dir:" + subdirname)
            self.copy_dir(os.path.join(src, subdirname),
                          os.path.join(dest, subdirname))
        for filename in filenames:
            print("copying:" + filename)

Here's the output:

entering dir:include
copying:CallbackInterface.h
copying:ClockInterface.h
entering dir:tcc
copying:CallbackInterface.tcc
copying:CMakeLists.txt
copying:darknet.i
copying:darknet.i.h
copying:darknet.obj
copying:darknet.py
copying:darknetImageNetLabels.txt
copying:darknetPYTHON_wrap.cxx
copying:darknetPYTHON_wrap.h
copying:darknet_config.json
copying:demo.py
copying:demoHelper.py
copying:OpenBLASSetup.cmake
copying:runtest.sh
copying:schoolbus.png
copying:CallbackInterface.h
copying:ClockInterface.h
copying:CallbackInterface.tcc

The last 3 files listed here don't exist; they are a repeat of the files 
found in the subdirectories.

--
components: Windows
messages: 300313
nosy: clovett, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: os.walk has a bug on Windows
type: behavior
versions: Python 3.6

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue31214>
___



[issue26781] os.walk max_depth

2017-07-17 Thread Raymond Hettinger

Raymond Hettinger added the comment:

> I think there is a little need in this feature.

I concur with Serhiy and think we're better-off without this proposal.
Marking this as closed.

--
nosy: +rhettinger
resolution:  -> rejected
stage:  -> resolved
status: open -> closed




[issue26781] os.walk max_depth

2017-07-17 Thread André Rossi Korol

André Rossi Korol added the comment:

I proposed a new function called lwalk (level walk) that recurses only to a 
certain level of depth: http://bugs.python.org/issue30942
It is implemented in os.py and calls os.walk, while making sure it recurses 
only to a selected level of depth.
If it is accepted I could send a Pull Request with the lwalk function 
implemented in os.py.
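
For readers who want this behaviour without a new stdlib function, here is a rough sketch of a depth-limited walk built on os.walk. It is not the lwalk implementation from the attached os.py; the name walk_max_depth and its max_depth parameter are assumptions made for illustration. It relies on the documented top-down behaviour of os.walk, where editing dirnames in place prunes the traversal.

```python
import os

def walk_max_depth(top, max_depth):
    """Yield os.walk() tuples, descending at most max_depth levels below top."""
    top = os.path.abspath(top)
    base_depth = top.rstrip(os.sep).count(os.sep)
    for dirpath, dirnames, filenames in os.walk(top):
        yield dirpath, dirnames, filenames
        depth = dirpath.rstrip(os.sep).count(os.sep) - base_depth
        if depth >= max_depth:
            # With topdown=True (the default), clearing dirnames in place
            # stops os.walk from descending any further on this branch.
            dirnames[:] = []
```

With max_depth=0 only top itself is yielded; with max_depth=1 top and its immediate subdirectories are yielded.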

--
hgrepos: +371
nosy: +andrekorol
Added file: http://bugs.python.org/file47020/os.py

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26781>
___



[issue25911] Regression: os.walk now using os.scandir() breaks bytes filenames on windows

2017-06-28 Thread STINNER Victor

Changes by STINNER Victor :


--
pull_requests: +2536




Re: os.walk the apostrophe and unicode

2017-06-25 Thread Rod Person
On Sun, 25 Jun 2017 08:18:45 -0600
Michael Torrie <torr...@gmail.com> wrote:

> On 06/25/2017 06:19 AM, Rod Person wrote:
> > But doing a simple ls of that directory show it is unicode but the
> > replacement of the offending character.
> > 
> > http://rodperson.com/graphics/uc/ls.png  
> 
> Now that is really strange.  Your OS seems to not recognize that the
> filename is in UTF-8.  I suspect this has something to do with the NAS
> file sharing protocol (smb). Though I'm pretty sure that Samba can
> handle UTF-8 filenames correctly.
> 
> > I am in fact using Python 3.5. I may be lacking in unicode skills
> > but I do have the sense enough to know the version of Python I am
> > invoking. So I included this screenshot of that so the version of
> > Python and the files list returned by os.walk
> > 
> > http://rodperson.com/graphics/uc/files.png  
> 
> If I create a file that has the U+2019 character in it on my Linux
> machine (BtrFS), and do os.walk on it, I see the character in then
> string properly.  So it looks like Python does the right thing,
> automatically decoding from UTF-8.
> 
> In your situation I think the problem is the file sharing protocol
> that your NAS is using. Somehow some information is being lost and
> your OS does not know that the filenames are in UTF-8, and just
> thinks they are bytes. And therefore Python doesn't know to decode
> the string, so you just end up with each byte being converted to a
> unicode code point and being shoved into the unicode string.
> 
> How to get around this issue I don't know.  Maybe there's a way to
> convert the unicode string to bytes using the value of each character,
> and then decode that back to unicode.

I think your theory is on the right track. I'm actually attached to the
NAS via NFS, not Samba. From a quick look, it seems the NFS server needs
an option set to pass Unicode correctly...but my NAS software doesn't
give me access to those settings, only the ability to turn NFS on or off.

Looks like my only option is the original suggestion: correct the file name.


-- 
Rod

http://www.rodperson.com

Who at Clitorius fountain thirst remove 
Loath Wine and, abstinent, meer Water love.

 - Ovid
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: os.walk the apostrophe and unicode

2017-06-25 Thread Peter Otten
Rod Person wrote:

> Ok...so after reading all the replies in the thread, I thought I would
> be easier to send a general reply and include some links to screenshots.
> 
> As Peter mention, the logic thing to do would be to fix the file name
> to what I actually thought it was and if this was for work that
> probably what I would have done, but since I want to understand what's
> going on I decided to waste time on that.
> 
> I have to admit, I didn't think the file system was utf-8 as seeing what
> looked to be an apostrophe sent me down the road of why is this
> apostrophe screwed up instead of "ah this must be unicode".
> 
> But doing a simple ls of that directory show it is unicode but the
> replacement of the offending character.
> 
> http://rodperson.com/graphics/uc/ls.png

Have you set LANG to something that implies ASCII?

$ touch Todd’s ähnlich üblich löblich
$ ls
ähnlich  löblich  Todd’s  üblich
$ LANG=C ls
Todd???s  l??blich  ??hnlich  ??blich
$ python3 -c 'import os; print(os.listdir())'
['Todd’s', 'üblich', 'ähnlich', 'löblich']
$ LANG=C python3 -c 'import os; print(os.listdir())'
['Todd\udce2\udc80\udc99s', '\udcc3\udcbcblich', '\udcc3\udca4hnlich', 
'l\udcc3\udcb6blich']
$ LANG=en_US.utf-8 python3 -c 'import os; print(os.listdir())'
['Todd’s', 'üblich', 'ähnlich', 'löblich']

For file names Python resorts to surrogates whenever a byte does not 
translate into a character in the advertised encoding.
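
The surrogate behaviour can be reproduced without touching the file system. This is a small sketch that assumes the name is stored as UTF-8 on disk while the locale advertises ASCII, as in the LANG=C run above; the variable names are illustrative only.

```python
raw = b"Todd\xe2\x80\x99s"  # UTF-8 bytes as stored on disk

# Decoding as ASCII with the surrogateescape handler maps each undecodable
# byte 0xNN to the lone surrogate U+DCNN instead of raising an error.
escaped = raw.decode("ascii", "surrogateescape")

# Encoding with the same handler restores the original bytes, which can
# then be decoded with the encoding that was actually used.
recovered = escaped.encode("ascii", "surrogateescape").decode("utf-8")

print(ascii(escaped))    # 'Todd\udce2\udc80\udc99s'
print(ascii(recovered))  # 'Todd\u2019s'
```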
 
> I am in fact using Python 3.5. I may be lacking in unicode skills but I
> do have the sense enough to know the version of Python I am invoking.

I've made so many "stupid errors" myself that I always consider them first 
;)

> So I included this screenshot of that so the version of Python and the
> files list returned by os.walk
> 
> http://rodperson.com/graphics/uc/files.png
> 
> So the fact that it shows as a string and not bytes in the debugger was
> throwing me for a loop, in my log section I was trying to determine if
> it was unicode decode it...if not don't do anything which wasn't working
> 
> http://rodperson.com/graphics/uc/log_section.png
> 
> 
> 
> 
> On Sun, 25 Jun 2017 10:47:18 +0200
> Peter Otten <__pete...@web.de> wrote:
> 
>> Steve D'Aprano wrote:
>> 
>> > On Sun, 25 Jun 2017 04:57 pm, Peter Otten wrote:
>> 
>> >> if everything worked correctly? Though I don't understand why the
>> >> OP doesn't see
>> >> 
>> >> '06 - Toddâ\x80\x99s Song (Post-Spiderland Song in Progress).flac'
>> >> 
>> >> which is the repr() that I get.
>> > 
>> > That's mojibake and is always wrong :-)
>> 
>> Yes, that's my very point.
>> 
>> > I'm not sure how you got that.
>> 
>> I took the OP's string at face value and pasted it into the
>> interpreter:
>> 
>> # python 3.4
>> >>> '06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in
>> >>> Progress).flac'
>> '06 - Toddâ\x80\x99s Song (Post-Spiderland Song in Progress).flac'
>> 
>> > Something to do with an accidental decode to Latin-1?
>> 
>> If the above filename is the only one or one of a few that seem
>> broken, and other non-ascii filenames look OK the OP's
>> toolchain/filesystem may work correctly and the odd name might have
>> been produced elsewhere, e. g. by copying an already messed-up
>> freedb.org entry.
>> 
>> [Heureka]
>> 
>> However, the most likely explanation is that the filename is correct
>> and that the OP is not using Python 3 as he claims but Python 2.
>> 
>> Yes, it took that long for me to realise ;) Python 2 is slowly
>> sinking into oblivion...
>> 
> 
> 
> 




Re: os.walk the apostrophe and unicode

2017-06-25 Thread Michael Torrie
On 06/25/2017 06:19 AM, Rod Person wrote:
> But doing a simple ls of that directory show it is unicode but the
> replacement of the offending character.
> 
> http://rodperson.com/graphics/uc/ls.png

Now that is really strange.  Your OS seems to not recognize that the
filename is in UTF-8.  I suspect this has something to do with the NAS
file sharing protocol (smb). Though I'm pretty sure that Samba can
handle UTF-8 filenames correctly.

> I am in fact using Python 3.5. I may be lacking in unicode skills but I
> do have the sense enough to know the version of Python I am invoking.
> So I included this screenshot of that so the version of Python and the
> files list returned by os.walk
> 
> http://rodperson.com/graphics/uc/files.png

If I create a file that has the U+2019 character in it on my Linux
machine (BtrFS), and do os.walk on it, I see the character in the
string properly.  So it looks like Python does the right thing,
automatically decoding from UTF-8.

In your situation I think the problem is the file sharing protocol that
your NAS is using. Somehow some information is being lost and your OS
does not know that the filenames are in UTF-8, and just thinks they are
bytes. And therefore Python doesn't know to decode the string, so you
just end up with each byte being converted to a unicode code point and
being shoved into the unicode string.

How to get around this issue I don't know.  Maybe there's a way to
convert the unicode string to bytes using the value of each character,
and then decode that back to unicode.
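
One way to do that conversion, sketched under the assumption that every character in the mangled string has a code point below 256 (that is, each character stands for exactly one original byte) and that those bytes are valid UTF-8. The variable names are illustrative only.

```python
# A string in which each UTF-8 byte was turned into its own code point.
mangled = "Todd\xe2\x80\x99s"

# Latin-1 maps code points 0-255 straight to the same byte values, so this
# recovers the original bytes; decoding them as UTF-8 gives the real name.
repaired = mangled.encode("latin-1").decode("utf-8")

print(ascii(repaired))  # 'Todd\u2019s'
```

If the string contains characters above U+00FF, or the recovered bytes are not valid UTF-8, the round trip raises UnicodeEncodeError or UnicodeDecodeError rather than returning a wrong name.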


Re: os.walk the apostrophe and unicode

2017-06-25 Thread Rod Person
Ok...so after reading all the replies in the thread, I thought it would
be easier to send a general reply and include some links to screenshots.

As Peter mentioned, the logical thing to do would be to fix the file name
to what I actually thought it was, and if this were for work that's
probably what I would have done, but since I want to understand what's
going on I decided to spend time on that.

I have to admit, I didn't think the file system was utf-8 as seeing what
looked to be an apostrophe sent me down the road of why is this
apostrophe screwed up instead of "ah this must be unicode".

But a simple ls of that directory shows it is unicode, with a
replacement character in place of the offending character.

http://rodperson.com/graphics/uc/ls.png

I am in fact using Python 3.5. I may be lacking in unicode skills but I
do have the sense enough to know the version of Python I am invoking.
So I included this screenshot showing the version of Python and the
file list returned by os.walk

http://rodperson.com/graphics/uc/files.png

So the fact that it shows as a string and not bytes in the debugger was
throwing me for a loop. In my log section I was trying to determine whether
it was unicode and, if so, decode it; if not, do nothing. That wasn't working.

http://rodperson.com/graphics/uc/log_section.png




On Sun, 25 Jun 2017 10:47:18 +0200
Peter Otten <__pete...@web.de> wrote:

> Steve D'Aprano wrote:
> 
> > On Sun, 25 Jun 2017 04:57 pm, Peter Otten wrote:  
> 
> >> if everything worked correctly? Though I don't understand why the
> >> OP doesn't see
> >> 
> >> '06 - Toddâ\x80\x99s Song (Post-Spiderland Song in Progress).flac'
> >> 
> >> which is the repr() that I get.  
> > 
> > That's mojibake and is always wrong :-)   
> 
> Yes, that's my very point. 
> 
> > I'm not sure how you got that.  
> 
> I took the OP's string at face value and pasted it into the
> interpreter:
> 
> # python 3.4
> >>> '06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in
> >>> Progress).flac'  
> '06 - Toddâ\x80\x99s Song (Post-Spiderland Song in Progress).flac'
> 
> > Something to do with an accidental decode to Latin-1?  
> 
> If the above filename is the only one or one of a few that seem
> broken, and other non-ascii filenames look OK the OP's
> toolchain/filesystem may work correctly and the odd name might have
> been produced elsewhere, e. g. by copying an already messed-up
> freedb.org entry.
> 
> [Heureka]
> 
> However, the most likely explanation is that the filename is correct
> and that the OP is not using Python 3 as he claims but Python 2.
> 
> Yes, it took that long for me to realise ;) Python 2 is slowly
> sinking into oblivion...
> 



-- 
Rod

http://www.rodperson.com


Re: os.walk the apostrophe and unicode

2017-06-25 Thread alister
On Sun, 25 Jun 2017 02:23:15 -0700, wxjmfauth wrote:

> Le samedi 24 juin 2017 21:10:47 UTC+2, alister a écrit :
>> On Sat, 24 Jun 2017 14:57:21 -0400, Rod Person wrote:
>> 
>> > \xe2\x80\x99,
>> 
>> because the file name has been created using "Right single quote"
>> instead of apostrophe, the glyphs look identical in many fonts.
>> 
>> 
> Trust me. Fonts are clearly making distinction between \u0027 and
> \u2019.



Not all, and even when they do it has absolutely nothing to do with the 
point of the post.
The character in the file name is \u2019 (right single quotation mark), not 
the apostrophe the OP was assuming.
He needs to decode the file name correctly.

-- 
You will be held hostage by a radical group.


Re: os.walk the apostrophe and unicode

2017-06-25 Thread Peter Otten
Steve D'Aprano wrote:

> On Sun, 25 Jun 2017 04:57 pm, Peter Otten wrote:

>> if everything worked correctly? Though I don't understand why the OP
>> doesn't see
>> 
>> '06 - Toddâ\x80\x99s Song (Post-Spiderland Song in Progress).flac'
>> 
>> which is the repr() that I get.
> 
> That's mojibake and is always wrong :-) 

Yes, that's my very point. 

> I'm not sure how you got that.

I took the OP's string at face value and pasted it into the interpreter:

# python 3.4
>>> '06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac'
'06 - Toddâ\x80\x99s Song (Post-Spiderland Song in Progress).flac'

> Something to do with an accidental decode to Latin-1?

If the above filename is the only one or one of a few that seem broken, and 
other non-ascii filenames look OK the OP's toolchain/filesystem may work 
correctly and the odd name might have been produced elsewhere, e. g. by 
copying an already messed-up freedb.org entry.

[Heureka]

However, the most likely explanation is that the filename is correct and 
that the OP is not using Python 3 as he claims but Python 2.

Yes, it took that long for me to realise ;) Python 2 is slowly sinking into 
oblivion...



Re: os.walk the apostrophe and unicode

2017-06-25 Thread Steve D'Aprano
On Sun, 25 Jun 2017 04:57 pm, Peter Otten wrote:

> Steve D'Aprano wrote:
> 
>> On Sun, 25 Jun 2017 07:17 am, Peter Otten wrote:
>> 
>>> Then I'd fix the name manually...
>> 
>> The file name isn't broken.
>> 
>> 
>> What's broken is parts of the OP's code which assumes that non-ASCII file
>> names are broken...
> 
> Hm, the OP says
> 
> '06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac'
> 
> Shouldn't it be
> 
> '06 - Todd’s Song (Post-Spiderland Song in Progress).flac'

It should, if the OP did everything right.

He has a file name containing the word "Todd’s":

# Python 3.5

py> fname = 'Todd’s'
py> repr(fname)
"'Todd’s'"

On disk, that is represented in UTF-8:

py> repr(fname.encode('utf-8'))
"b'Todd\\xe2\\x80\\x99s'"

The OP appears to be using Python 2, so when he calls os.listdir() he gets the
file names as bytes, not Unicode. That means he'll see:

- the file name will be Python 2 str, which is *byte string* not text string;
- so not Unicode
- rather the individual bytes in the UTF-8 encoding of the file name.

So in Python 2.7 instead of 3.5 above:

py> fname = u'Todd’s'
py> repr(fname)
"u'Todd\\u2019s'"
py> repr(fname.encode('utf-8'))
"'Todd\\xe2\\x80\\x99s'"


> if everything worked correctly? Though I don't understand why the OP doesn't
> see
> 
> '06 - Toddâ\x80\x99s Song (Post-Spiderland Song in Progress).flac'
> 
> which is the repr() that I get.

That's mojibake and is always wrong :-) I'm not sure how you got that. Something
to do with an accidental decode to Latin-1?

# Python 2.7
py> repr(fname.encode('utf-8').decode('latin-1'))
"u'Todd\\xe2\\x80\\x99s'"

# Python 3.5
py> repr(fname.encode('utf-8').decode('latin-1'))
"'Toddâ\\x80\\x99s'"



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



Re: os.walk the apostrophe and unicode

2017-06-25 Thread Peter Otten
Steve D'Aprano wrote:

> On Sun, 25 Jun 2017 07:17 am, Peter Otten wrote:
> 
>> Then I'd fix the name manually...
> 
> The file name isn't broken.
> 
> 
> What's broken is parts of the OP's code which assumes that non-ASCII file
> names are broken...

Hm, the OP says

'06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac'

Shouldn't it be

'06 - Todd’s Song (Post-Spiderland Song in Progress).flac'

if everything worked correctly? Though I don't understand why the OP doesn't 
see

'06 - Toddâ\x80\x99s Song (Post-Spiderland Song in Progress).flac'

which is the repr() that I get.



Re: os.walk the apostrophe and unicode

2017-06-24 Thread Steve D'Aprano
On Sun, 25 Jun 2017 07:17 am, Peter Otten wrote:

> Then I'd fix the name manually...

The file name isn't broken.


What's broken is parts of the OP's code which assumes that non-ASCII file names
are broken...



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



Re: os.walk the apostrophe and unicode

2017-06-24 Thread Peter Otten
Rod Person wrote:

> On Sat, 24 Jun 2017 21:28:45 +0200
> Peter Otten <__pete...@web.de> wrote:
> 
>> Rod Person wrote:
>> 
>> > Hi,
>> > 
>> > I'm working on a program that will walk a file system and clean the
>> > id3 tags of mp3 and flac files, everything is working great until
>> > the follow file is found
>> > 
>> > '06 - Todd's Song (Post-Spiderland Song in Progress).flac'
>> > 
>> > for some reason that I can't understand os.walk() returns this file
>> > name as
>> > 
>> > '06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in
>> > Progress).flac'
>> > 
>> > which then causes more hell than a little bit for me. I'm not
>> > understand why apostrophe(') becomes \xe2\x80\x99, or what I can do
>> > about it.
>> 
>> >>> b"\xe2\x80\x99".decode("utf-8")
>> '’'
>> >>> unicodedata.name(_)
>> 'RIGHT SINGLE QUOTATION MARK'
>> 
>> So it's '’' rather than "'".
>> 
>> > The script is Python 3, the file system it is running on is a hammer
>> > filesystem on DragonFlyBSD. The audio files reside on a QNAP NAS
>> > which runs some kind of Linux so it probably ext3/4. The files came
>> > from various system (Mac, Windows, FreeBSD).
>> 
>> There seems to be a mismatch between the assumed and the actual file
>> system encoding somewhere in this mix. Is this the only glitch or are
>> there similar problems with other non-ascii characters?
>> 
> 
> This is the only glitch as in file names so far.
> 

Then I'd fix the name manually...



Re: os.walk the apostrophe and unicode

2017-06-24 Thread MRAB

On 2017-06-24 20:47, Rod Person wrote:
> On Sat, 24 Jun 2017 13:28:55 -0600
> Michael Torrie <torr...@gmail.com> wrote:
>
>> On 06/24/2017 12:57 PM, Rod Person wrote:
>> > Hi,
>> > 
>> > I'm working on a program that will walk a file system and clean the
>> > id3 tags of mp3 and flac files, everything is working great until
>> > the follow file is found
>> > 
>> > '06 - Todd's Song (Post-Spiderland Song in Progress).flac'
>> > 
>> > for some reason that I can't understand os.walk() returns this file
>> > name as
>> > 
>> > '06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in
>> > Progress).flac'
>>
>> That's basically a UTF-8 string there:
>>
>> $ python3
>> >>> a = b'06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac'
>> >>> print(a.decode('utf-8'))
>> 06 - Todd’s Song (Post-Spiderland Song in Progress).flac
>> >>>
>>
>> The NAS is just happily reading the UTF-8 bytes and passing them on
>> the wire.
>>
>> > which then causes more hell than a little bit for me. I'm not
>> > understand why apostrophe(') becomes \xe2\x80\x99, or what I can do
>> > about it.
>>
>> It's clearly not an apostrophe in the original filename, but probably
>> U+2019 (’)
>>
>> > The script is Python 3, the file system it is running on is a hammer
>> > filesystem on DragonFlyBSD. The audio files reside on a QNAP NAS
>> > which runs some kind of Linux so it probably ext3/4. The files came
>> > from various system (Mac, Windows, FreeBSD).
>>
>> It's the file serving protocol that dictates how filenames are
>> transmitted. In your case it's probably smb. smb (samba) is just
>> passing the native bytes along from the file system.  Since you know
>> the native file system is just UTF-8, you can just decode every
>> filename from utf-8 bytes into unicode.
>
> This is the impression that I was under, my unicode isn't that strong, so
> maybe my understanding is off...but I tried.
>
>     file_name = file_name.decode('utf-8', 'ignore')
>
> but when I get to my logging code:
>
>     logfile.write(file_name)
>
> that throws the error:
>     UnicodeEncodeError: 'ascii' codec can't encode characters in
>     position 39-41: ordinal not in range(128)

Your logfile was opened with the 'ascii' encoding, so you can't write 
anything outside the ASCII range.

Open it with the 'utf-8' encoding instead.
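
A minimal sketch of that fix for Python 3, where the built-in open() accepts an encoding argument. The helper name write_log and its parameters are hypothetical; the original logging code was not posted in full.

```python
def write_log(log_path, file_names):
    """Write one file name per line, encoding the log file as UTF-8."""
    # An explicit encoding avoids falling back to the locale default,
    # which is what produced the 'ascii' codec error.
    with open(log_path, "w", encoding="utf-8") as logfile:
        for file_name in file_names:
            logfile.write(file_name + "\n")
```

On Python 2 the built-in open() has no encoding parameter; io.open(log_path, "w", encoding="utf-8") is the equivalent call there and expects unicode strings.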


Re: os.walk the apostrophe and unicode

2017-06-24 Thread Andre Müller
Can os.fsencode and os.fsdecode help? I've seen them somewhere,
but I've never used them.
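
For reference, a short sketch of what those two functions do. Both are real functions in the os module; the sample bytes are just the file name fragment from this thread. The decoded text depends on the file system encoding Python detects, so no particular string is assumed here, only that the round trip gives back the original bytes.

```python
import os

raw = b"Todd\xe2\x80\x99s"

# os.fsdecode() turns a bytes file name into str using the file system
# encoding and its error handler (surrogateescape on POSIX).
name = os.fsdecode(raw)

# os.fsencode() is the inverse, so the bytes seen by the OS are preserved
# even when the advertised encoding could not decode them properly.
round_tripped = os.fsencode(name)
```

These help when passing names back to the OS unchanged; they do not by themselves repair a name that was decoded with the wrong encoding.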

To fix encodings, sometimes I use the module ftfy

Greetings
Andre

