On 12.05.16 01:13, Brett Cannon wrote:
On Wed, 11 May 2016 at 13:45 Serhiy Storchaka wrote:
On 11.05.16 19:43, Brett Cannon wrote:
> os.path
> '''''''
>
> The various path-manipulation functions of ``os.path`` [#os-path]_
> will be updated to accept path objects. For polymorphic functions that
> accept both bytes and strings, they will be updated to simply use
> code very much similar to
> ``path.__fspath__() if hasattr(path, '__fspath__') else path``. This
> will allow for their pre-existing type-checking code to continue to
> function.
I'm afraid that this will hurt performance. Some os.path functions are
used in tight loops and are heavily optimized, so adding support for the
path protocol could have a visible negative effect.
As others have asked, what specific examples do you have of os.path being
used in a tight loop without any I/O that would dominate the running time?
Most examples do some I/O (like os.lstat()): posixpath.realpath(),
os.walk(), glob.glob(). But os.walk(), for example, was significantly
sped up by using os.scandir(), and it would be sad to make it slower
again. os.path is used in a number of files, sometimes in loops,
sometimes indirectly. It is hard to find all the examples.
Functions such as glob.glob() call split() and join() for every
component, but they also apply str or bytes operations to paths. So
they need to convert the argument to str or bytes before starting the
iteration, and then always call os.path functions with str or bytes
only. An additional conversion in every os.path function is redundant.
I suppose most other high-level functions that manipulate paths in a
loop should also convert their arguments once at the start, and so
don't need path protocol support in the os.path functions.
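
The "convert once at the start" pattern described above could look roughly
like the following sketch. split_components() is a hypothetical helper, not
an existing stdlib function, and the conversion uses os.fspath() as it later
shipped in Python 3.6.

```python
import os
import pathlib
import posixpath


def split_components(path):
    """Hypothetical helper: split a POSIX path into its components."""
    # Convert once, up front. Everything below sees only str or bytes.
    path = os.fspath(path)
    parts = []
    while True:
        head, tail = posixpath.split(path)
        if head == path:
            # Reached the root (or an empty string); nothing left to split.
            if head:
                parts.append(head)
            break
        if tail:
            parts.append(tail)
        path = head
    parts.reverse()
    return parts
```

The loop calls posixpath.split() once per component, always with str or
bytes, so a per-call conversion inside split() would add nothing here.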
I see this whole discussion breaking down into a few groups, and the group
you are in changes what gets done upfront and what might be done farther
down the line:
1. Maximum acceptance: do whatever we can to make all representations
   of paths just work, which means making every place in the stdlib
   that works with a path accept path objects, str, and bytes.
2. Safely use path objects: __fspath__() is there to signal that an
   object is a file system path and to get back a lower-level
   representation so people stop calling str() on everything. It
   provides an interface that guards against misusing an arbitrary
   object as a path, and only path consumption APIs -- e.g. open() --
   change in the stdlib, not path manipulation APIs -- e.g. os.path.
3. It ain't worth it: those who would rather just skip all of this and
   drop pathlib from the stdlib.
Ethan and Koos are in group #1 and I'm personally in group #2 but I
tried to compromise somewhat and find a middle ground in the PEP with
the level of changes in the stdlib but being more restrictive with
os.fspath(). If I were doing a pure group #2 PEP I would drop os.path
changes and make os.fspath() do what Ethan and Koos have suggested and
simply pass through without checks whatever path.__fspath__() returned
if the argument wasn't str or bytes.
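
The difference between the two os.fspath() behaviours mentioned above can be
sketched as follows. Both function names and the BadPath class are
hypothetical, and the exact checks in the strict variant are an assumption
about what "more restrictive" means, not a copy of the PEP's code.

```python
class BadPath:
    """Hypothetical object whose __fspath__() returns a non-path value."""

    def __fspath__(self):
        return 42


def fspath_strict(path):
    # Assumed restrictive variant: the result must be str or bytes.
    if isinstance(path, (str, bytes)):
        return path
    if not hasattr(path, '__fspath__'):
        raise TypeError('expected str, bytes or path object, not '
                        + type(path).__name__)
    result = path.__fspath__()
    if not isinstance(result, (str, bytes)):
        raise TypeError('__fspath__() returned ' + type(result).__name__)
    return result


def fspath_passthrough(path):
    # Pass-through variant: return whatever __fspath__() gives back,
    # with no check on the result.
    if isinstance(path, (str, bytes)):
        return path
    return path.__fspath__()
```

Given BadPath(), the pass-through variant hands the integer on to the caller,
while the strict variant raises TypeError at the conversion point.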
I'm for adding conversions in the C-implemented path-consuming APIs and
maybe in high-level path manipulation functions like os.walk(), but
leaving the low-level APIs of os.path, fnmatch and glob unchanged.