Ben Hoyt <benh...@gmail.com> writes:
...
> ``scandir()`` yields a ``DirEntry`` object for each file and directory
> in ``path``. Just like ``listdir``, the ``'.'`` and ``'..'``
> pseudo-directories are skipped, and the entries are yielded in
> system-dependent order. Each ``DirEntry`` object has the following
> attributes and methods:
>
> * ``name``: the entry's filename, relative to the ``path`` argument
>   (corresponds to the return values of ``os.listdir``)
>
> * ``full_name``: the entry's full path name -- the equivalent of
>   ``os.path.join(path, entry.name)``

I suggest renaming .full_name -> .path

.full_name might be misleading, e.g., it implies that
.full_name == abspath(.full_name), which might be false. The .path name
has no such associations.

The semantics of the .path attribute are defined by these assertions::

    for entry in os.scandir(topdir):
        #NOTE: assume os.path.normpath(topdir) is not called to create .path
        assert entry.path == os.path.join(topdir, entry.name)
        assert entry.name == os.path.basename(entry.path)
        assert entry.name == os.path.relpath(entry.path, start=topdir)
        assert os.path.dirname(entry.path) == topdir
        assert (entry.path != os.path.abspath(entry.path) or
                os.path.isabs(topdir)) # it is absolute only if topdir is
        assert (entry.path != os.path.realpath(entry.path) or
                topdir == os.path.realpath(topdir)) # symlinks are not resolved
        assert (entry.path != os.path.normcase(entry.path) or
                topdir == os.path.normcase(topdir)) # no case-folding,
                                                    # unlike PureWindowsPath

...
> * ``is_dir()``: like ``os.path.isdir()``, but much cheaper -- it never
>   requires a system call on Windows, and usually doesn't on POSIX
>   systems

I suggest documenting the implicit follow_symlinks parameter for the
.is_X methods.

Note: lstat == partial(stat, follow_symlinks=False).

In particular, .is_dir() should probably use follow_symlinks=True by
default, as suggested by Victor Stinner, *if .is_dir() does it on Windows*

MSDN says: GetFileAttributes() does not follow symlinks.

os.path.isdir docs imply follow_symlinks=True: "both islink() and
isdir() can be true for the same path."

...
> Like the other functions in the ``os`` module, ``scandir()`` accepts
> either a bytes or str object for the ``path`` parameter, and returns
> the ``DirEntry.name`` and ``DirEntry.full_name`` attributes with the
> same type as ``path``. However, it is *strongly recommended* to use
> the str type, as this ensures cross-platform support for Unicode
> filenames.
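
To make the .path assertions earlier in this reply easier to try out, here
is a minimal sketch that runs a subset of them against a temporary
directory. It assumes the attribute is exposed as ``entry.path`` (the
rename proposed above) and that ``topdir`` has no trailing separator.
``check_path_semantics`` and ``demo`` are hypothetical helper names used
only for illustration; they are not part of the PEP.

```python
import os
import tempfile


def check_path_semantics(topdir):
    """Check some of the proposed .path invariants for entries in topdir.

    Hypothetical helper for illustration only. Assumes topdir has no
    trailing separator and is used as-is (not normalized) to build .path.
    """
    names = []
    for entry in os.scandir(topdir):
        # .path is just topdir joined with the bare entry name
        assert entry.path == os.path.join(topdir, entry.name)
        assert entry.name == os.path.basename(entry.path)
        assert os.path.dirname(entry.path) == topdir
        # .path is absolute only if topdir is
        assert os.path.isabs(entry.path) == os.path.isabs(topdir)
        names.append(entry.name)
    return sorted(names)


def demo():
    """Create one file and one subdirectory, then check the invariants."""
    with tempfile.TemporaryDirectory() as topdir:
        open(os.path.join(topdir, "a.txt"), "w").close()
        os.mkdir(os.path.join(topdir, "sub"))
        return check_path_semantics(topdir)
```

The symlink and case-folding assertions are left out of the sketch because
they depend on the platform and on the privileges needed to create links.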

Document when {e.name for e in os.scandir(path)} != set(os.listdir(path))
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

e.g., path can be an open file descriptor in os.listdir(path) since
Python 3.3, but the PEP doesn't mention it explicitly.

It has been discussed already e.g.,
https://mail.python.org/pipermail/python-dev/2014-July/135296.html

PEP 471 should explicitly reject the support for specifying a file
descriptor so that code that uses os.scandir may assume that the
entry.path (.full_name) attribute is always present (no exceptions due
to a failure to read /proc/self/fd/NNN or an error while calling
fcntl(F_GETPATH) or GetFileInformationByHandleEx() -- see
http://stackoverflow.com/q/1188757 ).

Reject explicitly in PEP 471 the support for dir_fd parameter
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

aka the support for paths relative to directory descriptors.

Note: it is a *different* (but related) issue.

...
> Notes on exception handling
> ---------------------------
>
> ``DirEntry.is_X()`` and ``DirEntry.lstat()`` are explicitly methods
> rather than attributes or properties, to make it clear that they may
> not be cheap operations, and they may do a system call. As a result,
> these methods may raise ``OSError``.
>
> For example, ``DirEntry.lstat()`` will always make a system call on
> POSIX-based systems, and the ``DirEntry.is_X()`` methods will make a
> ``stat()`` system call on such systems if ``readdir()`` returns a
> ``d_type`` with a value of ``DT_UNKNOWN``, which can occur under
> certain conditions or on certain file systems.
>
> For this reason, when a user requires fine-grained error handling,
> it's good to catch ``OSError`` around these method calls and then
> handle as appropriate.
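
The fine-grained handling described in the quoted notes might look like
the following sketch, which catches ``OSError`` around each ``is_dir()``
call and keeps going. ``list_subdirs``, its ``log`` parameter and
``demo`` are hypothetical names chosen for illustration, not anything
the PEP defines.

```python
import os
import tempfile


def list_subdirs(path, log=print):
    """Return sorted subdirectory names, skipping entries that fail.

    Hypothetical helper: an OSError raised by is_dir() for one entry is
    reported through log() and does not abort the whole scan.
    """
    subdirs = []
    for entry in os.scandir(path):
        try:
            if entry.is_dir():
                subdirs.append(entry.name)
        except OSError as error:
            log("error checking %r: %s" % (entry.name, error))
    return sorted(subdirs)


def demo():
    """Scan a temporary directory holding one file and one subdirectory."""
    with tempfile.TemporaryDirectory() as topdir:
        open(os.path.join(topdir, "a.txt"), "w").close()
        os.mkdir(os.path.join(topdir, "sub"))
        return list_subdirs(topdir)
```

The sketch only guards the per-entry method call; errors raised while
opening or iterating the directory itself would still propagate.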
>

I suggest documenting that next(os.scandir()) may raise OSError

e.g., on POSIX it may happen due to an OS error in
opendir/readdir/closedir

Also, document whether os.scandir() itself may raise OSError (whether
opendir or other OS functions may be called before the first yield).

...
os.scandir() should allow the explicit cleanup
++++++++++++++++++++++++++++++++++++++++++++++

::

    from contextlib import closing

    with closing(os.scandir()) as entries:
        for _ in entries:
            break

On exit from the with-statement, entries.close() is called, which frees
the resources if necessary, to *avoid relying on garbage-collection for
managing file descriptors* (check whether it is consistent with the
.close() method from the generator protocol e.g., it might be already
called on the exit from the loop whether an exception happens or not
without requiring the with-statement (I don't know)). *It should be
possible to limit the resource life-time on non-refcounting Python
implementations.*

os.scandir() object may support the context manager protocol explicitly::

    with os.scandir() as entries:
        for _ in entries:
            break

``.__exit__`` method may just call ``.close`` method.

...
> Rejected ideas
> ==============
>
>
> Naming
> ------
>
> The only other real contender for this function's name was
> ``iterdir()``. However, ``iterX()`` functions in Python (mostly found
> in Python 2) tend to be simple iterator equivalents of their
> non-iterator counterparts. For example, ``dict.iterkeys()`` is just an
> iterator version of ``dict.keys()``, but the objects returned are
> identical. In ``scandir()``'s case, however, the return values are
> quite different objects (``DirEntry`` objects vs filename strings), so
> this should probably be reflected by a difference in name -- hence
> ``scandir()``.
>
> See some `relevant discussion on python-dev
> <https://mail.python.org/pipermail/python-dev/2014-June/135228.html>`_.
>

- os.scandir() name is inconsistent with the pathlib module.
  pathlib.Path has `.iterdir() method
  <https://docs.python.org/3/library/pathlib.html#pathlib.Path.iterdir>`_
  that generates Path instances i.e., the argument that iterdir() should
  return strings is not valid

- os.scandir() name conflicts with POSIX. POSIX already has `scandir()
  function
  <http://pubs.opengroup.org/onlinepubs/9699919799/functions/scandir.html>`_.
  Most functions in the os module are thin wrappers of their
  corresponding POSIX analogs.

  In principle, POSIX scandir(path, &entries, sel, compar) is emulated
  using::

    entries = sorted(filter(sel, os.scandir(path)),
                     key=cmp_to_key(compar))

  so that the above code snippet could be provided in the docs.

  We may say that os.scandir is a pythonic analog of the POSIX function
  and therefore there is no conflict even if os.scandir doesn't use the
  POSIX scandir function in its implementation. If we can't say it then
  a *different name/module should be used to allow adding
  POSIX-compatible os.scandir() in the future*.


--
Akira