On Jun 28, 2014 12:49 PM, "Ben Hoyt" <benh...@gmail.com> wrote:
>
> >> But the underlying system calls -- ``FindFirstFile`` /
> >> ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X --
> >
> > What about FreeBSD, OpenBSD, NetBSD, Solaris, etc. They don't provide readdir?
>
> I guess it'd be better to say "Windows" and "Unix-based OSs"
> throughout the PEP? Because all of these (including Mac OS X) are
> Unix-based.

No, just say POSIX.

> > It looks like the WIN32_FIND_DATA has a dwFileAttributes field. So we
> > should mimic stat_result's recent addition: the new
> > stat_result.file_attributes field. Add DirEntry.file_attributes which
> > would only be available on Windows.
> >
> > The Windows structure also contains
> >
> >   FILETIME ftCreationTime;
> >   FILETIME ftLastAccessTime;
> >   FILETIME ftLastWriteTime;
> >   DWORD    nFileSizeHigh;
> >   DWORD    nFileSizeLow;
> >
> > It would be nice to expose them as well. I'm no longer surprised that
> > the exact API is different depending on the OS for functions of the os
> > module.
>
> I think you've misunderstood how DirEntry.lstat() works on Windows --
> it's basically a no-op, as Windows returns the full stat information
> with the original FindFirst/FindNext OS calls. This is fairly explicit
> in the PEP, but I'm sure I could make it clearer:
>
>     DirEntry.lstat(): "like os.lstat(), but requires no system calls on Windows"
>
> So you can already get the dwFileAttributes for free by saying
> entry.lstat().st_file_attributes. You can also get all the other
> fields you mentioned for free via .lstat() with no additional OS calls
> on Windows, for example: entry.lstat().st_size.
>
> Feel free to suggest changes to the PEP or scandir docs if this isn't
> clear. Note that is_dir()/is_file()/is_symlink() are free on all
> systems, but .lstat() is only free on Windows.
>
> > Does your implementation use a free list to avoid the cost of memory
> > allocation? A short free list of 10 or maybe just 1 may help. The free
> > list may be stored directly in the generator object.
>
> No, it doesn't. I might add this to the PEP under "possible
> improvements". However, I think the speed increase by removing the
> extra OS call and/or disk seek is going to be way more than memory
> allocation improvements, so I'm not sure this would be worth it.
>
> > Does it also support bytes filenames on UNIX?
> >
> > Python now supports undecodable filenames thanks to the PEP 383
> > (surrogateescape). I prefer to use the same type for filenames on
> > Linux and Windows, so Unicode is better. But some users might prefer
> > bytes for other reasons.
>
> I forget exactly now what my scandir module does, but for os.scandir()
> I think this should behave exactly like os.listdir() does for
> Unicode/bytes filenames.
>
> > Crazy idea: would it be possible to "convert" a DirEntry object to a
> > pathlib.Path object without losing the cache? I guess that
> > pathlib.Path expects a full stat_result object.
>
> The main problem is that pathlib.Path objects explicitly don't cache
> stat info (and Guido doesn't want them to, for good reason I think).
> There's a thread on python-dev about this earlier. I'll add it to a
> "Rejected ideas" section.
>
> > I don't understand how you can build a full lstat() result without
> > really calling stat. I see that WIN32_FIND_DATA contains the size, but
> > here you call lstat().
>
> See above.
>
> > Do you plan to continue to maintain your module for Python < 3.5, but
> > upgrade your module for the final PEP?
>
> Yes, I intend to maintain the standalone scandir module for 2.6 <=
> Python < 3.5, at least for a good while. For integration into the
> Python 3.5 stdlib, the implementation will be integrated into
> posixmodule.c, of course.
>
> >> Should there be a way to access the full path?
> >> ----------------------------------------------
> >>
> >> Should ``DirEntry``'s have a way to get the full path without using
> >> ``os.path.join(path, entry.name)``? This is a pretty common pattern,
> >> and it may be useful to add pathlib-like ``str(entry)`` functionality.
> >> This functionality has also been requested in `issue 13`_ on GitHub.
> >>
> >> .. _`issue 13`: https://github.com/benhoyt/scandir/issues/13
> >
> > I think that it would be very convenient to store the directory name
> > in the DirEntry. It should be light, it's just a reference.
> >
> > And provide a fullname() method which would just return
> > os.path.join(path, entry.name) without trying to resolve path to get
> > an absolute path.
>
> Yeah, fair suggestion. I'm still slightly on the fence about this, but
> I think an explicit fullname() is a good suggestion. Ideally I think
> it'd be better to mimic pathlib.Path.__str__() which is kind of the
> equivalent of fullname(). But how does pathlib deal with unicode/bytes
> issues if it's the str function which has to return a str object? Or
> at least, it'd be very weird if __str__() returned bytes. But I think
> it'd need to if you passed bytes into scandir(). Do others have
> thoughts?
>
> > Would it be hard to implement the wildcard feature on UNIX to compare
> > performance of scandir('*.jpg') with and without the wildcard built
> > in os.scandir?
>
> It's a good idea; the problem with this is that the Windows wildcard
> implementation has a bunch of crazy edge cases where *.ext will catch
> more things than just a simple regex/glob. This was discussed on
> python-dev or python-ideas previously, so I'll dig it up and add to a
> Rejected Ideas section. In any case, this could be added later if
> there's a way to iron out the Windows quirks.
>
> -Ben
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/greg%40krypto.org
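
For anyone following the lstat() and fullname() points quoted above, here is a minimal sketch of how they play out in practice. It assumes the os.scandir() that eventually shipped in Python 3.5, not the draft API discussed in this thread: there the draft's entry.lstat() is spelled entry.stat(follow_symlinks=False), and the joined path is available as entry.path. The helper name tree_size is hypothetical and used only for illustration.

```python
import os


def tree_size(path):
    """Sum the sizes of regular files under `path`, without following symlinks.

    Assumes the os.scandir() API as shipped in Python 3.5+, not the
    draft DirEntry.lstat()/fullname() spelling quoted in the thread.
    """
    total = 0
    for entry in os.scandir(path):
        # is_dir()/is_file() can usually be answered from the directory
        # listing itself, so no per-entry stat call is needed here.
        if entry.is_dir(follow_symlinks=False):
            # entry.path is os.path.join(path, entry.name), unresolved.
            total += tree_size(entry.path)
        elif entry.is_file(follow_symlinks=False):
            # Equivalent of the draft's entry.lstat(); the result is
            # cached on the entry after the first call.
            total += entry.stat(follow_symlinks=False).st_size
    return total
```

Calling tree_size(".") returns the byte total for the current directory tree. As the quoted text says, the stat call is the part whose cost differs by platform: it comes from the directory listing on Windows and needs one extra system call per file elsewhere.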