On 2012-12-17 17:27, Paul Rudin wrote:
Chris Angelico <ros...@gmail.com> writes:

On Tue, Dec 18, 2012 at 2:28 AM, Gilles Lenfant
<gilles.lenf...@gmail.com> wrote:
Hi,

I have googled but did not find an efficient solution to my
problem. My customer provides a directory with a huuuuge list of
files (flat, potentially 100000+) and I cannot reasonably use
os.listdir(this_path) without creating a big memory footprint.

So I'm looking for an iterator that yields the file names of a
directory and does not build a giant list of everything in it.

Sounds like you want os.walk.

But doesn't os.walk call listdir() and that creates a list of the
contents of a directory, which is exactly the initial problem?

But... a hundred thousand files? I know the Zen of Python says that
flat is better than nested, but surely there's some kind of directory
structure that would make this marginally manageable?


Sometimes you have to deal with things other people have designed, so
the directory structure is not something you can control. I've run up
against exactly the same problem and made something in C that
implemented an iterator.

<Off topic>
Years ago I had to deal with an in-house application that was written
using a certain database package. The package stored each predefined
query in a separate file in the same directory.

I found that if I packed all the predefined queries into a single file
and, every time a query was needed, called an external utility to
extract it into a file for the package to use, not only did it save a
significant amount of disk space (hard disks were a lot smaller then),
I also got a significant speed-up!

It wasn't as bad as 100000 in one directory, but it was certainly too
many...
</Off topic>

It would probably be better if listdir() made an iterator rather than a
list.
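
For anyone reading this thread later: Python 3.5 added os.scandir(), which returns a lazy iterator over directory entries instead of a list, so something along the lines of what is being asked for is possible without C. Below is a minimal sketch, not a definitive implementation. The helper name iter_file_names is hypothetical, and the "with" form assumes Python 3.6 or newer.

```python
import os
from typing import Iterator


def iter_file_names(path: str) -> Iterator[str]:
    """Yield the names of regular files in `path` one at a time.

    Hypothetical helper for illustration; it is not part of the stdlib.
    """
    # os.scandir() yields DirEntry objects lazily, so the complete
    # directory listing is never built up as one big Python list.
    with os.scandir(path) as entries:
        for entry in entries:
            # Skip subdirectories and symlinks; keep only regular files.
            if entry.is_file(follow_symlinks=False):
                yield entry.name
```

Usage would be a plain loop such as "for name in iter_file_names(this_path): ...", handling each name as it arrives. Entries come back in arbitrary order, the same as with os.listdir().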


