> Dave Angel <da...@davea.name> wrote (and I agreed with):
>> I'd suggest you open the file twice, and get two file objects. Then you
>> can iterate over them independently.
On Sep 18, 2013, at 9:09 AM, Oscar Benjamin wrote:

> There's no need to use OS resources by opening the file twice or to
> screw up the IO caching with seek().

There's no reason NOT to use OS resources. That's what the OS is there
for: to make life easier on application programmers. Opening a file
twice costs almost nothing. File descriptors are almost as cheap as
whitespace.

> Peter's version holds just as many lines as is necessary in an
> internal Python buffer and performs the minimum possible
> amount of IO.

I believe by "Peter's version", you're talking about:

> from itertools import islice, tee
>
> with open("tmp.txt") as f:
>     while True:
>         for outer in f:
>             print outer,
>             if "*" in outer:
>                 f, g = tee(f)
>                 for inner in islice(g, 3):
>                     print "  ", inner,
>                 break
>         else:
>             break

There's this note from
http://docs.python.org/2.7/library/itertools.html#itertools.tee:

> This itertool may require significant auxiliary storage (depending on how
> much temporary data needs to be stored). In general, if one iterator uses
> most or all of the data before another iterator starts, it is faster to use
> list() instead of tee().

I have no idea how that interacts with the pattern above, where you call
tee() serially. You're basically doing:

with open("my_file") as f:
    while True:
        f, g = tee(f)

Are all of those g's just hanging around, eating up memory, while
waiting to be garbage collected? I have no idea. But I do know that no
such problems exist with the two-file-descriptor version.

> I would expect this to be more
> efficient as well as less error-prone on Windows.
>
> Oscar

---
Roy Smith
r...@panix.com
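For what it's worth, here's a rough sketch of the two-file-objects
approach Dave suggested (Python 3 syntax here; the sample data and the
"*" marker convention are just illustrative). Opening the same path
twice gives each handle its own independent position, so the second
handle can be used for lookahead without tee() or seek():

```python
import os
import tempfile
from itertools import islice

# Build a small hypothetical input file, with "*" marking lines
# that should trigger a three-line lookahead.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as w:
    w.write("a\nb *\nc\nd\ne\nf\n")

output = []
with open(path) as outer_f, open(path) as inner_f:
    ahead = 0  # how many lines inner_f is ahead of outer_f
    for line in outer_f:
        if ahead:
            ahead -= 1          # inner_f already consumed this line
        else:
            next(inner_f)       # keep inner_f level with outer_f
        output.append(line)
        if "*" in line:
            # Peek at the next three lines via the second handle;
            # outer_f's position is untouched, so the outer loop will
            # still visit these lines normally (as in Peter's version).
            peeked = list(islice(inner_f, 3))
            ahead += len(peeked)
            output.extend("    " + p for p in peeked)

os.remove(path)
print("".join(output), end="")
```

Each handle has its own file position, so there's no shared iterator
state to worry about, and nothing is buffered beyond each handle's
ordinary IO buffer.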
-- https://mail.python.org/mailman/listinfo/python-list