> Dave Angel <[email protected]> wrote (and I agreed with):
>> I'd suggest you open the file twice, and get two file objects. Then you
>> can iterate over them independently.
On Sep 18, 2013, at 9:09 AM, Oscar Benjamin wrote:
> There's no need to use OS resources by opening the file twice or to
> screw up the IO caching with seek().
There's no reason NOT to use OS resources; that's what the OS is there for: to
make life easier on application programmers. Opening a file twice costs almost
nothing. File descriptors are almost as cheap as whitespace.
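For concreteness, here's a sketch of what the two-file-object approach can look
like (the filename, the "*" marker, and the three-line lookahead are borrowed
from the example quoted below; scan_with_two_handles is just an illustrative
name). It uses readline() rather than iteration so that tell() remains legal on
a Python 3 text file:

```python
def scan_with_two_handles(path, marker="*", lookahead=3):
    # Open the same file twice: two independent file objects, two
    # cheap file descriptors.  readline() (not a for-loop) keeps
    # tell() usable on a Python 3 text file object.
    out = []
    with open(path) as f, open(path) as g:
        while True:
            line = f.readline()
            if not line:
                break
            out.append(line)
            if marker in line:
                g.seek(f.tell())  # jump the second handle just past f
                out.extend("    " + g.readline()
                           for _ in range(lookahead))
    return out
```

Because only the second handle is advanced during the peek, the main loop sees
the peeked lines again afterwards, which matches the tee() version's behaviour.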
> Peter's version holds just as many lines as is necessary in an
> internal Python buffer and performs the minimum possible
> amount of IO.
I believe by "Peter's version", you're talking about:
> from itertools import islice, tee
>
> with open("tmp.txt") as f:
>     while True:
>         for outer in f:
>             print outer,
>             if "*" in outer:
>                 f, g = tee(f)
>                 for inner in islice(g, 3):
>                     print " ", inner,
>                 break
>         else:
>             break
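(For anyone reading this on Python 3: a sketch of the same pattern, wrapped in
a function that collects lines instead of printing them, purely so it's
self-contained and testable — the tee() chaining is unchanged, and
scan_with_tee is just an illustrative name.)

```python
from itertools import islice, tee

def scan_with_tee(path, marker="*", lookahead=3):
    # On a marker line, tee the iterator, pull the next few lines
    # from one branch, then resume the outer loop on the other
    # branch -- which re-yields those same lines.
    out = []
    with open(path) as f:
        while True:
            for outer in f:
                out.append(outer)
                if marker in outer:
                    f, g = tee(f)
                    out.extend("    " + inner
                               for inner in islice(g, lookahead))
                    break
            else:
                break
    return out
```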
There's this note from
http://docs.python.org/2.7/library/itertools.html#itertools.tee:
> This itertool may require significant auxiliary storage (depending on how
> much temporary data needs to be stored). In general, if one iterator uses
> most or all of the data before another iterator starts, it is faster to use
> list() instead of tee().
I have no idea how that interacts with the pattern above where you call tee()
serially. You're basically doing
with open("my_file") as f:
    while True:
        f, g = tee(f)
Are all of those g's just hanging around, eating up memory, while waiting to be
garbage collected? I have no idea. But I do know that no such problems exist
with the two-file-descriptor version.
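For what it's worth, the docs' list() suggestion maps onto this problem quite
naturally: the lookahead lines are consumed immediately and in full, so
islice() straight off the file object does the job with no tee() at all. (A
sketch, with scan_with_list as an illustrative name; note that it consumes the
peeked lines from f, so — unlike the tee() version — the outer loop does not
see them a second time.)

```python
from itertools import islice

def scan_with_list(path, marker="*", lookahead=3):
    # Pull the lookahead into a plain list instead of tee()-ing.
    # This advances f past the peeked lines, so each peeked line
    # appears once (indented), not twice as in the tee() version.
    out = []
    with open(path) as f:
        for line in f:
            out.append(line)
            if marker in line:
                out.extend("    " + peek
                           for peek in islice(f, lookahead))
    return out
```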
> I would expect this to be more
> efficient as well as less error-prone on Windows.
>
>
> Oscar
>
---
Roy Smith
[email protected]
--
https://mail.python.org/mailman/listinfo/python-list