On 8/8/07, Ben Finney <[EMAIL PROTECTED]> wrote:
> Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes:
>
> > On Aug 8, 2:35 am, Paul Rubin <http://[EMAIL PROTECTED]> wrote:
> > > Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes:
> > > > This program:
> > > > for i in range(1000000000):
> > > >     f.readline()
> > > > is absolutely very slow....
> > >
> > > There are two problems:
> > >
> > > 1) range(1000000000) builds a list of a billion elements in memory
> [...]
> > >
> > > 2) f.readline() reads an entire line of input
> [...]
> >
> > Thank you for pointing out these two problems. I wrote this program
> > just to show how inefficient it is to use a seemingly NATIVE way to
> > seek in such a big file. No other intention........
>
> The native way isn't iterating over 'range(hugenum)', it's to use an
> iterator. Python file objects are iterable, reading each line only as
> needed and not creating a companion list.
>
> logfile = open("foo.log", 'r')
> for line in logfile:
>     do_stuff(line)
>
> This at least avoids the 'range' issue.
>
> To know when we've reached a particular line, use 'enumerate' to
> number each item as it comes out of the iterator.
>
> logfile = open("foo.log", 'r')
> target_line_num = 10**9
> for (line_num, line) in enumerate(logfile):
>     if line_num < target_line_num:
>         continue
>     else:
>         do_stuff(line)
>         break
>
> As for reading each line: that's unavoidable if you want a specific
> line from a stream of variable-length lines.
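As an aside, itertools.islice from the standard library does the same
skipping without the explicit counter. This variant is mine, not Ben's,
but do_stuff and the numbers are as above:

    import itertools

    logfile = open("foo.log", 'r')
    target_line_num = 10**9
    # islice lazily discards the first target_line_num lines and
    # yields the next one, matching the enumerate version above.
    for line in itertools.islice(logfile, target_line_num,
                                 target_line_num + 1):
        do_stuff(line)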
The minimum size of a line is one byte (the newline itself), and possibly more, depending on your data. You can seek() forward over the minimum number of bytes that (1 billion - 1) lines must consume and save yourself some wasted IO.
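A minimal sketch of that idea, for the one case where the arithmetic
pinpoints the line exactly: fixed-length records. RECORD_LEN is an
assumption here, not something from the thread; with only a minimum
line length, the same seek merely skips bytes that the first
(1 billion - 1) lines must occupy, and you would no longer know which
line you landed in.

    RECORD_LEN = 80        # assumption: every line is exactly 80 bytes,
                           # newline included
    target_line_num = 10**9

    logfile = open("foo.log", 'rb')   # binary mode, so seek offsets
                                      # are plain byte counts
    # Jump directly to the start of the target line; none of the
    # earlier bytes are ever read.
    logfile.seek((target_line_num - 1) * RECORD_LEN)
    do_stuff(logfile.readline())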