Chris Mellon wrote:
> On 8/8/07, Ben Finney <[EMAIL PROTECTED]> wrote:
>> Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes:
>>
>>> On Aug 8, 2:35 am, Paul Rubin <http://[EMAIL PROTECTED]> wrote:
>>>> Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes:
>>>>> This program:
>>>>>     for i in range(1000000000):
>>>>>         f.readline()
>>>>> is absolutely very slow....
>>>> There are two problems:
>>>>
>>>> 1) range(1000000000) builds a list of a billion elements in memory
>> [...]
>>>> 2) f.readline() reads an entire line of input
>> [...]
>>> Thank you for pointing out these two problems. I wrote this program
>>> just to show how inefficient it is to use a seemingly NATIVE way
>>> to seek through such a big file. No other intention........
>> The native way isn't iterating over 'range(hugenum)', it's to use an
>> iterator. Python file objects are iterable, only reading each line as
>> needed and not creating a companion list.
>>
>>     logfile = open("foo.log", 'r')
>>     for line in logfile:
>>         do_stuff(line)
>>
>> This at least avoids the 'range' issue.
>>
>> To know when we've reached a particular line, use 'enumerate' to
>> number each item as it comes out of the iterator.
>>
>>     logfile = open("foo.log", 'r')
>>     target_line_num = 10**9
>>     for (line_num, line) in enumerate(logfile):
>>         if line_num < target_line_num:
>>             continue
>>         else:
>>             do_stuff(line)
>>             break
>>
>> As for reading each line: that's unavoidable if you want a specific
>> line from a stream of variable-length lines.
>>
>
> The minimum size of a line is one byte (the newline) and maybe
> more, depending on your data. You can seek() forward by the minimum
> number of bytes that (1 billion - 1) lines will consume and save
> yourself some wasted IO.

Except that you will have to count the number of lines in that first
billion characters in order to determine when to stop.

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC/Ltd           http://www.holdenweb.com
Skype: holdenweb      http://del.icio.us/steve.holden
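
P.S. To make the point about counting concrete, here is a minimal sketch
of the work that no seek() can skip: every byte before the target line
still has to be scanned for newlines. Reading in large binary chunks and
counting with bytes.count() merely makes that scan cheaper than calling
readline() a billion times. The names find_line_offset and chunk_size are
made up for illustration, and the sketch assumes Python 3, "\n" line
endings and 0-based line numbers (the same numbering enumerate gives).

```python
def find_line_offset(path, target_line_num, chunk_size=1 << 20):
    """Return the byte offset where line `target_line_num` (0-based) starts.

    Returns None if the file contains fewer than `target_line_num` newlines.
    The returned offset can equal the file size when the file ends exactly
    at that newline.
    """
    if target_line_num == 0:
        return 0

    newlines_seen = 0   # newlines counted in the chunks already skipped
    offset = 0          # byte offset of the start of the current chunk

    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return None  # ran out of file before reaching the target

            count = chunk.count(b"\n")
            if newlines_seen + count < target_line_num:
                # The target line starts in a later chunk: skip this one.
                newlines_seen += count
                offset += len(chunk)
                continue

            # The newline that ends the previous line is in this chunk.
            # Step through the newlines still needed to locate it.
            pos = -1
            for _ in range(target_line_num - newlines_seen):
                pos = chunk.find(b"\n", pos + 1)
            return offset + pos + 1
```

Once the offset is known, f.seek(offset) followed by f.readline() on a
file opened in binary mode fetches the line itself. The total IO is the
same as for the enumerate loop; only the per-line overhead goes away.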