Jack Diederich <jackd...@gmail.com> added the comment:

I tried passing a size to readline() to see whether a larger chunk size helps. The test file was 120 MB with 700k lines. For sizes from 1k to 10k, every run took around 30 seconds. With a size of 100 it took 80 seconds, and with a size of 100k it ran for several minutes before I killed it. The default starts at 100 and quickly grows to a maximum of 512, which seems to be a sweet spot (thanks to whoever figured that out!).
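The chunked strategy described above works like this: read a small chunk, look for a newline, push the leftover bytes back, and let the chunk size grow from 100 toward 512. Below is a simplified, hypothetical Python 3 sketch of it that works on any binary file-like object. The class name `ChunkedLineReader`, its parameters, and the rule for growing the chunk size are invented for illustration. This is not the gzip module's actual code. The sketch also shows where the per-line Python-level calls come from: every line costs at least one `readline()`, `_read()`, `find()` and `_unread()`.

```python
import io


class ChunkedLineReader:
    """Hypothetical line reader mimicking the chunk-and-push-back strategy.

    Assumption: the chunk size starts at min_readsize and doubles (up to
    max_readsize) whenever a chunk contains no newline. The real gzip
    module's growth rule may differ.
    """

    def __init__(self, raw, min_readsize=100, max_readsize=512):
        self.raw = raw                # binary file-like object with read(n)
        self.pending = b""            # bytes read but pushed back ("unread")
        self.readsize = min_readsize
        self.max_readsize = max_readsize

    def _read(self, n):
        # Serve pushed-back bytes first; otherwise read from the raw stream.
        if self.pending:
            data, self.pending = self.pending[:n], self.pending[n:]
            return data
        return self.raw.read(n)

    def _unread(self, data):
        # Put bytes back so the next _read() returns them first.
        self.pending = data + self.pending

    def readline(self):
        parts = []
        while True:
            chunk = self._read(self.readsize)
            if not chunk:             # EOF: return whatever was collected
                break
            i = chunk.find(b"\n")
            if i >= 0:
                parts.append(chunk[: i + 1])   # keep up to and incl. newline
                self._unread(chunk[i + 1:])    # push back the rest
                break
            parts.append(chunk)
            # No newline in this chunk: grow the chunk size, up to the cap.
            self.readsize = min(self.readsize * 2, self.max_readsize)
        return b"".join(parts)

    def __iter__(self):
        return self

    def __next__(self):
        line = self.readline()
        if not line:
            raise StopIteration
        return line
```

For example, `list(ChunkedLineReader(gzip.GzipFile(fileobj=io.BytesIO(gzip.compress(data)))))` yields the decompressed lines of an in-memory gzip stream. Each of the small helper calls runs once or more per line. For a file with hundreds of thousands of lines, that adds up to millions of interpreter-level calls.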
I profiled it, and function-call overhead seems to be the real killer. 30% of the time is spent in readline(). The next() function does almost nothing, yet it consumes a quarter as much time as readline(). The same goes for read() and _unread(). Even lowly len() consumes a third as much time as readline(), because it is called over 2 million times. There doesn't seem to be any way to speed this up without rewriting the whole thing as a C module, so I'm closing the bug as WONTFIX.

----------
nosy: +jackdied
resolution: -> wont fix
status: open -> closed

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7471>
_______________________________________