On 1/4/2010 5:35 PM, wiso wrote:
I'm trying the fileinput module, and I like it, but I don't understand why
it's so slow... look:

from time import time
from fileinput import FileInput

file = ['r1_200907.log', 'r1_200908.log', 'r1_200909.log', 'r1_200910.log',
'r1_200911.log']

def f1():
   n = 0
   for f in file:
     print "new file: %s" % f
     ff = open(f)
     for line in ff:
       n += 1
     ff.close()
   return n

def f2():
   f = FileInput(file)
   for line in f:
     if f.isfirstline(): print "new file: %s" % f.filename()
   return f.lineno()

def f3(): # f2 simpler
   f = FileInput(file)
   for line in f:
     pass
   return f.lineno()


t = time(); f1(); print time()-t # 1.0
t = time(); f2(); print time()-t # 7.0 !!!
t = time(); f3(); print time()-t # 5.5


I'm using text files, there are 2563150 lines in total.

1. Timings should include platform and Python version.

2. fileinput executes a lot of Python code on top of the underlying file methods.

Your n += 1 is inadequate as compensation.

fileinput does at least the following for each line:

        try:
            line = self._buffer[self._bufindex]
        except IndexError:
            pass
        else:
            self._bufindex += 1
            self._lineno += 1
            self._filelineno += 1

That is 5 attribute accesses, an indexing, and 3 additions, all executed as Python bytecode for every single line.
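To get a feel for how much that bookkeeping costs, here is a rough sketch that compares a bare counting loop with a loop over a small wrapper doing the same kind of per-line counter updates. CountingReader, count_plain, count_with_bookkeeping and timed are hypothetical names made up for this illustration; this is not the fileinput implementation, and it works on an in-memory list so that disk I/O stays out of the measurement.

```python
from timeit import default_timer


class CountingReader(object):
    """Hypothetical stand-in that mimics fileinput-style per-line counters."""

    def __init__(self, lines):
        self._buffer = lines
        self._bufindex = 0
        self._lineno = 0
        self._filelineno = 0

    def __iter__(self):
        return self

    def __next__(self):
        try:
            line = self._buffer[self._bufindex]
        except IndexError:
            raise StopIteration
        # The same three counter updates shown in the excerpt above.
        self._bufindex += 1
        self._lineno += 1
        self._filelineno += 1
        return line

    next = __next__  # Python 2 spelling of the iterator protocol


def count_plain(lines):
    """Count lines the way f1() does: one addition per line."""
    n = 0
    for line in lines:
        n += 1
    return n


def count_with_bookkeeping(lines):
    """Count lines through the wrapper, paying for the extra Python code."""
    reader = CountingReader(lines)
    for line in reader:
        pass
    return reader._lineno


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = default_timer()
    result = func(*args)
    return result, default_timer() - start


if __name__ == "__main__":
    sample = ["x\n"] * 200000  # assumed sample size, adjust to taste
    for func in (count_plain, count_with_bookkeeping):
        result, elapsed = timed(func, sample)
        print("%s: %d lines in %.3f s" % (func.__name__, result, elapsed))
```

The absolute numbers will differ from machine to machine. The point is only that both loops return the same count while the second one runs a Python-level method call and several attribute updates per line.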

3. You are welcome to read the Python source in .../pythonxy/Lib/fileinput.py

4. Doc string for 3.1 version says
 "Performance: this module is unfortunately one of the slower ways of
processing large numbers of input lines.  Nevertheless, a significant
speed-up has been obtained by using readlines(bufsize) instead of
readline().  A new keyword argument, bufsize=N, is present on the
input() function and the FileInput() class to override the default
buffer size."

If your version has bufsize, try something larger than the default of 8*1024, say 1024*1024.
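Putting points 1 and 4 together, a timing run could look something like the sketch below. It assumes Python 3 (inspect.signature is not available in 2.x), and the helper names describe_environment, open_input and count_lines are invented here. As far as I know, newer Python 3 releases no longer accept bufsize, so the sketch only passes it when the FileInput signature still has that parameter.

```python
import inspect
import platform
import sys
from fileinput import FileInput
from timeit import default_timer


def describe_environment():
    """Return the platform and Python version to report next to timings."""
    return "%s, Python %s" % (platform.platform(), sys.version.split()[0])


def open_input(files, bufsize=1024 * 1024):
    """Create a FileInput, passing bufsize only if this version accepts it."""
    params = inspect.signature(FileInput.__init__).parameters
    if "bufsize" in params:
        return FileInput(files, bufsize=bufsize)
    return FileInput(files)


def count_lines(files, bufsize=1024 * 1024):
    """Read every line of every file and return the cumulative line count."""
    reader = open_input(files, bufsize)
    try:
        for line in reader:
            pass
        return reader.lineno()
    finally:
        reader.close()


if __name__ == "__main__":
    files = sys.argv[1:]  # pass the log file names on the command line
    if files:
        print(describe_environment())
        start = default_timer()
        total = count_lines(files)
        print("%d lines in %.3f s" % (total, default_timer() - start))
```

Running it with the default bufsize and again with a larger one, on the same files, would show whether the buffer size makes any difference on a given version.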

Terry Jan Reedy
