I'm working with very large text files and am always looking for ways to optimize the performance of our scripts. While reviewing our code, I wondered whether opening our files with a very large buffer size might speed up our file I/O. Intuitively, I expected bigger buffers to improve performance by reducing the number of reads. Instead, I observed just the opposite: reading was 7x slower (~500 sec vs. 70 sec) and the process used 3x the memory (24M vs. 8M) with the larger buffer.

The following tests were run on a Windows XP system using Python 2.6.1.

SOURCE:

import time

# Timer helper used to bracket each strategy.
class timer( object ):

    def __init__( self, message='' ):
        self.message = message

    def start( self ):
        self.starttime = time.time()
        print 'Start: %s' % ( self.message )

    def stop( self ):
        print 'Finish: %s %6.2f' % ( self.message, time.time() - self.starttime )

# myFileName points to a 2G text file.
myFileName = r'C:\logs\jan2009.dat'

# Strategy 1: default buffering
strategy1 = timer( 'Default buffer' )
strategy1.start()
myFile = open( myFileName )
for line in myFile:
    pass
myFile.close()
strategy1.stop()

# Strategy 2: buffer size set to 16M
bufferSize = 2 ** 24
strategy2 = timer( 'Large buffer (%sk)' % ( bufferSize / 1024 ) )
strategy2.start()
myFile = open( myFileName, 'rt', bufferSize )
for line in myFile:
    pass
myFile.close()
strategy2.stop()

OUTPUT:

Start: Default buffer
Finish: Default buffer  69.98
Start: Large buffer (16384k)
Finish: Large buffer (16384k) 493.88   <--- 7x slower

Any comments regarding this massive slowdown?

Thanks,
Malcolm
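P.S. Here is a quick, untested sketch of a follow-up test I could run to narrow this down. It reads the same file in raw fixed-size chunks, taking the line iterator out of the picture, so it should show whether the buffered reads themselves are slow or whether the slowdown is in line splitting. (timeRawReads is just a throwaway helper, and the 64K chunk size is arbitrary.)

import time

myFileName = r'C:\logs\jan2009.dat'   # same 2G file as above
bufferSize = 2 ** 24                  # same 16M buffer as above

def timeRawReads( bufsize ):
    # Read the whole file in fixed 64K chunks, bypassing the line
    # iterator, so only the buffered read cost is being measured.
    start = time.time()
    myFile = open( myFileName, 'rb', bufsize )
    while True:
        if not myFile.read( 65536 ):
            break
    myFile.close()
    return time.time() - start

print 'Default buffer, raw reads: %6.2f' % timeRawReads( -1 )   # negative = system default
print 'Large buffer,   raw reads: %6.2f' % timeRawReads( bufferSize )

If raw reads come out roughly the same speed at both buffer sizes, the slowdown would seem to be in how line iteration interacts with the big buffer rather than in the I/O itself.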