Hi Tony,

> I was trying to see if I could speed up processing huge files (in the 10's of
> Gigabytes) by passing various values to the readline() method of the file
> object.
We also work with large text files (we use Python for ETL jobs in the 100-200 GB range) and have noticed similar behavior. We're currently running 32- and 64-bit versions of Python 2.6.1 on Windows XP (32-bit/2 GB) and Windows 2008 (64-bit/48 GB) with SCSI and eSATA drives. All our file access is local (vs. over a network), and the boxes our jobs run on are dedicated machines running nothing but our Python scripts. Our boxes are on an isolated network and we have no virus-checking software running in the background.

Since our boxes are maxed out with memory, we thought that supplying large buffer sizes to open() would improve performance. Like you, we were surprised that this strategy significantly slowed down our processing. We're currently opening our text files via open() and letting Python choose the default buffer size.

My experience with Python is that many performance enhancement techniques are not (initially) intuitive - but my gut tells me that the behavior we are both seeing is *NOT* one of those cases, i.e. something smells fishy.

I'm happy to run experiments on our side if anyone has suggestions. Thanks for bringing this up.

Regards,
Malcolm
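P.S. In case it helps anyone reproduce what we're seeing, below is a rough sketch of the kind of timing experiment I have in mind on our side (Python 2 style). The file path and the buffer sizes are just placeholders, not what we actually use in production:

import time

PATH = 'bigfile.txt'  # placeholder path; substitute a real multi-GB test file

def scan(bufsize=None):
    """Iterate over the file line by line; return (elapsed seconds, line count)."""
    start = time.time()
    if bufsize is None:
        f = open(PATH, 'rb')            # let Python pick the default buffer size
    else:
        f = open(PATH, 'rb', bufsize)   # explicit buffer size in bytes
    try:
        count = 0
        for line in f:
            count += 1
    finally:
        f.close()
    return time.time() - start, count

# Compare the default buffer size against a couple of explicit ones.
for size in (None, 64 * 1024, 8 * 1024 * 1024):
    elapsed, lines = scan(size)
    print 'bufsize=%-10s lines=%d elapsed=%.2fs' % (size, lines, elapsed)

If there's interest, we could run this (or a variant of it) on both the XP and the 2008 boxes and post the numbers.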