On 10/23/12 11:17, Paul Rubin wrote:
> Virgil Stokes <v...@it.uu.se> writes:
>> Finally, to my question --- What is a fast way to write these
>> variables to an external file and then read them in backwards?
>
> Seeking backwards in files works, but the performance hit is
> significant. There is also a performance hit to scanning pointers
> backwards in memory, due to cache misprediction. If it's something
> you're just running a few times, seeking backwards the simplest
> approach. If you're really trying to optimize the thing, you might
> buffer up large chunks (like 1 MB) before writing. If you're writing
> once and reading multiple times, you might reverse the order of records
> within the chunks during the writing phase.
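
For what it's worth, here is a rough sketch of the chunked approach described above: write through a large buffer, then read the file back one chunk at a time from the end, reversing the records within each chunk in memory. The record layout (three doubles per record), the chunk size, and the function names are all assumptions made up for illustration; nothing in the thread specifies what the variables look like, so adjust the struct format to match the real data.

```python
import os
import struct

RECORD = struct.Struct("<3d")   # hypothetical record: three float64 values
CHUNK_RECORDS = 4096            # records per backwards read; an arbitrary choice


def write_records(path, records):
    """Write fixed-size binary records through a 1 MB write buffer."""
    with open(path, "wb", buffering=1024 * 1024) as f:
        for rec in records:
            f.write(RECORD.pack(*rec))


def read_records_reversed(path, chunk_records=CHUNK_RECORDS):
    """Yield records last-to-first, seeking backwards one chunk at a time."""
    size = os.path.getsize(path)
    if size % RECORD.size:
        raise ValueError("file size is not a whole number of records")
    remaining = size // RECORD.size          # records not yet yielded
    with open(path, "rb") as f:
        while remaining:
            n = min(chunk_records, remaining)
            remaining -= n
            f.seek(remaining * RECORD.size)  # start of the last unread chunk
            block = f.read(n * RECORD.size)
            # Walk the chunk from its last record to its first.
            for i in range(n - 1, -1, -1):
                yield RECORD.unpack_from(block, i * RECORD.size)
```

Each seek pulls in a whole chunk rather than a single record, so the number of backwards seeks drops by a factor of the chunk size. Whether that matters in practice depends on how large the dataset is and how often it is read.
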
I agree with Paul here. It's been a while since I did it, and my dataset was small enough (and passed through only once) that I just let it run. Writing larger chunks is definitely a good way to go.

> You're of course taking a performance bath from writing the program in
> Python to begin with (unless using scipy/numpy or the like), enough that
> it might dominate any effects of how the files are written.

I find that the I/O almost always overwhelms the actual processing.

> Of course (it should go without saying) that you want to dump in a
> binary format rather than converting to decimal.

Again, the conversion to/from decimal hasn't been a great cost in my experience, as it's overwhelmed by the I/O cost of shoveling the data to/from disk.

-tkc

--
http://mail.python.org/mailman/listinfo/python-list