On 24-Oct-2012 00:53, Steven D'Aprano wrote:
> On Tue, 23 Oct 2012 17:50:55 -0400, David Hutto wrote:
>
>> On Tue, Oct 23, 2012 at 10:31 AM, Virgil Stokes <v...@it.uu.se> wrote:
>>> I am working with some rather large data files (>100GB)
>>> [...]
>>> Finally, to my question --- What is a fast way to write these variables
>>> to an external file and then read them in backwards?
>> Don't forget to use timeit for an average OS utilization.
> Given that the data files are larger than 100 gigabytes, the time
> required to process each file is likely to be in hours, not microseconds.
> That being the case, timeit is the wrong tool for the job: it is
> optimized for timing tiny code snippets. You could use it, of course,
> but the added inconvenience doesn't gain you any added accuracy.

> Here's a neat context manager that makes timing long-running code simple:
>
> http://code.activestate.com/recipes/577896
Thanks for this link
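
For the record, here is a minimal sketch of the same idea (my own sketch, not the recipe's actual code, which is at the link above): a context manager that prints the wall-clock time of whatever block it wraps, which is a better fit than timeit for runs measured in hours.

import time
from contextlib import contextmanager

@contextmanager
def elapsed(label="block"):
    # Print how long the wrapped block of code took, in wall-clock seconds.
    start = time.time()
    try:
        yield
    finally:
        print("%s took %.1f seconds" % (label, time.time() - start))

# Usage: wrap the long-running work instead of reaching for timeit.
# with elapsed("processing data.bin"):
#     process_file("data.bin")   # process_file is a hypothetical placeholder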



>> I'd suggest two list comprehensions for now, until I've reviewed it some
>> more:
> I would be very surprised if the poster will be able to fit 100 gigabytes
> of data into even a single list comprehension, let alone two.
You are correct, and I have been looking at working with blocks that are sized to fit in the RAM available for processing.
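
For what it's worth, here is a rough sketch of the block-wise backwards read I have in mind; the 64 MB block size and the function name are just placeholders, and turning each block back into variables of course depends on how the records were written in the first place.

def read_blocks_backwards(path, block_size=64 * 1024 * 1024):
    # Yield successive blocks of a binary file, starting from the end.
    # Each block is in file order internally; only the order of the
    # blocks is reversed.  block_size should fit comfortably in RAM.
    with open(path, "rb") as f:
        f.seek(0, 2)               # jump to end of file
        pos = f.tell()             # total file size in bytes
        while pos > 0:
            start = max(0, pos - block_size)
            f.seek(start)
            yield f.read(pos - start)
            pos = start

# Example: walk the file newest-data-first, one RAM-sized block at a time.
# for block in read_blocks_backwards("data.bin"):
#     process(block)              # process is a hypothetical placeholder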

> This is a classic example of why the old external processing algorithms
> of the 1960s and 70s will never be obsolete. No matter how much memory
> you have, there will always be times when you want to process more data
> than you can fit into memory.
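
As a concrete illustration of the kind of external algorithm you mean (my sketch, chosen as an example, not something suggested in this thread), here is the classic external merge sort: sort RAM-sized runs of lines into temporary files, then merge the sorted runs with heapq.merge, which keeps only one line per run in memory at a time. The run size is just a placeholder.

import heapq
import tempfile
from itertools import islice

def external_sort(in_path, out_path, lines_per_run=1000000):
    # Phase 1: read RAM-sized runs of lines, sort each in memory, and
    # spill each sorted run to its own temporary file.
    runs = []
    with open(in_path) as src:
        while True:
            run = list(islice(src, lines_per_run))
            if not run:
                break
            run.sort()
            tmp = tempfile.TemporaryFile(mode="w+")
            tmp.writelines(run)
            tmp.seek(0)
            runs.append(tmp)
    # Phase 2: merge the sorted runs; heapq.merge streams the result,
    # so memory use stays bounded no matter how large the input is.
    with open(out_path, "w") as dst:
        dst.writelines(heapq.merge(*runs))
    for tmp in runs:
        tmp.close()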



Thanks for your insights :-)
--
http://mail.python.org/mailman/listinfo/python-list
