On 25 September 2012 00:58, Junkshops <junksh...@gmail.com> wrote:
> Hi Tim, thanks for the response.
>
>> - check how you're reading the data: are you iterating over
>>   the lines a row at a time, or are you using
>>   .read()/.readlines() to pull in the whole file and then
>>   operate on that?
>
> I'm using enumerate() on an iterable input (which in this case is the
> filehandle).
>
>> - check how you're storing them: are you holding onto more
>>   than you think you are?
>
> I've used ipython to look through my data structures (without going into
> ungainly detail, 2 dicts with X numbers of key/value pairs, where X =
> number of lines in the file), and everything seems to be working correctly.
> Like I say, heapy output looks reasonable - I don't see anything surprising
> there. In one dict I'm storing a id string (the first token in each line of
> the file) with values as (again, without going into massive detail) the md5
> of the contents of the line. The second dict has the md5 as the key and an
> object with __slots__ set that stores the line number of the file and the
> type of object that line represents.
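
If I've read the description correctly, the two dicts could be sketched roughly as follows. This is only my guess at the shape, not your actual code: the names LineInfo and build_indexes, the "unknown" placeholder for the object type, and the choice to hash the newline-stripped line encoded as UTF-8 are all assumptions on my part.

```python
import hashlib


class LineInfo(object):
    """Hypothetical record: line number plus the kind of object on that line."""

    # __slots__ avoids a per-instance __dict__, which saves memory when
    # there is one instance per line of a large file.
    __slots__ = ("line_number", "kind")

    def __init__(self, line_number, kind):
        self.line_number = line_number
        self.kind = kind


def build_indexes(lines):
    """Build the two dicts from any iterable of text lines (e.g. a file handle)."""
    id_to_md5 = {}    # first token of the line -> md5 hex digest of the line
    md5_to_info = {}  # md5 hex digest -> LineInfo(line number, kind)

    for line_number, line in enumerate(lines):
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines (assumption)

        # Assumption: hash the line without its trailing newline.
        content = line.rstrip("\n").encode("utf-8")
        digest = hashlib.md5(content).hexdigest()

        id_to_md5[tokens[0]] = digest
        # Assumption: how the "type" is derived is not stated, so use a placeholder.
        md5_to_info[digest] = LineInfo(line_number, "unknown")

    return id_to_md5, md5_to_info
```

Something like build_indexes(itertools.islice(fh, 5)) would then give a five-line sample of both dicts to print and compare against what you actually have.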

Can you give an example of how these data structures look after reading
only the first 5 lines?

Oscar
-- http://mail.python.org/mailman/listinfo/python-list