per wrote:
> hi all,
>
> i have a very large dictionary object that is built from a text file
> that is about 800 MB -- it contains several million keys. ideally i
> would like to pickle this object so that i wouldn't have to parse this
> large file to compute the dictionary every time i run my program.
> however currently the pickled file is over 300 MB and takes a very
> long time to write to disk - even longer than recomputing the
> dictionary from scratch.
>
> i would like to split the dictionary into smaller ones, containing
> only hundreds of thousands of keys, and then try to pickle them. is
> there a way to easily do this? i.e. is there an easy way to make a
> wrapper for this such that i can access this dictionary as just one
> object, but underneath it's split into several? so that i can write
> my_dict[k] and get a value, or set my_dict[m] to some value without
> knowing which sub dictionary it's in.
>
> if there aren't known ways to do this, i would greatly appreciate any
> advice/examples on how to write this data structure from scratch,
> reusing as much of the dict() class as possible.
>
You aren't by any chance running this on Python 3.0, are you? The I/O
implementation in that release is known to be slow, and that slowness
would carry straight through to pickle dump/load performance.
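For the wrapper itself, here is a minimal sketch of one way to do it, assuming string keys and a modern Python 3 (3.3 or later). It subclasses the standard library's collections.abc.MutableMapping so that only five methods have to be written, and keeps the data in a list of plain dicts. The class name ShardedDict, the save/load methods and the shardNNN.pickle file names are all hypothetical, made up for illustration. The shard is chosen with zlib.crc32 rather than the built-in hash(), because hash() of a str is randomized per process in Python 3.3+, so keys would land in different shards after reloading in a new process. Note that splitting does not reduce the total number of bytes written; it mainly pays off if you only need to load or rewrite some of the shards. The sketch also pickles with the highest binary protocol, which is generally much faster than the text protocol 0 that was the default on Python 2.

```python
import os
import pickle
import zlib
from collections.abc import MutableMapping


class ShardedDict(MutableMapping):
    """Hypothetical dict-like wrapper spreading str keys over several dicts."""

    def __init__(self, num_shards=16):
        self.shards = [{} for _ in range(num_shards)]

    def _shard(self, key):
        # Stable shard choice (same in every process); assumes str keys.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def __getitem__(self, key):
        return self._shard(key)[key]

    def __setitem__(self, key, value):
        self._shard(key)[key] = value

    def __delitem__(self, key):
        del self._shard(key)[key]

    def __iter__(self):
        # Iterate over the keys of every shard in turn.
        for shard in self.shards:
            yield from shard

    def __len__(self):
        return sum(len(shard) for shard in self.shards)

    @staticmethod
    def _path(directory, i):
        return os.path.join(directory, "shard%03d.pickle" % i)

    def save(self, directory):
        # One pickle file per shard, written with the binary protocol.
        os.makedirs(directory, exist_ok=True)
        for i, shard in enumerate(self.shards):
            with open(self._path(directory, i), "wb") as f:
                pickle.dump(shard, f, protocol=pickle.HIGHEST_PROTOCOL)

    @classmethod
    def load(cls, directory, num_shards):
        # num_shards must match the value used when the shards were saved.
        obj = cls(num_shards)
        for i in range(num_shards):
            with open(cls._path(directory, i), "rb") as f:
                obj.shards[i] = pickle.load(f)
        return obj
```

Usage then looks like an ordinary dict: d = ShardedDict(num_shards=32); d["spam"] = 1; d["spam"] gives 1, and "spam" in d, len(d) and iteration all work through the MutableMapping mixin methods. Reload with ShardedDict.load(directory, 32), using the same shard count, otherwise keys will be looked up in the wrong shard.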
regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/
Want to know? Come to PyCon - soon! http://us.pycon.org/