per wrote:
> Hi all,
> I have a very large dictionary object that is built from a text file
> of about 800 MB; it contains several million keys. Ideally I would
> like to pickle this object so that I wouldn't have to parse the large
> file and rebuild the dictionary every time I run my program. However,
> the pickled file is currently over 300 MB and takes a very long time
> to write to disk, even longer than recomputing the dictionary from
> scratch.
But you only write it once. How does the read and reconstruct time
compare to the recompute time?
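If you want to measure that, something along these lines would do. It is
only a rough, untested sketch: build_dict() and the file names are
stand-ins for your real parser and paths. Dumping with
pickle.HIGHEST_PROTOCOL is usually much faster, and gives a much smaller
file, than the default protocol (and on Python 2, cPickle is much faster
than pickle):

import pickle   # on Python 2: import cPickle as pickle
import time

def build_dict(path):
    # stand-in for the real parser of the 800 MB text file;
    # assumes whitespace-separated "key value" lines
    d = {}
    with open(path) as f:
        for line in f:
            key, value = line.split(None, 1)
            d[key] = value.strip()
    return d

t0 = time.time()
d = build_dict('big_input.txt')
print('rebuild from text: %.1f s' % (time.time() - t0))

t0 = time.time()
with open('big_dict.pkl', 'wb') as f:
    pickle.dump(d, f, pickle.HIGHEST_PROTOCOL)
print('pickle dump: %.1f s' % (time.time() - t0))

t0 = time.time()
with open('big_dict.pkl', 'rb') as f:
    d2 = pickle.load(f)
print('pickle load: %.1f s' % (time.time() - t0))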
> I would like to split the dictionary into smaller ones, each with only
> hundreds of thousands of keys, and then try to pickle them.
Do you have any evidence that this would really be faster?
> Is there an easy way to do this? That is, is there a way to make a
> wrapper so that I can access the dictionary as a single object even
> though underneath it is split into several, so that I can read
> my_dict[k] or set my_dict[m] to some value without knowing which
> sub-dictionary the key is in?
Searching for a key in, say, 10 dicts will be slower than searching for
it in just one. The only reason I would do this would be if the dict
had to be split, say over several machines. But then, you could query
them in parallel.
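That said, if the wrapper picks the sub-dictionary by hashing the key, a
lookup only ever touches one of them. A rough, untested sketch (the class
name and shard count are made up for illustration):

class ShardedDict(object):
    # Minimal dict-like wrapper over several sub-dictionaries.  The shard
    # for a key is chosen by hashing the key, so each access touches
    # exactly one sub-dict.
    def __init__(self, nshards=10):
        self.shards = [dict() for _ in range(nshards)]

    def _shard(self, key):
        # hash() is fine within a single run; use a stable hash such as
        # zlib.crc32 if the shards are pickled and reloaded later
        return self.shards[hash(key) % len(self.shards)]

    def __getitem__(self, key):
        return self._shard(key)[key]

    def __setitem__(self, key, value):
        self._shard(key)[key] = value

    def __delitem__(self, key):
        del self._shard(key)[key]

    def __contains__(self, key):
        return key in self._shard(key)

    def __len__(self):
        return sum(len(s) for s in self.shards)

Within one process this buys you nothing over a plain dict, for the reason
above; its only real use would be letting you pickle each shard to its own
file.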
> If there aren't known ways to do this, I would greatly appreciate any
> advice or examples on how to write this data structure from scratch,
> reusing as much of the dict() class as possible.
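For the "split it and pickle the pieces" part, one rough way (untested; the
shard count, file name pattern, and the assumption that keys are strings
are all arbitrary choices) is to assign each key to a piece with a stable
hash, so the same key lands in the same piece in every run, and pickle each
piece to its own file:

import pickle
import zlib

NSHARDS = 10

def shard_index(key, nshards=NSHARDS):
    # zlib.crc32 is stable across runs, unlike hash() under hash randomization
    return zlib.crc32(key.encode('utf-8')) % nshards

def save_shards(big_dict, prefix='shard'):
    shards = [dict() for _ in range(NSHARDS)]
    for key, value in big_dict.items():
        shards[shard_index(key)][key] = value
    for i, shard in enumerate(shards):
        with open('%s_%d.pkl' % (prefix, i), 'wb') as f:
            pickle.dump(shard, f, pickle.HIGHEST_PROTOCOL)

def load_shards(prefix='shard'):
    shards = []
    for i in range(NSHARDS):
        with open('%s_%d.pkl' % (prefix, i), 'rb') as f:
            shards.append(pickle.load(f))
    return shards

def lookup(shards, key):
    # only the one shard the key hashes to is consulted
    return shards[shard_index(key)][key]

But do test whether loading ten pickles is actually any faster than loading
one before going down this road.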
Terry Jan Reedy