Hello all, I have a large dictionary with about 10 keys; each key's value is a list of roughly 1 to 5 million small dictionaries. For example:
    mydict = {key1: [{'a': 1, 'b': 2, 'c': 'hello'},
                     {'d': 3, 'e': 4, 'f': 'world'}, ...],
              key2: [...]}

In total there are about 10 to 15 million of these small dictionaries if you concatenate all the values of 'mydict'. The structure represents the data in a very large text file (about 800 MB).

What is the fastest way to pickle 'mydict' to a file? Right now I am having a lot of trouble with cPickle, used like this:

    import cPickle as pickle
    pfile = open(my_file, 'w')
    pickle.dump(mydict, pfile)
    pfile.close()

This produces a very large file (~300 MB), and it does so *extremely* slowly: it writes about 1 MB every 5 to 10 seconds, and it gets slower and slower as it goes. It takes an hour, if not more, to write the whole pickle to disk. Is there any way to speed this up? I don't mind the large file; after all, the text file the dictionary was built from (~800 MB) is larger than the pickle it produces (~300 MB). But I do care about speed.

I tried to optimize this with:

    s = pickle.dumps(mydict, 2)
    pfile.write(s)

but this takes just as long. Any ideas? Is there a different module that is more suitable for large dictionaries?

Thank you very much.
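One plausible culprit for the progressive slowdown is the pickler's memo table: cPickle records every mutable object it has already serialized, and with 10 to 15 million small dicts that table gets enormous. Below is a minimal sketch of a workaround, not a tested fix: it combines the binary protocol, disabling the cyclic garbage collector for the duration of the dump, and cPickle's lightly documented 'fast' mode, which skips the memo and is only safe if no object appears twice in the data and there are no reference cycles. The dump_fast helper name and path argument are mine, for illustration.

    import gc
    import cPickle

    def dump_fast(obj, path):
        # protocol 2 is a binary format, so open the file in 'wb' mode
        pfile = open(path, 'wb')
        # keep the cyclic GC from repeatedly walking millions of dicts
        gc.disable()
        try:
            pickler = cPickle.Pickler(pfile, 2)
            # skip the memo table; only safe when the data has no shared
            # sub-objects and no reference cycles
            pickler.fast = 1
            pickler.dump(obj)
        finally:
            gc.enable()
            pfile.close()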
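Note also that dumps() builds the entire ~300 MB pickle string in memory before anything is written, so it cannot be faster than dump(). A different angle, sketched below under the assumption that the top-level keys are independent, is to stream one self-contained pickle record per key; this bounds memory use and lets the file be read back incrementally. The dump_by_key/load_by_key helpers are hypothetical names.

    import cPickle

    def dump_by_key(mydict, path):
        pfile = open(path, 'wb')
        try:
            for key, records in mydict.iteritems():
                # one self-contained pickle record per key
                cPickle.dump((key, records), pfile, 2)
        finally:
            pfile.close()

    def load_by_key(path):
        # read records back until the file is exhausted
        result = {}
        pfile = open(path, 'rb')
        try:
            while True:
                try:
                    key, records = cPickle.load(pfile)
                except EOFError:
                    break
                result[key] = records
        finally:
            pfile.close()
        return result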
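As for a different module: if the data really is nothing but builtin dicts, lists, strings, and numbers, the marshal module is typically much faster than pickle, at the cost of a Python-version-specific format and no support for user-defined classes, so it is only appropriate for a local cache file read back by the same interpreter version. A sketch, with hypothetical helper names:

    import marshal

    def dump_marshal(obj, path):
        # marshal handles only builtin types, which matches the
        # structure described above
        pfile = open(path, 'wb')
        try:
            marshal.dump(obj, pfile)
        finally:
            pfile.close()

    def load_marshal(path):
        pfile = open(path, 'rb')
        try:
            return marshal.load(pfile)
        finally:
            pfile.close()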