Re: writing large dictionaries to file using cPickle

Gabriel Genellina Wed, 28 Jan 2009 18:12:22 -0800

On Wed, 28 Jan 2009 14:13:10 -0200, <perfr...@gmail.com> wrote:

i have a large dictionary which contains about 10 keys, each key has a
value which is a list containing about 1 to 5 million (small)
dictionaries. for example,

mydict = {key1: [{'a': 1, 'b': 2, 'c': 'hello'},
                 {'d': 3, 'e': 4, 'f': 'world'}, ...],
          key2: [...]}

[pickle] creates extremely large files (~ 300 MB) though it does so
*extremely* slowly. it writes about 1 megabyte per 5 or 10 seconds and
it gets slower and slower. it takes almost an hour if not more to
write this pickle object to file.

There is an undocumented Pickler attribute, "fast". Usually, when the same
object is referenced more than once, only the first appearance is stored
in the pickled stream; later references just point to the original. This
requires the Pickler instance to remember every object pickled so far --
setting the "fast" attribute to a true value bypasses this check. Before
using this, you must be positively sure that your objects don't contain
circular references -- else pickling will never finish.


py> from cPickle import Pickler
py> from cStringIO import StringIO
py> s = StringIO()
py> p = Pickler(s, -1)
py> p.fast = 1
py> x = [1,2,3]
py> y = [x, x, x]
py> y
[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
py> y[0] is y[1]
True
py> p.dump(y)
<cPickle.Pickler object at 0x00BC0E48>
py> s.getvalue()
'\x80\x02](](K\x01K\x02K\x03e](K\x01K\x02K\x03e](K\x01K\x02K\x03ee.'
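For comparison, here is a minimal sketch of the same experiment with and without the memo. It assumes Python 3, where cPickle was merged into the standard pickle module and the Pickler still exposes a "fast" attribute (documented there as deprecated). The helper name pickle_bytes is made up for this illustration.

```python
import io
import pickle


def pickle_bytes(obj, fast):
    """Pickle obj with protocol 2, optionally with the memo disabled.

    Hypothetical helper for illustration only.
    """
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, 2)
    pickler.fast = fast  # a true value skips remembering already-pickled objects
    pickler.dump(obj)
    return buf.getvalue()


x = [1, 2, 3]
y = [x, x, x]  # three references to one and the same list

with_memo = pickle.loads(pickle_bytes(y, fast=False))
without_memo = pickle.loads(pickle_bytes(y, fast=True))

# With the memo, the repeated references are restored as one shared object.
print(with_memo[0] is with_memo[1])        # True
# In fast mode, each reference was written out in full and comes back as a copy.
print(without_memo[0] is without_memo[1])  # False
```

Both variants round-trip to equal values; only object identity differs.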

Note that, when unpickling, shared references are broken:

py> s.seek(0,0)
py> from cPickle import load
py> y2 = load(s)
py> y2
[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
py> y2[0] is y2[1]
False
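Applied to the original question, writing the dictionary to a file could look roughly like the sketch below. It assumes Python 3 (pickle instead of cPickle) and data with no circular references. The function name dump_fast, the file name, and the small stand-in dictionary are placeholders, not the poster's real data.

```python
import os
import pickle
import tempfile


def dump_fast(obj, path):
    """Write obj to path with the Pickler memo disabled.

    Hypothetical helper; only safe when obj has no circular references.
    """
    with open(path, "wb") as f:
        pickler = pickle.Pickler(f, pickle.HIGHEST_PROTOCOL)
        pickler.fast = True  # do not track every object pickled so far
        pickler.dump(obj)


# Small stand-in for the real data: a few keys, each holding a list of
# small dictionaries (the real lists hold millions of entries).
mydict = {
    "key1": [{"a": i, "b": 2, "c": "hello"} for i in range(1000)],
    "key2": [{"d": i, "e": 4, "f": "world"} for i in range(1000)],
}

path = os.path.join(tempfile.mkdtemp(), "mydict.pkl")
dump_fast(mydict, path)

# Reading back needs nothing special; a plain load() is enough.
with open(path, "rb") as f:
    restored = pickle.load(f)
```

Whether this helps enough for lists of several million dictionaries would have to be measured on the real data.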

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list
