New submission from stw <[email protected]>:
I've found that unpickling a certain kind of dictionary is substantially slower
in python 2.7 compared to python 2.6. The dictionary has keys that are tuples
of strings - a 1-tuple is enough to see the effect. The problem seems to be
caused by garbage collection, as turning it off eliminates the slowdown. Both
pickle and cPickle modules are affected.
I've attached two files to demonstrate this. The file 'make_file.py'
creates a dictionary of specified size, with keys containing 1-tuples of random
strings. It then dumps the dictionary to a pickle file using a specified pickle
module.
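For readers without the attachment, here is a minimal sketch of the kind of thing 'make_file.py' does. It is not the attached script: the function names, the key length, the integer values and the fixed seed are all assumptions made for illustration, and protocol 2 is chosen only because it is one of the affected protocols.

```python
import pickle
import random
import string


def make_dict(n, key_len=8, seed=0):
    """Build a dict with n entries whose keys are 1-tuples of random strings.

    key_len, seed and the integer values are illustrative assumptions.
    """
    rng = random.Random(seed)
    d = {}
    while len(d) < n:
        key = "".join(rng.choice(string.ascii_lowercase) for _ in range(key_len))
        d[(key,)] = len(d)  # 1-tuple key, arbitrary integer value
    return d


def dump_dict(d, path, pickle_module=pickle, protocol=2):
    """Write d to path using the given pickle module and protocol."""
    with open(path, "wb") as f:
        pickle_module.dump(d, f, protocol)
```

On python 2.x, passing `pickle_module=cPickle` (after `import cPickle`) would select the C implementation instead.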
The file 'load_file.py' unpickles the file created by 'make_file.py', using a
specified pickle module, and prints the time taken. The code can be run with
garbage collection either on or off.
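The timing step can be sketched roughly as follows. Again this is an assumption about the shape of 'load_file.py', not the script itself; `timed_load` and its parameters are hypothetical names.

```python
import gc
import pickle
import time


def timed_load(path, pickle_module=pickle, use_gc=True):
    """Unpickle the file at path and return (object, elapsed seconds)."""
    was_enabled = gc.isenabled()
    if use_gc:
        gc.enable()
    else:
        gc.disable()
    try:
        start = time.time()
        with open(path, "rb") as f:
            obj = pickle_module.load(f)
        elapsed = time.time() - start
    finally:
        # Put the collector back the way we found it.
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
    return obj, elapsed
```

Calling it once with `use_gc=True` and once with `use_gc=False` on the same file gives one "on / off" pair of timings of the kind reported below.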
The results below are for a dictionary of 200000 entries. Each entry is the
time taken in seconds with garbage collection on / garbage collection off. The
row headings are the module used to pickle the data, the column headings the
module used to unpickle it.
python 2.6, n = 200000

            size    pickle       cPickle
pickle      4.3M    3.02/2.65    0.786/0.559
cPickle     3.4M    2.27/2.04    0.66/0.443

python 2.7, n = 200000

            size    pickle       cPickle
pickle      4.3M    10.5/2.67    6.62/0.563
cPickle     2.4M    1.45/1.39    0.362/0.325
When pickle is used to pickle the data, there is a significant slowdown in
python 2.7 compared to python 2.6 with garbage collection on. With garbage
collection off the times in python 2.7 are essentially identical to those in
python 2.6.
When cPickle is used to pickle the data, both unpicklers are faster in python
2.7 than in python 2.6. Presumably the speedup is due to the dictionary
optimizations introduced in issue #5670.
Both pickle and cPickle show a slowdown when data pickled in python 2.6 is
unpickled in python 2.7:
pickled in python 2.6, unpickled in python 2.7, n = 200000

                 size    pickle (2.7)    cPickle (2.7)
pickle (2.6)     4.3M    10.4/2.66       6.64/0.56
cPickle (2.6)    3.4M    8.73/2.08       6.1/0.452
I don't know enough about the internals of the pickle modules or garbage
collector to offer an explanation/fix. The list of optimizations for python 2.7
indicates changes to both pickle modules (issues #5670 and #5084) and the
garbage collector (issues #4074 and #4688). It seems possible that the slowdown
is the result of some interaction between these changes.
Further notes:
1. System details: python 2.6.5 and python 2.7.3 on Ubuntu 10.04, 1.73GHz
Pentium M processor.
2. Only pickle files created with protocols 1 and 2 are affected. Pickling with
protocol 0 gives similar timings on python 2.6 and 2.7.
3. The fact that the dictionary's keys are tuples is relevant, although the
length of the tuple is not. Unpickling a dictionary whose keys are strings does
not show any slowdown.
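Note 3 can be checked in memory, without the attached files, along these lines. This is only a sketch: the helper names are hypothetical, the keys are `str(i)` rather than random strings, and it uses the pure-python-compatible `pickle` API only. I would not expect it to show the slowdown on interpreters other than the affected 2.7 builds.

```python
import gc
import pickle
import time


def time_loads(data, use_gc):
    """Return the seconds taken by pickle.loads(data) with gc on or off."""
    was_enabled = gc.isenabled()
    (gc.enable if use_gc else gc.disable)()
    try:
        start = time.time()
        pickle.loads(data)
        return time.time() - start
    finally:
        # Restore the collector's previous state.
        (gc.enable if was_enabled else gc.disable)()


def compare_key_types(n=200000, protocol=2):
    """Time unpickling of tuple-keyed and string-keyed dicts.

    Returns {name: (seconds with gc on, seconds with gc off)}.
    """
    tuple_keyed = dict(((str(i),), i) for i in range(n))
    string_keyed = dict((str(i), i) for i in range(n))
    results = {}
    for name, d in (("tuple keys", tuple_keyed), ("string keys", string_keyed)):
        data = pickle.dumps(d, protocol)
        results[name] = (time_loads(data, True), time_loads(data, False))
    return results
```

If the report is right, only the "tuple keys" entry should show a large gap between its two timings on python 2.7.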
----------
files: make_file.py
messages: 160368
nosy: stw
priority: normal
severity: normal
status: open
title: Slow unpickling of certain dictionaries in python 2.7 vs python 2.6
type: performance
versions: Python 2.7
Added file: http://bugs.python.org/file25524/make_file.py
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue14775>
_______________________________________