On Thu, 15 Nov 2007 21:51:21 +0100, Hrvoje Niksic wrote:
> Steven D'Aprano <[EMAIL PROTECTED]> writes:
>
>>>> Someone please summarize.
>>>
>>> Yes, that would be good.
>>
>> On systems with multiple CPUs or 64-bit systems, or both, creating
>> and/or deleting a multi-megabyte dictionary in recent versions of
>> Python (2.3, 2.4, 2.5 at least) takes a LONG time, of the order of 30+
>> minutes, compared to seconds if the system only has a single CPU.
>
> Can you post minimal code that exhibits this behavior on Python 2.5.1?
> The OP posted a lot of different versions, most of which worked just
> fine for most people.
Who were testing it on single-CPU, 32-bit systems. The plot thickens...

I wrote another version of my test code, reading the data into a list of
tuples rather than a dict:

$ python slurp_dict4.py  # actually slurps a list, despite the name
Starting at Fri Nov 16 08:55:26 2007
Line 0
Line 1000000
Line 2000000
Line 3000000
Line 4000000
Line 5000000
Line 6000000
Line 7000000
Line 8000000
Items in list: 8191180
Completed import at Fri Nov 16 08:56:26 2007
Starting to delete list...
Completed deletion at Fri Nov 16 08:57:04 2007
Finishing at Fri Nov 16 08:57:04 2007

Quite a reasonable speed, considering my limited memory.

What do we know so far?

(1) The problem occurs whether or not gc is enabled.

(2) It only occurs on some architectures. A 64-bit CPU seems to be the
common factor.

(3) I don't think we've seen it demonstrated under Windows, but we've
seen it under at least two different Linux distros.

(4) It affects very large dicts, but not very large lists.

(5) I've done tests where, instead of one really big dict, the data is
put into lots of smaller dicts. The problem still occurs.

(6) It was suggested that the problem is related to long/int unification,
but I've done tests that kept the dict keys as strings, and the problem
still occurs.

(7) It occurs in Python 2.3, 2.4 and 2.5, but not 2.5.1.

Do we treat this as a solved problem and move on?

-- 
Steven.
-- 
http://mail.python.org/mailman/listinfo/python-list
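[For readers following along: the thread never includes slurp_dict4.py itself. The following is a minimal sketch of the kind of test being discussed, not the OP's actual script. The function name `time_dict_lifecycle`, the `"key%d"` key format, and the item count are all illustrative choices: it builds a large dict with string keys (matching observation 6) and times construction and deletion separately, since deletion is where the pathological slowdown was reported.]

```python
import time

def time_dict_lifecycle(n):
    """Build an n-entry dict with string keys, then delete it.
    Returns (build_seconds, delete_seconds). Illustrative sketch,
    not the slurp_dict4.py from the thread."""
    start = time.time()
    d = {}
    for i in range(n):
        d["key%d" % i] = i   # string keys, as in observation (6)
    built = time.time()
    del d                    # refcount hits zero: CPython frees the dict here
    done = time.time()
    return built - start, done - built

if __name__ == "__main__":
    # The OP's runs used roughly 8 million items; a smaller count is
    # used here so the sketch finishes quickly on any machine.
    build_s, delete_s = time_dict_lifecycle(1000000)
    print("build: %.2f s, delete: %.2f s" % (build_s, delete_s))
```

On an unaffected system both phases complete in seconds; on the affected 64-bit configurations described above, the deletion phase is where the minutes-long stall would show up.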