On Sat, 20 Dec 2008 09:02:38 pm Kristján Valur Jónsson wrote:

> Can you distill the program into something reproducible?
> Maybe with something slightly less than 45Gb but still exhibiting
> some degradation of exit performance? I can try to point our
> commercial profiling tools at it and see what it is doing. K

In November 2007, a similar problem was reported on the comp.lang.python 
newsgroup. 370MB was large enough to demonstrate the problem. I don't 
know if a bug was ever reported.

The thread starts here:
http://mail.python.org/pipermail/python-list/2007-November/465498.html

or if you prefer Google Groups:
http://preview.tinyurl.com/97xsso

and it describes extremely long times to populate and destroy large 
dicts even with garbage collection turned off.

My summary at the time was:

"On systems with multiple CPUs or 64-bit systems, or both, creating 
and/or deleting a multi-megabyte dictionary in recent versions of 
Python (2.3, 2.4, 2.5 at least) takes a LONG time, of the order of 30+ 
minutes, compared to seconds if the system only has a single CPU. 
Turning garbage collection off doesn't help."

I make no guarantee that the above is a correct description of the 
problem, only that this is what I believed at the time.

I'm afraid it is a very long thread, with multiple red herrings, lots of 
people unable to reproduce the problem, and the usual nonsense that 
happens on comp.lang.python.

I was originally one of the skeptics until I reproduced the original 
poster's problem. I generated a sample file of 8 million key/value pairs 
as a 370MB text file. Reading it into a dict took two and a half minutes 
on my relatively slow computer, but deleting the dict took more than 30 
minutes even with garbage collection switched off. Sample code 
reproducing the problem on my machine is here:

http://mail.python.org/pipermail/python-list/2007-November/465513.html
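For anyone who doesn't want to dig through the archive, the shape of the test was something like the following sketch (my own illustration in modern syntax, not the code from the linked post; N is scaled down from the original 8 million):

```python
import gc
import time

# Sketch of the reproduction (hypothetical code, not the version from the
# linked post): build a large dict of short-string keys, then time its
# deallocation with the cyclic garbage collector disabled.
gc.disable()

N = 1_000_000  # scale towards 8_000_000 to approach the original test

t0 = time.time()
d = dict(("key%d" % i, ("value%d" % i, i)) for i in range(N))
build_secs = time.time() - t0
print("populate: %.1fs" % build_secs)

sample = d["key42"]  # keep one entry around as a sanity check

t0 = time.time()
del d  # on affected systems this step reportedly took 30+ minutes
delete_secs = time.time() - t0
print("delete:   %.1fs" % delete_secs)

gc.enable()
```

On an unaffected machine both timings should be a few seconds at most; the bug, as reported, showed up only in the delete step.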

According to this post of mine:

http://mail.python.org/pipermail/python-list/2007-November/466209.html

deleting 8 million (key, value) pairs stored as a list of tuples was 
very fast. It was only if they were stored as a dict that deleting it 
was horribly slow.
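A rough sketch of that comparison (again my own illustration, not the code from the post) stores the same pairs both ways and times the two deletions separately:

```python
import gc
import time

gc.disable()
N = 1_000_000  # the original test used 8 million pairs

# The same (key, value) data stored two ways: a list of tuples and a dict.
pairs = [("key%d" % i, ("value%d" % i, i)) for i in range(N)]
d = dict(pairs)

t0 = time.time()
del pairs
list_secs = time.time() - t0
print("list delete: %.3fs" % list_secs)

t0 = time.time()
del d
dict_secs = time.time() - t0
print("dict delete: %.3fs" % dict_secs)

gc.enable()
```

On the affected systems, only the second deletion was pathologically slow, which is what pointed the finger at dict deallocation specifically rather than object deallocation in general.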

Please note that other people have tried and failed to replicate the 
problem. I suspect the fault (if it is one, and not human error) is 
specific to some combinations of Python version and hardware.

Even if this ends up as a "Will Not Fix", I'd be curious whether anyone 
else can reproduce the problem.

Hope this is helpful,

Steven.



> -----Original Message-----
> From: python-dev-bounces+kristjan=ccpgames....@python.org
> [mailto:python-dev-bounces+kristjan=ccpgames....@python.org] On
> Behalf Of Mike Coleman
> Sent: 19 December 2008 23:30
> To: python-dev@python.org
> Subject: [Python-Dev] extremely slow exit for program having huge
> (45G) dict (python 2.5.2)
>
> I have a program that creates a huge (45GB) defaultdict.  (The keys
> are short strings, the values are short lists of pairs (string,
> int).) Nothing but possibly the strings and ints is shared.
>
> The program takes around 10 minutes to run, but longer than 20
> minutes to exit (I gave up at that point).  That is, after executing
> the final statement (a print), it is apparently spending a huge
> amount of time cleaning up before exiting.  I haven't installed any
> exit handlers or anything like that, all files are already closed and
> stdout/stderr flushed, and there's nothing special going on.  I have
> done
> 'gc.disable()' for performance (which is hideous without it)--I have
> no reason to think there are any loops.
>
> Currently I am working around this by doing an os._exit(), which is
> immediate, but this seems like a bit of hack.  Is this something that
> needs fixing, or that has already been fixed?
>
> Mike




-- 
Steven D'Aprano
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com