Mike Coleman wrote:
I have a program that creates a huge (45GB) defaultdict.  (The keys
are short strings, the values are short lists of pairs (string, int).)
Nothing but possibly the strings and ints is shared.

The program takes around 10 minutes to run, but longer than 20 minutes
to exit (I gave up at that point).  That is, after executing the final
statement (a print), it is apparently spending a huge amount of time
cleaning up before exiting.  I haven't installed any exit handlers or
anything like that, all files are already closed and stdout/stderr
flushed, and there's nothing special going on.  I have done
'gc.disable()' for performance (which is hideous without it)--I have
no reason to think there are any loops.

Currently I am working around this by doing an os._exit(), which is
immediate, but this seems like a bit of a hack.  Is this something that
needs fixing, or that has already been fixed?

You don't mention the platform, but...

This behaviour was not unknown in the distant past, with much smaller
datasets.  Most of the problems then related to the platform malloc()
doing funny things as stuff was free()ed, like coalescing free space.

[I once sat and watched a Python script run in something like 30 seconds
 and then take nearly 10 minutes to terminate, as you describe (Python
 2.1/Solaris 2.5/Ultrasparc E3500)... and that was only a couple of
 hundred MB of memory - the Solaris 2.5 malloc() had some undesirable
 properties from Python's point of view]

PyMalloc effectively removed this as an issue for most cases and platform
malloc()s have also become considerably more sophisticated since then,
but I wonder whether the sheer size of your dataset is unmasking related
issues.

Note that in Python 2.5 PyMalloc does free() unused arenas as a surplus
accumulates (2.3 & 2.4 never free()ed arenas).  Your platform malloc()
might have odd behaviour with 45GB of arenas returned to it piecemeal.
This is something that could be checked with a small C program.
Calling os._exit() circumvents the free()ing of the arenas.
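
A minimal sketch of that workaround, written in Python 3 syntax.  The
names build_table() and fast_exit() are hypothetical, and the table here
is only a small stand-in for the real 45GB structure.  The point it
illustrates is that os._exit() skips interpreter teardown, so the
standard streams have to be flushed by hand first:

```python
import os
import sys
from collections import defaultdict


def build_table(n):
    """Hypothetical stand-in for the real workload: short string keys,
    values that are short lists of (string, int) pairs."""
    table = defaultdict(list)
    for i in range(n):
        table["k%d" % (i % 1000)].append(("v%d" % i, i))
    return table


def fast_exit(status=0):
    """Flush the standard streams, then leave without interpreter teardown.

    os._exit() skips atexit handlers and object deallocation, so anything
    still buffered elsewhere (open files, sockets) must be flushed or
    closed by the caller before this is called.
    """
    sys.stdout.flush()
    sys.stderr.flush()
    os._exit(status)
```

A script would call fast_exit(0) as its very last statement, after the
final print.  Because nothing is deallocated, the process leaves it to
the operating system to reclaim the memory in one step.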

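Also consider that, with the exception of small integers (-5..256 in
Python 2.5; earlier releases cached a narrower range), no interning of
integers is done.  If your data contains large quantities of integers
with non-unique values (that aren't in the small integer range) you may
find it useful to do your own interning, so that equal values share one
object instead of each occupying its own allocation.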
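
One way to do your own interning, as a rough sketch: keep a dict that
maps each value to the first object seen with that value, and store that
canonical object instead of each fresh copy.  The name make_interner()
is hypothetical, and the sketch assumes all values passed to a given
interner are of one type (a dict treats 1, 1.0 and True as the same key):

```python
def make_interner():
    """Return a function that maps equal values to one shared object."""
    cache = {}

    def intern_value(value):
        # setdefault() stores value on first sight and afterwards returns
        # the object that was stored, so later equal values can be dropped.
        return cache.setdefault(value, value)

    return intern_value


intern_int = make_interner()

# int("100000") builds a new object on each call; after interning, both
# names refer to the same object.
first = intern_int(int("100000"))
second = intern_int(int("100000"))
```

The cache itself costs memory, so this only pays off when the same
values recur many times.  It also means fewer objects to deallocate at
exit.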

--
-------------------------------------------------------------------------
Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail: andy...@bullseye.apana.org.au  (pref) | Snail: PO Box 370
       andy...@pcug.org.au             (alt) |        Belconnen ACT 2616
Web:    http://www.andymac.org/               |        Australia
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev