I have a pair of python programs that parse and index files on my computer to make them searchable. The problem is that they grow continually until my system is out of memory, and then things get ugly. I remember, when I was first learning python, reading that the interpreter doesn't garbage-collect small strings, but I assumed that was outdated and sort of forgot about it. Unfortunately, it seems this is still the case. A sample program (to type or paste into the python REPL):
    a = []
    for i in xrange(33, 127):
        for j in xrange(33, 127):
            for k in xrange(33, 127):
                for l in xrange(33, 127):
                    a.append(chr(i) + chr(j) + chr(k) + chr(l))
    del a
    import gc
    gc.collect()

The loop is deep enough that I always interrupt it once python's size is around 250 MB. Once the gc.collect() call has finished, python's size has not changed a bit. Even though there are no locals and no references at all to any of the strings that were created, python will not reduce its size. This example is obviously artificial, but I am getting exactly the same behaviour in my real programs. Is there some way to convince python to get rid of data that is no longer referenced, or do I need to use a different language? This has been tried under python 2.4.3 on Gentoo Linux and python 2.3 under OS X.3. Any suggestions/workarounds would be much appreciated.
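(For anyone trying to reproduce this: I watch the interpreter's size in top, but on Linux it can also be read from inside the process with something like the sketch below.)

    import os

    def vm_size():
        # pull the VmSize line out of /proc/<pid>/status (Linux only)
        for line in open('/proc/%d/status' % os.getpid()):
            if line.startswith('VmSize:'):
                return line.strip()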
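The only workaround I've come up with so far is to confine the big allocations to a forked child process, so that the OS reclaims everything when the child exits; the real code would then have to hand results back through a pipe or a file. A rough sketch of the idea (Unix only):

    import os

    def build_strings():
        # the same memory-hungry loop as above, one level shallower
        a = []
        for i in xrange(33, 127):
            for j in xrange(33, 127):
                for k in xrange(33, 127):
                    a.append(chr(i) + chr(j) + chr(k))
        return a

    pid = os.fork()
    if pid == 0:
        # child: allocate and use the strings, then exit;
        # all of the child's memory goes back to the OS at exit
        a = build_strings()
        os._exit(0)
    else:
        # parent: wait for the child; the parent's size never grows
        os.waitpid(pid, 0)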