[EMAIL PROTECTED] wrote:
> In an application dealing with very large text files, I need to create
> dictionaries indexed by tuples of words (bi-, tri-, n-grams) or nested
> dictionaries. The number of different data structures in memory grows
> into orders beyond 1E7.
>
> It turns out that the default behaviour of Python is not very suitable
> for such a situation, as garbage collection occasionally traverses all
> objects in memory (including all tuples) in order to find out which
> could be collected. This leads to the situation that creating O(N)
> objects effectively takes O(N*N) time.
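A common workaround for this (not stated in the post above, just a sketch of the usual fix) is to suspend CPython's cyclic collector with `gc.disable()` while the millions of tuples are being created, since reference counting alone still reclaims non-cyclic garbage. The function name and data here are illustrative:

```python
import gc

def build_ngram_index(tokens, n=2):
    """Count n-grams keyed by word tuples, pausing the cyclic GC during the build."""
    gc.disable()  # stop the collector from repeatedly traversing all live tuples
    try:
        counts = {}
        for i in range(len(tokens) - n + 1):
            key = tuple(tokens[i:i + n])
            counts[key] = counts.get(key, 0) + 1
    finally:
        gc.enable()  # restore normal collection once the bulk creation is done
    return counts
```

Reference counting still frees everything without cycles, so only cyclic garbage accumulates while the collector is off.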
Do your data structures need garbage collection? CPython is a reference counted system with garbage collection as a backup to catch cycles. Try using weak back-pointers in your data structures to avoid creating cycles.

				John Nagle
-- 
http://mail.python.org/mailman/listinfo/python-list
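The weak back-pointer idea above could be sketched like this (a minimal illustration with the `weakref` module; the `Node` class is hypothetical, not from the thread). The child's parent link is a `weakref.ref`, so the parent/child pair forms no reference cycle and the parent is freed by reference counting alone:

```python
import weakref

class Node:
    """Tree node whose parent link is weak, so parent<->child creates no cycle."""

    def __init__(self, value, parent=None):
        self.value = value
        self.children = []          # strong references: parent keeps children alive
        # weak back-pointer: does not keep the parent alive, so no cycle is formed
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        """Return the parent node, or None if it has been collected."""
        return self._parent() if self._parent is not None else None
```

When the last strong reference to the parent disappears, reference counting reclaims it immediately and `child.parent` starts returning `None`, with no cyclic-GC pass needed.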