[EMAIL PROTECTED] wrote:
> In an application dealing with very large text files, I need to create
> dictionaries indexed by tuples of words (bi-, tri-, n-grams) or nested
> dictionaries. The number of distinct data structures in memory grows
> to the order of 1E7 and beyond.
> 
> It turns out that the default behaviour of Python is not very suitable
> for such a situation, as garbage collection occasionally traverses all
> objects in memory (including all tuples) in order to find out which
> could be collected. This leads to the situation that creating N
> objects effectively takes O(N*N) time.

    Do your data structures actually need the cyclic garbage
collector?  CPython is a reference-counted system, with the cyclic
collector only as a backup to catch reference cycles.  Try using weak
back-pointers in your data structures so that no cycles are created
in the first place; acyclic structures are then reclaimed promptly
by reference counting alone.
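For instance, a minimal sketch of that idea (the Node class and its
attribute names are invented for illustration): a tree of nested
dictionaries whose parent links are weak references never forms a
reference cycle, so reference counting alone reclaims it:

    import weakref

    class Node(object):
        # Children are held by strong references; the back-pointer
        # to the parent is a weakref.ref, so the structure contains
        # no cycles and needs no help from the cyclic collector.
        def __init__(self, word, parent=None):
            self.word = word
            self.children = {}
            self._parent = weakref.ref(parent) if parent is not None else None

        @property
        def parent(self):
            # Dereference the weak link; returns None once the
            # parent has been collected.
            return self._parent() if self._parent is not None else None

    root = Node('')
    root.children['the'] = Node('the', parent=root)

With no cycles anywhere, gc.collect() has nothing left to find, and
the collector can even be disabled outright.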

                                John Nagle