On 9/25/2012 11:17 AM, Oscar Benjamin wrote:
On 25 September 2012 19:08, Junkshops <junksh...@gmail.com> wrote:


    In [38]: mpef._ustore._store
    Out[38]: defaultdict(<type 'dict'>, {'Measurement':
    {'8991c2dc67a49b909918477ee4efd767':
    <micropheno.exchangeformat.Exceptions.FileContext object at
    0x2f0fe90>, '7b38b429230f00fe4731e60419e92346':
    <micropheno.exchangeformat.Exceptions.FileContext object at
    0x2f0fad0>, 'b53531471b261c44d52f651add647544':
    <micropheno.exchangeformat.Exceptions.FileContext object at
    0x2f0f4d0>, '44ea6d949f7c8c8ac3bb4c0bf4943f82':
    <micropheno.exchangeformat.Exceptions.FileContext object at
    0x2f0f910>, '0de96f928dc471b297f8a305e71ae3e1':
    <micropheno.exchangeformat.Exceptions.FileContext object at
    0x2f0f550>}})


Have these exceptions been raised from somewhere before being stored? I wonder if you're inadvertently keeping execution frames alive. There are some problems in CPython with this that are related to storing exceptions.
FileContext objects aren't exceptions. They store information about where the stored object originally came from, so that if there's an MD5 or ID clash with a later line in the file, the code can report both the current line and the older clashing line to the user. I have an Exception subclass that takes a FileContext as an argument. No exceptions were thrown while processing the file that produced the heapy results earlier in the thread.
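To make the arrangement concrete, here is a minimal sketch of the pattern, not the actual micropheno code. The attribute names (filename, lineno), the ClashError class and the store() helper are all hypothetical; the only things taken from the description above are that a FileContext records where an object came from and that an Exception subclass takes one as an argument.

```python
class FileContext(object):
    """Hypothetical stand-in: remembers where a stored record came from."""

    def __init__(self, filename, lineno):
        self.filename = filename
        self.lineno = lineno


class ClashError(Exception):
    """Hypothetical Exception subclass carrying both clashing contexts."""

    def __init__(self, current, previous):
        message = "%s line %d clashes with %s line %d" % (
            current.filename, current.lineno,
            previous.filename, previous.lineno)
        Exception.__init__(self, message)
        self.current = current
        self.previous = previous


def store(seen, md5, context):
    """Record context under md5, or raise if that md5 was already seen."""
    if md5 in seen:
        # Only here is an exception object created; the stored
        # FileContext itself is plain data and holds no traceback.
        raise ClashError(context, seen[md5])
    seen[md5] = context
```

In this sketch nothing is raised unless a clash actually occurs, so a clean input file leaves only plain FileContext objects in the dict.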

In [43]: mpef._ustore._idstore['Measurement']._SIDstore
Out[43]: defaultdict(<function <lambda> at 0x2ece7d0>, {'emailRemoved': defaultdict(<function <lambda> at 0x2c4caa0>, {'microPhenoShew2011': defaultdict(<type 'dict'>, {0: {'MLR_124572462': '8991c2dc67a49b909918477ee4efd767', 'MLR_124572161': '7b38b429230f00fe4731e60419e92346', 'SMMLR_12551352': 'b53531471b261c44d52f651add647544', 'SMMLR_12551051': '0de96f928dc471b297f8a305e71ae3e1', 'SMMLR_12550750': '44ea6d949f7c8c8ac3bb4c0bf4943f82'}})})})
Also I think lambda functions might be able to keep the frame alive. Are they by any chance being created in a function that is called in a loop?

Here's the context for the lambdas:

    def __init__(self):
        self._SIDstore = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

So each lambda is only called when a new key is added to one of the top three levels of the data structure, which, in the test case I've been discussing, happens only once per level.
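A quick way to check that claim is to swap the lambdas for named factories that log each call. This is only a sketch: the factory names, the keys and the shortened IDs below are made up for illustration, and only the nesting shape matches _SIDstore.

```python
from collections import defaultdict

calls = []  # records each factory invocation, in order


def make_third_level():
    # Stands in for the inner lambda: defaultdict(dict)
    calls.append("third")
    return defaultdict(dict)


def make_second_level():
    # Stands in for the outer lambda: defaultdict(<inner lambda>)
    calls.append("second")
    return defaultdict(make_third_level)


sidstore = defaultdict(make_second_level)

# Three inserts that share the same keys at the upper levels.
for sid, md5 in [("MLR_1", "aa"), ("MLR_2", "bb"), ("MLR_3", "cc")]:
    sidstore["owner"]["project"][0][sid] = md5

print(calls)  # ['second', 'third']
```

Each factory runs once, on the first insert, and later inserts reuse the dictionaries that already exist. The innermost level is created by dict itself, which is not logged here.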

The suggestion to change the hex strings to ints is a good one and I'll do it. What I'm really trying to understand, though, is why the memory use reported by top (together with the fact that the code appears to thrash swap) is so much larger than both what heapy reports and my own calculations of how much memory the code should be using.
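For the hex-to-int change, a rough per-object comparison can be made with sys.getsizeof. This is a sketch assuming CPython; the exact byte counts vary by version and build, and it says nothing about the larger gap between top and heapy.

```python
import sys

# One of the MD5 digests from the output above, as a 32-character hex string.
md5_hex = "8991c2dc67a49b909918477ee4efd767"

# The same 128-bit value stored as an integer.
md5_int = int(md5_hex, 16)

hex_size = sys.getsizeof(md5_hex)
int_size = sys.getsizeof(md5_int)
print("hex string: %d bytes, int: %d bytes" % (hex_size, int_size))

# The conversion is lossless: zero-padded formatting restores the hex form.
assert format(md5_int, "032x") == md5_hex
```

The saving is per key, so it only matters when the number of stored digests is large.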

Cheers, MrsEntity
-- 
http://mail.python.org/mailman/listinfo/python-list
