On 9/25/2012 11:17 AM, Oscar Benjamin wrote:
On 25 September 2012 19:08, Junkshops <junksh...@gmail.com> wrote:
In [38]: mpef._ustore._store
Out[38]: defaultdict(<type 'dict'>, {'Measurement':
{'8991c2dc67a49b909918477ee4efd767':
<micropheno.exchangeformat.Exceptions.FileContext object at
0x2f0fe90>, '7b38b429230f00fe4731e60419e92346':
<micropheno.exchangeformat.Exceptions.FileContext object at
0x2f0fad0>, 'b53531471b261c44d52f651add647544':
<micropheno.exchangeformat.Exceptions.FileContext object at
0x2f0f4d0>, '44ea6d949f7c8c8ac3bb4c0bf4943f82':
<micropheno.exchangeformat.Exceptions.FileContext object at
0x2f0f910>, '0de96f928dc471b297f8a305e71ae3e1':
<micropheno.exchangeformat.Exceptions.FileContext object at
0x2f0f550>}})
Have these exceptions been raised from somewhere before being stored?
I wonder if you're inadvertently keeping execution frames alive. There
are some problems in CPython with this that are related to storing
exceptions.
FileContext objects aren't exceptions. They store information about
where the stored object originally came from, so if there's an MD5 or ID
clash with a later line in the file the code can report both the current
line and the older clashing line to the user. I have an Exception
subclass that takes a FileContext as an argument, but no exceptions
were thrown while processing the file that produced the heapy results
earlier in the thread.
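To make the idea concrete, here is a rough sketch of that arrangement. It is not the actual micropheno code: the attribute names (filename, line_number), the DuplicateEntryError class and the store_checked helper are all made up for illustration.

```python
class FileContext(object):
    """Records where a stored object came from (hypothetical fields)."""
    # Assumption: __slots__ is used only to keep instances small.
    __slots__ = ('filename', 'line_number')

    def __init__(self, filename, line_number):
        self.filename = filename
        self.line_number = line_number

    def __str__(self):
        return '%s, line %d' % (self.filename, self.line_number)


class DuplicateEntryError(Exception):
    """Hypothetical Exception subclass that takes a FileContext."""

    def __init__(self, message, context):
        Exception.__init__(self, message)
        self.context = context


def store_checked(store, key, context):
    """Store context under key, or report both lines if key was seen."""
    if key in store:
        message = 'MD5 clash: %s duplicates %s' % (context, store[key])
        raise DuplicateEntryError(message, context)
    store[key] = context
```

The FileContext is plain data; it only ends up attached to an exception when a clash is actually found.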
In [43]: mpef._ustore._idstore['Measurement']._SIDstore
Out[43]: defaultdict(<function <lambda> at 0x2ece7d0>,
{'emailRemoved': defaultdict(<function <lambda> at 0x2c4caa0>,
{'microPhenoShew2011': defaultdict(<type 'dict'>, {0:
{'MLR_124572462': '8991c2dc67a49b909918477ee4efd767',
'MLR_124572161': '7b38b429230f00fe4731e60419e92346',
'SMMLR_12551352': 'b53531471b261c44d52f651add647544',
'SMMLR_12551051': '0de96f928dc471b297f8a305e71ae3e1',
'SMMLR_12550750': '44ea6d949f7c8c8ac3bb4c0bf4943f82'}})})})
Also I think lambda functions might be able to keep the frame alive.
Are they by any chance being created in a function that is called in a
loop?
Here's the context for the lambdas:
    def __init__(self):
        self._SIDstore = defaultdict(lambda: defaultdict(lambda:
                                     defaultdict(dict)))
So the lambdas are only called when a new key is added to one of the
top 3 levels of the data structure, which, in the test case I've been
discussing, only happens once per level.
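A quick way to check how often the factories run is to count the calls. The sketch below rebuilds the same nested shape with named functions standing in for the two lambdas; the calls counter and the function names are illustrative only, and the keys are copied from the output above.

```python
from collections import defaultdict

calls = {'outer': 0, 'inner': 0}

def inner_factory():
    # Stands in for the inner lambda: builds the third level.
    calls['inner'] += 1
    return defaultdict(dict)

def outer_factory():
    # Stands in for the outer lambda: builds the second level.
    calls['outer'] += 1
    return defaultdict(inner_factory)

sidstore = defaultdict(outer_factory)

ids = ['MLR_124572462', 'MLR_124572161', 'SMMLR_12551352']
for i, sid in enumerate(ids):
    # Only the first assignment creates the intermediate levels;
    # the later ones find the keys already present.
    sidstore['emailRemoved']['microPhenoShew2011'][0][sid] = i

print(calls['outer'], calls['inner'])
```

Each factory runs only on a missing key, so storing several IDs under the same three outer keys calls each of them a single time.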
Although the suggestion to change the hex strings to ints is a good one
and I'll do it, what I'm really trying to understand is why there's such
a large difference between the memory use reported by top (along with
the fact that the code appears to thrash swap) and the much smaller
figures from heapy and from my own calculations of how much memory the
code should be using.
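For the hex-string-to-int change, a minimal sketch of the kind of back-of-the-envelope calculation involved, using sys.getsizeof. The shallow_total helper is hypothetical, and sys.getsizeof only counts the memory directly attributed to each object, so a total like this is a lower bound and would not be expected to match what top reports for the whole process.

```python
import sys

def shallow_total(mapping):
    """Sum sys.getsizeof over a dict and its immediate keys and values."""
    total = sys.getsizeof(mapping)
    for key, value in mapping.items():
        total += sys.getsizeof(key) + sys.getsizeof(value)
    return total

hex_keys = ['8991c2dc67a49b909918477ee4efd767',
            '7b38b429230f00fe4731e60419e92346',
            'b53531471b261c44d52f651add647544']

# Same keys stored as 32-character strings and as 128-bit integers.
by_str = dict((k, None) for k in hex_keys)
by_int = dict((int(k, 16), None) for k in hex_keys)

print('str keys: %d bytes' % shallow_total(by_str))
print('int keys: %d bytes' % shallow_total(by_int))
```

The exact byte counts depend on the Python version and platform, so they are best compared against each other rather than taken as absolute figures.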
Cheers, MrsEntity