Hello,

I am looking to improve the performance of the following piece of Python code:-

for cc in StatusContainer:
        for srv in StatusContainer[cc]:
                for id in StatusContainer[cc][srv]['RECV']:
                        if id in StageContainer['13']:
                                StatusContainer[cc][srv].setdefault('ACK', 
[]).append(id)
                        elif id in StageContainer['14']:
                                StatusContainer[cc][srv].setdefault('NACK', 
[]).append(id)
                        else:
                                if id in StageContainer['504'] or id in 
StageContainer['505']:
                                        
StatusContainer[cc][srv].setdefault('RETRY', []).append(id)

I am new to programming and I decided that I would use python as my first 
programming language.  I have written couple of (very basic but useful) scripts 
and now I am looking to optimise the performance of some of these scripts so I 
can make use of them.

The code above is an extract from a bigger script which I am using to count how 
many message IDs have been received, sent or rejected per service per country.  
The code is iterating through two dictionaries (a status dictionary which is a 
set of sub-dictionaries and a stages dictionary which again is a set of 
sub-dictionaries), if an entry (in this case an id) is found in the status 
dictionary, I am performing a search for that id in the stages dictionary.  
Depending where in the stages dictionary the id is found (which stage), then I 
am creating a new sub-dictionary inside the status dictionary and I append the 
id.

At the end, all I do is count how many IDs each sub-dictionary is holding (e.g: 
len(StatusContainer[cc][srv]['RECV'])).

My two dictionaries look like this:-

StatusContainer:-

defaultdict(<type 'set'>, {'FR': {'VDMS': {'RECV': 
['cad1c651-a61e-11e0-9f43-00238bbdc9eb', 
'df25a1d1-a61e-11e0-9f43-00238bbdc9eb', '1a9452c0-a61f-11e0-9f43-00238bbdc9eb', 
'1a9b3090-a61f-11e0-9f43-00238bbdc9eb']}}, 'PT': {'VDMS': {'RECV': 
['2410e980-a61f-11e0-cffc-00238bbdc9e7', 
'24146bf0-a61f-11e0-cffc-00238bbdc9e7', '29c5f0f0-a61f-11e0-cffc-00238bbdc9e7', 
'2ebe5f20-a61f-11e0-cffc-00238bbdc9e7']}}, 'CL': {'VDMS': {'RECV': 
['379cfda0-a61e-11e0-9bd6-00238bbdc9e7', 
'ff60f990-a61e-11e0-9bd6-00238bbdc9e7', '1a7e80d0-a61f-11e0-9bd6-00238bbdc9e7', 
'7ab64f00-a61f-11e0-9bd6-00238bbdc9e7']}}})

StageContainer:-

defaultdict(<type 'set'>, {'13': ['17897931-a61e-11e0-a764-00238bce423b', 
'd0fcb681-a61d-11e0-e693-00238bbdc9eb', '178819a1-a61e-11e0-a764-00238bce423b', 
'17161df1-a61e-11e0-b558-00238bbdc9e7', '15eb5991-a61e-11e0-8aeb-00238bbdc9b7', 
'1872eed0-a61e-11e0-de66-00238bce424c', 
'b3e470b1-a61d-11e0-b558-00238bbdc9e7'], '14': 
['1a98dc11-a61e-11e0-a9e7-00238bce423b', 
'da485231-a61d-11e0-ff8f-00238bce423b', '1c245e11-a61e-11e0-8aeb-00238bbdc9b7', 
'1eab5711-a61e-11e0-da5a-00238bce423b', '8db533c1-a61d-11e0-9393-00238bbdc9b7', 
'e0e066a1-a61d-11e0-b558-00238bbdc9e7']})

I ran this piece of code through the python profiler and I see that the code is 
taking on average around 1300 CPU secs to execute, depending on the size of the 
dicts.  The problem is that both dictionaries will always hold hundreds of 
thousands of entries, so I need to re-write this code to be more efficient or I 
need to find a different way to achieve my goal.

Someone in a different forum suggested that I use 'binary search' to iterate 
through the dictionaries but like I mentioned before, I am still very new to 
programming and 'binary search' sounds a bit to complicated to me.  I was 
wondering if someone here can have a look at my code above and maybe re-write 
it using 'binary search' or any other way you may see fit and re-post the code 
here with some comments on what each section is doing?

Your help would be very much appreciated.

Regards,

Junior
-- 
http://mail.python.org/mailman/listinfo/python-list

Reply via email to