Re: How to iterate through two dicts efficiently

Tim Chase Tue, 19 Jul 2011 04:36:05 -0700

On 07/19/2011 04:36 AM, J wrote:

Someone in a different forum suggested that I use 'binary
search' to iterate through the dictionaries

I'm not sure what they were smoking...a binary search is usefulfor finding a thing in a sorted list. It looks like your data isnot sorted (strike #1) and it looks like you're gatheringstatistics instead of trying to find a particular value (strike #2).

I am looking to improve the performance of the following piece of Python code:-

for cc in StatusContainer:
        for srv in StatusContainer[cc]:
                for id in StatusContainer[cc][srv]['RECV']:

I'd rewrite this to use tuple unpacking and the .iteritems()method of dicts (which might not have any performanceimprovements, but improves readability:


  for cc, services in StatusContainer.iteritems():
    for srv, data in services.iteritems():
      for id in data['RECV']:  # I'd also avoid shadowing "id"
        ...

                        if id in StageContainer['13']:
                                StatusContainer[cc][srv].setdefault('ACK', 
[]).append(id)
                        elif id in StageContainer['14']:
                                StatusContainer[cc][srv].setdefault('NACK', 
[]).append(id)

Rather than look these up each time, I'd hoist them out of theloops that don't change the values and give them semanticmeaning. You also seem to be using defaultdict() everywhereelse, so I'd set it up so that StatusContainer[cc][srv] is adefaultdict(list) as well to clean up that logic:


  acks = StageContainer['13']
  nacks = StageContainer['14']
  my_504s = StageContainer['504']
  my_505s = StageContainer['505']

  for cc, services in ...
    for ...
      container = StatusContainer[cc][srv]
      for ...
        if id in acks:
          container["ACK"].append(id)
        elif id in nacks:
          container["NACK"].append(id)
        else:
          if id in my_504s or id in my_505s:
            container["RETRY"].append(id)

If your codes can only be 13, 14, 504 or 505, I'd even skip thetest at the end:


  else:
    container["RETRY"].append(id)

but that would take knowing your data, something I can't confirmis safe without the full data-set.

That said, all those are minor improvements that will likely onlyminor speed-ups in your code.

My two dictionaries look like this:- >
StatusContainer:-

defaultdict(<type 'set'>, {'FR': {'VDMS': {'RECV': ...

I'm not sure why this is showing "<type 'set'>" when thesecond-level data is clearly a dict, not a set. Same here:

StageContainer:-

defaultdict(<type 'set'>, {'13': ['17897931-a61e-11e0-a764-00238bce423b', 
'd0fcb681-a61d-11e0-e693-00238bbdc9eb'

Additionally, I notice the things you're searching through inyour StageContainer are *lists*. Yet for memberships testing,you're using "in". This is likely where your slowdown is coming.With the above changes to add semantic meaning to your13/14/504/505s, just convert the results to a set:


  acks = set(StageContainer['13'])
  nacks = set(StageContainer['14'])
  retries = set(StageContent['504']) + set(StageContent['505'])

...

  else:
    if id in retries:  # instead of the "id in my504 or id in my505"

By changing this data-type from a list to a set, you change fromO(N) to O(1) for lookup time. For the non-CS geeks, that meansthat doing an "in" on a list is related to the length of the datain the list, while doing an "in" on a set is a constant timeregardless of the list-length. Assuming your data-dumps areaccurate and that these are lists, this will make a world ofdifference in the speed of your run.


-tkc



--
http://mail.python.org/mailman/listinfo/python-list

Re: How to iterate through two dicts efficiently

Reply via email to