On 07/19/2011 04:36 AM, J wrote:
Someone in a different forum suggested that I use 'binary
search' to iterate through the dictionaries

I'm not sure what they were smoking...a binary search is useful for finding a thing in a sorted list. It looks like your data is not sorted (strike #1) and it looks like you're gathering statistics instead of trying to find a particular value (strike #2).

I am looking to improve the performance of the following piece of Python code:-

for cc in StatusContainer:
        for srv in StatusContainer[cc]:
                for id in StatusContainer[cc][srv]['RECV']:

I'd rewrite this to use tuple unpacking and the .iteritems() method of dicts (which might not have any performance improvements, but improves readability:

  for cc, services in StatusContainer.iteritems():
    for srv, data in services.iteritems():
      for id in data['RECV']:  # I'd also avoid shadowing "id"
        ...

                        if id in StageContainer['13']:
                                StatusContainer[cc][srv].setdefault('ACK', 
[]).append(id)
                        elif id in StageContainer['14']:
                                StatusContainer[cc][srv].setdefault('NACK', 
[]).append(id)

Rather than look these up each time, I'd hoist them out of the loops that don't change the values and give them semantic meaning. You also seem to be using defaultdict() everywhere else, so I'd set it up so that StatusContainer[cc][srv] is a defaultdict(list) as well to clean up that logic:

  acks = StageContainer['13']
  nacks = StageContainer['14']
  my_504s = StageContainer['504']
  my_505s = StageContainer['505']

  for cc, services in ...
    for ...
      container = StatusContainer[cc][srv]
      for ...
        if id in acks:
          container["ACK"].append(id)
        elif id in nacks:
          container["NACK"].append(id)
        else:
          if id in my_504s or id in my_505s:
            container["RETRY"].append(id)

If your codes can only be 13, 14, 504 or 505, I'd even skip the test at the end:

  else:
    container["RETRY"].append(id)

but that would take knowing your data, something I can't confirm is safe without the full data-set.

That said, all those are minor improvements that will likely only minor speed-ups in your code.


My two dictionaries look like this:- >
StatusContainer:-

defaultdict(<type 'set'>, {'FR': {'VDMS': {'RECV': ...

I'm not sure why this is showing "<type 'set'>" when the second-level data is clearly a dict, not a set. Same here:

StageContainer:-

defaultdict(<type 'set'>, {'13': ['17897931-a61e-11e0-a764-00238bce423b', 
'd0fcb681-a61d-11e0-e693-00238bbdc9eb'

Additionally, I notice the things you're searching through in your StageContainer are *lists*. Yet for memberships testing, you're using "in". This is likely where your slowdown is coming. With the above changes to add semantic meaning to your 13/14/504/505s, just convert the results to a set:

  acks = set(StageContainer['13'])
  nacks = set(StageContainer['14'])
  retries = set(StageContent['504']) + set(StageContent['505'])

...

  else:
    if id in retries:  # instead of the "id in my504 or id in my505"

By changing this data-type from a list to a set, you change from O(N) to O(1) for lookup time. For the non-CS geeks, that means that doing an "in" on a list is related to the length of the data in the list, while doing an "in" on a set is a constant time regardless of the list-length. Assuming your data-dumps are accurate and that these are lists, this will make a world of difference in the speed of your run.

-tkc



--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to