On 07/19/2011 04:36 AM, J wrote:
Someone in a different forum suggested that I use 'binary
search' to iterate through the dictionaries
I'm not sure what they were smoking...a binary search is useful
for finding a thing in a sorted list. It looks like your data is
not sorted (strike #1) and it looks like you're gathering
statistics instead of trying to find a particular value (strike #2).
I am looking to improve the performance of the following piece of Python code:-
for cc in StatusContainer:
for srv in StatusContainer[cc]:
for id in StatusContainer[cc][srv]['RECV']:
I'd rewrite this to use tuple unpacking and the .iteritems()
method of dicts (which might not have any performance
improvements, but improves readability:
for cc, services in StatusContainer.iteritems():
for srv, data in services.iteritems():
for id in data['RECV']: # I'd also avoid shadowing "id"
...
if id in StageContainer['13']:
StatusContainer[cc][srv].setdefault('ACK',
[]).append(id)
elif id in StageContainer['14']:
StatusContainer[cc][srv].setdefault('NACK',
[]).append(id)
Rather than look these up each time, I'd hoist them out of the
loops that don't change the values and give them semantic
meaning. You also seem to be using defaultdict() everywhere
else, so I'd set it up so that StatusContainer[cc][srv] is a
defaultdict(list) as well to clean up that logic:
acks = StageContainer['13']
nacks = StageContainer['14']
my_504s = StageContainer['504']
my_505s = StageContainer['505']
for cc, services in ...
for ...
container = StatusContainer[cc][srv]
for ...
if id in acks:
container["ACK"].append(id)
elif id in nacks:
container["NACK"].append(id)
else:
if id in my_504s or id in my_505s:
container["RETRY"].append(id)
If your codes can only be 13, 14, 504 or 505, I'd even skip the
test at the end:
else:
container["RETRY"].append(id)
but that would take knowing your data, something I can't confirm
is safe without the full data-set.
That said, all those are minor improvements that will likely only
minor speed-ups in your code.
My two dictionaries look like this:- >
StatusContainer:-
defaultdict(<type 'set'>, {'FR': {'VDMS': {'RECV': ...
I'm not sure why this is showing "<type 'set'>" when the
second-level data is clearly a dict, not a set. Same here:
StageContainer:-
defaultdict(<type 'set'>, {'13': ['17897931-a61e-11e0-a764-00238bce423b',
'd0fcb681-a61d-11e0-e693-00238bbdc9eb'
Additionally, I notice the things you're searching through in
your StageContainer are *lists*. Yet for memberships testing,
you're using "in". This is likely where your slowdown is coming.
With the above changes to add semantic meaning to your
13/14/504/505s, just convert the results to a set:
acks = set(StageContainer['13'])
nacks = set(StageContainer['14'])
retries = set(StageContent['504']) + set(StageContent['505'])
...
else:
if id in retries: # instead of the "id in my504 or id in my505"
By changing this data-type from a list to a set, you change from
O(N) to O(1) for lookup time. For the non-CS geeks, that means
that doing an "in" on a list is related to the length of the data
in the list, while doing an "in" on a set is a constant time
regardless of the list-length. Assuming your data-dumps are
accurate and that these are lists, this will make a world of
difference in the speed of your run.
-tkc
--
http://mail.python.org/mailman/listinfo/python-list