Jay Tee wrote:
> Hi,
> 
> I have some code that does, essentially, the following:
> 
> - gather information on tens of thousands of items (in this case, jobs
>   running on a compute cluster)
> - store the information as a list (one per job) of Job items
>   (essentially wrapped dictionaries mapping attribute names to values)
> 
> and then does some computations on the data.  One of the things the
> code needs to do, very often, is troll through the list and find jobs
> of a certain class:
> 
> for j in jobs:
>    if (j.get('user') == 'jeff' and j.get('state')=='running') :
>       do_something()
> 
> This operation is ultimately the limiting factor in the performance.
> What I would like to try, if it is possible, is instead do something
> like this:
> 
>    if j.subset_attr({'user' : 'jeff', 'state' : 'running'}) :
>       do_something()
> 
> 
> where subset_attr would see if the dict passed in was a subset of the
> underlying attribute dict of j:

This would still need to run over all items in jobs, so there is no
performance gain.
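
For illustration, a minimal sketch of what such a subset_attr could look
like, assuming a hypothetical Job class that wraps a plain dict (the class
layout is my guess, not your actual code). The per-job test gets tidier,
but the loop still visits every job:

```python
class Job:
    """Hypothetical stand-in for the wrapped attribute dictionary."""

    def __init__(self, attrs):
        self._attrs = dict(attrs)

    def get(self, key, default=None):
        return self._attrs.get(key, default)

    def subset_attr(self, wanted):
        # True if every key/value pair in `wanted` also appears in this job.
        # The sentinel keeps a stored None from being mistaken for "absent".
        missing = object()
        return all(self._attrs.get(k, missing) == v for k, v in wanted.items())


jobs = [
    Job({'user': 'jeff', 'start': 43, 'queue': 'qlong', 'state': 'running'}),
    Job({'user': 'jeff', 'start': 57, 'queue': 'qlong', 'state': 'queued'}),
]

# Still a full scan: one subset_attr call per job.
matches = [j for j in jobs if j.subset_attr({'user': 'jeff', 'state': 'running'})]
```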

> 
>   j1's dict : { 'user' : 'jeff', 'start' : 43, 'queue' : 'qlong',
>                 'state' : 'running' }
>   j2's dict : { 'user' : 'jeff', 'start' : 57, 'queue' : 'qlong',
>                 'state' : 'queued' }
> 
> so in the second snippet, if j was j1 then subset_attr would return
> true, for j2 the answer would be false (because of the 'state' value
> not being the same).

If your job dictionaries are effectively immutable with respect to their
key set (in how you use them, not by implementation), you can improve
performance by creating an index. Take a predicate like

def p(j):
    return j.get('user') == 'jeff'

and build a list

jeffs_jobs = [j for j in jobs if p(j)]

Then you only need to test these. Alternatively, if you have quite a 
few such predicate/action pairs, loop once over all jobs, applying each 
predicate and its action as you go.
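
To make both ideas concrete, here is a rough sketch. It uses plain dicts
in place of your Job objects, and the names build_index and run_rules as
well as the sample data are made up for the example. The first part
generalises the jeffs_jobs list into a dictionary keyed by (user, state),
so a lookup is one hash access as long as those attributes do not change
after the index is built. The second part does the single pass with
predicate/action pairs:

```python
from collections import defaultdict


def build_index(jobs, keys=('user', 'state')):
    # Group jobs by the values of the chosen attributes (one pass over jobs).
    index = defaultdict(list)
    for j in jobs:
        index[tuple(j.get(k) for k in keys)].append(j)
    return index


def run_rules(jobs, rules):
    # Single pass: apply every (predicate, action) pair to each job.
    for j in jobs:
        for predicate, action in rules:
            if predicate(j):
                action(j)


# Sample data; the third job is invented for the example.
jobs = [
    {'user': 'jeff', 'start': 43, 'queue': 'qlong', 'state': 'running'},
    {'user': 'jeff', 'start': 57, 'queue': 'qlong', 'state': 'queued'},
    {'user': 'anna', 'start': 12, 'queue': 'qshort', 'state': 'running'},
]

index = build_index(jobs)
jeffs_running = index.get(('jeff', 'running'), [])  # no scan over all jobs

running, queued = [], []
rules = [
    (lambda j: j.get('state') == 'running', running.append),
    (lambda j: j.get('state') == 'queued', queued.append),
]
run_rules(jobs, rules)
```

The index has to be rebuilt or updated whenever a job's user or state
changes, so it pays off only if you query much more often than you update.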

Diez
-- 
http://mail.python.org/mailman/listinfo/python-list
