Jay Tee wrote:
> Hi,
>
> I have some code that does, essentially, the following:
>
> - gather information on tens of thousands of items (in this case, jobs
>   running on a compute cluster)
> - store the information as a list (one per job) of Job items
>   (essentially wrapped dictionaries mapping attribute names to values)
>
> and then does some computations on the data. One of the things the
> code needs to do, very often, is troll through the list and find jobs
> of a certain class:
>
>     for j in jobs:
>         if (j.get('user') == 'jeff' and j.get('state')=='running') :
>             do_something()
>
> This operation is ultimately the limiting factor in the performance.
> What I would like to try, if it is possible, is instead do something
> like this:
>
>     if j.subset_attr({'user' : 'jeff', 'state' : 'running'}) :
>         do_something()
>
> where subset_attr would see if the dict passed in was a subset of the
> underlying attribute dict of j:

This would still need to run over all items in jobs. No gain.

> j1's dict : { 'user' : 'jeff', 'start' : 43, 'queue' : 'qlong',
>               'state' : 'running' }
> j2's dict : { 'user' : 'jeff', 'start' : 57, 'queue' : 'qlong',
>               'state' : 'queued' }
>
> so in the second snippet, if j was j1 then subset_attr would return
> true, for j2 the answer would be false (because of the 'state' value
> not being the same).

If your jobs' dictionaries are immutable regarding the key set (not by
their implementation, but by how they are used), the thing you can do to
enhance performance is to create an index. Take a predicate like

    def p(j):
        return j.get('user') == 'jeff'

and build a list

    jeffs_jobs = [j for j in jobs if p(j)]

Then you only need to test these.

Alternatively, if you have quite a few such predicate/action pairs, loop
once over all jobs, applying the predicates and actions accordingly.

Diez
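
The index idea above can be taken one step further by grouping the jobs by the attribute values you query on, so that a lookup for ('jeff', 'running') no longer touches the other jobs. What follows is only a rough sketch under assumptions: the Job class is a hypothetical stand-in for the wrapped dictionary described in the question, and build_index is an illustrative helper name, not something from the original code.

```python
from collections import defaultdict


class Job:
    """Hypothetical stand-in for the Job wrapper around an attribute dict."""

    def __init__(self, attrs):
        self._attrs = dict(attrs)

    def get(self, key, default=None):
        return self._attrs.get(key, default)


def build_index(jobs, keys):
    """Group jobs by the tuple of values they hold for `keys` (one pass)."""
    index = defaultdict(list)
    for j in jobs:
        index[tuple(j.get(k) for k in keys)].append(j)
    return index


# The two example jobs from the question.
jobs = [
    Job({'user': 'jeff', 'start': 43, 'queue': 'qlong', 'state': 'running'}),
    Job({'user': 'jeff', 'start': 57, 'queue': 'qlong', 'state': 'queued'}),
]

# Build once, then look up as often as needed.
by_user_state = build_index(jobs, ('user', 'state'))

# .get() avoids creating empty entries for combinations that do not exist.
jeffs_running = by_user_state.get(('jeff', 'running'), [])
```

Building the index still costs one pass over all jobs, so it only pays off when the same attributes are queried many times. It also stays valid only as long as the indexed attributes do not change; if a job's state changes, the index has to be rebuilt or updated.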