Nico Schlömer wrote:

> Hi,
>
> I ran into a bit of an unexpected issue here with itertools, and I
> need to say that I discovered itertools only recently, so maybe my way
> of approaching the problem is "not what I want to do".
>
> Anyway, the problem is the following:
> I have a list of dictionaries, something like
>
> [ { "a": 1, "b": 1, "c": 3 },
>   { "a": 1, "b": 1, "c": 4 },
>   ...
> ]
>
> and I'd like to iterate through all items with, e.g., "a":1. What I do
> is sort and then groupby,
>
> my_list.sort( key=operator.itemgetter('a') )
> my_list_grouped = itertools.groupby( my_list, operator.itemgetter('a') )
>
> and then just very simply iterate over my_list_grouped,
>
> for my_item in my_list_grouped:
>     # do something with my_item[0], my_item[1]
>
> Now, inside this loop I'd like to again iterate over all items with
> the same 'b'-value -- no problem, just do the above inside the loop:
>
> for my_item in my_list_grouped:
>     # group by keyword "b"
>     my_list2 = list( my_item[1] )
>     my_list2.sort( key=operator.itemgetter('b') )
>     my_list_grouped = itertools.groupby( my_list2,
>                                          operator.itemgetter('b') )
>     for e in my_list_grouped:
>         # do something with e[0], e[1]
>
> That seems to work all right.
>
> Now, the problem occurs when this all is wrapped into an outer loop, such
> as
>
> for k in [ 'first pass', 'second pass' ]:
>     for my_item in my_list_grouped:
>         # bla, the above
>
> To be able to iterate more than once through my_list_grouped, I have
> to convert it into a list first, so outside all loops, I go like
>
> my_list.sort( key=operator.itemgetter('a') )
> my_list_grouped = itertools.groupby( my_list, operator.itemgetter('a') )
> my_list_grouped = list( my_list_grouped )
>
> This, however, makes it impossible to do the inner sort and
> groupby-operation; you just get the very first element, and that's it.
>
> An example file is attached.
>
> Hints, anyone?
If you want a reusable copy of a groupby(...) it is not enough to convert it
to a list as a whole:

>>> from itertools import groupby
>>> from operator import itemgetter
>>> items = [(1,1), (1,2), (1,3), (2,1), (2,2)]
>>> grouped_items = list(groupby(items, key=itemgetter(0))) # WRONG
>>> for run in 1, 2:
...     print "run", run
...     for k, g in grouped_items:
...         print k, list(g)
...
run 1
1 []
2 [(2, 2)]
run 2
1 []
2 []

Each group g is a lazy iterator that shares the underlying iterable with
groupby(), so it is no longer usable once groupby() has advanced to the next
group. Instead, you have to process the groups, too:

>>> grouped_items = [(k, list(g)) for k, g in groupby(items, key=itemgetter(0))]
>>> for run in 1, 2:
...     print "run", run
...     for k, g in grouped_items:
...         print k, list(g)
...
run 1
1 [(1, 1), (1, 2), (1, 3)]
2 [(2, 1), (2, 2)]
run 2
1 [(1, 1), (1, 2), (1, 3)]
2 [(2, 1), (2, 2)]

But usually you don't bother and just run groupby() twice:

>>> for run in 1, 2:
...     print "run", run
...     for k, g in groupby(items, key=itemgetter(0)):
...         print k, list(g)
...
run 1
1 [(1, 1), (1, 2), (1, 3)]
2 [(2, 1), (2, 2)]
run 2
1 [(1, 1), (1, 2), (1, 3)]
2 [(2, 1), (2, 2)]

The only caveat then is that list(items) == list(items) must hold, i.e. items
must be a reusable sequence such as a list, not a one-shot iterator that is
empty on the second pass.

Peter
-- 
http://mail.python.org/mailman/listinfo/python-list
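
Applied to the list of dictionaries from the question, the "process the
groups, too" approach might look like the sketch below. The helper name
group_nested and the sample records are made up for illustration; they are
not part of itertools or of the attached example file. The print call is
written so that it should run on both Python 2 and Python 3.

```python
from itertools import groupby
from operator import itemgetter


def group_nested(records, outer_key, inner_key):
    """Return [(outer, [(inner, [record, ...]), ...]), ...] as plain lists.

    Hypothetical helper for illustration; not part of itertools.
    """
    by_outer = itemgetter(outer_key)
    by_inner = itemgetter(inner_key)
    result = []
    # groupby() only merges adjacent items, so sort by the same key first.
    for outer, outer_group in groupby(sorted(records, key=by_outer), key=by_outer):
        # Consume the group right away, before groupby() moves on.
        inner_sorted = sorted(outer_group, key=by_inner)
        inner_groups = [(inner, list(g))
                        for inner, g in groupby(inner_sorted, key=by_inner)]
        result.append((outer, inner_groups))
    return result


# Made-up sample data in the shape described in the question.
my_list = [
    {"a": 1, "b": 1, "c": 3},
    {"a": 1, "b": 1, "c": 4},
    {"a": 2, "b": 1, "c": 5},
    {"a": 1, "b": 2, "c": 6},
]

nested = group_nested(my_list, "a", "b")

# The nested structure holds only lists, so it survives any number of passes.
for k in ["first pass", "second pass"]:
    for a_value, b_groups in nested:
        for b_value, records in b_groups:
            print("%s: a=%r b=%r c=%r"
                  % (k, a_value, b_value, [r["c"] for r in records]))
```

Because every group is turned into a list while groupby() is still positioned
on it, nothing in the result depends on the state of the original iterators.
The cost is that all groups are held in memory at once, which is the trade-off
against simply re-running groupby() on each pass.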