Re: itertools: problem with nested groupby, list()
Nico Schlömer wrote:
> I ran into a bit of an unexpected issue here with itertools, and I need to say that I discovered itertools only recently, so maybe my way of approaching the problem is not what I want to do. Anyway, the problem is the following: I have a list of dictionaries, something like
>
>     [ { 'a': 1, 'b': 1, 'c': 3 }, { 'a': 1, 'b': 1, 'c': 4 }, ... ]
>
> and I'd like to iterate through all items with, e.g., a:1. What I do is sort and then groupby,
>
>     my_list.sort( key=operator.itemgetter('a') )
>     my_list_grouped = itertools.groupby( my_list, operator.itemgetter('a') )
>
> and then just very simply iterate over my_list_grouped,
>
>     for my_item in my_list_grouped:
>         # do something with my_item[0], my_item[1]

I'd try to avoid copying the list and instead just iterate over it:

    def iterate_by_key(l, key):
        for d in l:
            try:
                yield d[key]
            except KeyError:
                continue

Note that you could also ask the dictionary first if it has the key, but I'm told this way is even faster since it only requires a single lookup attempt.

> Now, inside this loop I'd like to again iterate over all items with the same 'b'-value -- no problem, just do the above inside the loop:
>
>     for my_item in my_list_grouped:
>         # group by keyword b
>         my_list2 = list( my_item[1] )
>         my_list2.sort( key=operator.itemgetter('b') )
>         my_list_grouped = itertools.groupby( my_list2, operator.itemgetter('b') )
>         for e in my_list_grouped:
>             # do something with e[0], e[1]
>
> That seems to work all right.

Since your operation not only iterates over a list but first sorts it, it requires a modification which must not happen while iterating. You work around this by copying the list first.

> Now, the problem occurs when this all is wrapped into an outer loop, such as
>
>     for k in [ 'first pass', 'second pass' ]:
>         for my_item in my_list_grouped:
>             # bla, the above
>
> To be able to iterate more than once through my_list_grouped, I have to convert it into a list first, so outside all loops, I go like
>
>     my_list.sort( key=operator.itemgetter('a') )
>     my_list_grouped = itertools.groupby( my_list, operator.itemgetter('a') )
>     my_list_grouped = list( my_list_grouped )
>
> This, however, makes it impossible to do the inner sort and groupby-operation; you just get the very first element, and that's it.

I believe that you are doing a modifying operation inside the iteration, which is a no-no. Create a custom iterator function (IIRC they are called generators) and you should be fine. Note that this should also perform better, since copying and sorting are not free, though you may not notice that with small numbers of objects.

Uli

--
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
--
http://mail.python.org/mailman/listinfo/python-list
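For readers following along, here is a minimal sketch of the generator idea applied to the stated goal of visiting all items with a:1. It is only an illustration: the name iter_matching and the sample data are made up here, and the syntax is Python 3 rather than the Python 2 used in the thread.

```python
def iter_matching(dicts, key, value):
    """Yield each dict whose entry for `key` equals `value`, in list order.

    Hypothetical helper: the list is neither sorted nor copied.
    """
    for d in dicts:
        try:
            if d[key] == value:
                yield d
        except KeyError:
            # This dict has no such key; skip it.
            continue


my_list = [
    {'a': 1, 'b': 1, 'c': 3},
    {'a': 2, 'b': 1, 'c': 5},
    {'a': 1, 'b': 1, 'c': 4},
    {'b': 7},  # no 'a' key at all
]

matches = list(iter_matching(my_list, 'a', 1))
print([d['c'] for d in matches])
```

Because the generator builds a fresh iterator on every call, it can be called once per pass without converting anything to a list first. The trade-off is one full scan of the list per key value.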
Re: itertools: problem with nested groupby, list()
On 4 May, 11:10, Nico Schlömer nico.schloe...@gmail.com wrote:
> This, however, makes it impossible to do the inner sort and groupby-operation; you just get the very first element, and that's it. An example file is attached. Hints, anyone?
>
> Cheers,
> Nico

Does this example help at all?

    from itertools import groupby
    from operator import itemgetter

    my_list.sort( key=itemgetter('a','b','c') )
    for a, a_iter in groupby(my_list, itemgetter('a')):
        print 'New A', a
        for b, b_iter in groupby(a_iter, itemgetter('b')):
            print '\t', 'New B', b
            for c, c_iter in groupby(b_iter, itemgetter('c')):
                print '\t'*2, 'New C', c
                for c_data in c_iter:
                    print '\t'*3, a, b, c, c_data
                print '\t'*2, 'End C', c
            print '\t', 'End B', b
        print 'End A', a

Jon.
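The nested pattern above can be tried out with a small self-contained sketch. The sample data and the nested dictionary are assumptions added for illustration, and the syntax is Python 3. The point is that a single sort on all keys is enough for every nesting level.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data in the shape described in the thread.
my_list = [
    {'a': 2, 'b': 1, 'c': 5},
    {'a': 1, 'b': 2, 'c': 6},
    {'a': 1, 'b': 1, 'c': 4},
    {'a': 1, 'b': 1, 'c': 3},
]

# One sort on ('a', 'b', 'c') makes equal 'a' values adjacent and,
# within each of those runs, equal 'b' values adjacent as well.
my_list.sort(key=itemgetter('a', 'b', 'c'))

nested = {}
for a, a_iter in groupby(my_list, key=itemgetter('a')):
    nested[a] = {}
    # The inner groupby reads from a_iter, which covers only this 'a' run.
    for b, b_iter in groupby(a_iter, key=itemgetter('b')):
        nested[a][b] = [d['c'] for d in b_iter]

print(nested)
```

Each inner group is consumed before the enclosing groupby moves on, which is what keeps the lazy iterators valid here.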
Re: itertools: problem with nested groupby, list()
> Does this example help at all?

Thanks, that clarified things a lot! To make it easier, let's just look at 'a' and 'b':

    my_list.sort( key=itemgetter('a','b','c') )
    for a, a_iter in groupby(my_list, itemgetter('a')):
        print 'New A', a
        for b, b_iter in groupby(a_iter, itemgetter('b')):
            print '\t', 'New B', b
            for b_data in b_iter:
                print '\t'*3, a, b, b_data
            print '\t', 'End B', b
        print 'End A', a

That works well, and I can wrap the outer loop in another loop without problems. What's *not* working, though, is having more than one pass on the inner loop, as in

=== *snip* ===
    my_list.sort( key=itemgetter('a','b','c') )
    for a, a_iter in groupby(my_list, itemgetter('a')):
        print 'New A', a
        for pass_ in ['first pass', 'second pass']:
            for b, b_iter in groupby(a_iter, itemgetter('b')):
                print '\t', 'New B', b
                for b_data in b_iter:
                    print '\t'*3, a, b, b_data
                print '\t', 'End B', b
        print 'End A', a
=== *snap* ===

I tried working around this by

=== *snip* ===
    my_list.sort( key=itemgetter('a','b','c') )
    for a, a_iter in groupby(my_list, itemgetter('a')):
        print 'New A', a
        inner_list = list( groupby(a_iter, itemgetter('b')) )
        for pass_ in ['first pass', 'second pass']:
            for b, b_iter in inner_list:
                print '\t', 'New B', b
                for b_data in b_iter:
                    print '\t'*3, a, b, b_data
                print '\t', 'End B', b
        print 'End A', a
=== *snap* ===

which doesn't work either, and I don't understand why. -- I'll look at Uli's comments.

Cheers,
Nico
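A small sketch of what seems to go wrong in the first failing variant: a_iter is a one-shot iterator, so the first pass drains it and the second pass finds nothing to group. The sample data and the seen list are assumptions made up for illustration, and the syntax is Python 3.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data; all items share a == 1.
my_list = [
    {'a': 1, 'b': 1, 'c': 3},
    {'a': 1, 'b': 1, 'c': 4},
    {'a': 1, 'b': 2, 'c': 6},
]
my_list.sort(key=itemgetter('a', 'b', 'c'))

seen = []
for a, a_iter in groupby(my_list, key=itemgetter('a')):
    for pass_ in ('first pass', 'second pass'):
        # Both passes read from the same a_iter; whatever the first
        # pass consumes is gone for the second one.
        b_keys = [b for b, b_iter in groupby(a_iter, key=itemgetter('b'))]
        seen.append((pass_, b_keys))

print(seen)
```

The first pass sees both 'b' keys and the second pass sees none. Copying the group's items into a list before the passes start avoids this.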
Re: itertools: problem with nested groupby, list()
> I'd try to avoid copying the list and instead just iterate over it:
>
>     def iterate_by_key(l, key):
>         for d in l:
>             try:
>                 yield d[key]
>             except KeyError:
>                 continue

Hm, that won't work for me because I don't know all the keys beforehand. I could certainly do a unique(list.keys()) or something like that beforehand, but I guess this does away with the speed advantage.

> Since your operation not only iterates over a list but first sorts it, it requires a modification which must not happen while iterating. You work around this by copying the list first.

So when I go like

    for item in list:
        item[1].sort()

I actually modify *list*? I didn't realize that; I thought it'd just be a copy of it. Anyway, I could just try

    for item in list:
        newitem = sorted( item[1] )

in that case.

> which is a no-no. Create a custom iterator function (IIRC they are called generators) and you should be fine.

I'll look into this, thanks for the hint.

Cheers,
Nico
Re: itertools: problem with nested groupby, list()
On 4 May, 12:36, Nico Schlömer nico.schloe...@gmail.com wrote:
> I tried working around this by
>
>     my_list.sort( key=itemgetter('a','b','c') )
>     for a, a_iter in groupby(my_list, itemgetter('a')):
>         print 'New A', a
>         inner_list = list( groupby(a_iter, itemgetter('b')) )
>         for pass_ in ['first pass', 'second pass']:
>             for b, b_iter in inner_list:
>                 print '\t', 'New B', b
>                 for b_data in b_iter:
>                     print '\t'*3, a, b, b_data
>                 print '\t', 'End B', b
>         print 'End A', a
>
> which doesn't work either, and I don't understand why.

Are you basically after this, then?

    for a, a_iter in groupby(my_list, itemgetter('a')):
        print 'New A', a
        for b, b_iter in groupby(a_iter, itemgetter('b')):
            b_list = list(b_iter)
            for p in ['first', 'second']:
                for b_data in b_list:
                    #whatever...

Cos that looks like it could be simplified to (untested)

    for (a, b), data_iter in groupby(my_list, itemgetter('a','b')):
        data = list(data_iter) # take copy
        for pass_ in ['first', 'second']:
            # do something with data
Re: itertools: problem with nested groupby, list()
Nico Schlömer wrote:
> So when I go like
>
>     for item in list:
>         item[1].sort()
>
> I actually modify *list*? I didn't realize that; I thought it'd just be a copy of it.

No, I misunderstood your code there. Modifying the objects inside the list is fine, but I don't think you do that, provided the items in the list don't contain references to the list itself.

Good luck!

Uli
Re: itertools: problem with nested groupby, list()
> Are you basically after this, then?
>
>     for a, a_iter in groupby(my_list, itemgetter('a')):
>         print 'New A', a
>         for b, b_iter in groupby(a_iter, itemgetter('b')):
>             b_list = list(b_iter)
>             for p in ['first', 'second']:
>                 for b_data in b_list:
>                     #whatever...

Yes. Moving the 'first', 'second' operation to the innermost loop works all right, and I guess that's what I'll do.

> Cos that looks like it could be simplified to (untested)
>
>     for (a, b), data_iter in groupby(my_list, itemgetter('a','b')):
>         data = list(data_iter) # take copy
>         for pass_ in ['first', 'second']:
>             # do something with data

Potentially yes, but for now I actually need to do something at the "print 'New A', a" line, so I can't just skip this. Anyway, the above suggestion works well for now.

Thanks!
--Nico
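The simplified compound-key version was posted untested, so here is a minimal runnable sketch of it. The sample data and the log list are assumptions added for illustration, and the syntax is Python 3. Each group is copied into a list before the passes start, because a group iterator can only be walked once.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data in the shape described in the thread.
my_list = [
    {'a': 2, 'b': 1, 'c': 5},
    {'a': 1, 'b': 2, 'c': 6},
    {'a': 1, 'b': 1, 'c': 4},
    {'a': 1, 'b': 1, 'c': 3},
]
my_list.sort(key=itemgetter('a', 'b', 'c'))

log = []
# itemgetter('a', 'b') returns a tuple, so each group key unpacks to (a, b).
for (a, b), data_iter in groupby(my_list, key=itemgetter('a', 'b')):
    data = list(data_iter)  # take a copy so both passes see every item
    for pass_ in ('first', 'second'):
        log.append((a, b, pass_, [d['c'] for d in data]))

for entry in log:
    print(entry)
```

This flattens the two grouping levels into one, so there is no separate place to act when only 'a' changes. That is the limitation raised in the reply above.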
Re: itertools: problem with nested groupby, list()
Nico Schlömer wrote:
> To be able to iterate more than once through my_list_grouped, I have to convert it into a list first, so outside all loops, I go like
>
>     my_list.sort( key=operator.itemgetter('a') )
>     my_list_grouped = itertools.groupby( my_list, operator.itemgetter('a') )
>     my_list_grouped = list( my_list_grouped )
>
> This, however, makes it impossible to do the inner sort and groupby-operation; you just get the very first element, and that's it. An example file is attached. Hints, anyone?

If you want a reusable copy of a groupby(...) it is not enough to convert it to a list as a whole:

    >>> from itertools import groupby
    >>> from operator import itemgetter
    >>> items = [(1,1), (1,2), (1,3), (2,1), (2,2)]
    >>> grouped_items = list(groupby(items, key=itemgetter(0))) # WRONG
    >>> for run in 1, 2:
    ...     print "run", run
    ...     for k, g in grouped_items:
    ...         print k, list(g)
    ...
    run 1
    1 []
    2 [(2, 2)]
    run 2
    1 []
    2 []

Instead, you have to process the groups, too:

    >>> grouped_items = [(k, list(g)) for k, g in groupby(items, key=itemgetter(0))]
    >>> for run in 1, 2:
    ...     print "run", run
    ...     for k, g in grouped_items:
    ...         print k, list(g)
    ...
    run 1
    1 [(1, 1), (1, 2), (1, 3)]
    2 [(2, 1), (2, 2)]
    run 2
    1 [(1, 1), (1, 2), (1, 3)]
    2 [(2, 1), (2, 2)]

But usually you don't bother and just run groupby() twice:

    >>> for run in 1, 2:
    ...     print "run", run
    ...     for k, g in groupby(items, key=itemgetter(0)):
    ...         print k, list(g)
    ...
    run 1
    1 [(1, 1), (1, 2), (1, 3)]
    2 [(2, 1), (2, 2)]
    run 2
    1 [(1, 1), (1, 2), (1, 3)]
    2 [(2, 1), (2, 2)]

The only caveat then is that list(items) == list(items) must hold.

Peter
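The same comparison as a script, as a rough sketch in Python 3 syntax. The variable names lazy, lazy_result and eager are made up here. What is left in the last stored group appears to differ between Python versions (the Python 2 session above still shows one element), so the sketch only inspects the first group.

```python
from itertools import groupby
from operator import itemgetter

items = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]

# WRONG: list() runs the outer groupby to the end before any group is
# read, so the stored iterator for the first key has nothing left.
lazy = list(groupby(items, key=itemgetter(0)))
lazy_result = [(k, list(g)) for k, g in lazy]

# RIGHT: turn each group into a list while the outer groupby is still
# positioned on it; the result is plain data and can be reused freely.
eager = [(k, list(g)) for k, g in groupby(items, key=itemgetter(0))]

for run in (1, 2):
    print("run", run)
    for k, g in eager:
        print(k, g)

print("first lazily stored group:", lazy_result[0])
```

The caveat about list(items) == list(items) presumably means that items must yield the same elements each time it is iterated. That holds for a list but not for a one-shot iterator such as a generator or an open file, and in that case re-running groupby() would find nothing on the second pass.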