On Mar 7, 8:47 pm, Raymond Hettinger <pyt...@rcn.com> wrote:
> The existing groupby() itertool works great when every element in a
> group has the same key, but it is not so handy when groups are
> determined by boundary conditions.
>
> For edge-triggered events, we need to convert a boundary-event
> predicate to groupby-style key function. The code below encapsulates
> that process in a new itertool called split_on().
>
> Would love you guys to experiment with it for a bit and confirm that
> you find it useful. Suggestions are welcome.
>
> Raymond
>
> -----------------------------------------
>
> from itertools import groupby
>
> def split_on(iterable, event, start=True):
>     'Split iterable on event boundaries (either start events or stop events).'
>     # split_on('X1X23X456X', 'X'.__eq__, True) --> X1 X23 X456 X
>     # split_on('X1X23X456X', 'X'.__eq__, False) --> X 1X 23X 456X
>     def transition_counter(x, start=start, cnt=[0]):
>         before = cnt[0]
>         if event(x):
>             cnt[0] += 1
>         after = cnt[0]
>         return after if start else before
>     return (g for k, g in groupby(iterable, transition_counter))
>
> if __name__ == '__main__':
>     for start in True, False:
>         for g in split_on('X1X23X456X', 'X'.__eq__, start):
>             print list(g)
>         print
>
>     from pprint import pprint
>     boundary = '--===============2615450625767277916==\n'
>     email = open('email.txt')
>     for mime_section in split_on(email, boundary.__eq__):
>         pprint(list(mime_section, 1, None))
>         print '= = ' * 30
Sorry to hijack the thread, but I know that you have a knack for finding good iterator patterns. I have noticed a pattern lately: aggregation using a defaultdict. I quickly found two examples of problems that could use this:

http://groups.google.com/group/comp.lang.python/browse_frm/thread/c8b3976ec3ceadfd
http://www.willmcgugan.com/blog/tech/2009/1/17/python-coder-test/

To show an example, using data like this:

>>> data=[('red',2,'other data'),('blue',5,'more data'),
...       ('yellow',3,'lots of things'),('blue',1,'data'),
...       ('red',2,'random data')]

Then:

>>> from itertools import groupby
>>> from operator import itemgetter
>>> from collections import defaultdict

We can use groupby to do this:

>>> [(el[0],sum(x[1] for x in el[1])) for el in
...  groupby(sorted(data,key=itemgetter(0)),itemgetter(0))]
[('blue', 6), ('red', 4), ('yellow', 3)]

>>> [(el[0],[x[1] for x in el[1]]) for el in
...  groupby(sorted(data,key=itemgetter(0)),itemgetter(0))]
[('blue', [5, 1]), ('red', [2, 2]), ('yellow', [3])]

>>> [(el[0],set([x[1] for x in el[1]])) for el in
...  groupby(sorted(data,key=itemgetter(0)),itemgetter(0))]
[('blue', set([1, 5])), ('red', set([2])), ('yellow', set([3]))]

But this way seems to be more efficient, since it makes a single pass over the data and skips the sort:

>>> def aggrsum(data,key,agrcol):
...     dd=defaultdict(int)
...     for el in data:
...         dd[key(el)]+=agrcol(el)
...     return dd.items()
...
>>> aggrsum(data,itemgetter(0),itemgetter(1))
[('blue', 6), ('yellow', 3), ('red', 4)]

>>> def aggrlist(data,key,agrcol):
...     dd=defaultdict(list)
...     for el in data:
...         dd[key(el)].append(agrcol(el))
...     return dd.items()
...
>>> aggrlist(data,itemgetter(0),itemgetter(1))
[('blue', [5, 1]), ('yellow', [3]), ('red', [2, 2])]

>>> def aggrset(data,key,agrcol):
...     dd=defaultdict(set)
...     for el in data:
...         dd[key(el)].add(agrcol(el))
...     return dd.items()
...
>>> aggrset(data,itemgetter(0),itemgetter(1))
[('blue', set([1, 5])), ('yellow', set([3])), ('red', set([2]))]

The data often contains objects with attributes instead of tuples, and I expect the new namedtuple datatype to show up as elements of the list to be processed, too. But I haven't found a nicely generalized form of this pattern, one that aggregates a list of one datatype into a list of key plus output datatype, that would be practical and suitable for inclusion in the standard library.
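
To make the question concrete, here is a rough sketch of one direction the generalization could take. It is written for Python 3, and the name aggregate() as well as its factory and update parameters, the two small helpers, and the Row namedtuple are all made up for illustration; none of this is an existing standard-library API. The idea is only that the three functions above differ in the empty accumulator and in the update step, so both can be passed in, and attrgetter can stand in for itemgetter when the elements are namedtuples or other objects.

```python
from collections import defaultdict, namedtuple
from operator import add, attrgetter, itemgetter


def aggregate(data, key, value, factory, update):
    """Group data by key(el), folding value(el) into one accumulator per key.

    Hypothetical helper: factory() builds an empty accumulator and
    update(acc, v) returns the accumulator with v folded in.
    """
    groups = defaultdict(factory)
    for el in data:
        k = key(el)
        groups[k] = update(groups[k], value(el))
    return dict(groups)


def appended(acc, v):
    """Append v to a list accumulator in place and return it."""
    acc.append(v)
    return acc


def added(acc, v):
    """Add v to a set accumulator in place and return it."""
    acc.add(v)
    return acc


data = [('red', 2, 'other data'), ('blue', 5, 'more data'),
        ('yellow', 3, 'lots of things'), ('blue', 1, 'data'),
        ('red', 2, 'random data')]

# The three aggregations from the post, as special cases of one function.
sums = aggregate(data, itemgetter(0), itemgetter(1), int, add)
lists = aggregate(data, itemgetter(0), itemgetter(1), list, appended)
sets = aggregate(data, itemgetter(0), itemgetter(1), set, added)

# Same call shape for attribute-based elements (Row is a made-up type).
Row = namedtuple('Row', 'colour amount note')
rows = [Row(*t) for t in data]
sums_by_attr = aggregate(rows, attrgetter('colour'), attrgetter('amount'),
                         int, add)
```

Whether five positional arguments is "nice" enough is exactly the open question; the sketch only shows that tuples and namedtuples can share one code path.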