On Mar 7, 8:47 pm, Raymond Hettinger <pyt...@rcn.com> wrote:
> The existing groupby() itertool works great when every element in a
> group has the same key, but it is not so handy when groups are
> determined by boundary conditions.
>
> For edge-triggered events, we need to convert a boundary-event
> predicate to groupby-style key function.  The code below encapsulates
> that process in a new itertool called split_on().
>
> Would love you guys to experiment with it for a bit and confirm that
> you find it useful.  Suggestions are welcome.
>
> Raymond
>
> -----------------------------------------
>
> from itertools import groupby
>
> def split_on(iterable, event, start=True):
>     'Split iterable on event boundaries (either start events or stop events).'
>     # split_on('X1X23X456X', 'X'.__eq__, True)  --> X1 X23 X456 X
>     # split_on('X1X23X456X', 'X'.__eq__, False) --> X 1X 23X 456X
>     def transition_counter(x, start=start, cnt=[0]):
>         before = cnt[0]
>         if event(x):
>             cnt[0] += 1
>         after = cnt[0]
>         return after if start else before
>     return (g for k, g in groupby(iterable, transition_counter))
>
> if __name__ == '__main__':
>     for start in True, False:
>         for g in split_on('X1X23X456X', 'X'.__eq__, start):
>             print list(g)
>         print
>
>     from pprint import pprint
>     boundary = '--===============2615450625767277916==\n'
>     email = open('email.txt')
>     for mime_section in split_on(email, boundary.__eq__):
>         pprint(list(mime_section, 1, None))
>         print '= = ' * 30

I've found this type of splitting quite useful when grouping sections of
a text file.  I used the groupby function directly in the file, when I
would rather have used something like this.

However, I wonder if it would be helpful to break that function into two
instead of having the "start" flag.  The flag feels odd to me (maybe it's
the name?), and with two functions the documentation might read better
from a newcomer's perspective.

Also, it would be cool if the function took keywords; I wonder why most
of the other functions in the itertools module don't take keywords.

I wouldn't split out the keys separately from the groups.  But the idea
of a flag to exclude the keys sounds interesting to me.

Thank you for giving me the opportunity to use the nonlocal keyword for
the first time since trying out Python 3.0.  I hope this is an
appropriate usage:

import itertools as it

def split_on(iterable, key=bool, start=True):
    'Split iterable on boundaries (either start events or stop events).'
    # split_on('X1X23X456X', 'X'.__eq__, True)  --> X1 X23 X456 X
    # split_on('X1X23X456X', 'X'.__eq__, False) --> X 1X 23X 456X
    flag = 0  # number of boundary events seen so far
    def event_marker(x, start_flag=start):
        nonlocal flag  # only flag is rebound; key is merely read
        before = flag
        if key(x):
            flag += 1
        after = flag
        # A start event opens a new group; a stop event closes the old one.
        return after if start_flag else before
    return (g for k, g in it.groupby(iterable, key=event_marker))
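
To make the two-function idea a bit more concrete, here is a rough sketch of how it might look. The names split_before(), split_after(), and the helper _split_at() are made up for illustration and are not part of itertools. The sketch assumes Python 3 (for nonlocal) and accepts the predicate as a keyword argument.

```python
from itertools import groupby


def _split_at(iterable, pred, boundary_starts_group):
    # Hypothetical shared helper: number the elements so that groupby()
    # sees a new key each time pred() fires.
    count = 0

    def key(x):
        nonlocal count
        before = count
        if pred(x):
            count += 1
        # Using the updated count puts the boundary element at the head of
        # the next group; using the old count leaves it at the tail of the
        # current one.
        return count if boundary_starts_group else before

    return (group for _, group in groupby(iterable, key=key))


def split_before(iterable, pred=bool):
    'Start a new group at each element for which pred() is true.'
    return _split_at(iterable, pred, True)


def split_after(iterable, pred=bool):
    'End the current group after each element for which pred() is true.'
    return _split_at(iterable, pred, False)


print([''.join(g) for g in split_before('X1X23X456X', pred='X'.__eq__)])
# ['X1', 'X23', 'X456', 'X']
print([''.join(g) for g in split_after('X1X23X456X', pred='X'.__eq__)])
# ['X', '1X', '23X', '456X']
```

As with groupby() itself, each group is a lazy iterator that shares the underlying iterable, so a group should be consumed before moving on to the next one.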