Jason Friedman <jsf80238 <at> gmail.com> writes:
>
> Thank you for the responses!  Not sure yet which one I will pick.
>

Hi again,
I was working a bit on my own solution and on the one from Steven/Joshua, and
maybe that helps you decide:

from itertools import groupby

def separate_on(iterable, separator):
    # based on groupby
    sep_len = len(separator)
    for is_header, item in groupby(iterable,
                                   lambda line: line[:sep_len] == separator):
        if is_header:
            header_tails = [h[sep_len:].strip() for h in item]
            for naked_header in header_tails[:-1]:
                yield (naked_header, [])
            header_tail = header_tails[-1]
        else:
            try:
                yield (header_tail, [s.strip() for s in item])
            except UnboundLocalError:
                yield (None, [s.strip() for s in item])

def group(iterable, separator):
    # Steven's/Joshua's rewritten
    sep_len = len(separator)
    accum = None
    header = None
    for item in iterable:
        item = item.strip()
        if item[:sep_len] == separator:
            if accum is not None:
                # Don't bother if there are no accumulated lines.
                yield (header, accum)
            header = item[sep_len:]
            accum = []
        else:
            try:
                accum.append(item)
            except AttributeError:
                accum = [item]
    # Don't forget the last group of lines.
    yield (header, accum)

Both versions behave as follows:

- any line that *starts* with the separator is treated as a header line.
  The tail of that line is returned as the group's title in a tuple with the
  group's content, i.e. (header, [body]). If there's only the separator, the
  title is ''. I find this a more useful behaviour as it allows things like:

##Group1
elem1
elem2
elem3
##Group2
a
b
c
...

- if there are headers without body, they are reported as (header, []).
- if the first body has no header, that's reported as (None, [body]).

Advantages & Disadvantages of either form:

Steven's/Joshua's:
simple and fast
it's more readable I'd say, and for small groups the groupby implementation is
about 1.5x slower than this one. The groupby version catches up with
increasing group sizes (because it uses comprehensions instead of list.append,
I think), but it only wins with groups of ~1000 elements.

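The speed comparison above could be checked with something like the following rough sketch. It repeats both functions so that it runs on its own; make_lines and compare are hypothetical helper names, and the input shape (header lines starting with "##", evenly sized groups) and the group sizes are assumptions chosen only for illustration. Timings will vary with machine and Python version, so no numbers are claimed here.

```python
from itertools import groupby
from timeit import timeit


def separate_on(iterable, separator):
    # groupby-based version, repeated here so the sketch is self-contained
    sep_len = len(separator)
    for is_header, item in groupby(iterable,
                                   lambda line: line[:sep_len] == separator):
        if is_header:
            header_tails = [h[sep_len:].strip() for h in item]
            for naked_header in header_tails[:-1]:
                yield (naked_header, [])
            header_tail = header_tails[-1]
        else:
            try:
                yield (header_tail, [s.strip() for s in item])
            except UnboundLocalError:
                yield (None, [s.strip() for s in item])


def group(iterable, separator):
    # plain-loop version, repeated here so the sketch is self-contained
    sep_len = len(separator)
    accum = None
    header = None
    for item in iterable:
        item = item.strip()
        if item[:sep_len] == separator:
            if accum is not None:
                yield (header, accum)
            header = item[sep_len:]
            accum = []
        else:
            try:
                accum.append(item)
            except AttributeError:
                accum = [item]
    yield (header, accum)


def make_lines(n_groups, group_size):
    # Hypothetical test data: n_groups headers, each followed by
    # group_size body lines.
    lines = []
    for g in range(n_groups):
        lines.append("##Group%d" % g)
        lines.extend("elem%d" % i for i in range(group_size))
    return lines


def compare(n_groups, group_size, repeat=5):
    # Time a full pass of each generator over the same input and
    # return (seconds for groupby version, seconds for plain loop).
    lines = make_lines(n_groups, group_size)
    t_groupby = timeit(lambda: list(separate_on(lines, "##")), number=repeat)
    t_plain = timeit(lambda: list(group(lines, "##")), number=repeat)
    return t_groupby, t_plain


if __name__ == "__main__":
    # Keep the total number of body lines constant, vary the group size.
    for size in (3, 30, 300, 3000):
        t_groupby, t_plain = compare(12000 // size, size)
        print("group size %5d: groupby %.4fs, plain loop %.4fs"
              % (size, t_groupby, t_plain))
```

Holding the total line count constant while varying the group size is one way to see where the two versions cross over, if they do on a given setup.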
the groupby implementation:
more flexible
its yield statement deliberately returns a list of the elements, but before
that you just have an iterator, which you could just as well turn into a
tuple, set, string or anything else without first constructing the list in
memory. So in terms of code reuse this might be preferable.

Cheers,
Wolfgang

-- 
http://mail.python.org/mailman/listinfo/python-list
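To illustrate the flexibility point about the groupby version, here is a minimal sketch of a variant that lets the caller choose what each body becomes. The factory parameter is a hypothetical addition, not part of the posted code, and this variant initialises header_tail to None instead of catching UnboundLocalError, which should give the same (None, body) result for a leading body without a header.

```python
from itertools import groupby


def separate_on(iterable, separator, factory=list):
    # Variant of the groupby version. `factory` is a hypothetical extra
    # parameter: any callable that accepts an iterable of stripped lines.
    sep_len = len(separator)
    header_tail = None  # used if the first body has no header
    for is_header, item in groupby(iterable,
                                   lambda line: line[:sep_len] == separator):
        if is_header:
            header_tails = [h[sep_len:].strip() for h in item]
            # Headers directly followed by another header have empty bodies.
            for naked_header in header_tails[:-1]:
                yield (naked_header, factory(()))
            header_tail = header_tails[-1]
        else:
            # The body is passed on as a lazy generator; no intermediate
            # list is built unless the factory itself builds one.
            yield (header_tail, factory(s.strip() for s in item))


lines = ["##Group1", "elem1", "elem2", "elem1", "##Group2", "a", "b"]

print(list(separate_on(lines, "##")))                    # bodies as lists
print(list(separate_on(lines, "##", factory=tuple)))     # bodies as tuples
print(list(separate_on(lines, "##", factory=set)))       # duplicates dropped
print(list(separate_on(lines, "##", factory=", ".join)))
# last call prints [('Group1', 'elem1, elem2, elem1'), ('Group2', 'a, b')]
```

With the default factory=list the results match the list-based output described above; set or ", ".join consume the generator directly.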