Flyzone:
> i have a problem with the split function and regexp.
> I have a file that i want to split using the date as token.
My first try:

    import re

    data = """
    error text
    Mon Apr 9 22:30:18 2007
    text
    text
    Mon Apr 9 22:31:10 2007
    text
    text
    Mon Apr 10 22:31:10 2007
    text
    text
    """

    # A date line ends with a timestamp like "22:30:18 2007"
    date_find = re.compile(r"\d\d:\d\d:\d\d \d{4}$")

    section = []
    for line in data.splitlines():
        if date_find.search(line):
            # A date line starts a new section: print the previous one first
            if section:
                print("\n" + "-" * 10 + "\n", "\n".join(section))
            section = [line]
        elif line:
            section.append(line)
    print("\n" + "-" * 10 + "\n", "\n".join(section))

itertools.groupby() is fit for splitting sequences like:

    1111100011111100011100101011111

into runs of equal items:

    11111 000 111111 000 111 00 1 0 1 0 11111

While here we have a sequence like:

    100001000101100001000000010000

that has to be split as:

    10000 1000 10 1 10000 10000000 10000

that is, a new group starts at every 1. A standard itertools function could be added for this quite common situation too. Along those lines I have devised this different (and maybe over-engineered) version:

    from itertools import groupby
    import re

    class Splitter(object):
        # Not tested much
        def __init__(self, predicate):
            self.predicate = predicate
            self.precedent_el = None
            self.state = True

        def __call__(self, el):
            # Flip the state at each separator, so groupby() starts a new
            # group there; the following elements keep the same state
            if self.predicate(el):
                self.state = not self.state
            self.precedent_el = el
            return self.state

    date_find = re.compile(r"\d\d:\d\d:\d\d \d{4}$")
    splitter = Splitter(date_find.search)
    sections = ("\n".join(g) for h, g in groupby(data.splitlines(), key=splitter))
    for section in sections:
        if section:
            print("\n" + "-" * 10 + "\n", section)

The Splitter class plus groupby() can become a single, simpler generator, like in this version:

    def grouper(seq, key=bool):
        # A fast identity function can be used instead of bool()
        # Not tested much
        group = []
        for part in seq:
            if key(part):
                if group:
                    yield group
                group = [part]
            else:
                group.append(part)
        yield group

    import re
    date_find = re.compile(r"\d\d:\d\d:\d\d \d{4}$")
    for section in grouper(data.splitlines(), date_find.search):
        print("\n" + "-" * 10 + "\n", "\n".join(section))

Maybe that grouper can be modified to manage each group lazily, like groupby() does, instead of building a real list.
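One possible lazy variant, only a sketch and not the code above: keep a running count of the separators seen so far and use it as the groupby() key, so the key changes exactly at each separator and groupby() opens a new group there. The name lazy_grouper is hypothetical, and the trick assumes groupby() calls the key function once per item, in order (as CPython's does). Applied to the 1/0 sequence above, it gives the wanted split:

```python
from itertools import groupby


def lazy_grouper(seq, key=bool):
    """Hypothetical helper: yield lazy groups of seq, starting a new
    group at each item where key(item) is true."""
    count = 0

    def group_id(item):
        # The id grows by one at each separator, so it changes exactly
        # where a new group must begin.
        nonlocal count
        if key(item):
            count += 1
        return count

    for _, group in groupby(seq, key=group_id):
        yield group


bits = "100001000101100001000000010000"
# Consume each group before asking for the next one, as with groupby().
parts = ["".join(g) for g in lazy_grouper(bits, key=lambda c: c == "1")]
print(" ".join(parts))  # prints: 10000 1000 10 1 10000 10000000 10000
```

As with groupby(), each group is an iterator that is no longer usable once the outer loop advances, so copy it (e.g. with list()) if it must be kept. Unlike the list-based grouper, this yields nothing at all for an empty input.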
Flyzone (seen later):
> Amm..not! I need to get the text-block between the two data, not the data! :)

Then you can modify the code like this:

    def grouper(seq, key=bool):
        group = []
        for part in seq:
            if key(part):
                if group:
                    yield group
                group = []  # changed
            else:
                group.append(part)
        yield group

Bye,
bearophile

--
http://mail.python.org/mailman/listinfo/python-list
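Since the original question was about split and regexps: if the whole file is read as one string, re.split() with re.MULTILINE can also cut out the text blocks between the date lines directly. This is a sketch that assumes, as in the sample data, that each date sits at the end of its own line; date_line and blocks are just illustrative names.

```python
import re

data = """
error text
Mon Apr 9 22:30:18 2007
text
text
Mon Apr 9 22:31:10 2007
text
text
Mon Apr 10 22:31:10 2007
text
text
"""

# A whole line ending in "HH:MM:SS YYYY", plus its newline if present.
# re.MULTILINE makes ^ and $ match at every line boundary.
date_line = re.compile(r"^.*\d\d:\d\d:\d\d \d{4}$\n?", re.MULTILINE)

# The date lines are dropped; the text before the first date is the first block.
blocks = [block.strip() for block in date_line.split(data)]
for block in blocks:
    print("\n" + "-" * 10 + "\n", block)
```

Unlike the line-by-line grouper, this needs the whole file in memory as a single string.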