On Fri, 17 Oct 2008 11:42:05 -0400, Luis Zarrabeitia wrote:

> I need to parse a file, text file. The format is something like that:
>
> TYPE1 metadata
> data line 1
> data line 2
> ...
> data line N
> TYPE2 metadata
> data line 1
> ...
> TYPE3 metadata
> ...
>
> […]
>
> because when the parser iterates over the input, it can't know that it
> finished processing the section until it reads the next "TYPE" line
> (actually, until it reads the first line that it cannot parse, which if
> everything went well, should be the 'TYPE'), but once it reads it, it is
> no longer available to the outer loop. I wouldn't like to leak the
> internals of the parsers to the outside.
>
> What could I do?
> (to the curious: the format is a dialect of the E00 used in GIS)
Group the lines before processing and feed each group to the right parser:

import sys
from itertools import groupby, imap
from operator import itemgetter


def parse_a(metadata, lines):
    print 'parser a', metadata
    for line in lines:
        print 'a', line


def parse_b(metadata, lines):
    print 'parser b', metadata
    for line in lines:
        print 'b', line


def parse_c(metadata, lines):
    print 'parser c', metadata
    for line in lines:
        print 'c', line


def test_for_type(line):
    return line.startswith('TYPE')


def parse(lines):
    # Pair every data line with the most recent TYPE line, so that
    # groupby() can split the input into sections.
    def tag():
        type_line = None
        for line in lines:
            if test_for_type(line):
                type_line = line
            else:
                yield (type_line, line)

    type2parser = {'TYPE1': parse_a,
                   'TYPE2': parse_b,
                   'TYPE3': parse_c}

    for type_line, group in groupby(tag(), itemgetter(0)):
        type_id, metadata = type_line.split(' ', 1)
        # Each parser only sees the data lines of its own section.
        type2parser[type_id](metadata, imap(itemgetter(1), group))


def main():
    parse(sys.stdin)


if __name__ == '__main__':
    main()
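For anyone reading this on Python 3, where `imap` and the print statement are gone, the same grouping idea could be sketched roughly as below. This is only a sketch under a few assumptions of mine: the names `tag_lines`, `parse_sections`, `collect`, `PARSERS` and `SAMPLE` are made up for illustration, lines before the first TYPE header are silently skipped, and a running section number is added to the grouping key so that a section without data lines, or two adjacent sections with identical headers, still reach a parser separately.

```python
from itertools import groupby
from operator import itemgetter


def is_type_line(line):
    return line.startswith('TYPE')


def tag_lines(lines):
    """Yield (section_no, header, data_line) triples.

    The header itself is yielded once with data_line set to None, so a
    section that has no data lines still shows up in the grouping.
    """
    section_no = 0
    header = None
    for raw in lines:
        line = raw.rstrip('\n')
        if is_type_line(line):
            section_no += 1
            header = line
            yield section_no, header, None
        elif header is not None:
            yield section_no, header, line
        # Assumption: lines before the first TYPE header are ignored.


def parse_sections(lines, parsers):
    """Hand each section's metadata and data lines to its parser."""
    results = []
    # Group on (section_no, header); the third field is the payload.
    for (_, header), group in groupby(tag_lines(lines), key=itemgetter(0, 1)):
        type_id, _, metadata = header.partition(' ')
        # Lazy iterator over this section's data lines only. The parser
        # must consume it before groupby() moves on to the next group.
        data = (line for _, _, line in group if line is not None)
        results.append(parsers[type_id](metadata, data))
    return results


def collect(name):
    """Hypothetical stand-in for a real section parser."""
    def parser(metadata, data):
        return (name, metadata, list(data))
    return parser


PARSERS = {'TYPE1': collect('a'),
           'TYPE2': collect('b'),
           'TYPE3': collect('c')}

SAMPLE = [
    'TYPE1 meta one\n',
    '1 2\n',
    '3 4\n',
    'TYPE2 meta two\n',
    'TYPE3 meta three\n',
    'x\n',
]

if __name__ == '__main__':
    for result in parse_sections(SAMPLE, PARSERS):
        print(result)
```

The parsers never see a TYPE line, so nothing about their internals leaks to the outer loop. The one constraint is that each parser has to consume its iterator before returning, because `groupby()` invalidates a group as soon as the next one is requested.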