Christoph Krammer wrote:
> Hello,
>
> I want to use the re module to split a data stream that consists of
> several blocks of data. I use the following code:
>
> iter = re.finditer('^(HEADER\n.*)+$', data)
>
> The data variable contains binary data that has the word HEADER in it
> in some places and binary data after this word till the next
> appearance of header or the end of the file. But if I iterate over
> iter, I only get one match and this match only contains one group. How
> to access the other matches? Data may contain tens of them.
>
> Thanks in advance,
> Christoph
>
Use .*? instead of .* in your regular expression.

From the manual page:

    *?, +?, ??
        The "*", "+", and "?" qualifiers are all /greedy/; they match
        as much text as possible. Sometimes this behaviour isn't
        desired; if the RE <.*> is matched against '<H1>title</H1>',
        it will match the entire string, and not just '<H1>'. Adding
        "?" after the qualifier makes it perform the match in
        /non-greedy/ or /minimal/ fashion; as /few/ characters as
        possible will be matched. Using .*? in the previous expression
        will match only '<H1>'.

Gary Herron

-- 
http://mail.python.org/mailman/listinfo/python-list
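
To make the greedy versus non-greedy difference concrete, here is a minimal sketch rather than a definitive solution. The first part replays the manual's <H1> example. The second part is one possible way to split a stream on HEADER markers; it goes beyond the pattern in the original question by assuming a bytes input, the re.DOTALL flag (so that "." also matches newlines inside a block), and a lookahead that stops each block at the next marker or at the end of the data. The names split_blocks and sample are made up for illustration.

```python
import re

# Part 1: the manual's example. Greedy ".*" runs to the last ">",
# non-greedy ".*?" stops at the first one.
text = '<H1>title</H1>'
greedy = re.match(r'<.*>', text).group()
lazy = re.match(r'<.*?>', text).group()

# Part 2 (assumption, not from the original post): one match per block.
# ".*?" takes as little as possible, and the lookahead ends the block
# just before the next "HEADER\n" or at the end of the data.
BLOCK_RE = re.compile(rb'HEADER\n(.*?)(?=HEADER\n|\Z)', re.DOTALL)


def split_blocks(data):
    """Return the payload that follows each HEADER line (hypothetical helper)."""
    return [m.group(1) for m in BLOCK_RE.finditer(data)]


# Made-up sample data with a few non-text bytes in it.
sample = b'HEADER\n\x00\x01abc\nHEADER\nxyz\n\xffHEADER\n'
blocks = split_blocks(sample)

print(greedy)
print(lazy)
print(blocks)
```

Iterating over finditer this way yields one match per block, each with a single group holding that block's payload, instead of one match covering the whole stream.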