Jeremy wrote:
> On Jan 11, 8:44 am, Iain King <iaink...@gmail.com> wrote:
>> On Jan 11, 3:35 pm, Jeremy <jlcon...@gmail.com> wrote:
>>
>> > Hello all,
>>
>> > I am using re.split to separate some text into logical structures.
>> > The trouble is that re.split doesn't find everything while re.findall
>> > does; i.e.:
>>
>> > >>> found = re.findall('^ 1', line, re.MULTILINE)
>> > >>> len(found)
>> > 6439
>> > >>> tables = re.split('^ 1', line, re.MULTILINE)
>> > >>> len(tables)
>> > 1
>>
>> > Can someone explain why these two commands are giving different
>> > results? I thought I should have the same number of matches (or maybe
>> > different by 1, but not 6000!)
>>
>> > Thanks,
>> > Jeremy
>>
>> re.split doesn't take re.MULTILINE as a flag: it doesn't take any
>> flags. It does take a maxsplit parameter, which you are passing the
>> value of re.MULTILINE (which happens to be 8 in my implementation).
>> Since your pattern is looking for line starts, and your first line
>> presumably has more splits than the maxsplits you are specifying, your
>> re.split never finds more than 1.
>
> Yep. Thanks for pointing that out. I guess I just assumed that
> re.split was similar to re.search/match/findall in what it accepted as
> function parameters. I guess I'll have to use a \n instead of a ^ for
> split.
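A minimal sketch of the pitfall discussed above, using a small made-up string in place of Jeremy's data (the `text` value is hypothetical). Passing re.MULTILINE as the third positional argument puts it in the maxsplit slot, so no flag is applied and `^` only matches at the very start of the string. Newer Pythons (2.7 and 3.1 onward) added a `flags` parameter to re.split, which this sketch passes by keyword; on older versions that keyword is not available.

```python
import re

# Hypothetical sample data: some lines start with " 1", the first does not.
text = "header\n 1 first\n 1 second\n 1 third\n"

# findall receives MULTILINE as a real flag, so "^" matches at each line start.
found = re.findall('^ 1', text, re.MULTILINE)

# What the original positional call amounted to: the flag's integer value
# used as maxsplit, with no flags at all. "^" then only matches at index 0,
# where the text starts with "header", so nothing is split.
broken = re.split('^ 1', text, maxsplit=re.MULTILINE)

# Passing the flag by keyword (needs Python 2.7 / 3.1 or later).
fixed = re.split('^ 1', text, flags=re.MULTILINE)

print(len(found))   # 3
print(len(broken))  # 1
print(fixed)        # ['header\n', ' first\n', ' second\n', ' third\n']
```

Spelling out `maxsplit=` and `flags=` by name avoids the positional mix-up entirely.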

You can precompile the pattern and then invoke the split() method:

>>> re.compile("^X", re.MULTILINE).split("""X alpha
... beta
... X gamma
... delta X
... X
... zeta
... """)
['', ' alpha\nbeta\n', ' gamma\ndelta X\n', '\nzeta\n']

Peter
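For comparison, a small sketch that reproduces the session above as a script and checks it against one alternative that is not from the thread: the inline `(?m)` modifier, which switches on MULTILINE from inside the pattern string, so the module-level re.split needs no flags argument at all.

```python
import re

# Same input as the interactive session above.
sample = "X alpha\nbeta\nX gamma\ndelta X\nX\nzeta\n"

# Compile with MULTILINE, then call the pattern object's split() method.
compiled = re.compile("^X", re.MULTILINE).split(sample)

# Alternative: "(?m)" at the start of the pattern enables MULTILINE inline.
# The "X" in "delta X" is not at a line start, so it is not a split point.
inline = re.split("(?m)^X", sample)

print(compiled)  # ['', ' alpha\nbeta\n', ' gamma\ndelta X\n', '\nzeta\n']
print(inline == compiled)  # True
```

The compiled form is handy when the same pattern is reused many times; the inline form keeps the flag attached to the pattern wherever it is passed.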