Dear Taco,

On Mon, 21 Dec 2020 at 13:46, Taco Hoekwater wrote:
>
> > On 21 Dec 2020, at 13:16, Mojca Miklavec wrote:
> >
> > My only explanation would be that perhaps "^1" is so greedy that the
> > rest of the pattern doesn't get found. But I don't want to believe
> > that explanation.
>
> Which (of course) means that that is exactly what happens ;)
>
> The ones that match are
>
> ababbb (a (ba+bb) b) => r4 r1(r3(r5 r4) r2(r5 r5)) r5
> abbbab (a (bb+ba) b) => r4 r1(r2(r5 r5) r3(r5 r4)) r5
>
> With the ^1, in the “bb” cases the first “b” eats all three “b”s:
>
> ababbb fails the r5 at the end
>
> abbbab fails the first r2 already (since the second r5 therein never happens)
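A minimal sketch of the behaviour Taco describes, in plain Lua so that it runs without lpeg. The helpers lit, seq, alt, plus and eos are hypothetical toy combinators that only model PEG semantics (a successful repetition or choice is never revisited); they are not lpeg functions and not how lpeg is implemented. With a possessive "b+" the trailing "b" can never match "bbb". Rewriting "p+ k" as the recursive rule "S <- p S / p k", with the rest of the pattern k moved inside the rule, gives back one p at a time until k succeeds. The lpeg spelling in the comment is offered for orientation only.

```lua
-- Toy model of PEG matching in plain Lua (no lpeg needed).
-- A pattern is a function(subject, pos) returning the position just after
-- a successful match, or nil on failure. It returns at most ONE result,
-- so a caller cannot ask it to give back characters already consumed.

local function lit(str)                  -- models lpeg.P("str")
  return function(s, i)
    if s:sub(i, i + #str - 1) == str then return i + #str end
    return nil
  end
end

local function seq(p, q)                 -- models p * q
  return function(s, i)
    local j = p(s, i)
    if j then return q(s, j) end
    return nil
  end
end

local function alt(p, q)                 -- models p + q (ordered choice:
  return function(s, i)                  -- if p succeeds, q is never tried)
    return p(s, i) or q(s, i)
  end
end

local function plus(p)                   -- models p^1: possessive, it keeps
  return function(s, i)                  -- every repetition it can get
    local j = p(s, i)
    if not j then return nil end
    while true do
      local k = p(s, j)
      if not k or k == j then return j end  -- stop on failure or empty match
      j = k
    end
  end
end

local function eos(s, i)                 -- models -1 (end of subject)
  if i > #s then return i end
  return nil
end

local b = lit("b")

-- "b+b" with a possessive repetition: plus(b) swallows all three b's,
-- so the final b has nothing left to match.
local possessive = seq(seq(plus(b), b), eos)

-- Same language as the recursive rule  S <- b S / b b !.
-- with the continuation "b, then end of subject" moved inside the rule.
-- In lpeg this would presumably be written as
--   lpeg.P{ "S", S = lpeg.P"b" * lpeg.V"S" + lpeg.P"b" * lpeg.P"b" * -1 }
local S
S = alt(seq(b, function(s, i) return S(s, i) end),
        seq(seq(b, b), eos))

print(possessive("bbb", 1))  --> nil
print(S("bbb", 1))           --> 4
```

The recursive form first tries the longest run of "b" and then retreats one "b" at a time until the continuation succeeds, which is roughly what a backtracking regex engine does implicitly. The price is that everything which has to follow the repetition must be written into the rule itself, because once a PEG rule has returned it is not re-entered to try a shorter match.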
Is this a deliberate choice, a limitation of the grammar's expressiveness,
some misuse on my side that could/should/needs to be implemented in a
different way, or does it count as a "bug" on the lpeg side?

For example, I wouldn't expect a regexp "b+b" to fail on "bbb" just
because "b+" would eat all three "b"s at once (the regexp "b+b" does in
fact match "bbb", and I would expect a less-than-totally-greedy hit with
lpeg as well). Or is my reasoning wrong here?

It certainly works if I use

    lpeg.P('b') + lpeg.P('bb') + lpeg.P('bbb') -- and a couple more

(as long as I can predict the maximum length), but that's not really a
viable workaround in general.

Thank you,
    Mojca

PS: Sorry, a tiny bug also crept into my sample code. The line after
matching 'parser1' should have used 'total1' rather than 'total':

    if lpeg.match(parser1, s) then total1 = total1 + 1 end

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________