On Sat, 22 Jun 2019, ND via Pcre-dev wrote:

> PCRE2 version 10.33 2019-04-16
> /(?<=(?<=a)b)c.*/info
> Capture group count = 0
> Max lookbehind = 1
> First code unit = 'c'
> Subject length lower bound = 1
> abc\=ph
> Partial match: bc
>                <
>
> Why max lookbehind=1, but not 2?

According to the documentation, what happens is strictly correct. The
doc says for PCRE2_INFO_MAX_LOOKBEHIND:

  Return the number of characters (not code units) in the longest
  lookbehind assertion in the pattern.

The length that is matched by each of the lookbehind assertions in your
pattern is 1.

I understand what you are asking. It was done like this because it was
easy to do. Look at the code:

/(?<=(?<=a)b)c.*/fullbincode
------------------------------------------------------------------
  0  29 Bra
  3  19 AssertB
  6   1 Reverse
  9   8 AssertB
 12   1 Reverse
 15     a
 17   8 Ket
 20     b
 22  19 Ket
 25     c
 27     Any*+
 29  29 Ket
 32     End
------------------------------------------------------------------

The maximum argument for "Reverse" is 1, and it is easy for the code to
find that as it sets the values. What you want is a lot more
complicated, because the code would need to be able to distinguish
between (?<=(?<=a)b) and (?<=.(?<=a)b) for example. In the second case,
your max is 1.

pcre2_match() is supposed to record the earliest consulted character
during a match, and this is what pcre2test outputs. However, there does
seem to be a bug, because it should show both "a" and "b" for your
example. I will investigate.

I will also think about the max lookbehind value, but nesting
lookbehinds in the way you have done is unusual.

Philip

--
Philip Hazel

--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev