On Mon, 23 Jan 2012, ND wrote:

> No. If PCRE can not calculate the length of the longest lookbehind in pattern
> then main application must know that string returned for a partial match may
> be not long enough and may be more symbols needed to keep.
> 
> If PCRE can calculate the length of the longest lookbehind in pattern then it
> can simply returns it. Value 0 means that no lookbehinds present in pattern.

PCRE can easily calculate the length of the longest lookbehind. That is 
not a problem. It can return it to the application via a PCRE_INFO_xxx 
call. I think a negative value should mean no lookbehinds, because a
lookbehind of length 0 is permitted.
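
As a rough illustration of why an application would want that number: when
a subject arrives in chunks, the application has to keep at least that many
trailing characters so that a lookbehind at the start of the next chunk can
still see them. The sketch below uses Python's re module purely as a
stand-in for PCRE. MAX_LOOKBEHIND and search_chunks are hypothetical names,
and the value 3 is worked out by hand for this one pattern rather than
obtained from any library call. Matches that straddle a chunk boundary are
deliberately not handled; that is the job of partial matching.

```python
import re

PATTERN = re.compile(r"(?<=foo)bar")
MAX_LOOKBEHIND = 3  # hypothetical: computed by hand for this pattern


def search_chunks(chunks, keep):
    """Search each chunk in turn, retaining `keep` trailing characters
    of the text seen so far as context for lookbehinds.

    Returns the absolute offset of the first match, or None.
    """
    tail = ""    # retained characters from earlier chunks
    offset = 0   # absolute position where the current chunk starts
    for chunk in chunks:
        buf = tail + chunk
        # Start searching at len(tail): the retained characters are
        # context only, visible to lookbehinds but not searched again.
        m = PATTERN.search(buf, len(tail))
        if m:
            return offset - len(tail) + m.start()
        offset += len(chunk)
        tail = buf[-keep:] if keep > 0 else ""
    return None


found_with_context = search_chunks(["xxfoo", "barxx"], MAX_LOOKBEHIND)
found_without_context = search_chunks(["xxfoo", "barxx"], 0)
```

With three characters retained, the "bar" at the start of the second chunk
is found because the lookbehind can see the "foo" that ended the first
chunk. With nothing retained, the same match is missed.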

I think we have not yet got this fully understood.

If zero-length partial matches are allowed whenever there is a 
lookbehind, then just adding a lookbehind in another branch of the 
pattern will change its behaviour. You can always add (?<!)| at the 
start of a pattern without having any effect ... the lookbehind always 
fails, so matching just carries on with the rest of the pattern.

Something like /abc/ matched against \P\Pxyz gives no match. I do not 
think it would be useful to make /(?<!)|abc/ give instead a partial 
match, just because there is a lookbehind somewhere in the pattern.
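
The claim that a leading (?<!)| has no effect on ordinary matching can be
checked with a small sketch. Python's re module is used here as a stand-in
for PCRE, which is an assumption; it has no partial matching, so this shows
only that complete-match results are unchanged. The helper name span is
hypothetical.

```python
import re

plain = re.compile(r"abc")
# The empty negative lookbehind can never succeed, so the first
# alternative always fails and matching falls through to "abc".
padded = re.compile(r"(?<!)|abc")


def span(pattern, subject):
    """Return the (start, end) of the first match, or None."""
    m = pattern.search(subject)
    return m.span() if m else None


subjects = ["xyz", "abc", "xxabcxx", "", "ab"]
results = [(span(plain, s), span(padded, s)) for s in subjects]
same_everywhere = all(a == b for a, b in results)
```

Both patterns give the same result for every subject in the list, which is
the behaviour a partial-matching rule keyed on "a lookbehind is present
somewhere" would break.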

Another idea: the problems arise from lookaheads at the end of the 
subject string when no previous characters have been inspected. Perhaps 
a zero-length partial match should be allowed if it arises within a 
lookahead? This means that (?!b) could cause such a match, but [^b] 
would not.
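
The distinction between the two constructs at the end of the subject can be
seen even without partial matching. The sketch below again uses Python's re
module as a stand-in for PCRE (an assumption), and the function name
first_match is hypothetical. It shows only the end-of-subject behaviour the
idea rests on: (?!b) succeeds without inspecting any character, so its
answer can be overturned by later data, whereas [^b] cannot match until a
character is actually present.

```python
import re


def first_match(pattern, subject):
    """Return the text of the first match, or None."""
    m = re.search(pattern, subject)
    return m.group() if m else None


# Subject ends immediately after "a".
lookahead_at_end = first_match(r"a(?!b)", "a")   # matches "a"
negclass_at_end = first_match(r"a[^b]", "a")     # no match: needs a character

# The same subject after one more character has arrived.
lookahead_more = first_match(r"a(?!b)", "ab")    # the earlier match is gone
negclass_more = first_match(r"a[^b]", "ab")      # still no match
```

The lookahead reports a match on "a" that no longer holds once "b" arrives,
while the negated class gives the same answer in both cases.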

I need to work through a lot more examples to see what might make sense 
here.

Philip

-- 
Philip Hazel

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 