Re: [pcre-dev] Unexpected result when zero-length global matching and pattern contains \G

ph10 Mon, 25 Jun 2018 07:28:45 -0700

On Thu, 31 May 2018, ND via Pcre-dev wrote:

> PCRE2 version 10.31 2018-02-12
> /(?<=\G.)/g,replace=-
> abc
> 2: a-bc-
> 
> 
> Logically expected result: a-b-c-
> 
> 
> PCRE advances by one character between zero-length matches. But it seems it
> should not in this case.


It is *pcre2test* that is doing the advancing, not the PCRE2 library, 
which only ever does one match at a time.

Using \G in a lookbehind, like using \K, can cause a lot of confusion.

\G is true when the current matching point is at the start of the 
subject plus the starting offset (compare ^, which is only true at the 
start of the subject in single-line mode). Consider the match without 
the replacement, and with some added parentheses:

/(?<=\G(.))/g
abc
 0: 
 1: a
 0: 
 1: c

The /g operation in pcre2test starts by calling pcre2_match() with a 
starting offset of 0. This defines where \G will match. However, the
lookbehind fails at position 0, because there is no earlier character,
so the match moves on (within the PCRE2 library) to try again at the
next position, but this does NOT change the original starting offset.
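For anyone who wants to experiment, here is a rough Python model of a single pcre2_match() call for this one pattern. It is a hand-written simulation, not PCRE2 or its API: the function names are hypothetical, and it assumes single-byte characters and that "." matches anything except a newline.

```python
def lookbehind_ok(subject, pos, start_offset):
    # Hypothetical model of (?<=\G.) tested at matching point pos:
    # step back one character, where \G must be true (that position
    # must equal the starting offset) and "." must match the character.
    # Assumes single-byte characters and "." not matching a newline.
    return pos >= 1 and pos - 1 == start_offset and subject[pos - 1] != "\n"


def match_once(subject, start_offset):
    # Hypothetical model of one pcre2_match() call: try each matching
    # point from start_offset onwards. Advancing inside the call never
    # changes start_offset, so \G keeps referring to the same position.
    # Returns the position of the (empty) match, or None.
    for pos in range(start_offset, len(subject) + 1):
        if lookbehind_ok(subject, pos, start_offset):
            return pos
    return None


print(match_once("abc", 0))  # 1: the lookbehind fails at 0, succeeds at 1
print(match_once("abc", 2))  # 3: fails at 2 (\G is not at 1), succeeds at 3
```

Position 1 corresponds to the first match in the output above (with "a" captured) and position 3 to the second (with "c" captured).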

The retry at position 1 succeeds: one character back is position 0, the
starting offset, so \G is true and "." matches "a". Normally, the next
call to pcre2_match() from pcre2test would pass the subject with a new
starting offset, set to the end of the previous match. However, because
the match was for an empty string, pcre2test moves on one character, so
the starting offset is now 2. This time the lookbehind can step back,
but it lands at position 1, which is not the starting offset, so \G
fails. Once again the match moves on within the PCRE2 library, finding
a match one character later, at position 3 (capturing "c").

In both cases, it finds a match one character later than the starting 
offset. That is why "-" is inserted after "a" and after "c", but not
after "b".
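Extending the same rough model to the global loop reproduces the "2: a-bc-" output from the original report. As before, this is a hypothetical Python simulation with made-up function names; it models only the "move on one character after an empty match" step described above and ignores everything else a real global match or substitution does.

```python
def match_once(subject, start_offset):
    # Hypothetical model of one pcre2_match() call for /(?<=\G.)/:
    # the lookbehind succeeds at pos only if one character back is the
    # starting offset (where \G is true) and "." matches that character.
    for pos in range(start_offset, len(subject) + 1):
        if pos >= 1 and pos - 1 == start_offset and subject[pos - 1] != "\n":
            return pos
    return None


def global_replace(subject, replacement):
    # Simplified model of the global loop: every match of this pattern is
    # empty, so after each match the next starting offset is one character
    # past it. Returns (number of replacements, new string).
    pieces = []
    copied_up_to = 0
    offset = 0
    count = 0
    while offset <= len(subject):
        pos = match_once(subject, offset)
        if pos is None:
            break
        pieces.append(subject[copied_up_to:pos])
        pieces.append(replacement)
        copied_up_to = pos
        count += 1
        offset = pos + 1  # empty match: move on one character
    pieces.append(subject[copied_up_to:])
    return count, "".join(pieces)


count, result = global_replace("abc", "-")
print(f"{count}: {result}")  # 2: a-bc-
```

The starting offsets visited are 0 and 2; offset 1 is skipped because the first match ends there and is empty, which is exactly where the "missing" "-" after "b" would have come from.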

Philip

-- 
Philip Hazel

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev

