On 2019-07-09 13:53, ph10 wrote:
> On Mon, 8 Jul 2019, ND via Pcre-dev wrote:
>> And if we disregard Perl's bugs, then it seems (*COMMIT) in Perl works
>> in the following manner:
>>
>> 1. Backtracking can't move to the left of COMMIT (this is PCRE behaviour too).
>> 2. If COMMIT occurs, then the match can't be advanced to any other position
>>    of the subject, no matter whether other backtracking control verbs occur
>>    after COMMIT or whether COMMIT occurs in an atomic group, a negative
>>    lookaround, etc. (this is not implemented by PCRE).
> There is also a difference in the way Perl handles repeated groups. Consider
>
> In Perl, the group repeat matches "abcd", but when it then does not
> match "c", it unwinds complete repetitions of the group. In PCRE2,
> there is a backtrack onto *COMMIT, so it fails. It looks like Perl handles
> *COMMIT somehow differently to normal backtracks, because it does do
> ordinary backtracks into repeated groups:


No. I don't think Perl handles (*COMMIT) any differently. Perl can match a pattern A*B in a number of ways. The general method is CURLYX-WHILEM, but in some situations optimizations are applied. For example, the optimized method CURLYM is used when A is a group of constant length without captures. CURLYM has a buggy implementation that does not take the effect of (*COMMIT) into account.

Perl matches the patterns
/\A(?:.(*COMMIT))*c/
/\A(?:(*COMMIT).)*c/
using CURLYM, so it gets both cases wrong, as the Perl debug output shows. In the second case the result just happens to coincide with the expected one.

The pattern
/\A(?:.{1,2}(*COMMIT))*c/
is matched with CURLYX-WHILEM, whose implementation does not have this bug.


I think the Perl developers should either fix the implementation of CURLYM or match groups that contain (*COMMIT) with CURLYX-WHILEM.


What can PCRE do?
PCRE can either do nothing or change to processing (*COMMIT) the way Perl means it:
1. Once COMMIT occurs, backtracking can't move into the part of the pattern to its left.
2. Once COMMIT occurs, the start position can't be advanced.
These two principles hold no matter whether other backtracking control verbs
occur after COMMIT or whether COMMIT occurs in an atomic group, a negative
lookaround, etc.

PCRE does not currently implement them strictly.
For example, consider this pattern:

PCRE2 version 10.33 2019-04-16
/.?(?!(*COMMIT)x)a/
abc
 0: a

The Perl way is "there can be no backtracking left of COMMIT", so the engine can't backtrack into ".?" and the Perl result will be "no match".
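
To make the difference concrete, here is a toy backtracking matcher in Python. It is a hypothetical sketch, not how PCRE2 or Perl are implemented: the node format and the `perl_like` switch are invented for illustration. With `perl_like=False`, backtracking onto (*COMMIT) inside a negative lookahead only makes the assertion body fail, which reproduces the pcre2test result above. With `perl_like=True`, the abort escapes the assertion and the whole match fails without advancing the start position, as described for Perl. It only models the case where the assertion body backtracks onto COMMIT.

```python
class CommitAbort(Exception):
    """Backtracking reached a (*COMMIT): the whole match attempt must fail."""


def search(nodes, s, perl_like):
    """Toy backtracking matcher; returns (start, end) or None.

    Hypothetical mini-AST (not PCRE's or Perl's internals):
      ("char", c)        literal character
      ("any",)           any single character, like "."
      ("opt", node)      greedy optional, like "node?"
      ("commit",)        the (*COMMIT) verb
      ("neglook", nodes) negative lookahead, like "(?!...)"
    """

    def m(seq, i, pos, k):
        # Match seq[i:] at pos, then hand the end position to continuation k.
        if i == len(seq):
            return k(pos)
        node = seq[i]
        kind = node[0]
        if kind == "char":
            if pos < len(s) and s[pos] == node[1]:
                return m(seq, i + 1, pos + 1, k)
            return None
        if kind == "any":
            if pos < len(s):
                return m(seq, i + 1, pos + 1, k)
            return None
        if kind == "opt":
            # Greedy: try one occurrence first, then zero.
            r = m([node[1]], 0, pos, lambda p: m(seq, i + 1, p, k))
            if r is not None:
                return r
            return m(seq, i + 1, pos, k)
        if kind == "commit":
            r = m(seq, i + 1, pos, k)
            if r is None:
                # Backtracking onto (*COMMIT): give up entirely.
                raise CommitAbort()
            return r
        if kind == "neglook":
            try:
                body = m(node[1], 0, pos, lambda p: p)
            except CommitAbort:
                if perl_like:
                    raise  # Perl-like: the abort escapes the assertion
                body = None  # PCRE-like: only the assertion body fails
            if body is not None:
                return None  # body matched, so (?!...) fails
            return m(seq, i + 1, pos, k)
        raise ValueError(f"unknown node {node!r}")

    for start in range(len(s) + 1):
        try:
            end = m(nodes, 0, start, lambda p: p)
        except CommitAbort:
            return None  # COMMIT forbids advancing the start position
        if end is not None:
            return (start, end)
    return None


if __name__ == "__main__":
    # /.?(?!(*COMMIT)x)a/ against "abc"
    pat = [("opt", ("any",)), ("neglook", [("commit",), ("char", "x")]), ("char", "a")]
    print(search(pat, "abc", perl_like=False))  # (0, 1): matches "a", like PCRE2
    print(search(pat, "abc", perl_like=True))   # None: "no match"
```

In the PCRE-like run, ".?" first takes "a", the assertion at offset 1 is confined, "a" fails against "b", and the engine backtracks into ".?" to match at offset 0. In the Perl-like run, the first backtrack onto COMMIT ends the whole search.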

--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
