Re: [pcre-dev] Capture not reset inside recursion
Hi ND,

On Sun, 6 Jun 2021 at 16:09, ND via Pcre-dev wrote:
>
> On 2021-06-06 05:53, Zoltán Herczeg wrote:
> > ND I think you have found a pretty nice Perl bug, maybe you could report
> > it to them.
>
> Zoltan, thank you for the great investigation.
> Now I am sure it looks like a Perl bug.
>
> Everybody, feel free to report it. My English is bad and I have great
> difficulty with reporting and further conversation.

I've done it here: https://github.com/Perl/perl5/issues/18865

Kudos to you and Zoltán for the analysis, and thank you very much for
contributing to Perl and PCRE :)

Cheers,
--
Giuseppe D'Angelo

--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
Re: [pcre-dev] Capture not reset inside recursion
On 2021-06-06 05:53, Zoltán Herczeg wrote:
> ND I think you have found a pretty nice Perl bug, maybe you could report
> it to them.

Zoltan, thank you for the great investigation.
Now I am sure it looks like a Perl bug.

Everybody, feel free to report it. My English is bad and I have great
difficulty with reporting and further conversation.
Re: [pcre-dev] Capture not reset inside recursion
I agree with Zoltan. I do not think this is a bug.

Regards,
Philip

On Sat, 5 Jun 2021 at 23:43, ND via Pcre-dev wrote:
>
> Here is pcretest listing:
>
> PCRE2 version 10.35 2020-05-09
> /(?:(a)?\1)+/
> aaa
> 0: aaa
>
> Expected result:
> 0: aa
>
> Perl result:
> 0: aa
Re: [pcre-dev] Capture not reset inside recursion
I did more investigation:

Perl:
/(?:(?:(a)b)?\1)+/ matches abaa
/(?:(?:(ab))?\1)+/ does not match ababab

These pattern / input pairs match in PCRE2.

I am pretty sure (?:(P))? is rewritten to ((?:P)?) in Perl, which is valid in some cases, but not in all cases.

ND I think you have found a pretty nice Perl bug, maybe you could report it to them.

Regards,
Zoltan

Original message
From: Zoltán Herczeg <hzmes...@freemail.hu>
Date: 6 June 2021 07:21:30
Subject: Re: [pcre-dev] Capture not reset inside recursion
To: Pcre-dev@exim.org <nad...@mail.ru>

The title is misleading, that feature is a JavaScript thing:

/(?:(a)b|\1)+/ matches aba in Perl, but not in JavaScript.

Anyway, it looks like the problem here is that ()? clears the capturing bracket in Perl when the empty case is selected, while PCRE2 restores its previous value.

Matching /(?:(a)??b)+/ against abb also shows this difference: the capturing bracket is empty in Perl, while it is set to a in PCRE2.

Even more interesting is that /(?:(?:(a))??\1)+/ only matches aa as well, although the body of the ?? should not be matched in the second iteration.

Let's do some debugging:

Match /(?:(?{ print "<$1>" })(?:(a))??(?{ print "[$1]" })\1)+/ to aaa
Output: <>[][a]<a>[][a]

In the second iteration, the capturing bracket contains a before the ?? is executed, and is reset to nothing after.

You will not believe this, but /(?:(?:(?{ print "!" })(a))?\1)+/ matches aaa, as in PCRE2. The code block should have zero effect on the matching, yet it disables something (probably an optimization) and the pattern then works as expected.

Is this a Perl bug?

Regards,
Zoltan

Original message
From: ND via Pcre-dev <pcre-dev@exim.org>
Date: 6 June 2021 00:44:08
Subject: [pcre-dev] Capture not reset inside recursion
To: Pcre-dev@exim.org

Here is pcretest listing:

PCRE2 version 10.35 2020-05-09
/(?:(a)?\1)+/
aaa
0: aaa

Expected result:
0: aa

Perl result:
0: aa
Re: [pcre-dev] Capture not reset inside recursion
The title is misleading, that feature is a JavaScript thing:

/(?:(a)b|\1)+/ matches aba in Perl, but not in JavaScript.

Anyway, it looks like the problem here is that ()? clears the capturing bracket in Perl when the empty case is selected, while PCRE2 restores its previous value.

Matching /(?:(a)??b)+/ against abb also shows this difference: the capturing bracket is empty in Perl, while it is set to a in PCRE2.

Even more interesting is that /(?:(?:(a))??\1)+/ only matches aa as well, although the body of the ?? should not be matched in the second iteration.

Let's do some debugging:

Match /(?:(?{ print "<$1>" })(?:(a))??(?{ print "[$1]" })\1)+/ to aaa
Output: <>[][a]<a>[][a]

In the second iteration, the capturing bracket contains a before the ?? is executed, and is reset to nothing after.

You will not believe this, but /(?:(?:(?{ print "!" })(a))?\1)+/ matches aaa, as in PCRE2. The code block should have zero effect on the matching, yet it disables something (probably an optimization) and the pattern then works as expected.

Is this a Perl bug?

Regards,
Zoltan

Original message
From: ND via Pcre-dev <pcre-dev@exim.org>
Date: 6 June 2021 00:44:08
Subject: [pcre-dev] Capture not reset inside recursion
To: Pcre-dev@exim.org

Here is pcretest listing:

PCRE2 version 10.35 2020-05-09
/(?:(a)?\1)+/
aaa
0: aaa

Expected result:
0: aa

Perl result:
0: aa
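For comparison with the JavaScript behaviour mentioned in this message, here is a minimal TypeScript sketch that runs two of the thread's patterns under ECMAScript rules. The helper name `probe` is hypothetical and not part of any library. The sketch assumes a spec-conforming engine, where the captures inside a repeated group are cleared at the start of every iteration and an iteration that matches the empty string is rejected. Under those assumptions both patterns should stop after the first iteration. It says nothing about Perl or PCRE2 themselves.

```typescript
// Hypothetical helper: returns the overall match and capture group 1,
// or null when the pattern does not match at all.
function probe(
  pattern: RegExp,
  input: string
): { match: string; group1: string | undefined } | null {
  const m = pattern.exec(input);
  return m === null ? null : { match: m[0], group1: m[1] };
}

// Pattern quoted in this message: in the second iteration group 1 has been
// cleared, so \1 can only match the empty string, and that empty iteration
// is discarded by the engine.
const alternation = probe(/(?:(a)b|\1)+/, "aba");

// Pattern from the original report, here under ECMAScript rules only.
const optional = probe(/(?:(a)?\1)+/, "aaa");

console.log(alternation); // overall match expected to be "ab", not "aba"
console.log(optional);    // overall match expected to be "aa", not "aaa"
```

Running it under Node.js or in a browser console gives a third data point next to the Perl and PCRE2 results discussed in the thread.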
[pcre-dev] Capture not reset inside recursion
Here is pcretest listing:

PCRE2 version 10.35 2020-05-09
/(?:(a)?\1)+/
aaa
0: aaa

Expected result:
0: aa

Perl result:
0: aa