[Bug libstdc++/85472] Regex match bug

timshen at gcc dot gnu.org Wed, 25 Apr 2018 10:38:15 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472


--- Comment #9 from Tim Shen <timshen at gcc dot gnu.org> ---
Ah with the example it's clear, thanks!

> The last line gives for #1 the sub-string "z" , and for #2 "aacbbbcac".

This is not what ECMAScript produces either. For capture #2, ECMAScript
produces "ac", the last match of the loop.

Think about the difference between

  (z)((a+)?(b+)?(c))*

and

  (z)((?:(a+)?(b+)?(c))*)

Your #2 seems to capture the second case, which is different.
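
To make the difference concrete, here is a minimal sketch using std::regex. The helper name group_text is made up for illustration, and the input "zaacbbbcac" is an assumption inferred from the captures quoted above ("z" followed by "aacbbbcac").

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not part of any library): returns the text of
// capture group `n` after a full match, or an empty string if the
// overall match fails or the group did not participate.
std::string group_text(const std::string& pattern,
                       const std::string& text,
                       std::size_t n) {
    const std::regex re(pattern, std::regex::ECMAScript);
    std::smatch m;
    if (!std::regex_match(text, m, re)) {
        return std::string();
    }
    assert(n < m.size());  // group index must exist in the pattern
    return m[n].str();
}
```

With the assumed input, group_text("(z)((a+)?(b+)?(c))*", "zaacbbbcac", 2) should give "ac" (only the last iteration of the loop), while group_text("(z)((?:(a+)?(b+)?(c))*)", "zaacbbbcac", 2) should give "aacbbbcac" (the whole repeated span).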

> For
> #3 "a", and for #5 "c". But #4 is missing, indication there is no match. So
> there might be problem here, as there are earlier matches:
> 
> Perhaps the intent is that it should be implemented as a loop, only
> retaining the last #4, 

That's what the implementations (Boost, libstdc++, Python) actually do, but it
is not ECMAScript's intention. ECMAScript's intention is to leave #4 undefined
(*not* retaining the last non-empty #4), because in the last iteration of the
loop, #4 (b+)? doesn't match any sub-string.
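
One way to observe which behaviour a given implementation has is to look at the `matched` flag of the sub_match. The sketch below assumes the same input string "zaacbbbcac" (inferred from the quoted captures); GroupResult and group_result are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type: whether the group participated in the
// match, and the text it captured if it did.
struct GroupResult {
    bool matched;
    std::string text;
};

// Hypothetical helper: reports the state of capture group `n` after a
// full match. If the overall match fails, matched is false.
GroupResult group_result(const std::string& pattern,
                         const std::string& text,
                         std::size_t n) {
    const std::regex re(pattern, std::regex::ECMAScript);
    std::smatch m;
    if (!std::regex_match(text, m, re)) {
        return GroupResult{false, std::string()};
    }
    assert(n < m.size());  // group index must exist in the pattern
    return GroupResult{m[n].matched, m[n].str()};
}
```

Calling group_result("(z)((a+)?(b+)?(c))*", "zaacbbbcac", 4) distinguishes the two behaviours: an implementation that retains the capture from an earlier iteration would report #4 as matched (presumably with "bbb"), whereas one that follows the ECMAScript rule described above would report it as unmatched. #3 and #5 should come out as "a" and "c" either way.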

> but the problem is that that is not what the
> underlying theory says.

I'm not sure there is any theory around capturing groups. If we are going to
create one, be aware that there are multiple plausible definitions.
