https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472

--- Comment #9 from Tim Shen <timshen at gcc dot gnu.org> ---
Ah, with the example it's clear, thanks!

> The last line gives for #1 the sub-string "z" , and for #2 "aacbbbcac".

This is not what ECMAScript produces either. For capture #2, ECMAScript
produces "ac", the last match of the loop.

Think about the difference between

  (z)((a+)?(b+)?(c))*

and

  (z)((?:(a+)?(b+)?(c))*)

Your #2 seems to capture the second case, which is different.
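
To make the difference concrete, here is a small sketch using Python's re
module (chosen only because it is easy to run; it is not ECMAScript). The
input "zaacbbbcac" is an assumption, inferred from the sub-strings quoted
above.

```python
import re

# Assumed input, reconstructed from the quoted captures "z" and "aacbbbcac".
text = "zaacbbbcac"

# Capture #2 sits inside the loop: it is overwritten on every iteration.
looped = re.fullmatch(r"(z)((a+)?(b+)?(c))*", text)

# Capture #2 wraps the whole loop: it spans everything the loop consumed.
wrapped = re.fullmatch(r"(z)((?:(a+)?(b+)?(c))*)", text)

print(looped.group(2))   # text of the last iteration only
print(wrapped.group(2))  # text of all iterations together
```

For #2 the first pattern yields "ac" and the second yields "aacbbbcac".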

> For
> #3 "a", and for #5 "c". But #4 is missing, indication there is no match. So
> there might be problem here, as there are earlier matches:
> 
> Perhaps the intent is that it should be implemented as a loop, only
> retaining the last #4, 

That's what the implementations (Boost, libstdc++, Python) actually do, but
it's not ECMAScript's intention. ECMAScript's intention is to leave #4 undefined
(*not* retaining the last non-empty #4), because in the last iteration of the
loop, #4 (b+)? doesn't match any sub-string.
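
A quick way to observe the retaining behavior is the sketch below, again
with Python's re module and the same assumed input "zaacbbbcac". It only
shows what Python does; it says nothing about what ECMAScript specifies.

```python
import re

# Loop iterations over the assumed input: "aac", "bbbc", "ac".
m = re.fullmatch(r"(z)((a+)?(b+)?(c))*", "zaacbbbcac")

# (b+)? does not participate in the last iteration ("ac"), yet Python
# still reports the value it captured in the earlier "bbbc" iteration.
print(m.group(3), m.group(4), m.group(5))
```

Here Python reports "bbb" for #4, whereas under the ECMAScript reading
described above #4 would be undefined.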

> but the problem is that that is not what the
> underlying theory says.

I'm not sure if there is any theory around capturing groups. If we are about to
create one, be aware that there are multiple plausible definitions.
