https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85472

--- Comment #10 from Hans Åberg <haberg-1 at telia dot com> ---
(In reply to Tim Shen from comment #9)
> Ah with the example it's clear, thanks!

You are welcome.

> > The last line gives for #1 the sub-string "z" , and for #2 "aacbbbcac".
> 
> This is not what ECMAScript produces either. for capture #2, ECMAScriptn
> produces "ac", the last match of the loop.
> 
> Think about the difference between
> 
>   (z)((a+)?(b+)?(c))*
> 
> and
> 
>   (z)((?:(a+)?(b+)?(c))*)
> 
> Your #2 seems to capture the second case, which is different.

Yes, I thought about this difference in an earlier version that marked the
start and last states only, but switched to this, because I wanted to ignore
empty matches:

I think though that one might mark the transitions with action numbers and
capture it that way (see below).

> > For
> > #3 "a", and for #5 "c". But #4 is missing, indication there is no match. So
> > there might be problem here, as there are earlier matches:
> > 
> > Perhaps the intent is that it should be implemented as a loop, only
> > retaining the last #4, 
> 
> That's what the implementations (boost, libstdc++, python) actually do.
> That's not ECMAScript's intention. ECMAScript's intention is to leave #4
> undefined (*not* retaining the last non-empty #4), as in the last iteration
> of the loop, #4 (b+)? doesn't match any sub-string.

The problem is that #4 has an earlier capture, making it impossible to see that
it is left undefined later. So it would be better to capture it as an empty
string here:

In the example I gave, if the transitions are marked (ignoring #2)
  1: {a ↦ {1#3, 2#3, 3#3$4}};
  2: {b ↦ {2#4, 3#4}};
  3: {c ↦ {3#5, 2#5, 1#5, ∅#2}.
Then in the "ac" part, 'a' in state 1 goes to 3 and gets a #3 match 'a' plus an
empty which I have written $4. That is the idea, though.

> > but the problem is that that is not what the
> > underlying theory says.
> 
> I'm not sure if there is any theory around caputring groups. If we are about
> to create one, be aware that there are multiple plausible definitions.

I have though not implemented the counted repetitions A{m, n}, which I think
usually are implemented as loops, so I do not know how that fits into the
picture.

Reply via email to