On Sun, 22 Jul 2018, ND via Pcre-dev wrote:

> Why PCRE2_INFO_NAMETABLE entries can't have same number? I see no drawbacks of
> this.

Consider   /(?| (?<A>foo) | (?<B>bar) )/x

The table will tell you "group 1 is called A" and "group 1 is called B". 
What happens if you match the pattern with "foo" and then ask "what is 
the value of group B?". The table will tell you that group B is group 1, 
and group 1 has matched "foo". But this is not right.

At least, I don't think it's right! I would expect a group identified as 
B to be "unset".
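
To make the problem concrete, here is a minimal sketch in Python that models the situation. It is not PCRE2 code: the dictionary only stands in for the name table, the list stands in for the captured values after matching "foo", and the function name lookup_by_name is made up for illustration.

```python
# Hypothetical model for illustration only; this is not the PCRE2 API.
# The name table as it would look if both names could map to group 1.
name_table = {"A": 1, "B": 1}

# Captured values after matching "foo": index 0 is the whole match,
# index 1 is group 1, which was set by the (?<A>foo) alternative.
captures = ["foo", "foo"]


def lookup_by_name(name):
    """Resolve a name to its group number, then return that group's value."""
    number = name_table[name]
    return captures[number]


print(lookup_by_name("A"))  # foo
print(lookup_by_name("B"))  # foo, although the (?<B>bar) alternative never matched
```

Because the lookup goes through the number, nothing records which alternative actually set group 1, so B cannot be reported as unset.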

BUT.....

I had never experimented with Perl on this, but I have now done so, and 
Perl (v5.26.2) behaves exactly as I have described, reporting "foo" for B:

$ perl -e 'if (foo =~ /(?| (?<A>foo) | (?<B>bar) )/x) { print "yes >$& A=$+{A} B=$+{B} 1=$1<\n"; } else { print "no \n"; }'
yes >foo A=foo B=foo 1=foo<

Oh dear. Looks like Perl implemented named groups in a similar way to 
PCRE, but allowed different names for the same number. I think it's very
confusing.

AHA! I have found this in the Perl documentation:

  Be careful when using the branch reset pattern in combination with
  named captures. Named captures are implemented as being aliases to
  numbered groups holding the captures, and that interferes with the
  implementation of the branch reset pattern. If you are using named
  captures in a branch reset pattern, it's best to use the same
  names, in the same order, in each of the alternations:

     /(?|  (?<a> x ) (?<b> y )
        |  (?<a> z ) (?<b> w )) /x

  Not doing so may lead to surprises:

        "12" =~ /(?| (?<a> \d+ ) | (?<b> \D+))/x;
        say $+{a};    # Prints '12'
        say $+{b};    # *Also* prints '12'.

  The problem here is that both the group named "a" and the group
  named "b" are aliases for the group belonging to $1.

I think it is better to avoid the "surprises" by not allowing different
named aliases for the same number.
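
As a rough sketch of what such a restriction amounts to, the following Python models a check over (name, number) pairs. It is an illustration under assumptions, not how PCRE2 is implemented: the function name check_names and the error text are invented here, and the case of one name used for several different numbers is deliberately left out.

```python
# Hypothetical sketch; the function name and the message are invented here.
def check_names(named_groups):
    """Return a number -> name mapping, rejecting a second name for a number."""
    name_for_number = {}
    for name, number in named_groups:
        # setdefault stores the name on first sight, else returns the old one.
        existing = name_for_number.setdefault(number, name)
        if existing != name:
            raise ValueError(
                f"group {number} is already named {existing!r}, "
                f"cannot also be named {name!r}"
            )
    return name_for_number


# Same names in the same order in each alternative: accepted.
print(check_names([("a", 1), ("b", 2), ("a", 1), ("b", 2)]))

# Different names for the same number: rejected.
try:
    check_names([("A", 1), ("B", 1)])
except ValueError as err:
    print("rejected:", err)
```

The first call corresponds to the recommended form in the Perl documentation quoted above; the second corresponds to the (?<A>foo) | (?<B>bar) pattern.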

Philip

-- 
Philip Hazel
