Hi Ivan,
The code that handles group names was added later. So "historically" those cases
trigger "Unknown look-behind group" when the first character after "<" is not
"=" or "!". With the addition of group name support, it's actually hard to
say which one is more accurate: incorrect group name or incorrect
look-behind. Sure, with a trailing ">" it might be more desirable to lean toward
group name.
It's definitely a bug not to check whether or not the first char is
alpha for \\k<.
I'm fine with the proposed change.
Thanks,
Sherman
On 2/8/18, 8:32 PM, Ivan Gerasimov wrote:
Hello!
A capturing group name can be used in a regular expression in two
contexts: when introducing a group, (?<name>...), or when referring to it,
\k<name>.
If the name is invalid (i.e. does not start with a Latin letter, or
contains wrong chars) then we may see different error messages, some
of which look confusing.
Here are examples of the messages produced by the current JDK:
Unknown look-behind group near index 3
(?<>)
^
named capturing group is missing trailing '>' near index 4
\\k<>
^
Unknown look-behind group near index 4
(?<.>)
^
(named capturing group <.> does not exit near index 4
\\k<.>
^
named capturing group is missing trailing '>' near index 4
(?)
^
named capturing group is missing trailing '>' near index 4
\\k
^
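The messages above can be reproduced with a small harness along these lines. This is only a sketch: the class name GroupNameErrors and the helper describe() are made up for illustration, and the exact wording printed depends on the JDK version in use.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class GroupNameErrors {
    // Hypothetical helper: returns the syntax error description for a regex,
    // or null if the regex compiles without error.
    static String describe(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // Patterns with an empty or invalid group name, in both contexts.
        String[] samples = { "(?<>)", "\\k<>", "(?<.>)", "\\k<.>" };
        for (String s : samples) {
            System.out.println(s + " -> " + describe(s));
        }
    }
}
```

Using getDescription() rather than getMessage() leaves out the index and the caret line, which makes the messages easier to compare side by side.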
In particular, this inconsistency arises because the internal
Pattern.groupname() function lacks a check for the very first
character of the name.
As a result, when \k<name> is parsed, the first char is always accepted, no
matter what it is.
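To illustrate the kind of check that is missing, here is a minimal standalone sketch of group name validation. It is not the code from the webrev or from Pattern.java: the class GroupNameCheck and both methods are hypothetical, and it assumes only the documented rule that a name starts with a Latin letter and continues with Latin letters or digits.

```java
public class GroupNameCheck {
    // Hypothetical helper: true for A-Z and a-z only.
    static boolean isLatinLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Hypothetical helper: validates a whole group name.
    static boolean isValidGroupName(String name) {
        // The first character must be a letter; this is the check
        // described above as missing for the very first char.
        if (name.isEmpty() || !isLatinLetter(name.charAt(0))) {
            return false;
        }
        // Remaining characters may be letters or digits.
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean isDigit = c >= '0' && c <= '9';
            if (!isLatinLetter(c) && !isDigit) {
                return false;
            }
        }
        return true;
    }
}
```

With a check like this applied in both contexts, an empty name and a name such as "." would be rejected for the same reason whether they appear in a group definition or in a back-reference.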
Some cleanup was also done along the way.
Would you please help review the fix?
BUGURL: https://bugs.openjdk.java.net/browse/JDK-8197462
WEBREV: http://cr.openjdk.java.net/~igerasim/8197462/00/webrev/
Thanks in advance!