Re: RFR 8197462 : Inconsistent exception messages for invalid capturing group names

2018-02-12 Thread Xueming Shen

Hi Ivan,

The code that handles group names was added later. So "historically" those cases
trigger "Unknown look-behind group" when the first character after "<" is not
"=" or "!". With the addition of group name support, it's actually hard to
say which one is more accurate: an incorrect group name or an incorrect
look-behind. Admittedly, with a trailing ">" present, it makes more sense to
lean toward reporting a group name error.
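To illustrate the ambiguity: after "(?<", only "=" (positive look-behind, (?<=X)) and "!" (negative look-behind, (?<!X)) select a look-behind, and anything else can only be the start of a named group (?<name>X). The following is a hypothetical sketch, not the actual java.util.regex.Pattern source; the class and method names are made up for illustration.

```java
// Hypothetical sketch: how a parser might classify the character after "(?<".
// This is NOT the real Pattern implementation; names are illustrative only.
public class LookBehindOrNamedGroup {

    enum Kind { POSITIVE_LOOKBEHIND, NEGATIVE_LOOKBEHIND, NAMED_GROUP }

    /** Classifies a construct given the character immediately after "(?<". */
    static Kind classify(char next) {
        switch (next) {
            case '=': return Kind.POSITIVE_LOOKBEHIND;  // (?<=X)
            case '!': return Kind.NEGATIVE_LOOKBEHIND;  // (?<!X)
            default:  return Kind.NAMED_GROUP;          // (?<name>X); the name is validated afterwards
        }
    }

    public static void main(String[] args) {
        for (String p : new String[] {"(?<=a)b", "(?<!a)b", "(?<n>a)", "(?<.>a)"}) {
            // charAt(3) is the character right after "(?<"
            System.out.println(p + " -> " + classify(p.charAt(3)));
        }
    }
}
```

Under this view "(?<.>a)" is a named group with a bad name rather than an unknown look-behind, which is why a group-name error reads more naturally there.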

It's definitely a bug not to check whether the first char is a letter
for \\k<.


I'm fine with the proposed change.

Thanks,
Sherman


On 2/8/18, 8:32 PM, Ivan Gerasimov wrote:

RFR 8197462 : Inconsistent exception messages for invalid capturing group names

2018-02-08 Thread Ivan Gerasimov

Hello!

A capturing group name can be used in a regular expression in two
contexts: when introducing a group, (?<name>...), or when referring to it,
\k<name>.
If the name is invalid (i.e. it does not start with a Latin letter, or
contains disallowed characters), we may see different error messages, some of
which look confusing.
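For reference, here are both contexts with a valid name, using the public java.util.regex API. This is a minimal sketch; the group name "word" and the input strings are arbitrary choices for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupDemo {
    public static void main(String[] args) {
        // Context 1: introduce a group named "word" with (?<word>...)
        // Context 2: refer back to it with \k<word> (written "\\k<word>" in a Java literal)
        Pattern p = Pattern.compile("(?<word>[a-z]+) \\k<word>");
        Matcher m = p.matcher("hello hello");
        if (m.matches()) {
            System.out.println(m.group("word")); // prints "hello"
        }
    }
}
```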


Here are examples of the messages produced by the current JDK:
Unknown look-behind group near index 3
(?<>)
   ^

named capturing group is missing trailing '>' near index 4
\\k<>
    ^

Unknown look-behind group near index 4
(?<.>)
    ^

named capturing group <.> does not exist near index 4
\\k<.>
    ^

named capturing group is missing trailing '>' near index 4
(?)
    ^

named capturing group is missing trailing '>' near index 4
\\k
    ^
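The messages for the fully recoverable examples above can be reproduced with a small harness like the one below. It is a sketch: the exact description text depends on the JDK release (and is expected to change with this fix), so it only prints what PatternSyntaxException reports via getDescription() and getIndex().

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class InvalidGroupNameMessages {
    // Regex sources from the examples list; "\\k" in a Java literal is the regex \k.
    static final String[] PATTERNS = { "(?<>)", "\\k<>", "(?<.>)", "\\k<.>" };

    /** Returns "description near index N", or null if the pattern compiled. */
    static String describe(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription() + " near index " + e.getIndex();
        }
    }

    public static void main(String[] args) {
        for (String regex : PATTERNS) {
            // Message wording varies between JDK versions.
            System.out.println(regex + "  =>  " + describe(regex));
        }
    }
}
```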

In particular, this inconsistency is caused by the internal
Pattern.groupname() function lacking a check for the very first character
of the name.
As a result, when \k<name> is parsed, the first char is always accepted, no
matter what it is.
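The missing check can be sketched as a standalone validator. This is a hypothetical helper (isValidGroupName is not a JDK method and is not the code in the webrev), written against the naming rule in the Pattern javadoc: the first character must be an ASCII letter, and the remaining characters ASCII letters or digits.

```java
// Hypothetical validator for capturing-group names; not part of the JDK.
public class GroupNameCheck {

    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /** Returns true if name is a valid capturing-group name. */
    static boolean isValidGroupName(String name) {
        // The first character must be a letter: this is the check that is
        // skipped when the name after \k< is parsed.
        if (name.isEmpty() || !isAsciiLetter(name.charAt(0))) {
            return false;
        }
        // Remaining characters: ASCII letters or digits only.
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (!isAsciiLetter(c) && !isAsciiDigit(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String n : new String[] {"word", "g1", "", ".", "1g", "a.b"}) {
            System.out.println("\"" + n + "\" valid? " + isValidGroupName(n));
        }
    }
}
```

Applying the same check in both contexts is what lets (?<.>) and \k<.> fail with the same kind of message.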


Some cleanup was also done along the way.

Would you please help review the fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8197462
WEBREV: http://cr.openjdk.java.net/~igerasim/8197462/00/webrev/

Thanks in advance!

--
With kind regards,
Ivan Gerasimov