Thank you, Roger, for reviewing the CSR and the release note!
On 2/11/20 12:49 PM, Roger Riggs wrote:
Hi Ivan,
Will this have enough of a compatibility concern to warrant a CSR?
It will change the behavior of these cases.
In the RegExTest, the failures should print which case is failing.
(Line 4961, 4990).
I agree that many testcases in RegExTest could provide better
diagnostics in case of a failure.
I think it may be done as a separate cleanup.
In the added testcase I made sure that both the input string and the
pattern are printed upon failure.
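To illustrate the kind of diagnostics meant here, below is a minimal
sketch of a check that reports both the pattern and the input when it
fails. The class RegexCheck and the method check are hypothetical names
used only for this illustration; this is not the code in RegExTest.

```java
import java.util.regex.Pattern;

class RegexCheck {
    // Hypothetical helper (not the actual RegExTest code): compiles the
    // regex with the given flags and verifies that matches() on the whole
    // input returns the expected result.
    static void check(String regex, int flags, String input, boolean expected) {
        boolean actual = Pattern.compile(regex, flags).matcher(input).matches();
        if (actual != expected) {
            // Include both the pattern and the input, so that the failing
            // case can be identified without looking up a line number.
            throw new RuntimeException("Pattern \"" + regex + "\" (flags=" + flags
                    + ") on input \"" + input + "\": expected matches()=" + expected
                    + " but got " + actual);
        }
    }
}
```

A call such as RegexCheck.check("a+", 0, "aaa", true) returns silently,
while a mismatch throws an exception whose message names the failing
pattern and input.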
With kind regards,
Ivan
Regards, Roger
On 2/7/20 3:05 PM, Ivan Gerasimov wrote:
Gentle ping.
I had to rebase the fix, as the code has diverged since the RFR was
sent out 10 months ago.
Also, the test was slightly modified to cover more cases.
BUGURL: https://bugs.openjdk.java.net/browse/JDK-8214245
WEBREV: http://cr.openjdk.java.net/~igerasim/8214245/01/webrev/
Thanks in advance to whoever volunteers to review the fix!
With kind regards,
Ivan
On 4/21/19 7:50 PM, Ivan Gerasimov wrote:
Hello!
It turns out that a case-insensitive j.u.regex.Pattern still pays
attention to character case when certain char classes are used.
For example \p{IsLowerCase}, \p{IsUpperCase} and \p{IsTitleCase}
continue to recognize only lower, upper and title case characters,
even in case-insensitive context.
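A small program along the following lines can be used to observe the
behavior. It is only a sketch: the class name CaseInsensitivePropertyDemo
and the helper matches are made up for this illustration, and the values
in the case-insensitive column depend on whether the JDK running it
contains the fix, so no expected output is given.

```java
import java.util.regex.Pattern;

class CaseInsensitivePropertyDemo {
    // True if the entire input matches the regex compiled with the given flags.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] classes = {"\\p{IsLowerCase}", "\\p{IsUpperCase}", "\\p{IsTitleCase}"};
        // 'a' is lower case, 'A' is upper case, U+01C8 is a title-case digraph.
        String[] inputs = {"a", "A", "\u01C8"};
        int ci = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        for (String cls : classes) {
            for (String input : inputs) {
                // Print the result with and without case-insensitive flags.
                System.out.printf("%s on U+%04X: case-sensitive=%b, case-insensitive=%b%n",
                        cls, input.codePointAt(0),
                        matches(cls, 0, input), matches(cls, ci, input));
            }
        }
    }
}
```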
For example, for POSIX char classes this behavior contradicts this
paragraph:
"""
9.2 Regular Expression General Requirements
...
When a standard utility or function that uses regular expressions
specifies that pattern matching shall be performed without regard to
the case (uppercase or lowercase) of either data or patterns, then
when each character in the string is matched against the pattern,
not only the character, but also its case counterpart (if any),
shall be matched. This definition of case-insensitive processing is
intended to allow matching of multi-character collating elements as
well as characters, as each character in the string is matched using
both its cases.
...
"""
I also checked how Perl deals with such a situation, and yes,
it ignores the case with various \p{} classes when they are used in
a case-insensitive context, so all these tests run fine:
'A' =~ /\p{Lower}/i or die;
'a' =~ /\p{Upper}/i or die;
'A' =~ /\p{gc=Lt}/i or die; # title case
'a' =~ /\p{IsTitlecase}/i or die;
'Lj' =~ /\p{Lower}/i or die; # title-cased digraph
'lj' =~ /\p{Upper}/i or die;
'LJ' =~ /\p{Lt}/i or die;
For reference, here's a lengthy document, describing precise rules
used by Perl to deal with \p{} char classes:
https://perldoc.perl.org/perluniprops.html#Properties-accessible-through-%5cp%7b%7d-and-%5cP%7b%7d
So, for any Lower, Upper or Title case chars in a case-insensitive
context Perl uses the set of "Cased Letters", which is just a combination
of these three categories (aka the "LC" general category).
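In Java terms, membership in that combined set could be sketched as
below. This only illustrates the "LC" idea using Character.getType; the
class CasedLetters and the method isCasedLetter are hypothetical names,
and this is not meant to represent how the patch is implemented.

```java
class CasedLetters {
    // Sketch of the "LC" (Cased Letter) general category: the union of
    // Lu (uppercase), Ll (lowercase) and Lt (titlecase) letters.
    static boolean isCasedLetter(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.UPPERCASE_LETTER:
            case Character.LOWERCASE_LETTER:
            case Character.TITLECASE_LETTER:
                return true;
            default:
                return false;
        }
    }
}
```

With this definition 'A', 'a' and the title-case digraph U+01C8 are all
cased letters, while a digit such as '1' is not.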
Would you please help review the patch?
BUGURL: https://bugs.openjdk.java.net/browse/JDK-8214245
WEBREV: http://cr.openjdk.java.net/~igerasim/8214245/00/webrev/
--
With kind regards,
Ivan Gerasimov