Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

naoto . sato Wed, 15 Jul 2020 11:28:16 -0700

Hi Roger,

Thank you for your review and suggestions.


On 7/15/20 10:56 AM, Roger Riggs wrote:

Hi Naoto,
Given the extra tests in the body of the loop, I think its worth findingor creating
a JMH test for this and checking the performance.


Good point.

With performance in mind, I would try to fall back to the UC/LCconversions only

when the bytes don't match.  See java.util.Arrays.mismatch(byte[], byte[]).

It might even be worth finding the mismatch in the byte arrays before even
starting to look at the characters.


Agree.


There's also an option to assemble 4 bytes at a time and compare the int's.
If they are equal you are ahead of the game.  If not, back off to comparing
the characters and checking for surrogates.  The backoff code will be a bit
messier though.

Sure int comparison would be faster, I am not quite sure that would beworth compared to more complicated recovery code though.

Also, compareToCI and regionMatchesCI could share the implementation ofthe inner loop.


Yes.

If k1 and k2 ever get out of sync, isn't that failed assertion, so whyhave two indexes.

This is for preventing possible case maps where one in BMP and the otherin supplementary, thus the indices could be different.

The loop will have fewer checks against the length of it processes len-1chars
and then have a check if there is a final char to be checked.
it can always know there is another char and can blindly get it.


Maybe, but could be complicated if the last char is a low surrogate.

Anyway, I will come up with a JMH test case and revise the webrev.

Naoto

Regards, Roger


On 7/15/20 12:00 PM, [email protected] wrote:
Hello,

Please review the fix to the following issues:

https://bugs.openjdk.java.net/browse/JDK-8248655
https://bugs.openjdk.java.net/browse/JDK-8248434

The proposed changeset and its CSR are located at:

https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8248664
A bug was filed against SimpleDateFormat (8248434) wherecase-insensitive date format/parse failed in some of the new localesin JDK15. The root cause was that case-insensitiveString.regionMatches() method did not work with supplementarycharacters. The problem is that the method's spec does not expect casemappings of supplementary characters, possibly because it wasoverlooked in the first place, JSR 204 - "Unicode SupplementaryCharacter support". Similar behavior is observed in other twocase-insensitive methods, i.e., compareToIgnoreCase() andequalsIgnoreCase().
The fix is straightforward to compare strings by code point basis,instead of code unit (16bit "char") basis. Technically this changewill introduce a backward incompatibility, but I believe it is anincompatibility to wrong behavior, not true to the meaning of thosemethods' expectations.
Naoto

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

Reply via email to