Joe: This is a defensive approach that I believe has minimal cost.
public static boolean isHighSurrogate(char ch) {
    // Help VM constant-fold; MAX_HIGH_SURROGATE + 1 == MIN_LOW_SURROGATE
    return ch >= MIN_HIGH_SURROGATE && ch < (MAX_HIGH_SURROGATE + 1);
}
> On Jul 15, 2020, at 3:32 PM, [email protected] wrote:
>
> Hi Joe,
>
> Thank you for your review.
>
> On 7/15/20 10:57 AM, Joe Wang wrote:
>> Hi Naoto,
>> In StringUTF16.java, if one char is a high surrogate (isHighSurrogate) and
>> the other is not, you could return early without going through the rest of
>> the process. It is probably not significant, as cp1 and cp2 and/or u1 and u2
>> won't be equal anyway, but it could skip a couple of
>> toCodePoint/toUpperCase/toLowerCase calls.
>
> Yes, that is correct as of now, but it relies on the assumption that case
> mappings do not cross the boundary between the BMP and the supplementary
> planes. I could not find any description stating whether that is guaranteed
> or not, so I took the safe approach.
>
> Naoto
>
>> -Joe
>> On 7/15/20 9:00 AM, [email protected] wrote:
>>> Hello,
>>>
>>> Please review the fix to the following issues:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8248655
>>> https://bugs.openjdk.java.net/browse/JDK-8248434
>>>
>>> The proposed changeset and its CSR are located at:
>>>
>>> https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8248664
>>>
>>> A bug was filed against SimpleDateFormat (8248434) where case-insensitive
>>> date format/parse failed in some of the new locales in JDK15. The root
>>> cause was that case-insensitive String.regionMatches() method did not work
>>> with supplementary characters. The problem is that the method's spec does
>>> not account for case mappings of supplementary characters, possibly because
>>> they were overlooked in the first place in JSR 204, "Unicode Supplementary
>>> Character support". Similar behavior is observed in the other two
>>> case-insensitive methods, i.e., compareToIgnoreCase() and
>>> equalsIgnoreCase().
>>>
>>> The fix is straightforward: compare strings on a code point basis instead
>>> of a code unit (16-bit "char") basis. Technically this change introduces a
>>> backward incompatibility, but I believe it is incompatible only with wrong
>>> behavior that was not true to the intent of those methods.
>>>
>>> Naoto
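
For readers following the thread, the code unit versus code point distinction in the original request can be illustrated with a minimal sketch. This is not the changeset under review: the class name CodePointCaseCompare and both helper methods are hypothetical and written only for illustration. It assumes the Deseret letters U+10400 (capital) and U+10428 (small), a supplementary-plane pair that has case mappings in the JDK's Character tables.

```java
public class CodePointCaseCompare {

    // Hypothetical helper: case-insensitive equality, one 16-bit char at a time.
    // Surrogate halves have no case mapping, so supplementary pairs never fold.
    public static boolean equalsIgnoreCaseByChar(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int k = 0; k < a.length(); k++) {
            char c1 = a.charAt(k);
            char c2 = b.charAt(k);
            if (c1 == c2) {
                continue;
            }
            char u1 = Character.toUpperCase(c1);
            char u2 = Character.toUpperCase(c2);
            if (u1 == u2) {
                continue;
            }
            if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                continue;
            }
            return false;
        }
        return true;
    }

    // Hypothetical helper: the same comparison, but one code point at a time.
    public static boolean equalsIgnoreCaseByCodePoint(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int cp1 = a.codePointAt(i);
            int cp2 = b.codePointAt(j);
            i += Character.charCount(cp1); // advances by 2 for a surrogate pair
            j += Character.charCount(cp2);
            if (cp1 == cp2) {
                continue;
            }
            int u1 = Character.toUpperCase(cp1);
            int u2 = Character.toUpperCase(cp2);
            if (u1 == u2) {
                continue;
            }
            if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                continue;
            }
            return false;
        }
        // Equal only if both strings were fully consumed.
        return i == a.length() && j == b.length();
    }

    public static void main(String[] args) {
        String upper = new String(Character.toChars(0x10400)); // DESERET CAPITAL LETTER LONG I
        String lower = new String(Character.toChars(0x10428)); // DESERET SMALL LETTER LONG I
        System.out.println(equalsIgnoreCaseByChar(upper, lower));      // false
        System.out.println(equalsIgnoreCaseByCodePoint(upper, lower)); // true
    }
}
```

The per-char version fails because the two low surrogates (0xDC00 and 0xDC28) differ and have no case mapping of their own, while the per-code-point version sees the full characters and folds them. The sketch always applies both toUpperCase and toLowerCase rather than returning early on a surrogate mismatch, mirroring the defensive choice discussed above.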