Re: Possible optimization in StringLatin1.regionMatchesCI

Claes Redestad Tue, 26 May 2020 15:04:11 -0700

So to try and clarify:

if (Character.toLowerCase(u1) == Character.toLowerCase(u2))


... can never be true today in the context of the StringLatin1 version
of regionMatchesCI once the preceding upper-case comparison (u1 == u2)
has failed (I did a quick check). A test that exhaustively verifies
this property should ensure that future Unicode updates don't trip us
up (unlikely -- but not theoretically impossible).
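
For illustration, here is a minimal sketch of what such an exhaustive
check could look like. The class and method names are hypothetical and
not taken from the JDK or its test suite; it assumes that u1 and u2 are
the results of Character.toUpperCase on the two Latin-1 chars being
compared, and it only counts the pairs that would reach the lower-case
comparison.

```java
// Sketch only: Latin1CaseCheck and countCounterexamples are hypothetical names.
final class Latin1CaseCheck {

    /**
     * Counts pairs of Latin-1 chars whose upper-case forms differ
     * but whose lower-cased upper-case forms are nevertheless equal.
     * Any such pair would make the extra toLowerCase check necessary.
     */
    static int countCounterexamples() {
        int count = 0;
        for (int i = 0; i < 256; i++) {
            for (int j = 0; j < 256; j++) {
                // Assumed to mirror how u1 and u2 are derived in regionMatchesCI.
                char u1 = Character.toUpperCase((char) i);
                char u2 = Character.toUpperCase((char) j);
                if (u1 != u2
                        && Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int n = countCounterexamples();
        if (n != 0) {
            throw new AssertionError(n + " Latin-1 pairs still need the toLowerCase check");
        }
        System.out.println("toLowerCase check is redundant for all Latin-1 pairs");
    }
}
```

If a future Unicode update ever changed a case mapping for one of the
first 256 chars, a check along these lines would start failing instead
of regionMatchesCI silently changing behavior.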

I think we can go ahead with this.

/Claes

On 2020-05-26 18:27, Martin Buchholz wrote:
> On Tue, May 26, 2020 at 4:07 AM Christoph Dreis
> <[email protected]> wrote:
>>
>> Hi Martin,
>>
>>> Not a review, but:
>>>
>>> Compare with the variant of this code in StringUTF16.
>>> StringLatin1 only ever needs to support the first 256 chars in Unicode
>>
>> Does it really? That makes me wonder even more about the additional
>> lowercase check.
>>
>>> which can never change, unlike StringUTF16,
>>
>> What do you mean by "can never change"?
>
> When we discover sentient life on Titan, their script needs to get
> added to Unicode.  But the first 256 chars are already fully
> allocated; the Titans will be given empty space elsewhere.  Hopefully
> Unicode won't be clogged by a million emojis at that point.
>
> There's a real fear of eszett capitalization changing. After centuries
> of debate the German Sprachbund will finally decide to (wisely!)
> abolish eszett, but Liechtenstein will be the only holdout insisting
> that eszett be capitalized to
> https://en.wikipedia.org/wiki/Capital_%E1%BA%9E
>
> Fortunately the code we are reviewing here is Locale-independent, and
> so is hopefully immune to the future politics of Liechtenstein.
