On Sat, 18 Feb 2023 09:21:25 GMT, Eirik Bjorsnos <d...@openjdk.org> wrote:
> This PR suggests we can speed up `StringLatin1.regionMatchesCI` by applying > 'the oldest ASCII trick in the book'. > > The new static method `CharacterDataLatin1.equalsIgnoreCase` compares two > latin1 bytes for equality ignoring case. `StringLatin1.regionMatchesCI` is > updated to use `equalsIgnoreCase` > > To verify the correctness of `equalsIgnoreCase`, a new test is added to > `EqualsIgnoreCase` with an exhaustive verification that all 256x256 latin1 > code point pairs have an `equalsIgnoreCase` consistent with > Character.toUpperCase, Character.toLowerCase. > > Performance is tested for matching and mismatching cases of code point pairs > picked from the ASCII letter, ASCII number and latin1 letter ranges. Results > in the first comment below. src/java.base/share/classes/java/lang/CharacterDataLatin1.java.template line 181: > 179: return ( U <= 'Z' // In range A-Z > 180: || (U >= 0xC0 && U <= 0XDE && U != 0xD7)) // ..or > A-grave-Thorn, excl. multiplication > 181: && U == (b2 & 0xDF); // b2 has same uppercase I'm curious if the order of comparisons could alter performance to a small degree. For example, it might be interesting to compare various permutations like below to short circuit reject unequal uppercased b2 Suggestion: // uppercase b1 using 'the oldest ASCII trick in the book' int U = b1 & 0xDF; return (U == (b2 & 0xDF)) && ((U >= 'A' && U <= 'Z') // In range A-Z || (U >= 0xC0 && U <= 0XDE && U != 0xD7)) // ..or A-grave-Thorn, excl. multiplication ------------- PR: https://git.openjdk.org/jdk/pull/12632