On Sat, 18 Feb 2023 19:45:34 GMT, Eirik Bjorsnos <d...@openjdk.org> wrote:
>> src/java.base/share/classes/java/lang/CharacterDataLatin1.java.template line >> 181: >> >>> 179: return ( U <= 'Z' // In range A-Z >>> 180: || (U >= 0xC0 && U <= 0XDE && U != 0xD7)) // ..or >>> A-grave-Thorn, excl. multiplication >>> 181: && U == (b2 & 0xDF); // b2 has same uppercase >> >> I'm curious if the order of comparisons could alter performance to a small >> degree. For example, it might be interesting to compare various permutations >> like below to short circuit reject unequal uppercased b2 >> >> Suggestion: >> >> // uppercase b1 using 'the oldest ASCII trick in the book' >> int U = b1 & 0xDF; >> return (U == (b2 & 0xDF)) >> && ((U >= 'A' && U <= 'Z') // In range A-Z >> || (U >= 0xC0 && U <= 0XDE && U != 0xD7)) // ..or >> A-grave-Thorn, excl. multiplication > > Yeah, as you noticed this code is tricky and sensitive to the order of > operations. I did some quite extensive exploration before ending on the > current structure. This particular one seems to improve rejection somewhat at > the cost of matches. > > Since rejection is relatively speaking already very fast, I think we should > favour fast matching here. > > Results: > > > enchmark (codePoints) (size) Mode Cnt > Score Error Units > RegionMatchesIC.Latin1.regionMatchesIC ascii-match 1024 avgt 15 > 917.796 ± 20.285 ns/op > RegionMatchesIC.Latin1.regionMatchesIC ascii-mismatch 1024 avgt 15 > 4.367 ± 0.348 ns/op > RegionMatchesIC.Latin1.regionMatchesIC number-match 1024 avgt 15 > 399.656 ± 10.703 ns/op > RegionMatchesIC.Latin1.regionMatchesIC number-mismatch 1024 avgt 15 > 4.361 ± 0.664 ns/op > RegionMatchesIC.Latin1.regionMatchesIC lat1-match 1024 avgt 15 > 1384.443 ± 22.199 ns/op > RegionMatchesIC.Latin1.regionMatchesIC lat1-mismatch 1024 avgt 15 > 4.119 ± 0.451 ns/op Thanks for confirming ------------- PR: https://git.openjdk.org/jdk/pull/12632