Re: RFR: 8365675: Add String Unicode Case-Folding Support [v7]

Xueming Shen Wed, 29 Oct 2025 20:03:56 -0700

On Wed, 29 Oct 2025 21:07:03 GMT, Roger Riggs <[email protected]> wrote:


>>> Experimenting with Arrays.mismatch at the beginning of the array iteration 
>>> as
>>> ...
>>> The benchmark results suggest that it does help 'dramatically' when the 
>>> compared strings share the same prefix, for example those "UpperLower" 
>>> test cases (which share the same upper-case text prefix). However, it is 
>>> also relatively expensive, with a 20%-ish overhead when the strings do not 
>>> share the same text but are case-insensitively equal. I would 
>>> suggest we leave it out for now?
>> 
>> Ok to leave it out for now.  In similar contexts where System.arraycopy or 
>> Arrays.mismatch has some overhead I've suggested doing a simple check (like 
>> `size < 8`) to avoid the overhead when the strings/byte arrays are short.
>> Thanks for checking.
>
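A rough sketch of the kind of length guard suggested above, for illustration only. It is not code from the PR: the class name PrefixSkipSketch, the method commonPrefix and the constant MIN_LEN are hypothetical, the threshold 8 is just the `size < 8` example from the comment, and it assumes Latin-1 byte arrays (one byte per char).

```java
import java.util.Arrays;

// Hypothetical helper, not part of the PR.
final class PrefixSkipSketch {
    // Illustrative threshold taken from the "size < 8" remark.
    static final int MIN_LEN = 8;

    // Returns how many leading bytes are identical, or 0 when the arrays
    // are too short for the Arrays.mismatch call to be worth its overhead.
    static int commonPrefix(byte[] a, byte[] b) {
        int limit = Math.min(a.length, b.length);
        if (limit < MIN_LEN) {
            return 0; // short input: the caller's loop starts at index 0
        }
        int k = Arrays.mismatch(a, b); // -1 when the arrays are equal
        return k < 0 ? limit : k;
    }
}
```

A case-folding comparison loop could then start at the returned index instead of 0. For UTF-16 byte arrays the index would additionally have to be rounded down to a char boundary.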
>> The performance is slightly better, but not as good as I would have 
>> expected. The access to the code point from the long looks a little clumsy, 
>> but the logic looks smooth. Needs more work. Opinions?
> It does look cleaner without the array indexing in the loops.
> Can the counting of characters (fcnt1, fcnt2) be eliminated by encoding 3 
> 20-bit characters into the long and then checking `f1 != 0` to indicate there 
> are more characters? It's a bit of an odd mix of 16-bit characters vs a 
> single 20-bit char. Are there any 20-bit chars from or to folded replacements 
> in the folding mappings?

Good idea. After removing the fcnt, the implementation looks much cleaner and 
more straightforward. The 1:m folding implementation is also faster. Maybe this 
is good enough to go :-)
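
For readers following along, the packing idea can be sketched roughly as below. This is only an illustration and not the code in the PR: the class name PackedFoldSketch, the helpers pack/unpack and the slot layout (first code point in the low 20 bits) are made up here, and it assumes every folded code point is non-zero and fits in 20 bits.

```java
import java.util.Arrays;

// Hypothetical illustration, not the PR's implementation.
final class PackedFoldSketch {
    static final int BITS = 20;                 // bits per slot (assumption)
    static final long MASK = (1L << BITS) - 1;  // 0xFFFFF

    // Packs 1 to 3 non-zero code points into one long, first one lowest.
    static long pack(int... cps) {
        if (cps.length == 0 || cps.length > 3) {
            throw new IllegalArgumentException("need 1 to 3 code points");
        }
        long packed = 0;
        for (int i = cps.length - 1; i >= 0; i--) {
            int cp = cps[i];
            if (cp <= 0 || cp > MASK) {
                throw new IllegalArgumentException("does not fit in a slot");
            }
            packed = (packed << BITS) | cp;
        }
        return packed;
    }

    // Reads the code points back; "f != 0" replaces an explicit counter.
    static int[] unpack(long f) {
        int[] out = new int[3];
        int n = 0;
        while (f != 0) {
            out[n++] = (int) (f & MASK);
            f >>>= BITS;
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        // U+00DF folds to "ss" under full case folding.
        long f = pack(0x73, 0x73);
        System.out.println(Long.toHexString(f) + " -> "
                + Arrays.toString(unpack(f)));
    }
}
```

Because a slot value of zero means "no more characters", no separate count has to be carried through the comparison loop.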

The latest numbers:


Benchmark                                    Mode  Cnt   Score   Error  Units
StringCompareToFoldCase.asciiLower           avgt   15  15.874 ± 1.276  ns/op
StringCompareToFoldCase.asciiLowerEQ         avgt   15   9.915 ± 0.242  ns/op
StringCompareToFoldCase.asciiLowerEQFC       avgt   15  10.751 ± 0.219  ns/op
StringCompareToFoldCase.asciiLowerFC         avgt   15  10.277 ± 0.126  ns/op
StringCompareToFoldCase.asciiUpperLower      avgt   15  12.121 ± 0.699  ns/op
StringCompareToFoldCase.asciiUpperLowerEQ    avgt   15  10.836 ± 0.746  ns/op
StringCompareToFoldCase.asciiUpperLowerEQFC  avgt   15   9.091 ± 0.273  ns/op
StringCompareToFoldCase.asciiUpperLowerFC    avgt   15   9.207 ± 0.255  ns/op
StringCompareToFoldCase.asciiWithDFFC        avgt   15  38.322 ± 0.975  ns/op
StringCompareToFoldCase.greekLower           avgt   15  39.746 ± 0.127  ns/op
StringCompareToFoldCase.greekLowerEQ         avgt   15  39.303 ± 0.063  ns/op
StringCompareToFoldCase.greekLowerEQFC       avgt   15  20.470 ± 0.329  ns/op
StringCompareToFoldCase.greekLowerFC         avgt   15  19.734 ± 0.295  ns/op
StringCompareToFoldCase.greekUpperLower      avgt   15   7.084 ± 0.085  ns/op
StringCompareToFoldCase.greekUpperLowerEQ    avgt   15   7.472 ± 0.115  ns/op
StringCompareToFoldCase.greekUpperLowerEQFC  avgt   15   6.608 ± 0.248  ns/op
StringCompareToFoldCase.greekUpperLowerFC    avgt   15   6.573 ± 0.189  ns/op
StringCompareToFoldCase.latin1UTF16          avgt   15  24.407 ± 2.157  ns/op
StringCompareToFoldCase.latin1UTF16EQ        avgt   15  22.632 ± 0.131  ns/op
StringCompareToFoldCase.latin1UTF16EQFC      avgt   15  29.564 ± 0.655  ns/op
StringCompareToFoldCase.latin1UTF16FC        avgt   15  29.273 ± 0.324  ns/op
StringCompareToFoldCase.supLower             avgt   15  54.145 ± 0.075  ns/op
StringCompareToFoldCase.supLowerEQ           avgt   15  55.545 ± 0.042  ns/op
StringCompareToFoldCase.supLowerEQFC         avgt   15  24.788 ± 0.180  ns/op
StringCompareToFoldCase.supLowerFC           avgt   15  24.515 ± 0.025  ns/op
StringCompareToFoldCase.supUpperLower        avgt   15  14.437 ± 0.127  ns/op
StringCompareToFoldCase.supUpperLowerEQ      avgt   15  15.253 ± 0.728  ns/op
StringCompareToFoldCase.supUpperLowerEQFC    avgt   15   9.820 ± 0.104  ns/op
StringCompareToFoldCase.supUpperLowerFC      avgt   15   9.776 ± 0.127  ns/op
Finished running test 
'micro:org.openjdk.bench.java.lang.StringCompareToFoldCase'

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27628#discussion_r2476267966
