On Sat, 23 Mar 2024 02:16:57 GMT, Scott Gibbons <sgibb...@openjdk.org> wrote:

>> Re-write the IndexOf code without the use of the pcmpestri instruction, only 
>> using AVX2 instructions.  This change accelerates String.IndexOf on average 
>> 1.3x for AVX2.  The benchmark numbers:
>> 
>> 
>> Benchmark                                                   Score            
>> Latest          
>> StringIndexOf.advancedWithMediumSub   343.573                317.934         
>> 0.925375393x
>> StringIndexOf.advancedWithShortSub1    1039.081              1053.96         
>> 1.014319384x
>> StringIndexOf.advancedWithShortSub2        55.828            110.541         
>> 1.980027943x
>> StringIndexOf.constantPattern                        9.361           11.906  
>>         1.271872663x
>> StringIndexOf.searchCharLongSuccess          4.216           4.218           
>> 1.000474383x
>> StringIndexOf.searchCharMediumSuccess        3.133           3.216           
>> 1.02649218x
>> StringIndexOf.searchCharShortSuccess 3.76                    3.761           
>> 1.000265957x
>> StringIndexOf.success                                        9.186           
>> 9.713           1.057369911x
>> StringIndexOf.successBig                           14.341            46.343  
>>         3.231504079x
>> StringIndexOfChar.latin1_AVX2_String   6220.918              12154.52        
>>         1.953814533x
>> StringIndexOfChar.latin1_AVX2_char     5503.556              5540.044        
>>         1.006629895x
>> StringIndexOfChar.latin1_SSE4_String   6978.854              6818.689        
>>         0.977049957x
>> StringIndexOfChar.latin1_SSE4_char     5657.499              5474.624        
>>         0.967675646x
>> StringIndexOfChar.latin1_Short_String          7132.541              
>> 6863.359                0.962260014x
>> StringIndexOfChar.latin1_Short_char  16013.389             16162.437         
>> 1.009307711x
>> StringIndexOfChar.latin1_mixed_String          7386.123            14771.622 
>>         1.999915517x
>> StringIndexOfChar.latin1_mixed_char    9901.671              9782.245        
>>         0.987938803
>
> Scott Gibbons has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now contains 46 commits:
> 
>  - Merge branch 'openjdk:master' into indexof
>  - Cleaned up, ready for review
>  - Pre-cleanup code
>  - Add JMH. Add 16-byte compares to arrays_equals
>  - Better method for mask creation
>  - Merge branch 'openjdk:master' into indexof
>  - Most cleanup done.
>  - Remove header dependency
>  - Works - needs cleanup
>  - Passes tests.
>  - ... and 36 more: https://git.openjdk.org/jdk/compare/bc739639...e079fc12

Hi, in Netty, we have our own AsciiString::indexOf based on SWAR techniques, 
which is manually loop unrolling the head processing (first < 8 bytes) to 
artificially make sure the branch predictor got different branches to care AND 
JIT won't make it wrong. We have measured (I can provide a link of the 
benchmark and results, If you are interested) that it delivers a much better 
performance on tiny strings and makes a smoother  degradation vs perfectly 
aligned string length as well. Clearly this tends to be much visible if the 
input strings have shuffled delimiter positions, to make the branch prediction 
misses more relevant.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2016432345

Reply via email to