On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons <sgibb...@openjdk.org> wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark                               Score       Latest      Latest/Score
>> StringIndexOf.advancedWithMediumSub     343.573     317.934     0.925375393x
>> StringIndexOf.advancedWithShortSub1     1039.081    1053.96     1.014319384x
>> StringIndexOf.advancedWithShortSub2     55.828      110.541     1.980027943x
>> StringIndexOf.constantPattern           9.361       11.906      1.271872663x
>> StringIndexOf.searchCharLongSuccess     4.216       4.218       1.000474383x
>> StringIndexOf.searchCharMediumSuccess   3.133       3.216       1.02649218x
>> StringIndexOf.searchCharShortSuccess    3.76        3.761       1.000265957x
>> StringIndexOf.success                   9.186       9.713       1.057369911x
>> StringIndexOf.successBig                14.341      46.343      3.231504079x
>> StringIndexOfChar.latin1_AVX2_String    6220.918    12154.52    1.953814533x
>> StringIndexOfChar.latin1_AVX2_char      5503.556    5540.044    1.006629895x
>> StringIndexOfChar.latin1_SSE4_String    6978.854    6818.689    0.977049957x
>> StringIndexOfChar.latin1_SSE4_char      5657.499    5474.624    0.967675646x
>> StringIndexOfChar.latin1_Short_String   7132.541    6863.359    0.962260014x
>> StringIndexOfChar.latin1_Short_char     16013.389   16162.437   1.009307711x
>> StringIndexOfChar.latin1_mixed_String   7386.123    14771.622   1.999915517x
>> StringIndexOfChar.latin1_mixed_char     9901.671    9782.245    0.987938803
>
> Scott Gibbons has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Rearrange; add lambdas for clarity

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 314:

> 312:
> 313: // needle_len is in elements, not bytes, for UTF-16
> 314: __ cmpq(needle_len, isUU ? OPT_NEEDLE_SIZE_MAX / 2 : OPT_NEEDLE_SIZE_MAX);

OPT_NEEDLE_SIZE_MAX is an odd number (set to 5); should it have been an even
number?
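
To make the arithmetic behind this question concrete, here is a minimal
standalone sketch. It is not code from the patch: the helper names
needle_threshold_elements and needle_threshold_bytes are hypothetical, and the
value 5 is taken from the remark above. It only evaluates the second operand of
the quoted cmpq, so it says nothing about which branch follows the compare.

```cpp
#include <cassert>

// Stand-in for the constant in the patch; this review states it is set to 5.
constexpr int OPT_NEEDLE_SIZE_MAX = 5;

// Hypothetical helper: the value the quoted cmpq compares needle_len against.
// needle_len is in elements, so the UTF-16 (isUU) case halves the constant.
constexpr int needle_threshold_elements(bool isUU) {
  return isUU ? OPT_NEEDLE_SIZE_MAX / 2 : OPT_NEEDLE_SIZE_MAX;
}

// Hypothetical helper: the same threshold expressed in bytes
// (2 bytes per UTF-16 element, 1 byte per Latin-1 element).
constexpr int needle_threshold_bytes(bool isUU) {
  return needle_threshold_elements(isUU) * (isUU ? 2 : 1);
}

// Integer division truncates, so an odd constant loses one byte for UTF-16.
static_assert(needle_threshold_elements(false) == 5, "Latin-1: 5 elements");
static_assert(needle_threshold_elements(true) == 2, "UTF-16: 5 / 2 == 2");
static_assert(needle_threshold_bytes(false) == 5, "Latin-1: 5 bytes");
static_assert(needle_threshold_bytes(true) == 4, "UTF-16: 2 elements == 4 bytes");
```

With the constant at 5, the UTF-16 threshold is 2 elements (4 bytes) while the
Latin-1 threshold is 5 bytes, so the two encodings do not share the same byte
limit. Whether that asymmetry matters depends on how the surrounding code uses
the result of the compare, which the quoted lines do not show.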

src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 329:

> 327: ////////////////////////////////////////////////////////////////////////////////////////
> 328:
> 329: __ bind(L_begin);

So far we have handled haystack <= 32 and needle_size <= 5 (?) in bytes. A
high-level description of the algorithm is needed here, in comments, to make
the code below easier to follow. It should describe the various paths in terms
of haystack and needle sizes, how to reason about the assembly code below, and
how to confirm that all the paths are taken care of. Also, the abstraction
level changes abruptly here, from methods for the various paths to detailed
inline code.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591640551
PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1591646095