Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v48]
On Thu, 30 May 2024 14:57:35 GMT, Scott Gibbons wrote: >>> Control question: Are we confident with this potentially going into JDK 23 >>> or should we rather postpone to JDK 24? The fork is next week. >> >> I would hold off. @asgibbons it may pass our tests, and your extensive >> testing. But you never know what the fuzzer can find over a few weeks once >> it runs with your changes. I have made that experience many times. Let's >> just give it a few days, and then we have one JDK version less to worry >> about for backports on possible follow-up bugs ;) > > @eme64 I'm glad to have received your feedback. I see I have erroneously > assumed that by making the exact code change you requested still requires > your acceptance - I won't make that mistake again. I had also erroneously > assumed that your review was complete and you had no further changes for me > to make. I'd also not like to make that mistake again, but I'm unsure how to > conclude that a review is complete - it seems like 7 hours of elapsed time > isn't sufficient to indicate completion, so can you please help me figure > this out? Perhaps it's just my distaste for "trickle-in" comments, which I > should get over, or is there another way you can suggest? > > As for the fuzzer I would be very interested in learning more about this. We > have a significant number of compute resources, so it may be valuable for us > to set up a copy of the fuzzer on-site to improve the quality of our > submissions. Can you help in pointing me to someone that can advise me on > how to do this? > > As for holding off the integration, I'll leave the decision to a sponsor for > this PR. I don't believe increasing the reviewer count just to "force" > reevaluation should be an acceptable practice, although I'm not an insider in > this community. @asgibbons I was done with my review, or at least so I thought Still: if I give comments, it would be nice to quickly finish the conversation, unless if I don't respond in many days and not even to emails. Often I only see the glaring issues. Then you fix them, and then I see something else around it. Then I may give more comments. That is what happened. If I think that I have small suggestions and then I'm done, then I might even approve even though there are suggestions still to be added. I just put up the limit really quick so that nobody else would by accident sponsor it before we have finished the conversation, and I will definitely give you my approval once the little issues are resolved ;) - PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139893561
Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v48]
On Thu, 30 May 2024 13:56:30 GMT, Emanuel Peter wrote: >> Control question: Are we confident with this potentially going into JDK 23 >> or should we rather postpone to JDK 24? The fork is next week. > >> Control question: Are we confident with this potentially going into JDK 23 >> or should we rather postpone to JDK 24? The fork is next week. > > I would hold off. @asgibbons it may pass our tests, and your extensive > testing. But you never know what the fuzzer can find over a few weeks once it > runs with your changes. I have made that experience many times. Let's just > give it a few days, and then we have one JDK version less to worry about for > backports on possible follow-up bugs ;) @eme64 I guess to add some confidence.. we did also 'test it independently' to catch blind spots. i.e. `String/IndexOf.java` is from me. I tried to be as paranoid as possible with non-random strings. Passed everything I could throw at it.. - PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139882544
Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v48]
On Thu, 30 May 2024 13:56:30 GMT, Emanuel Peter wrote: >> Control question: Are we confident with this potentially going into JDK 23 >> or should we rather postpone to JDK 24? The fork is next week. > >> Control question: Are we confident with this potentially going into JDK 23 >> or should we rather postpone to JDK 24? The fork is next week. > > I would hold off. @asgibbons it may pass our tests, and your extensive > testing. But you never know what the fuzzer can find over a few weeks once it > runs with your changes. I have made that experience many times. Let's just > give it a few days, and then we have one JDK version less to worry about for > backports on possible follow-up bugs ;) @eme64 I'm glad to have received your feedback. I see I have erroneously assumed that by making the exact code change you requested still requires your acceptance - I won't make that mistake again. I had also erroneously assumed that your review was complete and you had no further changes for me to make. I'd also not like to make that mistake again, but I'm unsure how to conclude that a review is complete - it seems like 7 hours of elapsed time isn't sufficient to indicate completion, so can you please help me figure this out? Perhaps it's just my distaste for "trickle-in" comments, which I should get over, or is there another way you can suggest? As for the fuzzer I would be very interested in learning more about this. We have a significant number of compute resources, so it may be valuable for us to set up a copy of the fuzzer on-site to improve the quality of our submissions. Can you help in pointing me to someone that can advise me on how to do this? As for holding off the integration, I'll leave the decision to a sponsor for this PR. I don't believe increasing the reviewer count just to "force" reevaluation should be an acceptable practice, although I'm not an insider in this community. - PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139814010
Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v48]
On Thu, 30 May 2024 12:58:27 GMT, Scott Gibbons wrote: >> test/jdk/java/lang/String/IndexOf.java line 35: >> >>> 33: * @requires vm.cpu.features ~= ".*avx2.*" >>> 34: * @requires vm.compiler2.enabled >>> 35: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -Xcomp >>> -XX:-TieredCompilation -XX:UseAVX=2 -XX:+UnlockDiagnosticVMOptions >>> -XX:+EnableX86ECoreOpts IndexOf >> >> Same here: why is the test AVX2 specific? Could other platforms not also be >> "tickled" in interesting ways with this test? > > There are two test blocks, so all platforms will be able to take advantage of > the test via the first block. I'm told that's how this works. Yes, that is right. Good. >> test/jdk/java/lang/StringBuffer/IndexOf.java line 188: >> >>> 186: } >>> 187: >>> 188: } >> >> It looks like you just indented basically the whole file by 1 space. Why? > > I hadn't noticed this. It's most likely an artifact of my editor as it > wasn't intentional. I'll change this back. Ok, maybe check your code on GitHub next time ;) - PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620768228 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620746147
Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v48]
On Thu, 30 May 2024 06:25:32 GMT, Tobias Hartmann wrote: > Control question: Are we confident with this potentially going into JDK 23 or > should we rather postpone to JDK 24? The fork is next week. I would hold off. @asgibbons it may pass our tests, and your extensive testing. But you never know what the fuzzer can find over a few weeks once it runs with your changes. I have made that experience many times. Let's just give it a few days, and then we have one JDK version less to worry about for backports on possible follow-up bugs ;) - PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139615822
Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v48]
On Thu, 30 May 2024 13:03:06 GMT, Scott Gibbons wrote: >> Would be a shame to spend so much time on writing a test and then not apply >> it everywhere ;) > > I'll add a separate @test block to this file. It was, however, written > specifically tuned for the new algorithm to exercise known edge cases. A new `@test` sounds like a good idea - PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620747402
Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v48]
On Thu, 30 May 2024 13:33:40 GMT, Scott Gibbons wrote: >> Control question: Are we confident with this potentially going into JDK 23 >> or should we rather postpone to JDK 24? The fork is next week. > > Thank you all for the comments. @TobiHartmann I'm comfortable with this > going into JDK 23. The code has been functionally stable for me for the past > 2 months. The recent churn centers primarily around restructuring the code > for readability and maintainability and ensuring protection against reading > past the end of strings. Both Vlad (Volodymyr) and @sviswa7 have scoured the > code with me and together we have convinced ourselves that we've covered all > the bases. Of course we may have missed something but my confidence is high. > > The overall performance gain as reported by the StringIndexOf JMH averages > ~7x running on an e-core as compared with baseline on the same core. It's > skewed somewhat towards massive gains for long (~2K) strings (avg 14.4x) and > modest gains for small-ish strings (avg ~1.8x). I've measured up to 60x > performance improvement for some 2K UTF-16 indexOf operations. > > Again, thank you all. It's been a fun exercise and I've learned a lot. @asgibbons generally it would be nice if you waited for me to accept your changes before integrating. - PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139604424
Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v48]
On Thu, 30 May 2024 06:25:32 GMT, Tobias Hartmann wrote: >> Scott Gibbons has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Remove duplicate vm.compiler2.enabled > > Control question: Are we confident with this potentially going into JDK 23 or > should we rather postpone to JDK 24? The fork is next week. Thank you all for the comments. @TobiHartmann I'm comfortable with this going into JDK 23. The code has been functionally stable for me for the past 2 months. The recent churn centers primarily around restructuring the code for readability and maintainability and ensuring protection against reading past the end of strings. Both Vlad (Volodymyr) and @sviswa7 have scoured the code with me and together we have convinced ourselves that we've covered all the bases. Of course we may have missed something but my confidence is high. The overall performance gain as reported by the StringIndexOf JMH averages ~7x running on an e-core as compared with baseline on the same core. It's skewed somewhat towards massive gains for long (~2K) strings (avg 14.4x) and modest gains for small-ish strings (avg ~1.8x). I've measured up to 60x performance improvement for some 2K UTF-16 indexOf operations. Again, thank you all. It's been a fun exercise and I've learned a lot. - PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2139569361
Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v48]
On Thu, 30 May 2024 06:23:05 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Remove duplicate vm.compiler2.enabled > > test/jdk/java/lang/String/IndexOf.java line 35: > >> 33: * @requires vm.cpu.features ~= ".*avx2.*" >> 34: * @requires vm.compiler2.enabled >> 35: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -Xcomp >> -XX:-TieredCompilation -XX:UseAVX=2 -XX:+UnlockDiagnosticVMOptions >> -XX:+EnableX86ECoreOpts IndexOf > > Same here: why is the test AVX2 specific? Could other platforms not also be > "tickled" in interesting ways with this test? There are two test blocks, so all platforms will be able to take advantage of the test via the first block. I'm told that's how this works. > test/jdk/java/lang/StringBuffer/IndexOf.java line 188: > >> 186: } >> 187: >> 188: } > > It looks like you just indented basically the whole file by 1 space. Why? I hadn't noticed this. It's most likely an artifact of my editor as it wasn't intentional. I'll change this back. - PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620669257 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620679629
Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v48]
On Thu, 30 May 2024 06:22:17 GMT, Emanuel Peter wrote: >> test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 29: >> >>> 27: * @requires vm.cpu.features ~= ".*avx2.*" >>> 28: * @requires vm.compiler2.enabled >>> 29: * @run main/othervm -XX:+UnlockDiagnosticVMOptions >>> -XX:+EnableX86ECoreOpts -XX:UseAVX=2 -Xbatch -XX:-TieredCompilation >>> -XX:CompileCommand=dontinline,ECoreIndexOf.indexOfKernel ECoreIndexOf >> >> Does this test really need to be `avx2` specific? Does it even need to be C2 >> specific? >> Or can this run on all platforms? > > Would be a shame to spend so much time on writing a test and then not apply > it everywhere ;) I'll add a separate @test block to this file. It was, however, written specifically tuned for the new algorithm to exercise known edge cases. - PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620676513
Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v48]
On Thu, 30 May 2024 06:21:36 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Remove duplicate vm.compiler2.enabled > > test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 29: > >> 27: * @requires vm.cpu.features ~= ".*avx2.*" >> 28: * @requires vm.compiler2.enabled >> 29: * @run main/othervm -XX:+UnlockDiagnosticVMOptions >> -XX:+EnableX86ECoreOpts -XX:UseAVX=2 -Xbatch -XX:-TieredCompilation >> -XX:CompileCommand=dontinline,ECoreIndexOf.indexOfKernel ECoreIndexOf > > Does this test really need to be `avx2` specific? Does it even need to be C2 > specific? > Or can this run on all platforms? Would be a shame to spend so much time on writing a test and then not apply it everywhere ;) - PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620017891
Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v48]
On Wed, 29 May 2024 22:20:31 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score >> Latest >> StringIndexOf.advancedWithMediumSub 343.573317.934 >> 0.925375393x >> StringIndexOf.advancedWithShortSub11039.081 1053.96 >> 1.014319384x >> StringIndexOf.advancedWithShortSub255.828110.541 >> 1.980027943x >> StringIndexOf.constantPattern9.361 11.906 >> 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 >> 1.000474383x >> StringIndexOf.searchCharMediumSuccess3.133 3.216 >> 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.763.761 >> 1.000265957x >> StringIndexOf.success9.186 >> 9.713 1.057369911x >> StringIndexOf.successBig 14.34146.343 >> 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 >> 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 >> 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 >> 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 >> 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 >> 6863.3590.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 >> 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.12314771.622 >> 1.15517x >> StringIndexOfChar.latin1_mixed_char9901.671 9782.245 >> 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Remove duplicate vm.compiler2.enabled test/jdk/java/lang/String/IndexOf.java line 35: > 33: * @requires vm.cpu.features ~= ".*avx2.*" > 34: * @requires vm.compiler2.enabled > 35: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -Xcomp > -XX:-TieredCompilation -XX:UseAVX=2 -XX:+UnlockDiagnosticVMOptions > -XX:+EnableX86ECoreOpts IndexOf Same here: why is the test AVX2 specific? Could other platforms not also be "tickled" in interesting ways with this test? test/jdk/java/lang/StringBuffer/ECoreIndexOf.java line 29: > 27: * @requires vm.cpu.features ~= ".*avx2.*" > 28: * @requires vm.compiler2.enabled > 29: * @run main/othervm -XX:+UnlockDiagnosticVMOptions > -XX:+EnableX86ECoreOpts -XX:UseAVX=2 -Xbatch -XX:-TieredCompilation > -XX:CompileCommand=dontinline,ECoreIndexOf.indexOfKernel ECoreIndexOf Does this test really need to be `avx2` specific? Does it even need to be C2 specific? Or can this run on all platforms? test/jdk/java/lang/StringBuffer/IndexOf.java line 188: > 186: } > 187: > 188: } It looks like you just indented basically the whole file by 1 space. Why? - PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620019084 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620016717 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1620013302
Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v48]
On Wed, 29 May 2024 22:20:31 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score >> Latest >> StringIndexOf.advancedWithMediumSub 343.573317.934 >> 0.925375393x >> StringIndexOf.advancedWithShortSub11039.081 1053.96 >> 1.014319384x >> StringIndexOf.advancedWithShortSub255.828110.541 >> 1.980027943x >> StringIndexOf.constantPattern9.361 11.906 >> 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 >> 1.000474383x >> StringIndexOf.searchCharMediumSuccess3.133 3.216 >> 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.763.761 >> 1.000265957x >> StringIndexOf.success9.186 >> 9.713 1.057369911x >> StringIndexOf.successBig 14.34146.343 >> 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 >> 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 >> 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 >> 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 >> 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 >> 6863.3590.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 >> 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.12314771.622 >> 1.15517x >> StringIndexOfChar.latin1_mixed_char9901.671 9782.245 >> 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Remove duplicate vm.compiler2.enabled Control question: Are we confident with this potentially going into JDK 23 or should we rather postpone to JDK 24? The fork is next week. - PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-2138771509
Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v48]
On Wed, 29 May 2024 22:20:31 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score >> Latest >> StringIndexOf.advancedWithMediumSub 343.573317.934 >> 0.925375393x >> StringIndexOf.advancedWithShortSub11039.081 1053.96 >> 1.014319384x >> StringIndexOf.advancedWithShortSub255.828110.541 >> 1.980027943x >> StringIndexOf.constantPattern9.361 11.906 >> 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 >> 1.000474383x >> StringIndexOf.searchCharMediumSuccess3.133 3.216 >> 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.763.761 >> 1.000265957x >> StringIndexOf.success9.186 >> 9.713 1.057369911x >> StringIndexOf.successBig 14.34146.343 >> 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 >> 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 >> 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 >> 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 >> 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 >> 6863.3590.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 >> 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.12314771.622 >> 1.15517x >> StringIndexOfChar.latin1_mixed_char9901.671 9782.245 >> 0.987938803 > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Remove duplicate vm.compiler2.enabled My testing passed. - Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16753#pullrequestreview-2086978326
Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v48]
> Re-write the IndexOf code without the use of the pcmpestri instruction, only > using AVX2 instructions. This change accelerates String.IndexOf on average > 1.3x for AVX2. The benchmark numbers: > > > BenchmarkScore > Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 > 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 > 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828110.541 > 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 > 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 > 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 > 1.02649218x > StringIndexOf.searchCharShortSuccess 3.763.761 > 1.000265957x > StringIndexOf.success 9.186 9.713 > 1.057369911x > StringIndexOf.successBig14.34146.343 > 3.231504079x > StringIndexOfChar.latin1_AVX2_String6220.918 12154.52 > 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 > 1.006629895x > StringIndexOfChar.latin1_SSE4_String6978.854 6818.689 > 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 > 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 > 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 > 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.12314771.622 > 1.15517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 > 0.987938803 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Remove duplicate vm.compiler2.enabled - Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/db0ab75a..ed06edd6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=16753=47 - incr: https://webrevs.openjdk.org/?repo=jdk=16753=46-47 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753