On Sat, 21 Jan 2023 00:15:10 GMT, Scott Gibbons <d...@openjdk.org> wrote:

> Added code for Base64 acceleration (encode and decode) which will accelerate ~4x for AVX2 platforms.
>
> Encode performance:
> **Old:**
>
> Benchmark                      (maxNumBytes)   Mode  Cnt      Score     Error   Units
> Base64Encode.testBase64Encode           1024  thrpt    3   4309.439 ±   2.632  ops/ms
>
>
> **New:**
>
> Benchmark                      (maxNumBytes)   Mode  Cnt      Score     Error   Units
> Base64Encode.testBase64Encode           1024  thrpt    3  24211.397 ± 102.026  ops/ms
>
>
> Decode performance:
> **Old:**
>
> Benchmark                      (errorIndex)  (lineSize)  (maxNumBytes)   Mode  Cnt      Score    Error   Units
> Base64Decode.testBase64Decode           144           4           1024  thrpt    3   3961.768 ± 93.409  ops/ms
>
> **New:**
> Benchmark                      (errorIndex)  (lineSize)  (maxNumBytes)   Mode  Cnt      Score    Error   Units
> Base64Decode.testBase64Decode           144           4           1024  thrpt    3  14738.051 ± 24.383  ops/ms

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2661:

> 2659:     __ vpbroadcastq(xmm4, Address(r13, 0), Assembler::AVX_256bit);
> 2660:     __ vmovdqu(xmm11, Address(r13, 0x28));
> 2661:     __ vpbroadcastb(xmm10, Address(r13, 0), Assembler::AVX_256bit);

Sorry in advance since I'm probably reading this wrong: the data that `r13` points to appears to be a repeated byte pattern (`0x2f2f2f...`). Does this mean that this `vpbroadcastb` and the `vpbroadcastq` above end up filling their respective registers with exactly the same bits? If so, and since neither register is modified in the code below, then perhaps this can be simplified a bit.

-------------

PR: https://git.openjdk.org/jdk/pull/12126
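
To make the question above concrete, here is a minimal sketch in plain C++ (not HotSpot code, and no intrinsics) that models a 256-bit register as 32 bytes and compares the two broadcasts. The names `Ymm`, `broadcast_byte`, `broadcast_qword` and `broadcasts_agree` are made up for illustration. It assumes the first 8 bytes at the address really are all `0x2f`, which is my reading of the table and may be wrong.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Plain C++ model of a 256-bit (32-byte) vector register.
using Ymm = std::array<std::uint8_t, 32>;

// Models vpbroadcastb: replicate the single byte at mem into all 32 byte lanes.
Ymm broadcast_byte(const std::uint8_t* mem) {
    Ymm r;
    r.fill(mem[0]);
    return r;
}

// Models vpbroadcastq: replicate the 8 bytes at mem into all four 64-bit lanes.
Ymm broadcast_qword(const std::uint8_t* mem) {
    Ymm r;
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = mem[i % 8];
    }
    return r;
}

// Hypothetical helper: true when both broadcasts from the same address
// produce identical register contents.
bool broadcasts_agree(const std::uint8_t* mem) {
    return broadcast_byte(mem) == broadcast_qword(mem);
}
```

If the 8 bytes at the address are all equal, `broadcasts_agree` returns true and one register could stand in for both. If any of those bytes differs from the first, the results diverge and both loads are needed.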