On Sat, 21 Jan 2023 00:15:10 GMT, Scott Gibbons <d...@openjdk.org> wrote:

> Added code for Base64 acceleration (encode and decode) which will accelerate 
> ~4x for AVX2 platforms.
> 
> Encode performance:
> **Old:**
> 
> Benchmark                      (maxNumBytes)   Mode  Cnt     Score   Error   Units
> Base64Encode.testBase64Encode           1024  thrpt    3  4309.439 ± 2.632  ops/ms
> 
> 
> **New:**
> 
> Benchmark                      (maxNumBytes)   Mode  Cnt      Score     Error   Units
> Base64Encode.testBase64Encode           1024  thrpt    3  24211.397 ± 102.026  ops/ms
> 
> 
> Decode performance:
> **Old:**
> 
> Benchmark                      (errorIndex)  (lineSize)  (maxNumBytes)   Mode  Cnt     Score    Error   Units
> Base64Decode.testBase64Decode           144           4           1024  thrpt    3  3961.768 ± 93.409  ops/ms
> 
> **New:**
> Benchmark                      (errorIndex)  (lineSize)  (maxNumBytes)   Mode  Cnt      Score    Error   Units
> Base64Decode.testBase64Decode           144           4           1024  thrpt    3  14738.051 ± 24.383  ops/ms

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2661:

> 2659:     __ vpbroadcastq(xmm4, Address(r13, 0), Assembler::AVX_256bit);
> 2660:     __ vmovdqu(xmm11, Address(r13, 0x28));
> 2661:     __ vpbroadcastb(xmm10, Address(r13, 0), Assembler::AVX_256bit);

Sorry in advance, since I'm probably reading this wrong: the data that `r13`
points to appears to be a repeated byte pattern (`0x2f2f2f...`). Does this mean
that this `vpbroadcastb` and the `vpbroadcastq` above end up filling their
respective registers with exactly the same bits? If so, and since neither
register is modified in the code below, perhaps one of the two broadcasts could
be dropped and the same register reused.
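
To make the question concrete, here is a minimal sketch in plain C++ (no intrinsics) that models the two broadcasts on a 32-byte array. The `Ymm` type and the helper names are hypothetical, and the table contents are assumed to be eight copies of `0x2f`, as read above; none of this is taken from the actual stub code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 256-bit YMM register: 32 raw bytes.
using Ymm = std::array<std::uint8_t, 32>;

// Models vpbroadcastb ymm, m8: replicate the first byte of memory 32 times.
Ymm broadcast_byte(const std::uint8_t* mem) {
    Ymm r;
    r.fill(mem[0]);
    return r;
}

// Models vpbroadcastq ymm, m64: replicate the first 8 bytes of memory 4 times.
Ymm broadcast_qword(const std::uint8_t* mem) {
    Ymm r;
    for (std::size_t lane = 0; lane < 4; ++lane) {
        std::memcpy(r.data() + lane * 8, mem, 8);
    }
    return r;
}

// True when both broadcasts from the same address yield identical contents.
bool broadcasts_match(const std::uint8_t* mem) {
    return broadcast_byte(mem) == broadcast_qword(mem);
}

void check_redundant_broadcast() {
    // Assumed table contents: the same byte repeated across the whole qword.
    const std::uint8_t uniform[8] = {0x2f, 0x2f, 0x2f, 0x2f,
                                     0x2f, 0x2f, 0x2f, 0x2f};
    assert(broadcasts_match(uniform));

    // Counter-example: if any byte in the qword differs, the results diverge.
    const std::uint8_t mixed[8] = {0x2f, 0x30, 0x2f, 0x2f,
                                   0x2f, 0x2f, 0x2f, 0x2f};
    assert(!broadcasts_match(mixed));
}
```

So the two registers would only be interchangeable if all eight bytes at `Address(r13, 0)` really are identical; if the table differs from what I assumed, both loads are needed.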

-------------

PR: https://git.openjdk.org/jdk/pull/12126
