This patch optimizes the Adler32 stub for AVX-512 targets.

The main computation loop now uses a zero-extending, lane-widening vector 
load.
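
To illustrate what the zero extension means for the checksum, here is a minimal scalar sketch in Java. It is not the stub code from this patch; the class and method names are hypothetical. It only shows that each input byte must be widened as an unsigned value before it is accumulated, which is the per-lane effect a zero-extending widening load has.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

// Hypothetical illustration class; not part of the patch.
final class Adler32WideningSketch {
    private static final int MOD_ADLER = 65521; // largest prime below 2^16

    /** Scalar Adler-32: each byte is zero-extended to an int before accumulation. */
    static long adler32(byte[] data) {
        int a = 1;
        int b = 0;
        for (byte value : data) {
            int widened = value & 0xFF; // zero extension: 0x80 becomes 128, not -128
            a = (a + widened) % MOD_ADLER;
            b = (b + a) % MOD_ADLER;
        }
        return ((long) b << 16) | a;
    }

    public static void main(String[] args) {
        byte[] data = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        Adler32 jdk = new Adler32();
        jdk.update(data, 0, data.length);
        // Both values should agree; java.util.zip.Adler32 is the reference here.
        System.out.printf("sketch=%08x jdk=%08x%n", adler32(data), jdk.getValue());
    }
}
```

A sign-extending widening would turn bytes at or above 0x80 into negative lane values and corrupt both running sums, which is why the widening must be unsigned.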

The new sequence also honors AVX3Threshold: on relevant targets the 
implementation uses the existing AVX2 instruction sequence 
when the input size is smaller than the threshold (default 4096 bytes).
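
Because the code path changes at the threshold, input sizes just below and just above it are worth checking against a scalar reference. The following is a rough sketch of such a check, not a test from this patch. The class name, the chosen sizes, and the round count are assumptions, and whether the intrinsic stub is actually reached depends on the JIT compiling the loop.

```java
import java.util.Random;
import java.util.zip.Adler32;

// Hypothetical check class; not part of the patch or of the JDK test suite.
final class Adler32ThresholdCheck {
    private static final int MOD_ADLER = 65521;

    /** Plain scalar Adler-32 used as the reference value. */
    static long scalarAdler32(byte[] data) {
        int a = 1;
        int b = 0;
        for (byte value : data) {
            a = (a + (value & 0xFF)) % MOD_ADLER;
            b = (b + a) % MOD_ADLER;
        }
        return ((long) b << 16) | a;
    }

    /** Repeats the JDK checksum so the call has a chance to be JIT-compiled. */
    static boolean matches(int len, int rounds) {
        byte[] data = new byte[len];
        new Random(42L).nextBytes(data); // fixed seed for reproducible input
        long expected = scalarAdler32(data);
        for (int i = 0; i < rounds; i++) {
            Adler32 jdk = new Adler32();
            jdk.update(data, 0, len);
            if (jdk.getValue() != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int threshold = 4096; // assumed to match the default mentioned above
        int[] sizes = {threshold - 1, threshold, threshold + 1, 4 * threshold};
        for (int len : sizes) {
            System.out.println(len + " bytes: " + (matches(len, 20_000) ? "ok" : "MISMATCH"));
        }
    }
}
```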

The following results are from an [existing JMH micro](https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/util/TestAdler32.java) 
on various targets.

**System configurations:** Turbo frequency scaling is disabled; all data is 
collected at a fixed frequency of 2.8 GHz.

 - SUT1: Intel® Xeon® Platinum 8480+ Processor (Sapphire Rapids), 56C 2S
 - SUT2: Intel(R) Xeon(R) Platinum 8380 CPU (Icelake Server), 40C 2S
 - SUT3: Intel(R) Xeon(R) Platinum 8280 CPU (Cascadelake Server), 28C 2S


![image](https://user-images.githubusercontent.com/59989778/212934730-68717a61-191f-4dba-8c83-2eddf6007a47.png)

![image](https://user-images.githubusercontent.com/59989778/212934945-cada95ad-c93c-487f-bacc-928a2e3b5c21.png)

![image](https://user-images.githubusercontent.com/59989778/212935059-511aca3b-c736-40a2-bff6-89caf0664828.png)


Please review and share your feedback.

Best Regards,
Jatin

-------------

Commit messages:
 - 8300208: Optimize Adler32 stub for AVX-512 targets.

Changes: https://git.openjdk.org/jdk/pull/12045/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12045&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8300208
  Stats: 142 lines in 4 files changed: 84 ins; 28 del; 30 mod
  Patch: https://git.openjdk.org/jdk/pull/12045.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12045/head:pull/12045

PR: https://git.openjdk.org/jdk/pull/12045
