Partial inlining handles copy and mismatch operations on small arrays (smaller than -XX:ArrayOperationPartialInlineSize bytes) directly in JIT-compiled code rather than calling the optimized stubs, thereby avoiding the costly call overhead.
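For reviewers who want a quick Java-level picture, here is a minimal sketch of the kind of small-array copy and mismatch calls that this optimization is meant to cover. The class and method names (SmallArrayOps, copyPrefix, firstMismatch) are hypothetical and are not part of the patch or its micro-benchmark. The 24-byte length is only an illustrative "small" size, and whether a given call is actually partially inlined depends on the JIT and on the flag value.

```java
import java.util.Arrays;

// Hypothetical illustration only; not taken from the patch or its benchmark.
public class SmallArrayOps {

    // Copies the first len bytes of src into a new array via System.arraycopy.
    public static byte[] copyPrefix(byte[] src, int len) {
        byte[] dst = new byte[len];
        System.arraycopy(src, 0, dst, 0, len);
        return dst;
    }

    // Returns the index of the first differing byte, or -1 if the arrays are equal.
    public static int firstMismatch(byte[] a, byte[] b) {
        return Arrays.mismatch(a, b);
    }

    public static void main(String[] args) {
        // 24 bytes: an assumed "small" size chosen for illustration.
        byte[] a = new byte[24];
        for (int i = 0; i < a.length; i++) {
            a[i] = (byte) i;
        }
        byte[] b = copyPrefix(a, a.length);
        b[17] ^= 1; // flip one bit so the arrays differ at index 17
        System.out.println(firstMismatch(a, b)); // prints 17
    }
}
```

To experiment with the threshold, the flag can be passed on the command line. I believe it is a diagnostic option, so it may also need -XX:+UnlockDiagnosticVMOptions.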
This patch enables the partial inlining optimization on AMD EPYC servers that support AVX-512.

The following are the performance numbers on Turin at a fixed frequency of 2.1 GHz:

<img width="440" height="440" alt="image" src="https://github.com/user-attachments/assets/14b55ee3-b65c-4247-8739-67f1b94dceb4" />

<img width="440" height="300" alt="image" src="https://github.com/user-attachments/assets/c00d6443-45a2-4277-961d-580ceea5da88" />

Kindly review and share your feedback.

Best Regards,
Jatin

-------------

Commit messages:
 - Extending micro-benchmark for short array mismatch
 - 8376794: Enable copy and mismatch Partial Inlining for AMD AVX512 targets

Changes: https://git.openjdk.org/jdk/pull/29519/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29519&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8376794
  Stats: 75 lines in 2 files changed: 47 ins; 5 del; 23 mod
  Patch: https://git.openjdk.org/jdk/pull/29519.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/29519/head:pull/29519

PR: https://git.openjdk.org/jdk/pull/29519
