On Wed, 6 May 2026 17:27:38 GMT, Kerem Kat wrote:

> I did see libsimdsort before starting the implementation. I don't think 
> search algorithm belongs in it, and I don't think there should be a new 
> libsimdsearch, as it would be loaded as a separate .so file, have its own 
> build/make code etc. which seems like an overkill for a ~400 line stub.

You frame it as though JVM stubs are the go-to option to speed up execution of 
Java code. Unfortunately, it's quite the opposite: stubs are the last resort 
when there are no other options left on the table. Stubs are hard to review and 
maintain, they are a source of tricky bugs which frequently compromise JVM 
integrity, and they add overhead during startup. The amount of boilerplate code 
needed is low on that list, but even there the ceremony on the JVM side to 
introduce a single stub is tedious. And, in that picture, extra build changes 
are not near the top either.

The bar to introduce new stubs is high, and there should be compelling reasons 
to end up with new ones when there are alternatives. Usually, it was about 
performance when JNI was the only option to interact with native code. 
(Critical natives were an ad hoc way to lift some of the limitations.) Now, 
there's `java.lang.foreign`, and it provides a level of performance competitive 
with JVM stubs. So, I'd like to see more data to justify why JVM stubs 
are superior and the way to go.
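
For readers unfamiliar with `java.lang.foreign`, here is a minimal sketch of a 
downcall into native code without JNI or a JVM stub. It is only an 
illustration: it calls the C library's `strlen` rather than anything from the 
PR, the class and method names are hypothetical, and it assumes JDK 22 or 
later on a 64-bit platform (so that `size_t` maps to `JAVA_LONG`).

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hypothetical example class; not part of the PR under discussion.
final class FfmStrlen {
    // Look up strlen in the default (C library) lookup and bind it once.
    private static final MethodHandle STRLEN;

    static {
        Linker linker = Linker.nativeLinker();
        MemorySegment address = linker.defaultLookup().find("strlen").orElseThrow();
        // Assumption: size_t is 64 bits wide, hence JAVA_LONG as the return layout.
        STRLEN = linker.downcallHandle(
                address,
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
    }

    static long nativeStrlen(String s) {
        // The confined arena frees the native copy of the string on close.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom(s); // NUL-terminated UTF-8
            return (long) STRLEN.invokeExact(cString);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(nativeStrlen("hello"));
    }
}
```

A library function compiled from portable C/C++ could be bound the same way 
through a library-specific `SymbolLookup` instead of the default lookup.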


> I also found that libsimdsort is linux-only, whereas the current stub 
> supports Windows too.
  
None of those details are set in stone; they are specifics of the libsimdsort 
case. If extra native libraries cause problems, they can be merged into a 
single one. 
Portable C/C++ code can be compiled for all supported platforms from a single 
source. libsimdsort is not fully portable (due to its immintrin.h dependency), 
but extending it to windows-x86 is rather straightforward (more about filling 
in wrappers and build support than rewriting the implementation). (There's 
another case to study: vectormath, which is supported on a wider range of 
platforms.) But `simdsort` and `vectormath` prove that it's possible to meet 
performance goals without using JVM stubs.


> Perhaps more importantly, stub integrates better with C2. It allows us to 
> create two entry conditions based on input array length, one for compile-time 
> known length (length_type check) and another check for runtime (using 
> generate_fair_guard), to fallback to non-intrinsic version for small arrays, 
> for some definition of small. These checks are especially relevant, when the 
> cost of calling the stub is more expensive than the non-intrinsified default 
> version.

Can you quantify how much it adds to performance?
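
To make the question concrete, here is a minimal sketch of the same kind of 
runtime length guard written in plain Java. Everything in it is hypothetical: 
the class and method names, the threshold value, and the "accelerated" path, 
which is only a placeholder that reuses the scalar loop. It does not 
reproduce the compile-time `length_type` check, which has no direct Java-level 
counterpart.

```java
// Hypothetical example class; not part of the PR under discussion.
final class GuardedSearch {
    // Hypothetical cutoff below which the call overhead is assumed to dominate.
    static final int SMALL_ARRAY_THRESHOLD = 16;

    // Runtime guard: small arrays take the plain loop, larger ones the fast path.
    static int indexOf(int[] a, int key) {
        if (a.length < SMALL_ARRAY_THRESHOLD) {
            return scalarIndexOf(a, key);
        }
        return acceleratedIndexOf(a, key);
    }

    // Baseline linear search; returns the first matching index or -1.
    static int scalarIndexOf(int[] a, int key) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == key) {
                return i;
            }
        }
        return -1;
    }

    // Placeholder for a native or vectorized implementation. It delegates to
    // the scalar loop so that the sketch stays self-contained and runnable.
    static int acceleratedIndexOf(int[] a, int key) {
        return scalarIndexOf(a, key);
    }

    public static void main(String[] args) {
        int[] small = {4, 8, 15};
        System.out.println(indexOf(small, 15));
    }
}
```

Benchmark numbers for array lengths on both sides of the threshold, with and 
without the C2-level guards, would show how much the stub-specific checks 
contribute.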

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30612#issuecomment-4391325123
