Actually, this is exactly the same for Java: whatever you try, the outcome of the dynamic optimization depends on many moving parts (Java bytecode, Java/Hotspot version, command line parameters, hardware CPU, virtualization). It is not predictable, and any change anywhere may produce different results. So we should stop arguing about changing *our* code to improve the generated assembly code. If we have some code on our side that is not correctly converted to CMOV, we should open a bug report on OpenJDK (Chris H. and I can do this easily) and ask for an improvement.

As you have seen in my other answer in this thread: Hotspot applies CMOV depending on its analysis of branches. So in general our code *should* make use of CMOV. You can only get certainty by using hsdis and printing the assembly for those of our methods which you think should use CMOV. But there's no guarantee that it is applied. And as always: it may take a very long time until Hotspot replaces the standard branched code with conditional moves (as they have significant overhead if used in cases where the result is

With Hotspot you can try adding -XX:ConditionalMoveLimit ("Limit of ops to make speculative when using CMOVE") and experiment with different values (0 disables; the default is 3 on x86 and aarch64, 4 on arm). But as always: wait long enough.

To enforce usage of CMOV, use -XX:+UseCMoveUnconditionally ("Generates CMove (scalar and vector) instructions regardless of profitability analysis."). That may be the first thing to try when experimenting, to look at the kind of assembly created; but it may slow down other code, as CMOV is then always used, without analysis.
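For anybody who wants to try this out, here is a minimal sketch of a method whose loop body is a plain conditional select, the kind of code Hotspot *may* turn into CMOV. The class and method names (CmovCandidate, lowerBound) are made up for this example and are not taken from our code base. The loop shape only loosely follows the branchless binary search idea from the blog post linked below; it is not a port of it. Whether a CMOV actually shows up depends on all the factors listed above.

```java
import java.util.Random;

// Hypothetical example class, not part of any existing code base.
final class CmovCandidate {

    // Returns the index of the first element >= target in the sorted array,
    // or a.length if there is no such element.
    static int lowerBound(int[] a, int target) {
        int base = 0;
        int len = a.length;
        while (len > 1) {
            int half = len >>> 1;
            // A simple select with no side effects in either arm:
            // a candidate for a conditional move, but not a guarantee.
            base = (a[base + half - 1] < target) ? base + half : base;
            len -= half;
        }
        return (len == 1 && a[base] < target) ? base + 1 : base;
    }

    // Warm-up driver so the JIT has a reason to compile lowerBound.
    // To look at the result (requires the hsdis library), one could run e.g.:
    //   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly CmovCandidate
    // optionally together with the flags discussed above.
    public static void main(String[] args) {
        int[] data = new int[1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = 2 * i; // sorted even numbers
        }
        Random rnd = new Random(42);
        long checksum = 0;
        for (int i = 0; i < 5_000_000; i++) {
            checksum += lowerBound(data, rnd.nextInt(2 * data.length));
        }
        System.out.println("checksum: " + checksum);
    }
}
```

Random search keys are used on purpose: a comparison that is well predicted in the profile is less likely to be converted. Keep in mind that the assembly printed early in the run may still be the branched version.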

Uwe

P.S.: Hotspot also has cmov for vectorized code

On 28.07.2023 at 09:08, Dawid Weiss wrote:

    Specifically, one of the fascinating Tantivy optimizations is the
    branchless binary search:
    https://quickwit.io/blog/search-a-sorted-block.


This is an interesting post, thanks for sharing, Mike. I remember when people did such low-level tricks frequently (but on much simpler processors and fairly consistent hardware) and it always makes me wonder whether all the moving blocks involved here (rust, llvm, actual hardware) make it sane - any change in any of these layers may affect the outcome (and debugging what actually happened will be a nightmare...). I like it though - nice intellectual exercise and some assembly dumps for a change. ;)

D.

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:u...@thetaphi.de
