Actually this is exactly the same for Java: You can try whatever you
want, the outcome of the dynamic optimization applied by various dynamic
building blocks (Java bytecode, Java/Hotspot version, command line
parameters, hardware CPU, virtualization) is not predictable and any
change anywhere may produce different results. So we should stop
arguing about changing *our* code to improve the generated assembly. If
some code on our side is not correctly converted to CMOV, we should open
a bug report on OpenJDK and ask for improvement (Chris H. and I can do
this easily).
As you have seen in my other answer to this thread: Hotspot applies CMOV
depending on analysis of branches. So in general our code *should* make
use of CMOV. You can only get certainty by using hsdis and printing the
assembly for some of our methods which you think should use CMOV. But
there's no guarantee that it is applied. And as always: It may take a
very long time until Hotspot replaces the standard branched code by
conditional moves (as they have significant overhead if used in cases
where the result is
With Hotspot you can try to add -XX:ConditionalMoveLimit ("Limit of ops
to make speculative when using CMOVE") and try with different values (0
disables, default is 3 on x86 and aarch64, 4 on arm). But as always:
Wait long enough.
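
To check which value a given JVM actually uses before experimenting, the flag can be read at runtime. This is only a minimal sketch: the class name ShowCmovLimit and the helper flagValue are made up for illustration, and it assumes a HotSpot JVM that exposes com.sun.management.HotSpotDiagnosticMXBean. On other JVMs, or builds without this flag, it just reports "n/a".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class ShowCmovLimit {

    /** Returns the current value of a HotSpot flag, or "n/a" if this JVM does not know it. */
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "n/a";
            }
            // getVMOption throws IllegalArgumentException for unknown flags
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return "n/a";
        }
    }

    public static void main(String[] args) {
        System.out.println("ConditionalMoveLimit = " + flagValue("ConditionalMoveLimit"));
    }
}
```

Running it once without options and once with, e.g., -XX:ConditionalMoveLimit=0 should show whether the setting was picked up.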
To enforce usage of CMOV (maybe the first thing to try, to see what kind
of assembly is generated; but this may slow down other code, as CMOV is
then always used, without analysis):
-XX:+UseCMoveUnconditionally ("Generates CMove (scalar and vector)
instructions regardless of profitability analysis.")
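
For playing around with these options, a tiny test case like the following may help. It is a sketch only: the class CmovCandidate and the method countBelow are invented for illustration and are not taken from our code base. Whether C2 emits CMOV, a branch, or vectorized code for it depends on JVM version, flags, CPU and the collected profile.

```java
import java.util.Random;

public class CmovCandidate {

    /** Counts values below the threshold; the ternary is a simple select without side effects. */
    static int countBelow(int[] values, int threshold) {
        int count = 0;
        for (int i = 0; i < values.length; i++) {
            // candidate for a conditional move (or a vectorized equivalent)
            count += values[i] < threshold ? 1 : 0;
        }
        return count;
    }

    public static void main(String[] args) {
        // Random input so the branch is hard to predict
        int[] data = new Random(42).ints(1 << 14, -1000, 1000).toArray();
        long total = 0;
        // Repeat often enough that C2 compiles countBelow ("wait long enough")
        for (int i = 0; i < 10_000; i++) {
            total += countBelow(data, i % 2000 - 1000);
        }
        System.out.println(total);
    }
}

// Possible invocations (printing assembly requires hsdis to be installed):
//   java -XX:+UnlockDiagnosticVMOptions \
//        -XX:CompileCommand=print,CmovCandidate::countBelow CmovCandidate
//   java -XX:+UseCMoveUnconditionally ... CmovCandidate
//   java -XX:ConditionalMoveLimit=0 ... CmovCandidate
```

Comparing the printed assembly of countBelow between these runs should show whether the flags change anything on your machine.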
Uwe
P.S.: Hotspot also has cmov for vectorized code
On 28.07.2023 at 09:08, Dawid Weiss wrote:
Specifically, one of the fascinating Tantivy optimizations is the
branchless binary search:
https://quickwit.io/blog/search-a-sorted-block.
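
For readers who do not want to follow the link: the general idea of a branchless binary search can be sketched in Java roughly as below. This is not the Tantivy code, just a generic lower-bound search written so that the loop body is a select instead of an if/else. The names BranchlessSearch and lowerBound are made up for illustration, and whether Hotspot really compiles the ternary to CMOV is not guaranteed.

```java
public class BranchlessSearch {

    /** Returns the index of the first element >= target in the sorted array, or its length if none. */
    static int lowerBound(int[] sorted, int target) {
        int base = 0;
        int len = sorted.length;
        while (len > 1) {
            int half = len >>> 1;
            // A select instead of a jump: either keep base or skip the lower half
            base = sorted[base + half - 1] < target ? base + half : base;
            len -= half;
        }
        // One element left to compare (nothing to do for an empty array)
        if (len == 1 && sorted[base] < target) {
            base++;
        }
        return base;
    }

    public static void main(String[] args) {
        int[] block = {1, 3, 5, 7};
        System.out.println(lowerBound(block, 5));
    }
}
```

The number of loop iterations depends only on the array length, not on the data, which is what makes this shape attractive for fixed-size blocks.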
This is an interesting post, thanks for sharing, Mike. I remember when
people did such low-level tricks frequently (but on much simpler
processors and fairly consistent hardware) and it
always makes me wonder whether all the moving blocks involved here
(rust, llvm, actual hardware) make it sane - any change in any of
these layers may affect the outcome (and debugging what
actually happened will be a nightmare...). I like it though - nice
intellectual exercise and some assembly dumps for a change. ;)
D.
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:u...@thetaphi.de