See Shipilevs blog: https://shipilev.net/jvm/anatomy-quarks/30-conditional-moves/
He also has some examples and also there's a command line option to tell hotspot when to use cmov: -XX:ConditionalMoveLimit Uwe Am 27. Juli 2023 14:43:19 MESZ schrieb Uwe Schindler <u...@thetaphi.de>: >Hi Mike, > >actually Hotspot is using CMOV. Some nodes after bytecode analysis are >converted to CMoveNode and it has implementations to generate machine code (as >far as i see) for x86, s390 and arm. > >The generic code is here: >https://github.com/openjdk/jdk/blob/486c7844f902728ce580c3994f58e3e497834952/src/hotspot/share/opto/movenode.cpp > >Actually it is used in some cases, but I did not find out when it uses it to >generate instructions from bytecode. It also has some code to optimize the >cmov away if the result is known before (or not, e.g. for floats it does not >do this). > >I think best would be to ask on the hotspot compiler list on suggestions how >to write Java code to trigger the JVM to insert the CMOV. > >Uwe > >Am 27.07.2023 um 13:40 schrieb Michael McCandless: >> Hi Team, >> >> At Amazon (customer facing product search team) we've been playing with / >> benchmarking Tantivy (exciting Rust search engine loosely inspired by >> Lucene: https://github.com/quickwit-oss/tantivy, created by Paul Masurel and >> developed now by Quickwit and the Tantivy open source dev community) vs >> Lucene, by building a benchmark that blends Tantivy's search-benchmark-game >> (https://github.com/quickwit-oss/search-benchmark-game) and Lucene's nightly >> benchmarks (luceneutil: https://github.com/mikemccand/luceneutil and >> https://home.apache.org/~mikemccand/lucenebench). >> >> It's great fun, and we would love more eyeballs to spot any remaining unfair >> aspects of the comparison (thank you Uwe and Adrien for catching things >> already!). We are trying hard for an apples to apples comparison: same >> (enwiki) corpus, same queries, confirming we get precisely the same top N >> hits and total hit counts, latest versions of both engines. >> >> Indeed, Tantivy is substantially (2-3X?) faster than Lucene for many >> queries, and we've been trying to understand why. >> >> Sometimes it is due to a more efficient algorithms, e.g. the count() API for >> pure disjunctive queries, which Adrien is now porting to Lucene (thank >> you!), showing sizable (~80% faster in one query) speedups: >> https://github.com/apache/lucene/issues/12358. Other times may be due to >> Rust's more efficient/immediate/Python-like GC, or direct access to SIMD >> (Lucene now has this for aKNN search -- big speedup -- and maybe soon for >> postings too?), and unsafe code, different skip data / block postings >> encoding, or ... >> >> Specifically, one of the fascinating Tantivy optimizations is the branchless >> binary search: https://quickwit.io/blog/search-a-sorted-block. Here's >> another blog post about it (implemented in C++): >> https://probablydance.com/2023/04/27/beautiful-branchless-binary-search/ >> >> The idea is to get rustc/gcc to compile down to x86-64's CMOVcc >> ("conditional move") instruction (I'm not sure if ARM has an equivalent? >> Maybe "conditional execution"?). The idea is to avoid a "true" branch of >> the instruction stream (costly pipeline flush on mis-prediction, which is >> likely in a binary search or priority queue context) by instead >> conditionally moving a value from one location to another (register or >> memory). Tantivy uses this for skipping through postings, in a single layer >> in-memory skip list structure (versus Lucene's on-disk (memory-mapped, by >> default) multi-layer skip list) -- see the above blog post. >> >> This made me wonder: does javac's hotspot compiler use CMOVcc? I see javac >> bug fixes like https://github.com/openjdk/mobile/commit/a03e9220 which seems >> to imply C2 does in fact compile to CMOVcc sometimes. So then I wondered >> whether a branchless binary search in Java is a possibility? Has anyone >> played with this? >> >> Before Robert gets too upset :) Even if we could build such a thing, the >> right place for such optimizations is likely the JDK itself (see the similar >> discussion about SIMD-optimized sorting: >> https://github.com/apache/lucene/issues/12399). Still, I'm curious if anyone >> has explored this, and maybe saw some performance gains from way up in >> javaland where we can barely see the bare metal shining beneath us :) >> >> Sorry for the long post! >> >> Mike McCandless >> >> http://blog.mikemccandless.com > >-- >Uwe Schindler >Achterdiek 19, D-28357 Bremen >https://www.thetaphi.de >eMail:u...@thetaphi.de -- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de