Hi,

First of all: we have to be a bit careful with the results. For example, SegmentReader::get_live_docs() returns null, so the code does not handle deleted documents at all. Of course this is not relevant if the original Lucene index also has no deletions, but you need to keep an eye on it.
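
For reference, on the Java side a null result from getLiveDocs() means "this segment has no deletions", so skipping the check is only correct when that really holds. A minimal sketch using the Lucene LeafReader API (the wrapper class and method names are mine, just for illustration):

  import org.apache.lucene.index.LeafReader;
  import org.apache.lucene.util.Bits;

  final class LiveDocsSketch {
    // Counts live documents the way Lucene Java honors deletions:
    // getLiveDocs() returns null only when the segment has no deletions.
    static long countLiveDocs(LeafReader reader) {
      Bits liveDocs = reader.getLiveDocs();
      long count = 0;
      for (int doc = 0; doc < reader.maxDoc(); doc++) {
        if (liveDocs == null || liveDocs.get(doc)) { // null => every doc is live
          count++;
        }
      }
      return count;
    }
  }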

What's great: the indexing part is perfectly optimized in Lucene Java. He also figured out that our highly multithreaded IndexWriter is almost impossible to rewrite in C++; he described segmentation faults occurring all the time when he tried to make the code parallel, and weeks spent debugging them. So Java's concurrency with the Java memory model is actually much easier to handle. What he has actually shown is that you can make queries faster by specialization (see below).

It is also nice what he found: LZ4 and indexing itself are as fast in C++ as in Java, so Hotspot is doing a good job. There are only some smaller optimizations possible because Lucene Java sometimes copies data needlessly (which is still a limitation of our IndexInput design).

What is a good outcome: if we completely drop IndexInput and all directory abstractions at some point in Lucene 11 with Java 24 and work solely on MemorySegments, we could improve a lot. I agree with his finding that we still copy a lot of data from memory segments to the heap when decoding PFOR and similar encodings, instead of accessing the memory directly using VarHandles/MemorySegment. So we should really get rid of the IndexInput abstractions (Robert and I are always getting crazy when we see the IndexInput bullshit with seek and unaligned accesses...).
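
To make the copy-vs-direct-access point concrete, here is a minimal sketch (not the actual PFOR decoder; class and method names are made up, and byte-order handling is glossed over): the first variant copies the encoded block onto the heap before decoding, the second reads the packed longs straight out of the MemorySegment.

  import java.lang.foreign.MemorySegment;
  import java.lang.foreign.ValueLayout;
  import java.nio.ByteBuffer;
  import java.nio.ByteOrder;

  final class BlockDecodeSketch {

    // Today-ish pattern: copy the encoded block to a heap byte[] first,
    // then decode from it -- one extra copy per block.
    static void decodeViaHeapCopy(MemorySegment seg, long offset, long[] dst) {
      byte[] buf = new byte[dst.length * Long.BYTES];
      MemorySegment.copy(seg, ValueLayout.JAVA_BYTE, offset, buf, 0, buf.length);
      ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
      for (int i = 0; i < dst.length; i++) {
        dst[i] = bb.getLong(i * Long.BYTES);
      }
    }

    // Direct access: read the packed longs straight from the (mapped)
    // segment via an unaligned layout, with no intermediate heap copy.
    static void decodeDirect(MemorySegment seg, long offset, long[] dst) {
      for (int i = 0; i < dst.length; i++) {
        dst[i] = seg.get(ValueLayout.JAVA_LONG_UNALIGNED, offset + (long) i * Long.BYTES);
      }
    }
  }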

What is of course very crazy and the main reason for his improvements: he figured out that the query part can be made faster by tricks like avoiding virtual function calls. This is not possible in Java, and it has the downside of requiring the whole of Lucene to be recompiled on the C++ side whenever you add a new query type (as everything is hardcoded). So he loses a lot of flexibility.
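
To illustrate what "specialization" means here (a purely illustrative Java sketch, none of these types are Lucene classes): the generic variant dispatches through an interface for every document, which can become a megamorphic call the JIT cannot inline; the specialized variant hardcodes the concrete iterator type so the call can be inlined. The C++ port gets this effect at compile time via hardcoded query types, which is why adding a query type there means recompiling.

  // Purely illustrative, not Lucene code.
  interface DocIterator {
    int nextDoc(); // returns Integer.MAX_VALUE when exhausted
  }

  final class SortedIntIterator implements DocIterator {
    private final int[] docs;
    private int i = -1;
    SortedIntIterator(int[] docs) { this.docs = docs; }
    @Override public int nextDoc() {
      return ++i < docs.length ? docs[i] : Integer.MAX_VALUE;
    }
  }

  final class CountSketch {
    // Generic: nextDoc() is a virtual call; with many iterator
    // implementations loaded, the call site turns megamorphic.
    static long countGeneric(DocIterator it) {
      long count = 0;
      while (it.nextDoc() != Integer.MAX_VALUE) count++;
      return count;
    }

    // Specialized: the concrete type is hardcoded, so the call can be
    // inlined and the loop optimized -- at the cost of flexibility.
    static long countSpecialized(SortedIntIterator it) {
      long count = 0;
      while (it.nextDoc() != Integer.MAX_VALUE) count++;
      return count;
    }
  }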

P.S.: Maybe we should make the BulkScorer window size configurable...
P.P.S.: He did not implement HNSW at all yet, so he does not use SIMD. I wonder why Lucene is not faster for stuff that autovectorizes nicely (like bit counts in FixedBitSet, ...).
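
The kind of loop I mean is something like the following (a sketch in the spirit of FixedBitSet's cardinality, not the actual code), which both C2 and a C++ compiler should be able to auto-vectorize:

  final class PopcountSketch {
    // Summing Long.bitCount over a plain long[] is a classic
    // auto-vectorization candidate; FixedBitSet does essentially this
    // when computing its cardinality.
    static long cardinality(long[] bits) {
      long count = 0;
      for (long word : bits) {
        count += Long.bitCount(word);
      }
      return count;
    }
  }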

Uwe

On 22.07.2024 at 17:30, Michael McCandless wrote:
Thanks for sharing Adrien, this is really cool! It's neat that the relative gains of Java vs C are quite a bit less than they were ~11 years ago when I played with a much smaller subset of queries.  Also, COUNT on disjunction queries with Lucene Cyborg got slower.  What a feat, to port so much of our complex Search code to C!

Mike McCandless

http://blog.mikemccandless.com


On Mon, Jul 22, 2024 at 9:43 AM Adrien Grand <jpou...@gmail.com> wrote:

    Hello everyone,

    I recently stumbled on this paper after Ishan shared it on LinkedIn:
    https://github.com/0ctopus13prime/lucene-cyborg-paper/blob/main/LuceneCyborg_Hybrid_Search_Engine_Written_in_Java_and_C%2B%2B.pdf.

    This is quite impressive: this person did a high-fidelity rewrite
    of Lucene in C++; it can even read indexes created by Lucene
    as-is. Then they ran the Tantivy benchmark to compare performance
    with Lucene, Tantivy and PISA. There are many takeaways; this is
    an interesting read.

-- Adrien

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de
