benwtrent opened a new issue, #15707:
URL: https://github.com/apache/lucene/issues/15707

   ### Description
   
   During the 10.4 release process, we noticed a 100-300x slowdown on a 
benchmark that utilizes doc-skippers. 
   
   The query is rather innocuous: it sorts the results by one SORTED_SET field 
while filtering by a separate SORTED_SET field. The index is sorted by both 
fields, but also by multiple others. 
   
   Before the changes to `TermOrdValComparator.SkipperBasedCompetitiveState` 
this query took about 4ms, now it takes 700+ms. 
   
   Here are some interesting flamegraphs (I am attempting to get JFR to pare 
down the actual times of the queries, but it's taking hours as the actual 
runtime of the benchmark is many hours...)
   
   
   baseline (note how it's 0.06% of samples)
   <img width="1722" height="117" alt="Image" 
src="https://github.com/user-attachments/assets/21527a63-4bcc-40d5-a1d6-090baa00d9bd"
 />
   
   candidate (note how it's over 3% of samples)
   
   <img width="1722" height="127" alt="Image" 
src="https://github.com/user-attachments/assets/598dfc01-2387-41ac-b4a5-a6c2c1731779"
 />
   
   I am still attempting to fully replicate without an insane benchmark run of 
multiple hours. The key things seem to be:
   
    - Index sorted by (but not exclusively by) two doc value fields
    - Using a skipper with larger windows
    - ... and I am not sure what else... need to dig further, this is new 
territory for me.
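
To make the first two conditions concrete, here is a toy model in plain Java. It is not Lucene code and not a diagnosis of the regression: `SkipperSketch`, `windowMins`, and `docsInspected` are hypothetical names, and the "skipper" is reduced to a per-window minimum ordinal. It only illustrates why a field that is a secondary index-sort key (so its values interleave in doc ID order) may leave few windows skippable once the windows get larger.

```java
/** Toy model only, not Lucene code. All names here are hypothetical. */
final class SkipperSketch {

    /** Smallest ordinal in each window of windowSize consecutive doc IDs. */
    static int[] windowMins(int[] ords, int windowSize) {
        int numWindows = (ords.length + windowSize - 1) / windowSize;
        int[] mins = new int[numWindows];
        for (int w = 0; w < numWindows; w++) {
            int end = Math.min((w + 1) * windowSize, ords.length);
            int min = Integer.MAX_VALUE;
            for (int doc = w * windowSize; doc < end; doc++) {
                min = Math.min(min, ords[doc]);
            }
            mins[w] = min;
        }
        return mins;
    }

    /**
     * Counts the docs that still have to be checked one by one when only
     * ordinals <= maxCompetitiveOrd can enter the top hits (ascending sort).
     * A window whose minimum is above that bound is skipped as a whole.
     */
    static int docsInspected(int[] ords, int windowSize, int maxCompetitiveOrd) {
        int[] mins = windowMins(ords, windowSize);
        int inspected = 0;
        for (int w = 0; w < mins.length; w++) {
            if (mins[w] > maxCompetitiveOrd) {
                continue; // no doc in this window can be competitive
            }
            int start = w * windowSize;
            int end = Math.min(start + windowSize, ords.length);
            inspected += end - start;
        }
        return inspected;
    }
}
```

With the interleaved ordinals `{0, 3, 1, 3, 2, 3, 0, 3}` and a competitive bound of 0, a window size of 2 leaves 4 of the 8 docs to inspect, while a window size of 4 leaves all 8, because every larger window contains at least one competitive value.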
   
   Regardless, this competitive iterator is still doing WAY too much work and 
obviously needs fixing.
   
   
   One of the key things this shows me is the lack of Lucene Util nightly 
benchmarks that use the new doc-skipper stuff. I think @romseygeek is working 
on these to catch stuff like this early.
   
   Related PRs:
   
    - https://github.com/apache/lucene/pull/15696
    - https://github.com/apache/lucene/pull/15511
   
   ### Version and environment details
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

