I'm copying the GitHub issue I created, since it doesn't look like every
GitHub issue is automatically copied to this mailing list. Apologies if this
ends up as a duplicate.
https://github.com/apache/lucene/issues/14986

Description

   1. Each IndexSearcher has its own UsageTrackingQueryCachingPolicy that
   is shared across all segments.
   2. This caching policy uses a 256-length ring buffer to keep track of
   recently used queries.
   3. A TermInSetQuery with rewriteMethod =
   MultiTermQuery.CONSTANT_SCORE_BLENDED_REWRITE yields a RewritingWeight.
   4. Getting a scorer from this RewritingWeight for a segment can involve
   rewriting the query into a BooleanQuery of TermQuerys that contains only
   the terms present in that particular segment (see
   org.apache.lucene.search.AbstractMultiTermQueryConstantScoreWrapper.RewritingWeight#scorerSupplier).
   5. Thus a single TermInSetQuery ends up thrashing the ring buffer, because
   it is recorded as multiple distinct BooleanQuerys from different segments.
   6. This leads to a poor caching rate for indexes with a large number of
   segments.
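
The thrashing in steps 2 to 6 can be illustrated with a small, self-contained simulation. This is a simplified model, not Lucene's code: the history is an ArrayDeque plus a count map rather than the policy's own ring-buffer class, queries are modelled as string keys, and the 300-segment index is a hypothetical workload chosen to exceed the 256-slot history. It compares one stable key per query against one distinct key per segment.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class RingBufferThrashing {
    // Simplified model of a 256-slot usage history: keeps the most recent
    // `capacity` keys and counts how often each key occurs among them.
    static final class FrequencyRing {
        private final int capacity;
        private final ArrayDeque<String> window = new ArrayDeque<>();
        private final Map<String, Integer> counts = new HashMap<>();

        FrequencyRing(int capacity) { this.capacity = capacity; }

        void add(String key) {
            if (window.size() == capacity) {
                // Evict the oldest entry and decrement its count.
                String evicted = window.removeFirst();
                if (counts.merge(evicted, -1, Integer::sum) == 0) {
                    counts.remove(evicted);
                }
            }
            window.addLast(key);
            counts.merge(key, 1, Integer::sum);
        }

        int maxFrequency() {
            return counts.isEmpty() ? 0 : Collections.max(counts.values());
        }
    }

    // Simulates `searches` searches over `segments` segments. With
    // perSegmentKeys, each segment records its own key (the query rewritten
    // to a segment-specific BooleanQuery); otherwise all record one key.
    static int maxFrequency(int segments, int searches, boolean perSegmentKeys) {
        FrequencyRing ring = new FrequencyRing(256);
        for (int s = 0; s < searches; s++) {
            for (int seg = 0; seg < segments; seg++) {
                ring.add(perSegmentKeys ? "bool@seg" + seg : "termInSet");
            }
        }
        return ring.maxFrequency();
    }

    public static void main(String[] args) {
        System.out.println("one key per query:   " + maxFrequency(300, 5, false)); // 256
        System.out.println("one key per segment: " + maxFrequency(300, 5, true));  // 1
    }
}
```

With a single key the query fills the whole history; with per-segment keys no rewritten query is ever seen more than once within 256 entries, so a frequency-based threshold is never reached.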

We were able to verify this behavior with a custom caching policy that logs
the onUse() and shouldCache() calls before delegating to
UsageTrackingQueryCachingPolicy.
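
A minimal sketch of such a logging decorator follows. So that it compiles without Lucene on the classpath, it assumes a stand-in CachingPolicy interface with the same two methods as Lucene's QueryCachingPolicy (onUse and shouldCache), models queries as strings, and uses a hypothetical CountingPolicy in place of UsageTrackingQueryCachingPolicy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LoggingPolicyDemo {
    // Stand-in for Lucene's QueryCachingPolicy; queries are plain strings.
    interface CachingPolicy {
        void onUse(String query);
        boolean shouldCache(String query);
    }

    // Hypothetical, simplified usage tracker: caches a query once it has been
    // seen at least minFrequency times (no bounded history, for brevity).
    static final class CountingPolicy implements CachingPolicy {
        private final Map<String, Integer> uses = new HashMap<>();
        private final int minFrequency;

        CountingPolicy(int minFrequency) { this.minFrequency = minFrequency; }

        public void onUse(String query) { uses.merge(query, 1, Integer::sum); }

        public boolean shouldCache(String query) {
            return uses.getOrDefault(query, 0) >= minFrequency;
        }
    }

    // Decorator that records every call, then forwards it to the delegate.
    static final class LoggingPolicy implements CachingPolicy {
        private final CachingPolicy delegate;
        final List<String> log = new ArrayList<>();

        LoggingPolicy(CachingPolicy delegate) { this.delegate = delegate; }

        public void onUse(String query) {
            log.add("onUse " + query);
            delegate.onUse(query);
        }

        public boolean shouldCache(String query) {
            boolean decision = delegate.shouldCache(query);
            log.add("shouldCache " + query + " -> " + decision);
            return decision;
        }
    }

    public static void main(String[] args) {
        LoggingPolicy policy = new LoggingPolicy(new CountingPolicy(2));
        // One logical query seen as different rewrites on two segments.
        for (String q : new String[] {"seg0:(a b)", "seg1:(b c)", "seg0:(a b)"}) {
            policy.onUse(q);
            policy.shouldCache(q);
        }
        policy.log.forEach(System.out::println);
    }
}
```

Against Lucene itself, the same wrapper would implement org.apache.lucene.search.QueryCachingPolicy (whose shouldCache also declares IOException), delegate to a new UsageTrackingQueryCachingPolicy(), and be installed with IndexSearcher#setQueryCachingPolicy.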

Is there a good reason not to do this ring-buffer usage tracking at a
per-segment level? That would fix this issue.
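
One way per-segment tracking could look is sketched below. It is a hypothetical design, not an existing Lucene API: it keeps one simplified 256-slot history per segment key (segments and queries are modelled as strings) and replays the same many-segment workload.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

public class PerSegmentTracking {
    // Simplified frequency-tracking history: the most recent `capacity` keys
    // plus a count of how often each key occurs among them.
    static final class FrequencyRing {
        private final int capacity;
        private final ArrayDeque<String> window = new ArrayDeque<>();
        private final Map<String, Integer> counts = new HashMap<>();

        FrequencyRing(int capacity) { this.capacity = capacity; }

        void add(String key) {
            if (window.size() == capacity) {
                String evicted = window.removeFirst();
                if (counts.merge(evicted, -1, Integer::sum) == 0) {
                    counts.remove(evicted);
                }
            }
            window.addLast(key);
            counts.merge(key, 1, Integer::sum);
        }

        int frequency(String key) { return counts.getOrDefault(key, 0); }
    }

    // Hypothetical tracker keeping one history per segment, so the
    // segment-specific rewrites of one query never evict each other.
    static final class PerSegmentTracker {
        private final int capacity;
        private final Map<String, FrequencyRing> rings = new HashMap<>();

        PerSegmentTracker(int capacity) { this.capacity = capacity; }

        void onUse(String segment, String query) {
            rings.computeIfAbsent(segment, k -> new FrequencyRing(capacity)).add(query);
        }

        int frequency(String segment, String query) {
            FrequencyRing ring = rings.get(segment);
            return ring == null ? 0 : ring.frequency(query);
        }
    }

    // `searches` searches, each recording a distinct rewritten query on every
    // segment; returns how often segment 0 has seen its own rewrite.
    static int frequencyOnFirstSegment(int segments, int searches) {
        PerSegmentTracker tracker = new PerSegmentTracker(256);
        for (int s = 0; s < searches; s++) {
            for (int seg = 0; seg < segments; seg++) {
                tracker.onUse("seg" + seg, "bool@seg" + seg);
            }
        }
        return tracker.frequency("seg0", "bool@seg0");
    }

    public static void main(String[] args) {
        System.out.println(frequencyOnFirstSegment(300, 5)); // 5
    }
}
```

Each segment's rewrite now accumulates one use per search. The trade-offs of this sketch are that memory grows with the number of segments and that histories for merged-away segments would need to be discarded.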

Version and environment details

Lucene 9.12.1
