[ https://issues.apache.org/jira/browse/LUCENE-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397339#comment-17397339 ]
ASF subversion and git services commented on LUCENE-10043:
----------------------------------------------------------

Commit 2e3620fe0a70d7e1dc3261112ea314ab5512bd3f in lucene-solr's branch refs/heads/branch_8x from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2e3620f ]

LUCENE-10043: Decrease default LRUQueryCache#skipCacheFactor to 10

In LUCENE-9002 we introduced logic to skip caching a clause if it would be too
expensive compared to the usual query cost. Specifically, we avoid caching a
clause if its cost is estimated to be 250x higher than the lead iterator's.
We've found that the default of 250 is quite high and can lead to poor tail
latencies. This PR decreases it to 10 to cache more conservatively.

> Decrease default for LRUQueryCache#skipCacheFactor?
> ---------------------------------------------------
>
>                 Key: LUCENE-10043
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10043
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Julie Tibshirani
>            Priority: Minor
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In LUCENE-9002 we introduced logic to skip caching a clause if it would be
> too expensive compared to the usual query cost. Specifically, we avoid
> caching a clause if its cost is estimated to be a given factor higher than
> the lead iterator's:
> {code}
> // skip cache operation which would slow query down too much
> if (cost / skipCacheFactor > leadCost) {
>   return supplier.get(leadCost);
> }
> {code}
> Choosing good defaults is hard! We've seen some examples in Elasticsearch
> where caching a query clause causes a major slowdown, contributing to poor
> tail latencies. It made me think that the default 'skipCacheFactor' of 250
> may be too high -- interpreted simply, this means we'll cache a clause even
> if it is ~250 times more expensive than running the top-level query on its
> own. Would it make sense to decrease this to 10 or so? It seems okay to err
> on the side of less caching for individual clauses, especially since any
> parent 'BooleanQuery' is already eligible for caching.
> As a note, the interpretation "~250 times more expensive than running the
> top-level query on its own" isn't perfectly accurate. The true cost doesn't
> depend only on the number of matched documents, but also on the cost of
> matching itself. Making it even more complex, some queries like
> 'IndexOrDocValuesQuery' have different matching strategies based on whether
> they're used as a lead iterator or verifier.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
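
To make the effect of the proposed default concrete, here is a minimal standalone sketch of the condition quoted in the issue. It is not the actual LRUQueryCache code: the class name, the helper 'shouldSkipCaching', and the cost values are made up for illustration, and the 'float' type for the factor is an assumption.

```java
public class SkipCacheSketch {

    // Hypothetical helper that mirrors the condition quoted in the issue:
    // caching is skipped when cost / skipCacheFactor > leadCost.
    static boolean shouldSkipCaching(long cost, long leadCost, float skipCacheFactor) {
        return cost / skipCacheFactor > leadCost;
    }

    public static void main(String[] args) {
        // Made-up costs: the clause is estimated at 50x the lead iterator's cost.
        long leadCost = 100;
        long clauseCost = 5_000;

        // Old default of 250: 5000 / 250 = 20, not above 100, so the clause is still cached.
        System.out.println("factor 250 skips caching: "
                + shouldSkipCaching(clauseCost, leadCost, 250f)); // prints false

        // Proposed default of 10: 5000 / 10 = 500, above 100, so caching is skipped.
        System.out.println("factor 10 skips caching: "
                + shouldSkipCaching(clauseCost, leadCost, 10f)); // prints true
    }
}
```

With these illustrative numbers, a clause estimated at 50x the lead cost is still cached under a factor of 250 but skipped under a factor of 10. As the issue notes, the estimated cost is only a rough proxy for the real work involved.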