jgq2008303393 commented on a change in pull request #940: LUCENE-9002: Query caching leads to absurdly slow queries
URL: https://github.com/apache/lucene-solr/pull/940#discussion_r333896831
 
 

 ##########
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##########
 @@ -732,8 +741,39 @@ public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOExcepti
 
       if (docIdSet == null) {
         if (policy.shouldCache(in.getQuery())) {
-          docIdSet = cache(context);
-          putIfAbsent(in.getQuery(), docIdSet, cacheHelper);
+          final ScorerSupplier supplier = in.scorerSupplier(context);
+          if (supplier == null) {
+            putIfAbsent(in.getQuery(), DocIdSet.EMPTY, cacheHelper);
+            return null;
+          }
+
+          final long cost = supplier.cost();
+          return new ScorerSupplier() {
+            @Override
+            public Scorer get(long leadCost) throws IOException {
+              // skip cache operation which would slow query down too much
+              if ((cost > skipCacheCost || cost > leadCost * skipCacheFactor)
 
 Review comment:
   We have tested different scenarios to observe the query latency with/without caching in an online ES cluster. Here are the results:
   
   | queryPattern | latencyWithoutCaching | latencyWithCaching | leadCost | rangeQueryCost | skipCacheFactor |
   | ---------- | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: |
   | ip:xxx AND time:[t-1h, t] | 10ms | 36ms (+260%) | 20528 | 878979 | 42 |
   | ip:xxx AND time:[t-4h, t] | 10ms | 100ms (+900%) | 20528 | 4365870 | 212 |
   | ip:xxx AND time:[t-8h, t] | 11ms | 200ms (+1700%) | 20528 | 8724483 | 425 |
   | ip:xxx AND time:[t-12h, t] | 12ms | 300ms (+2400%) | 20528 | 13083096 | 637 |
   | ip:xxx AND time:[t-24h, t] | 16ms | 500ms (+3000%) | 20528 | 26158936 | 1274 |
   | ip:xxx AND time:[t-48h, t] | 30ms | 1200ms (+3900%) | 20528 | 52310616 | 2548 |
   
   As the table shows, query latency without caching is low and mostly depends on the size of the final result set, while latency with caching is much higher and mostly depends on _rangeQueryCost_. Based on these measurements, we set the default value of _skipCacheFactor_ to 250, which keeps caching from making a query slower by more than about 10x.
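   For illustration, here is a minimal, self-contained sketch (names and values are taken from the table above, not from the actual patch) of how a _skipCacheFactor_ of 250 plays out against these measurements:

```java
// Hedged sketch: applies the skipCacheFactor check to the numbers from the table above.
// leadCost is the cost of the ip:xxx term clause; the array holds the rangeQueryCost column.
public class SkipCacheFactorDemo {
  public static void main(String[] args) {
    final long leadCost = 20528;
    final long skipCacheFactor = 250; // proposed default
    final long[] rangeQueryCosts = {878979, 4365870, 8724483, 13083096, 26158936, 52310616};
    for (long cost : rangeQueryCosts) {
      boolean skipCaching = cost > leadCost * skipCacheFactor;
      System.out.println("rangeQueryCost=" + cost
          + " ratio=" + (cost / leadCost)
          + " -> skip caching: " + skipCaching);
    }
  }
}
```

   With this default, the t-1h and t-4h rows (ratios 42 and 212) would still be cached, while the larger ranges (ratios 425 and above) would skip the cache.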
   
   In addition to _skipCacheFactor_, which is similar to _maxCostFactor_ in LUCENE-8027, we add a new parameter _skipCacheCost_. The main reasons are (a rough sketch of the combined check follows this list):
   - to bound the time spent on caching, since caching time grows with the cost of the range query.
   - to skip caching very large range queries, which would consume too much memory and evict other cache entries frequently.
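   A rough sketch of the combined check (the _skipCacheCost_ value shown here is only a placeholder, not a proposed default):

```java
// Hedged sketch of the combined skip condition described above. The names follow the
// discussion; SKIP_CACHE_COST's value is a placeholder, not the PR's actual default.
final class SkipCachePolicySketch {
  static final long SKIP_CACHE_COST = 10_000_000L; // placeholder absolute cost ceiling
  static final long SKIP_CACHE_FACTOR = 250L;      // default derived from the table above

  /** Returns true if caching this clause should be skipped. */
  static boolean shouldSkipCaching(long cost, long leadCost) {
    // skip when the clause is expensive in absolute terms (memory / eviction pressure),
    // or when it is far more expensive than the lead iterator (latency blow-up)
    return cost > SKIP_CACHE_COST || cost > leadCost * SKIP_CACHE_FACTOR;
  }
}
```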
   
   What do you think? Looking forward to your ideas. @jpountz 
