jgq2008303393 commented on a change in pull request #940: LUCENE-9002: Query 
caching leads to absurdly slow queries
URL: https://github.com/apache/lucene-solr/pull/940#discussion_r334230846
 
 

 ##########
 File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java
 ##########
 @@ -732,8 +741,39 @@ public ScorerSupplier scorerSupplier(LeafReaderContext 
context) throws IOExcepti
 
       if (docIdSet == null) {
         if (policy.shouldCache(in.getQuery())) {
-          docIdSet = cache(context);
-          putIfAbsent(in.getQuery(), docIdSet, cacheHelper);
+          final ScorerSupplier supplier = in.scorerSupplier(context);
+          if (supplier == null) {
+            putIfAbsent(in.getQuery(), DocIdSet.EMPTY, cacheHelper);
+            return null;
+          }
+
+          final long cost = supplier.cost();
+          return new ScorerSupplier() {
+            @Override
+            public Scorer get(long leadCost) throws IOException {
+              // skip cache operation which would slow query down too much
+              if ((cost > skipCacheCost || cost > leadCost * skipCacheFactor)
+                  && in.getQuery() instanceof IndexOrDocValuesQuery) {
 
 Review comment:
   @jpountz Thanks for reminding. I found the following ES query has similar 
problem. 
   ```java
   GET host_monitor/_search
   {
     "size": 100,
     "query": {
       "bool": {
         "filter": [
           {
             // term query cost is 1.8w
             "term": {
               "timestamp": 1570291200
             }
           },
           {
             // terms query cost is 759.0w
             "terms": {
               "cpu_name": ["cpu0", "cpu1", "cpu2", "cpu3", "cpu4", "cpu5", 
"cpu6", "cpu7"]
             }
           }
         ]
       }
     }
   }
   ```
   The test result of this query is as follows:
   
   Query | 1th | 2th | 3th | 4th | 5th | 6th
   -- | -- | -- | -- | -- | -- | --
   Before Optimization | 28ms | 27ms | 29ms | 616ms | 16ms | 16ms
   After Optimization| 27ms | 30ms | 25ms | 27ms | 28ms | 27ms
   
   | queryPattern      | latencyWithoutCaching  | latencyWithCaching | leadCost 
| rangeQueryCost  | skipCacheFactor |
   | ---------- | :-----------:  | :-----------: | :-----------:  | 
:-----------: | :-----------:  |
   | cpu_name with 2 vlaues | 22ms | 165ms(+650%) | 18432 | 1911910 | 103 |
   | cpu_name with 4 vlaues | 24ms | 307ms(+1180%) | 18432 | 3778721| 205 |
   | cpu_name with 8 vlaues | 27ms | 616ms(+2270%) | 18432 | 7590457 | 411 |
   
   
   According to above test result, this PR also do a very good job at 
optimizing terms query execution. I will apply this PR to all query types in 
next patch.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to