sgup432 opened a new issue, #15887:
URL: https://github.com/apache/lucene/issues/15887

   ### Description
   
   While debugging a latency spike in OpenSearch, I saw that one user was 
firing complex boolean queries with hundreds (900, to be exact) of SHOULD 
clauses, and most of the time was being spent in scorer construction rather 
than in actual query execution.
   
   Their query looked like this:
   ```
   {
     "bool": {
       "must": [{"term": {"field1": "value1"}}],
       "filter": [{
         "bool": {
           "should": [
             {"bool": {"filter": [
               {"terms": {"field2": ["val_a", "val_b"]}},
               {"term": {"field3": "val_001"}},
               {"term": {"field4": "val_x"}}
             ]}},
             {"bool": {"filter": [
               {"terms": {"field2": ["val_c"]}},
               {"term": {"field3": "val_002"}},
               {"term": {"field4": "val_x"}}
             ]}},
             ... // ~900 more such clauses
           ]
         }
       }]
     }
   }
   ```
   
   
   In one of the hot threads dumps, I saw this:
   ```
   100.5% cpu usage by thread 'search[T#10]'
     BooleanScorerSupplier.req(BooleanScorerSupplier.java:496)
     BooleanScorerSupplier.getInternal(BooleanScorerSupplier.java:137)
     BooleanScorerSupplier.get(BooleanScorerSupplier.java:117)
     BooleanScorerSupplier.opt(BooleanScorerSupplier.java:537) ← building scorers for 900+ clauses
     BooleanScorerSupplier.getInternal(BooleanScorerSupplier.java:145)
     BooleanScorerSupplier.get(BooleanScorerSupplier.java:117)
     BooleanScorerSupplier.requiredBulkScorer(BooleanScorerSupplier.java:377)
     BooleanScorerSupplier.booleanScorer(BooleanScorerSupplier.java:219)
     BooleanScorerSupplier.bulkScorer(BooleanScorerSupplier.java:177)
   ```
   
   Note that the above query had zero hits, so most of the SHOULD clauses 
must have had zero cost.
   
   I see that BooleanScorerSupplier already computes and caches `cost()` for 
every child ScorerSupplier during `computeShouldCost() / computeCost()`. So 
maybe we can use this cached cost to skip the expensive `scorer.get(leadCost)`?
   
   We could do that for SHOULD clauses: when `minShouldMatch <= 1` and a 
clause has `cost() == 0`, could we simply skip it and return an empty scorer?
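   
   A rough sketch of the idea, using simplified stand-in types rather than the 
real Lucene classes. `ClauseSupplier`, `buildShouldScorers` and the 
`scorersBuilt` counter are hypothetical names for illustration only, and the 
sketch assumes that a cached cost of 0 reliably means the clause cannot match 
any document:
   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Illustration only: a simplified model, NOT the actual BooleanScorerSupplier code.
   class ZeroCostShouldSkip {

       /** Hypothetical stand-in for a child ScorerSupplier with an already cached cost. */
       static final class ClauseSupplier {
           final String name;
           final long cost;       // assumed to be computed and cached up front
           int scorersBuilt = 0;  // counts how often the expensive get() was called

           ClauseSupplier(String name, long cost) {
               this.name = name;
               this.cost = cost;
           }

           /** Stands in for the expensive scorer construction. */
           String get(long leadCost) {
               scorersBuilt++;
               return "scorer(" + name + ")";
           }
       }

       /**
        * Builds scorers for the SHOULD clauses, skipping clauses whose cached
        * cost is 0 when minShouldMatch <= 1. An empty result means the caller
        * could fall back to an empty scorer.
        */
       static List<String> buildShouldScorers(
               List<ClauseSupplier> should, int minShouldMatch, long leadCost) {
           List<String> scorers = new ArrayList<>();
           for (ClauseSupplier clause : should) {
               if (minShouldMatch <= 1 && clause.cost == 0) {
                   continue; // assumed unable to match, so never build its scorer
               }
               scorers.add(clause.get(leadCost));
           }
           return scorers;
       }

       public static void main(String[] args) {
           List<ClauseSupplier> should = List.of(
                   new ClauseSupplier("clause_1", 0),
                   new ClauseSupplier("clause_2", 42),
                   new ClauseSupplier("clause_3", 0));
           System.out.println(buildShouldScorers(should, 1, 100L));
       }
   }
   ```
   
   With ~900 clauses that mostly have zero cost, a check like this would avoid 
nearly all of the `get(leadCost)` calls seen in the hot threads dump above.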


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]