jgq2008303393 commented on issue #916: LUCENE-8213: Asynchronous Caching in 
LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#issuecomment-539562885
 
 
   # Problem scenario
   [LUCENE-8027](https://issues.apache.org/jira/browse/LUCENE-8027) is very 
similar to the approach we took to solve this problem in our ES clusters (which 
are based on Lucene). We run dozens of clusters for metrics workloads. Most of 
the queries look like the following, and we frequently see some of them become 
absurdly slow.
   ```
   GET host_monitor/_search
   {
     "size": 10000, 
     "query": {
       "bool": {
         "filter": [
           {
             "term": {
               "host_ip": "xxx"
             }
           },
           {
             "range": {
               "timestamp": {
                 "gte": "now-5d/d",
                 "lt": "now/d"
               }
             }
           }
         ]
       }
     },
     "docvalue_fields": ["cpu_usage"]
   }
   ```
   After finding out that this was a caching issue, we changed the code to skip 
caching entirely when an IndexOrDocValuesQuery appears together with a 
selective lead iterator. The test results are as follows.
   
   Query | 1st | 2nd | 3rd | 4th | 5th | 6th
   -- | -- | -- | -- | -- | -- | --
   Before Optimization | 47ms | 42ms | 48ms | 49ms | 742ms | 25ms
   After Optimization | 44ms | 45ms | 46ms | 52ms | 43ms | 47ms
   
   As the results above show:
   - Before optimization: the 5th query takes 742ms because it populates the 
cache, after which the next query's latency drops to 25ms.
   - After optimization: query latency is stable at around 45ms.
   
   # Our proposal
   In general, asynchronous caching is valuable if enough queries hit the 
cache. However, for a query over a long time range (e.g. 5 days), caching each 
range query consumes tens of megabytes of memory and takes hundreds of 
milliseconds, while the benefit is not obvious. Such large cache entries also 
cause frequent cache evictions.
   
   So in my opinion, although asynchronous caching would benefit most queries, 
it is still necessary to apply heuristic caching when the range query is too 
large.
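
To illustrate why large entries put pressure on the cache, here is a minimal sketch of a size-based admission check. It assumes the cached entry is a dense bitset with one bit per document in the segment; the class `CacheAdmissionSketch`, the method names, the document count, the cache budget, and the 5% fraction are all hypothetical and only serve the example.

```java
class CacheAdmissionSketch {
    // A dense bitset needs one bit per document, rounded up to whole bytes.
    static long denseBitSetBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    // Hypothetical policy: admit an entry only if it uses at most
    // maxFraction of the cache's total byte budget.
    static boolean admits(long entryBytes, long cacheBudgetBytes, double maxFraction) {
        return entryBytes <= (long) (cacheBudgetBytes * maxFraction);
    }

    public static void main(String[] args) {
        long entryBytes = denseBitSetBytes(200_000_000L); // illustrative doc count
        long budget = 256L * 1024 * 1024;                 // illustrative 256 MiB cache
        System.out.println(entryBytes);                        // 25000000
        System.out.println(admits(entryBytes, budget, 0.05));  // false
    }
}
```

With these illustrative numbers a single entry takes about 25 MB, so only around ten such entries fit in the cache before older entries have to be evicted.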
   
   If you also think that heuristic caching is necessary, I would be happy to 
open a new PR and provide our patch, with reference to 
[LUCENE-8027](https://issues.apache.org/jira/browse/LUCENE-8027). Looking 
forward to more discussion. @jpountz @atris

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
