jgq2008303393 commented on issue #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
URL: https://github.com/apache/lucene-solr/pull/916#issuecomment-539562885

# Problem scenario

[LUCENE-8027](https://issues.apache.org/jira/browse/LUCENE-8027) is very similar to the approach we took to solve this problem in our ES clusters (based on Lucene). We run dozens of clusters for metrics workloads. Most of the queries look like the following, and we frequently encounter some absurdly slow ones.

```
GET host_monitor/_search
{
  "size": 10000,
  "query": {
    "bool": {
      "filter": [
        { "term": { "host_ip": "xxx" } },
        { "range": { "timestamp": { "gte": "now-5d/d", "lt": "now/d" } } }
      ]
    }
  },
  "docvalue_fields": ["cpu_usage"]
}
```

After finding out that this was a caching issue, we changed the cache to skip caching entirely when an IndexOrDocValuesQuery appears together with a selective lead iterator. The test results are as follows.

Query | 1st | 2nd | 3rd | 4th | 5th | 6th
-- | -- | -- | -- | -- | -- | --
Before Optimization | 47ms | 42ms | 48ms | 49ms | 742ms | 25ms
After Optimization | 44ms | 45ms | 46ms | 52ms | 43ms | 47ms

As the results show:

- Before optimization: the 5th query takes 742ms because it populates the cache, after which the next query's latency drops to 25ms.
- After optimization: query latency is stable at around 45ms.

# Our proposal

In general, asynchronous caching is valuable when enough queries hit the cache. However, for a range query over a long time span (e.g. 5 days), each cached range consumes tens of megabytes of memory and takes hundreds of milliseconds to build, while the benefit is not obvious. Those large cache entries also cause frequent cache eviction. So in my opinion, although asynchronous caching would benefit most queries, it is still necessary to apply heuristic caching when the range query is too large.
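To make the heuristic concrete, here is a minimal sketch in plain Java. It is not our patch and does not use Lucene's API: the class name, method name, and the factor of 250 are assumptions chosen for illustration, and the document counts in `main` are made up. The idea is to compare the estimated cost of the clause being cached (the 5-day range) with the cost of the lead iterator (the `host_ip` term), and to skip caching when the former is disproportionately larger.

```java
// Sketch only: plain Java with no Lucene dependency. All names and the
// threshold below are hypothetical and exist purely for illustration.
final class HeuristicCachingSketch {

    // Hypothetical factor: skip caching when building the cache entry would
    // visit more than this many times the docs the lead iterator visits.
    static final long SKIP_CACHE_FACTOR = 250;

    /**
     * Decides whether a clause is worth caching for the current query.
     *
     * @param leadCost  estimated number of docs matched by the most selective clause
     * @param queryCost estimated number of docs matched by the clause considered for caching
     * @return true if the clause should be cached, false if caching should be skipped
     */
    static boolean shouldCache(long leadCost, long queryCost) {
        if (leadCost < 0 || queryCost < 0) {
            throw new IllegalArgumentException("costs must be non-negative");
        }
        // Divide instead of multiplying so that large costs cannot overflow.
        return queryCost / SKIP_CACHE_FACTOR <= leadCost;
    }

    public static void main(String[] args) {
        // Made-up numbers: a selective term clause next to a very large range clause.
        System.out.println("selective lead, huge range: cache = "
                + shouldCache(10_000L, 50_000_000L));
        // Made-up numbers: the lead clause is itself broad, so caching the range pays off.
        System.out.println("broad lead, huge range:     cache = "
                + shouldCache(1_000_000L, 50_000_000L));
    }
}
```

With a check like this, a query led by a selective term would bypass the cache and keep its latency stable, while queries whose lead clause is already broad would still populate the cache.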

If you also think that heuristic caching is necessary, I would be glad to open a new PR with our patch, along the lines of [LUCENE-8027](https://issues.apache.org/jira/browse/LUCENE-8027). Looking forward to further discussion. @jpountz @atris