[jira] [Commented] (LUCENE-8017) FunctionRangeQuery and FunctionMatchQuery can pollute the QueryCache

Adrien Grand (JIRA) Thu, 26 Oct 2017 07:29:25 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220527#comment-16220527
 ]


Adrien Grand commented on LUCENE-8017:
--------------------------------------

There is a TODO about this issue in {{LRUQueryCache}}:

{noformat}
      // TODO: should it be pluggable, eg. for queries that run on doc values?
      final IndexReader.CacheHelper cacheHelper = context.reader().getCoreCacheHelper();
{noformat}

My idea was that we could add a 
{{CacheHelper Weight.getCacheHelper(LeafReaderContext)}} API that would tell 
how a query is allowed to be cached:
 - {{null}} if matches should never be cached
 - {{context.reader().getCoreCacheHelper()}} for queries that only depend on 
core data-structures like phrase queries, point queries, etc.
 - {{context.reader().getReaderCacheHelper()}} for queries that run on doc 
values (or live docs, but I can't think of a use-case for looking at live docs 
in a query)
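
The three cases above can be sketched as follows. This is only an illustration under stated assumptions, not Lucene code: {{CacheHelper}}, {{Leaf}} and {{Weight}} are simplified stand-ins for the real classes, the factory methods are hypothetical, and the rule used for compound queries (delegate to the children and keep the most restrictive answer) is an assumption about how the proposal could extend to them.

```java
import java.util.List;

class CacheHelperSketch {
    /** Stand-in for IndexReader.CacheHelper: identifies what a cache entry is keyed on. */
    static final class CacheHelper {
        final String name;
        CacheHelper(String name) { this.name = name; }
    }

    /** Stand-in for a leaf reader that exposes a core helper and a reader helper. */
    static final class Leaf {
        final CacheHelper core = new CacheHelper("core");
        final CacheHelper reader = new CacheHelper("reader");
    }

    /** Hypothetical hook on Weight: returning null means "never cache my matches". */
    interface Weight {
        CacheHelper getCacheHelper(Leaf leaf);
    }

    // Depends on index-wide state (e.g. scores), so never cacheable per segment.
    static Weight neverCache() { return leaf -> null; }

    // Depends only on core data structures (postings, points, ...).
    static Weight coreOnly() { return leaf -> leaf.core; }

    // Reads doc values, so the entry must be tied to the reader helper.
    static Weight docValues() { return leaf -> leaf.reader; }

    /** Assumed rule for compound queries: null wins, then reader, then core. */
    static Weight compound(List<Weight> children) {
        return leaf -> {
            CacheHelper result = leaf.core;
            for (Weight child : children) {
                CacheHelper helper = child.getCacheHelper(leaf);
                if (helper == null) {
                    return null; // one uncacheable clause makes the whole query uncacheable
                }
                if (helper == leaf.reader) {
                    result = leaf.reader; // the reader helper is the more restrictive key
                }
            }
            return result;
        };
    }

    public static void main(String[] args) {
        Leaf leaf = new Leaf();
        Weight mixed = compound(List.of(coreOnly(), docValues()));
        System.out.println(mixed.getCacheHelper(leaf).name);
    }
}
```

With a hook like this the cache would not need to introspect queries: it would ask the top-level weight and skip caching whenever the answer is {{null}}.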

bq. One could either use marker interfaces

I thought about this at some point, but it doesn't work well with compound 
queries, i.e. what interface should ConstantScoreQuery and BooleanQuery 
implement?

bq. The easiest solution is to just exclude the Function queries from the cache

It has a similar issue, I think: how can we know that a BooleanQuery may not 
be cached? Do we need to unwrap all sub queries? What about 3rd-party compound 
queries that we cannot introspect?

bq. a cacheCost() method - the latter I quite like, as it means that different 
cache implementations can choose whether or not to cache in a more fine-grained 
manner

What would this cacheCost compute? Isn't it a metric that we already have with 
the scorer cost? This is a bit orthogonal to this issue, but I agree it would 
be good to avoid caching sub clauses whose cost is more than X times the cost 
of the entire query in order to preserve good tail latencies.
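
As a rough illustration of that last heuristic, the check could look like the sketch below. The method name and the {{maxRatio}} parameter (the "X" above) are hypothetical, and the costs are assumed to be the non-negative scorer cost estimates of the clause and of the whole query.

```java
class CacheCostSketch {
    /**
     * Returns true when the sub clause is cheap enough to cache, i.e. its cost is
     * at most maxRatio times the cost of the entire query.
     */
    static boolean shouldCacheClause(long clauseCost, long queryCost, long maxRatio) {
        if (maxRatio <= 0) {
            throw new IllegalArgumentException("maxRatio must be positive");
        }
        if (queryCost > Long.MAX_VALUE / maxRatio) {
            // queryCost * maxRatio would overflow, so no long clause cost can exceed it
            return true;
        }
        return clauseCost <= queryCost * maxRatio;
    }

    public static void main(String[] args) {
        // A clause matching ~1000 docs inside a query matching ~10 docs, with X = 250
        System.out.println(shouldCacheClause(1_000L, 10L, 250L));
    }
}
```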

> FunctionRangeQuery and FunctionMatchQuery can pollute the QueryCache
> --------------------------------------------------------------------
>
>                 Key: LUCENE-8017
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8017
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>
> The QueryCache assumes that queries will return the same set of documents 
> when run over the same segment, independent of all other segments held by the 
> parent IndexSearcher.  However, both FunctionRangeQuery and 
> FunctionMatchQuery can select hits based on score, which depends on term 
> statistics over the whole index, and could therefore theoretically return 
> different result sets on a given segment.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
