[jira] [Commented] (LUCENE-8017) FunctionRangeQuery and FunctionMatchQuery can pollute the QueryCache

Adrien Grand (JIRA) Thu, 02 Nov 2017 05:42:24 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235664#comment-16235664
 ]


Adrien Grand commented on LUCENE-8017:
--------------------------------------

bq.  I've made this abstract and implemented it on all Weights - it could 
theoretically default to returning null or the Reader-level cache helper, but I 
think it's better to be explicit about it.

+1

bq. // TODO should this return null? (on MatchAllDocsQuery)

I think it should return the core cache helper. Otherwise if a 
MatchAllDocsQuery is part of a more complex BooleanQuery, the BooleanQuery may 
never be cached even though it might be costly? Ensuring MatchAllDocsQuery is 
never cached is is better done in {{UsageTrackingQueryCachingPolicy}}?

bq. // TODO: can we work out which supplier will be used for this context 
cheaply? (on IndexOrDocValuesQuery)

Even though doc values are updateable, this query relies on the fact that both 
queries have the same matches, so I think we should just delegate to the 
{{indexQuery}}.

bq. re-instate getCoreAndDeletesCacheHelper (or CoreAndDocValues, or whatever 
we want to call it) so that doc-values based queries can be re-used between 
searchers if the DV gen is the same

I realize that I might have introduced some confusion by calling it "reader" 
cache helper. It _does_ cache based on core and dv updates / deletes, I wanted 
to change the name so that it wouldn't look as if it only applied to deletes, 
since it also applies to dv updates.

bq. Instead couldn't we alternatively just check 
context.reader.fieldInfos(field).dvGen from the new method in such queries, and 
if its -1, return the core cache helper, otherwise return null (or crappy 
per-reader stuff if we really must, but i don't think its helpful)

+1 to read the dvGen. Thinking about it again, I agree we should not cache 
queries on the reader helper, as there is no evidence that we will be able to 
keep entries in the cache for a long time if there is a constant stream of 
updates. So maybe Alan's original idea to just make it a boolean isCacheable() 
was better? Or maybe it is fine this way since otherwise the call-site of this 
method will convert the boolean into either `null` or the core cache helper 
anyway?



> FunctionRangeQuery and FunctionMatchQuery can pollute the QueryCache
> --------------------------------------------------------------------
>
>                 Key: LUCENE-8017
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8017
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8017.patch
>
>
> The QueryCache assumes that queries will return the same set of documents 
> when run over the same segment, independent of all other segments held by the 
> parent IndexSearcher.  However, both FunctionRangeQuery and 
> FunctionMatchQuery can select hits based on score, which depend on term 
> statistics over the whole index, and could therefore theoretically return 
> different result sets on a given segment.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-8017) FunctionRangeQuery and FunctionMatchQuery can pollute the QueryCache

Reply via email to