[jira] Commented: (LUCENE-2868) It should be easy to make use of TermState; rewritten queries should be shared automatically

Simon Willnauer (JIRA) Tue, 25 Jan 2011 01:07:14 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986281#action_12986281
 ]


Simon Willnauer commented on LUCENE-2868:
-----------------------------------------

bq. Here's my take on the patch, including ability to cache weight objects.
I have a couple of comments here. First, I cannot apply your patch to the 
current trunk; can you update it?

* You keep a cache per IndexSearcher (btw. QueryDataCache is missing in the 
patch) which is used to cache several things across searches. This is very 
dangerous! While I don't know how it is implemented, I would guess you need to 
synchronize access to it, so it would slow down searches, eh? 

* Caching Scorers is going to break since Scorers are stateful and might be 
advanced to different documents. Yet, I can see what you are trying to do here: 
doing work in a scorer is costly, so common TermQueries, for instance, 
should not need to load the same posting list twice. There are two things which 
come to my mind right away: 1. Posting list caching, which should be done on a 
codec level IMO. 2. Building PerReaderTermState only once for a common TermQuery. 
While caching posting lists is going to be tricky and quite a task, reusing 
PerReaderTermState could work fine, as far as I can see, if you are in the same 
searcher. 

* Caching Weights is kind of weird - what is the reason for this again? The 
only thing you really save here is setup costs which are generally very low.
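
To illustrate the two points above about stateful Scorers and reusing term 
state, here is a minimal sketch in plain Java. None of these classes are 
Lucene classes: TermInfo, Cursor and SearchContext are hypothetical stand-ins 
(roughly for PerReaderTermState, a Scorer, and some per-search holder), and the 
"index" is just a map from term to sorted doc ids. It only shows the general 
idea, not how the patch or trunk does it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; none of these are Lucene classes.
final class SharingSketch {

    /** Immutable result of a term lookup; safe to share between consumers. */
    static final class TermInfo {
        final String term;
        final int[] postings; // sorted doc ids
        TermInfo(String term, int[] postings) {
            this.term = term;
            this.postings = postings;
        }
    }

    /** Stateful cursor over postings; each consumer needs its own instance. */
    static final class Cursor {
        private final int[] postings;
        private int pos = -1;
        Cursor(TermInfo info) {
            this.postings = info.postings;
        }
        /** Always moves forward; returns the next doc >= target, or Integer.MAX_VALUE. */
        int advance(int target) {
            do {
                pos++;
            } while (pos < postings.length && postings[pos] < target);
            return pos < postings.length ? postings[pos] : Integer.MAX_VALUE;
        }
    }

    /** Lives for one search on one thread, so the map needs no synchronization. */
    static final class SearchContext {
        private final Map<String, TermInfo> termInfos = new HashMap<>();
        int lookups = 0; // counts how often the "index" was actually consulted

        TermInfo termInfo(String term, Map<String, int[]> index) {
            return termInfos.computeIfAbsent(term, t -> {
                lookups++;
                return new TermInfo(t, index.getOrDefault(t, new int[0]));
            });
        }
    }
}
```

Two equal term queries in the same search would call termInfo() and get the 
same TermInfo after a single lookup, but each would wrap it in its own Cursor. 
Sharing one Cursor between them goes wrong as soon as one consumer advances it 
past a document the other still needs.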

Overall I don't like that this way you tightly couple something to Weight / 
Query etc. for a single purpose that could be solved with some kind of query 
optimization phase similar to what I had in my last patch and Earwin has 
proposed. I think we should not tightly couple things like that into Lucene. 
This is really extremely application dependent in most cases and we should only 
provide the infrastructure to do it. 
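
As a rough idea of what such an optimization phase could look like, here is a 
small sketch in plain Java. It is not the code from either patch: Node and 
shareEqualSubtrees are hypothetical names, and Node merely stands in for a 
Query with working equals()/hashCode(). The pass walks the tree once and makes 
structurally equal subqueries point to the same instance, so per-instance work 
done later is shared without Weight or Query knowing about any cache.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; Node stands in for a Query and is not a Lucene class.
final class QueryDedupSketch {

    /** A leaf term (term != null, no children) or a boolean node with children. */
    static final class Node {
        final String term;
        final List<Node> children;

        Node(String term, List<Node> children) {
            this.term = term;
            this.children = children;
        }
        static Node term(String t) {
            return new Node(t, Collections.<Node>emptyList());
        }
        static Node bool(Node... kids) {
            return new Node(null, Arrays.asList(kids));
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Node)) return false;
            Node n = (Node) o;
            return Objects.equals(term, n.term) && children.equals(n.children);
        }
        @Override public int hashCode() {
            return Objects.hash(term, children);
        }
    }

    /**
     * Returns a tree equal to root in which structurally equal subtrees are
     * the same instance. The seen map is the only state and belongs to the caller.
     */
    static Node shareEqualSubtrees(Node root, Map<Node, Node> seen) {
        List<Node> kids = new ArrayList<>();
        for (Node child : root.children) {
            kids.add(shareEqualSubtrees(child, seen));
        }
        Node rebuilt = new Node(root.term, kids);
        Node existing = seen.putIfAbsent(rebuilt, rebuilt);
        return existing != null ? existing : rebuilt;
    }
}
```

An application would run this with a fresh HashMap before searching. Whether 
and how to dedupe stays an application decision, which is the point of only 
providing the infrastructure.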

bq. Earwin - I think we should make a new issue and get something like that 
implemented in there which is more general than what I just sketched out. If 
you could share your code that would be awesome!
Earwin, any news on this? Shall I open an issue for that?

bq. It occurs to me that the name of the common class that gets created in 
IndexSearcher and passed around should probably be named something more 
appropriate, like QueryContext. That way people will feel free to extend it to 
hold all sorts of query-local data, in time. Thoughts?
You refer to ScorerContext? This class was actually not intended to be 
extensible; it has been public final until now. I am not sure if we should open 
that up, though. 

> It should be easy to make use of TermState; rewritten queries should be 
> shared automatically
> --------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2868
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2868
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Query/Scoring
>            Reporter: Karl Wright
>         Attachments: lucene-2868.patch, query-rewriter.patch
>
>
> When you have the same query in a query hierarchy multiple times, tremendous 
> savings can now be had if the user knows enough to share the rewritten 
> queries in the hierarchy, due to the TermState addition.  But this is clumsy 
> and requires a lot of coding by the user to take advantage of.  Lucene should 
> be smart enough to share the rewritten queries automatically.
> This can be most readily (and powerfully) done by introducing a new method to 
> Query.java:
> Query rewriteUsingCache(IndexReader indexReader)
> ... and including a caching implementation right in Query.java which would 
> then work for all.  Of course, all callers would want to use this new method 
> rather than the current rewrite().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


