[jira] [Commented] (LUCENE-2868) It should be easy to make use of TermState; rewritten queries should be shared automatically

2011-04-22 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023224#comment-13023224
 ] 

Chris Male commented on LUCENE-2868:


I've opened LUCENE-3041 to work on the suggestions made by Earwin.

 It should be easy to make use of TermState; rewritten queries should be 
 shared automatically
 

 Key: LUCENE-2868
 URL: https://issues.apache.org/jira/browse/LUCENE-2868
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Query/Scoring
Reporter: Karl Wright
 Attachments: LUCENE-2868.patch, LUCENE-2868.patch, lucene-2868.patch, 
 lucene-2868.patch, query-rewriter.patch


 When you have the same query in a query hierarchy multiple times, tremendous 
 savings can now be had if the user knows enough to share the rewritten 
 queries in the hierarchy, due to the TermState addition.  But this is clumsy 
 and requires a lot of coding by the user to take advantage of.  Lucene should 
 be smart enough to share the rewritten queries automatically.
 This can be most readily (and powerfully) done by introducing a new method to 
 Query.java:
 Query rewriteUsingCache(IndexReader indexReader)
 ... and including a caching implementation right in Query.java which would 
 then work for all.  Of course, all callers would want to use this new method 
 rather than the current rewrite().

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2868) It should be easy to make use of TermState; rewritten queries should be shared automatically

2011-01-25 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986281#action_12986281
 ] 

Simon Willnauer commented on LUCENE-2868:
-

bq. Here's my take on the patch, including ability to cache weight objects.
I have a couple of comments here. First, I cannot apply your patch to the 
current trunk; can you update it?

* You keep a cache per IndexSearcher (btw. QueryDataCache is missing in the 
patch) which is used to cache several things across searches. This is very 
dangerous! While I don't know how it is implemented, I would guess you need to 
synchronize access to it, so it would slow down searches, eh? 

* Caching Scorers is going to break since Scorers are stateful and might be 
advanced to different documents. Yet, I can see what you are trying to do here: 
doing work in a scorer is costly, so common TermQueries, for instance, 
should not need to load the same posting list twice. Two things come to my 
mind right away: 1. Posting list caching, which should be done on a codec 
level IMO. 2. Building PerReaderTermState only once for a common TermQuery. 
While caching posting lists is going to be tricky and quite a task, reusing 
PerReaderTermState could work fine as far as I can see if you are in the same 
searcher. 

* Caching Weights is kind of weird. What is the reason for this again? The 
only thing you really save here is setup costs, which are generally very low.
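
The point about stateful Scorers can be illustrated with a small sketch. ToyScorer below is a hypothetical stand-in, not Lucene's Scorer; it only assumes a forward-only nextDoc() cursor and an end-of-postings sentinel. Handing one instance to two consumers leaves the second with nothing to read, which is roughly why a cached Scorer cannot simply be reused.

```java
// Hypothetical stand-in for a Scorer: a forward-only cursor over doc IDs.
final class ToyScorer {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int pos = -1;

    ToyScorer(int... docs) { this.docs = docs; }

    // Advancing mutates the cursor, which is what makes sharing unsafe.
    int nextDoc() {
        pos++;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
}

class SharedScorerDemo {
    // Count the documents a consumer can still read from the given scorer.
    static int drain(ToyScorer s) {
        int n = 0;
        while (s.nextDoc() != ToyScorer.NO_MORE_DOCS) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        ToyScorer shared = new ToyScorer(1, 5, 9);
        int first = drain(shared);                 // consumes all three postings
        int second = drain(shared);                // cursor is already exhausted
        int fresh = drain(new ToyScorer(1, 5, 9)); // a new instance starts over
        System.out.println(first + " " + second + " " + fresh); // prints "3 0 3"
    }
}
```

Immutable per-term data (the role PerReaderTermState plays in the discussion) has no such cursor, so sharing it within one searcher does not run into this problem.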

Overall I don't like that this way you tightly couple something to Weight / 
Query etc. for a single purpose that could be solved with some kind of query 
optimization phase similar to what I had in my last patch and Earwin has 
proposed. I think we should not tightly couple things like that into Lucene. 
This is really extremely application dependent in most cases and we should 
only provide the infrastructure to do it. 

bq. Earwin - I think we should make a new issue and get something like that 
implemented in there which is more general than what I just sketched out. If 
you could share your code that would be awesome!
Earwin, any news on this? Shall I open an issue for that?

bq. It occurs to me that the name of the common class that gets created in 
IndexSearcher and passed around should probably be named something more 
appropriate, like QueryContext. That way people will feel free to extend it to 
hold all sorts of query-local data, in time. Thoughts?
You refer to ScorerContext? This class was actually not intended to be 
extendable; it's public final until now. I am not sure if we should open that 
up, though. 




[jira] Commented: (LUCENE-2868) It should be easy to make use of TermState; rewritten queries should be shared automatically

2011-01-24 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986248#action_12986248
 ] 

Karl Wright commented on LUCENE-2868:
-

It occurs to me that the name of the common class that gets created in 
IndexSearcher and passed around should probably be named something more 
appropriate, like QueryContext.  That way people will feel free to extend it to 
hold all sorts of query-local data, in time.  Thoughts?





[jira] Commented: (LUCENE-2868) It should be easy to make use of TermState; rewritten queries should be shared automatically

2011-01-14 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981746#action_12981746
 ] 

Karl Wright commented on LUCENE-2868:
-

I reworded the description.

I think the word cache is correct, but what we really need is simply a cache 
that has the lifetime of a top-level rewrite.  I agree that putting the data in 
the query object itself would not have this characteristic, but on the other 
hand a second Query method that is cache aware seems reasonable.  For example:

Query rewriteMinimal(RewriteCache rc, IndexReader ir)

... where RewriteCache is an object that has a lifetime consistent with the 
highest-level rewrite operation done on the query graph.  The rewriteMinimal() 
method would look for the rewrite of the current query in the RewriteCache 
and, if found, would return that; otherwise it would call plain old rewrite() 
and then save the result.

So the patch would include:
(a) the change as specified to Query.java
(b) an implementation of RewriteCache, which *could* just be simplified to 
Map<Query,Query>
(c) changes to the callers of rewrite(), so that the minimal rewrite is called 
instead.

Thoughts?
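
A minimal sketch of the lookup-then-rewrite idea described above, under stated assumptions: SketchQuery and CountingTermQuery are hypothetical stand-ins so the example compiles without Lucene, the IndexReader argument is left out, and rewriteMinimal() lives on the cache rather than on Query. It is not the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Lucene's Query; only rewrite() matters here.
interface SketchQuery {
    SketchQuery rewrite();
}

// Cache meant to live for one top-level rewrite: create it, use it, drop it.
final class RewriteCache {
    private final Map<SketchQuery, SketchQuery> rewritten = new HashMap<>();

    // Return the cached rewrite if present, else call rewrite() and save it.
    SketchQuery rewriteMinimal(SketchQuery q) {
        SketchQuery cached = rewritten.get(q);
        if (cached == null) {
            cached = q.rewrite();
            rewritten.put(q, cached);
        }
        return cached;
    }
}

// Toy query that counts rewrite() calls; equality is by term, so two
// equal queries in the hierarchy map to the same cache entry.
final class CountingTermQuery implements SketchQuery {
    static int rewriteCalls = 0;
    private final String term;

    CountingTermQuery(String term) { this.term = term; }

    @Override public SketchQuery rewrite() { rewriteCalls++; return this; }

    @Override public boolean equals(Object o) {
        return o instanceof CountingTermQuery
            && ((CountingTermQuery) o).term.equals(term);
    }

    @Override public int hashCode() { return term.hashCode(); }
}

class RewriteCacheDemo {
    public static void main(String[] args) {
        RewriteCache cache = new RewriteCache();
        SketchQuery a = cache.rewriteMinimal(new CountingTermQuery("lucene"));
        SketchQuery b = cache.rewriteMinimal(new CountingTermQuery("lucene"));
        // Same rewritten instance is shared, and rewrite() ran only once.
        System.out.println((a == b) + " " + CountingTermQuery.rewriteCalls); // prints "true 1"
    }
}
```

The sketch relies on the queries having value-based equals() and hashCode(); without that, equal queries at different places in the hierarchy would never hit the same cache entry.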





[jira] Commented: (LUCENE-2868) It should be easy to make use of TermState; rewritten queries should be shared automatically

2011-01-14 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981774#action_12981774
 ] 

Earwin Burrfoot commented on LUCENE-2868:
-

Here we use an intermediate query AST, with a number of walkers that do 
synonym substitution, optimization, caching, rewriting for multiple fields, 
and finally generating a tree of Lucene Queries.

I can share a generic reflection-based visitor that's somewhat more handy than 
the default visitor pattern in Java.
Usage looks roughly like: 
{code}
class ToStringWalker extends DispatchingVisitor<String> { // String here stands for the type of walk result
  String visit(TermQuery q) {
    return "{term: " + q.getTerm() + "}";
  }

  String visit(BooleanQuery q) {
    StringBuffer buf = new StringBuffer();
    buf.append("{boolean: ");
    for (BooleanClause clause : q.clauses()) {
      // dispatch() picks the visit() overload matching the sub-query's runtime type
      buf.append(dispatch(clause.getQuery())).append(", ");
    }
    buf.append("}");
    return buf.toString();
  }

  String visit(SpanQuery q) { // Runs for all SpanQueries
    ...
  }

  String visit(Query q) { // Runs for all Queries not covered by a more exact visit() method
    ...
  }
}

Query query = ...;
String stringRepresentation = new ToStringWalker().dispatch(query);
{code}

dispatch() checks the runtime type of its parameter, picks the closest 
matching visit() overload (following Java's rules for compile-time overloaded 
method resolution), and invokes it.
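
The visitor base class itself was not posted in this thread, so the following is only a guess at how such a dispatch() could work. It assumes a simplified rule (walk up the argument's superclass chain until a visit() taking exactly that class is found) rather than full Java overload resolution, and it ignores interfaces. The query classes are empty stand-ins so the sketch compiles without Lucene.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical stand-ins for Lucene query classes.
class Query {}
class TermQuery extends Query {}
class SpanQuery extends Query {}
class SpanTermQuery extends SpanQuery {}
class PhraseQuery extends Query {}

// Assumed shape of the visitor base class; R is the type of the walk result.
abstract class DispatchingVisitor<R> {
    @SuppressWarnings("unchecked")
    public R dispatch(Object node) {
        // Try the runtime class first, then each superclass in turn.
        for (Class<?> c = node.getClass(); c != null; c = c.getSuperclass()) {
            Method m = findVisit(c);
            if (m != null) {
                try {
                    m.setAccessible(true);
                    return (R) m.invoke(this, node);
                } catch (IllegalAccessException | InvocationTargetException e) {
                    throw new RuntimeException(e);
                }
            }
        }
        throw new IllegalArgumentException("no visit() for " + node.getClass());
    }

    // Look for visit(paramType) on this visitor class or its ancestors.
    private Method findVisit(Class<?> paramType) {
        for (Class<?> v = getClass(); v != null; v = v.getSuperclass()) {
            try {
                return v.getDeclaredMethod("visit", paramType);
            } catch (NoSuchMethodException e) {
                // not declared here; keep looking in the visitor's superclass
            }
        }
        return null;
    }
}

class NameWalker extends DispatchingVisitor<String> {
    String visit(TermQuery q) { return "term"; }
    String visit(SpanQuery q) { return "span"; }  // runs for all SpanQuery subclasses
    String visit(Query q) { return "query"; }     // fallback for everything else
}

class DispatchDemo {
    public static void main(String[] args) {
        NameWalker w = new NameWalker();
        System.out.println(w.dispatch(new TermQuery()));     // prints "term"
        System.out.println(w.dispatch(new SpanTermQuery())); // prints "span"
        System.out.println(w.dispatch(new PhraseQuery()));   // prints "query"
    }
}
```

A production version would probably cache the resolved Method per argument class, since the reflective lookup runs on every dispatch() call here.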




[jira] Commented: (LUCENE-2868) It should be easy to make use of TermState; rewritten queries should be shared automatically

2011-01-14 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981778#action_12981778
 ] 

Simon Willnauer commented on LUCENE-2868:
-

bq. I can share a generic reflection-based visitor that's somewhat more handy 
than default visitor pattern in java.
Earwin - I think we should make a new issue and get something like that 
implemented in there which is more general than what I just sketched out. If 
you could share your code that would be awesome!
