[jira] [Commented] (LUCENE-2868) It should be easy to make use of TermState; rewritten queries should be shared automatically
[ https://issues.apache.org/jira/browse/LUCENE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023224#comment-13023224 ]

Chris Male commented on LUCENE-2868:

I've opened LUCENE-3041 to work on the suggestions made by Earwin.

> It should be easy to make use of TermState; rewritten queries should be shared automatically
>
> Key: LUCENE-2868
> URL: https://issues.apache.org/jira/browse/LUCENE-2868
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Query/Scoring
> Reporter: Karl Wright
> Attachments: LUCENE-2868.patch, LUCENE-2868.patch, lucene-2868.patch, lucene-2868.patch, query-rewriter.patch
>
> When you have the same query in a query hierarchy multiple times, tremendous savings can now be had if the user knows enough to share the rewritten queries in the hierarchy, due to the TermState addition. But this is clumsy and requires a lot of coding by the user to take advantage of. Lucene should be smart enough to share the rewritten queries automatically.
> This can be most readily (and powerfully) done by introducing a new method to Query.java:
> Query rewriteUsingCache(IndexReader indexReader)
> ... and including a caching implementation right in Query.java which would then work for all. Of course, all callers would want to use this new method rather than the current rewrite().

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2868) It should be easy to make use of TermState; rewritten queries should be shared automatically
[ https://issues.apache.org/jira/browse/LUCENE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986281#action_12986281 ]

Simon Willnauer commented on LUCENE-2868:

bq. Here's my take on the patch, including ability to cache weight objects.

I have a couple of comments here. First, I cannot apply your patch to the current trunk; can you update it?

* You keep a cache per IndexSearcher (btw, QueryDataCache is missing in the patch) which is used to cache several things across searches. This is very dangerous! While I don't know how it is implemented, I would guess you need to synchronize access to it, so it would slow down searches, eh?
* Caching Scorers is going to break, since Scorers are stateful and might be advanced to different documents. Yet I can see what you are trying to do here: doing work in a scorer is costly, so common TermQueries, for instance, should not need to load the same posting list twice. Two things come to my mind right away:
1. Posting list caching, which should be done on a codec level IMO.
2. Building PerReaderTermState only once for a common TermQuery.
While caching posting lists is going to be tricky and quite a task, reusing PerReaderTermState could work fine as far as I can see, if you are in the same searcher.
* Caching Weights is kind of weird. What is the reason for this again? The only thing you really save here is setup costs, which are generally very low.

Overall I don't like that this way you tightly couple something to Weight / Query etc. for a single purpose that could be solved with some kind of query optimization phase, similar to what I had in my last patch and what Earwin has proposed. I think we should not tightly couple things like that into Lucene. This is really extremely application dependent in most cases, and we should only provide the infrastructure to do it.

bq. Earwin - I think we should make a new issue and get something like that implemented in there which is more general than what I just sketched out. If you could share your code that would be awesome!

Earwin, any news on this? Shall I open an issue for that?

bq. It occurs to me that the name of the common class that gets created in IndexSearcher and passed around should probably be named something more appropriate, like QueryContext. That way people will feel free to extend it to hold all sorts of query-local data, in time. Thoughts?

You refer to ScorerContext? This class was actually not intended to be extensible; it has been public final until now. I am not sure if we should open that up, though.
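The point above about Scorers being stateful can be illustrated with a small sketch. It does not use Lucene's classes: `SketchCursor` and `SharedCursorDemo` are hypothetical stand-ins, and the int array plays the role of a posting list. The sketch only assumes that a scorer behaves like a forward-only cursor over document ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a Scorer: a forward-only cursor over sorted doc ids. */
final class SketchCursor {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs; // read-only postings, safe to share
    private int pos = -1;     // per-cursor iteration state, NOT safe to share

    SketchCursor(int[] docs) { this.docs = docs; }

    /** Advances to the next doc id, or returns NO_MORE_DOCS when exhausted. */
    int nextDoc() {
        if (pos + 1 >= docs.length) { pos = docs.length; return NO_MORE_DOCS; }
        return docs[++pos];
    }
}

final class SharedCursorDemo {
    /** Alternately pulls one doc for clause A and one for clause B. */
    static List<List<Integer>> interleave(SketchCursor a, SketchCursor b) {
        List<Integer> seenByA = new ArrayList<>();
        List<Integer> seenByB = new ArrayList<>();
        while (true) {
            int docA = a.nextDoc();
            if (docA != SketchCursor.NO_MORE_DOCS) seenByA.add(docA);
            int docB = b.nextDoc();
            if (docB != SketchCursor.NO_MORE_DOCS) seenByB.add(docB);
            if (docA == SketchCursor.NO_MORE_DOCS && docB == SketchCursor.NO_MORE_DOCS) break;
        }
        return Arrays.asList(seenByA, seenByB);
    }

    public static void main(String[] args) {
        int[] postings = {1, 5, 9};
        // One cached cursor used by both clauses: each clause misses documents.
        SketchCursor shared = new SketchCursor(postings);
        System.out.println(interleave(shared, shared)); // prints [[1, 9], [5]]
        // Shared postings, separate cursors: both clauses see every document.
        System.out.println(interleave(new SketchCursor(postings), new SketchCursor(postings)));
        // prints [[1, 5, 9], [1, 5, 9]]
    }
}
```

In this sketch the data that is safe to reuse is the immutable postings array, which loosely corresponds to reusing term state within one searcher, while the iteration position has to stay private to each consumer.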
[jira] Commented: (LUCENE-2868) It should be easy to make use of TermState; rewritten queries should be shared automatically
[ https://issues.apache.org/jira/browse/LUCENE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986248#action_12986248 ]

Karl Wright commented on LUCENE-2868:

It occurs to me that the common class that gets created in IndexSearcher and passed around should probably be named something more appropriate, like QueryContext. That way people will feel free to extend it to hold all sorts of query-local data, in time. Thoughts?
[jira] Commented: (LUCENE-2868) It should be easy to make use of TermState; rewritten queries should be shared automatically
[ https://issues.apache.org/jira/browse/LUCENE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981778#action_12981778 ]

Simon Willnauer commented on LUCENE-2868:

bq. I can share a generic reflection-based visitor that's somewhat more handy than default visitor pattern in java.

Earwin - I think we should make a new issue and get something like that implemented in there which is more general than what I just sketched out. If you could share your code that would be awesome!
[jira] Commented: (LUCENE-2868) It should be easy to make use of TermState; rewritten queries should be shared automatically
[ https://issues.apache.org/jira/browse/LUCENE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981774#action_12981774 ]

Earwin Burrfoot commented on LUCENE-2868:

Here we use an intermediate query AST, with a number of walkers that do synonym substitution, optimization, caching, rewriting for multiple fields, and finally generating a tree of Lucene Queries.

I can share a generic reflection-based visitor that's somewhat more handy than the default visitor pattern in Java. Usage looks roughly like:

{code}
class ToStringWalker extends DispatchingVisitor<String> { // String here stands for the type of walk result
  String visit(TermQuery q) {
    return "{term: " + q.getTerm() + "}";
  }

  String visit(BooleanQuery q) {
    StringBuilder buf = new StringBuilder();
    buf.append("{boolean: ");
    for (BooleanClause clause : q.clauses()) {
      buf.append(dispatch(clause.getQuery())).append(", "); // Here we
    }
    buf.append("}");
    return buf.toString();
  }

  String visit(SpanQuery q) { // Runs for all SpanQueries
    ...
  }

  String visit(Query q) { // Runs for all Queries not covered by a more exact visit() method
    ...
  }
}

Query query = ...;
String stringRepresentation = new ToStringWalker().dispatch(query);
{code}

dispatch() checks the runtime type of its parameter, picks the closest visit() overload (according to Java's rules for compile-time overload resolution), and invokes it.
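The visitor class itself is not shown in the thread, so the following is only a guess at how a dispatch() like the one described could work. It is a simplified sketch: it walks the superclass chain of the argument's runtime type and ignores interfaces, which is weaker than full Java overload resolution. `Query`, `TermQuery`, and `PrefixQuery` here are hypothetical stand-ins so the example compiles without Lucene, and `SketchWalker` is an illustrative name.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical stand-ins for Lucene's query classes.
class Query {}
class TermQuery extends Query {
    final String term;
    TermQuery(String term) { this.term = term; }
}
class PrefixQuery extends Query {}

/** Assumed shape of a reflection-based visitor; R is the type of the walk result. */
abstract class DispatchingVisitor<R> {
    @SuppressWarnings("unchecked")
    public R dispatch(Object node) {
        // Try the node's runtime class first, then each superclass in turn,
        // so the most specific visit(X) method wins.
        for (Class<?> c = node.getClass(); c != null; c = c.getSuperclass()) {
            Method m = findVisit(c);
            if (m == null) continue;
            try {
                m.setAccessible(true);
                return (R) m.invoke(this, node);
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new RuntimeException(e);
            }
        }
        throw new IllegalArgumentException("No visit() method for " + node.getClass());
    }

    /** Looks for visit(paramType) declared on this visitor or one of its superclasses. */
    private Method findVisit(Class<?> paramType) {
        for (Class<?> v = getClass(); v != null; v = v.getSuperclass()) {
            try {
                return v.getDeclaredMethod("visit", paramType);
            } catch (NoSuchMethodException e) {
                // Not declared at this level; keep looking further up.
            }
        }
        return null;
    }
}

class SketchWalker extends DispatchingVisitor<String> {
    String visit(TermQuery q) { return "{term: " + q.term + "}"; }
    String visit(Query q) { return "{query: " + q.getClass().getSimpleName() + "}"; }
}

class SketchWalkerDemo {
    public static void main(String[] args) {
        SketchWalker walker = new SketchWalker();
        System.out.println(walker.dispatch(new TermQuery("foo"))); // prints {term: foo}
        System.out.println(walker.dispatch(new PrefixQuery()));    // prints {query: PrefixQuery}
    }
}
```

A production version would probably cache the resolved Method per runtime class, since reflective lookup on every node is slow.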
[jira] Commented: (LUCENE-2868) It should be easy to make use of TermState; rewritten queries should be shared automatically
[ https://issues.apache.org/jira/browse/LUCENE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981746#action_12981746 ]

Karl Wright commented on LUCENE-2868:

I reworded the description. I think the word "cache" is correct, but what we really need is simply a cache that has the lifetime of a top-level rewrite. I agree that putting the data in the query object itself would not have this characteristic, but on the other hand a second Query method that is cache aware seems reasonable. For example:

Query rewriteMinimal(RewriteCache rc, IndexReader ir)

... where RewriteCache is an object whose lifetime matches the highest-level rewrite operation done on the query graph. The rewriteMinimal() method would look for the rewrite of the current query in the RewriteCache and, if found, return that; otherwise it would call plain old rewrite() and then save the result.

So the patch would include:
(a) the change as specified to Query.java
(b) an implementation of RewriteCache, which *could* just be simplified to Map
(c) changes to the callers of rewrite(), so that the minimal rewrite is called instead.

Thoughts?
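The rewriteMinimal() proposal above can be sketched as follows. This is a minimal illustration, not the contents of any attached patch: `SketchQuery`, `SketchReader`, `SketchPrefixQuery`, and `SketchExpandedQuery` are hypothetical stand-ins for Lucene's classes, and the sketch assumes that equal queries have matching equals() and hashCode() so they can serve as map keys.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for IndexReader. */
final class SketchReader {}

/** Lives only as long as one top-level rewrite; a thin wrapper over a Map. */
final class RewriteCache {
    private final Map<SketchQuery, SketchQuery> rewritten = new HashMap<>();
    SketchQuery get(SketchQuery q) { return rewritten.get(q); }
    void put(SketchQuery q, SketchQuery result) { rewritten.put(q, result); }
}

abstract class SketchQuery {
    /** Plain old rewrite, as callers use today. */
    abstract SketchQuery rewrite(SketchReader reader);

    /** Cache-aware rewrite: reuse an earlier result, else rewrite and remember it. */
    final SketchQuery rewriteMinimal(RewriteCache rc, SketchReader ir) {
        SketchQuery cached = rc.get(this);
        if (cached != null) return cached;
        SketchQuery result = rewrite(ir);
        rc.put(this, result);
        return result;
    }
}

/** A query whose rewrite is assumed to be expensive (e.g. term enumeration). */
final class SketchPrefixQuery extends SketchQuery {
    static int rewriteCalls = 0; // counts the expensive rewrites, for the demo only
    final String prefix;
    SketchPrefixQuery(String prefix) { this.prefix = prefix; }

    @Override SketchQuery rewrite(SketchReader reader) {
        rewriteCalls++;
        return new SketchExpandedQuery(prefix);
    }
    @Override public boolean equals(Object o) {
        return o instanceof SketchPrefixQuery && ((SketchPrefixQuery) o).prefix.equals(prefix);
    }
    @Override public int hashCode() { return prefix.hashCode(); }
}

/** Result of a rewrite; already primitive, so it rewrites to itself. */
final class SketchExpandedQuery extends SketchQuery {
    final String source;
    SketchExpandedQuery(String source) { this.source = source; }
    @Override SketchQuery rewrite(SketchReader reader) { return this; }
}

final class RewriteCacheDemo {
    public static void main(String[] args) {
        RewriteCache rc = new RewriteCache(); // created once per top-level rewrite
        SketchReader ir = new SketchReader();
        SketchQuery first = new SketchPrefixQuery("luc").rewriteMinimal(rc, ir);
        SketchQuery second = new SketchPrefixQuery("luc").rewriteMinimal(rc, ir);
        System.out.println(first == second);                // prints true
        System.out.println(SketchPrefixQuery.rewriteCalls); // prints 1
    }
}
```

Because the cache is created by whoever starts the top-level rewrite and is discarded afterwards, nothing is shared across searches, so the sketch needs no synchronization.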