[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()
[ https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939628#comment-14939628 ] Alan Woodward commented on LUCENE-6785: --- The API isn't as tidy, but I think you're right that this gives us more flexibility when it comes to query optimization, etc. I'll extend the patch to all modules. Thanks! > Consider merging Query.rewrite() into Query.createWeight() > -- > > Key: LUCENE-6785 > URL: https://issues.apache.org/jira/browse/LUCENE-6785 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward > Attachments: LUCENE-6785-alt.patch, LUCENE-6785.patch, > LUCENE-6785.patch > > > Prompted by the discussion on LUCENE-6590. > Query.rewrite() is a bit of an oddity. You call it to create a query for a > specific IndexSearcher, and to ensure that you get a query implementation > that has a working createWeight() method. However, Weight itself already > encapsulates the notion of a per-searcher query. > You also need to repeatedly call rewrite() until the query has stopped > rewriting itself, which is a bit trappy - there are a few places (in > highlighting code for example) that just call rewrite() once, rather than > looping round as IndexSearcher.rewrite() does. Most queries don't need to be > called multiple times, however, so this seems a bit redundant. And the ones > that do currently return un-rewritten queries can be changed simply enough to > rewrite them. > Finally, in pretty much every case I can find in the codebase, rewrite() is > called purely as a prelude to createWeight(). This means, in the case of for > example large BooleanQueries, we end up cloning the whole query structure, > only to throw it away immediately. > I'd like to try removing rewrite() entirely, and merging the logic into > createWeight(), simplifying the API and removing the trap where code only > calls rewrite once. What do people think? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()
[ https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907871#comment-14907871 ] Adrien Grand commented on LUCENE-6785: -- Sorry for the late reply, I was on vacation and just returned yesterday. Overall I'm torn with this patch because I like it a lot from a usability perspective (I really hate how you need to call rewrite in a loop today before calling createWeight) but Query.rewrite was our only opportunity to perform some kind of query optimization, and it's gone now. I can give a try to the alternative I mentionned above on lucene-core to see how things fit together. > Consider merging Query.rewrite() into Query.createWeight() > -- > > Key: LUCENE-6785 > URL: https://issues.apache.org/jira/browse/LUCENE-6785 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward > Attachments: LUCENE-6785.patch, LUCENE-6785.patch > > > Prompted by the discussion on LUCENE-6590. > Query.rewrite() is a bit of an oddity. You call it to create a query for a > specific IndexSearcher, and to ensure that you get a query implementation > that has a working createWeight() method. However, Weight itself already > encapsulates the notion of a per-searcher query. > You also need to repeatedly call rewrite() until the query has stopped > rewriting itself, which is a bit trappy - there are a few places (in > highlighting code for example) that just call rewrite() once, rather than > looping round as IndexSearcher.rewrite() does. Most queries don't need to be > called multiple times, however, so this seems a bit redundant. And the ones > that do currently return un-rewritten queries can be changed simply enough to > rewrite them. > Finally, in pretty much every case I can find in the codebase, rewrite() is > called purely as a prelude to createWeight(). This means, in the case of for > example large BooleanQueries, we end up cloning the whole query structure, > only to throw it away immediately. > I'd like to try removing rewrite() entirely, and merging the logic into > createWeight(), simplifying the API and removing the trap where code only > calls rewrite once. What do people think? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()
[ https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745750#comment-14745750 ] Alan Woodward commented on LUCENE-6785: --- [~jpountz] are you happy with the latest patch? Or do you want to try your alternative API above? > Consider merging Query.rewrite() into Query.createWeight() > -- > > Key: LUCENE-6785 > URL: https://issues.apache.org/jira/browse/LUCENE-6785 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward > Attachments: LUCENE-6785.patch, LUCENE-6785.patch > > > Prompted by the discussion on LUCENE-6590. > Query.rewrite() is a bit of an oddity. You call it to create a query for a > specific IndexSearcher, and to ensure that you get a query implementation > that has a working createWeight() method. However, Weight itself already > encapsulates the notion of a per-searcher query. > You also need to repeatedly call rewrite() until the query has stopped > rewriting itself, which is a bit trappy - there are a few places (in > highlighting code for example) that just call rewrite() once, rather than > looping round as IndexSearcher.rewrite() does. Most queries don't need to be > called multiple times, however, so this seems a bit redundant. And the ones > that do currently return un-rewritten queries can be changed simply enough to > rewrite them. > Finally, in pretty much every case I can find in the codebase, rewrite() is > called purely as a prelude to createWeight(). This means, in the case of for > example large BooleanQueries, we end up cloning the whole query structure, > only to throw it away immediately. > I'd like to try removing rewrite() entirely, and merging the logic into > createWeight(), simplifying the API and removing the trap where code only > calls rewrite once. What do people think? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()
[ https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742579#comment-14742579 ] Alan Woodward commented on LUCENE-6785: --- Do you mean the getCacheKey() method? It decouples query cacheing and weight creation a bit more, and will make BooleanWeight.createWeight() tidier, so it's a tradeoff between nicer internal implementations and smaller APIs. But either way, it's not necessary for this issue. > Consider merging Query.rewrite() into Query.createWeight() > -- > > Key: LUCENE-6785 > URL: https://issues.apache.org/jira/browse/LUCENE-6785 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward > Attachments: LUCENE-6785.patch, LUCENE-6785.patch > > > Prompted by the discussion on LUCENE-6590. > Query.rewrite() is a bit of an oddity. You call it to create a query for a > specific IndexSearcher, and to ensure that you get a query implementation > that has a working createWeight() method. However, Weight itself already > encapsulates the notion of a per-searcher query. > You also need to repeatedly call rewrite() until the query has stopped > rewriting itself, which is a bit trappy - there are a few places (in > highlighting code for example) that just call rewrite() once, rather than > looping round as IndexSearcher.rewrite() does. Most queries don't need to be > called multiple times, however, so this seems a bit redundant. And the ones > that do currently return un-rewritten queries can be changed simply enough to > rewrite them. > Finally, in pretty much every case I can find in the codebase, rewrite() is > called purely as a prelude to createWeight(). This means, in the case of for > example large BooleanQueries, we end up cloning the whole query structure, > only to throw it away immediately. > I'd like to try removing rewrite() entirely, and merging the logic into > createWeight(), simplifying the API and removing the trap where code only > calls rewrite once. What do people think? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()
[ https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742575#comment-14742575 ] Robert Muir commented on LUCENE-6785: - What is the reason to add such complexity? > Consider merging Query.rewrite() into Query.createWeight() > -- > > Key: LUCENE-6785 > URL: https://issues.apache.org/jira/browse/LUCENE-6785 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward > Attachments: LUCENE-6785.patch, LUCENE-6785.patch > > > Prompted by the discussion on LUCENE-6590. > Query.rewrite() is a bit of an oddity. You call it to create a query for a > specific IndexSearcher, and to ensure that you get a query implementation > that has a working createWeight() method. However, Weight itself already > encapsulates the notion of a per-searcher query. > You also need to repeatedly call rewrite() until the query has stopped > rewriting itself, which is a bit trappy - there are a few places (in > highlighting code for example) that just call rewrite() once, rather than > looping round as IndexSearcher.rewrite() does. Most queries don't need to be > called multiple times, however, so this seems a bit redundant. And the ones > that do currently return un-rewritten queries can be changed simply enough to > rewrite them. > Finally, in pretty much every case I can find in the codebase, rewrite() is > called purely as a prelude to createWeight(). This means, in the case of for > example large BooleanQueries, we end up cloning the whole query structure, > only to throw it away immediately. > I'd like to try removing rewrite() entirely, and merging the logic into > createWeight(), simplifying the API and removing the trap where code only > calls rewrite once. What do people think? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()
[ https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739701#comment-14739701 ] Adrien Grand commented on LUCENE-6785: -- bq. The bits keeping the QueryCache happy are a bit hacky, but I think it's worth the pain of that to make the API nicer. Maybe in another issue we could look at using the Weights themselves as cache keys, rather than their parent queries? I mentioned the query cache, but highlighting relies on this behaviour too if I recall correctly, because it needs to know about the rewritten boolean query in order to be able to highlight eg. a FuzzyQuery. I don't think weights can make good cache keys: Could two weights that have been created against different index readers be equal? I think they could not, but on the other hand this is something we need from the cache keys if we want to be able to retain cache entries on a given segment across reopens? > Consider merging Query.rewrite() into Query.createWeight() > -- > > Key: LUCENE-6785 > URL: https://issues.apache.org/jira/browse/LUCENE-6785 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward > Attachments: LUCENE-6785.patch > > > Prompted by the discussion on LUCENE-6590. > Query.rewrite() is a bit of an oddity. You call it to create a query for a > specific IndexSearcher, and to ensure that you get a query implementation > that has a working createWeight() method. However, Weight itself already > encapsulates the notion of a per-searcher query. > You also need to repeatedly call rewrite() until the query has stopped > rewriting itself, which is a bit trappy - there are a few places (in > highlighting code for example) that just call rewrite() once, rather than > looping round as IndexSearcher.rewrite() does. Most queries don't need to be > called multiple times, however, so this seems a bit redundant. And the ones > that do currently return un-rewritten queries can be changed simply enough to > rewrite them. > Finally, in pretty much every case I can find in the codebase, rewrite() is > called purely as a prelude to createWeight(). This means, in the case of for > example large BooleanQueries, we end up cloning the whole query structure, > only to throw it away immediately. > I'd like to try removing rewrite() entirely, and merging the logic into > createWeight(), simplifying the API and removing the trap where code only > calls rewrite once. What do people think? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()
[ https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738913#comment-14738913 ] Alan Woodward commented on LUCENE-6785: --- I'm travelling at the moment, will put up a larger patch changing all the modules + solr when I get back (including Terry's fix, thank you!). I still have some tests failing around highlighting multiterm queries. The bits keeping the QueryCache happy are a bit hacky, but I think it's worth the pain of that to make the API nicer. Maybe in another issue we could look at using the Weights themselves as cache keys, rather than their parent queries? bq. dropping weights could be problematic since they can be expensive to create due to statistics collection One thought I had was that term statistics could be collected and cached by an object that's passed to createWeight(). That way we only collect stats for each term once per top-level query. This would also be a nicer solution than the searcher term cache I proposed in LUCENE-6561. > Consider merging Query.rewrite() into Query.createWeight() > -- > > Key: LUCENE-6785 > URL: https://issues.apache.org/jira/browse/LUCENE-6785 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward > Attachments: LUCENE-6785.patch > > > Prompted by the discussion on LUCENE-6590. > Query.rewrite() is a bit of an oddity. You call it to create a query for a > specific IndexSearcher, and to ensure that you get a query implementation > that has a working createWeight() method. However, Weight itself already > encapsulates the notion of a per-searcher query. > You also need to repeatedly call rewrite() until the query has stopped > rewriting itself, which is a bit trappy - there are a few places (in > highlighting code for example) that just call rewrite() once, rather than > looping round as IndexSearcher.rewrite() does. Most queries don't need to be > called multiple times, however, so this seems a bit redundant. And the ones > that do currently return un-rewritten queries can be changed simply enough to > rewrite them. > Finally, in pretty much every case I can find in the codebase, rewrite() is > called purely as a prelude to createWeight(). This means, in the case of for > example large BooleanQueries, we end up cloning the whole query structure, > only to throw it away immediately. > I'd like to try removing rewrite() entirely, and merging the logic into > createWeight(), simplifying the API and removing the trap where code only > calls rewrite once. What do people think? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()
[ https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736932#comment-14736932 ] Terry Smith commented on LUCENE-6785: - The original patch drops a few key settings from the BooleanQuery in BQ.createWeight, the following patch puts them back and makes the tests happier. {noformat} diff --git a/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java b/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java index fb5f7c8..8dec338 100644 --- a/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java +++ b/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java @@ -210,7 +210,9 @@ public class BooleanQuery extends Query implements Iterable { } List subweights = new ArrayList<>(); -Builder builder = new Builder(); +Builder builder = new Builder() + .setDisableCoord(disableCoord) + .setMinimumNumberShouldMatch(minimumNumberShouldMatch); for (BooleanClause clause : query) { Weight w = searcher.createWeight(clause.getQuery(), needsScores); subweights.add(w); {noformat} > Consider merging Query.rewrite() into Query.createWeight() > -- > > Key: LUCENE-6785 > URL: https://issues.apache.org/jira/browse/LUCENE-6785 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward > Attachments: LUCENE-6785.patch > > > Prompted by the discussion on LUCENE-6590. > Query.rewrite() is a bit of an oddity. You call it to create a query for a > specific IndexSearcher, and to ensure that you get a query implementation > that has a working createWeight() method. However, Weight itself already > encapsulates the notion of a per-searcher query. > You also need to repeatedly call rewrite() until the query has stopped > rewriting itself, which is a bit trappy - there are a few places (in > highlighting code for example) that just call rewrite() once, rather than > looping round as IndexSearcher.rewrite() does. Most queries don't need to be > called multiple times, however, so this seems a bit redundant. And the ones > that do currently return un-rewritten queries can be changed simply enough to > rewrite them. > Finally, in pretty much every case I can find in the codebase, rewrite() is > called purely as a prelude to createWeight(). This means, in the case of for > example large BooleanQueries, we end up cloning the whole query structure, > only to throw it away immediately. > I'd like to try removing rewrite() entirely, and merging the logic into > createWeight(), simplifying the API and removing the trap where code only > calls rewrite once. What do people think? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()
[ https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736431#comment-14736431 ] Adrien Grand commented on LUCENE-6785: -- I think the API is more usable this way. The only concern I might have is that we don't have as much flexibility for rewriting as we used to since we can't make decisions for rewriting based on the impl of the rewritten queries, since we only know about the rewritten queries once we have the weights and dropping weights could be problematic since they can be expensive to create due to statistics collection. I am not sure how much of an issue this would be but eg. I think Terry's patch on LUCENE-6787 would be impacted. An alternative to the current patch might be to do something like this: {noformat} Index: lucene/core/src/java/org/apache/lucene/search/Query.java === --- lucene/core/src/java/org/apache/lucene/search/Query.java(revision 1701820) +++ lucene/core/src/java/org/apache/lucene/search/Query.java(working copy) @@ -57,18 +57,32 @@ /** * Expert: Constructs an appropriate Weight implementation for this query. * - * Only implemented by primitive queries, which re-write to themselves. + * This method will first {@link #rewrite(IndexReader)} the query and then + * construct a concrete {@link Weight} on the rewritten query using + * {@link #doCreateWeight(IndexSearcher, boolean)}. * * @param needsScores True if document scores ({@link Scorer#score}) or match * frequencies ({@link Scorer#freq}) are needed. */ - public Weight createWeight(IndexSearcher searcher, boolean needsScores) throws IOException { -throw new UnsupportedOperationException("Query " + this + " does not implement createWeight"); + public final Weight createWeight(IndexSearcher searcher, boolean needsScores) throws IOException { +Query query = this; +final IndexReader reader = searcher.getIndexReader(); +for (Query rewrittenQuery = query.rewrite(reader); rewrittenQuery != query; + rewrittenQuery = query.rewrite(reader)) { + query = rewrittenQuery; +} +return query.doCreateWeight(searcher, needsScores); } - /** Expert: called to re-write queries into primitive queries. For example, - * a PrefixQuery will be rewritten into a BooleanQuery that consists - * of TermQuerys. + /** Construct a Weight for this {@link Query}. In case this {@link Query} object + * always {@link #rewrite(IndexReader) rewrites} to a different Query + * implementation, it is safe to implement this method by just throwing an + * exception as it will never get called. */ + protected abstract Weight doCreateWeight(IndexSearcher searcher, boolean needsScores); + + /** Internal: called to re-write queries into primitive queries. This method + * only exists as an implementation detail of + * {@link #createWeight(IndexSearcher, boolean)}. Do not call directly. */ public Query rewrite(IndexReader reader) throws IOException { return this; {noformat} We would still have the same flexibility for rewriting and consumers of the Query API would not have the trap anymore that they need to rewrite queries first. However I agree it's not as clean... > Consider merging Query.rewrite() into Query.createWeight() > -- > > Key: LUCENE-6785 > URL: https://issues.apache.org/jira/browse/LUCENE-6785 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward > Attachments: LUCENE-6785.patch > > > Prompted by the discussion on LUCENE-6590. > Query.rewrite() is a bit of an oddity. You call it to create a query for a > specific IndexSearcher, and to ensure that you get a query implementation > that has a working createWeight() method. However, Weight itself already > encapsulates the notion of a per-searcher query. > You also need to repeatedly call rewrite() until the query has stopped > rewriting itself, which is a bit trappy - there are a few places (in > highlighting code for example) that just call rewrite() once, rather than > looping round as IndexSearcher.rewrite() does. Most queries don't need to be > called multiple times, however, so this seems a bit redundant. And the ones > that do currently return un-rewritten queries can be changed simply enough to > rewrite them. > Finally, in pretty much every case I can find in the codebase, rewrite() is > called purely as a prelude to createWeight(). This means, in the case of for > example large BooleanQueries, we end up cloning the whole query structure, > only to throw it away immediately. > I'd like to try removing rewrite() entirely, and merging the logic into > createWeight(), simplifying the API and remov
[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()
[ https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735655#comment-14735655 ] Robert Muir commented on LUCENE-6785: - I didn't thoroughly examine the patch, but this part alone is worth the trouble. Its crazy today that if you subclass Query, you only need to implement toString() for it to compile! {noformat} - public Weight createWeight(IndexSearcher searcher, boolean needsScores) throws IOException { -throw new UnsupportedOperationException("Query " + this + " does not implement createWeight"); - } + public abstract Weight createWeight(IndexSearcher searcher, boolean needsScores) throws IOException; {noformat} > Consider merging Query.rewrite() into Query.createWeight() > -- > > Key: LUCENE-6785 > URL: https://issues.apache.org/jira/browse/LUCENE-6785 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward > Attachments: LUCENE-6785.patch > > > Prompted by the discussion on LUCENE-6590. > Query.rewrite() is a bit of an oddity. You call it to create a query for a > specific IndexSearcher, and to ensure that you get a query implementation > that has a working createWeight() method. However, Weight itself already > encapsulates the notion of a per-searcher query. > You also need to repeatedly call rewrite() until the query has stopped > rewriting itself, which is a bit trappy - there are a few places (in > highlighting code for example) that just call rewrite() once, rather than > looping round as IndexSearcher.rewrite() does. Most queries don't need to be > called multiple times, however, so this seems a bit redundant. And the ones > that do currently return un-rewritten queries can be changed simply enough to > rewrite them. > Finally, in pretty much every case I can find in the codebase, rewrite() is > called purely as a prelude to createWeight(). This means, in the case of for > example large BooleanQueries, we end up cloning the whole query structure, > only to throw it away immediately. > I'd like to try removing rewrite() entirely, and merging the logic into > createWeight(), simplifying the API and removing the trap where code only > calls rewrite once. What do people think? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()
[ https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735088#comment-14735088 ] David Smiley commented on LUCENE-6785: -- It'll be interesting to see how it all looks in the patch. I could imagine it reducing the LOC -- I'm thinking of wrapping Queries that simply delegate that no longer need to do so. > Consider merging Query.rewrite() into Query.createWeight() > -- > > Key: LUCENE-6785 > URL: https://issues.apache.org/jira/browse/LUCENE-6785 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward > > Prompted by the discussion on LUCENE-6590. > Query.rewrite() is a bit of an oddity. You call it to create a query for a > specific IndexSearcher, and to ensure that you get a query implementation > that has a working createWeight() method. However, Weight itself already > encapsulates the notion of a per-searcher query. > You also need to repeatedly call rewrite() until the query has stopped > rewriting itself, which is a bit trappy - there are a few places (in > highlighting code for example) that just call rewrite() once, rather than > looping round as IndexSearcher.rewrite() does. Most queries don't need to be > called multiple times, however, so this seems a bit redundant. And the ones > that do currently return un-rewritten queries can be changed simply enough to > rewrite them. > Finally, in pretty much every case I can find in the codebase, rewrite() is > called purely as a prelude to createWeight(). This means, in the case of for > example large BooleanQueries, we end up cloning the whole query structure, > only to throw it away immediately. > I'd like to try removing rewrite() entirely, and merging the logic into > createWeight(), simplifying the API and removing the trap where code only > calls rewrite once. What do people think? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()
[ https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735064#comment-14735064 ] Adrien Grand commented on LUCENE-6785: -- +1 to fold rewrite into createWeight For the record I tried to merge them in the past but got some issues. For instance if you have a constant-score query that wraps a fuzzy query today, Weight.getQuery on the CSQ would return a CSQ that wraps a BooleanQuery, since rewriting happened before we created the Weight. This is desirable because query caching is based on the result of Weight.getQuery and using the rewritten query is better since there are odds that several queries will rewrite to the same primitive query. If we simply put the rewriting logic at the top of createWeight, then this won't be the case anymore. So I guess BooleanQuery would have to be changed eg. to create the sub-weights first, then rewrite itself entirely using the queries returned by Weight.getQuery() on the sub weights and finally create the BooleanWeight? > Consider merging Query.rewrite() into Query.createWeight() > -- > > Key: LUCENE-6785 > URL: https://issues.apache.org/jira/browse/LUCENE-6785 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward > > Prompted by the discussion on LUCENE-6590. > Query.rewrite() is a bit of an oddity. You call it to create a query for a > specific IndexSearcher, and to ensure that you get a query implementation > that has a working createWeight() method. However, Weight itself already > encapsulates the notion of a per-searcher query. > You also need to repeatedly call rewrite() until the query has stopped > rewriting itself, which is a bit trappy - there are a few places (in > highlighting code for example) that just call rewrite() once, rather than > looping round as IndexSearcher.rewrite() does. Most queries don't need to be > called multiple times, however, so this seems a bit redundant. And the ones > that do currently return un-rewritten queries can be changed simply enough to > rewrite them. > Finally, in pretty much every case I can find in the codebase, rewrite() is > called purely as a prelude to createWeight(). This means, in the case of for > example large BooleanQueries, we end up cloning the whole query structure, > only to throw it away immediately. > I'd like to try removing rewrite() entirely, and merging the logic into > createWeight(), simplifying the API and removing the trap where code only > calls rewrite once. What do people think? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org