[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()

2015-10-01 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939628#comment-14939628
 ] 

Alan Woodward commented on LUCENE-6785:
---

The API isn't as tidy, but I think you're right that this gives us more 
flexibility when it comes to query optimization, etc.  I'll extend the patch to 
all modules.  Thanks!

> Consider merging Query.rewrite() into Query.createWeight()
> --
>
> Key: LUCENE-6785
> URL: https://issues.apache.org/jira/browse/LUCENE-6785
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
> Attachments: LUCENE-6785-alt.patch, LUCENE-6785.patch, 
> LUCENE-6785.patch
>
>
> Prompted by the discussion on LUCENE-6590.
> Query.rewrite() is a bit of an oddity.  You call it to create a query for a 
> specific IndexSearcher, and to ensure that you get a query implementation 
> that has a working createWeight() method.  However, Weight itself already 
> encapsulates the notion of a per-searcher query.
> You also need to repeatedly call rewrite() until the query has stopped 
> rewriting itself, which is a bit trappy - there are a few places (in 
> highlighting code for example) that just call rewrite() once, rather than 
> looping round as IndexSearcher.rewrite() does.  Most queries don't need to be 
> called multiple times, however, so this seems a bit redundant.  And the ones 
> that do currently return un-rewritten queries can be changed simply enough to 
> rewrite them.
> Finally, in pretty much every case I can find in the codebase, rewrite() is 
> called purely as a prelude to createWeight().  This means, in the case of for 
> example large BooleanQueries, we end up cloning the whole query structure, 
> only to throw it away immediately.
> I'd like to try removing rewrite() entirely, and merging the logic into 
> createWeight(), simplifying the API and removing the trap where code only 
> calls rewrite once.  What do people think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()

2015-09-25 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907871#comment-14907871
 ] 

Adrien Grand commented on LUCENE-6785:
--

Sorry for the late reply, I was on vacation and just returned yesterday. 
Overall I'm torn with this patch because I like it a lot from a usability 
perspective (I really hate how you need to call rewrite in a loop today before 
calling createWeight) but Query.rewrite was our only opportunity to perform 
some kind of query optimization, and it's gone now. I can give a try to the 
alternative I mentionned above on lucene-core to see how things fit together.

> Consider merging Query.rewrite() into Query.createWeight()
> --
>
> Key: LUCENE-6785
> URL: https://issues.apache.org/jira/browse/LUCENE-6785
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
> Attachments: LUCENE-6785.patch, LUCENE-6785.patch
>
>
> Prompted by the discussion on LUCENE-6590.
> Query.rewrite() is a bit of an oddity.  You call it to create a query for a 
> specific IndexSearcher, and to ensure that you get a query implementation 
> that has a working createWeight() method.  However, Weight itself already 
> encapsulates the notion of a per-searcher query.
> You also need to repeatedly call rewrite() until the query has stopped 
> rewriting itself, which is a bit trappy - there are a few places (in 
> highlighting code for example) that just call rewrite() once, rather than 
> looping round as IndexSearcher.rewrite() does.  Most queries don't need to be 
> called multiple times, however, so this seems a bit redundant.  And the ones 
> that do currently return un-rewritten queries can be changed simply enough to 
> rewrite them.
> Finally, in pretty much every case I can find in the codebase, rewrite() is 
> called purely as a prelude to createWeight().  This means, in the case of for 
> example large BooleanQueries, we end up cloning the whole query structure, 
> only to throw it away immediately.
> I'd like to try removing rewrite() entirely, and merging the logic into 
> createWeight(), simplifying the API and removing the trap where code only 
> calls rewrite once.  What do people think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()

2015-09-15 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745750#comment-14745750
 ] 

Alan Woodward commented on LUCENE-6785:
---

[~jpountz] are you happy with the latest patch?  Or do you want to try your 
alternative API above?

> Consider merging Query.rewrite() into Query.createWeight()
> --
>
> Key: LUCENE-6785
> URL: https://issues.apache.org/jira/browse/LUCENE-6785
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
> Attachments: LUCENE-6785.patch, LUCENE-6785.patch
>
>
> Prompted by the discussion on LUCENE-6590.
> Query.rewrite() is a bit of an oddity.  You call it to create a query for a 
> specific IndexSearcher, and to ensure that you get a query implementation 
> that has a working createWeight() method.  However, Weight itself already 
> encapsulates the notion of a per-searcher query.
> You also need to repeatedly call rewrite() until the query has stopped 
> rewriting itself, which is a bit trappy - there are a few places (in 
> highlighting code for example) that just call rewrite() once, rather than 
> looping round as IndexSearcher.rewrite() does.  Most queries don't need to be 
> called multiple times, however, so this seems a bit redundant.  And the ones 
> that do currently return un-rewritten queries can be changed simply enough to 
> rewrite them.
> Finally, in pretty much every case I can find in the codebase, rewrite() is 
> called purely as a prelude to createWeight().  This means, in the case of for 
> example large BooleanQueries, we end up cloning the whole query structure, 
> only to throw it away immediately.
> I'd like to try removing rewrite() entirely, and merging the logic into 
> createWeight(), simplifying the API and removing the trap where code only 
> calls rewrite once.  What do people think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()

2015-09-13 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742579#comment-14742579
 ] 

Alan Woodward commented on LUCENE-6785:
---

Do you mean the getCacheKey() method?  It decouples query cacheing and weight 
creation a bit more, and will make BooleanWeight.createWeight() tidier, so it's 
a tradeoff between nicer internal implementations and smaller APIs.  But either 
way, it's not necessary for this issue.

> Consider merging Query.rewrite() into Query.createWeight()
> --
>
> Key: LUCENE-6785
> URL: https://issues.apache.org/jira/browse/LUCENE-6785
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
> Attachments: LUCENE-6785.patch, LUCENE-6785.patch
>
>
> Prompted by the discussion on LUCENE-6590.
> Query.rewrite() is a bit of an oddity.  You call it to create a query for a 
> specific IndexSearcher, and to ensure that you get a query implementation 
> that has a working createWeight() method.  However, Weight itself already 
> encapsulates the notion of a per-searcher query.
> You also need to repeatedly call rewrite() until the query has stopped 
> rewriting itself, which is a bit trappy - there are a few places (in 
> highlighting code for example) that just call rewrite() once, rather than 
> looping round as IndexSearcher.rewrite() does.  Most queries don't need to be 
> called multiple times, however, so this seems a bit redundant.  And the ones 
> that do currently return un-rewritten queries can be changed simply enough to 
> rewrite them.
> Finally, in pretty much every case I can find in the codebase, rewrite() is 
> called purely as a prelude to createWeight().  This means, in the case of for 
> example large BooleanQueries, we end up cloning the whole query structure, 
> only to throw it away immediately.
> I'd like to try removing rewrite() entirely, and merging the logic into 
> createWeight(), simplifying the API and removing the trap where code only 
> calls rewrite once.  What do people think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()

2015-09-13 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742575#comment-14742575
 ] 

Robert Muir commented on LUCENE-6785:
-

What is the reason to add such complexity?

> Consider merging Query.rewrite() into Query.createWeight()
> --
>
> Key: LUCENE-6785
> URL: https://issues.apache.org/jira/browse/LUCENE-6785
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
> Attachments: LUCENE-6785.patch, LUCENE-6785.patch
>
>
> Prompted by the discussion on LUCENE-6590.
> Query.rewrite() is a bit of an oddity.  You call it to create a query for a 
> specific IndexSearcher, and to ensure that you get a query implementation 
> that has a working createWeight() method.  However, Weight itself already 
> encapsulates the notion of a per-searcher query.
> You also need to repeatedly call rewrite() until the query has stopped 
> rewriting itself, which is a bit trappy - there are a few places (in 
> highlighting code for example) that just call rewrite() once, rather than 
> looping round as IndexSearcher.rewrite() does.  Most queries don't need to be 
> called multiple times, however, so this seems a bit redundant.  And the ones 
> that do currently return un-rewritten queries can be changed simply enough to 
> rewrite them.
> Finally, in pretty much every case I can find in the codebase, rewrite() is 
> called purely as a prelude to createWeight().  This means, in the case of for 
> example large BooleanQueries, we end up cloning the whole query structure, 
> only to throw it away immediately.
> I'd like to try removing rewrite() entirely, and merging the logic into 
> createWeight(), simplifying the API and removing the trap where code only 
> calls rewrite once.  What do people think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()

2015-09-10 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739701#comment-14739701
 ] 

Adrien Grand commented on LUCENE-6785:
--

bq. The bits keeping the QueryCache happy are a bit hacky, but I think it's 
worth the pain of that to make the API nicer. Maybe in another issue we could 
look at using the Weights themselves as cache keys, rather than their parent 
queries?

I mentioned the query cache, but highlighting relies on this behaviour too if I 
recall correctly, because it needs to know about the rewritten boolean query in 
order to be able to highlight eg. a FuzzyQuery.

I don't think weights can make good cache keys: Could two weights that have 
been created against different index readers be equal? I think they could not, 
but on the other hand this is something we need from the cache keys if we want 
to be able to retain cache entries on a given segment across reopens?

> Consider merging Query.rewrite() into Query.createWeight()
> --
>
> Key: LUCENE-6785
> URL: https://issues.apache.org/jira/browse/LUCENE-6785
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
> Attachments: LUCENE-6785.patch
>
>
> Prompted by the discussion on LUCENE-6590.
> Query.rewrite() is a bit of an oddity.  You call it to create a query for a 
> specific IndexSearcher, and to ensure that you get a query implementation 
> that has a working createWeight() method.  However, Weight itself already 
> encapsulates the notion of a per-searcher query.
> You also need to repeatedly call rewrite() until the query has stopped 
> rewriting itself, which is a bit trappy - there are a few places (in 
> highlighting code for example) that just call rewrite() once, rather than 
> looping round as IndexSearcher.rewrite() does.  Most queries don't need to be 
> called multiple times, however, so this seems a bit redundant.  And the ones 
> that do currently return un-rewritten queries can be changed simply enough to 
> rewrite them.
> Finally, in pretty much every case I can find in the codebase, rewrite() is 
> called purely as a prelude to createWeight().  This means, in the case of for 
> example large BooleanQueries, we end up cloning the whole query structure, 
> only to throw it away immediately.
> I'd like to try removing rewrite() entirely, and merging the logic into 
> createWeight(), simplifying the API and removing the trap where code only 
> calls rewrite once.  What do people think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()

2015-09-10 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738913#comment-14738913
 ] 

Alan Woodward commented on LUCENE-6785:
---

I'm travelling at the moment, will put up a larger patch changing all the 
modules + solr when I get back (including Terry's fix, thank you!).  I still 
have some tests failing around highlighting multiterm queries.

The bits keeping the QueryCache happy are a bit hacky, but I think it's worth 
the pain of that to make the API nicer.  Maybe in another issue we could look 
at using the Weights themselves as cache keys, rather than their parent queries?

bq. dropping weights could be problematic since they can be expensive to create 
due to statistics collection

One thought I had was that term statistics could be collected and cached by an 
object that's passed to createWeight().  That way we only collect stats for 
each term once per top-level query.  This would also be a nicer solution than 
the searcher term cache I proposed in LUCENE-6561.

> Consider merging Query.rewrite() into Query.createWeight()
> --
>
> Key: LUCENE-6785
> URL: https://issues.apache.org/jira/browse/LUCENE-6785
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
> Attachments: LUCENE-6785.patch
>
>
> Prompted by the discussion on LUCENE-6590.
> Query.rewrite() is a bit of an oddity.  You call it to create a query for a 
> specific IndexSearcher, and to ensure that you get a query implementation 
> that has a working createWeight() method.  However, Weight itself already 
> encapsulates the notion of a per-searcher query.
> You also need to repeatedly call rewrite() until the query has stopped 
> rewriting itself, which is a bit trappy - there are a few places (in 
> highlighting code for example) that just call rewrite() once, rather than 
> looping round as IndexSearcher.rewrite() does.  Most queries don't need to be 
> called multiple times, however, so this seems a bit redundant.  And the ones 
> that do currently return un-rewritten queries can be changed simply enough to 
> rewrite them.
> Finally, in pretty much every case I can find in the codebase, rewrite() is 
> called purely as a prelude to createWeight().  This means, in the case of for 
> example large BooleanQueries, we end up cloning the whole query structure, 
> only to throw it away immediately.
> I'd like to try removing rewrite() entirely, and merging the logic into 
> createWeight(), simplifying the API and removing the trap where code only 
> calls rewrite once.  What do people think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()

2015-09-09 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736932#comment-14736932
 ] 

Terry Smith commented on LUCENE-6785:
-

The original patch drops a few key settings from the BooleanQuery in 
BQ.createWeight, the following patch puts them back and makes the tests happier.

{noformat}
diff --git a/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java 
b/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java
index fb5f7c8..8dec338 100644
--- a/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java
+++ b/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java
@@ -210,7 +210,9 @@ public class BooleanQuery extends Query implements 
Iterable {
 }
 
 List subweights = new ArrayList<>();
-Builder builder = new Builder();
+Builder builder = new Builder()
+  .setDisableCoord(disableCoord)
+  .setMinimumNumberShouldMatch(minimumNumberShouldMatch);
 for (BooleanClause clause : query) {
   Weight w = searcher.createWeight(clause.getQuery(), needsScores);
   subweights.add(w);
{noformat}


> Consider merging Query.rewrite() into Query.createWeight()
> --
>
> Key: LUCENE-6785
> URL: https://issues.apache.org/jira/browse/LUCENE-6785
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
> Attachments: LUCENE-6785.patch
>
>
> Prompted by the discussion on LUCENE-6590.
> Query.rewrite() is a bit of an oddity.  You call it to create a query for a 
> specific IndexSearcher, and to ensure that you get a query implementation 
> that has a working createWeight() method.  However, Weight itself already 
> encapsulates the notion of a per-searcher query.
> You also need to repeatedly call rewrite() until the query has stopped 
> rewriting itself, which is a bit trappy - there are a few places (in 
> highlighting code for example) that just call rewrite() once, rather than 
> looping round as IndexSearcher.rewrite() does.  Most queries don't need to be 
> called multiple times, however, so this seems a bit redundant.  And the ones 
> that do currently return un-rewritten queries can be changed simply enough to 
> rewrite them.
> Finally, in pretty much every case I can find in the codebase, rewrite() is 
> called purely as a prelude to createWeight().  This means, in the case of for 
> example large BooleanQueries, we end up cloning the whole query structure, 
> only to throw it away immediately.
> I'd like to try removing rewrite() entirely, and merging the logic into 
> createWeight(), simplifying the API and removing the trap where code only 
> calls rewrite once.  What do people think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()

2015-09-09 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736431#comment-14736431
 ] 

Adrien Grand commented on LUCENE-6785:
--

I think the API is more usable this way. The only concern I might have is that 
we don't have as much flexibility for rewriting as we used to since we can't 
make decisions for rewriting based on the impl of the rewritten queries, since 
we only know about the rewritten queries once we have the weights and dropping 
weights could be problematic since they can be expensive to create due to 
statistics collection. I am not sure how much of an issue this would be but eg. 
I think Terry's patch on LUCENE-6787 would be impacted. An alternative to the 
current patch might be to do something like this:

{noformat}
Index: lucene/core/src/java/org/apache/lucene/search/Query.java
===
--- lucene/core/src/java/org/apache/lucene/search/Query.java(revision 
1701820)
+++ lucene/core/src/java/org/apache/lucene/search/Query.java(working copy)
@@ -57,18 +57,32 @@
   /**
* Expert: Constructs an appropriate Weight implementation for this query.
* 
-   * Only implemented by primitive queries, which re-write to themselves.
+   * This method will first {@link #rewrite(IndexReader)} the query and then
+   * construct a concrete {@link Weight} on the rewritten query using
+   * {@link #doCreateWeight(IndexSearcher, boolean)}.
*
* @param needsScores   True if document scores ({@link Scorer#score}) or 
match
*  frequencies ({@link Scorer#freq}) are needed.
*/
-  public Weight createWeight(IndexSearcher searcher, boolean needsScores) 
throws IOException {
-throw new UnsupportedOperationException("Query " + this + " does not 
implement createWeight");
+  public final Weight createWeight(IndexSearcher searcher, boolean 
needsScores) throws IOException {
+Query query = this;
+final IndexReader reader = searcher.getIndexReader();
+for (Query rewrittenQuery = query.rewrite(reader); rewrittenQuery != query;
+ rewrittenQuery = query.rewrite(reader)) {
+  query = rewrittenQuery;
+}
+return query.doCreateWeight(searcher, needsScores);
   }
 
-  /** Expert: called to re-write queries into primitive queries. For example,
-   * a PrefixQuery will be rewritten into a BooleanQuery that consists
-   * of TermQuerys.
+  /** Construct a Weight for this {@link Query}. In case this {@link Query} 
object
+   *  always {@link #rewrite(IndexReader) rewrites} to a different Query
+   *  implementation, it is safe to implement this method by just throwing an
+   *  exception as it will never get called. */
+  protected abstract Weight doCreateWeight(IndexSearcher searcher, boolean 
needsScores);
+
+  /** Internal: called to re-write queries into primitive queries. This method
+   *  only exists as an implementation detail of
+   *  {@link #createWeight(IndexSearcher, boolean)}. Do not call directly.
*/
   public Query rewrite(IndexReader reader) throws IOException {
 return this;
{noformat}

We would still have the same flexibility for rewriting and consumers of the 
Query API would not have the trap anymore that they need to rewrite queries 
first. However I agree it's not as clean... 

> Consider merging Query.rewrite() into Query.createWeight()
> --
>
> Key: LUCENE-6785
> URL: https://issues.apache.org/jira/browse/LUCENE-6785
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
> Attachments: LUCENE-6785.patch
>
>
> Prompted by the discussion on LUCENE-6590.
> Query.rewrite() is a bit of an oddity.  You call it to create a query for a 
> specific IndexSearcher, and to ensure that you get a query implementation 
> that has a working createWeight() method.  However, Weight itself already 
> encapsulates the notion of a per-searcher query.
> You also need to repeatedly call rewrite() until the query has stopped 
> rewriting itself, which is a bit trappy - there are a few places (in 
> highlighting code for example) that just call rewrite() once, rather than 
> looping round as IndexSearcher.rewrite() does.  Most queries don't need to be 
> called multiple times, however, so this seems a bit redundant.  And the ones 
> that do currently return un-rewritten queries can be changed simply enough to 
> rewrite them.
> Finally, in pretty much every case I can find in the codebase, rewrite() is 
> called purely as a prelude to createWeight().  This means, in the case of for 
> example large BooleanQueries, we end up cloning the whole query structure, 
> only to throw it away immediately.
> I'd like to try removing rewrite() entirely, and merging the logic into 
> createWeight(), simplifying the API and remov

[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()

2015-09-08 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735655#comment-14735655
 ] 

Robert Muir commented on LUCENE-6785:
-

I didn't thoroughly examine the patch, but this part alone is worth the 
trouble. Its crazy today that if you subclass Query, you only need to implement 
toString() for it to compile!

{noformat}
-  public Weight createWeight(IndexSearcher searcher, boolean needsScores) 
throws IOException {
-throw new UnsupportedOperationException("Query " + this + " does not 
implement createWeight");
-  }
+  public abstract Weight createWeight(IndexSearcher searcher, boolean 
needsScores) throws IOException;
{noformat}

> Consider merging Query.rewrite() into Query.createWeight()
> --
>
> Key: LUCENE-6785
> URL: https://issues.apache.org/jira/browse/LUCENE-6785
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
> Attachments: LUCENE-6785.patch
>
>
> Prompted by the discussion on LUCENE-6590.
> Query.rewrite() is a bit of an oddity.  You call it to create a query for a 
> specific IndexSearcher, and to ensure that you get a query implementation 
> that has a working createWeight() method.  However, Weight itself already 
> encapsulates the notion of a per-searcher query.
> You also need to repeatedly call rewrite() until the query has stopped 
> rewriting itself, which is a bit trappy - there are a few places (in 
> highlighting code for example) that just call rewrite() once, rather than 
> looping round as IndexSearcher.rewrite() does.  Most queries don't need to be 
> called multiple times, however, so this seems a bit redundant.  And the ones 
> that do currently return un-rewritten queries can be changed simply enough to 
> rewrite them.
> Finally, in pretty much every case I can find in the codebase, rewrite() is 
> called purely as a prelude to createWeight().  This means, in the case of for 
> example large BooleanQueries, we end up cloning the whole query structure, 
> only to throw it away immediately.
> I'd like to try removing rewrite() entirely, and merging the logic into 
> createWeight(), simplifying the API and removing the trap where code only 
> calls rewrite once.  What do people think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()

2015-09-08 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735088#comment-14735088
 ] 

David Smiley commented on LUCENE-6785:
--

It'll be interesting to see how it all looks in the patch.  I could imagine it 
reducing the LOC -- I'm thinking of wrapping Queries that simply delegate that 
no longer need to do so.

> Consider merging Query.rewrite() into Query.createWeight()
> --
>
> Key: LUCENE-6785
> URL: https://issues.apache.org/jira/browse/LUCENE-6785
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>
> Prompted by the discussion on LUCENE-6590.
> Query.rewrite() is a bit of an oddity.  You call it to create a query for a 
> specific IndexSearcher, and to ensure that you get a query implementation 
> that has a working createWeight() method.  However, Weight itself already 
> encapsulates the notion of a per-searcher query.
> You also need to repeatedly call rewrite() until the query has stopped 
> rewriting itself, which is a bit trappy - there are a few places (in 
> highlighting code for example) that just call rewrite() once, rather than 
> looping round as IndexSearcher.rewrite() does.  Most queries don't need to be 
> called multiple times, however, so this seems a bit redundant.  And the ones 
> that do currently return un-rewritten queries can be changed simply enough to 
> rewrite them.
> Finally, in pretty much every case I can find in the codebase, rewrite() is 
> called purely as a prelude to createWeight().  This means, in the case of for 
> example large BooleanQueries, we end up cloning the whole query structure, 
> only to throw it away immediately.
> I'd like to try removing rewrite() entirely, and merging the logic into 
> createWeight(), simplifying the API and removing the trap where code only 
> calls rewrite once.  What do people think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()

2015-09-08 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735064#comment-14735064
 ] 

Adrien Grand commented on LUCENE-6785:
--

+1 to fold rewrite into createWeight

For the record I tried to merge them in the past but got some issues. For 
instance if you have a constant-score query that wraps a fuzzy query today, 
Weight.getQuery on the CSQ would return a CSQ that wraps a BooleanQuery, since 
rewriting happened before we created the Weight. This is desirable because 
query caching is based on the result of Weight.getQuery and using the rewritten 
query is better since there are odds that several queries will rewrite to the 
same primitive query. If we simply put the rewriting logic at the top of 
createWeight, then this won't be the case anymore. So I guess BooleanQuery 
would have to be changed eg. to create the sub-weights first, then rewrite 
itself entirely using the queries returned by Weight.getQuery() on the sub 
weights and finally create the BooleanWeight?

> Consider merging Query.rewrite() into Query.createWeight()
> --
>
> Key: LUCENE-6785
> URL: https://issues.apache.org/jira/browse/LUCENE-6785
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>
> Prompted by the discussion on LUCENE-6590.
> Query.rewrite() is a bit of an oddity.  You call it to create a query for a 
> specific IndexSearcher, and to ensure that you get a query implementation 
> that has a working createWeight() method.  However, Weight itself already 
> encapsulates the notion of a per-searcher query.
> You also need to repeatedly call rewrite() until the query has stopped 
> rewriting itself, which is a bit trappy - there are a few places (in 
> highlighting code for example) that just call rewrite() once, rather than 
> looping round as IndexSearcher.rewrite() does.  Most queries don't need to be 
> called multiple times, however, so this seems a bit redundant.  And the ones 
> that do currently return un-rewritten queries can be changed simply enough to 
> rewrite them.
> Finally, in pretty much every case I can find in the codebase, rewrite() is 
> called purely as a prelude to createWeight().  This means, in the case of for 
> example large BooleanQueries, we end up cloning the whole query structure, 
> only to throw it away immediately.
> I'd like to try removing rewrite() entirely, and merging the logic into 
> createWeight(), simplifying the API and removing the trap where code only 
> calls rewrite once.  What do people think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org