from:"Terry Smith"


[ 
https://issues.apache.org/jira/browse/LUCENE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745623#comment-14745623
 ] 

Terry Smith commented on LUCENE-6590:
-

[~jpountz]: PhraseQuery is missing a call to ToStringUtils.boost in its 
toString method on the 5.x branch.


> Explore different ways to apply boosts
> --
>
> Key: LUCENE-6590
> URL: https://issues.apache.org/jira/browse/LUCENE-6590
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.4
>
> Attachments: LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, 
> LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch
>
>
> Follow-up from LUCENE-6570: the fact that all queries are mutable in order to 
> allow for applying a boost raises issues since it makes queries bad cache 
> keys since their hashcode can change anytime. We could just document that 
> queries should never be modified after they have gone through IndexSearcher 
> but it would be even better if the API made queries impossible to mutate at 
> all.
> I think there are two main options:
>  - either replace "void setBoost(boost)" with something like "Query 
> withBoost(boost)" which would return a clone that has a different boost
>  - or move boost handling outside of Query, for instance we could have a 
> (immutable) query impl that would be dedicated to applying boosts, that 
> queries that need to change boosts at rewrite time (such as BooleanQuery) 
> would use as a wrapper.
> The latter idea is from Robert and I like it a lot given how often I either 
> introduced or found a bug which was due to the boost parameter being ignored. 
> Maybe there are other options, but I think this is worth exploring.
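
The second option in the description can be sketched with toy types. Everything below (ToyQuery, ToyTermQuery, ToyBoostQuery) is a hypothetical stand-in rather than a Lucene class; the sketch only tries to show that an immutable wrapper leaves the wrapped query, and therefore its hash code, unchanged.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class BoostWrapperDemo {
  // Hypothetical stand-ins for illustration only; these are not Lucene classes.
  interface ToyQuery {}

  static final class ToyTermQuery implements ToyQuery {
    private final String field;
    private final String text;

    ToyTermQuery(String field, String text) {
      this.field = field;
      this.text = text;
    }

    @Override public boolean equals(Object o) {
      if (!(o instanceof ToyTermQuery)) return false;
      ToyTermQuery other = (ToyTermQuery) o;
      return field.equals(other.field) && text.equals(other.text);
    }

    @Override public int hashCode() {
      return Objects.hash(field, text);
    }
  }

  // Immutable wrapper dedicated to applying a boost; the wrapped query is never modified.
  static final class ToyBoostQuery implements ToyQuery {
    private final ToyQuery query;
    private final float boost;

    ToyBoostQuery(ToyQuery query, float boost) {
      this.query = Objects.requireNonNull(query);
      this.boost = boost;
    }

    @Override public boolean equals(Object o) {
      if (!(o instanceof ToyBoostQuery)) return false;
      ToyBoostQuery other = (ToyBoostQuery) o;
      return query.equals(other.query)
          && Float.floatToIntBits(boost) == Float.floatToIntBits(other.boost);
    }

    @Override public int hashCode() {
      return 31 * query.hashCode() + Float.floatToIntBits(boost);
    }
  }

  public static void main(String[] args) {
    ToyQuery term = new ToyTermQuery("body", "lucene");
    Map<ToyQuery, String> cache = new HashMap<>();
    cache.put(term, "cached result");

    // Boosting creates a new object and leaves the cache key untouched.
    ToyQuery boosted = new ToyBoostQuery(term, 2.0f);
    System.out.println(cache.containsKey(term));    // the original key is still found
    System.out.println(cache.containsKey(boosted)); // the boosted query is a different key
  }
}
```

With a mutable setBoost, the same cache lookup could miss after the boost changed, which is the cache-key problem the description raises.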



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-6590) Explore different ways to apply boosts


[ 
https://issues.apache.org/jira/browse/LUCENE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745675#comment-14745675
 ] 

Terry Smith commented on LUCENE-6590:
-

Also FunctionQuery.


 




[jira] [Commented] (LUCENE-6590) Explore different ways to apply boosts


[ 
https://issues.apache.org/jira/browse/LUCENE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745653#comment-14745653
 ] 

Terry Smith commented on LUCENE-6590:
-

Hmm, so is NumericRangeQuery.


[jira] [Created] (LUCENE-6806) FunctionQuery.AllScorer.explain overwrites FunctionWeight.queryNorm in trappy fashion

Terry Smith created LUCENE-6806:
---

 Summary: FunctionQuery.AllScorer.explain overwrites 
FunctionWeight.queryNorm in trappy fashion
 Key: LUCENE-6806
 URL: https://issues.apache.org/jira/browse/LUCENE-6806
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: Trunk
Reporter: Terry Smith
Priority: Minor


FunctionQuery.AllScorer.explain is:

{code:java}
public Explanation explain(int doc, float queryNorm) throws IOException {
  float sc = qWeight * vals.floatVal(doc);

  return Explanation.match(sc, "FunctionQuery(" + func + "), product of:",
  vals.explain(doc),
  Explanation.match(queryNorm, "boost"),
  Explanation.match(weight.queryNorm = 1f, "queryNorm"));
}
{code}

The following line has a subtle assignment that overwrites weight.queryNorm.

{code:java}
  Explanation.match(weight.queryNorm = 1f, "queryNorm"));
{code}

Because weights aren't reused between search and explain this doesn't break 
anything but it's awfully subtle.
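
To make the trap concrete, here is a minimal standalone sketch of the same Java behaviour. ToyWeight and describe are hypothetical names invented for the illustration and have nothing to do with the real FunctionQuery code.

```java
final class AssignmentInArgumentDemo {
  // Hypothetical holder standing in for a weight with a mutable field.
  static final class ToyWeight {
    float queryNorm = 0.5f;
  }

  // Formats a labelled value, much like building one line of an explanation.
  static String describe(float value, String label) {
    return value + " = " + label;
  }

  public static void main(String[] args) {
    ToyWeight weight = new ToyWeight();

    // Looks like a read, but "weight.queryNorm = 1f" is an assignment expression:
    // it stores 1f into the field and then passes 1f as the argument.
    String line = describe(weight.queryNorm = 1f, "queryNorm");

    System.out.println(line);
    System.out.println(weight.queryNorm); // the field no longer holds 0.5f
  }
}
```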

Seeing as queryNorm is ALWAYS 1 here, could we just drop this extra line from 
the explain output and use the following instead?

{code:java}
public Explanation explain(int doc, float queryNorm) throws IOException {
  float sc = qWeight * vals.floatVal(doc);

  return Explanation.match(sc, "FunctionQuery(" + func + "), product of:",
  vals.explain(doc),
  Explanation.match(queryNorm, "boost"));
}
{code}





[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()

2015-09-09 Thread Terry Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736932#comment-14736932
 ] 

Terry Smith commented on LUCENE-6785:
-

The original patch drops a few key settings from the BooleanQuery in 
BQ.createWeight. The following patch puts them back and makes the tests happier.

{noformat}
diff --git a/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java b/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java
index fb5f7c8..8dec338 100644
--- a/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java
+++ b/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java
@@ -210,7 +210,9 @@ public class BooleanQuery extends Query implements Iterable<BooleanClause> {
 }
 
 List subweights = new ArrayList<>();
-Builder builder = new Builder();
+Builder builder = new Builder()
+  .setDisableCoord(disableCoord)
+  .setMinimumNumberShouldMatch(minimumNumberShouldMatch);
 for (BooleanClause clause : query) {
   Weight w = searcher.createWeight(clause.getQuery(), needsScores);
   subweights.add(w);
{noformat}


> Consider merging Query.rewrite() into Query.createWeight()
> --
>
> Key: LUCENE-6785
> URL: https://issues.apache.org/jira/browse/LUCENE-6785
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
> Attachments: LUCENE-6785.patch
>
>
> Prompted by the discussion on LUCENE-6590.
> Query.rewrite() is a bit of an oddity.  You call it to create a query for a 
> specific IndexSearcher, and to ensure that you get a query implementation 
> that has a working createWeight() method.  However, Weight itself already 
> encapsulates the notion of a per-searcher query.
> You also need to repeatedly call rewrite() until the query has stopped 
> rewriting itself, which is a bit trappy - there are a few places (in 
> highlighting code for example) that just call rewrite() once, rather than 
> looping round as IndexSearcher.rewrite() does.  Most queries don't need to be 
> called multiple times, however, so this seems a bit redundant.  And the ones 
> that do currently return un-rewritten queries can be changed simply enough to 
> rewrite them.
> Finally, in pretty much every case I can find in the codebase, rewrite() is 
> called purely as a prelude to createWeight().  This means, in the case of for 
> example large BooleanQueries, we end up cloning the whole query structure, 
> only to throw it away immediately.
> I'd like to try removing rewrite() entirely, and merging the logic into 
> createWeight(), simplifying the API and removing the trap where code only 
> calls rewrite once.  What do people think?
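
The "call rewrite() once" trap described above can be illustrated with toy types. ToyQuery, Leaf and Wrapper are hypothetical and only mimic the contract that a query returns itself once it has nothing left to rewrite; this is a sketch, not the IndexSearcher code.

```java
final class RewriteLoopDemo {
  // Hypothetical stand-in for a query that may rewrite itself into a simpler form.
  interface ToyQuery {
    ToyQuery rewrite(); // returns this once no further rewriting is possible
  }

  static final class Leaf implements ToyQuery {
    @Override public ToyQuery rewrite() { return this; }
  }

  // Wraps another query; each rewrite() call peels off one layer only.
  static final class Wrapper implements ToyQuery {
    final ToyQuery inner;
    Wrapper(ToyQuery inner) { this.inner = inner; }
    @Override public ToyQuery rewrite() { return inner; }
  }

  // Keeps rewriting until the query returns itself, i.e. a fixed point is reached.
  static ToyQuery rewriteFully(ToyQuery query) {
    for (ToyQuery rewritten = query.rewrite(); rewritten != query; rewritten = query.rewrite()) {
      query = rewritten;
    }
    return query;
  }

  public static void main(String[] args) {
    ToyQuery nested = new Wrapper(new Wrapper(new Leaf()));
    System.out.println(nested.rewrite() instanceof Leaf);     // a single call stops one layer short
    System.out.println(rewriteFully(nested) instanceof Leaf); // the loop reaches the leaf
  }
}
```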




[jira] [Updated] (LUCENE-6787) BooleanQuery should be able to drop duplicate non-scoring clauses

2015-09-09 Thread Terry Smith (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Smith updated LUCENE-6787:

Attachment: LUCENE-6787.patch

Absolutely, updated patch attached.


> BooleanQuery should be able to drop duplicate non-scoring clauses
> -
>
> Key: LUCENE-6787
> URL: https://issues.apache.org/jira/browse/LUCENE-6787
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: Trunk
>    Reporter: Terry Smith
>Priority: Minor
> Attachments: LUCENE-6787.patch, LUCENE-6787.patch
>
>
> Pulling out of the discussion on LUCENE-6305.
> BooleanQuery could drop duplicate non-scoring (MUST_NOT, FILTER) clauses.




[jira] [Updated] (LUCENE-6787) BooleanQuery should be able to drop duplicate non-scoring clauses

2015-09-09 Thread Terry Smith (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Smith updated LUCENE-6787:

Attachment: LUCENE-6787-on-6785.patch

Here is an alternate patch applied after LUCENE-6785.


> BooleanQuery should be able to drop duplicate non-scoring clauses
> -
>
> Key: LUCENE-6787
> URL: https://issues.apache.org/jira/browse/LUCENE-6787
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: Trunk
>    Reporter: Terry Smith
>Priority: Minor
> Attachments: LUCENE-6787-on-6785.patch, LUCENE-6787.patch, 
> LUCENE-6787.patch
>
>
> Pulling out of the discussion on LUCENE-6305.
> BooleanQuery could drop duplicate non-scoring (MUST_NOT, FILTER) clauses.




[jira] [Created] (LUCENE-6787) BooleanQuery should be able to drop duplicate non-scoring clauses

Terry Smith created LUCENE-6787:
---

 Summary: BooleanQuery should be able to drop duplicate non-scoring 
clauses
 Key: LUCENE-6787
 URL: https://issues.apache.org/jira/browse/LUCENE-6787
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: Trunk
Reporter: Terry Smith
Priority: Minor


Pulling out of the discussion on LUCENE-6305.

BooleanQuery could drop duplicate non-scoring (MUST_NOT, FILTER) clauses.
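
A rough sketch of the idea, using toy types rather than Lucene's BooleanClause: Occur, Clause and dropDuplicateNonScoring are hypothetical names, and a plain string stands in for the sub-query. Duplicate MUST and SHOULD clauses are kept because each copy contributes to the score.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

final class DedupClausesDemo {
  enum Occur { MUST, SHOULD, FILTER, MUST_NOT }

  // Toy clause: an occurrence type plus a string standing in for the sub-query.
  static final class Clause {
    final Occur occur;
    final String query;

    Clause(Occur occur, String query) {
      this.occur = occur;
      this.query = query;
    }

    @Override public boolean equals(Object o) {
      if (!(o instanceof Clause)) return false;
      Clause other = (Clause) o;
      return occur == other.occur && query.equals(other.query);
    }

    @Override public int hashCode() {
      return Objects.hash(occur, query);
    }

    @Override public String toString() {
      return occur + ":" + query;
    }
  }

  // Keeps every scoring clause, but only the first copy of each FILTER or MUST_NOT clause.
  static List<Clause> dropDuplicateNonScoring(List<Clause> clauses) {
    Set<Clause> seen = new HashSet<>();
    List<Clause> result = new ArrayList<>();
    for (Clause c : clauses) {
      boolean scoring = c.occur == Occur.MUST || c.occur == Occur.SHOULD;
      if (scoring || seen.add(c)) {
        result.add(c);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Clause> clauses = Arrays.asList(
        new Clause(Occur.FILTER, "a"), new Clause(Occur.FILTER, "a"),
        new Clause(Occur.MUST_NOT, "b"), new Clause(Occur.MUST_NOT, "b"),
        new Clause(Occur.SHOULD, "c"), new Clause(Occur.SHOULD, "c"));
    System.out.println(dropDuplicateNonScoring(clauses));
  }
}
```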





[jira] [Updated] (LUCENE-6679) Filter's Weight.explain returns an explanation with isMatch==true even on documents that don't match


 [ 
https://issues.apache.org/jira/browse/LUCENE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Smith updated LUCENE-6679:

Attachment: LUCENE-6679.patch

Here is a patch (against trunk) that adds test coverage for explanations on 
hits only.

I'm looking for feedback on the approach used before expanding to cover 
explanations for misses.

Currently I get a couple of failures when running just the Lucene tests:

{noformat}
Tests with failures:
  - org.apache.lucene.search.TestSortRandom.testRandomStringValSort
  - org.apache.lucene.search.TestSortRandom.testRandomStringSort


JVM J0: 1.42 ..   284.75 =   283.33s
JVM J1: 1.64 ..   284.77 =   283.13s
JVM J2: 1.42 ..   284.70 =   283.28s
JVM J3: 1.42 ..   284.68 =   283.26s
Execution time total: 4 minutes 44 seconds
Tests summary: 404 suites, 3235 tests, 2 failures, 104 ignored (100 assumptions)
{noformat}

Happy to dig into these more once an approach has been found that people like.


> Filter's Weight.explain returns an explanation with isMatch==true even on 
> documents that don't match
> 
>
> Key: LUCENE-6679
> URL: https://issues.apache.org/jira/browse/LUCENE-6679
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
> Attachments: LUCENE-6679.patch
>
>
> This was reported by Trejkaz on the java-user list: 
> http://search-lucene.com/m/l6pAi19h4Y3DclgB1=Re+What+on+earth+is+FilteredQuery+explain+doing+




[jira] [Updated] (LUCENE-6787) BooleanQuery should be able to drop duplicate non-scoring clauses


 [ 
https://issues.apache.org/jira/browse/LUCENE-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Smith updated LUCENE-6787:

Attachment: LUCENE-6787.patch

Here is a patch based on [~jpountz]'s suggestion of putting this optimization 
in BooleanQuery.rewrite().


> BooleanQuery should be able to drop duplicate non-scoring clauses
> -
>
> Key: LUCENE-6787
> URL: https://issues.apache.org/jira/browse/LUCENE-6787
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: Trunk
>    Reporter: Terry Smith
>Priority: Minor
> Attachments: LUCENE-6787.patch
>
>
> Pulling out of the discussion on LUCENE-6305.
> BooleanQuery could drop duplicate non-scoring (MUST_NOT, FILTER) clauses.




[jira] [Commented] (LUCENE-6758) Adding a SHOULD clause to a BQ over an empty field clears the score when using DefaultSimilarity


[ 
https://issues.apache.org/jira/browse/LUCENE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734796#comment-14734796
 ] 

Terry Smith commented on LUCENE-6758:
-

Ah, you've changed DefaultSimilarity.idf() to use (docCount + 1) instead of 
just docCount forcing it to be larger than 0.

That looks like a great fix, thanks.
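
As a rough illustration of why the +1 matters, the sketch below compares the two idf shapes. It assumes the classic formula log(docCount / (docFreq + 1)) + 1 before the change and log((docCount + 1) / (docFreq + 1)) + 1 after it; the class and method names are made up for the example.

```java
final class IdfSketch {
  // Assumed pre-fix shape: log(docCount / (docFreq + 1)) + 1.
  static float idfBefore(long docFreq, long docCount) {
    return (float) (Math.log(docCount / (double) (docFreq + 1)) + 1.0);
  }

  // Assumed post-fix shape: (docCount + 1) keeps the argument of the log above zero.
  static float idfAfter(long docFreq, long docCount) {
    return (float) (Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0);
  }

  public static void main(String[] args) {
    // Populated field: one document, and the term occurs in it.
    System.out.println(idfBefore(1, 1));
    System.out.println(idfAfter(1, 1));

    // Unused field: docCount is 0, so the old shape takes log(0).
    System.out.println(idfBefore(0, 0));
    System.out.println(idfAfter(0, 0));
  }
}
```

For the unused field the old shape evaluates log(0), which is negative infinity, while the new shape stays finite.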


> Adding a SHOULD clause to a BQ over an empty field clears the score when 
> using DefaultSimilarity
> 
>
> Key: LUCENE-6758
> URL: https://issues.apache.org/jira/browse/LUCENE-6758
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: Trunk
>    Reporter: Terry Smith
> Attachments: LUCENE-6758.patch, LUCENE-6758.patch
>
>
> Patch with unit test to show the bug will be attached.
> I've narrowed this change in behavior with git bisect to the following commit:
> {noformat}
> commit 698b4b56f0f2463b21c9e3bc67b8b47d635b7d1f
> Author: Robert Muir <rm...@apache.org>
> Date:   Thu Aug 13 17:37:15 2015 +
> LUCENE-6711: Use CollectionStatistics.docCount() for IDF and average 
> field length computations
> 
> git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1695744 
> 13f79535-47bb-0310-9956-ffa450edef68
> {noformat}




[jira] [Updated] (LUCENE-6758) Adding a SHOULD clause to a BQ over an empty field clears the score when using DefaultSimilarity

2015-08-21 Thread Terry Smith (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Smith updated LUCENE-6758:

Attachment: LUCENE-6758.patch

Run this unit test a few times and you'll hit a failure when DefaultSimilarity 
is picked.

The method testBQHitOrEmpty() will fail because the score is zero. Its friend 
testBQHitOrMiss() has a non-zero score.

The difference between the two is that the field empty is unused, whereas the 
field test has one token (hit).


 Adding a SHOULD clause to a BQ over an empty field clears the score when 
 using DefaultSimilarity
 

 Key: LUCENE-6758
 URL: https://issues.apache.org/jira/browse/LUCENE-6758
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: Trunk
Reporter: Terry Smith
 Attachments: LUCENE-6758.patch


 Patch with unit test to show the bug will be attached.
 I've narrowed this change in behavior with git bisect to the following commit:
 {noformat}
 commit 698b4b56f0f2463b21c9e3bc67b8b47d635b7d1f
 Author: Robert Muir rm...@apache.org
 Date:   Thu Aug 13 17:37:15 2015 +
 LUCENE-6711: Use CollectionStatistics.docCount() for IDF and average 
 field length computations
 
 git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1695744 
 13f79535-47bb-0310-9956-ffa450edef68
 {noformat}




[jira] [Commented] (LUCENE-6758) Adding a SHOULD clause to a BQ over an empty field clears the score when using DefaultSimilarity

2015-08-21 Thread Terry Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706963#comment-14706963
 ] 

Terry Smith commented on LUCENE-6758:
-

Explain output for the failing query (testBQHitOrEmpty):

{noformat}
0.0 = product of:
  0.0 = sum of:
    0.0 = weight(test:hit in 0) [DefaultSimilarity], result of:
      0.0 = score(doc=0,freq=1.0), product of:
        0.0 = queryWeight, product of:
          0.30685282 = idf(docFreq=1, docCount=1)
          0.0 = queryNorm
        0.30685282 = fieldWeight in 0, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          0.30685282 = idf(docFreq=1, docCount=1)
          1.0 = fieldNorm(doc=0)
  0.5 = coord(1/2)
{noformat}

Explain output for the variant against a populated field  (testBQHitOrMiss):

{noformat}
0.04500804 = product of:
  0.09001608 = sum of:
    0.09001608 = weight(test:hit in 0) [DefaultSimilarity], result of:
      0.09001608 = score(doc=0,freq=1.0), product of:
        0.29335263 = queryWeight, product of:
          0.30685282 = idf(docFreq=1, docCount=1)
          0.9560043 = queryNorm
        0.30685282 = fieldWeight in 0, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          0.30685282 = idf(docFreq=1, docCount=1)
          1.0 = fieldNorm(doc=0)
  0.5 = coord(1/2)
{noformat}
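
One plausible reading of the two queryNorm values is sketched below. It assumes queryNorm is 1 / sqrt(sum of squared clause weights) with all boosts equal to 1, and that the second SHOULD clause has docFreq=0 with docCount=1 in the populated case and docCount=0 in the empty case. Those clause statistics are an assumption and do not appear in the explain output.

```java
final class QueryNormSketch {
  // Assumed idf shape at the time: log(docCount / (docFreq + 1)) + 1.
  static float idf(long docFreq, long docCount) {
    return (float) (Math.log(docCount / (double) (docFreq + 1)) + 1.0);
  }

  // queryNorm = 1 / sqrt(sum of squared clause weights); boosts are taken as 1 here.
  static float queryNorm(float... clauseIdfs) {
    float sumOfSquares = 0f;
    for (float idf : clauseIdfs) {
      sumOfSquares += idf * idf;
    }
    return (float) (1.0 / Math.sqrt(sumOfSquares));
  }

  public static void main(String[] args) {
    float hit = idf(1, 1);
    // Assumed second clause: a term absent from a field that one document populates.
    System.out.println(queryNorm(hit, idf(0, 1)));
    // Assumed second clause: a term on a field that no document populates.
    System.out.println(queryNorm(hit, idf(0, 0)));
  }
}
```

Under these assumptions the populated case comes out close to the 0.9560043 shown above, and the empty case collapses to zero because an infinite idf makes the sum of squares infinite.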




[jira] [Created] (LUCENE-6758) Adding a SHOULD clause to a BQ over an empty field clears the score when using DefaultSimilarity

2015-08-21 Thread Terry Smith (JIRA)

Terry Smith created LUCENE-6758:
---

 Summary: Adding a SHOULD clause to a BQ over an empty field clears 
the score when using DefaultSimilarity
 Key: LUCENE-6758
 URL: https://issues.apache.org/jira/browse/LUCENE-6758
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: Trunk
Reporter: Terry Smith


Patch with unit test to show the bug will be attached.

I've narrowed this change in behavior with git bisect to the following commit:

{noformat}
commit 698b4b56f0f2463b21c9e3bc67b8b47d635b7d1f
Author: Robert Muir rm...@apache.org
Date:   Thu Aug 13 17:37:15 2015 +

LUCENE-6711: Use CollectionStatistics.docCount() for IDF and average field 
length computations

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1695744 
13f79535-47bb-0310-9956-ffa450edef68
{noformat}





[jira] [Commented] (LUCENE-6748) The query cache should not cache trivial queries

2015-08-20 Thread Terry Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705045#comment-14705045
 ] 

Terry Smith commented on LUCENE-6748:
-

I'd add a case to the patch to include empty DisjunctionMaxQuery instances also.


 The query cache should not cache trivial queries
 

 Key: LUCENE-6748
 URL: https://issues.apache.org/jira/browse/LUCENE-6748
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-6748.patch


 The query cache already avoids caching term queries because they are cheap, 
 but it doesn't do it with even cheaper queries like MatchAllDocsQuery.




[jira] [Commented] (LUCENE-6531) Make PhraseQuery immutable

2015-07-30 Thread Terry Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647690#comment-14647690
 ] 

Terry Smith commented on LUCENE-6531:
-

[~jpountz] The PhraseQuery.Builder setter methods are all void, whereas the 
ones for BooleanQuery and BlendedTermQuery return the Builder itself.

Can the set/add methods on PhraseQuery.Builder return this to make the various 
Query builders consistent with each other?
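
For illustration, this is the chaining style being asked for, shown on a toy builder. ToyPhraseBuilder is hypothetical and is not PhraseQuery.Builder; the point is only that returning this from each setter lets calls be chained.

```java
import java.util.ArrayList;
import java.util.List;

final class FluentBuilderDemo {
  // Hypothetical toy builder; every set/add method returns this so calls can be chained.
  static final class ToyPhraseBuilder {
    private final List<String> terms = new ArrayList<>();
    private int slop;

    ToyPhraseBuilder setSlop(int slop) {
      this.slop = slop;
      return this;
    }

    ToyPhraseBuilder add(String term) {
      terms.add(term);
      return this;
    }

    // Renders the phrase in a query-string-like form, e.g. "quick fox"~2.
    String build() {
      return "\"" + String.join(" ", terms) + "\"~" + slop;
    }
  }

  public static void main(String[] args) {
    String phrase = new ToyPhraseBuilder().setSlop(2).add("quick").add("fox").build();
    System.out.println(phrase);
  }
}
```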


 Make PhraseQuery immutable
 --

 Key: LUCENE-6531
 URL: https://issues.apache.org/jira/browse/LUCENE-6531
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: 5.3, 6.0

 Attachments: LUCENE-6531.patch, LUCENE-6531.patch


 Mutable queries are an issue for automatic filter caching since modifying a 
 query after it has been put into the cache will corrupt the cache. We should 
 make all queries immutable (up to the boost) to avoid this issue.




[jira] [Commented] (LUCENE-6531) Make PhraseQuery immutable

2015-07-30 Thread Terry Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647701#comment-14647701
 ] 

Terry Smith commented on LUCENE-6531:
-

Awesome, you rock!



[jira] [Commented] (LUCENE-6590) Explore different ways to apply boosts

2015-07-23 Thread Terry Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639166#comment-14639166
 ] 

Terry Smith commented on LUCENE-6590:
-

I think this looks great and will certainly make the boost handling more robust 
in my custom queries. Especially looking forward to fully immutable queries.

What do you think is possible in terms of updating 5.x to make the transition 
easier?


 Explore different ways to apply boosts
 --

 Key: LUCENE-6590
 URL: https://issues.apache.org/jira/browse/LUCENE-6590
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Adrien Grand
Priority: Minor
 Attachments: LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch


 Follow-up from LUCENE-6570: the fact that all queries are mutable in order to 
 allow for applying a boost raises issues since it makes queries bad cache 
 keys since their hashcode can change anytime. We could just document that 
 queries should never be modified after they have gone through IndexSearcher 
 but it would be even better if the API made queries impossible to mutate at 
 all.
 I think there are two main options:
  - either replace void setBoost(boost) with something like Query 
 withBoost(boost) which would return a clone that has a different boost
  - or move boost handling outside of Query, for instance we could have a 
 (immutable) query impl that would be dedicated to applying boosts, that 
 queries that need to change boosts at rewrite time (such as BooleanQuery) 
 would use as a wrapper.
 The latter idea is from Robert and I like it a lot given how often I either 
 introduced or found a bug which was due to the boost parameter being ignored. 
 Maybe there are other options, but I think this is worth exploring.




[jira] [Commented] (LUCENE-6679) Filter's Weight.explain returns an explanation with isMatch==true even on documents that don't match

2015-07-16 Thread Terry Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629956#comment-14629956
 ] 

Terry Smith commented on LUCENE-6679:
-

Trejkaz confirmed the patch referenced from the mailing list works.

This bug is fixed as a side effect of LUCENE-6601 so will automatically be 
fixed as part of release 5.3.

I'll work on cleaning up the new test contributed by Trejkaz for inclusion and 
then move onto a more generic hook to catch other explanation mistakes.



 Filter's Weight.explain returns an explanation with isMatch==true even on 
 documents that don't match
 

 Key: LUCENE-6679
 URL: https://issues.apache.org/jira/browse/LUCENE-6679
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand

 This was reported by Trejkaz on the java-user list: 
 http://search-lucene.com/m/l6pAi19h4Y3DclgB1subj=Re+What+on+earth+is+FilteredQuery+explain+doing+




[jira] [Commented] (LUCENE-6679) Filter's Weight.explain returns an explanation with isMatch==true even on documents that don't match

2015-07-15 Thread Terry Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628153#comment-14628153
 ] 

Terry Smith commented on LUCENE-6679:
-

[~jpountz] Absolutely, I'd love to give it a stab. Currently waiting for 
feedback from TX on the users list.

I think you are spot on about adding some additional testing to the test suite 
to catch explanation mismatches. I'll take a peek at that also and see if I can 
figure out something worth submitting.


 Filter's Weight.explain returns an explanation with isMatch==true even on 
 documents that don't match
 ---

 Key: LUCENE-6679
 URL: https://issues.apache.org/jira/browse/LUCENE-6679
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand

 This was reported by Trejkaz on the java-user list: 
 http://search-lucene.com/m/l6pAi19h4Y3DclgB1subj=Re+What+on+earth+is+FilteredQuery+explain+doing+




[jira] [Commented] (LUCENE-6661) Allow queries to opt out of caching

2015-07-10 Thread Terry Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622271#comment-14622271
 ] 

Terry Smith commented on LUCENE-6661:
-

I agree that we shouldn't base APIs on solutions that are already hacky.

I'm going to play with your suggestion a little more and see how it pans out 
for my use cases; I'll report back.

The question of how non-cacheable queries affect the ring buffer's frequency 
tracking is interesting. If, in some obscure but easy to understand scenario, 
half of my queries are good cache candidates but the other half are never to be 
cached (using the Weight.getQuery() equals-busting method), then the ring 
buffer will be a lot less effective at finding new cache candidates, purely 
because of the churn of never-to-be-cached queries. Still, I can see why that 
might also be a good thing; it all depends on your definition of frequently used.
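
The churn effect can be shown with a toy history buffer. RecentQueries below is a hypothetical, simplified stand-in for a frequency-tracking ring buffer and is not Lucene's implementation; string keys stand in for queries.

```java
final class RingBufferChurnDemo {
  // Toy fixed-size history of recently seen query keys; the oldest entry is overwritten first.
  static final class RecentQueries {
    private final String[] buffer;
    private int next;

    RecentQueries(int capacity) {
      buffer = new String[capacity];
    }

    void record(String key) {
      buffer[next] = key;
      next = (next + 1) % buffer.length;
    }

    // Counts how many slots currently hold the given key.
    int frequency(String key) {
      int count = 0;
      for (String k : buffer) {
        if (key.equals(k)) count++;
      }
      return count;
    }
  }

  // Replays a stream of keys into a fresh buffer and reports the final frequency of probe.
  static int frequencyAfter(int capacity, String probe, String... stream) {
    RecentQueries recent = new RecentQueries(capacity);
    for (String key : stream) {
      recent.record(key);
    }
    return recent.frequency(probe);
  }

  public static void main(String[] args) {
    // "A" is seen twice with little other traffic in between.
    System.out.println(frequencyAfter(4, "A", "A", "B", "A"));
    // Same two sightings of "A", but never-to-be-cached queries churn through the buffer.
    System.out.println(frequencyAfter(4, "A", "A", "X1", "X2", "X3", "X4", "A"));
  }
}
```

In the second stream the first sighting of "A" has been overwritten by the time the second one arrives, so "A" looks like a query seen only once.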

Where would be the best place to expand this discussion to include score based 
caching? A new Jira, one of the mailing lists?




 Allow queries to opt out of caching
 ---

 Key: LUCENE-6661
 URL: https://issues.apache.org/jira/browse/LUCENE-6661
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 5.2
Reporter: Terry Smith
Priority: Minor
 Attachments: LUCENE-6661.patch


 Some queries have out-of-band dependencies that make them incompatible with 
 caching, it'd be great if they could opt out of the new fancy query/filter 
 cache in IndexSearcher.
 This affects DrillSidewaysQuery and any user-provided custom Query 
 implementations.




[jira] [Commented] (LUCENE-6661) Allow queries to opt out of caching

2015-07-07 Thread Terry Smith (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616723#comment-14616723
]

Terry Smith commented on LUCENE-6661:
-

I'd completely missed the issue with marker interfaces; this really ought to be
a method on Weight itself, perhaps Weight.cacheCompatible().

Your suggested workaround sounds a little special casey. I'd be concerned that in
a future release something would change in such a way that the workaround would
be lost with no alternative. Specifically, it relies on the cache
implementation tracking usage when the cache itself is pluggable (it could be
replaced with one that does not), and when LRUQueryCache itself is in play I see
the following issues:

1) The queries that we know ahead of time should never be cached would still
take up room in the ring buffer and thus push aside other less frequent queries
that could be great cache candidates.

2) Special care would want to be taken over the Query instances used in the
ring buffer and cache so that things like dependent FacetCollectors don't get
added and bloat memory usage. You described earlier how to handle this from
createWeight().

3) CachingWrapperWeight forces the cached query to use scorer() instead of
bulkScorer(). Both my custom query and DrillSidewaysQuery implement a custom
bulkScorer() method and throw an UnsupportedOperationException from scorer().
They break when wrapped in a CachingWrapperWeight. The ability to opt out of
caching would remove the need for the hacky workaround in DrillSideways.

My current solution is a custom QueryCache implementation that just delegates
to the LRUQueryCache and does not propagate doCache() for some Weights.
However, this has the same problem with wrapped queries as the marker interface
scenario.
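
The delegating-cache idea described above can be sketched roughly as follows. This is plain Java with hypothetical stand-in types (SketchWeight, SketchCache, RecordingCache, OptOutCache); it does not use Lucene's QueryCache or Weight classes, and cacheCompatible() is only the name floated in this comment, not an existing method.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a weight; not Lucene's Weight. */
interface SketchWeight {
    String key();

    /** Hypothetical opt-out hook, named after the Weight.cacheCompatible() idea. */
    boolean cacheCompatible();
}

/** Hypothetical stand-in for a pluggable query cache. */
interface SketchCache {
    SketchWeight doCache(SketchWeight weight);
}

final class SimpleWeight implements SketchWeight {
    private final String key;
    private final boolean cacheCompatible;

    SimpleWeight(String key, boolean cacheCompatible) {
        this.key = key;
        this.cacheCompatible = cacheCompatible;
    }

    @Override
    public String key() {
        return key;
    }

    @Override
    public boolean cacheCompatible() {
        return cacheCompatible;
    }
}

/** Plays the role of the real cache; it only records which weights reached it. */
final class RecordingCache implements SketchCache {
    final List<String> seen = new ArrayList<>();

    @Override
    public SketchWeight doCache(SketchWeight weight) {
        seen.add(weight.key());
        return weight;
    }
}

/** Delegates to another cache, but returns opted-out weights untouched. */
final class OptOutCache implements SketchCache {
    private final SketchCache delegate;

    OptOutCache(SketchCache delegate) {
        this.delegate = delegate;
    }

    @Override
    public SketchWeight doCache(SketchWeight weight) {
        return weight.cacheCompatible() ? delegate.doCache(weight) : weight;
    }

    public static void main(String[] args) {
        RecordingCache inner = new RecordingCache();
        SketchCache cache = new OptOutCache(inner);
        cache.doCache(new SimpleWeight("cacheable", true));
        cache.doCache(new SimpleWeight("out-of-band", false));
        System.out.println(inner.seen); // prints [cacheable]
    }
}
```

As noted above, a check like this only sees the outermost weight, so a wrapper would need to forward the flag of whatever it wraps.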

Allow queries to opt out of caching
---

Key: LUCENE-6661
URL: https://issues.apache.org/jira/browse/LUCENE-6661
Project: Lucene - Core
Issue Type: Improvement
Affects Versions: 5.2
Reporter: Terry Smith
Priority: Minor
Attachments: LUCENE-6661.patch

Some queries have out-of-band dependencies that make them incompatible with
caching, it'd be great if they could opt out of the new fancy query/filter
cache in IndexSearcher.
This affects DrillSidewaysQuery and any user-provided custom Query
implementations.


[jira] [Created] (LUCENE-6661) Allow queries to opt out of caching

2015-07-06 Thread Terry Smith (JIRA)

Terry Smith created LUCENE-6661:
---

 Summary: Allow queries to opt out of caching
 Key: LUCENE-6661
 URL: https://issues.apache.org/jira/browse/LUCENE-6661
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 5.2
Reporter: Terry Smith
Priority: Minor


Some queries have out-of-band dependencies that make them incompatible with 
caching, it'd be great if they could opt out of the new fancy query/filter 
cache in IndexSearcher.

This affects DrillSidewaysQuery and any user-provided custom Query 
implementations.





[jira] [Updated] (LUCENE-6661) Allow queries to opt out of caching

2015-07-06 Thread Terry Smith (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Smith updated LUCENE-6661:

Attachment: LUCENE-6661.patch

Rather than add a new method to Query/Weight, I've added a small marker 
interface and an instanceof check to prototype this feature.

If this is of interest we should decide whether Query, Weight, or both could 
implement this interface to disable caching.
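
For readers unfamiliar with the pattern, a marker interface plus an instanceof check looks roughly like the sketch below. All names here (NotCacheable, PlainQuery, OutOfBandQuery, WrappingQuery, MarkerCheck) are hypothetical; the interface name used in the attached patch is not shown in this thread.

```java
/** Hypothetical marker interface: it declares no methods, its presence is the signal. */
interface NotCacheable {}

/** Hypothetical base type standing in for a query. */
class PlainQuery {}

/** A query with out-of-band dependencies that opts out via the marker. */
class OutOfBandQuery extends PlainQuery implements NotCacheable {}

/** A wrapper that delegates to another query but does not carry the marker itself. */
class WrappingQuery extends PlainQuery {
    final PlainQuery inner;

    WrappingQuery(PlainQuery inner) {
        this.inner = inner;
    }
}

final class MarkerCheck {
    /** The instanceof check a cache could perform before caching. */
    static boolean shouldCache(PlainQuery query) {
        return !(query instanceof NotCacheable);
    }

    public static void main(String[] args) {
        System.out.println(shouldCache(new PlainQuery()));                        // prints true
        System.out.println(shouldCache(new OutOfBandQuery()));                    // prints false
        System.out.println(shouldCache(new WrappingQuery(new OutOfBandQuery()))); // prints true
    }
}
```

The last call shows a limitation of this approach: the check inspects only the outermost object, so a wrapped opt-out query is not detected.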


 Allow queries to opt out of caching
 ---

 Key: LUCENE-6661
 URL: https://issues.apache.org/jira/browse/LUCENE-6661
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 5.2
Reporter: Terry Smith
Priority: Minor
 Attachments: LUCENE-6661.patch


 Some queries have out-of-band dependencies that make them incompatible with 
 caching, it'd be great if they could opt out of the new fancy query/filter 
 cache in IndexSearcher.
 This affects DrillSidewaysQuery and any user-provided custom Query 
 implementations.




[jira] [Commented] (LUCENE-6639) LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first scorer is skipped

2015-07-06 Thread Terry Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615087#comment-14615087
 ] 

Terry Smith commented on LUCENE-6639:
-

Ah, I didn't realize the highlighters were creating the weights to extract the 
terms, that makes sense.

I like the idea of just calling onUse() the first time scorer() is called, that 
ought to be more robust and is very easy to understand.
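
A minimal sketch of "call onUse() the first time scorer() is called", in plain Java. OnUseOnceWeight is a hypothetical model and not the actual CachingWrapperWeight; the use of AtomicBoolean is an assumption made here to cope with segments being scored in any order or from several threads.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of a caching weight wrapper; not Lucene's CachingWrapperWeight. */
final class OnUseOnceWeight {
    private final AtomicBoolean used = new AtomicBoolean(false);
    private final Runnable onUse;

    OnUseOnceWeight(Runnable onUse) {
        this.onUse = onUse;
    }

    /** Called once per segment, in any order and possibly from several threads. */
    void scorer(int segmentOrd) {
        // compareAndSet lets exactly one caller win, whichever segment arrives first,
        // so nothing depends on segment 0 being visited at all.
        if (used.compareAndSet(false, true)) {
            onUse.run();
        }
        // ... the real per-segment scorer lookup would happen here ...
    }

    public static void main(String[] args) {
        int[] calls = {0};
        OnUseOnceWeight weight = new OnUseOnceWeight(() -> calls[0]++);
        weight.scorer(2); // segment 0 is never first here
        weight.scorer(1);
        weight.scorer(0);
        System.out.println(calls[0]); // prints 1
    }
}
```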


 LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first 
 scorer is skipped
 

 Key: LUCENE-6639
 URL: https://issues.apache.org/jira/browse/LUCENE-6639
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 5.3
Reporter: Terry Smith
Priority: Minor
 Attachments: LUCENE-6639.patch


 The method 
 {{org.apache.lucene.search.LRUQueryCache.CachingWrapperWeight.scorer(LeafReaderContext)}}
  starts with
 {code}
 if (context.ord == 0) {
   policy.onUse(getQuery());
 }
 {code}
 which can result in a missed call for queries that return a null scorer for 
 the first segment.




[jira] [Commented] (LUCENE-6639) LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first scorer is skipped

2015-06-30 Thread Terry Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608281#comment-14608281
 ] 

Terry Smith commented on LUCENE-6639:
-

This doesn't seem pressing but irked me enough to submit a ticket. It feels 
like we should be able to be more correct, but the current API isn't very 
supportive of that workflow.

I slightly prefer calling onUse() from createWeight() as it does make this edge 
case of the first segment go away which I feel is harder to reason about than 
someone creating a weight and not using it. The improved multi-threaded search 
code in IndexSearcher is a great example of this misbehaving where there is no 
guarantee that the first segment's Weight.scorer() will be called before the 
other segments. However I'm not familiar with use cases that use 
Query.createWeight() without executing some kind of search or explain to know 
if they are more of an issue.

Is adding bookend methods to more correctly detect the begin/end of the search 
phase seen as too messy and special casey?

At the end of the day I also wonder if it's worth the complexity but wanted to 
open this ticket to bootstrap the discussion as this could be a hard problem to 
diagnose in the future (someone wants to know why their query isn't getting 
cached and it's due to some obscure detail like this).





 LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first 
 scorer is skipped
 

 Key: LUCENE-6639
 URL: https://issues.apache.org/jira/browse/LUCENE-6639
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 5.3
Reporter: Terry Smith
Priority: Minor
 Attachments: LUCENE-6639.patch


 The method 
 {{org.apache.lucene.search.LRUQueryCache.CachingWrapperWeight.scorer(LeafReaderContext)}}
  starts with
 {code}
 if (context.ord == 0) {
   policy.onUse(getQuery());
 }
 {code}
 which can result in a missed call for queries that return a null scorer for 
 the first segment.




[jira] [Updated] (LUCENE-6639) LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first scorer is skipped

2015-06-29 Thread Terry Smith (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Smith updated LUCENE-6639:

Attachment: LUCENE-6639.patch

The attached unit test will fail if the extra IndexWriter.commit() gets triggered 
or the BooleanQuery clauses are shuffled to make the first clause's scorer null 
for the first segment.


 LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first 
 scorer is skipped
 

 Key: LUCENE-6639
 URL: https://issues.apache.org/jira/browse/LUCENE-6639
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 5.3
Reporter: Terry Smith
Priority: Minor
 Attachments: LUCENE-6639.patch


 The method 
 {{org.apache.lucene.search.LRUQueryCache.CachingWrapperWeight.scorer(LeafReaderContext)}}
  starts with
 {code}
 if (context.ord == 0) {
   policy.onUse(getQuery());
 }
 {code}
 which can result in a missed call for queries that return a null scorer for 
 the first segment.




[jira] [Created] (LUCENE-6639) LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first scorer is skipped

2015-06-29 Thread Terry Smith (JIRA)

Terry Smith created LUCENE-6639:
---

 Summary: LRUQueryCache.CachingWrapperWeight not calling 
policy.onUse() if the first scorer is skipped
 Key: LUCENE-6639
 URL: https://issues.apache.org/jira/browse/LUCENE-6639
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 5.3
Reporter: Terry Smith
Priority: Minor


The method 
{{org.apache.lucene.search.LRUQueryCache.CachingWrapperWeight.scorer(LeafReaderContext)}}
 starts with

{code}
if (context.ord == 0) {
  policy.onUse(getQuery());
}
{code}

which can result in a missed call for queries that return a null scorer for the 
first segment.





[jira] [Commented] (LUCENE-6305) BooleanQuery.equals should ignore clause order

2015-06-19 Thread Terry Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593457#comment-14593457
 ] 

Terry Smith commented on LUCENE-6305:
-

Oops, read the patch too quickly and missed that key detail! Sorry for the 
noise.



 BooleanQuery.equals should ignore clause order
 --

 Key: LUCENE-6305
 URL: https://issues.apache.org/jira/browse/LUCENE-6305
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-6305.patch


 BooleanQuery.equals is sensitive to the order in which clauses have been 
 added. So for instance +A +B would be considered different from +B +A 
 although it generates the same matches and scores.




[jira] [Commented] (LUCENE-6305) BooleanQuery.equals should ignore clause order

2015-06-19 Thread Terry Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593447#comment-14593447
 ] 

Terry Smith commented on LUCENE-6305:
-

Having BooleanQuery.equals() ignore order is a great idea but I think it'd be 
better if we could preserve the original clause order for Query.toString(), the 
Explanation, debugging and test expectations. 

Additionally, I've been burnt by JVM changes to String.hashCode() that cause 
HashMap<String,?> to order entries differently when run in a newer JVM. Are the 
Query hash codes immune to this problem?
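
One way to get both properties, order-insensitive equality and a toString() that keeps the original clause order, is sketched below. This is plain Java with a hypothetical OrderInsensitiveClauses class that models clauses as strings; it is not the approach taken in the attached patch, which is not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a boolean query; clauses are plain strings such as "+A". */
final class OrderInsensitiveClauses {
    private final List<String> clauses; // insertion order, kept for toString()

    OrderInsensitiveClauses(List<String> clauses) {
        this.clauses = Collections.unmodifiableList(new ArrayList<>(clauses));
    }

    /** Counts each clause so that duplicates still matter but order does not. */
    private Map<String, Integer> asMultiset() {
        Map<String, Integer> counts = new HashMap<>();
        for (String clause : clauses) {
            counts.merge(clause, 1, Integer::sum);
        }
        return counts;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof OrderInsensitiveClauses
            && asMultiset().equals(((OrderInsensitiveClauses) other).asMultiset());
    }

    @Override
    public int hashCode() {
        // Addition is commutative, so the result depends neither on clause order
        // nor on the iteration order of any hash-based collection.
        int hash = 0;
        for (String clause : clauses) {
            hash += clause.hashCode();
        }
        return hash;
    }

    @Override
    public String toString() {
        return String.join(" ", clauses); // original order, for debugging and tests
    }

    public static void main(String[] args) {
        List<String> ab = new ArrayList<>();
        ab.add("+A");
        ab.add("+B");
        List<String> ba = new ArrayList<>();
        ba.add("+B");
        ba.add("+A");
        OrderInsensitiveClauses first = new OrderInsensitiveClauses(ab);
        OrderInsensitiveClauses second = new OrderInsensitiveClauses(ba);
        System.out.println(first.equals(second)); // prints true
        System.out.println(first);                // prints +A +B
        System.out.println(second);               // prints +B +A
    }
}
```

The hash here never iterates a HashMap, which is the design choice that sidesteps the entry-ordering concern raised above; whether the Query classes do the same is the open question.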


 BooleanQuery.equals should ignore clause order
 --

 Key: LUCENE-6305
 URL: https://issues.apache.org/jira/browse/LUCENE-6305
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-6305.patch


 BooleanQuery.equals is sensitive to the order in which clauses have been 
 added. So for instance +A +B would be considered different from +B +A 
 although it generates the same matches and scores.




[jira] [Commented] (LUCENE-6305) BooleanQuery.equals should ignore clause order

2015-06-19 Thread Terry Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593467#comment-14593467
 ] 

Terry Smith commented on LUCENE-6305:
-

Slightly off topic to your original goal, but what do you think about deduping 
repeated non-scoring (FILTER, MUST_NOT) clauses from the list in the query, or 
do you see that as a possible optimization when building the weights/scorers?
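
A rough sketch of the deduping idea, in plain Java. ClauseDeduper, its Occur enum and its Clause class are hypothetical models written for this illustration and are not Lucene's BooleanClause types. Repeated scoring clauses are kept on the assumption that each occurrence contributes to the score.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Hypothetical model of boolean clauses; not Lucene's BooleanClause. */
final class ClauseDeduper {
    enum Occur { MUST, SHOULD, FILTER, MUST_NOT }

    static final class Clause {
        final Occur occur;
        final String query;

        Clause(Occur occur, String query) {
            this.occur = occur;
            this.query = query;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Clause
                && occur == ((Clause) other).occur
                && query.equals(((Clause) other).query);
        }

        @Override
        public int hashCode() {
            return Objects.hash(occur, query);
        }

        @Override
        public String toString() {
            return occur + ":" + query;
        }
    }

    /** Drops repeated FILTER and MUST_NOT clauses while preserving the original order. */
    static List<Clause> dedupeNonScoring(List<Clause> clauses) {
        Set<Clause> seen = new HashSet<>();
        List<Clause> result = new ArrayList<>();
        for (Clause clause : clauses) {
            boolean scoring = clause.occur == Occur.MUST || clause.occur == Occur.SHOULD;
            // Scoring clauses are always kept; non-scoring ones only on first sight.
            if (scoring || seen.add(clause)) {
                result.add(clause);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Clause> clauses = new ArrayList<>();
        clauses.add(new Clause(Occur.FILTER, "a"));
        clauses.add(new Clause(Occur.MUST, "b"));
        clauses.add(new Clause(Occur.FILTER, "a"));
        clauses.add(new Clause(Occur.MUST, "b"));
        clauses.add(new Clause(Occur.MUST_NOT, "c"));
        clauses.add(new Clause(Occur.MUST_NOT, "c"));
        System.out.println(dedupeNonScoring(clauses)); // prints [FILTER:a, MUST:b, MUST:b, MUST_NOT:c]
    }
}
```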




 BooleanQuery.equals should ignore clause order
 --

 Key: LUCENE-6305
 URL: https://issues.apache.org/jira/browse/LUCENE-6305
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-6305.patch


 BooleanQuery.equals is sensitive to the order in which clauses have been 
 added. So for instance +A +B would be considered different from +B +A 
 although it generates the same matches and scores.




[jira] [Commented] (LUCENE-6446) Simplify Explanation API

2015-04-22 Thread Terry Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507111#comment-14507111
 ] 

Terry Smith commented on LUCENE-6446:
-

bq. I removed it because it was not always in the summary (only when using 
ComplexExplanation) as well as redundant with the description which is explicit 
when there is no match, for instance TermWeight's "no matching term" or 
BooleanWeight "no match on required clause"?

That makes sense. Removing the redundant information is definitely the way to 
go.


I also noticed that the new Explanation.noMatch() methods look a little trappy. 
They both take the child details and drop them on the floor.

{code}
  public static Explanation noMatch(String description, Collection<Explanation> details) {
    return new Explanation(false, 0f, description, Collections.emptyList());
  }
{code}

I think the noMatch() methods should either add the details to the created 
explanation or not accept them as parameters. Having a non-matching explanation 
contain child details can be really useful for complex queries. What do you 
think?
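
The first option, adding the details to the created explanation, could look roughly like this. SketchExplanation is a simplified, hypothetical model written for this illustration; it is not Lucene's Explanation class and its factory signatures are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Simplified, hypothetical model of an explanation node; not Lucene's Explanation. */
final class SketchExplanation {
    final boolean match;
    final float value;
    final String description;
    final List<SketchExplanation> details;

    private SketchExplanation(boolean match, float value, String description,
                              Collection<SketchExplanation> details) {
        this.match = match;
        this.value = value;
        this.description = description;
        // Defensive copy so the node stays immutable after construction.
        this.details = Collections.unmodifiableList(new ArrayList<>(details));
    }

    static SketchExplanation match(float value, String description, SketchExplanation... details) {
        return new SketchExplanation(true, value, description, Arrays.asList(details));
    }

    /** Keeps the child details instead of dropping them. */
    static SketchExplanation noMatch(String description, SketchExplanation... details) {
        return new SketchExplanation(false, 0f, description, Arrays.asList(details));
    }

    public static void main(String[] args) {
        SketchExplanation child = noMatch("no matching term");
        SketchExplanation parent = noMatch("no match on required clause", child);
        System.out.println(parent.details.size());             // prints 1
        System.out.println(parent.details.get(0).description); // prints no matching term
    }
}
```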



 Simplify Explanation API
 

 Key: LUCENE-6446
 URL: https://issues.apache.org/jira/browse/LUCENE-6446
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: Trunk, 5.2

 Attachments: LUCENE-6446.patch


 We should make this API easier to consume, for instance:
  - enforce important components to be non-null (eg. description)
  - decouple entirely the score computation from whether there is a match or 
 not (Explanation assumes there is a match if the score is > 0, you need to 
 use ComplexExplanation to override this behaviour)
  - return an empty array instead of null when there are no details




[jira] [Commented] (LUCENE-6446) Simplify Explanation API

2015-04-22 Thread Terry Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507038#comment-14507038
 ] 

Terry Smith commented on LUCENE-6446:
-

The refactored Explanation looks great, however I see a couple of small issues 
worth raising.

1. The constructor is private and there is a protected toString(int depth) 
method, it doesn't look like anyone else is calling it and no-one can subclass 
it. Should this method be private?

2. The toString() output is different! ComplexExplanation had a slightly 
different getSummary() method:

{code}
return getValue() + " = "
  + (isMatch() ? "(MATCH) " : "(NON-MATCH) ")
  + getDescription();
{code}

versus

{code}
return getValue() + " = " + getDescription();
{code}

I find this extra context invaluable, especially with the decoupling of score 
and match: we can't assume that a score of 0 is a NON-MATCH, yet the output no 
longer tells us if an explanation is a MATCH or not.

I understand that I can roll my own string building code with the current API. 
It'd be great if the default output was as useful as possible.
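
For reference, a roll-your-own summary in the ComplexExplanation style could be as small as the sketch below. ExplanationSummary and summarize() are hypothetical helpers written for this illustration; they take the three values directly rather than a Lucene Explanation.

```java
/** Hypothetical helper that rebuilds the old ComplexExplanation-style summary line. */
final class ExplanationSummary {
    /** Formats value, match flag and description on a single line. */
    static String summarize(float value, boolean match, String description) {
        return value + " = " + (match ? "(MATCH) " : "(NON-MATCH) ") + description;
    }

    public static void main(String[] args) {
        // A score of 0 no longer implies a non-match, so the label carries real information.
        System.out.println(summarize(0f, true, "constant-scored filter"));
        System.out.println(summarize(0f, false, "no matching term"));
    }
}
```

Both lines start with the same value, 0.0, and differ only in the MATCH or NON-MATCH label, which is exactly the context the default output lost.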


 Simplify Explanation API
 

 Key: LUCENE-6446
 URL: https://issues.apache.org/jira/browse/LUCENE-6446
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: Trunk, 5.2

 Attachments: LUCENE-6446.patch


 We should make this API easier to consume, for instance:
  - enforce important components to be non-null (eg. description)
  - decouple entirely the score computation from whether there is a match or 
 not (Explanation assumes there is a match if the score is > 0, you need to 
 use ComplexExplanation to override this behaviour)
  - return an empty array instead of null when there are no details




[jira] [Created] (LUCENE-6385) NullPointerException from Highlighter.getBestFragment()

2015-04-01 Thread Terry Smith (JIRA)

Terry Smith created LUCENE-6385:
---

 Summary: NullPointerException from Highlighter.getBestFragment()
 Key: LUCENE-6385
 URL: https://issues.apache.org/jira/browse/LUCENE-6385
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 5.1
Reporter: Terry Smith


When testing against the 5.1 nightly snapshots I've come across a 
NullPointerException in highlighting when nothing would be highlighted. This 
does not happen with 5.0.

{noformat}
java.lang.NullPointerException
    at __randomizedtesting.SeedInfo.seed([3EDC6EB0FA552B34:9971866E394F5FD0]:0)
    at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedSpanTerms(WeightedSpanTermExtractor.java:311)
    at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:151)
    at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:515)
    at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:219)
    at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:187)
    at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:196)
    at org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:156)
    at org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:102)
    at org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:80)
    at org.apache.lucene.search.highlight.MissesTest.testPhraseQuery(MissesTest.java:50)
{noformat}

I've written a small unit test and used git bisect to narrow the regression to 
the following commit:

{noformat}
commit 24e4eefaefb1837d1d4fa35f7669c2b264f872ac
Author: Michael McCandless mikemcc...@apache.org
Date:   Tue Mar 31 08:48:28 2015 +

LUCENE-6308: cutover Spans to DISI, reuse ConjunctionDISI, use two-phased 
iteration

git-svn-id: 
https://svn.apache.org/repos/asf/lucene/dev/branches/branch_5x@1670273 
13f79535-47bb-0310-9956-ffa450edef68
{noformat}

The problem looks quite simple, 
WeightedSpanTermExtractor.extractWeightedSpanTerms() needs an early return if 
SpanQuery.getSpans() returns null. All other callers check against this.

Unit test and fix (against the regressed commit) attached.
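
The shape of the fix described above, an early return when the span source yields null, can be sketched as follows. This is plain Java with a hypothetical NullSpansGuard helper and an Iterator standing in for Spans; it is not the attached patch and does not use the Lucene highlighter classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical model of a caller that must tolerate a null "spans" result. */
final class NullSpansGuard {
    /** Collects positions, treating a null iterator as "nothing matched". */
    static List<Integer> collectPositions(Supplier<Iterator<Integer>> getSpans) {
        List<Integer> positions = new ArrayList<>();
        Iterator<Integer> spans = getSpans.get();
        if (spans == null) {
            return positions; // early return instead of dereferencing null
        }
        while (spans.hasNext()) {
            positions.add(spans.next());
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(collectPositions(() -> null));                           // prints []
        System.out.println(collectPositions(() -> Arrays.asList(3, 7).iterator())); // prints [3, 7]
    }
}
```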





[jira] [Updated] (LUCENE-6385) NullPointerException from Highlighter.getBestFragment()

2015-04-01 Thread Terry Smith (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Smith updated LUCENE-6385:

Attachment: LUCENE-6385.patch

 NullPointerException from Highlighter.getBestFragment()
 ---

 Key: LUCENE-6385
 URL: https://issues.apache.org/jira/browse/LUCENE-6385
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 5.1
Reporter: Terry Smith
 Attachments: LUCENE-6385.patch


 When testing against the 5.1 nightly snapshots I've come across a 
 NullPointerException in highlighting when nothing would be highlighted. This 
 does not happen with 5.0.
 {noformat}
 java.lang.NullPointerException
     at __randomizedtesting.SeedInfo.seed([3EDC6EB0FA552B34:9971866E394F5FD0]:0)
     at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedSpanTerms(WeightedSpanTermExtractor.java:311)
     at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:151)
     at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:515)
     at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:219)
     at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:187)
     at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:196)
     at org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:156)
     at org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:102)
     at org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:80)
     at org.apache.lucene.search.highlight.MissesTest.testPhraseQuery(MissesTest.java:50)
 {noformat}
 I've written a small unit test and used git bisect to narrow the regression 
 to the following commit:
 {noformat}
 commit 24e4eefaefb1837d1d4fa35f7669c2b264f872ac
 Author: Michael McCandless mikemcc...@apache.org
 Date:   Tue Mar 31 08:48:28 2015 +
 LUCENE-6308: cutover Spans to DISI, reuse ConjunctionDISI, use two-phased 
 iteration
 
 git-svn-id: 
 https://svn.apache.org/repos/asf/lucene/dev/branches/branch_5x@1670273 
 13f79535-47bb-0310-9956-ffa450edef68
 {noformat}
 The problem looks quite simple, 
 WeightedSpanTermExtractor.extractWeightedSpanTerms() needs an early return if 
 SpanQuery.getSpans() returns null. All other callers check against this.
 Unit test and fix (against the regressed commit) attached.




Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-03-31 Thread Terry Smith

Adrien,

Thanks for the explanation. It seems a pity to make queries just nearly
immutable. Do you have any interest in adding a boost parameter to clone()
so they really could be immutable?
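
To illustrate what a clone-with-boost could look like, here is a sketch in plain Java. ImmutableTermQuery and withBoost() are hypothetical names chosen for this illustration and are not part of the Query API under discussion.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical immutable query; the class and withBoost() are illustrative only. */
final class ImmutableTermQuery {
    private final String term;
    private final float boost;

    ImmutableTermQuery(String term, float boost) {
        this.term = term;
        this.boost = boost;
    }

    /** Returns a copy with a different boost instead of mutating this instance. */
    ImmutableTermQuery withBoost(float newBoost) {
        return new ImmutableTermQuery(term, newBoost);
    }

    float boost() {
        return boost;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ImmutableTermQuery
            && term.equals(((ImmutableTermQuery) other).term)
            && Float.compare(boost, ((ImmutableTermQuery) other).boost) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(term, boost); // stable, because both fields are final
    }

    public static void main(String[] args) {
        Map<ImmutableTermQuery, String> cache = new HashMap<>();
        ImmutableTermQuery original = new ImmutableTermQuery("lucene", 1f);
        cache.put(original, "cached doc ids");

        ImmutableTermQuery boosted = original.withBoost(5f); // rewrite-time boost change
        System.out.println(cache.get(original)); // prints cached doc ids
        System.out.println(cache.get(boosted));  // prints null
        System.out.println(original.boost());    // prints 1.0
    }
}
```

Because the boost change produces a new object, the original key's hash code never changes and the cached entry stays reachable.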

--Terry


On Tue, Mar 31, 2015 at 9:44 AM, Adrien Grand jpou...@gmail.com wrote:

 Hi Terry,

 Indeed this is for query rewriting. For instance if you have a boolean
 query with a boost of 5 that wraps a single MUST clause with a term
 query, then we rewrite this to the inner term query and update its
 boost using clone() and setBoost() in order to not modify in-place a
 user-modified query.

 On Tue, Mar 31, 2015 at 3:37 PM, Terry Smith sheb...@gmail.com wrote:
  Adrien,
 
  I missed the reason that boost is going to stay mutable. Is this to
 support
  query rewriting?
 
  --Terry
 
 
  On Tue, Mar 31, 2015 at 7:21 AM, Robert Muir rcm...@gmail.com wrote:
 
  Same with BooleanQuery. the go-to ctor should just take 'clauses'
 
  On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless
  luc...@mikemccandless.com wrote:
   +1
  
   For PhraseQuery we could also have a common-case ctor that just takes
   the terms (and assumes sequential positions)?
  
   Mike McCandless
  
   http://blog.mikemccandless.com
  
  
   On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand jpou...@gmail.com
 wrote:
   Recent changes that added automatic filter caching to IndexSearcher
   uncovered some traps with our queries when it comes to using them as
   cache keys. The problem comes from the fact that some of our main
   queries are mutable, and modifying them while they are used as cache
   keys makes the entry that they are caching invisible (because the
 hash
   code changed too) yet still using memory.
  
   While I think most users would be unaffected as it is rather uncommon
   to modify queries after having passed them to IndexSearcher, I would
   like to remove this trap by making queries immutable: everything
   should be set at construction time except the boost parameter that
   could still be changed with the same clone()/setBoost() mechanism as
   today.
  
   First I would like to make sure that it sounds good to everyone and
   then to discuss what the API should look like. Most of our queries
   happen to be immutable already (NumericRangeQuery, TermsQuery,
   SpanNearQuery, etc.) but some aren't and the main exceptions are:
- BooleanQuery,
- DisjunctionMaxQuery,
- PhraseQuery,
- MultiPhraseQuery.
  
   We could take all parameters that are set as setters and move them to
   constructor arguments. For the above queries, this would mean (using
   varargs for ease of use):
  
 BooleanQuery(boolean disableCoord, int minShouldMatch,
   BooleanClause... clauses)
 DisjunctionMaxQuery(float tieBreakMul, Query... clauses)
  
   For PhraseQuery and MultiPhraseQuery, the closest to what we have
   today would require adding new classes to wrap terms and positions
   together, for instance:
  
   class TermAndPosition {
 public final BytesRef term;
 public final int position;
   }
  
   so that eg. PhraseQuery would look like:
  
 PhraseQuery(int slop, String field, TermAndPosition... terms)
  
   MultiPhraseQuery would be the same with several terms at the same
   position.
  
   Comments/ideas/concerns are highly welcome.
  
   --
   Adrien
  
 



 --
 Adrien


Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-03-31 Thread Terry Smith

Adrien,

I missed the reason that boost is going to stay mutable. Is this to support
query rewriting?

--Terry


On Tue, Mar 31, 2015 at 7:21 AM, Robert Muir rcm...@gmail.com wrote:

 Same with BooleanQuery. the go-to ctor should just take 'clauses'

 On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless
 luc...@mikemccandless.com wrote:
  +1
 
  For PhraseQuery we could also have a common-case ctor that just takes
  the terms (and assumes sequential positions)?
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
 
  On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand jpou...@gmail.com wrote:
  Recent changes that added automatic filter caching to IndexSearcher
  uncovered some traps with our queries when it comes to using them as
  cache keys. The problem comes from the fact that some of our main
  queries are mutable, and modifying them while they are used as cache
  keys makes the entry that they are caching invisible (because the hash
  code changed too) yet still using memory.
 
  While I think most users would be unaffected as it is rather uncommon
  to modify queries after having passed them to IndexSearcher, I would
  like to remove this trap by making queries immutable: everything
  should be set at construction time except the boost parameter that
  could still be changed with the same clone()/setBoost() mechanism as
  today.
 
  First I would like to make sure that it sounds good to everyone and
  then to discuss what the API should look like. Most of our queries
  happen to be immutable already (NumericRangeQuery, TermsQuery,
  SpanNearQuery, etc.) but some aren't and the main exceptions are:
   - BooleanQuery,
   - DisjunctionMaxQuery,
   - PhraseQuery,
   - MultiPhraseQuery.
 
  We could take all parameters that are set as setters and move them to
  constructor arguments. For the above queries, this would mean (using
  varargs for ease of use):
 
BooleanQuery(boolean disableCoord, int minShouldMatch,
  BooleanClause... clauses)
DisjunctionMaxQuery(float tieBreakMul, Query... clauses)
 
  For PhraseQuery and MultiPhraseQuery, the closest to what we have
  today would require adding new classes to wrap terms and positions
  together, for instance:
 
  class TermAndPosition {
public final BytesRef term;
public final int position;
  }
 
  so that eg. PhraseQuery would look like:
 
PhraseQuery(int slop, String field, TermAndPosition... terms)
 
  MultiPhraseQuery would be the same with several terms at the same
 position.
 
  Comments/ideas/concerns are highly welcome.
 
  --
  Adrien
 

[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-24 Thread Terry Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335022#comment-14335022
 ] 

Terry Smith commented on LUCENE-6229:
-

Understood.

If you end up keeping getChildren(), how do you feel about making it well 
defined by capturing these constraints in the Javadoc?



 Remove Scorer.getChildren?
 --

 Key: LUCENE-6229
 URL: https://issues.apache.org/jira/browse/LUCENE-6229
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Priority: Minor

 This API is used in a single place in our code base: 
 ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
 using this API from a collector only works if setScorer is called with an 
 actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
 disjunctions) so it needs a custom IndexSearcher that does not use the 
 BulkScorer API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-24 Thread Terry Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334937#comment-14334937
 ] 

Terry Smith commented on LUCENE-6229:
-

[~rcmuir] Sorry for excluding that scenario, it wasn't intentional.

If you all decide to keep getChildren(), then I'd love to get the contract 
described so people know what to expect.

I think these statements are correct:

# Scorer.getChildren() returns the immediate child scorers
# A returned scorer may be 
## unpositioned (never had next() or advance() called on it)
## positioned on a valid document that is before, on, or after the current 
document
## exhausted and thus positioned at NO_MORE_DOCS
# You MUST NOT call next() or advance() on the returned scorers yourself

And have these questions:

# Can I walk the returned scorers to get to all non-null leaf scorers?
# Can I position the returned scorers on the current document by calling 
freq(), score() or something else?
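
To make the states listed above concrete, here is a small sketch of how a caller might classify a child scorer relative to the current document. ChildPositions is a hypothetical helper, not part of Lucene; it only assumes the iterator conventions that docID() is -1 before iteration starts and NO_MORE_DOCS (Integer.MAX_VALUE) once exhausted.

```java
// Hypothetical helper, not part of Lucene. It relies only on two iterator
// conventions: docID() is -1 before the first positioning call, and
// NO_MORE_DOCS (Integer.MAX_VALUE) once the iterator is exhausted.
final class ChildPositions {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  enum State { UNPOSITIONED, BEFORE, ON, AFTER, EXHAUSTED }

  // childDoc is the child's docID(); currentDoc is the document being collected.
  static State classify(int childDoc, int currentDoc) {
    if (childDoc == -1) return State.UNPOSITIONED;
    if (childDoc == NO_MORE_DOCS) return State.EXHAUSTED;
    if (childDoc < currentDoc) return State.BEFORE;
    if (childDoc > currentDoc) return State.AFTER;
    return State.ON;
  }
}
```

Under the contract as described, only a child in the ON state can safely be asked about the current document; the other states have to be treated as "did not match or not known yet".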





[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-20 Thread Terry Smith (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329069#comment-14329069
]

Terry Smith commented on LUCENE-6229:
-

I’ll summarize this as two options:

# Remove getChildren(), as it complicates the code and hurts the ability to
maintain it and make performance enhancements.
# Make getChildren() a better-defined API that gives you the ability to
retrieve child scorers that are correctly positioned.

You are looking for data to back up option 2 to determine if this is an API that
is worth fixing/keeping.

Here are the use cases that I have:

# Custom scoring of a BooleanQuery. A query that wraps any BooleanQuery which
it uses for recall but supplies its own scoring algorithm to aggregate the
scores from the original clauses.
# Custom DrillSidewaysQuery. A query that can use the sideways scorers for
precision instead of just recall.
# Recursive DrillSidewaysQuery (not implemented, tricky). A query that could
perform DrillSideways for union or in a nested fashion.
# Auxiliary metadata. An enhancement that can augment the current recall
(boolean match) and precision (float score) for a document in the search
pipeline to add extra information that can be used from Query and FunctionValue
instances (collected via a custom Collector) and supported by a custom
SortField.

These can be categorized into two camps:

# Using an existing Query (typically BooleanQuery) to find matches but
providing some combination of
## custom scoring that isn’t compatible with the Similarity API
## custom recall (think DrillSideways)
# Adding extra information to the search pipeline that can be
## generated by leaf queries and value sources
## aggregated by composing queries (BooleanQuery, DisjunctionMaxQuery, etc)
## preserved by wrapping queries and value sources that don’t know about it
## collected and sorted on
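
As a rough illustration of the first camp (custom score aggregation over the clauses of a query), the sketch below takes the maximum child score rather than summing. ChildScore, FixedChild and MaxOfMatchingChildren are hypothetical stand-ins and not Lucene classes; the sketch assumes the children are already correctly positioned, which is exactly the guarantee being asked for.

```java
import java.util.List;

// Hypothetical stand-in for a child scorer; this is not Lucene's Scorer.
interface ChildScore {
  int docID();
  float score();
}

// Trivial fixed-value implementation, used only to exercise the sketch.
final class FixedChild implements ChildScore {
  private final int doc;
  private final float score;

  FixedChild(int doc, float score) {
    this.doc = doc;
    this.score = score;
  }

  @Override public int docID() { return doc; }
  @Override public float score() { return score; }
}

final class MaxOfMatchingChildren {
  // Aggregates child scores with max, skipping any child that is not
  // positioned on the current document. Returns 0 if no child matches.
  static float score(int currentDoc, List<? extends ChildScore> children) {
    float best = 0f;
    for (ChildScore child : children) {
      if (child.docID() == currentDoc) {
        best = Math.max(best, child.score());
      }
    }
    return best;
  }
}
```

The docID() check is the fragile part: if a child is lazily positioned, it may report an older document and be silently skipped.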

Hope this helps.


[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-17 Thread Terry Smith (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324686#comment-14324686
]

Terry Smith commented on LUCENE-6229:
-

[~rcmuir] Thanks for the backstory. I've been trying to wrap my head around
where Lucene is going and this kind of information really helps.

It sounds like both [~rcmuir] and [~thetaphi] agree that Scorer.getChildren()
is not an API that Lucene should support. Reading between the lines, this
implies to me that scoring is moving to a bulk-only approach, which will bring
great performance gains.

A best effort implementation of Scorer.getChildren() would be something that
I'd be uncomfortable adding features on top of, although it could be useful for
debugging. Unfortunately this is a showstopper for me as I rely on
Scorer.getChildren() for some critical features, and need to do some serious
thinking to figure out if I can formulate an alternative approach.


[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-12 Thread Terry Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318272#comment-14318272
 ] 

Terry Smith commented on LUCENE-6229:
-

[~jpountz] - I'm going to split the freq() vs score() thing into a separate 
ticket so it doesn't hijack this one. I intend to take the unit test I 
previously pasted and extend it to create some randomized BooleanQuery 
instances, to try to locate possibly broken edge cases and provide a safety 
net for future refactoring.

I'll make these assumptions; shout out if they are incorrect.

For a BooleanQuery I should be able to perform doc-at-a-time scoring, meaning 
that in a Collector or Scorer I can

1. Find all Scorers from the child clauses of the BooleanQuery
2. Have those Scorers be positioned for me by calling score() or freq()



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?


[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314219#comment-14314219
 ] 

Terry Smith commented on LUCENE-6229:
-

Like Stefan, I'm also using this functionality to access child scorers on a per 
document basis, currently for some custom query enhancements and a custom drill 
sideways implementation.

Like Adrien, I've also had to wrap queries in a custom NonBulkScoringQuery to 
force doc-at-a-time scoring.

It'd be great to simplify this workflow. I've been calling Scorer.freq() to 
position all the child scorers (from a BooleanQuery), and as of the 5.1 nightly 
builds I need to call Scorer.score() instead for positioning, due to changes 
in MinShouldMatchSumScorer.

I'd love to have a way to not only get the child scorers but be confident that 
they were all correctly positioned.



[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?


[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314361#comment-14314361
 ] 

Terry Smith commented on LUCENE-6229:
-

h2. freq() vs score()

I think the lazy positioning in MinShouldMatchSumScorer is misbehaving.

Drop these three methods into TestBooleanMinShouldMatch.java to see.
{code:java}
public void testMinNrShouldMatchFreq() throws Exception {
  BooleanQuery q = new BooleanQuery();
  q.add(new TermQuery(new Term("data", "1")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "2")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "3")), Occur.SHOULD);
  q.add(new TermQuery(new Term("id", "0")), Occur.MUST);
  q.setMinimumNumberShouldMatch(2);
  verifyNrHits(q, 1);
  s.search(q, new SimpleCollector() {
    private Scorer scorer;
    private Collection<Scorer> leafScorers;

    @Override
    public void setScorer(Scorer scorer) throws IOException {
      this.scorer = scorer;
      this.leafScorers = leafScorers(new ArrayList<Scorer>(), scorer);
      assertEquals(4, leafScorers.size());
    }

    @Override
    public void collect(int doc) throws IOException {
      assertEquals(0, doc);
      scorer.freq(); // position leaf scorers
      for (Scorer leafScorer : leafScorers) {
        assertEquals(0, leafScorer.docID());
      }
    }
  });
}

public void testMinNrShouldMatchScore() throws Exception {
  BooleanQuery q = new BooleanQuery();
  q.add(new TermQuery(new Term("data", "1")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "2")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "3")), Occur.SHOULD);
  q.add(new TermQuery(new Term("id", "0")), Occur.MUST);
  q.setMinimumNumberShouldMatch(2);
  verifyNrHits(q, 1);
  s.search(q, new SimpleCollector() {
    private Scorer scorer;
    private Collection<Scorer> leafScorers;

    @Override
    public void setScorer(Scorer scorer) throws IOException {
      this.scorer = scorer;
      this.leafScorers = leafScorers(new ArrayList<Scorer>(), scorer);
      assertEquals(4, leafScorers.size());
    }

    @Override
    public void collect(int doc) throws IOException {
      assertEquals(0, doc);
      scorer.score(); // position leaf scorers
      for (Scorer leafScorer : leafScorers) {
        assertEquals(0, leafScorer.docID());
      }
    }
  });
}

// Recursively collects the leaf scorers (those without children) into target.
private static Collection<Scorer> leafScorers(Collection<Scorer> target,
    Scorer scorer) {
  Collection<ChildScorer> childScorers = scorer.getChildren();
  if (childScorers.isEmpty()) {
    target.add(scorer);
  } else {
    for (ChildScorer childScorer : childScorers) {
      leafScorers(target, childScorer.child);
    }
  }
  return target;
}
{code}

Here the test that uses freq() to position the sub scorers fails, but the one 
that uses score() succeeds.

h2. middle ground

I have Scorer constructors, Weight.scorer(), Weight.explain() and Collectors 
all calling Scorer.getChildren(). But when using my custom Collectors I'm 
careful to wrap the Query in a custom NonBulkScoringQuery that prevents bulk 
scoring to work around the trap. The NonBulkScoringQuery I mention is a simple 
delegating Query that allows Weight.bulkScorer() to use its default 
implementation instead of allowing the wrapped Query to override it.
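
The delegation trick can be sketched like this. All types here (SimpleScorer, DocCollector, SimpleWeight, NonBulkScoringWeight) are simplified, hypothetical stand-ins written for illustration; Lucene's real Weight, Scorer and BulkScorer have different signatures, and this is not my actual NonBulkScoringQuery.

```java
// Simplified stand-ins for illustration only; Lucene's real Weight, Scorer and
// BulkScorer have different signatures.
interface SimpleScorer {
  int nextDoc(); // returns Integer.MAX_VALUE when exhausted
}

interface DocCollector {
  void collect(int doc);
}

abstract class SimpleWeight {
  abstract SimpleScorer scorer();

  // Default "bulk" path: drive the scorer one document at a time.
  // Subclasses may override this with a specialized bulk implementation.
  void bulkScore(DocCollector collector) {
    SimpleScorer scorer = scorer();
    for (int doc = scorer.nextDoc(); doc != Integer.MAX_VALUE; doc = scorer.nextDoc()) {
      collector.collect(doc);
    }
  }
}

// Delegates scorer() but deliberately does NOT delegate bulkScore(), so the
// default doc-at-a-time loop above runs even if the wrapped weight overrides it.
final class NonBulkScoringWeight extends SimpleWeight {
  private final SimpleWeight in;

  NonBulkScoringWeight(SimpleWeight in) {
    this.in = in;
  }

  @Override
  SimpleScorer scorer() {
    return in.scorer();
  }
}
```

The wrapper works by omission: forgetting to delegate one method is normally a bug, but here it is the whole point.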

I like removing the trap for bulk scoring queries; it's really subtle and it 
took me a while to diagnose the first time I hit it.

Having a separate entry point into IndexSearcher to achieve doc-at-a-time 
scoring that supports getChildren() would be awesome. I'm not so hot on having 
to cast the collector, though. Do you think there could be a way to preserve 
type safety here?



[jira] [Commented] (LUCENE-6232) Replace ValueSource context Map with a more concrete data type


[ 
https://issues.apache.org/jira/browse/LUCENE-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314242#comment-14314242
 ] 

Terry Smith commented on LUCENE-6232:
-

I have custom code that injects objects into this map. If you refactor this to 
be a concrete class, could you leave it non-final so a custom FunctionQuery could 
provide its own subclassed instance of this context?
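
Something along these lines is what I have in mind. The class and method names (ValueSourceContext, RequestAwareContext, getAttribute, and so on) are purely hypothetical and do not correspond to an existing Lucene API; Object stands in for Scorer so the sketch compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical typed context; names are illustrative, not an existing Lucene API.
// Deliberately non-final so custom code can extend it.
class ValueSourceContext {
  private Object scorer; // stand-in for org.apache.lucene.search.Scorer
  private final Map<String, Object> attributes = new HashMap<>();

  Object getScorer() { return scorer; }

  void setScorer(Object scorer) { this.scorer = scorer; }

  // Untyped escape hatch for values the typed accessors do not cover.
  Object getAttribute(String key) { return attributes.get(key); }

  void setAttribute(String key, Object value) { attributes.put(key, value); }
}

// Hypothetical subclass that a custom FunctionQuery could supply in place of
// the base class, adding its own typed state.
class RequestAwareContext extends ValueSourceContext {
  private final String requestId;

  RequestAwareContext(String requestId) {
    this.requestId = requestId;
  }

  String getRequestId() { return requestId; }
}
```

With a final class, the only place left for custom state would be an untyped side channel, which brings back the casts the refactoring is trying to remove.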


 Replace ValueSource context Map with a more concrete data type
 --

 Key: LUCENE-6232
 URL: https://issues.apache.org/jira/browse/LUCENE-6232
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Mike Drob

 Inspired by LUCENE-3973
 The context object used by ValueSource and friends is a raw Map that provides 
 no type safety guarantees. In our current state, there are lots of warnings 
 about unchecked casts, raw types, and generally unsafe code from the 
 compiler's perspective.
 There are several common patterns and types of Objects that we store in the 
 context. It would be beneficial to instead use a class with typed methods for 
 get/set of Scorer, Weights, etc.




[jira] [Commented] (LUCENE-6232) Replace ValueSource context Map with a more concrete data type