[jira] [Commented] (LUCENE-7382) Wrong default attribute factory in use

2016-07-15 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379453#comment-15379453
 ] 

Terry Smith commented on LUCENE-7382:
-

Thanks, I didn't realize this would hit 6.2. I have nightly builds that follow 
the 6.2.0-SNAPSHOT and 7.0.0-SNAPSHOT artifacts on the ASF snapshot Maven repo, 
and this hasn't hit my 6.2 branch yet.


> Wrong default attribute factory in use
> --
>
> Key: LUCENE-7382
> URL: https://issues.apache.org/jira/browse/LUCENE-7382
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: master (7.0), 6.2
>    Reporter: Terry Smith
>Assignee: Uwe Schindler
> Fix For: 6.2
>
>
> Originally reported to the mailing list: 
> http://mail-archives.apache.org/mod_mbox/lucene-java-user/201607.mbox/%3cCAJ0VynnMAH7N7byPevTV9Htxo-Nk-B7mwUwRgP4X8gN=v4p...@mail.gmail.com%3e
> LUCENE-7355 made a change to CustomAnalyzer.createComponents() such that it 
> uses a different AttributeFactory. 
> https://github.com/apache/lucene-solr/commit/e92a38af90d12e51390b4307ccbe0c24ac7b6b4e#diff-b39a076156e10aa7a4ba86af0357a0feL122
> The previous default was TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY which 
> uses PackedTokenAttributeImpl while the new default is now 
> AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY which does not use 
> PackedTokenAttributeImpl.
> [~thetaphi] asked me to open an issue for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-7382) Wrong default attribute factory in use

2016-07-15 Thread Terry Smith (JIRA)
Terry Smith created LUCENE-7382:
---

 Summary: Wrong default attribute factory in use
 Key: LUCENE-7382
 URL: https://issues.apache.org/jira/browse/LUCENE-7382
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: master (7.0)
Reporter: Terry Smith


Originally reported to the mailing list: 
http://mail-archives.apache.org/mod_mbox/lucene-java-user/201607.mbox/%3cCAJ0VynnMAH7N7byPevTV9Htxo-Nk-B7mwUwRgP4X8gN=v4p...@mail.gmail.com%3e

LUCENE-7355 made a change to CustomAnalyzer.createComponents() such that it 
uses a different AttributeFactory. 
https://github.com/apache/lucene-solr/commit/e92a38af90d12e51390b4307ccbe0c24ac7b6b4e#diff-b39a076156e10aa7a4ba86af0357a0feL122


The previous default was TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY which uses 
PackedTokenAttributeImpl while the new default is now 
AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY which does not use 
PackedTokenAttributeImpl.

[~thetaphi] asked me to open an issue for this.






[jira] [Commented] (LUCENE-6922) Improve svn to git workaround script

2015-12-16 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060811#comment-15060811
 ] 

Terry Smith commented on LUCENE-6922:
-

Does LUCENE-6933 affect this ticket?

> Improve svn to git workaround script
> 
>
> Key: LUCENE-6922
> URL: https://issues.apache.org/jira/browse/LUCENE-6922
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: -tools
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: svnBranchToGit.py, svnBranchToGit.py
>
>
> As the git-svn mirror for Lucene/Solr will be turned off near the end of 
> 2015, try and improve the workaround script to become more usable.






[jira] [Commented] (LUCENE-6922) Improve svn to git workaround script

2015-12-16 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060830#comment-15060830
 ] 

Terry Smith commented on LUCENE-6922:
-

Ah, so I could consider this as a backup plan until LUCENE-6933 is ready?


> Improve svn to git workaround script
> 
>
> Key: LUCENE-6922
> URL: https://issues.apache.org/jira/browse/LUCENE-6922
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: -tools
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: svnBranchToGit.py, svnBranchToGit.py
>
>
> As the git-svn mirror for Lucene/Solr will be turned off near the end of 
> 2015, try and improve the workaround script to become more usable.






[jira] [Commented] (LUCENE-6922) Improve svn to git workaround script

2015-12-07 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045060#comment-15045060
 ] 

Terry Smith commented on LUCENE-6922:
-

Is the announcement of the EOL to the git-svn mirror available publicly?

How will this affect users that rely on the github mirror to access the 
Lucene/Solr repository?



> Improve svn to git workaround script
> 
>
> Key: LUCENE-6922
> URL: https://issues.apache.org/jira/browse/LUCENE-6922
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: -tools
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: svnBranchToGit.py
>
>
> As the git-svn mirror for Lucene/Solr will be turned off near the end of 
> 2015, try and improve the workaround script to become more usable.






[jira] [Commented] (LUCENE-6922) Improve svn to git workaround script

2015-12-07 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045069#comment-15045069
 ] 

Terry Smith commented on LUCENE-6922:
-

Oops, I guess I'm behind with the mailing list, just found the discussion and 
will include the link here for anyone else that misses it: 
http://mail-archives.apache.org/mod_mbox/lucene-dev/201512.mbox/%3ccal8pwkbfvt83zbczm0y-x-mdeth6hyc_xyejrev9fzzk5yx...@mail.gmail.com%3e



> Improve svn to git workaround script
> 
>
> Key: LUCENE-6922
> URL: https://issues.apache.org/jira/browse/LUCENE-6922
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: -tools
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: svnBranchToGit.py
>
>
> As the git-svn mirror for Lucene/Solr will be turned off near the end of 
> 2015, try and improve the workaround script to become more usable.






[jira] [Commented] (LUCENE-6889) BooleanQuery.rewrite could easily optimize some simple cases

2015-11-10 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14998891#comment-14998891
 ] 

Terry Smith commented on LUCENE-6889:
-

Ah, that makes sense. I didn't realize the first scenario was dropping the MUST 
when the FILTER and MUST wrapped identical clauses or that the second scenario 
also included boost handling to avoid the scoring issue. Given that, this 
sounds like a great optimization.

I'll summarize the rules below. Mind shouting out if I still misunderstand?

Rule 1
{noformat}
#a +a -> +a
{noformat}

Rule 2
{noformat}
+*:*^b #f -> ConstantScoreQuery(f)^b
{noformat}

Rule 3
{noformat}
-a +a -> MatchNoDocsQuery
-a #a -> MatchNoDocsQuery
{noformat}
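
The three rules above can be sketched as a toy rewrite over a clause list. This is only a sketch of the rules as I summarized them, not Lucene's BooleanQuery.rewrite: the RewriteSketch, Occur and Clause names are hypothetical, queries are plain strings, the boost on the match-all clause is passed in separately, and the result comes back as a string for readability.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: clauses are (occur, query-string) pairs, not Lucene objects.
final class RewriteSketch {
    enum Occur { MUST, FILTER, MUST_NOT }

    static final class Clause {
        final Occur occur;
        final String query;
        Clause(Occur occur, String query) { this.occur = occur; this.query = query; }
    }

    // matchAllBoost is the boost "b" carried by a +*:* clause (only used by rule 2).
    static String rewrite(List<Clause> clauses, float matchAllBoost) {
        Set<String> must = new LinkedHashSet<>();
        Set<String> filter = new LinkedHashSet<>();
        Set<String> mustNot = new LinkedHashSet<>();
        for (Clause c : clauses) {
            if (c.occur == Occur.MUST) must.add(c.query);
            else if (c.occur == Occur.FILTER) filter.add(c.query);
            else mustNot.add(c.query);
        }

        // Rule 3: a clause that is required and also prohibited can never match.
        for (String q : mustNot) {
            if (must.contains(q) || filter.contains(q)) return "MatchNoDocsQuery";
        }

        // Rule 1: a FILTER identical to a MUST adds no constraint, so drop the FILTER.
        filter.removeAll(must);

        // Rule 2: +*:*^b #f -> ConstantScoreQuery(f)^b
        if (must.size() == 1 && must.contains("*:*") && filter.size() == 1 && mustNot.isEmpty()) {
            return "ConstantScoreQuery(" + filter.iterator().next() + ")^" + matchAllBoost;
        }

        // No rule applied: render what is left in +must #filter -mustNot order.
        List<String> parts = new ArrayList<>();
        for (String q : must) parts.add("+" + q);
        for (String q : filter) parts.add("#" + q);
        for (String q : mustNot) parts.add("-" + q);
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        System.out.println(rewrite(Arrays.asList(
                new Clause(Occur.FILTER, "a"), new Clause(Occur.MUST, "a")), 1f));
        System.out.println(rewrite(Arrays.asList(
                new Clause(Occur.MUST, "*:*"), new Clause(Occur.FILTER, "f")), 2f));
        System.out.println(rewrite(Arrays.asList(
                new Clause(Occur.MUST_NOT, "a"), new Clause(Occur.FILTER, "a")), 1f));
    }
}
```

Note that rule 1 only fires when the two clauses are identical, so a case like +foo #(+foo +bar) is left alone.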






> BooleanQuery.rewrite could easily optimize some simple cases
> 
>
> Key: LUCENE-6889
> URL: https://issues.apache.org/jira/browse/LUCENE-6889
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> Follow-up of SOLR-8251: APIs and user interfaces sometimes encourage to write 
> BooleanQuery instances that are not optimal, for instance a typical case that 
> happens often with Solr/Elasticsearch is to send a request that has a 
> MatchAllDocsQuery as a query and some filter, which could be executed more 
> efficiently by directly wrapping the filter into a ConstantScoreQuery.
> Here are some ideas of rewrite operations that BooleanQuery could perform:
>  - remove FILTER clauses when they are also a MUST clause
>  - rewrite queries of the form "+*:* #filter" to a ConstantScoreQuery(filter)
>  - rewrite to a MatchNoDocsQuery when a clause that is a MUST or FILTER 
> clause is also a MUST_NOT clause






[jira] [Commented] (LUCENE-6889) BooleanQuery.rewrite could easily optimize some simple cases

2015-11-10 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14998808#comment-14998808
 ] 

Terry Smith commented on LUCENE-6889:
-

I like the last one but believe that the other two aren't correct.


bq. remove FILTER clauses when they are also a MUST clause

Seeing as a FILTER is a non-scoring MUST, this just doesn't sound right. The 
FILTER could constrain the result set more than the MUST alone.

e.g. +foo #(+foo +bar)


bq. rewrite queries of the form "+*:* #filter" to a ConstantScoreQuery(filter)

I don't think you can drop a +*:* without affecting the score, but you could 
drop a #*:* if the BooleanQuery has something else to force inclusion (other 
MUST, FILTER or some SHOULD with an appropriate minNumShouldMatch).

For this case, could Solr/Elasticsearch add the MatchAllDocsQuery as a FILTER 
instead of a MUST to allow for this optimization?


We could detect duplicate FILTER and MUST_NOT clauses as described in 
LUCENE-6787.

Jira is turning star colon star (*:*) to a bold colon, so apologies if this 
doesn't read well through the web interface.





> BooleanQuery.rewrite could easily optimize some simple cases
> 
>
> Key: LUCENE-6889
> URL: https://issues.apache.org/jira/browse/LUCENE-6889
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> Follow-up of SOLR-8251: APIs and user interfaces sometimes encourage to write 
> BooleanQuery instances that are not optimal, for instance a typical case that 
> happens often with Solr/Elasticsearch is to send a request that has a 
> MatchAllDocsQuery as a query and some filter, which could be executed more 
> efficiently by directly wrapping the filter into a ConstantScoreQuery.
> Here are some ideas of rewrite operations that BooleanQuery could perform:
>  - remove FILTER clauses when they are also a MUST clause
>  - rewrite queries of the form "+*:* #filter" to a ConstantScoreQuery(filter)
>  - rewrite to a MatchNoDocsQuery when a clause that is a MUST or FILTER 
> clause is also a MUST_NOT clause






[jira] [Commented] (LUCENE-6679) Filter's Weight.explain returns an explanation with isMatch==true even on documents that don't match

2015-11-09 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996583#comment-14996583
 ] 

Terry Smith commented on LUCENE-6679:
-

Thanks Adrien, I'll give it a shot this morning and reach out as needed.

As a reminder, my patch only checks hits and the bug reports are for misses. 
I'll need to expand upon it to get better coverage for those also.

If I can't turn something around early this week I'll back up a little and at 
least get a direct test for the underlying bug behind SOLR-8245 to go with its 
fix.

> Filter's Weight.explain returns an explanation with isMatch==true even on 
> documents that don't match
> 
>
> Key: LUCENE-6679
> URL: https://issues.apache.org/jira/browse/LUCENE-6679
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
> Attachments: LUCENE-6679.patch
>
>
> This was reported by Trejkaz on the java-user list: 
> http://search-lucene.com/m/l6pAi19h4Y3DclgB1=Re+What+on+earth+is+FilteredQuery+explain+doing+






[jira] [Updated] (LUCENE-6679) Filter's Weight.explain returns an explanation with isMatch==true even on documents that don't match

2015-11-09 Thread Terry Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Smith updated LUCENE-6679:

Attachment: LUCENE-6679.patch

Here is an updated patch against trunk that adds hit and miss explain checks to 
AssertingLeafCollector and hooks it up with the surrounding classes.

I've also introduced a new annotation called SuppressExplainChecks that I've 
applied to the following tests, which would otherwise fail:

  * TestSortRandom
  * TestLazyProxSkipping
  * TestDrillSideways
  * TestRangeFacetCounts
  * TestJoinUtil
  * TestFieldCacheSortRandom
  * TestCustomScoreQuery
  * TestCustomScoreQueryExplanations
  * TestFunctionQueryExplanations
  * TestForTooMuchCloning
  * TestTermAutomationQuery
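
For anyone unfamiliar with this kind of opt-out, it is typically a runtime-retained marker annotation that the test framework looks up by reflection. The sketch below is an assumption about the general shape, not the code in the attached patch: the reason element, the ExplainCheckSketch class, the two example test classes and the shouldCheckExplain helper are all hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class ExplainCheckSketch {
    // Marker annotation; RUNTIME retention is what makes it visible to reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface SuppressExplainChecks {
        String reason() default ""; // hypothetical element, for documentation only
    }

    // Hypothetical test class that opts out of the explain checks.
    @SuppressExplainChecks(reason = "explain output is known not to match")
    static class TestThatOptsOut {}

    // Hypothetical test class that keeps the checks enabled.
    static class TestWithChecks {}

    // A collector wrapper could consult this before asserting on explain output.
    static boolean shouldCheckExplain(Class<?> testClass) {
        return !testClass.isAnnotationPresent(SuppressExplainChecks.class);
    }

    public static void main(String[] args) {
        System.out.println(shouldCheckExplain(TestWithChecks.class));  // true
        System.out.println(shouldCheckExplain(TestThatOptsOut.class)); // false
    }
}
```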

Once you are happy with this patch, I'd like to get it on trunk so the Jenkins 
servers can shake out any more failures and we can create tickets for any 
bugs they uncover.



> Filter's Weight.explain returns an explanation with isMatch==true even on 
> documents that don't match
> 
>
> Key: LUCENE-6679
> URL: https://issues.apache.org/jira/browse/LUCENE-6679
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
> Attachments: LUCENE-6679.patch, LUCENE-6679.patch
>
>
> This was reported by Trejkaz on the java-user list: 
> http://search-lucene.com/m/l6pAi19h4Y3DclgB1=Re+What+on+earth+is+FilteredQuery+explain+doing+






[jira] [Commented] (LUCENE-6679) Filter's Weight.explain returns an explanation with isMatch==true even on documents that don't match

2015-11-06 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994430#comment-14994430
 ] 

Terry Smith commented on LUCENE-6679:
-

Darn, sorry I didn't get back to pushing these new tests into the source; they 
would have helped to catch this.


It looks like branch_5x and trunk diverged here. This commit shows where this 
code was removed on branch_5x: 
https://github.com/apache/lucene-solr/commit/11cc6e53f85f7bc4b616bb38370ddfc704987337#diff-2b293bfd95e32f715a5c05b4e132f047L82

And this commit from trunk shows the move of Filter from Lucene to Solr: 
https://github.com/apache/lucene-solr/commit/9f8d64d4fb34eb2480e9a667c45f262d20f0#diff-4bf1c256f49c1a4ee4a50b1f8aeda1ddL1

You can see the trunk version of the file here: 
https://github.com/apache/lucene-solr/blob/9f8d64d4fb34eb2480e9a667c45f262d20f0/solr/core/src/java/org/apache/solr/search/Filter.java

The changes from the branch_5x commit are missing.

I don't know why this is the case. Perhaps [~jpountz] can chime in, as the 
original commit that fixes the explain output as a side effect (LUCENE-6601) 
looks like it ought to be on both branch_5x and trunk.



> Filter's Weight.explain returns an explanation with isMatch==true even on 
> documents that don't match
> 
>
> Key: LUCENE-6679
> URL: https://issues.apache.org/jira/browse/LUCENE-6679
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
> Attachments: LUCENE-6679.patch
>
>
> This was reported by Trejkaz on the java-user list: 
> http://search-lucene.com/m/l6pAi19h4Y3DclgB1=Re+What+on+earth+is+FilteredQuery+explain+doing+






[jira] [Commented] (LUCENE-6834) Remove BoostQuery.toString()'s hack with parenthesis

2015-10-08 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948670#comment-14948670
 ] 

Terry Smith commented on LUCENE-6834:
-

Great idea. The hack currently puts user-defined queries at a disadvantage, as 
they cannot opt out of the parens. If all queries are wrapped in parens when 
boosted, then the toString output is easier to read.


> Remove BoostQuery.toString()'s hack with parenthesis
> 
>
> Key: LUCENE-6834
> URL: https://issues.apache.org/jira/browse/LUCENE-6834
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 6.0
>
>
> This hack was added in order not to break the string representation of our 
> queries in 5.x. However I don't think we should have it in trunk.






[jira] [Commented] (LUCENE-6590) Explore different ways to apply boosts

2015-10-07 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947239#comment-14947239
 ] 

Terry Smith commented on LUCENE-6590:
-

Thanks Adrien.

My nightly regressions just picked this up from the published Maven snapshots 
and I see that BoostQuery now includes MatchAllDocsQuery in its 
NO_PARENS_REQUIRED_QUERIES list on branch_5x (this is awesome!). However, this 
change is *not* available on trunk.

I've confirmed this by checking the SVN repo directly (the GitHub mirror tends 
to lag).

http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/BoostQuery.java

http://svn.apache.org/repos/asf/lucene/dev/branches/branch_5x/lucene/core/src/java/org/apache/lucene/search/BoostQuery.java

--Terry


> Explore different ways to apply boosts
> --
>
> Key: LUCENE-6590
> URL: https://issues.apache.org/jira/browse/LUCENE-6590
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.4
>
> Attachments: LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, 
> LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch
>
>
> Follow-up from LUCENE-6570: the fact that all queries are mutable in order to 
> allow for applying a boost raises issues since it makes queries bad cache 
> keys since their hashcode can change anytime. We could just document that 
> queries should never be modified after they have gone through IndexSearcher 
> but it would be even better if the API made queries impossible to mutate at 
> all.
> I think there are two main options:
>  - either replace "void setBoost(boost)" with something like "Query 
> withBoost(boost)" which would return a clone that has a different boost
>  - or move boost handling outside of Query, for instance we could have a 
> (immutable) query impl that would be dedicated to applying boosts, that 
> queries that need to change boosts at rewrite time (such as BooleanQuery) 
> would use as a wrapper.
> The latter idea is from Robert and I like it a lot given how often I either 
> introduced or found a bug which was due to the boost parameter being ignored. 
> Maybe there are other options, but I think this is worth exploring.






[jira] [Commented] (LUCENE-6590) Explore different ways to apply boosts

2015-09-28 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933375#comment-14933375
 ] 

Terry Smith commented on LUCENE-6590:
-

Cheers Adrien. Sorry for the spammy replies before -- I wasn't expecting to see 
more than one discrepancy!

While you are looking at the Query.toString() behavior with respect to 
boosting, how would you feel about adding MatchAllDocsQuery.class to 
BoostQuery.NO_PARENS_REQUIRED_QUERIES so its toString() doesn't change across 
releases?

Query q = new MatchAllDocsQuery();
q.setBoost(0);
q.toString() -> *:*^0.0

new BoostQuery(new MatchAllDocsQuery(), 0).toString() -> (*:*)^0.0
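
The difference comes down to how the wrapper decides whether to parenthesize the inner query. Here is a minimal sketch of that decision, assuming a whitelist keyed by query type as the name NO_PARENS_REQUIRED_QUERIES suggests. The class is a string-based stand-in with hypothetical names, not Lucene's BoostQuery.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// String-based stand-in: query types are names, not Lucene classes.
final class BoostToStringSketch {
    // Renders "inner^boost", adding parens unless the query type is whitelisted.
    static String boosted(Set<String> noParensRequired, String queryType,
                          String queryString, float boost) {
        boolean needsParens = !noParensRequired.contains(queryType);
        String inner = needsParens ? "(" + queryString + ")" : queryString;
        return inner + "^" + boost;
    }

    public static void main(String[] args) {
        // Hypothetical whitelist contents before and after the proposed change.
        Set<String> before = new HashSet<>(Arrays.asList("TermQuery"));
        Set<String> after = new HashSet<>(Arrays.asList("TermQuery", "MatchAllDocsQuery"));

        System.out.println(boosted(before, "MatchAllDocsQuery", "*:*", 0f)); // (*:*)^0.0
        System.out.println(boosted(after, "MatchAllDocsQuery", "*:*", 0f));  // *:*^0.0
    }
}
```

A type-keyed whitelist like this also means a user-defined query type would always get the parens unless it could be registered somehow.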


> Explore different ways to apply boosts
> --
>
> Key: LUCENE-6590
> URL: https://issues.apache.org/jira/browse/LUCENE-6590
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.4
>
> Attachments: LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, 
> LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch
>
>
> Follow-up from LUCENE-6570: the fact that all queries are mutable in order to 
> allow for applying a boost raises issues since it makes queries bad cache 
> keys since their hashcode can change anytime. We could just document that 
> queries should never be modified after they have gone through IndexSearcher 
> but it would be even better if the API made queries impossible to mutate at 
> all.
> I think there are two main options:
>  - either replace "void setBoost(boost)" with something like "Query 
> withBoost(boost)" which would return a clone that has a different boost
>  - or move boost handling outside of Query, for instance we could have a 
> (immutable) query impl that would be dedicated to applying boosts, that 
> queries that need to change boosts at rewrite time (such as BooleanQuery) 
> would use as a wrapper.
> The latter idea is from Robert and I like it a lot given how often I either 
> introduced or found a bug which was due to the boost parameter being ignored. 
> Maybe there are other options, but I think this is worth exploring.






[jira] [Comment Edited] (LUCENE-6590) Explore different ways to apply boosts

2015-09-28 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933375#comment-14933375
 ] 

Terry Smith edited comment on LUCENE-6590 at 9/28/15 2:38 PM:
--

Cheers Adrien. Sorry for the spammy replies before -- I wasn't expecting to see 
more than one discrepancy!

While you are looking at the Query.toString() behavior with respect to 
boosting, how would you feel about adding MatchAllDocsQuery.class to 
BoostQuery.NO_PARENS_REQUIRED_QUERIES so its toString() doesn't change across 
releases?

{noformat}
Query q = new MatchAllDocsQuery();
q.setBoost(0);
q.toString() -> *:*^0.0

new BoostQuery(new MatchAllDocsQuery(), 0).toString() -> (*:*)^0.0
{noformat}


was (Author: shebiki):
Cheers Adrien. Sorry for the spammy replies before -- I wasn't expecting to see 
more than one discrepancy!

While you are looking at the Query.toString() behavior with respect to 
boosting, how would you feel about adding MatchAllDocsQuery.class to 
BoostQuery.NO_PARENS_REQUIRED_QUERIES so its toString() doesn't change across 
releases?

Query q = new MatchAllDocsQuery();
q.setBoost(0);
q.toString() -> *:*^0.0

new BoostQuery(new MatchAllDocsQuery(), 0).toString() -> (*:*)^0.0


> Explore different ways to apply boosts
> --
>
> Key: LUCENE-6590
> URL: https://issues.apache.org/jira/browse/LUCENE-6590
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.4
>
> Attachments: LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, 
> LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch
>
>
> Follow-up from LUCENE-6570: the fact that all queries are mutable in order to 
> allow for applying a boost raises issues since it makes queries bad cache 
> keys since their hashcode can change anytime. We could just document that 
> queries should never be modified after they have gone through IndexSearcher 
> but it would be even better if the API made queries impossible to mutate at 
> all.
> I think there are two main options:
>  - either replace "void setBoost(boost)" with something like "Query 
> withBoost(boost)" which would return a clone that has a different boost
>  - or move boost handling outside of Query, for instance we could have a 
> (immutable) query impl that would be dedicated to applying boosts, that 
> queries that need to change boosts at rewrite time (such as BooleanQuery) 
> would use as a wrapper.
> The latter idea is from Robert and I like it a lot given how often I either 
> introduced or found a bug which was due to the boost parameter being ignored. 
> Maybe there are other options, but I think this is worth exploring.






[jira] [Commented] (LUCENE-6699) Integrate lat/lon BKD and spatial3d

2015-09-24 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906465#comment-14906465
 ] 

Terry Smith commented on LUCENE-6699:
-

Thanks guys. I was hoping to squeeze those x,y,z values into 64 bits instead 
of 96. I'm not a bit twiddler but I'll take a look at Nicholas' patch and see 
if I can adapt it.
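
To make the idea concrete, one way to fit three coordinates into a single long is to quantize each to 21 bits (63 bits total). The sketch below is not the scheme from the attached Geo3DPacking.java or from Nicholas' patch. It assumes each of x, y, z lies in [-1.0, 1.0], as on a unit sphere (a real planet model may exceed that slightly), and the class and method names are hypothetical.

```java
// Hypothetical packing: 21 bits per coordinate, x in the high bits, z in the low bits.
final class XyzPackingSketch {
    static final int BITS = 21;
    static final long MAX = (1L << BITS) - 1; // largest 21-bit value

    // Maps [-1.0, 1.0] onto the integers 0..MAX; rejects NaN and out-of-range input.
    static long encode(double v) {
        if (!(v >= -1.0 && v <= 1.0)) {
            throw new IllegalArgumentException("value out of range: " + v);
        }
        return Math.round((v + 1.0) / 2.0 * MAX);
    }

    // Inverse of encode, up to quantization error.
    static double decode(long q) {
        return q / (double) MAX * 2.0 - 1.0;
    }

    static long pack(double x, double y, double z) {
        return (encode(x) << (2 * BITS)) | (encode(y) << BITS) | encode(z);
    }

    static double[] unpack(long packed) {
        return new double[] {
            decode((packed >>> (2 * BITS)) & MAX),
            decode((packed >>> BITS) & MAX),
            decode(packed & MAX)
        };
    }

    public static void main(String[] args) {
        long packed = pack(0.5, -0.25, 1.0);
        double[] xyz = unpack(packed);
        System.out.println(packed + " -> " + xyz[0] + ", " + xyz[1] + ", " + xyz[2]);
    }
}
```

With 21 bits the quantization step is 2 / (2^21 - 1), so the worst-case error per axis is about 4.8e-7 in unit-sphere terms, or very roughly 3 metres on an Earth-sized sphere. The packed value is always non-negative because only 63 bits are used.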



> Integrate lat/lon BKD and spatial3d
> ---
>
> Key: LUCENE-6699
> URL: https://issues.apache.org/jira/browse/LUCENE-6699
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: Trunk, 5.4
>
> Attachments: Geo3DPacking.java, LUCENE-6699.patch, LUCENE-6699.patch, 
> LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, 
> LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, 
> LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, 
> LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, 
> LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, 
> LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch
>
>
> I'm opening this for discussion, because I'm not yet sure how to do
> this integration, because of my ignorance about spatial in general and
> spatial3d in particular :)
> Our BKD tree impl is very fast at doing lat/lon shape intersection
> (bbox, polygon, soon distance: LUCENE-6698) against previously indexed
> points.
> I think to integrate with spatial3d, we would first need to record
> lat/lon/z into doc values.  Somewhere I saw discussion about how we
> could stuff all 3 into a single long value with acceptable precision
> loss?  Or, we could use BinaryDocValues?  We need all 3 dims available
> to do the fast per-hit query time filtering.
> But, second: what do we index into the BKD tree?  Can we "just" index
> earth surface lat/lon, and then at query time is spatial3d able to
> give me an enclosing "surface lat/lon" bbox for a 3d shape?  Or
> ... must we index all 3 dimensions into the BKD tree (seems like this
> could be somewhat wasteful)?






[jira] [Commented] (LUCENE-6815) Should DisjunctionScorer advance more lazily?

2015-09-24 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906401#comment-14906401
 ] 

Terry Smith commented on LUCENE-6815:
-

Additionally, I don't see DisiPriorityQueue taking the cost of each scorer into 
account. I'd imagine that the scorer with the highest cost is more likely to be 
a hit, which would make this kind of lazy advancing even better.


> Should DisjunctionScorer advance more lazily?
> -
>
> Key: LUCENE-6815
> URL: https://issues.apache.org/jira/browse/LUCENE-6815
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
>
> Today if you call DisjunctionScorer.advance(X), it will try to advance all 
> sub scorers to X. However, if DisjunctionScorer is being intersected with 
> another scorer (which is almost always the case as we use BooleanScorer for 
> top-level disjunctions), we could stop as soon as we find one matching sub 
> scorer, and only advance the remaining sub scorers when freq() or score() is 
> called. 






[jira] [Commented] (LUCENE-6699) Integrate lat/lon BKD and spatial3d

2015-09-23 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904650#comment-14904650
 ] 

Terry Smith commented on LUCENE-6699:
-

Karl, were you able to find that packing scheme? I'm interested in poking the 
x,y,z values into a SortedNumericDocValuesField to see how well it would 
perform.


> Integrate lat/lon BKD and spatial3d
> ---
>
> Key: LUCENE-6699
> URL: https://issues.apache.org/jira/browse/LUCENE-6699
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: Trunk, 5.4
>
> Attachments: Geo3DPacking.java, LUCENE-6699.patch, LUCENE-6699.patch, 
> LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, 
> LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, 
> LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, 
> LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, 
> LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, 
> LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch
>
>
> I'm opening this for discussion, because I'm not yet sure how to do
> this integration, because of my ignorance about spatial in general and
> spatial3d in particular :)
> Our BKD tree impl is very fast at doing lat/lon shape intersection
> (bbox, polygon, soon distance: LUCENE-6698) against previously indexed
> points.
> I think to integrate with spatial3d, we would first need to record
> lat/lon/z into doc values.  Somewhere I saw discussion about how we
> could stuff all 3 into a single long value with acceptable precision
> loss?  Or, we could use BinaryDocValues?  We need all 3 dims available
> to do the fast per-hit query time filtering.
> But, second: what do we index into the BKD tree?  Can we "just" index
> earth surface lat/lon, and then at query time is spatial3d able to
> give me an enclosing "surface lat/lon" bbox for a 3d shape?  Or
> ... must we index all 3 dimensions into the BKD tree (seems like this
> could be somewhat wasteful)?






[jira] [Commented] (LUCENE-6590) Explore different ways to apply boosts

2015-09-15 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745623#comment-14745623
 ] 

Terry Smith commented on LUCENE-6590:
-

[~jpountz]: PhraseQuery is missing a call to ToStringUtils.boost in its 
toString method on the 5.x branch.


> Explore different ways to apply boosts
> --
>
> Key: LUCENE-6590
> URL: https://issues.apache.org/jira/browse/LUCENE-6590
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.4
>
> Attachments: LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, 
> LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch
>
>
> Follow-up from LUCENE-6570: the fact that all queries are mutable in order to 
> allow for applying a boost raises issues since it makes queries bad cache 
> keys since their hashcode can change anytime. We could just document that 
> queries should never be modified after they have gone through IndexSearcher 
> but it would be even better if the API made queries impossible to mutate at 
> all.
> I think there are two main options:
>  - either replace "void setBoost(boost)" with something like "Query 
> withBoost(boost)" which would return a clone that has a different boost
>  - or move boost handling outside of Query, for instance we could have a 
> (immutable) query impl that would be dedicated to applying boosts, that 
> queries that need to change boosts at rewrite time (such as BooleanQuery) 
> would use as a wrapper.
> The latter idea is from Robert and I like it a lot given how often I either 
> introduced or found a bug which was due to the boost parameter being ignored. 
> Maybe there are other options, but I think this is worth exploring.
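
The two options in the description can be sketched together. This is only an illustration under stated assumptions: TermLikeQuery, BoostedQuery and ImmutableBoostDemo are hypothetical names, not Lucene classes, and the sketch shows the general idea of an immutable wrapper whose withBoost returns a new instance so that cache keys never change.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical immutable query; it carries no boost of its own.
final class TermLikeQuery {
  private final String field;
  private final String text;

  TermLikeQuery(String field, String text) {
    this.field = field;
    this.text = text;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof TermLikeQuery)) return false;
    TermLikeQuery other = (TermLikeQuery) o;
    return field.equals(other.field) && text.equals(other.text);
  }

  @Override
  public int hashCode() {
    return Objects.hash(field, text);
  }
}

// Hypothetical immutable wrapper dedicated to applying a boost.
final class BoostedQuery {
  private final TermLikeQuery inner;
  private final float boost;

  BoostedQuery(TermLikeQuery inner, float boost) {
    this.inner = inner;
    this.boost = boost;
  }

  // Returns a new instance instead of mutating this one.
  BoostedQuery withBoost(float newBoost) {
    return new BoostedQuery(inner, newBoost);
  }

  float boost() {
    return boost;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof BoostedQuery)) return false;
    BoostedQuery other = (BoostedQuery) o;
    return inner.equals(other.inner) && Float.compare(boost, other.boost) == 0;
  }

  @Override
  public int hashCode() {
    return Objects.hash(inner, boost);
  }
}

final class ImmutableBoostDemo {
  public static void main(String[] args) {
    Map<BoostedQuery, String> cache = new HashMap<>();
    BoostedQuery original = new BoostedQuery(new TermLikeQuery("body", "lucene"), 1f);
    cache.put(original, "cached result");

    // Changing the boost yields a different key; the cached entry stays reachable.
    BoostedQuery boosted = original.withBoost(2f);
    System.out.println("original still cached: " + cache.containsKey(original));
    System.out.println("boosted variant cached: " + cache.containsKey(boosted));
  }
}
```

Because neither class exposes a setter, a key placed in a hash-based cache cannot change its hashCode afterwards, which is the property the description is after.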






[jira] [Commented] (LUCENE-6590) Explore different ways to apply boosts

2015-09-15 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745675#comment-14745675
 ] 

Terry Smith commented on LUCENE-6590:
-

Also FunctionQuery.


 



> Explore different ways to apply boosts
> --
>
> Key: LUCENE-6590
> URL: https://issues.apache.org/jira/browse/LUCENE-6590
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.4
>
> Attachments: LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, 
> LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch
>
>
> Follow-up from LUCENE-6570: the fact that all queries are mutable in order to 
> allow for applying a boost raises issues since it makes queries bad cache 
> keys since their hashcode can change anytime. We could just document that 
> queries should never be modified after they have gone through IndexSearcher 
> but it would be even better if the API made queries impossible to mutate at 
> all.
> I think there are two main options:
>  - either replace "void setBoost(boost)" with something like "Query 
> withBoost(boost)" which would return a clone that has a different boost
>  - or move boost handling outside of Query, for instance we could have a 
> (immutable) query impl that would be dedicated to applying boosts, that 
> queries that need to change boosts at rewrite time (such as BooleanQuery) 
> would use as a wrapper.
> The latter idea is from Robert and I like it a lot given how often I either 
> introduced or found a bug which was due to the boost parameter being ignored. 
> Maybe there are other options, but I think this is worth exploring.






[jira] [Commented] (LUCENE-6590) Explore different ways to apply boosts

2015-09-15 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745653#comment-14745653
 ] 

Terry Smith commented on LUCENE-6590:
-

Hmm, so is NumericRangeQuery.

> Explore different ways to apply boosts
> --
>
> Key: LUCENE-6590
> URL: https://issues.apache.org/jira/browse/LUCENE-6590
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.4
>
> Attachments: LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, 
> LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch
>
>
> Follow-up from LUCENE-6570: the fact that all queries are mutable in order to 
> allow for applying a boost raises issues since it makes queries bad cache 
> keys since their hashcode can change anytime. We could just document that 
> queries should never be modified after they have gone through IndexSearcher 
> but it would be even better if the API made queries impossible to mutate at 
> all.
> I think there are two main options:
>  - either replace "void setBoost(boost)" with something like "Query 
> withBoost(boost)" which would return a clone that has a different boost
>  - or move boost handling outside of Query, for instance we could have a 
> (immutable) query impl that would be dedicated to applying boosts, that 
> queries that need to change boosts at rewrite time (such as BooleanQuery) 
> would use as a wrapper.
> The latter idea is from Robert and I like it a lot given how often I either 
> introduced or found a bug which was due to the boost parameter being ignored. 
> Maybe there are other options, but I think this is worth exploring.






[jira] [Created] (LUCENE-6806) FunctionQuery.AllScorer.explain overwrites FunctionWeight.queryNorm in trappy fashion

2015-09-15 Thread Terry Smith (JIRA)
Terry Smith created LUCENE-6806:
---

 Summary: FunctionQuery.AllScorer.explain overwrites 
FunctionWeight.queryNorm in trappy fashion
 Key: LUCENE-6806
 URL: https://issues.apache.org/jira/browse/LUCENE-6806
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: Trunk
Reporter: Terry Smith
Priority: Minor


FunctionQuery.AllScorer.explain is:

{code:java}
public Explanation explain(int doc, float queryNorm) throws IOException {
  float sc = qWeight * vals.floatVal(doc);

  return Explanation.match(sc, "FunctionQuery(" + func + "), product of:",
      vals.explain(doc),
      Explanation.match(queryNorm, "boost"),
      Explanation.match(weight.queryNorm = 1f, "queryNorm"));
}
{code}

The following line has a subtle assignment that overwrites weight.queryNorm.

{code:java}
  Explanation.match(weight.queryNorm = 1f, "queryNorm"));
{code}

Because weights aren't reused between search and explain this doesn't break 
anything but it's awfully subtle.

Seeing as queryNorm is ALWAYS 1 here, could we just drop this extra line from 
the explain output and use the following instead?

{code:java}
public Explanation explain(int doc, float queryNorm) throws IOException {
  float sc = qWeight * vals.floatVal(doc);

  return Explanation.match(sc, "FunctionQuery(" + func + "), product of:",
      vals.explain(doc),
      Explanation.match(queryNorm, "boost"));
}
{code}
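
To illustrate why the assignment is easy to miss, here is a minimal, self-contained sketch. WeightState and AssignmentTrapDemo are hypothetical stand-ins, not Lucene classes; the sketch relies only on the Java rule that an assignment expression evaluates to the assigned value.

```java
// Hypothetical stand-in for FunctionWeight; only the mutable field matters here.
final class WeightState {
  float queryNorm = 0.5f;
}

final class AssignmentTrapDemo {
  // Formats a value and a label, much like building an explanation line.
  static String describe(float value, String label) {
    return value + " = " + label;
  }

  static String explainWithSideEffect(WeightState weight) {
    // "weight.queryNorm = 1f" evaluates to 1f AND overwrites the field.
    return describe(weight.queryNorm = 1f, "queryNorm");
  }

  public static void main(String[] args) {
    WeightState weight = new WeightState();
    System.out.println("queryNorm before explain: " + weight.queryNorm);
    System.out.println(explainWithSideEffect(weight));
    System.out.println("queryNorm after explain: " + weight.queryNorm);
  }
}
```

A reader skimming the call site sees what looks like a read of weight.queryNorm, yet the field no longer holds its earlier value once the method returns.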







[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()

2015-09-09 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736932#comment-14736932
 ] 

Terry Smith commented on LUCENE-6785:
-

The original patch drops a few key settings from the BooleanQuery in 
BQ.createWeight; the following patch puts them back and makes the tests happier.

{noformat}
diff --git a/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java b/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java
index fb5f7c8..8dec338 100644
--- a/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java
+++ b/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java
@@ -210,7 +210,9 @@ public class BooleanQuery extends Query implements Iterable<BooleanClause> {
 }
 
 List<Weight> subweights = new ArrayList<>();
-Builder builder = new Builder();
+Builder builder = new Builder()
+  .setDisableCoord(disableCoord)
+  .setMinimumNumberShouldMatch(minimumNumberShouldMatch);
 for (BooleanClause clause : query) {
   Weight w = searcher.createWeight(clause.getQuery(), needsScores);
   subweights.add(w);
{noformat}


> Consider merging Query.rewrite() into Query.createWeight()
> --
>
> Key: LUCENE-6785
> URL: https://issues.apache.org/jira/browse/LUCENE-6785
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
> Attachments: LUCENE-6785.patch
>
>
> Prompted by the discussion on LUCENE-6590.
> Query.rewrite() is a bit of an oddity.  You call it to create a query for a 
> specific IndexSearcher, and to ensure that you get a query implementation 
> that has a working createWeight() method.  However, Weight itself already 
> encapsulates the notion of a per-searcher query.
> You also need to repeatedly call rewrite() until the query has stopped 
> rewriting itself, which is a bit trappy - there are a few places (in 
> highlighting code for example) that just call rewrite() once, rather than 
> looping round as IndexSearcher.rewrite() does.  Most queries don't need to be 
> called multiple times, however, so this seems a bit redundant.  And the ones 
> that do currently return un-rewritten queries can be changed simply enough to 
> rewrite them.
> Finally, in pretty much every case I can find in the codebase, rewrite() is 
> called purely as a prelude to createWeight().  This means, in the case of for 
> example large BooleanQueries, we end up cloning the whole query structure, 
> only to throw it away immediately.
> I'd like to try removing rewrite() entirely, and merging the logic into 
> createWeight(), simplifying the API and removing the trap where code only 
> calls rewrite once.  What do people think?
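
The "call rewrite() once" trap from the description can be sketched as follows. The types here (RewritableQuery, PrimitiveQuery, WrapperQuery) are hypothetical and much simpler than Lucene's Query; the loop only mirrors the idea of rewriting until the query returns itself.

```java
// Hypothetical minimal query type: rewrite() returns itself once fully rewritten.
interface RewritableQuery {
  RewritableQuery rewrite();
}

// A query that needs no further rewriting.
final class PrimitiveQuery implements RewritableQuery {
  @Override
  public RewritableQuery rewrite() {
    return this;
  }
}

// A query whose rewrite() only peels off one layer per call.
final class WrapperQuery implements RewritableQuery {
  private final RewritableQuery inner;

  WrapperQuery(RewritableQuery inner) {
    this.inner = inner;
  }

  @Override
  public RewritableQuery rewrite() {
    return inner;
  }
}

final class RewriteLoopDemo {
  // Keeps rewriting until a call returns the same instance it was invoked on.
  static RewritableQuery rewriteFully(RewritableQuery query) {
    RewritableQuery current = query;
    for (RewritableQuery next = current.rewrite(); next != current; next = current.rewrite()) {
      current = next;
    }
    return current;
  }

  public static void main(String[] args) {
    RewritableQuery nested = new WrapperQuery(new WrapperQuery(new PrimitiveQuery()));
    // A single call leaves one wrapper layer in place.
    System.out.println("after one rewrite: " + nested.rewrite().getClass().getSimpleName());
    System.out.println("after looping: " + rewriteFully(nested).getClass().getSimpleName());
  }
}
```

Code that calls rewrite() a single time on the nested query is left holding a WrapperQuery, which is the situation the proposal wants to make impossible.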






[jira] [Updated] (LUCENE-6787) BooleanQuery should be able to drop duplicate non-scoring clauses

2015-09-09 Thread Terry Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Smith updated LUCENE-6787:

Attachment: LUCENE-6787.patch

Absolutely, updated patch attached.


> BooleanQuery should be able to drop duplicate non-scoring clauses
> -
>
> Key: LUCENE-6787
> URL: https://issues.apache.org/jira/browse/LUCENE-6787
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: Trunk
>    Reporter: Terry Smith
>Priority: Minor
> Attachments: LUCENE-6787.patch, LUCENE-6787.patch
>
>
> Pulling out of the discussion on LUCENE-6305.
> BooleanQuery could drop duplicate non-scoring (MUST_NOT, FILTER) clauses.






[jira] [Updated] (LUCENE-6787) BooleanQuery should be able to drop duplicate non-scoring clauses

2015-09-09 Thread Terry Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Smith updated LUCENE-6787:

Attachment: LUCENE-6787-on-6785.patch

Here is an alternate patch applied after LUCENE-6785.


> BooleanQuery should be able to drop duplicate non-scoring clauses
> -
>
> Key: LUCENE-6787
> URL: https://issues.apache.org/jira/browse/LUCENE-6787
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: Trunk
>    Reporter: Terry Smith
>Priority: Minor
> Attachments: LUCENE-6787-on-6785.patch, LUCENE-6787.patch, 
> LUCENE-6787.patch
>
>
> Pulling out of the discussion on LUCENE-6305.
> BooleanQuery could drop duplicate non-scoring (MUST_NOT, FILTER) clauses.






[jira] [Created] (LUCENE-6787) BooleanQuery should be able to drop duplicate non-scoring clauses

2015-09-08 Thread Terry Smith (JIRA)
Terry Smith created LUCENE-6787:
---

 Summary: BooleanQuery should be able to drop duplicate non-scoring 
clauses
 Key: LUCENE-6787
 URL: https://issues.apache.org/jira/browse/LUCENE-6787
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: Trunk
Reporter: Terry Smith
Priority: Minor


Pulling out of the discussion on LUCENE-6305.

BooleanQuery could drop duplicate non-scoring (MUST_NOT, FILTER) clauses.
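
A rough sketch of the idea, assuming a simplified clause model: Occur, Clause and ClauseDeduplicator below are hypothetical and do not use Lucene's BooleanClause API. Duplicates are dropped only for FILTER and MUST_NOT, since repeating a scoring clause changes the score.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

enum Occur { MUST, SHOULD, FILTER, MUST_NOT }

// Hypothetical clause: a string stands in for a Query with proper equals/hashCode.
final class Clause {
  final String query;
  final Occur occur;

  Clause(String query, Occur occur) {
    this.query = query;
    this.occur = occur;
  }

  boolean isScoring() {
    return occur == Occur.MUST || occur == Occur.SHOULD;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof Clause)) return false;
    Clause other = (Clause) o;
    return query.equals(other.query) && occur == other.occur;
  }

  @Override
  public int hashCode() {
    return Objects.hash(query, occur);
  }
}

final class ClauseDeduplicator {
  // Keeps every scoring clause and only the first copy of each non-scoring clause.
  static List<Clause> dropDuplicateNonScoring(List<Clause> clauses) {
    Set<Clause> seenNonScoring = new HashSet<>();
    List<Clause> result = new ArrayList<>();
    for (Clause clause : clauses) {
      if (clause.isScoring() || seenNonScoring.add(clause)) {
        result.add(clause);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Clause> clauses = new ArrayList<>();
    clauses.add(new Clause("title:lucene", Occur.SHOULD));
    clauses.add(new Clause("title:lucene", Occur.SHOULD));
    clauses.add(new Clause("type:doc", Occur.FILTER));
    clauses.add(new Clause("type:doc", Occur.FILTER));
    clauses.add(new Clause("deleted:true", Occur.MUST_NOT));
    clauses.add(new Clause("deleted:true", Occur.MUST_NOT));
    System.out.println("clauses kept: " + dropDuplicateNonScoring(clauses).size());
  }
}
```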







[jira] [Updated] (LUCENE-6679) Filter's Weight.explain returns an explanation with isMatch==true even on documents that don't match

2015-09-08 Thread Terry Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Smith updated LUCENE-6679:

Attachment: LUCENE-6679.patch

Here is a patch (against trunk) that adds test coverage for explanations on 
hits only.

I'm looking for feedback on the approach used before expanding to cover 
explanations for misses.

Currently I get a couple of failures when running just the Lucene tests:

{noformat}
Tests with failures:
  - org.apache.lucene.search.TestSortRandom.testRandomStringValSort
  - org.apache.lucene.search.TestSortRandom.testRandomStringSort


JVM J0: 1.42 ..   284.75 =   283.33s
JVM J1: 1.64 ..   284.77 =   283.13s
JVM J2: 1.42 ..   284.70 =   283.28s
JVM J3: 1.42 ..   284.68 =   283.26s
Execution time total: 4 minutes 44 seconds
Tests summary: 404 suites, 3235 tests, 2 failures, 104 ignored (100 assumptions)
{noformat}

Happy to dig into these more once an approach has been found that people like.


> Filter's Weight.explain returns an explanation with isMatch==true even on 
> documents that don't match
> 
>
> Key: LUCENE-6679
> URL: https://issues.apache.org/jira/browse/LUCENE-6679
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
> Attachments: LUCENE-6679.patch
>
>
> This was reported by Trejkaz on the java-user list: 
> http://search-lucene.com/m/l6pAi19h4Y3DclgB1&subj=Re+What+on+earth+is+FilteredQuery+explain+doing+






[jira] [Updated] (LUCENE-6787) BooleanQuery should be able to drop duplicate non-scoring clauses

2015-09-08 Thread Terry Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Smith updated LUCENE-6787:

Attachment: LUCENE-6787.patch

Here is a patch based on [~jpountz]'s suggestion of putting this optimization 
in BooleanQuery.rewrite().


> BooleanQuery should be able to drop duplicate non-scoring clauses
> -
>
> Key: LUCENE-6787
> URL: https://issues.apache.org/jira/browse/LUCENE-6787
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: Trunk
>    Reporter: Terry Smith
>Priority: Minor
> Attachments: LUCENE-6787.patch
>
>
> Pulling out of the discussion on LUCENE-6305.
> BooleanQuery could drop duplicate non-scoring (MUST_NOT, FILTER) clauses.






[jira] [Commented] (LUCENE-6758) Adding a SHOULD clause to a BQ over an empty field clears the score when using DefaultSimilarity

2015-09-08 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734796#comment-14734796
 ] 

Terry Smith commented on LUCENE-6758:
-

Ah, you've changed DefaultSimilarity.idf() to use (docCount + 1) instead of 
just docCount, forcing it to be larger than 0.

That looks like a great fix, thanks.
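
The arithmetic behind this can be sketched in a few lines. The "before" formula below is an assumption reconstructed from the idf(docFreq=1, docCount=1) = 0.30685282 value in the explain output on this issue, and the queryNorm helper is a simplified 1/sqrt(sum of squared weights); IdfSketch is a hypothetical class, not code from the patch.

```java
final class IdfSketch {
  // Assumed pre-fix formula: 1 + ln(docCount / (docFreq + 1)).
  static float idfBefore(long docFreq, long docCount) {
    return (float) (Math.log(docCount / (double) (docFreq + 1)) + 1.0);
  }

  // With docCount + 1 the argument of the logarithm can no longer be zero.
  static float idfAfter(long docFreq, long docCount) {
    return (float) (Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0);
  }

  // Simplified normalization: 1 / sqrt(sum of squared clause weights).
  static float queryNorm(float sumOfSquaredWeights) {
    return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
  }

  public static void main(String[] args) {
    float hit = idfBefore(1, 1);    // term present in the populated field
    float miss = idfBefore(0, 1);   // term absent from a populated field
    float empty = idfBefore(0, 0);  // term in a field no document uses

    System.out.println("idf hit: " + hit);
    System.out.println("idf on empty field (before): " + empty);
    System.out.println("idf on empty field (after): " + idfAfter(0, 0));
    System.out.println("queryNorm hit+miss: " + queryNorm(hit * hit + miss * miss));
    System.out.println("queryNorm hit+empty: " + queryNorm(hit * hit + empty * empty));
  }
}
```

Under these assumptions the empty field yields an infinite idf, the sum of squared weights becomes infinite, and queryNorm collapses to zero, which is consistent with the 0.0 = queryNorm line in the failing explain output.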


> Adding a SHOULD clause to a BQ over an empty field clears the score when 
> using DefaultSimilarity
> 
>
> Key: LUCENE-6758
> URL: https://issues.apache.org/jira/browse/LUCENE-6758
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: Trunk
>    Reporter: Terry Smith
> Attachments: LUCENE-6758.patch, LUCENE-6758.patch
>
>
> Patch with unit test to show the bug will be attached.
> I've narrowed this change in behavior with git bisect to the following commit:
> {noformat}
> commit 698b4b56f0f2463b21c9e3bc67b8b47d635b7d1f
> Author: Robert Muir <rm...@apache.org>
> Date:   Thu Aug 13 17:37:15 2015 +
> LUCENE-6711: Use CollectionStatistics.docCount() for IDF and average 
> field length computations
> 
> git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1695744 
> 13f79535-47bb-0310-9956-ffa450edef68
> {noformat}






[jira] [Updated] (LUCENE-6758) Adding a SHOULD clause to a BQ over an empty field clears the score when using DefaultSimilarity

2015-08-21 Thread Terry Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Smith updated LUCENE-6758:

Attachment: LUCENE-6758.patch

Run this unit test a few times and you'll hit a failure when DefaultSimilarity 
is picked.

The method testBQHitOrEmpty() will fail because the score is zero. Its friend 
testBQHitOrMiss() has a non-zero score.

The difference between the two is that the field empty is unused, whereas the 
field test has one token (hit).


> Adding a SHOULD clause to a BQ over an empty field clears the score when 
> using DefaultSimilarity
> 
>
> Key: LUCENE-6758
> URL: https://issues.apache.org/jira/browse/LUCENE-6758
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: Trunk
>Reporter: Terry Smith
> Attachments: LUCENE-6758.patch
>
>
> Patch with unit test to show the bug will be attached.
> I've narrowed this change in behavior with git bisect to the following commit:
> {noformat}
> commit 698b4b56f0f2463b21c9e3bc67b8b47d635b7d1f
> Author: Robert Muir <rm...@apache.org>
> Date:   Thu Aug 13 17:37:15 2015 +
> LUCENE-6711: Use CollectionStatistics.docCount() for IDF and average 
> field length computations
> 
> git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1695744 
> 13f79535-47bb-0310-9956-ffa450edef68
> {noformat}






[jira] [Commented] (LUCENE-6758) Adding a SHOULD clause to a BQ over an empty field clears the score when using DefaultSimilarity

2015-08-21 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706963#comment-14706963
 ] 

Terry Smith commented on LUCENE-6758:
-

Explain output for the failing query (testBQHitOrEmpty):

{noformat}
0.0 = product of:
  0.0 = sum of:
    0.0 = weight(test:hit in 0) [DefaultSimilarity], result of:
      0.0 = score(doc=0,freq=1.0), product of:
        0.0 = queryWeight, product of:
          0.30685282 = idf(docFreq=1, docCount=1)
          0.0 = queryNorm
        0.30685282 = fieldWeight in 0, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          0.30685282 = idf(docFreq=1, docCount=1)
          1.0 = fieldNorm(doc=0)
  0.5 = coord(1/2)
{noformat}

Explain output for the variant against a populated field  (testBQHitOrMiss):

{noformat}
0.04500804 = product of:
  0.09001608 = sum of:
    0.09001608 = weight(test:hit in 0) [DefaultSimilarity], result of:
      0.09001608 = score(doc=0,freq=1.0), product of:
        0.29335263 = queryWeight, product of:
          0.30685282 = idf(docFreq=1, docCount=1)
          0.9560043 = queryNorm
        0.30685282 = fieldWeight in 0, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          0.30685282 = idf(docFreq=1, docCount=1)
          1.0 = fieldNorm(doc=0)
  0.5 = coord(1/2)
{noformat}



> Adding a SHOULD clause to a BQ over an empty field clears the score when 
> using DefaultSimilarity
> 
>
> Key: LUCENE-6758
> URL: https://issues.apache.org/jira/browse/LUCENE-6758
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: Trunk
>Reporter: Terry Smith
> Attachments: LUCENE-6758.patch
>
>
> Patch with unit test to show the bug will be attached.
> I've narrowed this change in behavior with git bisect to the following commit:
> {noformat}
> commit 698b4b56f0f2463b21c9e3bc67b8b47d635b7d1f
> Author: Robert Muir <rm...@apache.org>
> Date:   Thu Aug 13 17:37:15 2015 +
> LUCENE-6711: Use CollectionStatistics.docCount() for IDF and average 
> field length computations
> 
> git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1695744 
> 13f79535-47bb-0310-9956-ffa450edef68
> {noformat}






[jira] [Created] (LUCENE-6758) Adding a SHOULD clause to a BQ over an empty field clears the score when using DefaultSimilarity

2015-08-21 Thread Terry Smith (JIRA)
Terry Smith created LUCENE-6758:
---

 Summary: Adding a SHOULD clause to a BQ over an empty field clears 
the score when using DefaultSimilarity
 Key: LUCENE-6758
 URL: https://issues.apache.org/jira/browse/LUCENE-6758
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: Trunk
Reporter: Terry Smith


Patch with unit test to show the bug will be attached.

I've narrowed this change in behavior with git bisect to the following commit:

{noformat}
commit 698b4b56f0f2463b21c9e3bc67b8b47d635b7d1f
Author: Robert Muir <rm...@apache.org>
Date:   Thu Aug 13 17:37:15 2015 +

LUCENE-6711: Use CollectionStatistics.docCount() for IDF and average field 
length computations

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1695744 
13f79535-47bb-0310-9956-ffa450edef68
{noformat}







[jira] [Commented] (LUCENE-6748) The query cache should not cache trivial queries

2015-08-20 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705045#comment-14705045
 ] 

Terry Smith commented on LUCENE-6748:
-

I'd add a case to the patch to include empty DisjunctionMaxQuery instances also.


> The query cache should not cache trivial queries
> 
>
> Key: LUCENE-6748
> URL: https://issues.apache.org/jira/browse/LUCENE-6748
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-6748.patch
>
>
> The query cache already avoids caching term queries because they are cheap, 
> but it doesn't do it with even cheaper queries like MatchAllDocsQuery.






[jira] [Commented] (LUCENE-6531) Make PhraseQuery immutable

2015-07-30 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647690#comment-14647690
 ] 

Terry Smith commented on LUCENE-6531:
-

[~jpountz] The PhraseQuery.Builder setter methods are all void, whereas the 
ones for BooleanQuery and BlendedTermQuery return the Builder itself.

Can the set/add methods on PhraseQuery.Builder return this to make the various 
Query builders consistent with each other?
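
For illustration, this is the chaining style being asked for. PhraseBuilderSketch is a hypothetical builder, not PhraseQuery.Builder itself, and its build() output format is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder whose mutators return this so calls can be chained.
final class PhraseBuilderSketch {
  private final List<String> terms = new ArrayList<>();
  private int slop;

  PhraseBuilderSketch setSlop(int slop) {
    this.slop = slop;
    return this;
  }

  PhraseBuilderSketch add(String term) {
    terms.add(term);
    return this;
  }

  // Renders the collected state; a real builder would return a query object.
  String build() {
    return "\"" + String.join(" ", terms) + "\"~" + slop;
  }

  public static void main(String[] args) {
    String phrase = new PhraseBuilderSketch()
        .setSlop(2)
        .add("quick")
        .add("fox")
        .build();
    System.out.println(phrase);
  }
}
```

With void setters the same construction needs a local variable and one statement per call, which is the inconsistency with the other builders noted above.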


> Make PhraseQuery immutable
> --
>
> Key: LUCENE-6531
> URL: https://issues.apache.org/jira/browse/LUCENE-6531
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 5.3, 6.0
>
> Attachments: LUCENE-6531.patch, LUCENE-6531.patch
>
>
> Mutable queries are an issue for automatic filter caching since modifying a 
> query after it has been put into the cache will corrupt the cache. We should 
> make all queries immutable (up to the boost) to avoid this issue.






[jira] [Commented] (LUCENE-6531) Make PhraseQuery immutable

2015-07-30 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647701#comment-14647701
 ] 

Terry Smith commented on LUCENE-6531:
-

Awesome, you rock!


> Make PhraseQuery immutable
> --
>
> Key: LUCENE-6531
> URL: https://issues.apache.org/jira/browse/LUCENE-6531
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 5.3, 6.0
>
> Attachments: LUCENE-6531.patch, LUCENE-6531.patch
>
>
> Mutable queries are an issue for automatic filter caching since modifying a 
> query after it has been put into the cache will corrupt the cache. We should 
> make all queries immutable (up to the boost) to avoid this issue.






[jira] [Commented] (LUCENE-6590) Explore different ways to apply boosts

2015-07-23 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639166#comment-14639166
 ] 

Terry Smith commented on LUCENE-6590:
-

I think this looks great and will certainly make the boost handling more robust 
in my custom queries. Especially looking forward to fully immutable queries.

What do you think is possible in terms of updating 5.x to make the transition 
easier?


> Explore different ways to apply boosts
> --
>
> Key: LUCENE-6590
> URL: https://issues.apache.org/jira/browse/LUCENE-6590
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch
>
>
> Follow-up from LUCENE-6570: the fact that all queries are mutable in order to 
> allow for applying a boost raises issues since it makes queries bad cache 
> keys since their hashcode can change anytime. We could just document that 
> queries should never be modified after they have gone through IndexSearcher 
> but it would be even better if the API made queries impossible to mutate at 
> all.
> I think there are two main options:
>  - either replace "void setBoost(boost)" with something like "Query 
> withBoost(boost)" which would return a clone that has a different boost
>  - or move boost handling outside of Query, for instance we could have a 
> (immutable) query impl that would be dedicated to applying boosts, that 
> queries that need to change boosts at rewrite time (such as BooleanQuery) 
> would use as a wrapper.
> The latter idea is from Robert and I like it a lot given how often I either 
> introduced or found a bug which was due to the boost parameter being ignored. 
> Maybe there are other options, but I think this is worth exploring.






[jira] [Commented] (LUCENE-6679) Filter's Weight.explain returns an explanation with isMatch==true even on documents that don't match

2015-07-16 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629956#comment-14629956
 ] 

Terry Smith commented on LUCENE-6679:
-

Trejkaz confirmed the patch referenced from the mailing list works.

This bug is fixed as a side effect of LUCENE-6601 so will automatically be 
fixed as part of release 5.3.

I'll work on cleaning up the new test contributed by Trejkaz for inclusion and 
then move onto a more generic hook to catch other explanation mistakes.



> Filter's Weight.explain returns an explanation with isMatch==true even on 
> documents that don't match
> 
>
> Key: LUCENE-6679
> URL: https://issues.apache.org/jira/browse/LUCENE-6679
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>
> This was reported by Trejkaz on the java-user list: 
> http://search-lucene.com/m/l6pAi19h4Y3DclgB1&subj=Re+What+on+earth+is+FilteredQuery+explain+doing+






[jira] [Commented] (LUCENE-6679) Filter's Weight.explain returns an explanation with isMatch==true even on documents that don't match

2015-07-15 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628153#comment-14628153
 ] 

Terry Smith commented on LUCENE-6679:
-

[~jpountz] Absolutely, I'd love to give it a stab. Currently waiting for 
feedback from TX on the users list.

I think you are spot on about adding some additional testing to the test suite 
to catch explanation mismatches. I'll take a peek at that also and see if I can 
figure out something worth submitting.


> Filter's Weight.explain returns an explanation with isMatch==true even on 
> documents that don't match
> ---
>
> Key: LUCENE-6679
> URL: https://issues.apache.org/jira/browse/LUCENE-6679
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>
> This was reported by Trejkaz on the java-user list: 
> http://search-lucene.com/m/l6pAi19h4Y3DclgB1&subj=Re+What+on+earth+is+FilteredQuery+explain+doing+






[jira] [Commented] (LUCENE-6661) Allow queries to opt out of caching

2015-07-10 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622271#comment-14622271
 ] 

Terry Smith commented on LUCENE-6661:
-

I agree that we shouldn't base APIs on already hacky solutions.

I'm going to play with your suggestion a little more and see how it pans out 
for my usecases, will report back.

The ring buffer frequency for non-cacheable queries issue is interesting. If in 
some obscure but easy to understand scenario half of my queries are good cache 
candidates but the other half are never to be cached (using the 
Weight.getQuery() equals busting method) then the ring buffer will be a lot 
less effective at finding new cache candidates purely based on the churn of 
never-to-be-cached queries. Still, I can see why that might also be a good 
thing, it all depends on your definition of frequently used.
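
The churn effect can be shown with a toy model. This is only a sketch under stated assumptions: RecentQueryBuffer is a hypothetical fixed-size history of recently seen queries, far simpler than whatever the real caching policy does, and the capacity of 8 is arbitrary.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical fixed-size history of recently seen queries.
final class RecentQueryBuffer {
  private final int capacity;
  private final Deque<Object> recent = new ArrayDeque<>();

  RecentQueryBuffer(int capacity) {
    this.capacity = capacity;
  }

  // Records a query, evicting the oldest entry once the buffer is full.
  void record(Object query) {
    if (recent.size() == capacity) {
      recent.removeFirst();
    }
    recent.addLast(query);
  }

  // Counts how many times the query appears in the remembered history.
  int frequency(Object query) {
    int count = 0;
    for (Object seen : recent) {
      if (seen.equals(query)) {
        count++;
      }
    }
    return count;
  }
}

final class RingBufferChurnDemo {
  // Uses one good candidate `rounds` times, each use followed by
  // `noisePerRound` distinct never-to-be-cached queries.
  static int frequencyWithNoise(int capacity, int rounds, int noisePerRound) {
    RecentQueryBuffer buffer = new RecentQueryBuffer(capacity);
    int noiseId = 0;
    for (int i = 0; i < rounds; i++) {
      buffer.record("good");
      for (int n = 0; n < noisePerRound; n++) {
        buffer.record("noise-" + noiseId++);
      }
    }
    return buffer.frequency("good");
  }

  public static void main(String[] args) {
    System.out.println("no noise: " + frequencyWithNoise(8, 8, 0));
    System.out.println("half noise: " + frequencyWithNoise(8, 8, 1));
    System.out.println("mostly noise: " + frequencyWithNoise(8, 8, 3));
  }
}
```

In this model the observed frequency of the good candidate drops as the share of never-cached queries grows, even though it is used just as often, so a fixed frequency threshold becomes harder to reach.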

Where would be the best place to expand this discussion to include score based 
caching? A new Jira, one of the mailing lists?




 Allow queries to opt out of caching
 ---

 Key: LUCENE-6661
 URL: https://issues.apache.org/jira/browse/LUCENE-6661
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 5.2
Reporter: Terry Smith
Priority: Minor
 Attachments: LUCENE-6661.patch


 Some queries have out-of-band dependencies that make them incompatible with 
 caching, it'd be great if they could opt out of the new fancy query/filter 
 cache in IndexSearcher.
 This affects DrillSidewaysQuery and any user-provided custom Query 
 implementations.






[jira] [Commented] (LUCENE-6661) Allow queries to opt out of caching

2015-07-07 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616723#comment-14616723
 ] 

Terry Smith commented on LUCENE-6661:
-

I'd completely missed the issue with marker interfaces, this really ought to be 
a method on Weight itself, perhaps Weight.cacheCompatible().

Your suggested workaround sounds a little special casey. I'd be concerned that 
in a future release something would change in such a way that the workaround 
would be lost with no alternative. Specifically, it relies on the cache 
implementation tracking usage when the cache itself is pluggable (it could be 
replaced with one that does not), and when LRUQueryCache itself is in play I 
see the following issues:

1) The queries that we know ahead of time should never be cached would still 
take up room in the ring buffer and thus push aside other less frequent queries 
that could be great cache candidates.

2)  Special care would want to be taken over the Query instances used in the 
ring buffer and cache so that things like dependent FacetCollectors don't get 
added and bloat memory usage. You described earlier how to handle this from 
createWeight().

3) CachingWrapperWeight forces the cached query to use scorer() instead of 
bulkScorer(). Both my custom query and DrillSidewaysQuery implement a custom 
bulkScorer() method and throw an UnsupportedOperationException from scorer(). 
They break when wrapped in a CachingWrapperWeight. The ability to opt out of 
caching would remove the need for the hacky workaround in DrillSideways.

My current solution is a custom QueryCache implementation that just delegates 
to the LRUQueryCache and does not propagate doCache() for some Weights. 
However, this has the same problem with wrapped queries as the marker interface 
scenario.
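
The shape of that delegating cache is roughly as follows. This is only a 
sketch: SimpleWeight, SimpleQueryCache, NeverCache, RecordingCache and 
OptOutQueryCache are hypothetical stand-ins so the example compiles on its 
own, and they do not match the real Lucene signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for Lucene's Weight and QueryCache; not the real signatures.
interface SimpleWeight {
  String name();
}

interface SimpleQueryCache {
  SimpleWeight doCache(SimpleWeight weight);
}

// Hypothetical marker for weights that must never be cached.
interface NeverCache {}

// Plays the role of LRUQueryCache: wraps every weight it is given.
class RecordingCache implements SimpleQueryCache {
  final List<String> wrapped = new ArrayList<>();

  @Override
  public SimpleWeight doCache(SimpleWeight weight) {
    wrapped.add(weight.name());
    return () -> "cached(" + weight.name() + ")";
  }
}

// Delegates to another cache unless the weight opts out.
class OptOutQueryCache implements SimpleQueryCache {
  private final SimpleQueryCache delegate;

  OptOutQueryCache(SimpleQueryCache delegate) {
    this.delegate = delegate;
  }

  @Override
  public SimpleWeight doCache(SimpleWeight weight) {
    if (weight instanceof NeverCache) {
      return weight; // returned unwrapped, so the delegate never sees it
    }
    return delegate.doCache(weight);
  }
}

class PlainWeight implements SimpleWeight {
  private final String name;

  PlainWeight(String name) {
    this.name = name;
  }

  @Override
  public String name() {
    return name;
  }
}

class OptOutWeight extends PlainWeight implements NeverCache {
  OptOutWeight(String name) {
    super(name);
  }
}
```

A PlainWeight that merely wraps an OptOutWeight would not pass the instanceof 
check, which is the wrapped-query limitation described above.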





 Allow queries to opt out of caching
 ---

 Key: LUCENE-6661
 URL: https://issues.apache.org/jira/browse/LUCENE-6661
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 5.2
Reporter: Terry Smith
Priority: Minor
 Attachments: LUCENE-6661.patch


 Some queries have out-of-band dependencies that make them incompatible with 
 caching, it'd be great if they could opt out of the new fancy query/filter 
 cache in IndexSearcher.
 This affects DrillSidewaysQuery and any user-provided custom Query 
 implementations.






[jira] [Created] (LUCENE-6661) Allow queries to opt out of caching

2015-07-06 Thread Terry Smith (JIRA)
Terry Smith created LUCENE-6661:
---

 Summary: Allow queries to opt out of caching
 Key: LUCENE-6661
 URL: https://issues.apache.org/jira/browse/LUCENE-6661
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 5.2
Reporter: Terry Smith
Priority: Minor


Some queries have out-of-band dependencies that make them incompatible with 
caching, it'd be great if they could opt out of the new fancy query/filter 
cache in IndexSearcher.

This affects DrillSidewaysQuery and any user-provided custom Query 
implementations.







[jira] [Updated] (LUCENE-6661) Allow queries to opt out of caching

2015-07-06 Thread Terry Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Smith updated LUCENE-6661:

Attachment: LUCENE-6661.patch

Rather than add a new method to Query/Weight, I've added a small 
marker interface and an instanceof check to prototype this feature.

If this is of interest we should decide whether Query, Weight, or both could 
implement this interface to disable caching.


 Allow queries to opt out of caching
 ---

 Key: LUCENE-6661
 URL: https://issues.apache.org/jira/browse/LUCENE-6661
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 5.2
Reporter: Terry Smith
Priority: Minor
 Attachments: LUCENE-6661.patch


 Some queries have out-of-band dependencies that make them incompatible with 
 caching, it'd be great if they could opt out of the new fancy query/filter 
 cache in IndexSearcher.
 This affects DrillSidewaysQuery and any user-provided custom Query 
 implementations.






[jira] [Commented] (LUCENE-6639) LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first scorer is skipped

2015-07-06 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615087#comment-14615087
 ] 

Terry Smith commented on LUCENE-6639:
-

Ah, I didn't realize the highlighters were creating the weights to extract the 
terms, that makes sense.

I like the idea of just calling onUse() the first time scorer() is called, that 
ought to be more robust and is very easy to understand.
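
For reference, a minimal sketch of that idea, assuming a per-weight flag 
guards the call. UsagePolicy and OnceTrackingWeight are hypothetical 
stand-ins, not the Lucene classes, and the segment is passed as a plain int.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the caching policy: it only counts onUse() calls.
class UsagePolicy {
  final AtomicInteger uses = new AtomicInteger();

  void onUse(String query) {
    uses.incrementAndGet();
  }
}

// Hypothetical stand-in for CachingWrapperWeight.
class OnceTrackingWeight {
  private final UsagePolicy policy;
  private final String query;
  // Flipped by whichever scorer() call arrives first, on any segment and any thread.
  private final AtomicBoolean used = new AtomicBoolean(false);

  OnceTrackingWeight(UsagePolicy policy, String query) {
    this.policy = policy;
    this.query = query;
  }

  // segmentOrd is accepted only to show that it no longer matters.
  void scorer(int segmentOrd) {
    if (used.compareAndSet(false, true)) {
      policy.onUse(query);
    }
    // ... the real scorer would be built and returned here ...
  }
}
```

Because the flag lives on the weight rather than on segment 0, it makes no 
difference whether the first segment is skipped or which thread gets there 
first.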


 LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first 
 scorer is skipped
 

 Key: LUCENE-6639
 URL: https://issues.apache.org/jira/browse/LUCENE-6639
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 5.3
Reporter: Terry Smith
Priority: Minor
 Attachments: LUCENE-6639.patch


 The method 
 {{org.apache.lucene.search.LRUQueryCache.CachingWrapperWeight.scorer(LeafReaderContext)}}
  starts with
 {code}
 if (context.ord == 0) {
 policy.onUse(getQuery());
 }
 {code}
 which can result in a missed call for queries that return a null scorer for 
 the first segment.






[jira] [Commented] (LUCENE-6639) LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first scorer is skipped

2015-06-30 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14608281#comment-14608281
 ] 

Terry Smith commented on LUCENE-6639:
-

This doesn't seem pressing but irked me enough to submit a ticket. It feels 
that we should be able to be more correct but the current API isn't very 
supportive of that work flow.

I slightly prefer calling onUse() from createWeight() as it does make this edge 
case of the first segment go away which I feel is harder to reason about than 
someone creating a weight and not using it. The improved multi-threaded search 
code in IndexSearcher is a great example of this misbehaving where there is no 
guarantee that the first segment's Weight.scorer() will be called before the 
other segments. However I'm not familiar with use cases that use 
Query.createWeight() without executing some kind of search or explain to know 
if they are more of an issue.

Is adding bookend methods to more correctly detect the begin/end of the search 
phase seen as too messy and special casey?

At the end of the day I also wonder if it's worth the complexity but wanted to 
open this ticket to bootstrap the discussion as this could be a hard problem to 
diagnose in the future (someone wants to know why their query isn't getting 
cached and it's due to some obscure detail like this).





 LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first 
 scorer is skipped
 

 Key: LUCENE-6639
 URL: https://issues.apache.org/jira/browse/LUCENE-6639
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 5.3
Reporter: Terry Smith
Priority: Minor
 Attachments: LUCENE-6639.patch


 The method 
 {{org.apache.lucene.search.LRUQueryCache.CachingWrapperWeight.scorer(LeafReaderContext)}}
  starts with
 {code}
 if (context.ord == 0) {
 policy.onUse(getQuery());
 }
 {code}
 which can result in a missed call for queries that return a null scorer for 
 the first segment.






[jira] [Updated] (LUCENE-6639) LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first scorer is skipped

2015-06-29 Thread Terry Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Smith updated LUCENE-6639:

Attachment: LUCENE-6639.patch

Attached unit test will fail if the extra IndexWriter.commit() gets triggered 
or the BooleanQuery clauses are shuffled to make the first clause's scorer null 
for the first segment.


 LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first 
 scorer is skipped
 

 Key: LUCENE-6639
 URL: https://issues.apache.org/jira/browse/LUCENE-6639
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 5.3
Reporter: Terry Smith
Priority: Minor
 Attachments: LUCENE-6639.patch


 The method 
 {{org.apache.lucene.search.LRUQueryCache.CachingWrapperWeight.scorer(LeafReaderContext)}}
  starts with
 {code}
 if (context.ord == 0) {
 policy.onUse(getQuery());
 }
 {code}
 which can result in a missed call for queries that return a null scorer for 
 the first segment.






[jira] [Created] (LUCENE-6639) LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first scorer is skipped

2015-06-29 Thread Terry Smith (JIRA)
Terry Smith created LUCENE-6639:
---

 Summary: LRUQueryCache.CachingWrapperWeight not calling 
policy.onUse() if the first scorer is skipped
 Key: LUCENE-6639
 URL: https://issues.apache.org/jira/browse/LUCENE-6639
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 5.3
Reporter: Terry Smith
Priority: Minor


The method 
{{org.apache.lucene.search.LRUQueryCache.CachingWrapperWeight.scorer(LeafReaderContext)}}
 starts with

{code}
if (context.ord == 0) {
policy.onUse(getQuery());
}
{code}

which can result in a missed call for queries that return a null scorer for the 
first segment.







[jira] [Commented] (LUCENE-6305) BooleanQuery.equals should ignore clause order

2015-06-19 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593457#comment-14593457
 ] 

Terry Smith commented on LUCENE-6305:
-

Oops, read the patch too quickly and missed that key detail! Sorry for the 
noise.



 BooleanQuery.equals should ignore clause order
 --

 Key: LUCENE-6305
 URL: https://issues.apache.org/jira/browse/LUCENE-6305
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-6305.patch


 BooleanQuery.equals is sensitive to the order in which clauses have been 
 added. So for instance +A +B would be considered different from +B +A 
 although it generates the same matches and scores.






[jira] [Commented] (LUCENE-6305) BooleanQuery.equals should ignore clause order

2015-06-19 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593447#comment-14593447
 ] 

Terry Smith commented on LUCENE-6305:
-

Having BooleanQuery.equals() ignore order is a great idea but I think it'd be 
better if we could preserve the original clause order for Query.toString(), the 
Explanation, debugging and test expectations. 

Additionally, I've been burnt by JVM changes to String.hashCode() that cause 
HashMap<String,?> to order entries differently when run in a newer JVM. Are the 
Query hash codes immune to this problem?
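
To illustrate what I mean by keeping the original order for display while 
ignoring it for equality, here is a small sketch. ClauseList is a 
hypothetical stand-in that uses plain strings for clauses; it is not how the 
patch does it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a boolean query; clauses are plain strings such as "+A".
final class ClauseList {
  private final List<String> clauses; // insertion order, kept for toString()

  ClauseList(List<String> clauses) {
    this.clauses = Collections.unmodifiableList(new ArrayList<>(clauses));
  }

  // Multiset view: clause -> number of occurrences, independent of order.
  private Map<String, Integer> counts() {
    Map<String, Integer> counts = new HashMap<>();
    for (String clause : clauses) {
      counts.merge(clause, 1, Integer::sum);
    }
    return counts;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof ClauseList && counts().equals(((ClauseList) o).counts());
  }

  @Override
  public int hashCode() {
    // Map.hashCode() is the sum of the entry hash codes, so it does not
    // depend on the iteration order of the map.
    return counts().hashCode();
  }

  @Override
  public String toString() {
    return String.join(" ", clauses);
  }
}
```

Here "+A +B" and "+B +A" compare equal and hash the same within one JVM, yet 
each still prints in the order its clauses were added.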


 BooleanQuery.equals should ignore clause order
 --

 Key: LUCENE-6305
 URL: https://issues.apache.org/jira/browse/LUCENE-6305
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-6305.patch


 BooleanQuery.equals is sensitive to the order in which clauses have been 
 added. So for instance +A +B would be considered different from +B +A 
 although it generates the same matches and scores.






[jira] [Commented] (LUCENE-6305) BooleanQuery.equals should ignore clause order

2015-06-19 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593467#comment-14593467
 ] 

Terry Smith commented on LUCENE-6305:
-

Slightly off topic to your original goal, but what do you think about deduping 
repeated non-scoring (FILTER, MUST_NOT) clauses from the list in the query, or 
do you see that as a possible optimization when building the weights/scorers?




 BooleanQuery.equals should ignore clause order
 --

 Key: LUCENE-6305
 URL: https://issues.apache.org/jira/browse/LUCENE-6305
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-6305.patch


 BooleanQuery.equals is sensitive to the order in which clauses have been 
 added. So for instance +A +B would be considered different from +B +A 
 although it generates the same matches and scores.






[jira] [Commented] (LUCENE-6446) Simplify Explanation API

2015-04-22 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507111#comment-14507111
 ] 

Terry Smith commented on LUCENE-6446:
-

bq. I removed it because it was not always in the summary (only when using 
ComplexExplanation) as well as redundant with the description which is explicit 
when there is no match, for instance TermWeight's no matching term or 
BooleanWeight no match on required clause?

That makes sense. Removing the redundant information is definitely the way to 
go.


I also noticed that the new Explanation.noMatch() methods look a little trappy. 
They both take the child details and drop them on the floor.

{code}
  public static Explanation noMatch(String description, Collection<Explanation> details) {
    return new Explanation(false, 0f, description, Collections.emptyList());
  }
{code}

I think the noMatch() methods should either add the details to the created 
explanation or not accept them as parameters. Having a non-matching explanation 
contain child details can be really useful for complex queries. What do you 
think?
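
As a sketch of the first option (keeping the details), under the assumption 
that the constructor already accepts a collection of children. 
MiniExplanation is a hypothetical, trimmed-down class written for this 
comment, not the one in the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical, trimmed-down Explanation; not the class from the patch.
final class MiniExplanation {
  private final boolean match;
  private final float value;
  private final String description;
  private final List<MiniExplanation> details;

  private MiniExplanation(boolean match, float value, String description,
                          Collection<MiniExplanation> details) {
    this.match = match;
    this.value = value;
    this.description = description;
    this.details = Collections.unmodifiableList(new ArrayList<>(details));
  }

  static MiniExplanation match(float value, String description, MiniExplanation... details) {
    return new MiniExplanation(true, value, description, Arrays.asList(details));
  }

  // The child details are passed through instead of being replaced by an empty list.
  static MiniExplanation noMatch(String description, MiniExplanation... details) {
    return new MiniExplanation(false, 0f, description, Arrays.asList(details));
  }

  boolean isMatch() {
    return match;
  }

  float getValue() {
    return value;
  }

  String getDescription() {
    return description;
  }

  List<MiniExplanation> getDetails() {
    return details;
  }
}
```

A non-matching parent built this way still exposes the child that explains 
why it did not match.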



 Simplify Explanation API
 

 Key: LUCENE-6446
 URL: https://issues.apache.org/jira/browse/LUCENE-6446
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: Trunk, 5.2

 Attachments: LUCENE-6446.patch


 We should make this API easier to consume, for instance:
  - enforce important components to be non-null (eg. description)
  - decouple entirely the score computation from whether there is a match or 
 not (Explanation assumes there is a match if the score is > 0, you need to 
 use ComplexExplanation to override this behaviour)
  - return an empty array instead of null when there are no details






[jira] [Commented] (LUCENE-6446) Simplify Explanation API

2015-04-22 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507038#comment-14507038
 ] 

Terry Smith commented on LUCENE-6446:
-

The refactored Explanation looks great, however I see a couple of small issues 
worth raising.

1. The constructor is private and there is a protected toString(int depth) 
method; it doesn't look like anyone else is calling it and no one can subclass 
it. Should this method be private?

2. The toString() output is different! ComplexExplanation had a slightly 
different getSummary() method:

{code}
return getValue() + " = "
  + (isMatch() ? "(MATCH) " : "(NON-MATCH) ")
  + getDescription();
{code}

versus

{code}
return getValue() + " = " + getDescription();
{code}

I find this extra context invaluable, especially with the decoupling of score 
and match: we can't assume that a score of 0 is a NON-MATCH, yet the output no 
longer tells us if an explanation is a MATCH or not.

I understand that I can roll my own string building code with the current API. 
It'd be great if the default output was as useful as possible.
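
For what it's worth, the roll-your-own version is short. The sketch below 
assumes a node that exposes match, value, description and children; 
ExplainNode and ExplainPrinter are hypothetical names used only here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical node type standing in for Explanation.
final class ExplainNode {
  final boolean match;
  final float value;
  final String description;
  final List<ExplainNode> details;

  ExplainNode(boolean match, float value, String description, ExplainNode... details) {
    this.match = match;
    this.value = value;
    this.description = description;
    this.details = Arrays.asList(details);
  }
}

final class ExplainPrinter {
  // Renders one line per node, indented two spaces per level, with a match marker.
  static String render(ExplainNode node) {
    StringBuilder sb = new StringBuilder();
    render(node, 0, sb);
    return sb.toString();
  }

  private static void render(ExplainNode node, int depth, StringBuilder sb) {
    for (int i = 0; i < depth; i++) {
      sb.append("  ");
    }
    sb.append(node.value)
      .append(" = ")
      .append(node.match ? "(MATCH) " : "(NON-MATCH) ")
      .append(node.description)
      .append('\n');
    for (ExplainNode child : node.details) {
      render(child, depth + 1, sb);
    }
  }
}
```

The marker is what distinguishes a matching node with a score of 0 from a 
non-matching one, which the value alone cannot do.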


 Simplify Explanation API
 

 Key: LUCENE-6446
 URL: https://issues.apache.org/jira/browse/LUCENE-6446
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: Trunk, 5.2

 Attachments: LUCENE-6446.patch


 We should make this API easier to consume, for instance:
  - enforce important components to be non-null (eg. description)
  - decouple entirely the score computation from whether there is a match or 
 not (Explanation assumes there is a match if the score is > 0, you need to 
 use ComplexExplanation to override this behaviour)
  - return an empty array instead of null when there are no details






[jira] [Created] (LUCENE-6385) NullPointerException from Highlighter.getBestFragment()

2015-04-01 Thread Terry Smith (JIRA)
Terry Smith created LUCENE-6385:
---

 Summary: NullPointerException from Highlighter.getBestFragment()
 Key: LUCENE-6385
 URL: https://issues.apache.org/jira/browse/LUCENE-6385
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 5.1
Reporter: Terry Smith


When testing against the 5.1 nightly snapshots I've come across a 
NullPointerException in highlighting when nothing would be highlighted. This 
does not happen with 5.0.

{noformat}
java.lang.NullPointerException
at 
__randomizedtesting.SeedInfo.seed([3EDC6EB0FA552B34:9971866E394F5FD0]:0)
at 
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedSpanTerms(WeightedSpanTermExtractor.java:311)
at 
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:151)
at 
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:515)
at 
org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:219)
at 
org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:187)
at 
org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:196)
at 
org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:156)
at 
org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:102)
at 
org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:80)
at 
org.apache.lucene.search.highlight.MissesTest.testPhraseQuery(MissesTest.java:50)
{noformat}

I've written a small unit test and used git bisect to narrow the regression to 
the following commit:

{noformat}
commit 24e4eefaefb1837d1d4fa35f7669c2b264f872ac
Author: Michael McCandless mikemcc...@apache.org
Date:   Tue Mar 31 08:48:28 2015 +

LUCENE-6308: cutover Spans to DISI, reuse ConjunctionDISI, use two-phased 
iteration

git-svn-id: 
https://svn.apache.org/repos/asf/lucene/dev/branches/branch_5x@1670273 
13f79535-47bb-0310-9956-ffa450edef68
{noformat}

The problem looks quite simple, 
WeightedSpanTermExtractor.extractWeightedSpanTerms() needs an early return if 
SpanQuery.getSpans() returns null. All other callers check against this.
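
The guard has roughly the following shape. This is a simplified sketch and 
not the attached patch: SpanPositions is a hypothetical helper, and the 
Supplier stands in for the getSpans() call, which may yield null when 
nothing matches.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

final class SpanPositions {
  // spanSource stands in for the getSpans() call; it may yield null when nothing matches.
  static List<Integer> collect(Supplier<Iterator<Integer>> spanSource) {
    List<Integer> positions = new ArrayList<>();
    Iterator<Integer> spans = spanSource.get();
    if (spans == null) {
      return positions; // early return: nothing to highlight
    }
    while (spans.hasNext()) {
      positions.add(spans.next());
    }
    return positions;
  }
}
```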

Unit test and fix (against the regressed commit) attached.







[jira] [Updated] (LUCENE-6385) NullPointerException from Highlighter.getBestFragment()

2015-04-01 Thread Terry Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Smith updated LUCENE-6385:

Attachment: LUCENE-6385.patch

 NullPointerException from Highlighter.getBestFragment()
 ---

 Key: LUCENE-6385
 URL: https://issues.apache.org/jira/browse/LUCENE-6385
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 5.1
Reporter: Terry Smith
 Attachments: LUCENE-6385.patch


 When testing against the 5.1 nightly snapshots I've come across a 
 NullPointerException in highlighting when nothing would be highlighted. This 
 does not happen with 5.0.
 {noformat}
 java.lang.NullPointerException
   at 
 __randomizedtesting.SeedInfo.seed([3EDC6EB0FA552B34:9971866E394F5FD0]:0)
   at 
 org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedSpanTerms(WeightedSpanTermExtractor.java:311)
   at 
 org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:151)
   at 
 org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:515)
   at 
 org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:219)
   at 
 org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:187)
   at 
 org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:196)
   at 
 org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:156)
   at 
 org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:102)
   at 
 org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:80)
   at 
 org.apache.lucene.search.highlight.MissesTest.testPhraseQuery(MissesTest.java:50)
 {noformat}
 I've written a small unit test and used git bisect to narrow the regression 
 to the following commit:
 {noformat}
 commit 24e4eefaefb1837d1d4fa35f7669c2b264f872ac
 Author: Michael McCandless mikemcc...@apache.org
 Date:   Tue Mar 31 08:48:28 2015 +
 LUCENE-6308: cutover Spans to DISI, reuse ConjunctionDISI, use two-phased 
 iteration
 
 git-svn-id: 
 https://svn.apache.org/repos/asf/lucene/dev/branches/branch_5x@1670273 
 13f79535-47bb-0310-9956-ffa450edef68
 {noformat}
 The problem looks quite simple, 
 WeightedSpanTermExtractor.extractWeightedSpanTerms() needs an early return if 
 SpanQuery.getSpans() returns null. All other callers check against this.
 Unit test and fix (against the regressed commit) attached.






Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-03-31 Thread Terry Smith
Adrien,

Thanks for the explanation. It seems a pity to make queries just nearly
immutable. Do you have any interest in adding a boost parameter to clone()
so they really could be immutable?
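
Roughly what I have in mind, as a sketch only. ImmutableTermQuery and 
withBoost() are hypothetical names chosen for illustration; the real Query 
classes and whatever clone() signature we settle on will differ.

```java
import java.util.Objects;

// Hypothetical immutable query; every field is final and set at construction.
final class ImmutableTermQuery {
  private final String field;
  private final String term;
  private final float boost;

  ImmutableTermQuery(String field, String term, float boost) {
    this.field = field;
    this.term = term;
    this.boost = boost;
  }

  // Plays the role of clone(boost): returns a copy and never mutates this instance.
  ImmutableTermQuery withBoost(float newBoost) {
    return new ImmutableTermQuery(field, term, newBoost);
  }

  float boost() {
    return boost;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ImmutableTermQuery)) {
      return false;
    }
    ImmutableTermQuery other = (ImmutableTermQuery) o;
    return field.equals(other.field)
        && term.equals(other.term)
        && Float.compare(boost, other.boost) == 0;
  }

  @Override
  public int hashCode() {
    return Objects.hash(field, term, boost);
  }
}
```

Since nothing can change after construction, an instance used as a cache key 
keeps its hash code for as long as it sits in the cache, and rewriting gets 
its boosted copy without touching the user's query.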

--Terry


On Tue, Mar 31, 2015 at 9:44 AM, Adrien Grand jpou...@gmail.com wrote:

 Hi Terry,

 Indeed this is for query rewriting. For instance if you have a boolean
 query with a boost of 5 that wraps a single MUST clause with a term
 query, then we rewrite this to the inner term query and update its
 boost using clone() and setBoost() in order to not modify in-place a
 user-modified query.

 On Tue, Mar 31, 2015 at 3:37 PM, Terry Smith sheb...@gmail.com wrote:
  Adrien,
 
  I missed the reason that boost is going to stay mutable. Is this to
 support
  query rewriting?
 
  --Terry
 
 
  On Tue, Mar 31, 2015 at 7:21 AM, Robert Muir rcm...@gmail.com wrote:
 
  Same with BooleanQuery. the go-to ctor should just take 'clauses'
 
  On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless
  luc...@mikemccandless.com wrote:
   +1
  
   For PhraseQuery we could also have a common-case ctor that just takes
   the terms (and assumes sequential positions)?
  
   Mike McCandless
  
   http://blog.mikemccandless.com
  
  
   On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand jpou...@gmail.com
 wrote:
   Recent changes that added automatic filter caching to IndexSearcher
   uncovered some traps with our queries when it comes to using them as
   cache keys. The problem comes from the fact that some of our main
   queries are mutable, and modifying them while they are used as cache
   keys makes the entry that they are caching invisible (because the
 hash
   code changed too) yet still using memory.
  
   While I think most users would be unaffected as it is rather uncommon
   to modify queries after having passed them to IndexSearcher, I would
   like to remove this trap by making queries immutable: everything
   should be set at construction time except the boost parameter that
   could still be changed with the same clone()/setBoost() mechanism as
   today.
  
   First I would like to make sure that it sounds good to everyone and
   then to discuss what the API should look like. Most of our queries
   happen to be immutable already (NumericRangeQuery, TermsQuery,
   SpanNearQuery, etc.) but some aren't and the main exceptions are:
- BooleanQuery,
- DisjunctionMaxQuery,
- PhraseQuery,
- MultiPhraseQuery.
  
   We could take all parameters that are set as setters and move them to
   constructor arguments. For the above queries, this would mean (using
   varargs for ease of use):
  
 BooleanQuery(boolean disableCoord, int minShouldMatch,
   BooleanClause... clauses)
 DisjunctionMaxQuery(float tieBreakMul, Query... clauses)
  
   For PhraseQuery and MultiPhraseQuery, the closest to what we have
   today would require adding new classes to wrap terms and positions
   together, for instance:
  
   class TermAndPosition {
 public final BytesRef term;
 public final int position;
   }
  
   so that eg. PhraseQuery would look like:
  
 PhraseQuery(int slop, String field, TermAndPosition... terms)
  
   MultiPhraseQuery would be the same with several terms at the same
   position.
  
   Comments/ideas/concerns are highly welcome.
  
   --
   Adrien
  
 
 



 --
 Adrien





Re: [DISCUSS] Change Query API to make queries immutable in 6.0

2015-03-31 Thread Terry Smith
Adrien,

I missed the reason that boost is going to stay mutable. Is this to support
query rewriting?

--Terry


On Tue, Mar 31, 2015 at 7:21 AM, Robert Muir rcm...@gmail.com wrote:

 Same with BooleanQuery. the go-to ctor should just take 'clauses'

 On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless
 luc...@mikemccandless.com wrote:
  +1
 
  For PhraseQuery we could also have a common-case ctor that just takes
  the terms (and assumes sequential positions)?
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
 
  On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand jpou...@gmail.com wrote:
  Recent changes that added automatic filter caching to IndexSearcher
  uncovered some traps with our queries when it comes to using them as
  cache keys. The problem comes from the fact that some of our main
  queries are mutable, and modifying them while they are used as cache
  keys makes the entry that they are caching invisible (because the hash
  code changed too) yet still using memory.
 
  While I think most users would be unaffected as it is rather uncommon
  to modify queries after having passed them to IndexSearcher, I would
  like to remove this trap by making queries immutable: everything
  should be set at construction time except the boost parameter that
  could still be changed with the same clone()/setBoost() mechanism as
  today.
 
  First I would like to make sure that it sounds good to everyone and
  then to discuss what the API should look like. Most of our queries
  happen to be immutable already (NumericRangeQuery, TermsQuery,
  SpanNearQuery, etc.) but some aren't and the main exceptions are:
   - BooleanQuery,
   - DisjunctionMaxQuery,
   - PhraseQuery,
   - MultiPhraseQuery.
 
  We could take all parameters that are set as setters and move them to
  constructor arguments. For the above queries, this would mean (using
  varargs for ease of use):
 
BooleanQuery(boolean disableCoord, int minShouldMatch,
  BooleanClause... clauses)
DisjunctionMaxQuery(float tieBreakMul, Query... clauses)
 
  For PhraseQuery and MultiPhraseQuery, the closest to what we have
  today would require adding new classes to wrap terms and positions
  together, for instance:
 
  class TermAndPosition {
public final BytesRef term;
public final int position;
  }
 
  so that eg. PhraseQuery would look like:
 
PhraseQuery(int slop, String field, TermAndPosition... terms)
 
  MultiPhraseQuery would be the same with several terms at the same
 position.
 
  Comments/ideas/concerns are highly welcome.
 
  --
  Adrien
 




[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-24 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335022#comment-14335022
 ] 

Terry Smith commented on LUCENE-6229:
-

Understood.

If you end up keeping getChildren(), how do you feel about making it well 
defined by capturing these constraints in the Javadoc?



 Remove Scorer.getChildren?
 --

 Key: LUCENE-6229
 URL: https://issues.apache.org/jira/browse/LUCENE-6229
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Priority: Minor

 This API is used in a single place in our code base: 
 ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that 
 using this API from a collector only works if setScorer is called with an 
 actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in 
 disjunctions) so it needs a custom IndexSearcher that does not use the 
 BulkScorer API.






[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-24 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334937#comment-14334937
 ] 

Terry Smith commented on LUCENE-6229:
-

[~rcmuir] Sorry for excluding that scenario, it wasn't intentional.

If you all decide to keep getChildren(), then I'd love to get the contract 
described so people know what to expect.

I think these statements are correct:

# Scorer.getChildren() returns the immediate child scorers
# A returned scorer may be 
## unpositioned (never had next() or advance() called on it)
## positioned on a valid document that is before, on, or after the current 
document
## exhausted and thus positioned at NO_MORE_DOCS
# You MUST NOT call next() or advance() on the returned scorers yourself

And have these questions:

# Can I walk the returned scorers to get to all non-null leaf scorers?
# Can I position the returned scorers on the current document by calling 
freq(), score() or something else?
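
The states listed above could be checked with something like the following sketch. It assumes an unpositioned scorer reports a docID of -1 (some Lucene versions also allow NO_MORE_DOCS there); `ChildState` and `ChildStates` are hypothetical names, not part of Lucene.

```java
/** Possible states of a child scorer relative to the parent's current document. */
enum ChildState { UNPOSITIONED, BEFORE, ON, AFTER, EXHAUSTED }

final class ChildStates {
  // Mirrors DocIdSetIterator.NO_MORE_DOCS, which is Integer.MAX_VALUE in Lucene.
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Classifies a child's docID against the document currently being collected. */
  static ChildState classify(int childDoc, int currentDoc) {
    if (childDoc == -1) {
      return ChildState.UNPOSITIONED; // assumption: -1 means never advanced
    }
    if (childDoc == NO_MORE_DOCS) {
      return ChildState.EXHAUSTED;
    }
    if (childDoc < currentDoc) {
      return ChildState.BEFORE;
    }
    return childDoc == currentDoc ? ChildState.ON : ChildState.AFTER;
  }
}
```

Only children in the `ON` state can safely contribute per-document information; the rest either did not match or have not been moved yet.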







[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-20 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329069#comment-14329069
 ] 

Terry Smith commented on LUCENE-6229:
-

I’ll summarize this as two options:

# Remove getChildren(), as it complicates the code and hurts the ability to 
maintain it and make performance enhancements.
# Make getChildren() a better-defined API that gives you the ability to 
retrieve child scorers that are correctly positioned.

You are looking for data to back up option 2, to determine whether this is an 
API that is worth fixing/keeping.

Here are the use cases that I have:

# Custom scoring of a BooleanQuery. A query that wraps any BooleanQuery which 
it uses for recall but supplies its own scoring algorithm to aggregate the 
scores from the original clauses.
# Custom DrillSidewaysQuery. A query that can use the sideways scorers for 
precision instead of just recall.
# Recursive DrillSidewaysQuery (not implemented, tricky). A query that could 
perform DrillSideways for union or in a nested fashion.
# Auxiliary metadata. An enhancement that can augment the current recall 
(boolean match) and precision (float score) for a document in the search 
pipeline to add extra information that can be used from Query and FunctionValue 
instances (collected via a custom Collector) and supported by a custom 
SortField.

These can be categorized into two camps:

# Using an existing Query (typically BooleanQuery) to find matches but 
providing some combination of
## custom scoring that isn’t compatible with the Similarity API.
## custom recall (think DrillSideways)
# Adding extra information to the search pipeline that can be 
## generated by leaf queries and value sources
## aggregated by composing queries (BooleanQuery, DisjunctionMaxQuery, etc)
## survive wrapping queries and value sources that don’t know about it
## collected and sorted on
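
To make the first camp concrete, here is a small sketch of custom score aggregation over child scorers. It assumes the children have already been positioned and that their doc IDs and scores have been read out; `ChildScore` and `CustomAggregation` are hypothetical helpers, not Lucene classes.

```java
/** Snapshot of one child scorer: the document it sits on and its score there. */
final class ChildScore {
  final int docId;
  final float score;

  ChildScore(int docId, float score) {
    this.docId = docId;
    this.score = score;
  }
}

final class CustomAggregation {
  /**
   * Returns the highest score among children positioned on {@code doc},
   * or 0 if no child is on that document. This replaces the usual sum.
   */
  static float maxOfMatching(int doc, ChildScore... children) {
    float best = 0f;
    for (ChildScore child : children) {
      // Children positioned elsewhere did not match this document.
      if (child.docId == doc && child.score > best) {
        best = child.score;
      }
    }
    return best;
  }
}
```

The aggregation is only meaningful if every child that matches the document is actually positioned on it, which is why the positioning contract matters.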

Hope this helps.




[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-17 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14324686#comment-14324686
 ] 

Terry Smith commented on LUCENE-6229:
-

[~rcmuir] Thanks for the backstory. I've been trying to wrap my head around 
where Lucene is going and this kind of information really helps.

It sounds like both [~rcmuir] and [~thetaphi] agree that Scorer.getChildren() 
is not an API that Lucene should support. Reading between the lines, this 
implies to me that scoring is moving to a bulk-only approach, which will bring 
great performance gains.

A best effort implementation of Scorer.getChildren() would be something that 
I'd be uncomfortable adding features on top of, although it could be useful for 
debugging. Unfortunately this is a showstopper for me as I rely on 
Scorer.getChildren() for some critical features, and need to do some serious 
thinking to figure out if I can formulate an alternative approach.





[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-12 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318272#comment-14318272
 ] 

Terry Smith commented on LUCENE-6229:
-

[~jpountz] - I'm going to split the freq() vs score() thing into a separate 
ticket so it doesn't hijack this one. I intend to take the unit test I 
previously pasted and extend it to create some randomized BooleanQuery 
instances to try to locate possibly broken edge cases and provide a safety net 
for future refactoring.

I'll make these assumptions; shout out if they are incorrect.

For a BooleanQuery I should be able to perform doc-at-a-time scoring, meaning 
that in a Collector or Scorer I can

1. Find all Scorers from the child clauses of the BooleanQuery
2. Have those Scorers be positioned for me by calling score() or freq()





[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-10 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314219#comment-14314219
 ] 

Terry Smith commented on LUCENE-6229:
-

Like Stefan, I'm also using this functionality to access child scorers on a per 
document basis. Currently for some custom query enhancements and a custom drill 
sideways implementation.

Like Adrien, I've also had to wrap queries in a custom NonBulkScoringQuery to 
force doc-at-a-time scoring.

It'd be great to simplify this workflow. I've been calling Scorer.freq() to 
position all the child scorers (from a BooleanQuery), but as of the 5.1 nightly 
builds I need to call Scorer.score() instead for positioning due to changes 
in MinShouldMatchSumScorer.

I'd love to have a way to not only get the child scorers but be confident that 
they were all correctly positioned.





[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?

2015-02-10 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314361#comment-14314361
 ] 

Terry Smith commented on LUCENE-6229:
-

h2. freq() vs score()

I think the lazy positioning in MinShouldMatchSumScorer is misbehaving.

Drop these three methods into TestBooleanMinShouldMatch.java to see.
{code:java}
public void testMinNrShouldMatchFreq() throws Exception {
  BooleanQuery q = new BooleanQuery();
  q.add(new TermQuery(new Term("data", "1")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "2")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "3")), Occur.SHOULD);
  q.add(new TermQuery(new Term("id", "0")), Occur.MUST);
  q.setMinimumNumberShouldMatch(2);
  verifyNrHits(q, 1);
  s.search(q, new SimpleCollector() {
    private Scorer scorer;
    private Collection<Scorer> leafScorers;

    @Override
    public void setScorer(Scorer scorer) throws IOException {
      this.scorer = scorer;
      this.leafScorers = leafScorers(new ArrayList<Scorer>(), scorer);
      assertEquals(4, leafScorers.size());
    }

    @Override
    public void collect(int doc) throws IOException {
      assertEquals(0, doc);
      scorer.freq(); // position leaf scorers
      for (Scorer leafScorer : leafScorers) {
        assertEquals(0, leafScorer.docID());
      }
    }
  });
}

public void testMinNrShouldMatchScore() throws Exception {
  BooleanQuery q = new BooleanQuery();
  q.add(new TermQuery(new Term("data", "1")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "2")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "3")), Occur.SHOULD);
  q.add(new TermQuery(new Term("id", "0")), Occur.MUST);
  q.setMinimumNumberShouldMatch(2);
  verifyNrHits(q, 1);
  s.search(q, new SimpleCollector() {
    private Scorer scorer;
    private Collection<Scorer> leafScorers;

    @Override
    public void setScorer(Scorer scorer) throws IOException {
      this.scorer = scorer;
      this.leafScorers = leafScorers(new ArrayList<Scorer>(), scorer);
      assertEquals(4, leafScorers.size());
    }

    @Override
    public void collect(int doc) throws IOException {
      assertEquals(0, doc);
      scorer.score(); // position leaf scorers
      for (Scorer leafScorer : leafScorers) {
        assertEquals(0, leafScorer.docID());
      }
    }
  });
}

// Recursively collects the leaf scorers (those without children) below scorer.
private static Collection<Scorer> leafScorers(Collection<Scorer> target,
                                              Scorer scorer) {
  Collection<ChildScorer> childScorers = scorer.getChildren();
  if (childScorers.isEmpty()) {
    target.add(scorer);
  } else {
    for (ChildScorer childScorer : childScorers) {
      leafScorers(target, childScorer.child);
    }
  }
  return target;
}
{code}

Here the one that uses freq() to position the sub scorers fails but the one 
that uses score() succeeds.

h2. middle ground

I have Scorer constructors, Weight.scorer(), Weight.explain() and Collectors 
all calling Scorer.getChildren(). But when using my custom Collectors I'm 
careful to wrap the Query in a custom NonBulkScoringQuery that prevents bulk 
scoring to work around the trap. The NonBulkScoringQuery I mention is a simple 
delegating Query that allows Weight.bulkScorer() to use its default 
implementation instead of allowing the wrapped Query to override it.
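
The delegation trick can be sketched with toy types. Everything below is hypothetical (`ToyWeight` and friends are not Lucene's `Weight` API, and scorers are reduced to strings); it only shows how a wrapper that forwards the doc-at-a-time path, but not the bulk path, falls back to the default bulk behaviour.

```java
/** Toy stand-in for a Weight; scorers are represented as plain strings. */
abstract class ToyWeight {
  /** Doc-at-a-time entry point. */
  abstract String scorer();

  /** Default bulk path: simply drives the doc-at-a-time scorer. */
  String bulkScorer() {
    return "default-bulk(" + scorer() + ")";
  }
}

/** A weight that replaces the bulk path with a specialized implementation. */
final class SpecializedWeight extends ToyWeight {
  @Override
  String scorer() {
    return "boolean-scorer";
  }

  @Override
  String bulkScorer() {
    return "specialized-bulk";
  }
}

/** Delegates scorer() but deliberately does not forward bulkScorer(). */
final class NonBulkWeight extends ToyWeight {
  private final ToyWeight in;

  NonBulkWeight(ToyWeight in) {
    this.in = in;
  }

  @Override
  String scorer() {
    return in.scorer();
  }
  // bulkScorer() is inherited, so the wrapped override is never consulted.
}
```

Wrapping a `SpecializedWeight` in a `NonBulkWeight` therefore yields the default bulk path built on the wrapped doc-at-a-time scorer.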

I like removing the trap for bulk scoring queries; it's really subtle and it 
took me a while to diagnose the first time I hit it.

Having a separate entry point into IndexSearcher to achieve doc-at-a-time 
scoring that supports getChildren() would be awesome. I'm not so hot on having 
to cast the collector. Do you think there could be a way to preserve type 
safety here?





[jira] [Commented] (LUCENE-6232) Replace ValueSource context Map with a more concrete data type

2015-02-10 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314242#comment-14314242
 ] 

Terry Smith commented on LUCENE-6232:
-

I have custom code that injects objects into this map. If you refactor this to 
be a concrete class, could you leave it non-final so a custom FunctionQuery could 
provide its own subclassed instance of this context?
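
A minimal sketch of what such an extensible context might look like follows. `ValueSourceContext` and `RequestContext` are hypothetical names, and `Object` stands in for Lucene types such as Scorer so the example compiles on its own.

```java
/** Hypothetical typed replacement for the raw context Map; intentionally non-final. */
class ValueSourceContext {
  private Object scorer;   // would be a Scorer in Lucene
  private Object searcher; // would be an IndexSearcher in Lucene

  public Object getScorer() { return scorer; }

  public void setScorer(Object scorer) { this.scorer = scorer; }

  public Object getSearcher() { return searcher; }

  public void setSearcher(Object searcher) { this.searcher = searcher; }
}

/** Application-specific subclass carrying one extra, strongly typed value. */
class RequestContext extends ValueSourceContext {
  private final String requestId;

  RequestContext(String requestId) {
    this.requestId = requestId;
  }

  public String getRequestId() { return requestId; }
}
```

Code that knows about the subclass can downcast once and read `getRequestId()`, while code that only knows the base class keeps using the typed getters.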


 Replace ValueSource context Map with a more concrete data type
 --

 Key: LUCENE-6232
 URL: https://issues.apache.org/jira/browse/LUCENE-6232
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Mike Drob

 Inspired by LUCENE-3973
 The context object used by ValueSource and friends is a raw Map that provides 
 no type safety guarantees. In our current state, there are lots of warnings 
 about unchecked casts, raw types, and generally unsafe code from the 
 compiler's perspective.
 There are several common patterns and types of Objects that we store in the 
 context. It would be beneficial to instead use a class with typed methods for 
 get/set of Scorer, Weights, etc.






[jira] [Commented] (LUCENE-6232) Replace ValueSource context Map with a more concrete data type

2015-02-10 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314555#comment-14314555
 ] 

Terry Smith commented on LUCENE-6232:
-

[~mdrob] Thanks, either would work.




[jira] [Commented] (LUCENE-5796) Scorer.getChildren() can throw or hide a subscorer for some boolean queries

2014-07-02 Thread Terry Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049941#comment-14049941
 ] 

Terry Smith commented on LUCENE-5796:
-

Thanks for taking the time to review my patch and comment on the approach.

The reason that I advocated changing FilterScorer and BoostedScorer is to allow 
some of my custom Query implementations to use a regular BooleanQuery for 
recall and optionally scoring while taking advantage of the actual Scorers used 
on a per document, per clause basis.

This has been working great across quite a few Lucene releases but failed when 
I upgraded to 4.9 due to the two regressions in behavior for 
Scorer.getChildren() as described in this ticket.

In this scenario, a BooleanQuery containing two TermQueries (one a miss and the 
other a hit) returns the following from BooleanWeight.scorer():

* BoostedScorer
** TermScorer (hit)

Calling getChildren() on this returns an empty list because the BoostedScorer 
just returns in.getChildren() and thus you are unable to navigate to the actual 
TermScorer in play. This would impact any classes that extend FilterScorer and 
don't override getChildren(). In other words, the current wiring does make the 
BoostedScorer transparent but with the disadvantage of hiding the actual scorer 
that performs the work.
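
The masking effect can be reproduced with toy classes. The types below are hypothetical stand-ins (not Lucene's Scorer hierarchy); they contrast a wrapper that forwards `getChildren()` to the wrapped scorer with one that reports the wrapped scorer as its child.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy scorer node; leaves report no children by default. */
abstract class ToyScorer {
  final String name;

  ToyScorer(String name) {
    this.name = name;
  }

  List<ToyScorer> getChildren() {
    return Collections.emptyList();
  }
}

final class ToyTermScorer extends ToyScorer {
  ToyTermScorer(String name) {
    super(name);
  }
}

/** Forwards getChildren() to the wrapped scorer, so the wrapped scorer itself is skipped. */
final class TransparentWrapper extends ToyScorer {
  private final ToyScorer in;

  TransparentWrapper(ToyScorer in) {
    super("transparent");
    this.in = in;
  }

  @Override
  List<ToyScorer> getChildren() {
    return in.getChildren();
  }
}

/** Reports the wrapped scorer as its single child. */
final class ExposingWrapper extends ToyScorer {
  private final ToyScorer in;

  ExposingWrapper(ToyScorer in) {
    super("exposing");
    this.in = in;
  }

  @Override
  List<ToyScorer> getChildren() {
    return Collections.singletonList(in);
  }
}

final class ToyScorerTrees {
  /** Collects every scorer in the tree that has no children. */
  static List<ToyScorer> leaves(ToyScorer root) {
    List<ToyScorer> out = new ArrayList<>();
    collect(root, out);
    return out;
  }

  private static void collect(ToyScorer scorer, List<ToyScorer> out) {
    List<ToyScorer> children = scorer.getChildren();
    if (children.isEmpty()) {
      out.add(scorer);
    } else {
      for (ToyScorer child : children) {
        collect(child, out);
      }
    }
  }
}
```

Walking a `TransparentWrapper` around a term scorer yields the wrapper itself as the only leaf, whereas the `ExposingWrapper` lets the walk reach the term scorer.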

If this is an unsupported workflow, I'm happy to move the discussion over to 
the user mailing list.

 Scorer.getChildren() can throw or hide a subscorer for some boolean queries
 ---

 Key: LUCENE-5796
 URL: https://issues.apache.org/jira/browse/LUCENE-5796
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.9
Reporter: Terry Smith
Priority: Minor
 Attachments: LUCENE-5796.patch


 I've isolated two example boolean queries that don't behave with release 4.9 
 of Lucene.
 # A BooleanQuery with three SHOULD clauses and a minimumNumberShouldMatch of 
 2 will throw an ArrayIndexOutOfBoundsException.
 {noformat}
 java.lang.ArrayIndexOutOfBoundsException: 2
     at __randomizedtesting.SeedInfo.seed([2F79B3DF917D071B:2539E6DBC4DF793C]:0)
     at org.apache.lucene.search.MinShouldMatchSumScorer.getChildren(MinShouldMatchSumScorer.java:119)
     at org.apache.lucene.search.TestBooleanQueryVisitSubscorers$ScorerSummarizingCollector.summarizeScorer(TestBooleanQueryVisitSubscorers.java:261)
     at org.apache.lucene.search.TestBooleanQueryVisitSubscorers$ScorerSummarizingCollector.setScorer(TestBooleanQueryVisitSubscorers.java:238)
     at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:161)
     at org.apache.lucene.search.AssertingBulkScorer.score(AssertingBulkScorer.java:64)
     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
     at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:94)
     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
     at org.apache.lucene.search.TestBooleanQueryVisitSubscorers.testGetChildrenMinShouldMatchSumScorer(TestBooleanQueryVisitSubscorers.java:196)
 {noformat}
 # A BooleanQuery with two should clauses, one of which is a miss for all 
 documents in the current segment will accidentally mask the scorer that was a 
 hit.
 Unit tests and patch based on {{branch_4x}} are available and will be 
 attached as soon as this ticket has a number.
 They are immediately available on GitHub on branch 
 [shebiki/bqgetchildren|https://github.com/shebiki/lucene-solr/commits/bqgetchildren]
  as commit 
 [c64bb6f|https://github.com/shebiki/lucene-solr/commit/c64bb6f2df8f33dd8daafc953d9c27b5cbf29fa3].
 I took the liberty of naming the relationship in BoostingScorer.getChildren() 
 {{BOOSTING}}. Suspect someone will offer a better name for this. Here is a 
 summary of the various relationships in play for all Scorer.getChildren() 
 implementations on {{branch_4x}} to help choose.
 || class || relationships ||
 | org.apache.lucene.search.AssertingScorer | SHOULD |
 | org.apache.lucene.search.join.ToParentBlockJoinQuery.BlockJoinScorer | BLOCK_JOIN |
 | org.apache.lucene.search.ConjunctionScorer | MUST |
 | org.apache.lucene.search.ConstantScoreQuery.ConstantScorer | constant |
 | org.apache.lucene.queries.function.BoostedQuery.CustomScorer | CUSTOM |
 | org.apache.lucene.queries.CustomScoreQuery.CustomScorer | CUSTOM |
 | org.apache.lucene.search.DisjunctionScorer | SHOULD |
 | org.apache.lucene.facet.DrillSidewaysScorer.FakeScorer | MUST |
 | org.apache.lucene.search.FilterScorer

[jira] [Created] (LUCENE-5796) Scorer.getChildren() can throw or hide a subscorer for some boolean queries

2014-06-30 Thread Terry Smith (JIRA)
Terry Smith created LUCENE-5796:
---

 Summary: Scorer.getChildren() can throw or hide a subscorer for 
some boolean queries
 Key: LUCENE-5796
 URL: https://issues.apache.org/jira/browse/LUCENE-5796
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.9
Reporter: Terry Smith
Priority: Minor


I've isolated two example boolean queries that don't behave with release 4.9 of 
Lucene.

# A BooleanQuery with three SHOULD clauses and a minimumNumberShouldMatch of 2 
will throw an ArrayIndexOutOfBoundsException.
{noformat}
java.lang.ArrayIndexOutOfBoundsException: 2
    at __randomizedtesting.SeedInfo.seed([2F79B3DF917D071B:2539E6DBC4DF793C]:0)
    at org.apache.lucene.search.MinShouldMatchSumScorer.getChildren(MinShouldMatchSumScorer.java:119)
    at org.apache.lucene.search.TestBooleanQueryVisitSubscorers$ScorerSummarizingCollector.summarizeScorer(TestBooleanQueryVisitSubscorers.java:261)
    at org.apache.lucene.search.TestBooleanQueryVisitSubscorers$ScorerSummarizingCollector.setScorer(TestBooleanQueryVisitSubscorers.java:238)
    at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:161)
    at org.apache.lucene.search.AssertingBulkScorer.score(AssertingBulkScorer.java:64)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
    at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:94)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
    at org.apache.lucene.search.TestBooleanQueryVisitSubscorers.testGetChildrenMinShouldMatchSumScorer(TestBooleanQueryVisitSubscorers.java:196)
{noformat}
# A BooleanQuery with two should clauses, one of which is a miss for all 
documents in the current segment will accidentally mask the scorer that was a 
hit.


Unit tests and patch based on {{branch_4x}} are available and will be attached 
as soon as this ticket has a number.

They are immediately available on GitHub on branch 
[shebiki/bqgetchildren|https://github.com/shebiki/lucene-solr/commits/bqgetchildren]
 as commit 
[c64bb6f|https://github.com/shebiki/lucene-solr/commit/c64bb6f2df8f33dd8daafc953d9c27b5cbf29fa3].

I took the liberty of naming the relationship in BoostingScorer.getChildren() 
{{BOOSTING}}. Suspect someone will offer a better name for this. Here is a 
summary of the various relationships in play for all Scorer.getChildren() 
implementations on {{branch_4x}} to help choose.


|| class || relationships ||
| org.apache.lucene.search.AssertingScorer | SHOULD |
| org.apache.lucene.search.join.ToParentBlockJoinQuery.BlockJoinScorer | BLOCK_JOIN |
| org.apache.lucene.search.ConjunctionScorer | MUST |
| org.apache.lucene.search.ConstantScoreQuery.ConstantScorer | constant |
| org.apache.lucene.queries.function.BoostedQuery.CustomScorer | CUSTOM |
| org.apache.lucene.queries.CustomScoreQuery.CustomScorer | CUSTOM |
| org.apache.lucene.search.DisjunctionScorer | SHOULD |
| org.apache.lucene.facet.DrillSidewaysScorer.FakeScorer | MUST |
| org.apache.lucene.search.FilterScorer | calls in.getChildren() |
| org.apache.lucene.search.ScoreCachingWrappingScorer | CACHED |
| org.apache.lucene.search.FilteredQuery.LeapFrogScorer | FILTERED |
| org.apache.lucene.search.MinShouldMatchSumScorer | SHOULD |
| org.apache.lucene.search.FilteredQuery | FILTERED |
| org.apache.lucene.search.ReqExclScorer | MUST |
| org.apache.lucene.search.ReqOptSumScorer | MUST, SHOULD |
| org.apache.lucene.search.join.ToChildBlockJoinQuery | BLOCK_JOIN |

I also removed FilterScorer.getChildren() to prevent mistakes and force 
subclasses to provide a correct implementation.




--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5796) Scorer.getChildren() can throw or hide a subscorer for some boolean queries

2014-06-30 Thread Terry Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Smith updated LUCENE-5796:


Attachment: LUCENE-5796.patch




Wiki edit permission

2014-03-24 Thread Terry Smith
Could I be allowed edit permissions on the wiki? I've signed up on the
Lucene and Solr wiki with the same name (TerrySmith) but am just emailing
lucene-dev to start.

--Terry


Re: Wiki edit permission

2014-03-24 Thread Terry Smith
Steve,

Boy that was quick! Thanks.

--Terry



On Mon, Mar 24, 2014 at 9:28 AM, Steve Rowe sar...@gmail.com wrote:

 Hi Terry,

 I've added your account name to the ContributorsGroup page on both the
 Solr and Lucene wikis, so you should be able to edit both now.

 Steve

 On Mar 24, 2014, at 9:21 AM, Terry Smith sheb...@gmail.com wrote:

  Could I be allowed edit permissions on the wiki? I've signed up on the
 Lucene and Solr wiki with the same name (TerrySmith) but am just emailing
 lucene-dev to start.
 
  --Terry
 






Re: Stalled unit tests

2014-03-13 Thread Terry Smith
It seems that you need to run the tests with `-Dtests.disableHdfs=true` for
them to succeed. Is there any interest in making this the default
behavior?

If not, I'll happily start a new email thread to get wiki permissions so
that the contribution pages linked from the main README.txt
(https://github.com/apache/lucene-solr/blob/trunk/README.txt) will
both mention this important flag.


   - http://wiki.apache.org/lucene-java/HowToContribute
   - http://wiki.apache.org/solr/HowToContribute


Right now they both state that you can run `ant clean test`. Unfortunately,
that command will fail if you run the tests from either the top level of
the project or the solr subdirectory unless you instead run `ant
-Dtests.disableHdfs=true clean test` or create a build.properties file.
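
The two workarounds named above carry the same override in different places. As a minimal sketch (the helper names `ant_command` and `build_properties_text` are hypothetical, made up for illustration), the same key/value pairs can be rendered either as `-D` flags or as the body of a build.properties file:

```python
def ant_command(targets, overrides):
    """Build an ant argument list with each override passed as -Dkey=value."""
    flags = [f"-D{key}={value}" for key, value in overrides.items()]
    return ["ant", *flags, *targets]


def build_properties_text(overrides):
    """Render the same overrides as key=value lines for a build.properties file."""
    return "".join(f"{key}={value}\n" for key, value in overrides.items())


# The only override taken from the thread is tests.disableHdfs=true.
overrides = {"tests.disableHdfs": "true"}

print(" ".join(ant_command(["clean", "test"], overrides)))
print(build_properties_text(overrides), end="")
```

The first line printed is the command quoted in the message; the second is what a one-line build.properties file would contain.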

I also couldn't find any references to build.properties on the wiki. Here
are the searches I tried:


   - http://wiki.apache.org/general/FrontPage?action=fullsearch&context=180&value=build.properties&fullsearch=Text
   - http://www.google.com/?q=%22build.properties%22+site:wiki.apache.org%2Flucene
   - http://www.google.com/?q=%22build.properties%22+site:wiki.apache.org%2Fsolr


Is this documented somewhere else? I'd be happy to back some out from the
ant files, collate documentation from other sources and make it easier to
find.

--Terry


On Mon, Mar 10, 2014 at 2:55 PM, Dawid Weiss
dawid.we...@cs.put.poznan.pl wrote:

  Dawid: Boy, those are some large timeouts!

 I know... I wasn't the one to bump them; my default was, I think,
 about 3 minutes per class...

 Dawid





Re: Stalled unit tests

2014-03-13 Thread Terry Smith
Dawid,

Thanks, I didn't even know about it until Mike mentioned it earlier in this
thread. I've had it work from ~/lucene.build.properties and
~/build.properties but didn't have any luck putting it in the root of the
project (I'm probably just misreading the ant file).

--Terry



On Thu, Mar 13, 2014 at 9:35 AM, Dawid Weiss
dawid.we...@cs.put.poznan.pl wrote:

 Terry,

 The build.properties file holds the current user's config, as opposed
 to the defaults stored in the repository. In fact, there are more
 locations where you can put such defaults (see common-build.xml's
 header):

   <!-- Give user a chance to override without editing this file
        (and without typing -D each time it compiles it -->
   <property file="${user.home}/lucene.build.properties"/>
   <property file="${user.home}/build.properties"/>
   <property file="${basedir}/build.properties"/>
   <property file="${common.dir}/build.properties"/>


 Dawid
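
The lookup order Dawid lists matters because Ant properties are immutable once set: the first definition of a key wins, so a file loaded earlier in that list takes precedence over a later one, and a `-D` flag on the command line takes precedence over all of them. A minimal Python sketch of that rule follows. The parser is deliberately simplified (plain `key=value` lines only), and the file contents and repository defaults shown are hypothetical values chosen for illustration, not taken from the Lucene build.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(sources):
    """Merge property sets in load order; the first definition of a key wins."""
    resolved = {}
    for props in sources:
        for key, value in props.items():
            resolved.setdefault(key, value)
    return resolved


# Hypothetical inputs, in the order they would be seen.
command_line = {"tests.slow": "true"}  # e.g. -Dtests.slow=true
user_home = parse_properties("tests.disableHdfs=true\ntests.slow=false\n")
repo_defaults = {"tests.disableHdfs": "false", "tests.slow": "true"}  # assumed

effective = resolve([command_line, user_home, repo_defaults])
print(effective)
```

Here the command-line value of `tests.slow` overrides the one in the user's file, while `tests.disableHdfs` comes from the user's file because nothing earlier defined it.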

 On Thu, Mar 13, 2014 at 2:29 PM, Terry Smith sheb...@gmail.com wrote:
  It seems that you need to run the tests with `-Dtests.disableHdfs=true`
 for
  them to succeed. Is there any interest in making this the default
  behavior?
 
  If not, I'll happily start a new email thread to get wiki permissions so
  that the contribution pages linked from the main README.txt will both
  mention this important flag.
 
  http://wiki.apache.org/lucene-java/HowToContribute
  http://wiki.apache.org/solr/HowToContribute
 
 
  Right now they both state that you can run `ant clean test`,
 unfortunately
  that command will fail if you run the tests from either the top level of
 the
  project or the solr subdirectory unless you instead run `ant
  -Dtests.disableHdfs=true clean test` or create a build.properties file.
 
  I also couldn't find any references to build.properties on the wiki, here
  are the searches I tried:
 
 
 http://wiki.apache.org/general/FrontPage?action=fullsearchcontext=180value=build.propertiesfullsearch=Text
 
 http://www.google.com/?q=%22build.properties%22+site:wiki.apache.org%2Flucene
 
 http://www.google.com/?q=%22build.properties%22+site:wiki.apache.org%2Fsolr
 
 
  Is this documented somewhere else? I'd be happy to back some out from the
  ant files, collate documentation from other sources and make it easier to
  find.
 
  --Terry
 
 
  On Mon, Mar 10, 2014 at 2:55 PM, Dawid Weiss 
 dawid.we...@cs.put.poznan.pl
  wrote:
 
   Dawid: Boy, those are some large timeouts!
 
  I know... I wasn't the one to bump them; my default was, I think,
  about 3 minutes per class...
 
  Dawid
 
 
 





Re: Stalled unit tests

2014-03-10 Thread Terry Smith
Dawid: Boy, those are some large timeouts!

Mike: The build.properties suggestion resolved my issue. I can now run the
tests to completion.

On a Mid 2009 MacBook Pro running Mavericks and Java 6, executing ant
from the top level of the lucene-solr project, I get the following timings:

ant clean compile -- 3 minutes
ant clean test (tests.disableHdfs=true, tests.slow=false) -- 55 minutes
ant clean test (tests.disableHdfs=true) -- 88 minutes

On a Mid 2012 MacBook Pro with the same software stack:

ant clean compile -- 1 minute
ant clean test (tests.disableHdfs=true, tests.slow=false) -- 8 minutes

All running from the same git commit mentioned at the top of this thread.

The tests make great use of multiple CPUs/cores, so a faster machine makes a
huge difference to the total runtime.
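
For the Mid 2009 timings (88 minutes for the full run versus 55 minutes with tests.slow=false), the saving can be worked out directly. This is only a sketch of the arithmetic; the helper name `percent_saved` is made up for illustration, and the comparison is limited to the two runs made on the same machine.

```python
def percent_saved(full_minutes, reduced_minutes):
    """Percentage of wall-clock time saved relative to the full run."""
    return 100.0 * (full_minutes - reduced_minutes) / full_minutes


# Mid 2009 MacBook Pro: 88 minutes with slow tests, 55 minutes without.
saved = percent_saved(88, 55)
print(f"tests.slow=false saved {88 - 55} minutes ({saved:.1f}%)")
```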

Do the HDFS tests fail due to test bugs or implementation issues?

How do you feel about changing the default value of tests.disableHdfs to
true versus updating the wiki documentation to let new contributors know
how to work around this?

--Terry




On Fri, Mar 7, 2014 at 12:46 PM, Michael McCandless 
luc...@mikemccandless.com wrote:

 I just ran ant test under Solr; it took 4 minutes 25 seconds.

 But, in my ~/build.properties I have:

 tests.disableHdfs=true
 tests.slow=false

 Which makes things substantially faster, and also [seems to] sidestep
 the Solr tests that falsely fail.

 Mike McCandless

 http://blog.mikemccandless.com


 On Fri, Mar 7, 2014 at 9:04 AM, Terry Smith sheb...@gmail.com wrote:
  Mike,
 
  Fair enough. I'll let them run for more than 30 minutes and see what
  happens.
 
  How long does it take on your machine? I'm happy to sign up for the wiki
 and
  add some extra information to
  http://wiki.apache.org/lucene-java/HowToContribute for folks wanting to
  tinker with Lucene.
 
  Do the Lucene developers typically run a subset of the test suite to make
  committing cheaper?
 
  Thanks,
 
  --Terry
 
 
 
  On Fri, Mar 7, 2014 at 5:52 AM, Michael McCandless
  luc...@mikemccandless.com wrote:
 
  Unfortunately, some tests take a very long time, and the test infra
  will print these HEARTBEAT messages notifying you that they are still
  running.  They should eventually finish?
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
 
  On Thu, Mar 6, 2014 at 5:09 PM, Terry Smith sheb...@gmail.com wrote:
   I'm sure that I'm just missing something obvious but I'm having
 trouble
   getting the unit tests to run to completion on my laptop and was
 hoping
   that
   someone would be kind enough to point me in the right direction.
  
   I've cloned the repository from GitHub
   (http://git.apache.org/lucene-solr.git) and checked out the latest
   commit on
   branch_4x.
  
   commit 6e06247cec1410f32592bfd307c1020b814def06
  
   Author: Robert Muir rm...@apache.org
  
   Date:   Thu Mar 6 19:54:07 2014 +
  
  
   disable slow solr tests in smoketester
  
  
  
   git-svn-id:
  
 https://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x@1575025
   13f79535-47bb-0310-9956-ffa450edef68
  
  
   Executing ant clean test from the top level directory of the project
   shows
   the tests running but they seem to get stuck in a loop with some
 stalled
   heartbeat messages. If I run the tests directly from lucene/ then they
   complete successfully after about 10 minutes.
  
   I'm using Java 6 under OS X (10.9.2).
  
   $ java -version
  
   java version 1.6.0_65
  
   Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-11M4609)
  
   Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode)
  
  
   My terminal lists repeating stalled heartbeat messages like so:
  
   HEARTBEAT J2 PID(20104@onyx.local): 2014-03-06T16:53:35, stalled for
   2111s
   at: HdfsLockFactoryTest.testBasic
  
   HEARTBEAT J0 PID(20106@onyx.local): 2014-03-06T16:53:47, stalled for
   2108s
   at: TestSurroundQueryParser.testQueryParser
  
   HEARTBEAT J1 PID(20103@onyx.local): 2014-03-06T16:54:11, stalled for
   2167s
   at: TestRecoveryHdfs.testBuffering
  
   HEARTBEAT J3 PID(20105@onyx.local): 2014-03-06T16:54:23, stalled for
   2165s
   at: HdfsDirectoryTest.testEOF
  
  
   My machine does have 3 java processes chewing CPU, see attached jstack
   dumps
   for more information.
  
   Should I expect the tests to complete on my platform? Do I need to
   specify
   any special flags to give them more memory or to avoid any bad apples?
  
   Thanks in advance,
  
   --Terry
  
  
  
  
 
 
 


Re: Stalled unit tests

2014-03-10 Thread Terry Smith
Oops, the second set of timings on the Mid 2012 MacBook Pro were for JUST
the solr tests.



On Mon, Mar 10, 2014 at 9:31 AM, Terry Smith sheb...@gmail.com wrote:

 Dawid: Boy, those are some large timeouts!

 Mike: The build.properties suggestion resolved my issue. I can now run the
 test to completion.

 On a Mid 2009 MacBook Pro running Mavericks and using Java 6 executing ant
 from the top level of the lucene-solr project I get the following timings:

 ant clean compile -- 3 minutes
 ant clean test (tests.disableHdfs=true, tests.slow=false) -- 55 minutes
 ant clean test (tests.disableHdfs=true) -- 88 minutes

 On a Mid 2012 MacBook Pro with the same software stack:

 ant clean compile -- 1 minute
 ant clean test (tests.disableHdfs=true, tests.slow=false) -- 8 minutes

 All running from the same git commit mentioned at the top of this thread.

 The tests make great use of multiple CPU/cores so a faster machine makes a
 huge difference to the total runtime.

 Do the HDFS tests fail due to test bugs or implementation issues?

 How do you feel about changing the default value of tests.disableHdfs to
 true versus updating the wiki documentation to let new contributors know
 how to work around this?

 --Terry




 On Fri, Mar 7, 2014 at 12:46 PM, Michael McCandless 
 luc...@mikemccandless.com wrote:

 I just ran ant test under Solr; it took 4 minutes 25 seconds.

 But, in my ~/build.properties I have:

 tests.disableHdfs=true
 tests.slow=false

 Which makes things substantially faster, and also [seems to] sidestep
 the Solr tests that falsely fail.

 Mike McCandless

 http://blog.mikemccandless.com


 On Fri, Mar 7, 2014 at 9:04 AM, Terry Smith sheb...@gmail.com wrote:
  Mike,
 
  Fair enough. I'll let them run for more than 30 minutes and see what
  happens.
 
  How long does it take on your machine? I'm happy to sign up for the wiki
 and
  add some extra information to
  http://wiki.apache.org/lucene-java/HowToContribute for folks wanting to
  tinker with Lucene.
 
  Do the Lucene developers typically run a subset of the test suite to
 make
  committing cheaper?
 
  Thanks,
 
  --Terry
 
 
 
  On Fri, Mar 7, 2014 at 5:52 AM, Michael McCandless
  luc...@mikemccandless.com wrote:
 
  Unfortunately, some tests take a very long time, and the test infra
  will print these HEARTBEAT messages notifying you that they are still
  running.  They should eventually finish?
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
 
  On Thu, Mar 6, 2014 at 5:09 PM, Terry Smith sheb...@gmail.com wrote:
   I'm sure that I'm just missing something obvious but I'm having
 trouble
   getting the unit tests to run to completion on my laptop and was
 hoping
   that
   someone would be kind enough to point me in the right direction.
  
   I've cloned the repository from GitHub
   (http://git.apache.org/lucene-solr.git) and checked out the latest
   commit on
   branch_4x.
  
   commit 6e06247cec1410f32592bfd307c1020b814def06
  
   Author: Robert Muir rm...@apache.org
  
   Date:   Thu Mar 6 19:54:07 2014 +
  
  
   disable slow solr tests in smoketester
  
  
  
   git-svn-id:
  
 https://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x@1575025
   13f79535-47bb-0310-9956-ffa450edef68
  
  
   Executing ant clean test from the top level directory of the
 project
   shows
   the tests running but they seem to get stuck in a loop with some
 stalled
   heartbeat messages. If I run the tests directly from lucene/ then
 they
   complete successfully after about 10 minutes.
  
   I'm using Java 6 under OS X (10.9.2).
  
   $ java -version
  
   java version 1.6.0_65
  
   Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-11M4609)
  
   Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode)
  
  
   My terminal lists repeating stalled heartbeat messages like so:
  
   HEARTBEAT J2 PID(20104@onyx.local): 2014-03-06T16:53:35, stalled for
   2111s
   at: HdfsLockFactoryTest.testBasic
  
   HEARTBEAT J0 PID(20106@onyx.local): 2014-03-06T16:53:47, stalled for
   2108s
   at: TestSurroundQueryParser.testQueryParser
  
   HEARTBEAT J1 PID(20103@onyx.local): 2014-03-06T16:54:11, stalled for
   2167s
   at: TestRecoveryHdfs.testBuffering
  
   HEARTBEAT J3 PID(20105@onyx.local): 2014-03-06T16:54:23, stalled for
   2165s
   at: HdfsDirectoryTest.testEOF
  
  
   My machine does have 3 java processes chewing CPU, see attached
 jstack
   dumps
   for more information.
  
   Should I expect the tests to complete on my platform? Do I need to
   specify
   any special flags to give them more memory or to avoid any bad
 apples?
  
   Thanks in advance,
  
   --Terry
  
  
  
  
 

Re: Stalled unit tests

2014-03-10 Thread Terry Smith
Shalin: That makes sense. Both the machines I used for testing have SSDs.



On Mon, Mar 10, 2014 at 9:35 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 In my experience, the test suite is much faster on an SSD. Around 18
 minutes on my MacBook Pro and 12 minutes on my PC for just the Solr
 tests with -Dtests.slow=true (both have SSDs)

 On Mon, Mar 10, 2014 at 7:02 PM, Terry Smith sheb...@gmail.com wrote:
  Oops, the second set of timings on the Mid 2012 MacBook Pro were for JUST
  the solr tests.
 
 
 
  On Mon, Mar 10, 2014 at 9:31 AM, Terry Smith sheb...@gmail.com wrote:
 
  Dawid: Boy, those are some large timeouts!
 
  Mike: The build.properties suggestion resolved my issue. I can now run
 the
  test to completion.
 
  On a Mid 2009 MacBook Pro running Mavericks and using Java 6 executing
 ant
  from the top level of the lucene-solr project I get the following
 timings:
 
  ant clean compile -- 3 minutes
  ant clean test (tests.disableHdfs=true, tests.slow=false) -- 55 minutes
  ant clean test (tests.disableHdfs=true) -- 88 minutes
 
  On a Mid 2012 MacBook Pro with the same software stack:
 
  ant clean compile -- 1 minute
  ant clean test (tests.disableHdfs=true, tests.slow=false) -- 8 minutes
 
  All running from the same git commit mentioned at the top of this
 thread.
 
  The tests make great use of multiple CPU/cores so a faster machine
 makes a
  huge difference to the total runtime.
 
  Do the HDFS tests fail due to test bugs or implementation issues?
 
  How do you feel about changing the default value of tests.disableHdfs to
  true versus updating the wiki documentation to let new contributors
 know
  how to work around this?
 
  --Terry
 
 
 
 
  On Fri, Mar 7, 2014 at 12:46 PM, Michael McCandless
  luc...@mikemccandless.com wrote:
 
  I just ran ant test under Solr; it took 4 minutes 25 seconds.
 
  But, in my ~/build.properties I have:
 
  tests.disableHdfs=true
  tests.slow=false
 
  Which makes things substantially faster, and also [seems to] sidestep
  the Solr tests that falsely fail.
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
 
  On Fri, Mar 7, 2014 at 9:04 AM, Terry Smith sheb...@gmail.com wrote:
   Mike,
  
   Fair enough. I'll let them run for more than 30 minutes and see what
   happens.
  
   How long does it take on your machine? I'm happy to sign up for the
 wiki
   and
   add some extra information to
   http://wiki.apache.org/lucene-java/HowToContribute for folks
 wanting to
   tinker with Lucene.
  
   Do the Lucene developers typically run a subset of the test suite to
   make
   committing cheaper?
  
   Thanks,
  
   --Terry
  
  
  
   On Fri, Mar 7, 2014 at 5:52 AM, Michael McCandless
   luc...@mikemccandless.com wrote:
  
   Unfortunately, some tests take a very long time, and the test infra
   will print these HEARTBEAT messages notifying you that they are
 still
   running.  They should eventually finish?
  
   Mike McCandless
  
   http://blog.mikemccandless.com
  
  
   On Thu, Mar 6, 2014 at 5:09 PM, Terry Smith sheb...@gmail.com
 wrote:
I'm sure that I'm just missing something obvious but I'm having
trouble
getting the unit tests to run to completion on my laptop and was
hoping
that
someone would be kind enough to point me in the right direction.
   
I've cloned the repository from GitHub
(http://git.apache.org/lucene-solr.git) and checked out the
 latest
commit on
branch_4x.
   
commit 6e06247cec1410f32592bfd307c1020b814def06
   
Author: Robert Muir rm...@apache.org
   
Date:   Thu Mar 6 19:54:07 2014 +
   
   
disable slow solr tests in smoketester
   
   
   
git-svn-id:
   
   
 https://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x@1575025
13f79535-47bb-0310-9956-ffa450edef68
   
   
Executing ant clean test from the top level directory of the
project
shows
the tests running but they seem to get stuck in a loop with some
stalled
heartbeat messages. If I run the tests directly from lucene/ then
they
complete successfully after about 10 minutes.
   
I'm using Java 6 under OS X (10.9.2).
   
$ java -version
   
java version 1.6.0_65
   
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-11M4609)
   
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed
 mode)
   
   
My terminal lists repeating stalled heartbeat messages like so:
   
HEARTBEAT J2 PID(20104@onyx.local): 2014-03-06T16:53:35, stalled
 for
2111s
at: HdfsLockFactoryTest.testBasic
   
HEARTBEAT J0 PID(20106@onyx.local): 2014-03-06T16:53:47, stalled
 for
2108s
at: TestSurroundQueryParser.testQueryParser
   
HEARTBEAT J1 PID(20103@onyx.local): 2014-03-06T16:54:11, stalled
 for
2167s
at: TestRecoveryHdfs.testBuffering
   
HEARTBEAT J3 PID(20105@onyx.local): 2014-03-06T16:54:23, stalled
 for
2165s
at: HdfsDirectoryTest.testEOF
   
   
My machine does have 3 java

Re: Stalled unit tests

2014-03-07 Thread Terry Smith
Mike,

Fair enough. I'll let them run for more than 30 minutes and see what
happens.

How long does it take on your machine? I'm happy to sign up for the wiki and
add some extra information to
http://wiki.apache.org/lucene-java/HowToContribute for folks wanting to
tinker with Lucene.

Do the Lucene developers typically run a subset of the test suite to make
committing cheaper?

Thanks,

--Terry



On Fri, Mar 7, 2014 at 5:52 AM, Michael McCandless 
luc...@mikemccandless.com wrote:

 Unfortunately, some tests take a very long time, and the test infra
 will print these HEARTBEAT messages notifying you that they are still
 running.  They should eventually finish?

 Mike McCandless

 http://blog.mikemccandless.com


 On Thu, Mar 6, 2014 at 5:09 PM, Terry Smith sheb...@gmail.com wrote:
  I'm sure that I'm just missing something obvious but I'm having trouble
  getting the unit tests to run to completion on my laptop and was hoping
 that
  someone would be kind enough to point me in the right direction.
 
  I've cloned the repository from GitHub
  (http://git.apache.org/lucene-solr.git) and checked out the latest
 commit on
  branch_4x.
 
  commit 6e06247cec1410f32592bfd307c1020b814def06
 
  Author: Robert Muir rm...@apache.org
 
  Date:   Thu Mar 6 19:54:07 2014 +
 
 
  disable slow solr tests in smoketester
 
 
 
  git-svn-id:
  https://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x@1575025
  13f79535-47bb-0310-9956-ffa450edef68
 
 
  Executing ant clean test from the top level directory of the project
 shows
  the tests running but they seem to get stuck in a loop with some stalled
  heartbeat messages. If I run the tests directly from lucene/ then they
  complete successfully after about 10 minutes.
 
  I'm using Java 6 under OS X (10.9.2).
 
  $ java -version
 
  java version 1.6.0_65
 
  Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-11M4609)
 
  Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode)
 
 
  My terminal lists repeating stalled heartbeat messages like so:
 
  HEARTBEAT J2 PID(20104@onyx.local): 2014-03-06T16:53:35, stalled for
 2111s
  at: HdfsLockFactoryTest.testBasic
 
  HEARTBEAT J0 PID(20106@onyx.local): 2014-03-06T16:53:47, stalled for
 2108s
  at: TestSurroundQueryParser.testQueryParser
 
  HEARTBEAT J1 PID(20103@onyx.local): 2014-03-06T16:54:11, stalled for
 2167s
  at: TestRecoveryHdfs.testBuffering
 
  HEARTBEAT J3 PID(20105@onyx.local): 2014-03-06T16:54:23, stalled for
 2165s
  at: HdfsDirectoryTest.testEOF
 
 
  My machine does have 3 java processes chewing CPU, see attached jstack
 dumps
  for more information.
 
  Should I expect the tests to complete on my platform? Do I need to
 specify
  any special flags to give them more memory or to avoid any bad apples?
 
  Thanks in advance,
 
  --Terry
 
 
 
 





Stalled unit tests

2014-03-06 Thread Terry Smith
I'm sure that I'm just missing something obvious but I'm having trouble
getting the unit tests to run to completion on my laptop and was hoping
that someone would be kind enough to point me in the right direction.

I've cloned the repository from GitHub
(http://git.apache.org/lucene-solr.git) and checked out the latest commit on
branch_4x.

commit 6e06247cec1410f32592bfd307c1020b814def06
Author: Robert Muir rm...@apache.org
Date:   Thu Mar 6 19:54:07 2014 +

    disable slow solr tests in smoketester

    git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x@1575025 13f79535-47bb-0310-9956-ffa450edef68


Executing `ant clean test` from the top level directory of the project
shows the tests running, but they seem to get stuck in a loop with some
stalled heartbeat messages. If I run the tests directly from lucene/ then
they complete successfully after about 10 minutes.

I'm using Java 6 under OS X (10.9.2).

$ java -version
java version "1.6.0_65"
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-11M4609)
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode)


My terminal lists repeating stalled heartbeat messages like so:

HEARTBEAT J2 PID(20104@onyx.local): 2014-03-06T16:53:35, stalled for 2111s
at: HdfsLockFactoryTest.testBasic

HEARTBEAT J0 PID(20106@onyx.local): 2014-03-06T16:53:47, stalled for 2108s
at: TestSurroundQueryParser.testQueryParser

HEARTBEAT J1 PID(20103@onyx.local): 2014-03-06T16:54:11, stalled for 2167s
at: TestRecoveryHdfs.testBuffering

HEARTBEAT J3 PID(20105@onyx.local): 2014-03-06T16:54:23, stalled for 2165s
at: HdfsDirectoryTest.testEOF
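
To see at a glance which tests are stalled, the heartbeat lines above can be parsed mechanically. The following is a rough sketch that assumes the line format is exactly as pasted in this message (the regular expression and the helper name `stalled_tests` are illustrative, not part of the test framework):

```python
import re

# Pattern inferred from the pasted output; "at:" may be on the next line.
HEARTBEAT = re.compile(
    r"HEARTBEAT (?P<jvm>J\d+) PID\((?P<pid>\d+)@(?P<host>[^)]+)\): "
    r"(?P<time>\S+), stalled for\s+(?P<seconds>\d+)s\s+at: (?P<test>\S+)"
)

SAMPLE = """\
HEARTBEAT J2 PID(20104@onyx.local): 2014-03-06T16:53:35, stalled for 2111s
at: HdfsLockFactoryTest.testBasic
HEARTBEAT J0 PID(20106@onyx.local): 2014-03-06T16:53:47, stalled for 2108s
at: TestSurroundQueryParser.testQueryParser
HEARTBEAT J1 PID(20103@onyx.local): 2014-03-06T16:54:11, stalled for 2167s
at: TestRecoveryHdfs.testBuffering
HEARTBEAT J3 PID(20105@onyx.local): 2014-03-06T16:54:23, stalled for 2165s
at: HdfsDirectoryTest.testEOF
"""


def stalled_tests(log_text):
    """Return (jvm, pid, stalled_seconds, test_name) for each heartbeat."""
    return [
        (m["jvm"], int(m["pid"]), int(m["seconds"]), m["test"])
        for m in HEARTBEAT.finditer(log_text)
    ]


results = stalled_tests(SAMPLE)
hdfs_related = [test for _, _, _, test in results if "Hdfs" in test]
print(f"{len(hdfs_related)} of {len(results)} stalled tests mention Hdfs")
```

On the four heartbeats quoted here, three of the stalled tests have Hdfs in their name.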

My machine does have 3 java processes chewing CPU; see the attached jstack
dumps for more information.

Should I expect the tests to complete on my platform? Do I need to specify
any special flags to give them more memory or to avoid any bad apples?

Thanks in advance,

--Terry


20103.jstack.txt.gz
Description: GNU Zip compressed data


20104.jstack.txt.gz
Description: GNU Zip compressed data


20105.jstack.txt.gz
Description: GNU Zip compressed data
