[jira] [Commented] (LUCENE-7382) Wrong default attribute factory in use
[ https://issues.apache.org/jira/browse/LUCENE-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379453#comment-15379453 ] Terry Smith commented on LUCENE-7382: - Thanks, I didn't realize this would hit 6.2. I have nightly builds that follow the 6.2.0-SNAPSHOT and 7.0.0-SNAPSHOT artifacts on the ASF snapshot maven repo and this didn't hit my 6.2 branch yet. > Wrong default attribute factory in use > -- > > Key: LUCENE-7382 > URL: https://issues.apache.org/jira/browse/LUCENE-7382 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: master (7.0), 6.2 > Reporter: Terry Smith >Assignee: Uwe Schindler > Fix For: 6.2 > > > Originally reported to the mailing list: > http://mail-archives.apache.org/mod_mbox/lucene-java-user/201607.mbox/%3cCAJ0VynnMAH7N7byPevTV9Htxo-Nk-B7mwUwRgP4X8gN=v4p...@mail.gmail.com%3e > LUCENE-7355 made a change to CustomAnalyzer.createComponents() such that it > uses a different AttributeFactory. > https://github.com/apache/lucene-solr/commit/e92a38af90d12e51390b4307ccbe0c24ac7b6b4e#diff-b39a076156e10aa7a4ba86af0357a0feL122 > The previous default was TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY which > uses PackedTokenAttributeImpl while the new default is now > AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY which does not use > PackedTokenAttributeImpl. > [~thetaphi] Asked me to open an issue for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-7382) Wrong default attribute factory in use
Terry Smith created LUCENE-7382: --- Summary: Wrong default attribute factory in use Key: LUCENE-7382 URL: https://issues.apache.org/jira/browse/LUCENE-7382 Project: Lucene - Core Issue Type: Bug Affects Versions: master (7.0) Reporter: Terry Smith Originally reported to the mailing list: http://mail-archives.apache.org/mod_mbox/lucene-java-user/201607.mbox/%3cCAJ0VynnMAH7N7byPevTV9Htxo-Nk-B7mwUwRgP4X8gN=v4p...@mail.gmail.com%3e LUCENE-7355 made a change to CustomAnalyzer.createComponents() such that it uses a different AttributeFactory. https://github.com/apache/lucene-solr/commit/e92a38af90d12e51390b4307ccbe0c24ac7b6b4e#diff-b39a076156e10aa7a4ba86af0357a0feL122 The previous default was TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY which uses PackedTokenAttributeImpl while the new default is now AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY which does not use PackedTokenAttributeImpl. [~thetaphi] Asked me to open an issue for this.
[jira] [Commented] (LUCENE-6922) Improve svn to git workaround script
[ https://issues.apache.org/jira/browse/LUCENE-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060811#comment-15060811 ] Terry Smith commented on LUCENE-6922: - Does LUCENE-6933 affect this ticket? > Improve svn to git workaround script > > > Key: LUCENE-6922 > URL: https://issues.apache.org/jira/browse/LUCENE-6922 > Project: Lucene - Core > Issue Type: Improvement > Components: -tools >Reporter: Paul Elschot >Priority: Minor > Attachments: svnBranchToGit.py, svnBranchToGit.py > > > As the git-svn mirror for Lucene/Solr will be turned off near the end of > 2015, try and improve the workaround script to become more usable.
[jira] [Commented] (LUCENE-6922) Improve svn to git workaround script
[ https://issues.apache.org/jira/browse/LUCENE-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060830#comment-15060830 ] Terry Smith commented on LUCENE-6922: - Ah, so I could consider this as a backup plan until LUCENE-6933 is ready? > Improve svn to git workaround script > > > Key: LUCENE-6922 > URL: https://issues.apache.org/jira/browse/LUCENE-6922 > Project: Lucene - Core > Issue Type: Improvement > Components: -tools >Reporter: Paul Elschot >Priority: Minor > Attachments: svnBranchToGit.py, svnBranchToGit.py > > > As the git-svn mirror for Lucene/Solr will be turned off near the end of > 2015, try and improve the workaround script to become more usable.
[jira] [Commented] (LUCENE-6922) Improve svn to git workaround script
[ https://issues.apache.org/jira/browse/LUCENE-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045060#comment-15045060 ] Terry Smith commented on LUCENE-6922: - Is the announcement of the EOL to the git-svn mirror available publicly? How will this affect users that rely on the github mirror to access the Lucene/Solr repository? > Improve svn to git workaround script > > > Key: LUCENE-6922 > URL: https://issues.apache.org/jira/browse/LUCENE-6922 > Project: Lucene - Core > Issue Type: Improvement > Components: -tools >Reporter: Paul Elschot >Priority: Minor > Attachments: svnBranchToGit.py > > > As the git-svn mirror for Lucene/Solr will be turned off near the end of > 2015, try and improve the workaround script to become more usable.
[jira] [Commented] (LUCENE-6922) Improve svn to git workaround script
[ https://issues.apache.org/jira/browse/LUCENE-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045069#comment-15045069 ] Terry Smith commented on LUCENE-6922: - Oops, I guess I'm behind with the mailing list, just found the discussion and will include the link here for anyone else that misses it: http://mail-archives.apache.org/mod_mbox/lucene-dev/201512.mbox/%3ccal8pwkbfvt83zbczm0y-x-mdeth6hyc_xyejrev9fzzk5yx...@mail.gmail.com%3e > Improve svn to git workaround script > > > Key: LUCENE-6922 > URL: https://issues.apache.org/jira/browse/LUCENE-6922 > Project: Lucene - Core > Issue Type: Improvement > Components: -tools >Reporter: Paul Elschot >Priority: Minor > Attachments: svnBranchToGit.py > > > As the git-svn mirror for Lucene/Solr will be turned off near the end of > 2015, try and improve the workaround script to become more usable.
[jira] [Commented] (LUCENE-6889) BooleanQuery.rewrite could easily optimize some simple cases
[ https://issues.apache.org/jira/browse/LUCENE-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14998891#comment-14998891 ] Terry Smith commented on LUCENE-6889: - Ah, that makes sense. I didn't realize the first scenario was dropping the MUST when the FILTER and MUST wrapped identical clauses or that the second scenario also included boost handling to avoid the scoring issue. Given that, this sounds like a great optimization. I'll summarize the rules below, mind shouting out if I still misunderstand? Rule 1 {noformat} #a +a -> +a {noformat} Rule 2 {noformat} +*:*^b #f -> ConstantScoreQuery(f)^b {noformat} Rule 3 {noformat} -a +a -> MatchNoDocsQuery -a #a -> MatchNoDocsQuery {noformat} > BooleanQuery.rewrite could easily optimize some simple cases > > > Key: LUCENE-6889 > URL: https://issues.apache.org/jira/browse/LUCENE-6889 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > Follow-up of SOLR-8251: APIs and user interfaces sometimes encourage to write > BooleanQuery instances that are not optimal, for instance a typical case that > happens often with Solr/Elasticsearch is to send a request that has a > MatchAllDocsQuery as a query and some filter, which could be executed more > efficiently by directly wrapping the filter into a ConstantScoreQuery. > Here are some ideas of rewrite operations that BooleanQuery could perform: > - remove FILTER clauses when they are also a MUST clause > - rewrite queries of the form "+*:* #filter" to a ConstantScoreQuery(filter) > - rewrite to a MatchNoDocsQuery when a clause that is a MUST or FILTER > clause is also a MUST_NOT clause
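The three rules summarized in the comment above can be illustrated with a small toy model. This is only a sketch under stated assumptions: queries are plain strings rather than Lucene Query objects, clause identity is string equality, and every name here (BooleanRewriteSketch, Clause, rewrite) is hypothetical and not part of Lucene. It is not the rewrite that was eventually committed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: queries are plain strings, not Lucene Query objects,
// and none of these names exist in Lucene.
class BooleanRewriteSketch {

    enum Occur { MUST, FILTER, SHOULD, MUST_NOT }

    static final class Clause {
        final Occur occur;
        final String query;
        final float boost;

        Clause(Occur occur, String query, float boost) {
            this.occur = occur;
            this.query = query;
            this.boost = boost;
        }
    }

    // Convenience factory for an unboosted clause.
    static Clause clause(Occur occur, String query) {
        return new Clause(occur, query, 1f);
    }

    static String rewrite(List<Clause> clauses) {
        Set<String> must = new HashSet<>();
        Set<String> mustNot = new HashSet<>();
        for (Clause c : clauses) {
            if (c.occur == Occur.MUST) must.add(c.query);
            if (c.occur == Occur.MUST_NOT) mustNot.add(c.query);
        }

        // Rule 3: a required clause (MUST or FILTER) that is also prohibited
        // can never match.
        for (Clause c : clauses) {
            boolean required = c.occur == Occur.MUST || c.occur == Occur.FILTER;
            if (required && mustNot.contains(c.query)) return "MatchNoDocsQuery";
        }

        // Rule 1: drop a FILTER clause when an identical MUST clause exists.
        List<Clause> kept = new ArrayList<>();
        for (Clause c : clauses) {
            if (c.occur == Occur.FILTER && must.contains(c.query)) continue;
            kept.add(c);
        }

        // Rule 2: "+*:*^b #f" becomes ConstantScoreQuery(f)^b.
        if (kept.size() == 2) {
            Clause first = kept.get(0);
            Clause second = kept.get(1);
            if (isMatchAllMust(first) && second.occur == Occur.FILTER) {
                return "ConstantScoreQuery(" + second.query + ")^" + first.boost;
            }
            if (isMatchAllMust(second) && first.occur == Occur.FILTER) {
                return "ConstantScoreQuery(" + first.query + ")^" + second.boost;
            }
        }

        // Otherwise render what is left in the usual +/#/- notation.
        StringBuilder sb = new StringBuilder();
        for (Clause c : kept) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(prefix(c.occur)).append(c.query);
            if (c.boost != 1f) sb.append('^').append(c.boost);
        }
        return sb.toString();
    }

    private static boolean isMatchAllMust(Clause c) {
        return c.occur == Occur.MUST && c.query.equals("*:*");
    }

    private static String prefix(Occur occur) {
        switch (occur) {
            case MUST: return "+";
            case FILTER: return "#";
            case MUST_NOT: return "-";
            default: return "";
        }
    }
}
```

Because identity is exact string equality in this model, a FILTER such as "(+foo +bar)" is not dropped when only "+foo" is a MUST clause, which is the case the earlier objection on this issue was concerned about.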
[jira] [Commented] (LUCENE-6889) BooleanQuery.rewrite could easily optimize some simple cases
[ https://issues.apache.org/jira/browse/LUCENE-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14998808#comment-14998808 ] Terry Smith commented on LUCENE-6889: - I like the last one but believe that the other two aren't correct. bq. remove FILTER clauses when they are also a MUST clause Seeing as a FILTER is a non scoring MUST this just doesn't sound right. The FILTER could constrain the result set more than just the MUST alone. e.g. +foo #(+foo +bar) bq. rewrite queries of the form "+*:* #filter" to a ConstantScoreQuery(filter) I don't think you can drop a +*:* without affecting the score, but you could drop a #*:* if the BooleanQuery has something else to force inclusion (other MUST, FILTER or some SHOULD with an appropriate minNumShouldMatch). For this case could Solr/Elasticsearch add the MatchAllDocs as a FILTER instead of a MUST to allow for this optimization? We could detect duplicate FILTER and MUST_NOT clauses as described in LUCENE-6787. Jira is turning star colon star (*:*) to a bold colon, so apologies if this doesn't read well through the web interface. > BooleanQuery.rewrite could easily optimize some simple cases > > > Key: LUCENE-6889 > URL: https://issues.apache.org/jira/browse/LUCENE-6889 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > Follow-up of SOLR-8251: APIs and user interfaces sometimes encourage to write > BooleanQuery instances that are not optimal, for instance a typical case that > happens often with Solr/Elasticsearch is to send a request that has a > MatchAllDocsQuery as a query and some filter, which could be executed more > efficiently by directly wrapping the filter into a ConstantScoreQuery.
> Here are some ideas of rewrite operations that BooleanQuery could perform: > - remove FILTER clauses when they are also a MUST clause > - rewrite queries of the form "+*:* #filter" to a ConstantScoreQuery(filter) > - rewrite to a MatchNoDocsQuery when a clause that is a MUST or FILTER > clause is also a MUST_NOT clause
[jira] [Commented] (LUCENE-6679) Filter's Weight.explain returns an explanation with isMatch==true even on documents that don't match
[ https://issues.apache.org/jira/browse/LUCENE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996583#comment-14996583 ] Terry Smith commented on LUCENE-6679: - Thanks Adrien, I'll give it a shot this morning and reach out as needed. As a reminder, my patch only checks hits and the bug reports are for misses. I'll need to expand upon it to get better coverage for those also. If I can't turn something around early this week I'll back up a little and at least get a direct test for the underlying bug behind SOLR-8245 to go with its fix. > Filter's Weight.explain returns an explanation with isMatch==true even on > documents that don't match > > > Key: LUCENE-6679 > URL: https://issues.apache.org/jira/browse/LUCENE-6679 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand > Attachments: LUCENE-6679.patch > > > This was reported by Trejkaz on the java-user list: > http://search-lucene.com/m/l6pAi19h4Y3DclgB1=Re+What+on+earth+is+FilteredQuery+explain+doing+
[jira] [Updated] (LUCENE-6679) Filter's Weight.explain returns an explanation with isMatch==true even on documents that don't match
[ https://issues.apache.org/jira/browse/LUCENE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Terry Smith updated LUCENE-6679: Attachment: LUCENE-6679.patch Here is an updated patch against trunk that adds hit and miss explain checks to AssertingLeafCollector and hooks it up with the surrounding classes. I've also introduced a new annotation called SuppressExplainChecks that I've applied to the following tests that would fail without it. * TestSortRandom * TestLazyProxSkipping * TestDrillSideways * TestRangeFacetCounts * TestJoinUtil * TestFieldCacheSortRandom * TestCustomScoreQuery * TestCustomScoreQueryExplanations * TestFunctionQueryExplanations * TestForTooMuchCloning * TestTermAutomationQuery Once you are happy with this patch, I'd like to get it on trunk so the Jenkins servers can shake out any more failures and we can create tickets for any uncovered bugs. > Filter's Weight.explain returns an explanation with isMatch==true even on > documents that don't match > > > Key: LUCENE-6679 > URL: https://issues.apache.org/jira/browse/LUCENE-6679 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand > Attachments: LUCENE-6679.patch, LUCENE-6679.patch > > > This was reported by Trejkaz on the java-user list: > http://search-lucene.com/m/l6pAi19h4Y3DclgB1=Re+What+on+earth+is+FilteredQuery+explain+doing+
[jira] [Commented] (LUCENE-6679) Filter's Weight.explain returns an explanation with isMatch==true even on documents that don't match
[ https://issues.apache.org/jira/browse/LUCENE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994430#comment-14994430 ] Terry Smith commented on LUCENE-6679: - Darn, sorry I didn't get back to pushing these new tests into the source, they would have helped to catch this. It looks like branch_5x and trunk diverged here, this commit shows where this code was removed on branch_5x: https://github.com/apache/lucene-solr/commit/11cc6e53f85f7bc4b616bb38370ddfc704987337#diff-2b293bfd95e32f715a5c05b4e132f047L82 And this commit from trunk shows the move of Filter from Lucene to Solr: https://github.com/apache/lucene-solr/commit/9f8d64d4fb34eb2480e9a667c45f262d20f0#diff-4bf1c256f49c1a4ee4a50b1f8aeda1ddL1 You can see the trunk version of the file here: https://github.com/apache/lucene-solr/blob/9f8d64d4fb34eb2480e9a667c45f262d20f0/solr/core/src/java/org/apache/solr/search/Filter.java The changes from the branch_5x commit are missing. I don't know why this is the case, perhaps [~jpountz] can chime in as the original commit that fixes the explain output as a side effect (LUCENE-6601) looks like it ought to be on both branch_5x and trunk. > Filter's Weight.explain returns an explanation with isMatch==true even on > documents that don't match > > > Key: LUCENE-6679 > URL: https://issues.apache.org/jira/browse/LUCENE-6679 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand > Attachments: LUCENE-6679.patch > > > This was reported by Trejkaz on the java-user list: > http://search-lucene.com/m/l6pAi19h4Y3DclgB1=Re+What+on+earth+is+FilteredQuery+explain+doing+
[jira] [Commented] (LUCENE-6834) Remove BoostQuery.toString()'s hack with parenthesis
[ https://issues.apache.org/jira/browse/LUCENE-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948670#comment-14948670 ] Terry Smith commented on LUCENE-6834: - Great idea. It currently puts user-defined queries at a disadvantage as they cannot opt out of the parens. If all queries are wrapped in parens when boosted, then the toString output is easier to read. > Remove BoostQuery.toString()'s hack with parenthesis > > > Key: LUCENE-6834 > URL: https://issues.apache.org/jira/browse/LUCENE-6834 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Minor > Fix For: 6.0 > > > This hack was added in order not to break the string representation of our > queries in 5.x. However I don't think we should have it in trunk.
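The parenthesis handling under discussion can be sketched with a toy renderer. The assumptions are loud ones: the interface Q, the classes MatchAll and Bool, and the method boosted are all hypothetical stand-ins, and the allow-list is passed as a parameter purely for illustration. The only detail taken from this archive is the pair of renderings quoted in the LUCENE-6590 messages, *:*^0.0 and (*:*)^0.0.

```java
import java.util.Arrays;

// Toy renderer, not Lucene's BoostQuery: all names here are hypothetical.
class BoostToStringSketch {

    interface Q {
        String render();
    }

    static final class MatchAll implements Q {
        @Override
        public String render() {
            return "*:*";
        }
    }

    static final class Bool implements Q {
        @Override
        public String render() {
            return "+a +b";
        }
    }

    // Renders "inner^boost". The inner query is wrapped in parentheses unless
    // its class appears in the allow-list, which mimics the idea of a
    // "no parens required" list. With an empty allow-list every boosted
    // query is wrapped, which is the uniform behavior preferred for trunk.
    static String boosted(Q inner, float boost, Class<?>... noParensRequired) {
        String s = inner.render();
        boolean bare = Arrays.asList(noParensRequired).contains(inner.getClass());
        return (bare ? s : "(" + s + ")") + "^" + boost;
    }
}
```

A user-defined Q cannot add itself to a hard-coded allow-list, which is the disadvantage the comment describes; always wrapping removes that asymmetry.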
[jira] [Commented] (LUCENE-6590) Explore different ways to apply boosts
[ https://issues.apache.org/jira/browse/LUCENE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947239#comment-14947239 ] Terry Smith commented on LUCENE-6590: - Thanks Adrien. My nightly regressions just picked this up from the published maven snapshots and I see that BoostQuery now includes MatchAllDocsQuery in its NO_PARENS_REQUIRED_QUERIES list on branch_5x (this is awesome!). However this change is *not* available on trunk. I've confirmed by checking the SVN repo directly (the github mirror tends to lag). http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/BoostQuery.java http://svn.apache.org/repos/asf/lucene/dev/branches/branch_5x/lucene/core/src/java/org/apache/lucene/search/BoostQuery.java --Terry > Explore different ways to apply boosts > -- > > Key: LUCENE-6590 > URL: https://issues.apache.org/jira/browse/LUCENE-6590 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.4 > > Attachments: LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, > LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch > > > Follow-up from LUCENE-6570: the fact that all queries are mutable in order to > allow for applying a boost raises issues since it makes queries bad cache > keys since their hashcode can change anytime. We could just document that > queries should never be modified after they have gone through IndexSearcher > but it would be even better if the API made queries impossible to mutate at > all. > I think there are two main options: > - either replace "void setBoost(boost)" with something like "Query > withBoost(boost)" which would return a clone that has a different boost > - or move boost handling outside of Query, for instance we could have a > (immutable) query impl that would be dedicated to applying boosts, that > queries that need to change boosts at rewrite time (such as BooleanQuery) > would use as a wrapper.
> The latter idea is from Robert and I like it a lot given how often I either > introduced or found a bug which was due to the boost parameter being ignored. > Maybe there are other options, but I think this is worth exploring.
[jira] [Commented] (LUCENE-6590) Explore different ways to apply boosts
[ https://issues.apache.org/jira/browse/LUCENE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933375#comment-14933375 ] Terry Smith commented on LUCENE-6590: - Cheers Adrien. Sorry for the spammy replies before -- I wasn't expecting to see more than one discrepancy! While you are looking at the Query.toString() behavior with respect to boosting, how would you feel about adding MatchAllDocsQuery.class to BoostQuery.NO_PARENS_REQUIRED_QUERIES so its toString() doesn't change across releases? Query q = new MatchAllDocsQuery(); q.setBoost(0); q.toString() -> *:*^0.0 new BoostQuery(new MatchAllDocsQuery(), 0).toString() -> (*:*)^0.0 > Explore different ways to apply boosts > -- > > Key: LUCENE-6590 > URL: https://issues.apache.org/jira/browse/LUCENE-6590 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.4 > > Attachments: LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, > LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch > > > Follow-up from LUCENE-6570: the fact that all queries are mutable in order to > allow for applying a boost raises issues since it makes queries bad cache > keys since their hashcode can change anytime. We could just document that > queries should never be modified after they have gone through IndexSearcher > but it would be even better if the API made queries impossible to mutate at > all. > I think there are two main options: > - either replace "void setBoost(boost)" with something like "Query > withBoost(boost)" which would return a clone that has a different boost > - or move boost handling outside of Query, for instance we could have a > (immutable) query impl that would be dedicated to applying boosts, that > queries that need to change boosts at rewrite time (such as BooleanQuery) > would use as a wrapper.
> The latter idea is from Robert and I like it a lot given how often I either > introduced or found a bug which was due to the boost parameter being ignored. > Maybe there are other options, but I think this is worth exploring.
[jira] [Comment Edited] (LUCENE-6590) Explore different ways to apply boosts
[ https://issues.apache.org/jira/browse/LUCENE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933375#comment-14933375 ] Terry Smith edited comment on LUCENE-6590 at 9/28/15 2:38 PM: -- Cheers Adrien. Sorry for the spammy replies before -- I wasn't expecting to see more than one discrepancy! While you are looking at the Query.toString() behavior with respect to boosting, how would you feel about adding MatchAllDocsQuery.class to BoostQuery.NO_PARENS_REQUIRED_QUERIES so its toString() doesn't change across releases? {noformat} Query q = new MatchAllDocsQuery(); q.setBoost(0); q.toString() -> *:*^0.0 new BoostQuery(new MatchAllDocsQuery(), 0).toString() -> (*:*)^0.0 {noformat} was (Author: shebiki): Cheers Adrien. Sorry for the spammy replies before -- I wasn't expecting to see more than one discrepancy! While you are looking at the Query.toString() behavior with respect to boosting, how would you feel about adding MatchAllDocsQuery.class to BoostQuery.NO_PARENS_REQUIRED_QUERIES so its toString() doesn't change across releases? Query q = new MatchAllDocsQuery(); q.setBoost(0); q.toString() -> *:*^0.0 new BoostQuery(new MatchAllDocsQuery(), 0).toString() -> (*:*)^0.0 > Explore different ways to apply boosts > -- > > Key: LUCENE-6590 > URL: https://issues.apache.org/jira/browse/LUCENE-6590 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.4 > > Attachments: LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, > LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch > > > Follow-up from LUCENE-6570: the fact that all queries are mutable in order to > allow for applying a boost raises issues since it makes queries bad cache > keys since their hashcode can change anytime. We could just document that > queries should never be modified after they have gone through IndexSearcher > but it would be even better if the API made queries impossible to mutate at > all.
> I think there are two main options: > - either replace "void setBoost(boost)" with something like "Query > withBoost(boost)" which would return a clone that has a different boost > - or move boost handling outside of Query, for instance we could have a > (immutable) query impl that would be dedicated to applying boosts, that > queries that need to change boosts at rewrite time (such as BooleanQuery) > would use as a wrapper. > The latter idea is from Robert and I like it a lot given how often I either > introduced or found a bug which was due to the boost parameter being ignored. > Maybe there are other options, but I think this is worth exploring.
[jira] [Commented] (LUCENE-6699) Integrate lat/lon BKD and spatial3d
[ https://issues.apache.org/jira/browse/LUCENE-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906465#comment-14906465 ] Terry Smith commented on LUCENE-6699: - Thanks guys. I was hoping to squeeze those x,y,z values into 64 bits instead of 96. I'm not a bit twiddler but I'll take a look at Nicholas' patch and see if I can adapt it. > Integrate lat/lon BKD and spatial3d > --- > > Key: LUCENE-6699 > URL: https://issues.apache.org/jira/browse/LUCENE-6699 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: Trunk, 5.4 > > Attachments: Geo3DPacking.java, LUCENE-6699.patch, LUCENE-6699.patch, > LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, > LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, > LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, > LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, > LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, > LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch > > > I'm opening this for discussion, because I'm not yet sure how to do > this integration, because of my ignorance about spatial in general and > spatial3d in particular :) > Our BKD tree impl is very fast at doing lat/lon shape intersection > (bbox, polygon, soon distance: LUCENE-6698) against previously indexed > points. > I think to integrate with spatial3d, we would first need to record > lat/lon/z into doc values. Somewhere I saw discussion about how we > could stuff all 3 into a single long value with acceptable precision > loss? Or, we could use BinaryDocValues? We need all 3 dims available > to do the fast per-hit query time filtering. > But, second: what do we index into the BKD tree?
Can we "just" index > earth surface lat/lon, and then at query time is spatial3d able to > give me an enclosing "surface lat/lon" bbox for a 3d shape? Or > ... must we index all 3 dimensions into the BKD tree (seems like this > could be somewhat wasteful)?
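One way to picture squeezing x, y and z into a single 64-bit long is to quantize each coordinate to 21 bits, which uses 63 bits in total. The sketch below is an illustration only: it assumes each coordinate lies in [-1, 1] (a unit sphere, which is a simplification), and the class XyzPackSketch and its methods are hypothetical. It is not the attached Geo3DPacking.java and not the scheme from any of the patches on this issue.

```java
// Hypothetical packing sketch: three doubles in [-1, 1] into one long.
class XyzPackSketch {

    static final int BITS = 21;                  // bits per coordinate (3 * 21 = 63)
    static final long MASK = (1L << BITS) - 1;   // largest quantized value
    static final double MAX = 1.0;               // assumed coordinate bound

    // Maps [-MAX, MAX] linearly onto the integers [0, MASK], clamping
    // out-of-range input.
    static long encode(double v) {
        double clamped = Math.max(-MAX, Math.min(MAX, v));
        return Math.round((clamped + MAX) / (2 * MAX) * MASK);
    }

    // Inverse of encode, up to the quantization error.
    static double decode(long q) {
        return q / (double) MASK * 2 * MAX - MAX;
    }

    // Layout, high to low: x (21 bits), y (21 bits), z (21 bits).
    static long pack(double x, double y, double z) {
        return (encode(x) << (2 * BITS)) | (encode(y) << BITS) | encode(z);
    }

    static double unpackX(long packed) {
        return decode((packed >>> (2 * BITS)) & MASK);
    }

    static double unpackY(long packed) {
        return decode((packed >>> BITS) & MASK);
    }

    static double unpackZ(long packed) {
        return decode(packed & MASK);
    }
}
```

With 21 bits the quantization step is 2 / (2^21 - 1), so the round-trip error per axis stays below about 5e-7 of the unit radius. Whether that counts as acceptable precision loss depends on the use case, and a different bit split or value range may suit spatial3d better.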
[jira] [Commented] (LUCENE-6815) Should DisjunctionScorer advance more lazily?
[ https://issues.apache.org/jira/browse/LUCENE-6815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906401#comment-14906401 ] Terry Smith commented on LUCENE-6815: - Additionally, I don't see DisiPriorityQueue taking the cost of each scorer into account. I'd imagine that the scorer with the highest cost is more likely to be a hit, which would make this kind of lazy advancing even better. > Should DisjunctionScorer advance more lazily? > - > > Key: LUCENE-6815 > URL: https://issues.apache.org/jira/browse/LUCENE-6815 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Minor > > Today if you call DisjunctionScorer.advance(X), it will try to advance all > sub scorers to X. However, if DisjunctionScorer is being intersected with > another scorer (which is almost always the case as we use BooleanScorer for > top-level disjunctions), we could stop as soon as we find one matching sub > scorer, and only advance the remaining sub scorers when freq() or score() is > called.
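The difference between eager and lazy advancing, and the suggestion to try the highest-cost sub scorer first, can be sketched on sorted int arrays. This is a deliberately reduced model: it only answers whether some sub iterator sits exactly on the target and counts how many sub iterators were advanced. The names LazyDisjunctionSketch and Sub are hypothetical, cost is approximated by the number of documents, and a real scorer would also have to report the next matching document on a miss.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Reduced model of a disjunction over sorted doc id lists; not Lucene code.
class LazyDisjunctionSketch {

    static final class Sub {
        private final int[] docs;   // sorted doc ids
        private int pos = 0;

        Sub(int... docs) {
            this.docs = docs;
        }

        // Cost is approximated by the number of documents in the list.
        int cost() {
            return docs.length;
        }

        // Moves forward to the first doc >= target and returns it,
        // or Integer.MAX_VALUE when the list is exhausted.
        int advance(int target) {
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
            return pos < docs.length ? docs[pos] : Integer.MAX_VALUE;
        }
    }

    private final List<Sub> subs;
    int advanceCalls = 0;   // how many sub iterators have been advanced so far

    LazyDisjunctionSketch(Sub... subs) {
        this.subs = new ArrayList<>(Arrays.asList(subs));
        // Highest cost first: the densest list is assumed most likely to hit.
        this.subs.sort((a, b) -> Integer.compare(b.cost(), a.cost()));
    }

    // Lazy variant: stop at the first sub iterator that matches the target.
    boolean matchesLazily(int target) {
        for (Sub s : subs) {
            advanceCalls++;
            if (s.advance(target) == target) {
                return true;
            }
        }
        return false;
    }

    // Eager variant: always advance every sub iterator.
    boolean matchesEagerly(int target) {
        boolean hit = false;
        for (Sub s : subs) {
            advanceCalls++;
            hit |= s.advance(target) == target;
        }
        return hit;
    }
}
```

Sub iterators skipped by the lazy variant simply stay behind and catch up on a later call, because advance only ever moves forward to the requested target.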
[jira] [Commented] (LUCENE-6699) Integrate lat/lon BKD and spatial3d
[ https://issues.apache.org/jira/browse/LUCENE-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904650#comment-14904650 ] Terry Smith commented on LUCENE-6699: - Karl, were you able to find that packing scheme? I'm interested in poking the x,y,z values into a SortedNumericDocValuesField to see how well it would perform. > Integrate lat/lon BKD and spatial3d > --- > > Key: LUCENE-6699 > URL: https://issues.apache.org/jira/browse/LUCENE-6699 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: Trunk, 5.4 > > Attachments: Geo3DPacking.java, LUCENE-6699.patch, LUCENE-6699.patch, > LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, > LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, > LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, > LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, > LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, > LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch, LUCENE-6699.patch > > > I'm opening this for discussion, because I'm not yet sure how to do > this integration, because of my ignorance about spatial in general and > spatial3d in particular :) > Our BKD tree impl is very fast at doing lat/lon shape intersection > (bbox, polygon, soon distance: LUCENE-6698) against previously indexed > points. > I think to integrate with spatial3d, we would first need to record > lat/lon/z into doc values. Somewhere I saw discussion about how we > could stuff all 3 into a single long value with acceptable precision > loss? Or, we could use BinaryDocValues? We need all 3 dims available > to do the fast per-hit query time filtering. > But, second: what do we index into the BKD tree?
Can we "just" index > earth surface lat/lon, and then at query time is spatial3d able to > give me an enclosing "surface lat/lon" bbox for a 3d shape? Or > ... must we index all 3 dimensions into the BKD tree (seems like this > could be somewhat wasteful)?
[jira] [Commented] (LUCENE-6590) Explore different ways to apply boosts
[ https://issues.apache.org/jira/browse/LUCENE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745623#comment-14745623 ] Terry Smith commented on LUCENE-6590: - [~jpountz]: PhraseQuery is missing a call to ToStringUtils.boost in its toString method on the 5.x branch. > Explore different ways to apply boosts > -- > > Key: LUCENE-6590 > URL: https://issues.apache.org/jira/browse/LUCENE-6590 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.4 > > Attachments: LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, > LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch > > > Follow-up from LUCENE-6570: the fact that all queries are mutable in order to > allow for applying a boost raises issues since it makes queries bad cache > keys since their hashcode can change anytime. We could just document that > queries should never be modified after they have gone through IndexSearcher > but it would be even better if the API made queries impossible to mutate at > all. > I think there are two main options: > - either replace "void setBoost(boost)" with something like "Query > withBoost(boost)" which would return a clone that has a different boost > - or move boost handling outside of Query, for instance we could have a > (immutable) query impl that would be dedicated to applying boosts, that > queries that need to change boosts at rewrite time (such as BooleanQuery) > would use as a wrapper. > The latter idea is from Robert and I like it a lot given how often I either > introduced or found a bug which was due to the boost parameter being ignored. > Maybe there are other options, but I think this is worth exploring.
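The two options listed in the issue description above can be sketched side by side. Everything here is a hypothetical stand-in (ImmutableBoostSketch, TermQuerySketch, BoostWrapperSketch); none of it is Lucene code, and it only aims to show why both designs keep hashCode stable once an object has been built.

```java
import java.util.Objects;

// Hypothetical stand-ins for the two designs; not Lucene classes.
class ImmutableBoostSketch {

    // Option 1: the boost is a final field and withBoost returns a copy.
    static final class TermQuerySketch {
        private final String field;
        private final String text;
        private final float boost;

        TermQuerySketch(String field, String text) {
            this(field, text, 1f);
        }

        private TermQuerySketch(String field, String text, float boost) {
            this.field = field;
            this.text = text;
            this.boost = boost;
        }

        // Never mutates this instance, so existing cache keys stay valid.
        TermQuerySketch withBoost(float newBoost) {
            return new TermQuerySketch(field, text, newBoost);
        }

        float boost() {
            return boost;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TermQuerySketch)) {
                return false;
            }
            TermQuerySketch other = (TermQuerySketch) o;
            return field.equals(other.field)
                && text.equals(other.text)
                && Float.compare(boost, other.boost) == 0;
        }

        @Override
        public int hashCode() {
            return Objects.hash(field, text, boost);
        }
    }

    // Option 2: boost handling lives in a dedicated immutable wrapper,
    // so the wrapped query needs no boost field at all.
    static final class BoostWrapperSketch {
        private final Object inner;
        private final float boost;

        BoostWrapperSketch(Object inner, float boost) {
            this.inner = Objects.requireNonNull(inner);
            this.boost = boost;
        }

        Object inner() {
            return inner;
        }

        float boost() {
            return boost;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof BoostWrapperSketch)) {
                return false;
            }
            BoostWrapperSketch other = (BoostWrapperSketch) o;
            return inner.equals(other.inner)
                && Float.compare(boost, other.boost) == 0;
        }

        @Override
        public int hashCode() {
            return Objects.hash(inner, boost);
        }
    }
}
```

In the wrapper design a query's own toString never has to remember to print a boost, which would sidestep the kind of omission reported in this comment for the setBoost-based code.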
[jira] [Commented] (LUCENE-6590) Explore different ways to apply boosts
[ https://issues.apache.org/jira/browse/LUCENE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745675#comment-14745675 ] Terry Smith commented on LUCENE-6590: - Also FunctionQuery. > Explore different ways to apply boosts > -- > > Key: LUCENE-6590 > URL: https://issues.apache.org/jira/browse/LUCENE-6590 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.4 > > Attachments: LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, > LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch > > > Follow-up from LUCENE-6570: the fact that all queries are mutable in order to > allow for applying a boost raises issues since it makes queries bad cache > keys since their hashcode can change anytime. We could just document that > queries should never be modified after they have gone through IndexSearcher > but it would be even better if the API made queries impossible to mutate at > all. > I think there are two main options: > - either replace "void setBoost(boost)" with something like "Query > withBoost(boost)" which would return a clone that has a different boost > - or move boost handling outside of Query, for instance we could have a > (immutable) query impl that would be dedicated to applying boosts, that > queries that need to change boosts at rewrite time (such as BooleanQuery) > would use as a wrapper. > The latter idea is from Robert and I like it a lot given how often I either > introduced or found a bug which was due to the boost parameter being ignored. > Maybe there are other options, but I think this is worth exploring. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6590) Explore different ways to apply boosts
[ https://issues.apache.org/jira/browse/LUCENE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745653#comment-14745653 ] Terry Smith commented on LUCENE-6590: - Hmm, so is NumericRangeQuery. > Explore different ways to apply boosts > -- > > Key: LUCENE-6590 > URL: https://issues.apache.org/jira/browse/LUCENE-6590 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.4 > > Attachments: LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, > LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch > > > Follow-up from LUCENE-6570: the fact that all queries are mutable in order to > allow for applying a boost raises issues since it makes queries bad cache > keys since their hashcode can change anytime. We could just document that > queries should never be modified after they have gone through IndexSearcher > but it would be even better if the API made queries impossible to mutate at > all. > I think there are two main options: > - either replace "void setBoost(boost)" with something like "Query > withBoost(boost)" which would return a clone that has a different boost > - or move boost handling outside of Query, for instance we could have a > (immutable) query impl that would be dedicated to applying boosts, that > queries that need to change boosts at rewrite time (such as BooleanQuery) > would use as a wrapper. > The latter idea is from Robert and I like it a lot given how often I either > introduced or found a bug which was due to the boost parameter being ignored. > Maybe there are other options, but I think this is worth exploring. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6806) FunctionQuery.AllScorer.explain overwrites FunctionWeight.queryNorm in trappy fashion
Terry Smith created LUCENE-6806: --- Summary: FunctionQuery.AllScorer.explain overwrites FunctionWeight.queryNorm in trappy fashion Key: LUCENE-6806 URL: https://issues.apache.org/jira/browse/LUCENE-6806 Project: Lucene - Core Issue Type: Bug Affects Versions: Trunk Reporter: Terry Smith Priority: Minor

FunctionQuery.AllScorer.explain is:

{code:java}
public Explanation explain(int doc, float queryNorm) throws IOException {
  float sc = qWeight * vals.floatVal(doc);
  return Explanation.match(sc, "FunctionQuery(" + func + "), product of:",
      vals.explain(doc),
      Explanation.match(queryNorm, "boost"),
      Explanation.match(weight.queryNorm = 1f, "queryNorm"));
}
{code}

The following line has a subtle assignment that overwrites weight.queryNorm.

{code:java}
Explanation.match(weight.queryNorm = 1f, "queryNorm"));
{code}

Because weights aren't reused between search and explain this doesn't break anything but it's awfully subtle. Seeing as queryNorm is ALWAYS 1 here, could we just drop this extra line from the explain output and use the following instead?

{code:java}
public Explanation explain(int doc, float queryNorm) throws IOException {
  float sc = qWeight * vals.floatVal(doc);
  return Explanation.match(sc, "FunctionQuery(" + func + "), product of:",
      vals.explain(doc),
      Explanation.match(queryNorm, "boost"));
}
{code}
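To make the trap concrete, here is a minimal, self-contained reproduction of the pattern. It is a sketch only: `ToyWeight` and `describe` are hypothetical stand-ins for the weight and for `Explanation.match`, and no Lucene classes are involved.

```java
// Toy reproduction of the pattern; ToyWeight and describe() are hypothetical names.
final class AssignmentInArgument {

    static final class ToyWeight {
        float queryNorm = 0.5f;
    }

    // Stand-in for Explanation.match(value, description).
    static String describe(float value, String description) {
        return value + " = " + description;
    }

    static String explain(ToyWeight weight) {
        // The assignment expression evaluates to 1f AND overwrites the field as a side effect.
        return describe(weight.queryNorm = 1f, "queryNorm");
    }

    public static void main(String[] args) {
        ToyWeight w = new ToyWeight();
        System.out.println(explain(w));  // prints "1.0 = queryNorm"
        System.out.println(w.queryNorm); // prints "1.0": the field no longer holds 0.5
    }
}
```

Calling what looks like a read-only explain method silently changes the state of the weight, which is why the issue proposes dropping the line.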
[jira] [Commented] (LUCENE-6785) Consider merging Query.rewrite() into Query.createWeight()
[ https://issues.apache.org/jira/browse/LUCENE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736932#comment-14736932 ] Terry Smith commented on LUCENE-6785: - The original patch drops a few key settings from the BooleanQuery in BQ.createWeight, the following patch puts them back and makes the tests happier.

{noformat}
diff --git a/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java b/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java
index fb5f7c8..8dec338 100644
--- a/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java
+++ b/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java
@@ -210,7 +210,9 @@ public class BooleanQuery extends Query implements Iterable {
 }
 
 List subweights = new ArrayList<>();
-Builder builder = new Builder();
+Builder builder = new Builder()
+ .setDisableCoord(disableCoord)
+ .setMinimumNumberShouldMatch(minimumNumberShouldMatch);
 for (BooleanClause clause : query) {
 Weight w = searcher.createWeight(clause.getQuery(), needsScores);
 subweights.add(w);
{noformat}

> Consider merging Query.rewrite() into Query.createWeight() > -- > > Key: LUCENE-6785 > URL: https://issues.apache.org/jira/browse/LUCENE-6785 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward > Attachments: LUCENE-6785.patch > > > Prompted by the discussion on LUCENE-6590. > Query.rewrite() is a bit of an oddity. You call it to create a query for a > specific IndexSearcher, and to ensure that you get a query implementation > that has a working createWeight() method. However, Weight itself already > encapsulates the notion of a per-searcher query. > You also need to repeatedly call rewrite() until the query has stopped > rewriting itself, which is a bit trappy - there are a few places (in > highlighting code for example) that just call rewrite() once, rather than > looping round as IndexSearcher.rewrite() does. Most queries don't need to be > called multiple times, however, so this seems a bit redundant. And the ones > that do currently return un-rewritten queries can be changed simply enough to > rewrite them. > Finally, in pretty much every case I can find in the codebase, rewrite() is > called purely as a prelude to createWeight(). This means, in the case of for > example large BooleanQueries, we end up cloning the whole query structure, > only to throw it away immediately. > I'd like to try removing rewrite() entirely, and merging the logic into > createWeight(), simplifying the API and removing the trap where code only > calls rewrite once. What do people think?
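The "call rewrite() until the query stops rewriting itself" trap from the quoted description can be sketched with a toy model. Everything here is hypothetical: `ToyQuery`, `Leaf`, and `Wrapper` are invented for illustration and only mimic the contract that a fully rewritten query returns itself.

```java
// Toy model: ToyQuery is a hypothetical interface, not Lucene's Query class.
final class RewriteLoopSketch {

    interface ToyQuery {
        /** Returns a simpler query, or {@code this} when nothing is left to rewrite. */
        ToyQuery rewrite();
    }

    /** A primitive query that rewrites to itself. */
    static final class Leaf implements ToyQuery {
        @Override
        public ToyQuery rewrite() {
            return this;
        }
    }

    /** A wrapper that peels off exactly one layer per rewrite() call. */
    static final class Wrapper implements ToyQuery {
        final ToyQuery inner;

        Wrapper(ToyQuery inner) {
            this.inner = inner;
        }

        @Override
        public ToyQuery rewrite() {
            return inner;
        }
    }

    /** Keeps rewriting until the query returns itself (a fixed point). */
    static ToyQuery rewriteFully(ToyQuery query) {
        ToyQuery current = query;
        for (ToyQuery next = current.rewrite(); next != current; next = current.rewrite()) {
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        ToyQuery nested = new Wrapper(new Wrapper(new Leaf()));
        // A single call still leaves one Wrapper layer; the loop reaches the Leaf.
        System.out.println(nested.rewrite() instanceof Leaf);
        System.out.println(rewriteFully(nested) instanceof Leaf);
    }
}
```

Code that calls `rewrite()` only once works for shallow queries and breaks for nested ones, which is the inconsistency the issue wants to remove from the API.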
[jira] [Updated] (LUCENE-6787) BooleanQuery should be able to drop duplicate non-scoring clauses
[ https://issues.apache.org/jira/browse/LUCENE-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Terry Smith updated LUCENE-6787: Attachment: LUCENE-6787.patch Absolutely, updated patch attached. > BooleanQuery should be able to drop duplicate non-scoring clauses > - > > Key: LUCENE-6787 > URL: https://issues.apache.org/jira/browse/LUCENE-6787 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: Trunk > Reporter: Terry Smith >Priority: Minor > Attachments: LUCENE-6787.patch, LUCENE-6787.patch > > > Pulling out of the discussion on LUCENE-6305. > BooleanQuery could drop duplicate non-scoring (MUST_NOT, FILTER) clauses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6787) BooleanQuery should be able to drop duplicate non-scoring clauses
[ https://issues.apache.org/jira/browse/LUCENE-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Terry Smith updated LUCENE-6787: Attachment: LUCENE-6787-on-6785.patch Here is an alternate patch applied after LUCENE-6785. > BooleanQuery should be able to drop duplicate non-scoring clauses > - > > Key: LUCENE-6787 > URL: https://issues.apache.org/jira/browse/LUCENE-6787 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: Trunk > Reporter: Terry Smith >Priority: Minor > Attachments: LUCENE-6787-on-6785.patch, LUCENE-6787.patch, > LUCENE-6787.patch > > > Pulling out of the discussion on LUCENE-6305. > BooleanQuery could drop duplicate non-scoring (MUST_NOT, FILTER) clauses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6787) BooleanQuery should be able to drop duplicate non-scoring clauses
Terry Smith created LUCENE-6787: --- Summary: BooleanQuery should be able to drop duplicate non-scoring clauses Key: LUCENE-6787 URL: https://issues.apache.org/jira/browse/LUCENE-6787 Project: Lucene - Core Issue Type: Improvement Affects Versions: Trunk Reporter: Terry Smith Priority: Minor Pulling out of the discussion on LUCENE-6305. BooleanQuery could drop duplicate non-scoring (MUST_NOT, FILTER) clauses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
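A minimal sketch of the idea, assuming clauses can be compared for equality: duplicates are dropped only for the non-scoring occurrences (MUST_NOT, FILTER), because repeating a scoring clause changes the score. The `Clause` and `Occur` types below are toy stand-ins written for this example, not Lucene's `BooleanClause`, and the attached patches may take a different approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Toy model: Occur and Clause are hypothetical stand-ins for Lucene's types.
final class DedupClausesSketch {

    enum Occur { MUST, SHOULD, FILTER, MUST_NOT }

    static final class Clause {
        final Occur occur;
        final String query;

        Clause(Occur occur, String query) {
            this.occur = occur;
            this.query = query;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Clause
                && ((Clause) o).occur == occur
                && ((Clause) o).query.equals(query);
        }

        @Override
        public int hashCode() {
            return Objects.hash(occur, query);
        }
    }

    /** Removes repeated FILTER and MUST_NOT clauses while keeping clause order. */
    static List<Clause> dropDuplicateNonScoring(List<Clause> clauses) {
        Set<Clause> seen = new HashSet<>();
        List<Clause> result = new ArrayList<>();
        for (Clause clause : clauses) {
            boolean nonScoring = clause.occur == Occur.FILTER || clause.occur == Occur.MUST_NOT;
            if (nonScoring && !seen.add(clause)) {
                continue; // an identical non-scoring clause was already kept
            }
            result.add(clause);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Clause> clauses = Arrays.asList(
            new Clause(Occur.SHOULD, "a"),
            new Clause(Occur.SHOULD, "a"),   // scoring duplicate: kept
            new Clause(Occur.FILTER, "b"),
            new Clause(Occur.FILTER, "b"),   // non-scoring duplicate: dropped
            new Clause(Occur.MUST_NOT, "c"),
            new Clause(Occur.MUST_NOT, "c")); // non-scoring duplicate: dropped
        System.out.println(dropDuplicateNonScoring(clauses).size());
    }
}
```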
[jira] [Updated] (LUCENE-6679) Filter's Weight.explain returns an explanation with isMatch==true even on documents that don't match
[ https://issues.apache.org/jira/browse/LUCENE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Terry Smith updated LUCENE-6679: Attachment: LUCENE-6679.patch Here is a patch (against trunk) that adds test coverage for explanations on hits only. I'm looking for feedback on the approach used before expanding to cover explanations for misses. Currently I get a couple of failures when running just the Lucene tests:

{noformat}
Tests with failures:
  - org.apache.lucene.search.TestSortRandom.testRandomStringValSort
  - org.apache.lucene.search.TestSortRandom.testRandomStringSort

JVM J0: 1.42 .. 284.75 = 283.33s
JVM J1: 1.64 .. 284.77 = 283.13s
JVM J2: 1.42 .. 284.70 = 283.28s
JVM J3: 1.42 .. 284.68 = 283.26s
Execution time total: 4 minutes 44 seconds
Tests summary: 404 suites, 3235 tests, 2 failures, 104 ignored (100 assumptions)
{noformat}

Happy to dig into these more once an approach has been found that people like. > Filter's Weight.explain returns an explanation with isMatch==true even on > documents that don't match > > > Key: LUCENE-6679 > URL: https://issues.apache.org/jira/browse/LUCENE-6679 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand > Attachments: LUCENE-6679.patch > > > This was reported by Trejkaz on the java-user list: > http://search-lucene.com/m/l6pAi19h4Y3DclgB1=Re+What+on+earth+is+FilteredQuery+explain+doing+
[jira] [Updated] (LUCENE-6787) BooleanQuery should be able to drop duplicate non-scoring clauses
[ https://issues.apache.org/jira/browse/LUCENE-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Terry Smith updated LUCENE-6787: Attachment: LUCENE-6787.patch Here is a patch based on [~jpountz]'s suggestion of putting this optimization in BooleanQuery.rewrite(). > BooleanQuery should be able to drop duplicate non-scoring clauses > - > > Key: LUCENE-6787 > URL: https://issues.apache.org/jira/browse/LUCENE-6787 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: Trunk > Reporter: Terry Smith >Priority: Minor > Attachments: LUCENE-6787.patch > > > Pulling out of the discussion on LUCENE-6305. > BooleanQuery could drop duplicate non-scoring (MUST_NOT, FILTER) clauses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6758) Adding a SHOULD clause to a BQ over an empty field clears the score when using DefaultSimilarity
[ https://issues.apache.org/jira/browse/LUCENE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734796#comment-14734796 ] Terry Smith commented on LUCENE-6758: - Ah, you've changed DefaultSimilarity.idf() to use (docCount + 1) instead of just docCount forcing it to be larger than 0. That looks like a great fix, thanks. > Adding a SHOULD clause to a BQ over an empty field clears the score when > using DefaultSimilarity > > > Key: LUCENE-6758 > URL: https://issues.apache.org/jira/browse/LUCENE-6758 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: Trunk > Reporter: Terry Smith > Attachments: LUCENE-6758.patch, LUCENE-6758.patch > > > Patch with unit test to show the bug will be attached. > I've narrowed this change in behavior with git bisect to the following commit: > {noformat} > commit 698b4b56f0f2463b21c9e3bc67b8b47d635b7d1f > Author: Robert Muir <rm...@apache.org> > Date: Thu Aug 13 17:37:15 2015 + > LUCENE-6711: Use CollectionStatistics.docCount() for IDF and average > field length computations > > git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1695744 > 13f79535-47bb-0310-9956-ffa450edef68 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6758) Adding a SHOULD clause to a BQ over an empty field clears the score when using DefaultSimilarity
[ https://issues.apache.org/jira/browse/LUCENE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Terry Smith updated LUCENE-6758: Attachment: LUCENE-6758.patch Run this unit test a few times and you'll hit a failure when DefaultSimilarity is picked. The method testBQHitOrEmpty() will fail because the score is zero. Its friend testBQHitOrMiss() has a non-zero score. The difference between the two is that the field "empty" is unused, whereas the field "test" has one token ("hit"). Adding a SHOULD clause to a BQ over an empty field clears the score when using DefaultSimilarity Key: LUCENE-6758 URL: https://issues.apache.org/jira/browse/LUCENE-6758 Project: Lucene - Core Issue Type: Bug Affects Versions: Trunk Reporter: Terry Smith Attachments: LUCENE-6758.patch Patch with unit test to show the bug will be attached. I've narrowed this change in behavior with git bisect to the following commit: {noformat} commit 698b4b56f0f2463b21c9e3bc67b8b47d635b7d1f Author: Robert Muir rm...@apache.org Date: Thu Aug 13 17:37:15 2015 + LUCENE-6711: Use CollectionStatistics.docCount() for IDF and average field length computations git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1695744 13f79535-47bb-0310-9956-ffa450edef68 {noformat}
[jira] [Commented] (LUCENE-6758) Adding a SHOULD clause to a BQ over an empty field clears the score when using DefaultSimilarity
[ https://issues.apache.org/jira/browse/LUCENE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706963#comment-14706963 ] Terry Smith commented on LUCENE-6758: - Explain output for the failing query (testBQHitOrEmpty):

{noformat}
0.0 = product of:
  0.0 = sum of:
    0.0 = weight(test:hit in 0) [DefaultSimilarity], result of:
      0.0 = score(doc=0,freq=1.0), product of:
        0.0 = queryWeight, product of:
          0.30685282 = idf(docFreq=1, docCount=1)
          0.0 = queryNorm
        0.30685282 = fieldWeight in 0, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          0.30685282 = idf(docFreq=1, docCount=1)
          1.0 = fieldNorm(doc=0)
  0.5 = coord(1/2)
{noformat}

Explain output for the variant against a populated field (testBQHitOrMiss):

{noformat}
0.04500804 = product of:
  0.09001608 = sum of:
    0.09001608 = weight(test:hit in 0) [DefaultSimilarity], result of:
      0.09001608 = score(doc=0,freq=1.0), product of:
        0.29335263 = queryWeight, product of:
          0.30685282 = idf(docFreq=1, docCount=1)
          0.9560043 = queryNorm
        0.30685282 = fieldWeight in 0, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          0.30685282 = idf(docFreq=1, docCount=1)
          1.0 = fieldNorm(doc=0)
  0.5 = coord(1/2)
{noformat}

Adding a SHOULD clause to a BQ over an empty field clears the score when using DefaultSimilarity Key: LUCENE-6758 URL: https://issues.apache.org/jira/browse/LUCENE-6758 Project: Lucene - Core Issue Type: Bug Affects Versions: Trunk Reporter: Terry Smith Attachments: LUCENE-6758.patch Patch with unit test to show the bug will be attached. I've narrowed this change in behavior with git bisect to the following commit:

{noformat}
commit 698b4b56f0f2463b21c9e3bc67b8b47d635b7d1f
Author: Robert Muir rm...@apache.org
Date: Thu Aug 13 17:37:15 2015 +

    LUCENE-6711: Use CollectionStatistics.docCount() for IDF and average field length computations

    git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1695744 13f79535-47bb-0310-9956-ffa450edef68
{noformat}
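The numbers in the explain output above can be reproduced with a small arithmetic sketch. It rests on assumptions that are not stated in the thread: that idf is computed as 1 + ln(docCount / (docFreq + 1)), that queryNorm is 1 / sqrt(sum of squared idf values) with all boosts equal to 1, and that the missing term reports docFreq=0 with docCount=0 on the unused field and docCount=1 on the populated one. The class and method names are hypothetical; the `idfPlusOne` variant follows the (docCount + 1) change described elsewhere in this thread.

```java
// Arithmetic sketch only; IdfSketch and its methods are hypothetical helpers, not Lucene code.
final class IdfSketch {

    /** idf as implied by the explain output: 1 + ln(docCount / (docFreq + 1)). */
    static float idf(long docFreq, long docCount) {
        return (float) (Math.log(docCount / (double) (docFreq + 1)) + 1.0);
    }

    /** Variant using (docCount + 1), so the logarithm never sees zero. */
    static float idfPlusOne(long docFreq, long docCount) {
        return (float) (Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0);
    }

    /** Assumed normalization: 1 / sqrt(sum of squared idf values), boosts taken as 1. */
    static float queryNorm(float... idfs) {
        float sumOfSquares = 0f;
        for (float idf : idfs) {
            sumOfSquares += idf * idf;
        }
        return (float) (1.0 / Math.sqrt(sumOfSquares));
    }

    public static void main(String[] args) {
        float hit = idf(1, 1);           // term "hit" in field "test"
        float missPopulated = idf(0, 1); // missing term, field has one document
        float missEmpty = idf(0, 0);     // missing term, field has no documents: ln(0) = -Infinity

        System.out.println(hit);
        System.out.println(queryNorm(hit, missPopulated)); // close to the 0.9560043 shown above
        System.out.println(queryNorm(hit, missEmpty));     // the infinite idf drives the norm to zero

        // With docCount + 1 the empty-field idf is finite, so the norm stays positive.
        System.out.println(queryNorm(idfPlusOne(1, 1), idfPlusOne(0, 0)));
    }
}
```

Under these assumptions the zero score comes from the infinite idf of the clause over the empty field: squaring it makes the sum infinite, and the resulting queryNorm of zero is multiplied into every clause, including the one that matched.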
[jira] [Created] (LUCENE-6758) Adding a SHOULD clause to a BQ over an empty field clears the score when using DefaultSimilarity
Terry Smith created LUCENE-6758: --- Summary: Adding a SHOULD clause to a BQ over an empty field clears the score when using DefaultSimilarity Key: LUCENE-6758 URL: https://issues.apache.org/jira/browse/LUCENE-6758 Project: Lucene - Core Issue Type: Bug Affects Versions: Trunk Reporter: Terry Smith Patch with unit test to show the bug will be attached. I've narrowed this change in behavior with git bisect to the following commit: {noformat} commit 698b4b56f0f2463b21c9e3bc67b8b47d635b7d1f Author: Robert Muir rm...@apache.org Date: Thu Aug 13 17:37:15 2015 + LUCENE-6711: Use CollectionStatistics.docCount() for IDF and average field length computations git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1695744 13f79535-47bb-0310-9956-ffa450edef68 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6748) The query cache should not cache trivial queries
[ https://issues.apache.org/jira/browse/LUCENE-6748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705045#comment-14705045 ] Terry Smith commented on LUCENE-6748: - I'd add a case to the patch to include empty DisjunctionMaxQuery instances also. The query cache should not cache trivial queries Key: LUCENE-6748 URL: https://issues.apache.org/jira/browse/LUCENE-6748 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-6748.patch The query cache already avoids caching term queries because they are cheap, but it doesn't do it with even cheaper queries like MatchAllDocsQuery. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6531) Make PhraseQuery immutable
[ https://issues.apache.org/jira/browse/LUCENE-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647690#comment-14647690 ] Terry Smith commented on LUCENE-6531: - [~jpountz] The PhraseQuery.Builder setter methods are all void, whereas the ones for BooleanQuery and BlendedTermQuery return the Builder itself. Can the set/add methods on PhraseQuery.Builder return this to make the various Query builders consistent with each other? Make PhraseQuery immutable -- Key: LUCENE-6531 URL: https://issues.apache.org/jira/browse/LUCENE-6531 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Fix For: 5.3, 6.0 Attachments: LUCENE-6531.patch, LUCENE-6531.patch Mutable queries are an issue for automatic filter caching since modifying a query after it has been put into the cache will corrupt the cache. We should make all queries immutable (up to the boost) to avoid this issue.
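The request above amounts to the usual fluent-builder convention. A minimal sketch, using a hypothetical `ToyPhrase` type rather than Lucene's `PhraseQuery`, shows what returning `this` from the set/add methods buys the caller:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model: ToyPhrase and its Builder are hypothetical, not Lucene's PhraseQuery.Builder.
final class FluentBuilderSketch {

    /** Immutable result object. */
    static final class ToyPhrase {
        final List<String> terms;
        final int slop;

        ToyPhrase(List<String> terms, int slop) {
            this.terms = Collections.unmodifiableList(new ArrayList<>(terms));
            this.slop = slop;
        }
    }

    static final class Builder {
        private final List<String> terms = new ArrayList<>();
        private int slop;

        /** Returns this so calls can be chained. */
        Builder add(String term) {
            terms.add(term);
            return this;
        }

        /** Returns this so calls can be chained. */
        Builder setSlop(int slop) {
            this.slop = slop;
            return this;
        }

        ToyPhrase build() {
            return new ToyPhrase(terms, slop);
        }
    }

    public static void main(String[] args) {
        // With void setters this would need a local variable and three statements.
        ToyPhrase phrase = new Builder().add("quick").add("fox").setSlop(1).build();
        System.out.println(phrase.terms + " slop=" + phrase.slop);
    }
}
```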
[jira] [Commented] (LUCENE-6531) Make PhraseQuery immutable
[ https://issues.apache.org/jira/browse/LUCENE-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647701#comment-14647701 ] Terry Smith commented on LUCENE-6531: - Awesome, you rock! Make PhraseQuery immutable -- Key: LUCENE-6531 URL: https://issues.apache.org/jira/browse/LUCENE-6531 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Fix For: 5.3, 6.0 Attachments: LUCENE-6531.patch, LUCENE-6531.patch Mutable queries are an issue for automatic filter caching since modifying a query after it has been put into the cache will corrupt the cache. We should make all queries immutable (up to the boost) to avoid this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6590) Explore different ways to apply boosts
[ https://issues.apache.org/jira/browse/LUCENE-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639166#comment-14639166 ] Terry Smith commented on LUCENE-6590: - I think this looks great and will certainly make the boost handling more robust in my custom queries. Especially looking forward to fully immutable queries. What do you think is possible in terms of updating 5.x to make the transition easier? Explore different ways to apply boosts -- Key: LUCENE-6590 URL: https://issues.apache.org/jira/browse/LUCENE-6590 Project: Lucene - Core Issue Type: Wish Reporter: Adrien Grand Priority: Minor Attachments: LUCENE-6590.patch, LUCENE-6590.patch, LUCENE-6590.patch Follow-up from LUCENE-6570: the fact that all queries are mutable in order to allow for applying a boost raises issues since it makes queries bad cache keys since their hashcode can change anytime. We could just document that queries should never be modified after they have gone through IndexSearcher but it would be even better if the API made queries impossible to mutate at all. I think there are two main options: - either replace void setBoost(boost) with something like Query withBoost(boost) which would return a clone that has a different boost - or move boost handling outside of Query, for instance we could have a (immutable) query impl that would be dedicated to applying boosts, that queries that need to change boosts at rewrite time (such as BooleanQuery) would use as a wrapper. The latter idea is from Robert and I like it a lot given how often I either introduced or found a bug which was due to the boost parameter being ignored. Maybe there are other options, but I think this is worth exploring. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6679) Filter's Weight.explain returns an explanation with isMatch==true even on documents that don't match
[ https://issues.apache.org/jira/browse/LUCENE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629956#comment-14629956 ] Terry Smith commented on LUCENE-6679: - Trejkaz confirmed the patch referenced from the mailing list works. This bug is fixed as a side effect of LUCENE-6601 so will automatically be fixed as part of release 5.3. I'll work on cleaning up the new test contributed by Trejkaz for inclusion and then move onto a more generic hook to catch other explanation mistakes. Filter's Weight.explain returns an explanation with isMatch==true even on documents that don't match Key: LUCENE-6679 URL: https://issues.apache.org/jira/browse/LUCENE-6679 Project: Lucene - Core Issue Type: Bug Reporter: Adrien Grand This was reported by Trejkaz on the java-user list: http://search-lucene.com/m/l6pAi19h4Y3DclgB1subj=Re+What+on+earth+is+FilteredQuery+explain+doing+ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6679) Filter's Weight.explain returns an explanation with isMatch==true even on documents that don't match
[ https://issues.apache.org/jira/browse/LUCENE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628153#comment-14628153 ] Terry Smith commented on LUCENE-6679: - [~jpountz] Absolutely, I'd love to give it a stab. Currently waiting for feedback from TX on the users list. I think you are spot on about adding some additional testing to the test suite to catch explanation mismatches. I'll take a peek at that also and see if I can figure out something worth submitting. Filter's Weight.explain returns an explanation with isMatch==true even on documents that don't match --- Key: LUCENE-6679 URL: https://issues.apache.org/jira/browse/LUCENE-6679 Project: Lucene - Core Issue Type: Bug Reporter: Adrien Grand This was reported by Trejkaz on the java-user list: http://search-lucene.com/m/l6pAi19h4Y3DclgB1subj=Re+What+on+earth+is+FilteredQuery+explain+doing+
[jira] [Commented] (LUCENE-6661) Allow queries to opt out of caching
[ https://issues.apache.org/jira/browse/LUCENE-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14622271#comment-14622271 ] Terry Smith commented on LUCENE-6661: - I agree that we shouldn't base APIs on already hacky solutions. I'm going to play with your suggestion a little more and see how it pans out for my use cases, and will report back. The issue of ring buffer frequency for non-cacheable queries is interesting. If, in some obscure but easy to understand scenario, half of my queries are good cache candidates but the other half are never to be cached (using the Weight.getQuery() equals busting method), then the ring buffer will be a lot less effective at finding new cache candidates, purely because of the churn of never-to-be-cached queries. Still, I can see why that might also be a good thing; it all depends on your definition of frequently used. Where would be the best place to expand this discussion to include score based caching? A new Jira, one of the mailing lists? Allow queries to opt out of caching --- Key: LUCENE-6661 URL: https://issues.apache.org/jira/browse/LUCENE-6661 Project: Lucene - Core Issue Type: Improvement Affects Versions: 5.2 Reporter: Terry Smith Priority: Minor Attachments: LUCENE-6661.patch Some queries have out-of-band dependencies that make them incompatible with caching, it'd be great if they could opt out of the new fancy query/filter cache in IndexSearcher. This affects DrillSidewaysQuery and any user-provided custom Query implementations.
[jira] [Commented] (LUCENE-6661) Allow queries to opt out of caching
[ https://issues.apache.org/jira/browse/LUCENE-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616723#comment-14616723 ] Terry Smith commented on LUCENE-6661: - I'd completely missed the issue with marker interfaces, this really ought to be a method on Weight itself, perhaps Weight.cacheCompatible(). Your suggested workaround sounds a little special-cased. I'd be concerned that in a future release something would change in such a way that the workaround would be lost with no alternative. Specifically, it relies on the cache implementation tracking usage when the cache itself is pluggable (it could be replaced with one that does not), and when LRUQueryCache itself is in play I see the following issues:
1) The queries that we know ahead of time should never be cached would still take up room in the ring buffer and thus push aside other less frequent queries that could be great cache candidates.
2) Special care would want to be taken over the Query instances used in the ring buffer and cache so that things like dependent FacetCollectors don't get added and bloat memory usage. You described earlier how to handle this from createWeight().
3) CachingWrapperWeight forces the cached query to use scorer() instead of bulkScorer(). Both my custom query and DrillSidewaysQuery implement a custom bulkScorer() method and throw an UnsupportedOperationException from scorer(). They break when wrapped in a CachingWrapperWeight. The ability to opt out of caching would remove the need for the hacky workaround in DrillSideways.
My current solution is a custom QueryCache implementation that just delegates to the LRUQueryCache and does not propagate doCache() for some Weights. However, this has the same problem with wrapped queries as the marker interface scenario.
Allow queries to opt out of caching --- Key: LUCENE-6661 URL: https://issues.apache.org/jira/browse/LUCENE-6661 Project: Lucene - Core Issue Type: Improvement Affects Versions: 5.2 Reporter: Terry Smith Priority: Minor Attachments: LUCENE-6661.patch Some queries have out-of-band dependencies that make them incompatible with caching, it'd be great if they could opt out of the new fancy query/filter cache in IndexSearcher. This affects DrillSidewaysQuery and any user-provided custom Query implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6661) Allow queries to opt out of caching
Terry Smith created LUCENE-6661: --- Summary: Allow queries to opt out of caching Key: LUCENE-6661 URL: https://issues.apache.org/jira/browse/LUCENE-6661 Project: Lucene - Core Issue Type: Improvement Affects Versions: 5.2 Reporter: Terry Smith Priority: Minor Some queries have out-of-band dependencies that make them incompatible with caching, it'd be great if they could opt out of the new fancy query/filter cache in IndexSearcher. This affects DrillSidewaysQuery and any user-provided custom Query implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6661) Allow queries to opt out of caching
[ https://issues.apache.org/jira/browse/LUCENE-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Terry Smith updated LUCENE-6661: Attachment: LUCENE-6661.patch Rather than add a new method to Query/Weight, I've added a small marker interface and an instanceof check to prototype this feature. If this is of interest we should decide whether Query, Weight, or both could implement this interface to disable caching. Allow queries to opt out of caching --- Key: LUCENE-6661 URL: https://issues.apache.org/jira/browse/LUCENE-6661 Project: Lucene - Core Issue Type: Improvement Affects Versions: 5.2 Reporter: Terry Smith Priority: Minor Attachments: LUCENE-6661.patch Some queries have out-of-band dependencies that make them incompatible with caching, it'd be great if they could opt out of the new fancy query/filter cache in IndexSearcher. This affects DrillSidewaysQuery and any user-provided custom Query implementations.
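The marker-interface prototype described in this update can be pictured roughly as follows. This is only a minimal sketch, not the attached patch: NonCacheable, SketchQuery, SidewaysLikeQuery, PlainQuery, WrappingQuery and CachePolicySketch are all hypothetical names invented for illustration, and none of them is Lucene API. It also shows the weakness raised elsewhere in this thread, namely that a wrapping query hides the marker of the query it delegates to.

```java
/** Hypothetical marker interface; the name is an assumption, not Lucene API. */
interface NonCacheable {}

/** Minimal stand-in for a query type; not org.apache.lucene.search.Query. */
interface SketchQuery {}

/** A query with out-of-band dependencies that opts out of caching. */
final class SidewaysLikeQuery implements SketchQuery, NonCacheable {}

/** An ordinary query that is safe to cache. */
final class PlainQuery implements SketchQuery {}

/** A wrapper that delegates to another query and does not carry the marker itself. */
final class WrappingQuery implements SketchQuery {
  final SketchQuery inner;

  WrappingQuery(SketchQuery inner) {
    this.inner = inner;
  }
}

final class CachePolicySketch {
  /** The instanceof check: cache only queries that do not carry the marker. */
  static boolean shouldCache(SketchQuery query) {
    return !(query instanceof NonCacheable);
  }
}
```

With this sketch, a PlainQuery is cacheable and a SidewaysLikeQuery is not, but a WrappingQuery around a SidewaysLikeQuery is reported as cacheable again. A method on the query or weight that wrappers can forward would not have that gap.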
[jira] [Commented] (LUCENE-6639) LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first scorer is skipped
[ https://issues.apache.org/jira/browse/LUCENE-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615087#comment-14615087 ] Terry Smith commented on LUCENE-6639: - Ah, I didn't realize the highlighters were creating the weights to extract the terms, that makes sense. I like the idea of just calling onUse() the first time scorer() is called, that ought to be more robust and is very easy to understand. LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first scorer is skipped Key: LUCENE-6639 URL: https://issues.apache.org/jira/browse/LUCENE-6639 Project: Lucene - Core Issue Type: Bug Affects Versions: 5.3 Reporter: Terry Smith Priority: Minor Attachments: LUCENE-6639.patch The method {{org.apache.lucene.search.LRUQueryCache.CachingWrapperWeight.scorer(LeafReaderContext)}} starts with {code} if (context.ord == 0) { policy.onUse(getQuery()); } {code} which can result in a missed call for queries that return a null scorer for the first segment. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
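The idea of calling onUse() the first time scorer() is called, rather than only for the segment with ord 0, could be sketched as below. This is a simplified model under stated assumptions: CountingPolicy and WrapperWeightSketch are hypothetical stand-ins, only the onUse() bookkeeping is modelled, and the sketch does not claim to match the patch that was eventually committed.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the caching policy; it only counts onUse() calls. */
final class CountingPolicy {
  final AtomicInteger uses = new AtomicInteger();

  void onUse() {
    uses.incrementAndGet();
  }
}

/** Hypothetical stand-in for CachingWrapperWeight; only the onUse() bookkeeping is modelled. */
final class WrapperWeightSketch {
  private final CountingPolicy policy;
  private final AtomicBoolean used = new AtomicBoolean(false);

  WrapperWeightSketch(CountingPolicy policy) {
    this.policy = policy;
  }

  /** May be called once per segment, in any order, and some segments may be skipped. */
  void scorer(int segmentOrd) {
    // compareAndSet succeeds for exactly one caller, so onUse() fires once
    // regardless of which segment happens to be visited first.
    if (used.compareAndSet(false, true)) {
      policy.onUse();
    }
    // A real implementation would go on to build or look up a scorer here.
  }
}
```

Using an atomic flag also covers the multi-threaded search case mentioned in this ticket, where there is no guarantee about which segment's scorer() runs first.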
[jira] [Commented] (LUCENE-6639) LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first scorer is skipped
[ https://issues.apache.org/jira/browse/LUCENE-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14608281#comment-14608281 ] Terry Smith commented on LUCENE-6639: - This doesn't seem pressing but irked me enough to submit a ticket. It feels that we should be able to be more correct but the current API isn't very supportive of that work flow. I slightly prefer calling onUse() from createWeight() as it does make this edge case of the first segment go away which I feel is harder to reason about than someone creating a weight and not using it. The improved multi-threaded search code in IndexSearcher is a great example of this misbehaving where there is no guarantee that the first segment's Weight.scorer() will be called before the other segments. However I'm not familiar with use cases that use Query.createWeight() without executing some kind of search or explain to know if they are more of an issue. Is adding bookend methods to more correctly detect the begin/end of the search phase seen as too messy and special casey? At the end of the day I also wonder if it's worth the complexity but wanted to open this ticket to bootstrap the discussion as this could be a hard problem to diagnose in the future (someone wants to know why their query isn't getting cached and it's due to some obscure detail like this). LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first scorer is skipped Key: LUCENE-6639 URL: https://issues.apache.org/jira/browse/LUCENE-6639 Project: Lucene - Core Issue Type: Bug Affects Versions: 5.3 Reporter: Terry Smith Priority: Minor Attachments: LUCENE-6639.patch The method {{org.apache.lucene.search.LRUQueryCache.CachingWrapperWeight.scorer(LeafReaderContext)}} starts with {code} if (context.ord == 0) { policy.onUse(getQuery()); } {code} which can result in a missed call for queries that return a null scorer for the first segment. 
[jira] [Updated] (LUCENE-6639) LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first scorer is skipped
[ https://issues.apache.org/jira/browse/LUCENE-6639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Terry Smith updated LUCENE-6639: Attachment: LUCENE-6639.patch Attached unit test will fail if the extra IndexWriter.commit() gets triggered or the BooleanQuery clauses are shuffled to make the first clauses' scorer null for the first segment. LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first scorer is skipped Key: LUCENE-6639 URL: https://issues.apache.org/jira/browse/LUCENE-6639 Project: Lucene - Core Issue Type: Bug Affects Versions: 5.3 Reporter: Terry Smith Priority: Minor Attachments: LUCENE-6639.patch The method {{org.apache.lucene.search.LRUQueryCache.CachingWrapperWeight.scorer(LeafReaderContext)}} starts with {code} if (context.ord == 0) { policy.onUse(getQuery()); } {code} which can result in a missed call for queries that return a null scorer for the first segment. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6639) LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first scorer is skipped
Terry Smith created LUCENE-6639: --- Summary: LRUQueryCache.CachingWrapperWeight not calling policy.onUse() if the first scorer is skipped Key: LUCENE-6639 URL: https://issues.apache.org/jira/browse/LUCENE-6639 Project: Lucene - Core Issue Type: Bug Affects Versions: 5.3 Reporter: Terry Smith Priority: Minor The method {{org.apache.lucene.search.LRUQueryCache.CachingWrapperWeight.scorer(LeafReaderContext)}} starts with {code} if (context.ord == 0) { policy.onUse(getQuery()); } {code} which can result in a missed call for queries that return a null scorer for the first segment. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6305) BooleanQuery.equals should ignore clause order
[ https://issues.apache.org/jira/browse/LUCENE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593457#comment-14593457 ] Terry Smith commented on LUCENE-6305: - Oops, read the patch too quickly and missed that key detail! Sorry for the noise. BooleanQuery.equals should ignore clause order -- Key: LUCENE-6305 URL: https://issues.apache.org/jira/browse/LUCENE-6305 Project: Lucene - Core Issue Type: Bug Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-6305.patch BooleanQuery.equals is sensible to the order in which clauses have been added. So for instance +A +B would be considered different from +B +A although it generates the same matches and scores. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6305) BooleanQuery.equals should ignore clause order
[ https://issues.apache.org/jira/browse/LUCENE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593447#comment-14593447 ] Terry Smith commented on LUCENE-6305: - Having BooleanQuery.equals() ignore order is a great idea but I think it'd be better if we could preserve the original clause order for Query.toString(), the Explanation, debugging and test expectations. Additionally, I've been burnt by JVM changes to String.hashCode() that cause HashMap<String,?> to order entries differently when run in a newer JVM. Are the Query hash codes immune to this problem? BooleanQuery.equals should ignore clause order -- Key: LUCENE-6305 URL: https://issues.apache.org/jira/browse/LUCENE-6305 Project: Lucene - Core Issue Type: Bug Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-6305.patch BooleanQuery.equals is sensible to the order in which clauses have been added. So for instance +A +B would be considered different from +B +A although it generates the same matches and scores.
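One way to get both properties asked for here, order-insensitive equals() and an order-preserving toString(), is to keep the clauses in a list and compare them as a multiset. The following is only a sketch under that assumption: ClauseListSketch is a hypothetical class that uses plain strings in place of BooleanClause, and it is not how the attached patch is implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a boolean query; clauses are plain strings here. */
final class ClauseListSketch {
  private final List<String> clauses; // insertion order, kept for toString()

  ClauseListSketch(List<String> clauses) {
    this.clauses = new ArrayList<>(clauses);
  }

  /** Multiset view of the clauses: clause -> number of occurrences. */
  private Map<String, Integer> counts() {
    Map<String, Integer> counts = new HashMap<>();
    for (String clause : clauses) {
      counts.merge(clause, 1, Integer::sum);
    }
    return counts;
  }

  @Override
  public boolean equals(Object o) {
    // Map.equals compares entries regardless of iteration order.
    return o instanceof ClauseListSketch
        && counts().equals(((ClauseListSketch) o).counts());
  }

  @Override
  public int hashCode() {
    // Map.hashCode is the sum of the entry hash codes, so it does not
    // depend on the order in which the map happens to iterate.
    return counts().hashCode();
  }

  @Override
  public String toString() {
    // The original clause order is preserved for display and debugging.
    return String.join(" ", clauses);
  }
}
```

Because the hash code is a sum over entries, a change in HashMap iteration order between JVMs would not change equality or the hash of the whole object, although the individual String hash codes still feed into it.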
[jira] [Commented] (LUCENE-6305) BooleanQuery.equals should ignore clause order
[ https://issues.apache.org/jira/browse/LUCENE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593467#comment-14593467 ] Terry Smith commented on LUCENE-6305: - Slightly off topic to your original goal, but what do you think about deduping repeated non-scoring (FILTER, MUST_NOT) clauses from the list in the query, or do you see that as a possible optimization when building the weights/scorers? BooleanQuery.equals should ignore clause order -- Key: LUCENE-6305 URL: https://issues.apache.org/jira/browse/LUCENE-6305 Project: Lucene - Core Issue Type: Bug Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-6305.patch BooleanQuery.equals is sensible to the order in which clauses have been added. So for instance +A +B would be considered different from +B +A although it generates the same matches and scores.
[jira] [Commented] (LUCENE-6446) Simplify Explanation API
[ https://issues.apache.org/jira/browse/LUCENE-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507111#comment-14507111 ] Terry Smith commented on LUCENE-6446: - bq. I removed it because it was not always in the summary (only when using ComplexExplanation) as well as redundant with the description which is explicit when there is no match, for instance TermWeight's "no matching term" or BooleanWeight "no match on required clause"? That makes sense. Removing the redundant information is definitely the way to go. I also noticed that the new Explanation.noMatch() methods look a little trappy. They both take the child details and drop them on the floor. {code} public static Explanation noMatch(String description, Collection<Explanation> details) { return new Explanation(false, 0f, description, Collections.emptyList()); } {code} I think the noMatch() methods should either add the details to the created explanation or not accept them as parameters. Having a non-matching explanation contain child details can be really useful for complex queries. What do you think? Simplify Explanation API Key: LUCENE-6446 URL: https://issues.apache.org/jira/browse/LUCENE-6446 Project: Lucene - Core Issue Type: Bug Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Fix For: Trunk, 5.2 Attachments: LUCENE-6446.patch We should make this API easier to consume, for instance: - enforce important components to be non-null (eg. description) - decouple entirely the score computation from whether there is a match or not (Explanation assumes there is a match if the score is > 0, you need to use ComplexExplanation to override this behaviour) - return an empty array instead of null when there are no details
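The first of the two options suggested above, having noMatch() keep the details it is given, might look like the sketch below. ExplanationSketch is a simplified, hypothetical stand-in written for this illustration; it is not org.apache.lucene.search.Explanation and the accessor names are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/** Simplified stand-in for illustration; not org.apache.lucene.search.Explanation. */
final class ExplanationSketch {
  private final boolean match;
  private final float value;
  private final String description;
  private final List<ExplanationSketch> details;

  private ExplanationSketch(boolean match, float value, String description,
                            Collection<ExplanationSketch> details) {
    this.match = match;
    this.value = value;
    this.description = Objects.requireNonNull(description);
    // Defensive copy so later changes to the caller's collection are not visible.
    this.details = Collections.unmodifiableList(new ArrayList<>(details));
  }

  /** Non-matching explanation that keeps its child details instead of dropping them. */
  static ExplanationSketch noMatch(String description, Collection<ExplanationSketch> details) {
    return new ExplanationSketch(false, 0f, description, details);
  }

  /** Varargs convenience overload; it forwards the details as well. */
  static ExplanationSketch noMatch(String description, ExplanationSketch... details) {
    return noMatch(description, Arrays.asList(details));
  }

  boolean isMatch() {
    return match;
  }

  float getValue() {
    return value;
  }

  String getDescription() {
    return description;
  }

  List<ExplanationSketch> getDetails() {
    return details;
  }
}
```

With the details retained, a non-matching parent such as "no match on required clause" can still expose which child clause failed and why.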
[jira] [Commented] (LUCENE-6446) Simplify Explanation API
[ https://issues.apache.org/jira/browse/LUCENE-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507038#comment-14507038 ] Terry Smith commented on LUCENE-6446: - The refactored Explanation looks great, however I see a couple of small issues worth raising. 1. The constructor is private and there is a protected toString(int depth) method, it doesn't look like anyone else is calling it and no-one can subclass it. Should this method be private? 2. The toString() output is different! ComplexExplanation had a slightly different getSummary() method: {code} return getValue() + " = " + (isMatch() ? "(MATCH) " : "(NON-MATCH) ") + getDescription(); {code} versus {code} return getValue() + " = " + getDescription(); {code} I find this extra context invaluable, especially with the decoupling of score and match, we can't assume that a score of 0 is a NON-MATCH yet the output no longer tells us if an explanation is a MATCH or not. I understand that I can roll my own string building code with the current API. It'd be great if the default output was as useful as possible. Simplify Explanation API Key: LUCENE-6446 URL: https://issues.apache.org/jira/browse/LUCENE-6446 Project: Lucene - Core Issue Type: Bug Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Fix For: Trunk, 5.2 Attachments: LUCENE-6446.patch We should make this API easier to consume, for instance: - enforce important components to be non-null (eg. description) - decouple entirely the score computation from whether there is a match or not (Explanation assumes there is a match if the score is > 0, you need to use ComplexExplanation to override this behaviour) - return an empty array instead of null when there are no details
[jira] [Created] (LUCENE-6385) NullPointerException from Highlighter.getBestFragment()
Terry Smith created LUCENE-6385: --- Summary: NullPointerException from Highlighter.getBestFragment() Key: LUCENE-6385 URL: https://issues.apache.org/jira/browse/LUCENE-6385 Project: Lucene - Core Issue Type: Bug Components: modules/highlighter Affects Versions: 5.1 Reporter: Terry Smith When testing against the 5.1 nightly snapshots I've come across a NullPointerException in highlighting when nothing would be highlighted. This does not happen with 5.0. {noformat} java.lang.NullPointerException at __randomizedtesting.SeedInfo.seed([3EDC6EB0FA552B34:9971866E394F5FD0]:0) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedSpanTerms(WeightedSpanTermExtractor.java:311) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:151) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:515) at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:219) at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:187) at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:196) at org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:156) at org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:102) at org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:80) at org.apache.lucene.search.highlight.MissesTest.testPhraseQuery(MissesTest.java:50) {noformat} I've written a small unit test and used git bisect to narrow the regression to the following commit: {noformat} commit 24e4eefaefb1837d1d4fa35f7669c2b264f872ac Author: Michael McCandless mikemcc...@apache.org Date: Tue Mar 31 08:48:28 2015 + LUCENE-6308: cutover Spans to DISI, reuse ConjunctionDISI, use two-phased iteration git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/branch_5x@1670273 13f79535-47bb-0310-9956-ffa450edef68 
{noformat} The problem looks quite simple, WeightedSpanTermExtractor.extractWeightedSpanTerms() needs an early return if SpanQuery.getSpans() returns null. All other callers check against this. Unit test and fix (against the regressed commit) attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6385) NullPointerException from Highlighter.getBestFragment()
[ https://issues.apache.org/jira/browse/LUCENE-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Terry Smith updated LUCENE-6385: Attachment: LUCENE-6385.patch NullPointerException from Highlighter.getBestFragment() --- Key: LUCENE-6385 URL: https://issues.apache.org/jira/browse/LUCENE-6385 Project: Lucene - Core Issue Type: Bug Components: modules/highlighter Affects Versions: 5.1 Reporter: Terry Smith Attachments: LUCENE-6385.patch When testing against the 5.1 nightly snapshots I've come across a NullPointerException in highlighting when nothing would be highlighted. This does not happen with 5.0. {noformat} java.lang.NullPointerException at __randomizedtesting.SeedInfo.seed([3EDC6EB0FA552B34:9971866E394F5FD0]:0) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedSpanTerms(WeightedSpanTermExtractor.java:311) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:151) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:515) at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:219) at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:187) at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:196) at org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:156) at org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:102) at org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:80) at org.apache.lucene.search.highlight.MissesTest.testPhraseQuery(MissesTest.java:50) {noformat} I've written a small unit test and used git bisect to narrow the regression to the following commit: {noformat} commit 24e4eefaefb1837d1d4fa35f7669c2b264f872ac Author: Michael McCandless mikemcc...@apache.org Date: Tue Mar 31 08:48:28 2015 + LUCENE-6308: cutover Spans to DISI, 
reuse ConjunctionDISI, use two-phased iteration git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/branch_5x@1670273 13f79535-47bb-0310-9956-ffa450edef68 {noformat} The problem looks quite simple, WeightedSpanTermExtractor.extractWeightedSpanTerms() needs an early return if SpanQuery.getSpans() returns null. All other callers check against this. Unit test and fix (against the regressed commit) attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [DISCUSS] Change Query API to make queries immutable in 6.0
Adrien, Thanks for the explanation. It seems a pity to make queries just nearly immutable. Do you have any interest in adding a boost parameter to clone() so they really could be immutable? --Terry On Tue, Mar 31, 2015 at 9:44 AM, Adrien Grand jpou...@gmail.com wrote: Hi Terry, Indeed this is for query rewriting. For instance if you have a boolean query with a boost of 5 that wraps a single MUST clause with a term query, then we rewrite this to the inner term query and update its boost using clone() and setBoost() in order to not modify in-place a user-modified query. On Tue, Mar 31, 2015 at 3:37 PM, Terry Smith sheb...@gmail.com wrote: Adrien, I missed the reason that boost is going to stay mutable. Is this to support query rewriting? --Terry On Tue, Mar 31, 2015 at 7:21 AM, Robert Muir rcm...@gmail.com wrote: Same with BooleanQuery. the go-to ctor should just take 'clauses' On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless luc...@mikemccandless.com wrote: +1 For PhraseQuery we could also have a common-case ctor that just takes the terms (and assumes sequential positions)? Mike McCandless http://blog.mikemccandless.com On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand jpou...@gmail.com wrote: Recent changes that added automatic filter caching to IndexSearcher uncovered some traps with our queries when it comes to using them as cache keys. The problem comes from the fact that some of our main queries are mutable, and modifying them while they are used as cache keys makes the entry that they are caching invisible (because the hash code changed too) yet still using memory. While I think most users would be unaffected as it is rather uncommon to modify queries after having passed them to IndexSearcher, I would like to remove this trap by making queries immutable: everything should be set at construction time except the boost parameter that could still be changed with the same clone()/setBoost() mechanism as today.
First I would like to make sure that it sounds good to everyone and then to discuss what the API should look like. Most of our queries happen to be immutable already (NumericRangeQuery, TermsQuery, SpanNearQuery, etc.) but some aren't and the main exceptions are: - BooleanQuery, - DisjunctionMaxQuery, - PhraseQuery, - MultiPhraseQuery. We could take all parameters that are set as setters and move them to constructor arguments. For the above queries, this would mean (using varargs for ease of use): BooleanQuery(boolean disableCoord, int minShouldMatch, BooleanClause... clauses) DisjunctionMaxQuery(float tieBreakMul, Query... clauses) For PhraseQuery and MultiPhraseQuery, the closest to what we have today would require adding new classes to wrap terms and positions together, for instance: class TermAndPosition { public final BytesRef term; public final int position; } so that eg. PhraseQuery would look like: PhraseQuery(int slop, String field, TermAndPosition... terms) MultiPhraseQuery would be the same with several terms at the same position. Comments/ideas/concerns are highly welcome. -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
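The cache-key trap described at the start of this thread can be reproduced with an ordinary HashMap. The sketch below is illustrative only: MutableQuerySketch and CacheKeyTrap are hypothetical classes with plain strings standing in for clauses, and the map stands in for the query cache.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical mutable query: clauses can still be added after construction. */
final class MutableQuerySketch {
  private final List<String> clauses = new ArrayList<>();

  void add(String clause) {
    clauses.add(clause);
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof MutableQuerySketch
        && clauses.equals(((MutableQuerySketch) o).clauses);
  }

  @Override
  public int hashCode() {
    return clauses.hashCode();
  }
}

final class CacheKeyTrap {
  /** Returns {entry reachable through its key after mutation, entry still held by the map}. */
  static boolean[] demo() {
    Map<MutableQuerySketch, String> cache = new HashMap<>();
    MutableQuerySketch query = new MutableQuerySketch();
    query.add("+A");
    cache.put(query, "cached doc ids");

    // Mutating the key changes its hash code while it sits inside the map.
    query.add("+B");

    boolean reachable = cache.containsKey(query);
    boolean stillHeld = cache.size() == 1;
    return new boolean[] {reachable, stillHeld};
  }
}
```

After the mutation the lookup misses, yet the map still holds the entry, which is the "invisible yet still using memory" situation from the original message. Setting everything at construction time removes the possibility of the key changing underneath the cache.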
Re: [DISCUSS] Change Query API to make queries immutable in 6.0
Adrien, I missed the reason that boost is going to stay mutable. Is this to support query rewriting? --Terry On Tue, Mar 31, 2015 at 7:21 AM, Robert Muir rcm...@gmail.com wrote: Same with BooleanQuery. the go-to ctor should just take 'clauses' On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless luc...@mikemccandless.com wrote: +1 For PhraseQuery we could also have a common-case ctor that just takes the terms (and assumes sequential positions)? Mike McCandless http://blog.mikemccandless.com On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand jpou...@gmail.com wrote: Recent changes that added automatic filter caching to IndexSearcher uncovered some traps with our queries when it comes to using them as cache keys. The problem comes from the fact that some of our main queries are mutable, and modifying them while they are used as cache keys makes the entry that they are caching invisible (because the hash code changed too) yet still using memory. While I think most users would be unaffected as it is rather uncommon to modify queries after having passed them to IndexSearcher, I would like to remove this trap by making queries immutable: everything should be set at construction time except the boost parameter that could still be changed with the same clone()/setBoost() mechanism as today. First I would like to make sure that it sounds good to everyone and then to discuss what the API should look like. Most of our queries happen to be immutable already (NumericRangeQuery, TermsQuery, SpanNearQuery, etc.) but some aren't and the main exceptions are: - BooleanQuery, - DisjunctionMaxQuery, - PhraseQuery, - MultiPhraseQuery. We could take all parameters that are set as setters and move them to constructor arguments. For the above queries, this would mean (using varargs for ease of use): BooleanQuery(boolean disableCoord, int minShouldMatch, BooleanClause... clauses) DisjunctionMaxQuery(float tieBreakMul, Query... 
clauses) For PhraseQuery and MultiPhraseQuery, the closest to what we have today would require adding new classes to wrap terms and positions together, for instance: class TermAndPosition { public final BytesRef term; public final int position; } so that eg. PhraseQuery would look like: PhraseQuery(int slop, String field, TermAndPosition... terms) MultiPhraseQuery would be the same with several terms at the same position. Comments/ideas/concerns are highly welcome. -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335022#comment-14335022 ] Terry Smith commented on LUCENE-6229: - Understood. If you end up keeping getChildren(), how do you feel about making it well defined by capturing these constraints in the Javadoc? Remove Scorer.getChildren? -- Key: LUCENE-6229 URL: https://issues.apache.org/jira/browse/LUCENE-6229 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Priority: Minor This API is used in a single place in our code base: ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that using this API from a collector only works if setScorer is called with an actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in disjunctions) so it needs a custom IndexSearcher that does not use the BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334937#comment-14334937 ] Terry Smith commented on LUCENE-6229: - [~rcmuir] Sorry for excluding that scenario, it wasn't intentional. If you all decide to keep getChildren(), then I'd love to get the contract described so people know what to expect. I think these statements are correct: # Scorer.getChildren() returns the immediate child scorers # A returned scorer may be ## unpositioned (never had next() or advance() called on it) ## positioned on a valid document that is before, on, or after the current document ## exhausted and thus positioned at NO_MORE_DOCS # You MUST NOT call next() or advance() on the returned scorers yourself And have these questions: # Can I walk the returned scorers to get to all non-null leaf scorers? # Can I position the returned scorers on the current document by calling freq(), score() or something else? Remove Scorer.getChildren? -- Key: LUCENE-6229 URL: https://issues.apache.org/jira/browse/LUCENE-6229 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Priority: Minor This API is used in a single place in our code base: ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that using this API from a collector only works if setScorer is called with an actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in disjunctions) so it needs a custom IndexSearcher that does not use the BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
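For the first question above, walking the returned scorers down to the leaves, a recursive traversal that only reads the tree and never advances anything would be consistent with the proposed contract. The sketch below uses a hypothetical ScorerNode type invented for illustration; it is not Lucene's Scorer, and whether the real API permits such a walk is exactly what this comment is asking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a node in a scorer tree; not Lucene's Scorer. */
final class ScorerNode {
  final String name;
  private final List<ScorerNode> children;

  ScorerNode(String name, ScorerNode... children) {
    this.name = name;
    this.children = Arrays.asList(children);
  }

  /** Immediate children only, mirroring the first statement of the proposed contract. */
  Collection<ScorerNode> getChildren() {
    return Collections.unmodifiableList(children);
  }
}

final class LeafWalker {
  /** Collects all non-null leaves depth-first without advancing or repositioning any node. */
  static List<ScorerNode> leaves(ScorerNode root) {
    List<ScorerNode> out = new ArrayList<>();
    collect(root, out);
    return out;
  }

  private static void collect(ScorerNode node, List<ScorerNode> out) {
    if (node == null) {
      return; // skip null children
    }
    if (node.getChildren().isEmpty()) {
      out.add(node);
      return;
    }
    for (ScorerNode child : node.getChildren()) {
      collect(child, out);
    }
  }
}
```

The walk itself is straightforward; the open part of the contract is the second question, whether the leaves found this way are positioned on the current document.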
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329069#comment-14329069 ] Terry Smith commented on LUCENE-6229: - I’ll summarize this as two options: # Remove getChildren() as it complicates the code hurting the ability to maintain it and make performance enhancements. # Make getChildren() a more well defined API that gives you the ability to retrieve child scorers that are correctly positioned. You are looking for data to back up option 2 to determine if this is an API that is worth fixing/keeping. Here are the use cases that I have: # Custom scoring of a BooleanQuery. A query that wraps any BooleanQuery which it uses for recall but supplies its own scoring algorithm to aggregate the scores from the original clauses. # Custom DrillSidewaysQuery. A query that can use the sideways scorers for precision instead of just recall. # Recursive DrillSidewaysQuery (not implemented, tricky). A query that could perform DrillSideways for union or in a nested fashion. # Auxiliary metadata. An enhancement that can augment the current recall (boolean match) and precision (float score) for a document in the search pipeline to add extra information that can be used from Query and FunctionValue instances (collected via a custom Collector) and supported by a custom SortField. These can be categorized into two camps: # Using an existing Query (typically BooleanQuery) to find matches but providing some combination of ## custom scoring that isn’t compatible with the Similarity API. ## custom recall (think DrillSideways) # Adding extra information to the search pipeline that can be ## generated by leaf queries and value sources ## aggregated by composing queries (BooleanQuery, DisjunctionMaxQuery, etc) ## survive wrapping queries and value sources that don’t know about it ## collected and sorted on Hope this helps. Remove Scorer.getChildren?
-- Key: LUCENE-6229 URL: https://issues.apache.org/jira/browse/LUCENE-6229 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Priority: Minor This API is used in a single place in our code base: ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that using this API from a collector only works if setScorer is called with an actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in disjunctions) so it needs a custom IndexSearcher that does not use the BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14324686#comment-14324686 ] Terry Smith commented on LUCENE-6229: - [~rcmuir] Thanks for the backstory. I've been trying to wrap my head around where Lucene is going and this kind of information really helps. It sounds like both [~rcmuir] and [~thetaphi] agree that Scorer.getChildren() is not an API that Lucene should support. Reading between the lines, this implies to me that scoring is moving to a bulk-only approach, which will bring great performance gains. A best effort implementation of Scorer.getChildren() would be something that I'd be uncomfortable adding features on top of, although it could be useful for debugging. Unfortunately this is a showstopper for me as I rely on Scorer.getChildren() for some critical features, and need to do some serious thinking to figure out if I can formulate an alternative approach. Remove Scorer.getChildren? -- Key: LUCENE-6229 URL: https://issues.apache.org/jira/browse/LUCENE-6229 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Priority: Minor This API is used in a single place in our code base: ToParentBlockJoinCollector. In addition, the usage is a bit buggy given that using this API from a collector only works if setScorer is called with an actual Scorer (and not eg. FakeScorer or BooleanScorer like you would get in disjunctions) so it needs a custom IndexSearcher that does not use the BulkScorer API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318272#comment-14318272 ]

Terry Smith commented on LUCENE-6229:

[~jpountz] - I'm going to split the freq() vs score() thing into a separate ticket so it doesn't hijack this one.

I intend to take the unit test I previously pasted and extend it to create some randomized BooleanQuery instances to try and locate possibly broken edge cases and give a safety blanket for future refactoring.

I'll make these assumptions; shout out if they are incorrect. For a BooleanQuery I should be able to perform doc-at-a-time scoring, meaning that in a Collector or Scorer I can:

1. Find all Scorers from the child clauses of the BooleanQuery
2. Have those Scorers be positioned for me by calling score() or freq()
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314219#comment-14314219 ]

Terry Smith commented on LUCENE-6229:

Like Stefan, I'm also using this functionality to access child scorers on a per-document basis, currently for some custom query enhancements and a custom drill-sideways implementation.

Like Adrien, I've also had to wrap queries in a custom NonBulkScoringQuery to force doc-at-a-time scoring.

It'd be great to simplify this workflow. I've been calling Scorer.freq() to position all the child scorers (from a BooleanQuery), and as of the 5.1 nightly builds I need to call Scorer.score() instead for positioning, due to changes in MinShouldMatchSumScorer.

I'd love to have a way to not only get the child scorers but also be confident that they are all correctly positioned.
[jira] [Commented] (LUCENE-6229) Remove Scorer.getChildren?
[ https://issues.apache.org/jira/browse/LUCENE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314361#comment-14314361 ]

Terry Smith commented on LUCENE-6229:

h2. freq() vs score()

I think the lazy positioning in MinShouldMatchSumScorer is misbehaving. Drop these three methods into TestBooleanMinShouldMatch.java to see.

{code:java}
// Imports assumed, if the test file lacks them: java.io.IOException,
// java.util.ArrayList, java.util.Collection, and from org.apache.lucene.search:
// SimpleCollector, Scorer, Scorer.ChildScorer.

public void testMinNrShouldMatchFreq() throws Exception {
  BooleanQuery q = new BooleanQuery();
  q.add(new TermQuery(new Term("data", "1")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "2")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "3")), Occur.SHOULD);
  q.add(new TermQuery(new Term("id", "0")), Occur.MUST);
  q.setMinimumNumberShouldMatch(2);
  verifyNrHits(q, 1);

  s.search(q, new SimpleCollector() {
    private Scorer scorer;
    private Collection<Scorer> leafScorers;

    @Override
    public void setScorer(Scorer scorer) throws IOException {
      this.scorer = scorer;
      this.leafScorers = leafScorers(new ArrayList<Scorer>(), scorer);
      assertEquals(4, leafScorers.size());
    }

    @Override
    public void collect(int doc) throws IOException {
      assertEquals(0, doc);
      scorer.freq(); // position leaf scorers
      for (Scorer leafScorer : leafScorers) {
        assertEquals(0, leafScorer.docID());
      }
    }
  });
}

public void testMinNrShouldMatchScore() throws Exception {
  BooleanQuery q = new BooleanQuery();
  q.add(new TermQuery(new Term("data", "1")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "2")), Occur.SHOULD);
  q.add(new TermQuery(new Term("data", "3")), Occur.SHOULD);
  q.add(new TermQuery(new Term("id", "0")), Occur.MUST);
  q.setMinimumNumberShouldMatch(2);
  verifyNrHits(q, 1);

  s.search(q, new SimpleCollector() {
    private Scorer scorer;
    private Collection<Scorer> leafScorers;

    @Override
    public void setScorer(Scorer scorer) throws IOException {
      this.scorer = scorer;
      this.leafScorers = leafScorers(new ArrayList<Scorer>(), scorer);
      assertEquals(4, leafScorers.size());
    }

    @Override
    public void collect(int doc) throws IOException {
      assertEquals(0, doc);
      scorer.score(); // position leaf scorers
      for (Scorer leafScorer : leafScorers) {
        assertEquals(0, leafScorer.docID());
      }
    }
  });
}

// Recursively collects the scorers that have no children of their own.
private static Collection<Scorer> leafScorers(Collection<Scorer> target, Scorer scorer) {
  Collection<ChildScorer> childScorers = scorer.getChildren();
  if (childScorers.isEmpty()) {
    target.add(scorer);
  } else {
    for (ChildScorer childScorer : childScorers) {
      leafScorers(target, childScorer.child);
    }
  }
  return target;
}
{code}

Here the test that uses freq() to position the sub scorers fails, but the one that uses score() succeeds.

h2. middle ground

I have Scorer constructors, Weight.scorer(), Weight.explain() and Collectors all calling Scorer.getChildren(). But when using my custom Collectors I'm careful to wrap the Query in a custom NonBulkScoringQuery that prevents bulk scoring, to work around the trap.

The NonBulkScoringQuery I mention is a simple delegating Query that allows Weight.bulkScorer() to use its default implementation instead of allowing the wrapped Query to override it.

I like removing the trap for bulk scoring queries; it's really subtle and it took me a while to diagnose the first time I hit it.

Having a separate entry point into IndexSearcher to achieve doc-at-a-time scoring that supports getChildren() would be awesome. I'm not so hot on having to cast the collector. Do you think there could be a way to preserve type safety here?
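The NonBulkScoringQuery workaround described in the comment above can be pictured with a small toy model. This is only a sketch under stated assumptions: ToyScorer, ToyBulkScorer, ToyWeight and the two weight classes are hypothetical stand-ins invented for illustration, not Lucene classes, and the real Weight.bulkScorer() signature is not reproduced here. The point it tries to show is that a delegating wrapper which forwards scorer() but does not forward bulkScorer() falls back to the default, doc-at-a-time path.

```java
// Toy model only: every type below is a hypothetical stand-in, not a Lucene class.
final class NonBulkScoringSketch {

    /** Stand-in for a per-document scorer. */
    interface ToyScorer {
        String name();
    }

    /** Stand-in for a bulk scorer; reports which scorer a collector would be handed. */
    interface ToyBulkScorer {
        String scorerSeenByCollector();
    }

    /** Stand-in for a weight: bulkScorer() has a default built on top of scorer(). */
    interface ToyWeight {
        ToyScorer scorer();

        default ToyBulkScorer bulkScorer() {
            ToyScorer real = scorer();
            // Default path: the collector is handed the real scorer.
            return () -> real.name();
        }
    }

    /** A weight that overrides bulkScorer() with a specialised implementation. */
    static final class BulkOptimisedWeight implements ToyWeight {
        @Override public ToyScorer scorer() { return () -> "RealScorer"; }
        @Override public ToyBulkScorer bulkScorer() { return () -> "FakeScorer"; }
    }

    /** Delegates scorer() but deliberately does not override bulkScorer(). */
    static final class NonBulkWeight implements ToyWeight {
        private final ToyWeight in;
        NonBulkWeight(ToyWeight in) { this.in = in; }
        @Override public ToyScorer scorer() { return in.scorer(); }
    }

    public static void main(String[] args) {
        ToyWeight wrapped = new BulkOptimisedWeight();
        // Unwrapped: the specialised bulk path hands the collector a fake scorer.
        System.out.println(wrapped.bulkScorer().scorerSeenByCollector());
        // Wrapped: the default path is used, so the collector sees the real scorer.
        System.out.println(new NonBulkWeight(wrapped).bulkScorer().scorerSeenByCollector());
    }
}
```

In this model the first line printed is FakeScorer and the second is RealScorer, which mirrors why a collector that relies on getChildren() only behaves when the query is wrapped.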
[jira] [Commented] (LUCENE-6232) Replace ValueSource context Map with a more concrete data type
[ https://issues.apache.org/jira/browse/LUCENE-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314242#comment-14314242 ]

Terry Smith commented on LUCENE-6232:

I have custom code that injects objects into this map. If you refactor this to be a concrete class, could you leave it non-final so that a custom FunctionQuery can provide its own subclassed instance of this context?

Replace ValueSource context Map with a more concrete data type
Key: LUCENE-6232
URL: https://issues.apache.org/jira/browse/LUCENE-6232
Project: Lucene - Core
Issue Type: Improvement
Reporter: Mike Drob

Inspired by LUCENE-3973

The context object used by ValueSource and friends is a raw Map that provides no type safety guarantees. In our current state, there are lots of warnings about unchecked casts, raw types, and generally unsafe code from the compiler's perspective.

There are several common patterns and types of Objects that we store in the context. It would be beneficial to instead use a class with typed methods for get/set of Scorer, Weights, etc.
[jira] [Commented] (LUCENE-6232) Replace ValueSource context Map with a more concrete data type
[ https://issues.apache.org/jira/browse/LUCENE-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314555#comment-14314555 ]

Terry Smith commented on LUCENE-6232:

[~mdrob] Thanks, either would work.
[jira] [Commented] (LUCENE-5796) Scorer.getChildren() can throw or hide a subscorer for some boolean queries
[ https://issues.apache.org/jira/browse/LUCENE-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049941#comment-14049941 ]

Terry Smith commented on LUCENE-5796:

Thanks for taking the time to review my patch and comment on the approach.

The reason that I advocated changing FilterScorer and BoostedScorer is to allow some of my custom Query implementations to use a regular BooleanQuery for recall and optionally scoring while taking advantage of the actual Scorers used on a per document, per clause basis.

This has been working great across quite a few Lucene releases but failed when I upgraded to 4.9 due to the two regressions in behavior for Scorer.getChildren() as described in this ticket.

In this scenario, a BooleanQuery containing two TermQueries (one a miss and the other a hit) returns the following from BooleanWeight.scorer():

* BoostedScorer
** TermScorer (hit)

Calling getChildren() on this returns an empty list because the BoostedScorer just returns in.getChildren() and thus you are unable to navigate to the actual TermScorer in play. This would impact any classes that extend FilterScorer and don't override getChildren().

In other words, the current wiring does make the BoostedScorer transparent but with the disadvantage of hiding the actual scorer that performs the work.

If this is an unsupported workflow, I'm happy to move the discussion over to the user mailing list.
[jira] [Created] (LUCENE-5796) Scorer.getChildren() can throw or hide a subscorer for some boolean queries
Terry Smith created LUCENE-5796:

Summary: Scorer.getChildren() can throw or hide a subscorer for some boolean queries
Key: LUCENE-5796
URL: https://issues.apache.org/jira/browse/LUCENE-5796
Project: Lucene - Core
Issue Type: Bug
Components: core/search
Affects Versions: 4.9
Reporter: Terry Smith
Priority: Minor

I've isolated two example boolean queries that don't behave with release 4.9 of Lucene.

# A BooleanQuery with three SHOULD clauses and a minimumNumberShouldMatch of 2 will throw an ArrayIndexOutOfBoundsException.
{noformat}
java.lang.ArrayIndexOutOfBoundsException: 2
  at __randomizedtesting.SeedInfo.seed([2F79B3DF917D071B:2539E6DBC4DF793C]:0)
  at org.apache.lucene.search.MinShouldMatchSumScorer.getChildren(MinShouldMatchSumScorer.java:119)
  at org.apache.lucene.search.TestBooleanQueryVisitSubscorers$ScorerSummarizingCollector.summarizeScorer(TestBooleanQueryVisitSubscorers.java:261)
  at org.apache.lucene.search.TestBooleanQueryVisitSubscorers$ScorerSummarizingCollector.setScorer(TestBooleanQueryVisitSubscorers.java:238)
  at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:161)
  at org.apache.lucene.search.AssertingBulkScorer.score(AssertingBulkScorer.java:64)
  at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
  at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:94)
  at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
  at org.apache.lucene.search.TestBooleanQueryVisitSubscorers.testGetChildrenMinShouldMatchSumScorer(TestBooleanQueryVisitSubscorers.java:196)
{noformat}
# A BooleanQuery with two should clauses, one of which is a miss for all documents in the current segment will accidentally mask the scorer that was a hit.

Unit tests and patch based on {{branch_4x}} are available and will be attached as soon as this ticket has a number. They are immediately available on GitHub on branch [shebiki/bqgetchildren|https://github.com/shebiki/lucene-solr/commits/bqgetchildren] as commit [c64bb6f|https://github.com/shebiki/lucene-solr/commit/c64bb6f2df8f33dd8daafc953d9c27b5cbf29fa3].

I took the liberty of naming the relationship in BoostingScorer.getChildren() {{BOOSTING}}. Suspect someone will offer a better name for this. Here is a summary of the various relationships in play for all Scorer.getChildren() implementations on {{branch_4x}} to help choose.

|| class || relationships ||
| org.apache.lucene.search.AssertingScorer | SHOULD |
| org.apache.lucene.search.join.ToParentBlockJoinQuery.BlockJoinScorer | BLOCK_JOIN |
| org.apache.lucene.search.ConjunctionScorer | MUST |
| org.apache.lucene.search.ConstantScoreQuery.ConstantScorer | constant |
| org.apache.lucene.queries.function.BoostedQuery.CustomScorer | CUSTOM |
| org.apache.lucene.queries.CustomScoreQuery.CustomScorer | CUSTOM |
| org.apache.lucene.search.DisjunctionScorer | SHOULD |
| org.apache.lucene.facet.DrillSidewaysScorer.FakeScorer | MUST |
| org.apache.lucene.search.FilterScorer | calls in.getChildren() |
| org.apache.lucene.search.ScoreCachingWrappingScorer | CACHED |
| org.apache.lucene.search.FilteredQuery.LeapFrogScorer | FILTERED |
| org.apache.lucene.search.MinShouldMatchSumScorer | SHOULD |
| org.apache.lucene.search.FilteredQuery | FILTERED |
| org.apache.lucene.search.ReqExclScorer | MUST |
| org.apache.lucene.search.ReqOptSumScorer | MUST, SHOULD |
| org.apache.lucene.search.join.ToChildBlockJoinQuery | BLOCK_JOIN |

I also removed FilterScorer.getChildren() to prevent mistakes and force subclasses to provide a correct implementation.

-- This message was sent by Atlassian JIRA (v6.2#6252)
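The second regression in the ticket above (a wrapper that forwards getChildren() and thereby hides the scorer it wraps) can be illustrated with a toy scorer tree. This is a sketch only: ToyScorer, ForwardingWrapper and ReportingWrapper are hypothetical classes made up for the example, not Lucene's Scorer, FilterScorer or BoostedScorer, and the walk mirrors the recursive leaf collection used in the LUCENE-6229 test rather than any Lucene API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: every type below is a hypothetical stand-in, not a Lucene class.
final class GetChildrenSketch {

    /** Stand-in for a scorer node; a plain instance has no children. */
    static class ToyScorer {
        final String name;
        ToyScorer(String name) { this.name = name; }
        List<ToyScorer> getChildren() { return Collections.emptyList(); }
    }

    /** Forwards getChildren() to the wrapped scorer, so the wrapped scorer itself is skipped. */
    static class ForwardingWrapper extends ToyScorer {
        final ToyScorer in;
        ForwardingWrapper(ToyScorer in) { super("ForwardingWrapper"); this.in = in; }
        @Override List<ToyScorer> getChildren() { return in.getChildren(); }
    }

    /** Reports the wrapped scorer as its single child. */
    static class ReportingWrapper extends ToyScorer {
        final ToyScorer in;
        ReportingWrapper(ToyScorer in) { super("ReportingWrapper"); this.in = in; }
        @Override List<ToyScorer> getChildren() { return Collections.singletonList(in); }
    }

    /** Returns the names of all leaves, i.e. nodes whose getChildren() is empty. */
    static List<String> leafNames(ToyScorer root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(ToyScorer scorer, List<String> out) {
        List<ToyScorer> children = scorer.getChildren();
        if (children.isEmpty()) {
            out.add(scorer.name);
        } else {
            for (ToyScorer child : children) {
                collect(child, out);
            }
        }
    }

    public static void main(String[] args) {
        ToyScorer term = new ToyScorer("TermScorer");
        // The forwarding wrapper looks like a leaf; the term scorer is never reached.
        System.out.println(leafNames(new ForwardingWrapper(term))); // [ForwardingWrapper]
        // The reporting wrapper exposes the term scorer to the walk.
        System.out.println(leafNames(new ReportingWrapper(term)));  // [TermScorer]
    }
}
```

The design choice the ticket argues for corresponds to the second wrapper: each wrapper names its own relationship to the scorer it wraps instead of passing through the wrapped scorer's children.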
[jira] [Updated] (LUCENE-5796) Scorer.getChildren() can throw or hide a subscorer for some boolean queries
[ https://issues.apache.org/jira/browse/LUCENE-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Terry Smith updated LUCENE-5796: Attachment: LUCENE-5796.patch
Wiki edit permission
Could I be allowed edit permissions on the wiki? I've signed up on the Lucene and Solr wiki with the same name (TerrySmith) but am just emailing lucene-dev to start. --Terry
Re: Wiki edit permission
Steve, Boy that was quick! Thanks. --Terry On Mon, Mar 24, 2014 at 9:28 AM, Steve Rowe sar...@gmail.com wrote: Hi Terry, I've added your account name to the ContributorsGroup page on both the Solr and Lucene wikis, so you should be able to edit both now. Steve On Mar 24, 2014, at 9:21 AM, Terry Smith sheb...@gmail.com wrote: Could I be allowed edit permissions on the wiki? I've signed up on the Lucene and Solr wiki with the same name (TerrySmith) but am just emailing lucene-dev to start. --Terry - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Stalled unit tests
It seems that you need to run the tests with `-Dtests.disableHdfs=true` for them to succeed. Is there any interest in making this the default behavior?

If not, I'll happily start a new email thread to get wiki permissions so that the contribution pages linked from the main README.txt (https://github.com/apache/lucene-solr/blob/trunk/README.txt) will both mention this important flag.

- http://wiki.apache.org/lucene-java/HowToContribute
- http://wiki.apache.org/solr/HowToContribute

Right now they both state that you can run `ant clean test`. Unfortunately that command will fail if you run the tests from either the top level of the project or the solr subdirectory, unless you instead run `ant -Dtests.disableHdfs=true clean test` or create a build.properties file.

I also couldn't find any references to build.properties on the wiki. Here are the searches I tried:

- http://wiki.apache.org/general/FrontPage?action=fullsearch&context=180&value=build.properties&fullsearch=Text
- http://www.google.com/?q=%22build.properties%22+site:wiki.apache.org%2Flucene
- http://www.google.com/?q=%22build.properties%22+site:wiki.apache.org%2Fsolr

Is this documented somewhere else? I'd be happy to back some out from the ant files, collate documentation from other sources and make it easier to find.

--Terry

On Mon, Mar 10, 2014 at 2:55 PM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote:

Dawid: Boy, those are some large timeouts!

I know... I wasn't the one to bump them; my default was, I think, about 3 minutes per class...

Dawid
Re: Stalled unit tests
Dawid,

Thanks, I didn't even know about it until Mike mentioned it earlier in this thread. I've had it work from ~/lucene.build.properties and ~/build.properties but didn't have any luck putting it in the root of the project (I'm probably just misreading the ant file).

--Terry

On Thu, Mar 13, 2014 at 9:35 AM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote:

Terry,

The build.properties file holds the current user's config, as opposed to the defaults stored in the repository. In fact, there are more locations where you can put such defaults (see common-build.xml's header):

  <!-- Give user a chance to override without editing this file
       (and without typing -D each time it compiles it -->
  <property file="${user.home}/lucene.build.properties"/>
  <property file="${user.home}/build.properties"/>
  <property file="${basedir}/build.properties"/>
  <property file="${common.dir}/build.properties"/>

Dawid
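The lookup order in the common-build.xml header quoted above can be modelled in a few lines. This is a hedged sketch, not the build's actual logic: it assumes Ant's usual rule that a property keeps the first value assigned to it, so a file read earlier takes precedence over one read later (and a -D flag on the command line would take precedence over all of them). The class name, the resolve helper and the file contents are hypothetical and exist only for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; this is not how Ant itself is implemented.
final class PropertyPrecedenceSketch {

    /** Merges the sources in order; a key keeps the first value it receives. */
    static Map<String, String> resolve(List<Map<String, String>> sourcesInOrder) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Map<String, String> source : sourcesInOrder) {
            for (Map.Entry<String, String> entry : source.entrySet()) {
                // Later sources cannot overwrite a value that is already set.
                resolved.putIfAbsent(entry.getKey(), entry.getValue());
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Hypothetical file contents, listed in the order the header reads them.
        Map<String, String> userHomeLucene = Collections.singletonMap("tests.disableHdfs", "true");
        Map<String, String> userHome = Collections.singletonMap("tests.slow", "false");
        Map<String, String> baseDir = Collections.singletonMap("tests.disableHdfs", "false");

        Map<String, String> resolved = resolve(Arrays.asList(userHomeLucene, userHome, baseDir));
        System.out.println(resolved); // {tests.disableHdfs=true, tests.slow=false}
    }
}
```

Under that assumption, a value in ~/lucene.build.properties would shadow the same key in a build.properties file further down the list.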
Re: Stalled unit tests
Dawid: Boy, those are some large timeouts!

Mike: The build.properties suggestion resolved my issue. I can now run the tests to completion.

On a Mid 2009 MacBook Pro running Mavericks and using Java 6, executing ant from the top level of the lucene-solr project, I get the following timings:

ant clean compile -- 3 minutes
ant clean test (tests.disableHdfs=true, tests.slow=false) -- 55 minutes
ant clean test (tests.disableHdfs=true) -- 88 minutes

On a Mid 2012 MacBook Pro with the same software stack:

ant clean compile -- 1 minute
ant clean test (tests.disableHdfs=true, tests.slow=false) -- 8 minutes

All running from the same git commit mentioned at the top of this thread. The tests make great use of multiple CPUs/cores, so a faster machine makes a huge difference to the total runtime.

Do the HDFS tests fail due to test bugs or implementation issues? How do you feel about changing the default value of tests.disableHdfs to true versus updating the wiki documentation to let new contributors know how to work around this?

--Terry

On Fri, Mar 7, 2014 at 12:46 PM, Michael McCandless luc...@mikemccandless.com wrote:

I just ran ant test under Solr; it took 4 minutes 25 seconds. But, in my ~/build.properties I have:

tests.disableHdfs=true
tests.slow=false

Which makes things substantially faster, and also [seems to] sidestep the Solr tests that falsely fail.

Mike McCandless
http://blog.mikemccandless.com

On Fri, Mar 7, 2014 at 9:04 AM, Terry Smith sheb...@gmail.com wrote:

Mike,

Fair enough. I'll let them run for more than 30 minutes and see what happens. How long does it take on your machine?

I'm happy to sign up for the wiki and add some extra information to http://wiki.apache.org/lucene-java/HowToContribute for folks wanting to tinker with Lucene. Do the Lucene developers typically run a subset of the test suite to make committing cheaper?

Thanks,
--Terry

On Fri, Mar 7, 2014 at 5:52 AM, Michael McCandless luc...@mikemccandless.com wrote:

Unfortunately, some tests take a very long time, and the test infra will print these HEARTBEAT messages notifying you that they are still running. They should eventually finish?

Mike McCandless
http://blog.mikemccandless.com

On Thu, Mar 6, 2014 at 5:09 PM, Terry Smith sheb...@gmail.com wrote:

I'm sure that I'm just missing something obvious but I'm having trouble getting the unit tests to run to completion on my laptop and was hoping that someone would be kind enough to point me in the right direction.

I've cloned the repository from GitHub (http://git.apache.org/lucene-solr.git) and checked out the latest commit on branch_4x.

commit 6e06247cec1410f32592bfd307c1020b814def06
Author: Robert Muir rm...@apache.org
Date: Thu Mar 6 19:54:07 2014 +

disable slow solr tests in smoketester

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x@1575025 13f79535-47bb-0310-9956-ffa450edef68

Executing ant clean test from the top level directory of the project shows the tests running, but they seem to get stuck in a loop with some stalled heartbeat messages. If I run the tests directly from lucene/ then they complete successfully after about 10 minutes.

I'm using Java 6 under OS X (10.9.2).

$ java -version
java version "1.6.0_65"
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-11M4609)
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode)

My terminal lists repeating stalled heartbeat messages like so:

HEARTBEAT J2 PID(20104@onyx.local): 2014-03-06T16:53:35, stalled for 2111s at: HdfsLockFactoryTest.testBasic
HEARTBEAT J0 PID(20106@onyx.local): 2014-03-06T16:53:47, stalled for 2108s at: TestSurroundQueryParser.testQueryParser
HEARTBEAT J1 PID(20103@onyx.local): 2014-03-06T16:54:11, stalled for 2167s at: TestRecoveryHdfs.testBuffering
HEARTBEAT J3 PID(20105@onyx.local): 2014-03-06T16:54:23, stalled for 2165s at: HdfsDirectoryTest.testEOF

My machine does have 3 java processes chewing CPU, see attached jstack dumps for more information.

Should I expect the tests to complete on my platform? Do I need to specify any special flags to give them more memory or to avoid any bad apples?

Thanks in advance,
--Terry
Re: Stalled unit tests
Oops, the second set of timings on the Mid 2012 MacBook Pro were for JUST the Solr tests.

On Mon, Mar 10, 2014 at 9:31 AM, Terry Smith sheb...@gmail.com wrote:

Dawid: Boy, those are some large timeouts!

Mike: The build.properties suggestion resolved my issue. I can now run the tests to completion.

On a Mid 2009 MacBook Pro running Mavericks and using Java 6, executing ant from the top level of the lucene-solr project, I get the following timings:

ant clean compile -- 3 minutes
ant clean test (tests.disableHdfs=true, tests.slow=false) -- 55 minutes
ant clean test (tests.disableHdfs=true) -- 88 minutes

On a Mid 2012 MacBook Pro with the same software stack:

ant clean compile -- 1 minute
ant clean test (tests.disableHdfs=true, tests.slow=false) -- 8 minutes

All running from the same git commit mentioned at the top of this thread.

The tests make great use of multiple CPUs/cores, so a faster machine makes a huge difference to the total runtime.

Do the HDFS tests fail due to test bugs or implementation issues? How do you feel about changing the default value of tests.disableHdfs to true versus updating the wiki documentation to let new contributors know how to work around this?

--Terry

On Fri, Mar 7, 2014 at 12:46 PM, Michael McCandless luc...@mikemccandless.com wrote:

I just ran ant test under Solr; it took 4 minutes 25 seconds. But in my ~/build.properties I have:

tests.disableHdfs=true
tests.slow=false

This makes things substantially faster, and also [seems to] sidestep the Solr tests that falsely fail.

Mike McCandless
http://blog.mikemccandless.com
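A minimal sketch of the workaround Mike describes in this thread, assuming (as his message implies) that the Ant build picks up per-user overrides from ~/build.properties. Only the two property names and values come from the thread; the comments are editorial glosses, not taken from the build's documentation.

```properties
# ~/build.properties (per-user overrides, as quoted in Mike's message)

# Skip the HDFS-based Solr tests, which are the ones reported as stalling here.
tests.disableHdfs=true

# Skip the tests marked as slow.
tests.slow=false
```

Shalin's message in this thread passes the same kind of property on the command line (-Dtests.slow=true), so something like `ant clean test -Dtests.disableHdfs=true -Dtests.slow=false` is presumably equivalent for a single run; that form is an assumption and is not confirmed anywhere in the thread.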
Re: Stalled unit tests
Shalin: That makes sense. Both the machines I used for testing have SSDs.

On Mon, Mar 10, 2014 at 9:35 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

In my experience, the test suite is much faster on an SSD. Around 18 minutes on my MacBook Pro and 12 minutes on my PC for just the Solr tests with -Dtests.slow=true (both have SSDs).
Re: Stalled unit tests
Mike,

Fair enough. I'll let them run for more than 30 minutes and see what happens. How long does it take on your machine?

I'm happy to sign up for the wiki and add some extra information to http://wiki.apache.org/lucene-java/HowToContribute for folks wanting to tinker with Lucene.

Do the Lucene developers typically run a subset of the test suite to make committing cheaper?

Thanks,
--Terry

On Fri, Mar 7, 2014 at 5:52 AM, Michael McCandless luc...@mikemccandless.com wrote:

Unfortunately, some tests take a very long time, and the test infra will print these HEARTBEAT messages notifying you that they are still running. They should eventually finish?

Mike McCandless
http://blog.mikemccandless.com
Stalled unit tests
I'm sure that I'm just missing something obvious, but I'm having trouble getting the unit tests to run to completion on my laptop and was hoping that someone would be kind enough to point me in the right direction.

I've cloned the repository from GitHub (http://git.apache.org/lucene-solr.git) and checked out the latest commit on branch_4x.

commit 6e06247cec1410f32592bfd307c1020b814def06
Author: Robert Muir rm...@apache.org
Date: Thu Mar 6 19:54:07 2014 +

    disable slow solr tests in smoketester

    git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x@1575025 13f79535-47bb-0310-9956-ffa450edef68

Executing ant clean test from the top-level directory of the project shows the tests running, but they seem to get stuck in a loop with some stalled heartbeat messages. If I run the tests directly from lucene/ then they complete successfully after about 10 minutes.

I'm using Java 6 under OS X (10.9.2).

$ java -version
java version 1.6.0_65
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-11M4609)
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode)

My terminal lists repeating stalled heartbeat messages like so:

HEARTBEAT J2 PID(20104@onyx.local): 2014-03-06T16:53:35, stalled for 2111s at: HdfsLockFactoryTest.testBasic
HEARTBEAT J0 PID(20106@onyx.local): 2014-03-06T16:53:47, stalled for 2108s at: TestSurroundQueryParser.testQueryParser
HEARTBEAT J1 PID(20103@onyx.local): 2014-03-06T16:54:11, stalled for 2167s at: TestRecoveryHdfs.testBuffering
HEARTBEAT J3 PID(20105@onyx.local): 2014-03-06T16:54:23, stalled for 2165s at: HdfsDirectoryTest.testEOF

My machine does have 3 java processes chewing CPU; see attached jstack dumps for more information.

Should I expect the tests to complete on my platform? Do I need to specify any special flags to give them more memory or to avoid any bad apples?

Thanks in advance,
--Terry

20103.jstack.txt.gz
Description: GNU Zip compressed data

20104.jstack.txt.gz
Description: GNU Zip compressed data

20105.jstack.txt.gz
Description: GNU Zip compressed data
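The HEARTBEAT lines quoted in this message follow a regular shape, so a short script can list which tests are stalled and for how long. The sketch below is only that: the pattern is inferred from the four sample lines in this thread, not from the test runner's documentation, and the names `HEARTBEAT_RE`, `Heartbeat`, `parse_heartbeats` and `longest_stalls` are hypothetical helpers introduced for illustration.

```python
import re
from typing import Dict, List, NamedTuple

# Pattern inferred from the sample lines in this thread (an assumption,
# not the test runner's documented output format).
HEARTBEAT_RE = re.compile(
    r"HEARTBEAT (?P<jvm>J\d+) PID\((?P<pid>\d+)@(?P<host>[^)]+)\): "
    r"(?P<timestamp>[^,\s]+), stalled for (?P<seconds>\d+)s at: (?P<test>\S+)"
)


class Heartbeat(NamedTuple):
    jvm: str              # forked JVM label, e.g. "J2"
    pid: int              # process id of that JVM
    timestamp: str        # timestamp string as printed
    stalled_seconds: int  # how long the test has been stalled
    test: str             # "Class.method" of the stalled test


def parse_heartbeats(text: str) -> List[Heartbeat]:
    """Return every HEARTBEAT record found in the given console output."""
    return [
        Heartbeat(m["jvm"], int(m["pid"]), m["timestamp"],
                  int(m["seconds"]), m["test"])
        for m in HEARTBEAT_RE.finditer(text)
    ]


def longest_stalls(text: str) -> Dict[str, int]:
    """Map each stalled test to the longest stall (in seconds) reported for it."""
    stalls: Dict[str, int] = {}
    for hb in parse_heartbeats(text):
        stalls[hb.test] = max(stalls.get(hb.test, 0), hb.stalled_seconds)
    return stalls


# The four sample lines quoted in the message.
SAMPLE = """\
HEARTBEAT J2 PID(20104@onyx.local): 2014-03-06T16:53:35, stalled for 2111s at: HdfsLockFactoryTest.testBasic
HEARTBEAT J0 PID(20106@onyx.local): 2014-03-06T16:53:47, stalled for 2108s at: TestSurroundQueryParser.testQueryParser
HEARTBEAT J1 PID(20103@onyx.local): 2014-03-06T16:54:11, stalled for 2167s at: TestRecoveryHdfs.testBuffering
HEARTBEAT J3 PID(20105@onyx.local): 2014-03-06T16:54:23, stalled for 2165s at: HdfsDirectoryTest.testEOF
"""

if __name__ == "__main__":
    # Print the stalled tests, longest stall first, in whole minutes.
    for test, seconds in sorted(longest_stalls(SAMPLE).items(),
                                key=lambda item: -item[1]):
        print(f"{test}: stalled for about {seconds // 60} min")
```

Run against saved console output, this makes it easier to see that three of the four stalled tests in the sample are HDFS-related, which fits the tests.disableHdfs workaround discussed elsewhere in the thread.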