[jira] [Commented] (LUCENE-9269) Blended queries with boolean rewrite can result in inconstitent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064294#comment-17064294 ]

Michele Palmia commented on LUCENE-9269:

I have a few questions; please feel free to let me know if they're too dumb:
- While testing a solution for adding {{perReaderTermState}} to the current {{TermQuery#equals}} implementation, I found a test that I believe does none of what it was designed to do. It was essentially rewritten for an only tangentially related change, and it has been a no-op ever since (the test is [TestMultiTermQueryRewrites#checkBoosts|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/test/org/apache/lucene/search/TestMultiTermQueryRewrites.java#L215], the problematic edit was [this|https://github.com/apache/lucene-solr/commit/30807709e663c35f6760084632407dc1bf76aff7#diff-581d1e68f090e657acc327fc90534c51], which is missing the essential {{initialSeekTerm}}). Should I fix it as part of my proposal for this issue or open a new one?
- What's your opinion on comparing two TermQueries only one of which has a {{perReaderTermState}}? I'd say they're different, but their Weights could ultimately end up using the exact same statistics.
- Changing {{equals}} without changing {{toString}} means errors like {code:java} expected: but was: {code} are possible. That seems to me less of an issue than adding df/ttf to the TermQuery representation. Is that so?
> Blended queries with boolean rewrite can result in inconstitent scores
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-9269
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9269
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 8.4
>            Reporter: Michele Palmia
>            Priority: Minor
>         Attachments: LUCENE-9269-test.patch
>
>
> If two blended queries are should clauses of a boolean query and are built so
> that
> * some of their terms are the same
> * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE
> the docFreq for the overlapping terms used for scoring is picked as follow:
> # if the overlapping terms are not boosted, the df of the term in the first
> blended query is used
> # if any of the overlapping terms is boosted, the df is picked at (what
> looks like) random.
> A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
> {code:java}
> a)
> Blended(f:a f:b) Blended (f:a)
>      df: 3          df: 2
> gets rewritten to:
> (f:a)^2.0 (f:b)
>  df: 3     df:2
> b)
> Blended(f:a) Blended(f:a f:b)
>    df: 2        df: 3
> gets rewritten to:
> (f:a)^2.0 (f:b)
>  df: 2     df:2
> c)
> Blended(f:a f:b^0.66) Blended (f:a^0.75)
>      df: 3               df: 2
> gets rewritten to:
> (f:a)^1.75 (f:b)^0.66
>  df:?       df:2
> {code}
> with ? either 2 or 3, depending on the run.
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
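The questions in the comment above about {{TermQuery#equals}} and {{toString}} can be illustrated with a small, self-contained sketch. It is not Lucene code: {{SketchTermQuery}} and its {{docFreq}} field are hypothetical stand-ins for a term query and for the statistics that an attached term state might carry. It only shows how two queries can compare equal on the term alone while carrying different statistics, and how a state-aware comparison would tell apart two objects that print identically.

```java
import java.util.Objects;

public class TermEqualitySketch {

    /** Hypothetical stand-in for a term query; this is not a Lucene class. */
    public static final class SketchTermQuery {
        public final String field;
        public final String text;
        // Stands in for the statistics an attached term state would carry;
        // null means "no state attached".
        public final Integer docFreq;

        public SketchTermQuery(String field, String text, Integer docFreq) {
            this.field = field;
            this.text = text;
            this.docFreq = docFreq;
        }

        /** Rendering that leaves the statistics out. */
        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    /** Equality on field and text only. */
    public static boolean equalsIgnoringState(SketchTermQuery a, SketchTermQuery b) {
        return a.field.equals(b.field) && a.text.equals(b.text);
    }

    /** Equality that also compares the attached statistics (null-safe). */
    public static boolean equalsWithState(SketchTermQuery a, SketchTermQuery b) {
        return equalsIgnoringState(a, b) && Objects.equals(a.docFreq, b.docFreq);
    }

    public static void main(String[] args) {
        SketchTermQuery df2 = new SketchTermQuery("f", "a", 2);
        SketchTermQuery df3 = new SketchTermQuery("f", "a", 3);
        SketchTermQuery noState = new SketchTermQuery("f", "a", null);

        System.out.println(equalsIgnoringState(df2, df3));          // true
        System.out.println(equalsWithState(df2, df3));              // false
        System.out.println(equalsWithState(df2, noState));          // false
        System.out.println(df2.toString().equals(df3.toString()));  // true
    }
}
```

In this sketch a query with state and one without are treated as different, which is only one of the options raised in the second question. The last line shows the situation from the third question: two objects that are unequal under the state-aware comparison but render to the same string.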
[jira] [Updated] (LUCENE-8103) QueryValueSource should use TwoPhaseIterator
[ https://issues.apache.org/jira/browse/LUCENE-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michele Palmia updated LUCENE-8103: --- Attachment: LUCENE-8103.patch > QueryValueSource should use TwoPhaseIterator > > > Key: LUCENE-8103 > URL: https://issues.apache.org/jira/browse/LUCENE-8103 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/other >Reporter: David Smiley >Priority: Minor > Attachments: LUCENE-8103.patch, LUCENE-8103.patch > > > QueryValueSource (in "queries" module) is a ValueSource representation of a > Query; the score is the value. It ought to try to use a TwoPhaseIterator > from the query if it can be offered. This will prevent possibly expensive > advancing beyond documents that we aren't interested in. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-8103) QueryValueSource should use TwoPhaseIterator
[ https://issues.apache.org/jira/browse/LUCENE-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056067#comment-17056067 ]

Michele Palmia edited comment on LUCENE-8103 at 3/10/20, 3:37 PM:

Thanks a lot - I had not fully grasped the approximation mechanism and the {{TPI.asDocIdSetIterator(tpi)}} implementation. I uploaded an updated patch.

was (Author: micpalmia):
Thanks a lot - I had not grasped the approximation mechanism and the {{TPI.asDocIdSetIterator(tpi)}} implementation. I uploaded an updated patch.

> QueryValueSource should use TwoPhaseIterator
>
>
>                 Key: LUCENE-8103
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8103
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/other
>            Reporter: David Smiley
>            Priority: Minor
>         Attachments: LUCENE-8103.patch
>
>
> QueryValueSource (in "queries" module) is a ValueSource representation of a
> Query; the score is the value. It ought to try to use a TwoPhaseIterator
> from the query if it can be offered. This will prevent possibly expensive
> advancing beyond documents that we aren't interested in.
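The approximation mechanism mentioned in the comment above can be sketched without any Lucene dependency. Lucene's {{TwoPhaseIterator}} pairs a cheap {{approximation()}} with a potentially expensive {{matches()}} check; the class below is a hypothetical simplification of that idea, using a sorted array of candidate doc IDs as the approximation and a predicate as the confirmation step. All names in it ({{TwoPhaseSketch}}, {{matchesCalls}}) are assumptions made for illustration and do not come from the patch.

```java
import java.util.function.IntPredicate;

public class TwoPhaseSketch {
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] candidates;     // sorted doc IDs produced by the cheap approximation
    private final IntPredicate matches; // expensive per-document confirmation
    private int pos = 0;
    private int matchesCalls = 0;

    public TwoPhaseSketch(int[] sortedCandidates, IntPredicate matches) {
        this.candidates = sortedCandidates;
        this.matches = matches;
    }

    /**
     * Returns the first confirmed doc ID >= target, or NO_MORE_DOCS.
     * Callers are expected to pass increasing targets.
     */
    public int advance(int target) {
        // Phase 1: skip candidates below the target without confirming them.
        while (pos < candidates.length && candidates[pos] < target) {
            pos++;
        }
        // Phase 2: run the expensive check only on the remaining candidates.
        while (pos < candidates.length) {
            matchesCalls++;
            if (matches.test(candidates[pos])) {
                return candidates[pos];
            }
            pos++;
        }
        return NO_MORE_DOCS;
    }

    /** Number of times the expensive check has run so far. */
    public int matchesCalls() {
        return matchesCalls;
    }

    public static void main(String[] args) {
        TwoPhaseSketch it =
            new TwoPhaseSketch(new int[] {1, 3, 5, 8, 13}, doc -> doc % 2 == 1);
        System.out.println(it.advance(6));     // 13
        System.out.println(it.matchesCalls()); // 2
    }
}
```

In the usage example, candidates 1, 3 and 5 are skipped without ever being confirmed, which is the saving the issue description refers to as avoiding "possibly expensive advancing beyond documents that we aren't interested in".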
[jira] [Commented] (LUCENE-8103) QueryValueSource should use TwoPhaseIterator
[ https://issues.apache.org/jira/browse/LUCENE-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056067#comment-17056067 ]

Michele Palmia commented on LUCENE-8103:

Thanks a lot - I had not grasped the approximation mechanism and the {{TPI.asDocIdSetIterator(tpi)}} implementation. I uploaded an updated patch.

> QueryValueSource should use TwoPhaseIterator
>
>
>                 Key: LUCENE-8103
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8103
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/other
>            Reporter: David Smiley
>            Priority: Minor
>         Attachments: LUCENE-8103.patch
>
>
> QueryValueSource (in "queries" module) is a ValueSource representation of a
> Query; the score is the value. It ought to try to use a TwoPhaseIterator
> from the query if it can be offered. This will prevent possibly expensive
> advancing beyond documents that we aren't interested in.
[jira] [Updated] (LUCENE-8103) QueryValueSource should use TwoPhaseIterator
[ https://issues.apache.org/jira/browse/LUCENE-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michele Palmia updated LUCENE-8103: --- Attachment: (was: LUCENE-8103.patch) > QueryValueSource should use TwoPhaseIterator > > > Key: LUCENE-8103 > URL: https://issues.apache.org/jira/browse/LUCENE-8103 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/other >Reporter: David Smiley >Priority: Minor > Attachments: LUCENE-8103.patch > > > QueryValueSource (in "queries" module) is a ValueSource representation of a > Query; the score is the value. It ought to try to use a TwoPhaseIterator > from the query if it can be offered. This will prevent possibly expensive > advancing beyond documents that we aren't interested in. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8103) QueryValueSource should use TwoPhaseIterator
[ https://issues.apache.org/jira/browse/LUCENE-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michele Palmia updated LUCENE-8103: --- Attachment: LUCENE-8103.patch > QueryValueSource should use TwoPhaseIterator > > > Key: LUCENE-8103 > URL: https://issues.apache.org/jira/browse/LUCENE-8103 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/other >Reporter: David Smiley >Priority: Minor > Attachments: LUCENE-8103.patch > > > QueryValueSource (in "queries" module) is a ValueSource representation of a > Query; the score is the value. It ought to try to use a TwoPhaseIterator > from the query if it can be offered. This will prevent possibly expensive > advancing beyond documents that we aren't interested in. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-8674) UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal
[ https://issues.apache.org/jira/browse/LUCENE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055990#comment-17055990 ] Michele Palmia edited comment on LUCENE-8674 at 3/10/20, 2:20 PM: -- The problematic query ( {{?fq=\{!frange l=10 u=100}or_version_s,directed_by}} ) specifies two value sources separated by a comma ({{or_version_s,directed_by}}). These are parsed as a {{VectorValueSource}} embedding the two individual ValueSources corresponding to the two fields (see [FunctionQParser.java:115|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/FunctionQParser.java#L115]). was (Author: micpalmia): The problematic query ( {{?fq={!frange%20l=10%20u=100}or_version_s,directed_by}} ) specifies two value sources separated by a comma ({{or_version_s,directed_by}}). These are parsed as a {{VectorValueSource}} embedding the two individual ValueSources corresponding to the two fields (see [FunctionQParser.java:115|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/FunctionQParser.java#L115]). > UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal > -- > > Key: LUCENE-8674 > URL: https://issues.apache.org/jira/browse/LUCENE-8674 > Project: Lucene - Core > Issue Type: Bug > Components: core/query/scoring >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. 
Building the collection and reproducing the bug > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > curl -v “URL_BUG” > {noformat} > Please check the issue description below to find the “URL_BUG” that will > allow you to reproduce the issue reported. 
>Reporter: Johannes Kloos >Priority: Minor > Labels: diffblue, newdev > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fq={!frange%20l=10%20u=100}or_version_s,directed_by > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.UnsupportedOperationException > at > org.apache.lucene.queries.function.FunctionValues.floatVal(FunctionValues.java:47) > at > org.apache.lucene.queries.function.FunctionValues$3.matches(FunctionValues.java:188) > at > org.apache.lucene.queries.function.ValueSourceScorer$1.matches(ValueSourceScorer.java:53) > at > org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator.doNext(TwoPhaseIterator.java:89) > at > org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator.nextDoc(TwoPhaseIterator.java:77) > at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:261) > at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:214) > at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:652) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443) > at org.apache.solr.search.DocSetUtil.createDocSetGeneric(DocSetUtil.java:151) > at org.apache.solr.search.DocSetUtil.createDocSet(DocSetUtil.java:140) > at > org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1177) > at > org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:817) > at >
[jira] [Commented] (LUCENE-8674) UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal
[ https://issues.apache.org/jira/browse/LUCENE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055990#comment-17055990 ] Michele Palmia commented on LUCENE-8674: The problematic query ( {{?fq={!frange%20l=10%20u=100}or_version_s,directed_by}} ) specifies two value sources separated by a comma ({{or_version_s,directed_by}}). These are parsed as a {{VectorValueSource}} embedding the two individual ValueSources corresponding to the two fields (see [FunctionQParser.java:115|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/FunctionQParser.java#L115]). > UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal > -- > > Key: LUCENE-8674 > URL: https://issues.apache.org/jira/browse/LUCENE-8674 > Project: Lucene - Core > Issue Type: Bug > Components: core/query/scoring >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. Building the collection and reproducing the bug > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. 
> {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > curl -v “URL_BUG” > {noformat} > Please check the issue description below to find the “URL_BUG” that will > allow you to reproduce the issue reported. >Reporter: Johannes Kloos >Priority: Minor > Labels: diffblue, newdev > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fq={!frange%20l=10%20u=100}or_version_s,directed_by > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.UnsupportedOperationException > at > org.apache.lucene.queries.function.FunctionValues.floatVal(FunctionValues.java:47) > at > org.apache.lucene.queries.function.FunctionValues$3.matches(FunctionValues.java:188) > at > org.apache.lucene.queries.function.ValueSourceScorer$1.matches(ValueSourceScorer.java:53) > at > org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator.doNext(TwoPhaseIterator.java:89) > at > org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator.nextDoc(TwoPhaseIterator.java:77) > at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:261) > at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:214) > at 
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:652) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443) > at org.apache.solr.search.DocSetUtil.createDocSetGeneric(DocSetUtil.java:151) > at org.apache.solr.search.DocSetUtil.createDocSet(DocSetUtil.java:140) > at > org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1177) > at > org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:817) > at > org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1025) > at > org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1540) > at > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1420) > at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:567) > at > org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1434) > {noformat} > Sadly, I can't understand the logic of this code well enough to
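One way to read the comment and the stack trace above together is sketched below. This is a hypothetical, Lucene-free illustration, not the actual {{FunctionValues}} or {{VectorValueSource}} code: it assumes a base type whose {{floatVal}} throws {{UnsupportedOperationException}} unless overridden, a single-valued source that overrides it, and a pair of sources that does not. A range check in the style of {{frange}} then fails as soon as it asks the pair for a single float. All class and method names here are assumptions.

```java
public class FrangeSketch {

    /** Hypothetical base type: floatVal is unsupported unless a subclass overrides it. */
    public static class SketchValues {
        public float floatVal(int doc) {
            throw new UnsupportedOperationException();
        }
    }

    /** A single numeric source: one float per document. */
    public static class SingleValues extends SketchValues {
        private final float[] perDoc;

        public SingleValues(float... perDoc) {
            this.perDoc = perDoc;
        }

        @Override
        public float floatVal(int doc) {
            return perDoc[doc];
        }
    }

    /** Two sources bundled together; there is no single float to return, so floatVal is inherited. */
    public static class PairValues extends SketchValues {
        public final SketchValues first;
        public final SketchValues second;

        public PairValues(SketchValues first, SketchValues second) {
            this.first = first;
            this.second = second;
        }
    }

    /** Range check in the style of a function range filter: needs exactly one float per document. */
    public static boolean inRange(SketchValues values, int doc, float lower, float upper) {
        float v = values.floatVal(doc);
        return v >= lower && v <= upper;
    }

    public static void main(String[] args) {
        SketchValues single = new SingleValues(50f);
        System.out.println(inRange(single, 0, 10f, 100f)); // true

        SketchValues pair = new PairValues(new SingleValues(50f), new SingleValues(20f));
        try {
            inRange(pair, 0, 10f, 100f);
        } catch (UnsupportedOperationException e) {
            System.out.println("range check on a pair of sources is unsupported");
        }
    }
}
```

Under these assumptions the failure is a missing validation step rather than a scoring problem: the range check could reject multi-valued input up front with a clearer error instead of surfacing the exception from deep inside iteration.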
[jira] [Comment Edited] (LUCENE-9269) Blended queries with boolean rewrite can result in inconstitent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055927#comment-17055927 ] Michele Palmia edited comment on LUCENE-9269 at 3/10/20, 1:15 PM: -- I was actually just looking at a [user report|https://mail-archives.apache.org/mod_mbox/lucene-dev/202003.mbox/%3CCALyzSEn%2BQFoT3MpNYkxw-dEK9jc59mSTvXqccuUVMMDAgOMMmA%40mail.gmail.com%3E] that came to lucene-dev and looked interesting. In their use case, they were using fuzzy queries, that in turn generate blended queries that are affected by this issue. Maybe users of BlendedQuery/FuzzyQuery should be able to find some form of warning in the docs (while [LUCENE-8840|https://issues.apache.org/jira/browse/LUCENE-8840] is not fixed)? was (Author: micpalmia): I was actually just looking at a [user report|https://mail-archives.apache.org/mod_mbox/lucene-dev/202003.mbox/%3CCALyzSEn%2BQFoT3MpNYkxw-dEK9jc59mSTvXqccuUVMMDAgOMMmA%40mail.gmail.com%3E] that came to lucene-dev and looked interesting. In their use case, they were using fuzzy queries, that in turn generate blended queries that are affected by this issue. Maybe users of BlendedQuery/FuzzyQuery should be able to find some form of warning in the docs? 
> Blended queries with boolean rewrite can result in inconstitent scores > -- > > Key: LUCENE-9269 > URL: https://issues.apache.org/jira/browse/LUCENE-9269 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.4 >Reporter: Michele Palmia >Priority: Minor > Attachments: LUCENE-9269-test.patch > > > If two blended queries are should clauses of a boolean query and are built so > that > * some of their terms are the same > * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE > the docFreq for the overlapping terms used for scoring is picked as follow: > # if the overlapping terms are not boosted, the df of the term in the first > blended query is used > # if any of the overlapping terms is boosted, the df is picked at (what > looks like) random. > A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3). > {code:java} > a) > Blended(f:a f:b) Blended (f:a) > df: 3 df: 2 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 3 df:2 > b) > Blended(f:a) Blended(f:a f:b) > df: 2df: 3 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 2 df:2 > c) > Blended(f:a f:b^0.66) Blended (f:a^0.75) > df: 3 df: 2 > gets rewritten to: > (f:a)^1.75 (f:b)^0.66 > df:? df:2 > {code} > with ? either 2 or 3, depending on the run. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9269) Blended queries with boolean rewrite can result in inconstitent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055927#comment-17055927 ] Michele Palmia edited comment on LUCENE-9269 at 3/10/20, 1:07 PM: -- I was actually just looking at a [user report|https://mail-archives.apache.org/mod_mbox/lucene-dev/202003.mbox/%3CCALyzSEn%2BQFoT3MpNYkxw-dEK9jc59mSTvXqccuUVMMDAgOMMmA%40mail.gmail.com%3E] that came to lucene-dev and looked interesting. In their use case, they were using fuzzy queries, that in turn generate blended queries that are affected by this issue. Maybe users of BlendedQuery/FuzzyQuery should be able to find some form of warning in the docs? was (Author: micpalmia): I was actually just looking at a [user report|https://mail-archives.apache.org/mod_mbox/lucene-dev/202003.mbox/browser] that came to lucene-dev and looked interesting. In their use case, they were using fuzzy queries, that in turn generate blended queries that are affected by this issue. Maybe users of BlendedQuery/FuzzyQuery should be able to find some form of warning in the docs? > Blended queries with boolean rewrite can result in inconstitent scores > -- > > Key: LUCENE-9269 > URL: https://issues.apache.org/jira/browse/LUCENE-9269 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.4 >Reporter: Michele Palmia >Priority: Minor > Attachments: LUCENE-9269-test.patch > > > If two blended queries are should clauses of a boolean query and are built so > that > * some of their terms are the same > * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE > the docFreq for the overlapping terms used for scoring is picked as follow: > # if the overlapping terms are not boosted, the df of the term in the first > blended query is used > # if any of the overlapping terms is boosted, the df is picked at (what > looks like) random. > A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3). 
> {code:java} > a) > Blended(f:a f:b) Blended (f:a) > df: 3 df: 2 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 3 df:2 > b) > Blended(f:a) Blended(f:a f:b) > df: 2df: 3 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 2 df:2 > c) > Blended(f:a f:b^0.66) Blended (f:a^0.75) > df: 3 df: 2 > gets rewritten to: > (f:a)^1.75 (f:b)^0.66 > df:? df:2 > {code} > with ? either 2 or 3, depending on the run. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9269) Blended queries with boolean rewrite can result in inconstitent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055927#comment-17055927 ] Michele Palmia edited comment on LUCENE-9269 at 3/10/20, 1:05 PM: -- I was actually just looking at a [user report|https://mail-archives.apache.org/mod_mbox/lucene-dev/202003.mbox/browser] that came to lucene-dev and looked interesting. In their use case, they were using fuzzy queries, that in turn generate blended queries that are affected by this issue. Maybe users of BlendedQuery/FuzzyQuery should be able to find some form of warning in the docs? was (Author: micpalmia): I was actually just looking at a [user report|[https://mail-archives.apache.org/mod_mbox/lucene-dev/202003.mbox/browser]] that came to lucene-dev and looked interesting. In their use case, they were using fuzzy queries, that in turn generate blended queries that are affected by this issue. Maybe users of BlendedQuery/FuzzyQuery should be able to find some form of warning in the docs? > Blended queries with boolean rewrite can result in inconstitent scores > -- > > Key: LUCENE-9269 > URL: https://issues.apache.org/jira/browse/LUCENE-9269 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.4 >Reporter: Michele Palmia >Priority: Minor > Attachments: LUCENE-9269-test.patch > > > If two blended queries are should clauses of a boolean query and are built so > that > * some of their terms are the same > * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE > the docFreq for the overlapping terms used for scoring is picked as follow: > # if the overlapping terms are not boosted, the df of the term in the first > blended query is used > # if any of the overlapping terms is boosted, the df is picked at (what > looks like) random. > A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3). 
> {code:java} > a) > Blended(f:a f:b) Blended (f:a) > df: 3 df: 2 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 3 df:2 > b) > Blended(f:a) Blended(f:a f:b) > df: 2df: 3 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 2 df:2 > c) > Blended(f:a f:b^0.66) Blended (f:a^0.75) > df: 3 df: 2 > gets rewritten to: > (f:a)^1.75 (f:b)^0.66 > df:? df:2 > {code} > with ? either 2 or 3, depending on the run. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9269) Blended queries with boolean rewrite can result in inconstitent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055927#comment-17055927 ] Michele Palmia commented on LUCENE-9269: I was actually just looking at a [user report|[https://mail-archives.apache.org/mod_mbox/lucene-dev/202003.mbox/browser]] that came to lucene-dev and looked interesting. In their use case, they were using fuzzy queries, that in turn generate blended queries that are affected by this issue. Maybe users of BlendedQuery/FuzzyQuery should be able to find some form of warning in the docs? > Blended queries with boolean rewrite can result in inconstitent scores > -- > > Key: LUCENE-9269 > URL: https://issues.apache.org/jira/browse/LUCENE-9269 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.4 >Reporter: Michele Palmia >Priority: Minor > Attachments: LUCENE-9269-test.patch > > > If two blended queries are should clauses of a boolean query and are built so > that > * some of their terms are the same > * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE > the docFreq for the overlapping terms used for scoring is picked as follow: > # if the overlapping terms are not boosted, the df of the term in the first > blended query is used > # if any of the overlapping terms is boosted, the df is picked at (what > looks like) random. > A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3). > {code:java} > a) > Blended(f:a f:b) Blended (f:a) > df: 3 df: 2 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 3 df:2 > b) > Blended(f:a) Blended(f:a f:b) > df: 2df: 3 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 2 df:2 > c) > Blended(f:a f:b^0.66) Blended (f:a^0.75) > df: 3 df: 2 > gets rewritten to: > (f:a)^1.75 (f:b)^0.66 > df:? df:2 > {code} > with ? either 2 or 3, depending on the run. 
[jira] [Comment Edited] (LUCENE-9269) Blended queries with boolean rewrite can result in inconstitent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055891#comment-17055891 ]

Michele Palmia edited comment on LUCENE-9269 at 3/10/20, 12:57 PM:

I added a very simple test (with my very limited Lucene testing skills) that emulates example c) above and checks the score of the top document. As there is no "right" score, I just check for one of the two possible scores and have the test fail on the other. I'm having a hard time wrapping my head around what the right behavior should be in this case (and thus coming up with a more sensible test and fix).

In case that's useful, I should probably add that the randomness in the scoring behavior is due to the {{HashMap}} underlying {{MultiSet}}: when should clauses are processed for deduplication, they're served in an arbitrary order (see [BooleanQuery.java:370|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java#L370]).

was (Author: micpalmia):
I added a very simple test (with my very limited Lucene testing skills) that simply emulates example c) above and checks for the score of the top document. As there is no "right" score, I just check for one of the two possible scores and have the test fail on the other. I'm having a hard time wrapping my head around what the right behavior should be in this case (and thus coming up with a more sensible test and fix).
> Blended queries with boolean rewrite can result in inconstitent scores > -- > > Key: LUCENE-9269 > URL: https://issues.apache.org/jira/browse/LUCENE-9269 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.4 >Reporter: Michele Palmia >Priority: Minor > Attachments: LUCENE-9269-test.patch > > > If two blended queries are should clauses of a boolean query and are built so > that > * some of their terms are the same > * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE > the docFreq for the overlapping terms used for scoring is picked as follow: > # if the overlapping terms are not boosted, the df of the term in the first > blended query is used > # if any of the overlapping terms is boosted, the df is picked at (what > looks like) random. > A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3). > {code:java} > a) > Blended(f:a f:b) Blended (f:a) > df: 3 df: 2 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 3 df:2 > b) > Blended(f:a) Blended(f:a f:b) > df: 2df: 3 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 2 df:2 > c) > Blended(f:a f:b^0.66) Blended (f:a^0.75) > df: 3 df: 2 > gets rewritten to: > (f:a)^1.75 (f:b)^0.66 > df:? df:2 > {code} > with ? either 2 or 3, depending on the run. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
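The order dependence described in the comment above can be reproduced in a few lines. The sketch below is hypothetical and does not mirror {{BooleanQuery}}: it assumes that duplicate clauses are merged by summing their boosts and that the merged clause keeps the docFreq of whichever duplicate was visited first. That "first seen wins" policy is an assumption chosen for illustration. To keep the outcome reproducible, the visiting order is passed in explicitly rather than taken from a {{HashMap}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupOrderSketch {

    /** Hypothetical clause: a term, its boost, and the docFreq it would be scored with. */
    public static final class Clause {
        public final String term;
        public final float boost;
        public final int docFreq;

        public Clause(String term, float boost, int docFreq) {
            this.term = term;
            this.boost = boost;
            this.docFreq = docFreq;
        }

        @Override
        public String toString() {
            return "(" + term + ")^" + boost + " df: " + docFreq;
        }
    }

    /**
     * Merges clauses on the same term by summing boosts.
     * Assumed policy: the merged clause keeps the docFreq of the first clause visited.
     */
    public static List<Clause> dedup(List<Clause> visitOrder) {
        Map<String, Clause> merged = new LinkedHashMap<>();
        for (Clause c : visitOrder) {
            merged.merge(c.term, c,
                (kept, dup) -> new Clause(kept.term, kept.boost + dup.boost, kept.docFreq));
        }
        return new ArrayList<>(merged.values());
    }

    public static void main(String[] args) {
        // The overlapping term from example c): f:a from the first blended query (df: 3)
        // and f:a^0.75 from the second one (df: 2).
        Clause fromFirst = new Clause("f:a", 1.0f, 3);
        Clause fromSecond = new Clause("f:a", 0.75f, 2);

        System.out.println(dedup(Arrays.asList(fromFirst, fromSecond))); // [(f:a)^1.75 df: 3]
        System.out.println(dedup(Arrays.asList(fromSecond, fromFirst))); // [(f:a)^1.75 df: 2]
    }
}
```

Both visiting orders produce the same summed boost of 1.75, but a different docFreq, which matches the "df:?" in example c). Any fix would presumably need either a deterministic visiting order or a merge rule that does not depend on the order at all.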
[jira] [Commented] (LUCENE-9269) Blended queries with boolean rewrite can result in inconstitent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055891#comment-17055891 ] Michele Palmia commented on LUCENE-9269: I added a very simple test (with my very limited Lucene testing skills) that simply emulates example c) above and checks for the score of the top document. As there is no "right" score, I just check for one of the two possible scores and have the test fail on the other. I'm having a hard time wrapping my head around what the right behavior should be in this case (and thus coming up with a more sensible test and fix). > Blended queries with boolean rewrite can result in inconstitent scores > -- > > Key: LUCENE-9269 > URL: https://issues.apache.org/jira/browse/LUCENE-9269 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.4 >Reporter: Michele Palmia >Priority: Minor > Attachments: LUCENE-9269-test.patch > > > If two blended queries are should clauses of a boolean query and are built so > that > * some of their terms are the same > * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE > the docFreq for the overlapping terms used for scoring is picked as follow: > # if the overlapping terms are not boosted, the df of the term in the first > blended query is used > # if any of the overlapping terms is boosted, the df is picked at (what > looks like) random. > A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3). > {code:java} > a) > Blended(f:a f:b) Blended (f:a) > df: 3 df: 2 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 3 df:2 > b) > Blended(f:a) Blended(f:a f:b) > df: 2df: 3 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 2 df:2 > c) > Blended(f:a f:b^0.66) Blended (f:a^0.75) > df: 3 df: 2 > gets rewritten to: > (f:a)^1.75 (f:b)^0.66 > df:? df:2 > {code} > with ? either 2 or 3, depending on the run. 
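To make the "two possible scores" concrete, here is a minimal sketch of how the document frequency feeds into a BM25-style idf. It assumes the default BM25 similarity is in use and a hypothetical collection of 3 documents (the issue does not state the index size); `IdfSketch` and its method are illustrative names, not part of Lucene.

```java
/** Illustration only: BM25-style idf as a function of docFreq. */
final class IdfSketch {

    /**
     * idf = ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)).
     * docCount is an assumed collection size, not a figure from the issue.
     */
    static double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        long docCount = 3; // hypothetical: the smallest index that allows df = 3
        // The same term scored with either df value gets a different idf,
        // so the top document's score differs from run to run.
        System.out.println("idf with df=2: " + idf(2, docCount));
        System.out.println("idf with df=3: " + idf(3, docCount));
    }
}
```

Since the two df values yield different idf values, a test that pins one of the two scores will fail whenever the other df is picked, which is how the attached test exposes the inconsistency.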
[jira] [Updated] (LUCENE-9269) Blended queries with boolean rewrite can result in inconstitent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michele Palmia updated LUCENE-9269: --- Description: If two blended queries are should clauses of a boolean query and are built so that * some of their terms are the same * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE the docFreq for the overlapping terms used for scoring is picked as follow: * if the overlapping terms are not boosted, the df of the term in the first blended query is used * if any of the overlapping terms is boosted, the df is picked at (what looks like) random. A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3). {code:java} 1. Blended(f:a f:b) Blended (f:a) df: 3 df: 2 gets rewritten to: (f:a)^2.0 (f:b) df: 3 df:2 Blended(f:a) Blended(f:a f:b) df: 2df: 3 gets rewritten to: (f:a)^2.0 (f:b) df: 2 df:2 Blended(f:a f:b^0.66) Blended (f:a^0.75) df: 3 df: 2 gets rewritten to: (f:a)^1.75 (f:b)^0.66 df:? df:2 {code} with ? either 2 or 3, depending on the run. was: If two blended queries are built so that * some of their terms are the same * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE the docFreq for the overlapping terms used for scoring is picked as follow: * if the overlapping terms are not boosted, the df of the term in the first blended query is used * if any of the overlapping terms is boosted, the df is picked at (what looks like) random. A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3). {code:java} 1. Blended(f:a f:b) Blended (f:a) df: 3 df: 2 gets rewritten to: (f:a)^2.0 (f:b) df: 3 df:2 Blended(f:a) Blended(f:a f:b) df: 2df: 3 gets rewritten to: (f:a)^2.0 (f:b) df: 2 df:2 Blended(f:a f:b^0.66) Blended (f:a^0.75) df: 3 df: 2 gets rewritten to: (f:a)^1.75 (f:b)^0.66 df:? df:2 {code} with ? either 2 or 3, depending on the run. 
[jira] [Updated] (LUCENE-9269) Blended queries with boolean rewrite can result in inconstitent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michele Palmia updated LUCENE-9269:
---
Description:
If two blended queries are should clauses of a boolean query and are built so that
* some of their terms are the same
* their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE

the docFreq for the overlapping terms used for scoring is picked as follows:
# if the overlapping terms are not boosted, the df of the term in the first blended query is used
# if any of the overlapping terms is boosted, the df is picked at (what looks like) random.

A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
{code:java}
a)
Blended(f:a f:b) Blended (f:a)
df: 3 df: 2
gets rewritten to:
(f:a)^2.0 (f:b)
df: 3 df:2

b)
Blended(f:a) Blended(f:a f:b)
df: 2 df: 3
gets rewritten to:
(f:a)^2.0 (f:b)
df: 2 df:2

c)
Blended(f:a f:b^0.66) Blended (f:a^0.75)
df: 3 df: 2
gets rewritten to:
(f:a)^1.75 (f:b)^0.66
df:? df:2
{code}
with ? either 2 or 3, depending on the run.

was:
If two blended queries are should clauses of a boolean query and are built so that
* some of their terms are the same
* their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE

the docFreq for the overlapping terms used for scoring is picked as follows:
* if the overlapping terms are not boosted, the df of the term in the first blended query is used
* if any of the overlapping terms is boosted, the df is picked at (what looks like) random.

A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
{code:java}
1.
Blended(f:a f:b) Blended (f:a)
df: 3 df: 2
gets rewritten to:
(f:a)^2.0 (f:b)
df: 3 df:2

Blended(f:a) Blended(f:a f:b)
df: 2 df: 3
gets rewritten to:
(f:a)^2.0 (f:b)
df: 2 df:2

Blended(f:a f:b^0.66) Blended (f:a^0.75)
df: 3 df: 2
gets rewritten to:
(f:a)^1.75 (f:b)^0.66
df:? df:2
{code}
with ? either 2 or 3, depending on the run.
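The "first blended query wins" behaviour in the unboosted case can be reproduced with a toy model. This is only a sketch under two assumptions: that equal should clauses are merged through a map keyed on query equality, and that equality ignores the statistics a query carries (which is what the proposal to add {{perReaderTermState}} to {{TermQuery#equals}} suggests). `ToyTermQuery` and `DedupSketch` are hypothetical stand-ins, not Lucene classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy stand-in for a term query; NOT Lucene's TermQuery. */
final class ToyTermQuery {
    final String term;
    final int docFreq; // statistics carried along with the query

    ToyTermQuery(String term, int docFreq) {
        this.term = term;
        this.docFreq = docFreq;
    }

    // Equality deliberately ignores docFreq (assumption of this sketch).
    @Override
    public boolean equals(Object other) {
        return other instanceof ToyTermQuery && ((ToyTermQuery) other).term.equals(term);
    }

    @Override
    public int hashCode() {
        return term.hashCode();
    }
}

final class DedupSketch {
    /** Merges equal clauses by summing boosts; the first-seen instance stays as the key. */
    static Map<ToyTermQuery, Float> merge(ToyTermQuery... clauses) {
        Map<ToyTermQuery, Float> merged = new LinkedHashMap<>();
        for (ToyTermQuery clause : clauses) {
            merged.merge(clause, 1.0f, Float::sum);
        }
        return merged;
    }

    /** Returns the docFreq carried by the surviving clause for the given term. */
    static int docFreqUsed(Map<ToyTermQuery, Float> merged, String term) {
        for (ToyTermQuery key : merged.keySet()) {
            if (key.term.equals(term)) {
                return key.docFreq;
            }
        }
        throw new IllegalArgumentException("no clause for term " + term);
    }
}
```

Merging `f:a` carrying df 3 followed by `f:a` carrying df 2 leaves a single clause with boost 2.0 and df 3; swapping the order leaves df 2, matching examples a) and b) for the overlapping term. The sketch says nothing about the boosted case c).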
[jira] [Commented] (LUCENE-9258) DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
[ https://issues.apache.org/jira/browse/LUCENE-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054892#comment-17054892 ]

Michele Palmia commented on LUCENE-9258:

I found this out while playing with function range queries on Solr (I was trying to figure out [LUCENE-8674|https://issues.apache.org/jira/browse/LUCENE-8674]). Trying to perform a range query on a field indexed as a String (with multi-valued set to true, which I believe is the default) failed with a NullPointerException. The way I see it, this fixes range scoring (and thus Solr range queries) on multi-valued string fields.

> DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
> --
>
> Key: LUCENE-9258
> URL: https://issues.apache.org/jira/browse/LUCENE-9258
> Project: Lucene - Core
> Issue Type: Bug
> Affects Versions: 7.7.2, 8.4
> Reporter: Michele Palmia
> Assignee: David Smiley
> Priority: Minor
> Attachments: LUCENE-9258.patch
>
>
> When requesting a new _ValueSourceScorer_ (with _getRangeScorer_) from
> _DocTermsIndexDocValues_, the latter instantiates a new iterator on
> _SortedDocValues_ regardless of the fact that the underlying field can
> actually be of a different type (e.g. a _SortedSetDocValues_ processed
> through a _SortedSetSelector_).
[jira] [Updated] (LUCENE-8103) QueryValueSource should use TwoPhaseIterator
[ https://issues.apache.org/jira/browse/LUCENE-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michele Palmia updated LUCENE-8103: --- Attachment: LUCENE-8103.patch > QueryValueSource should use TwoPhaseIterator > > > Key: LUCENE-8103 > URL: https://issues.apache.org/jira/browse/LUCENE-8103 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/other >Reporter: David Smiley >Priority: Minor > Attachments: LUCENE-8103.patch > > > QueryValueSource (in "queries" module) is a ValueSource representation of a > Query; the score is the value. It ought to try to use a TwoPhaseIterator > from the query if it can be offered. This will prevent possibly expensive > advancing beyond documents that we aren't interested in. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9258) DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
[ https://issues.apache.org/jira/browse/LUCENE-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michele Palmia updated LUCENE-9258: --- Affects Version/s: 7.7.2 > DocTermsIndexDocValues should not assume it's operating on a SortedDocValues > field > -- > > Key: LUCENE-9258 > URL: https://issues.apache.org/jira/browse/LUCENE-9258 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 7.7.2, 8.4 >Reporter: Michele Palmia >Priority: Minor > Attachments: LUCENE-9258.patch > > > When requesting a new _ValueSourceScorer_ (with _getRangeScorer_) from > _DocTermsIndexDocValues_ , the latter instantiates a new iterator on > _SortedDocValues_ regardless of the fact that the underlying field can > actually be of a different type (e.g. a _SortedSetDocValues_ processed > through a _SortedSetSelector_). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8849) DocValuesRewriteMethod.visit should visit the MTQ
[ https://issues.apache.org/jira/browse/LUCENE-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17050461#comment-17050461 ] Michele Palmia commented on LUCENE-8849: No use case, I'm studying Lucene, found the issue and used it to learn how the query visiting system works. Failed at learning how they're normally tested though! :) > DocValuesRewriteMethod.visit should visit the MTQ > - > > Key: LUCENE-8849 > URL: https://issues.apache.org/jira/browse/LUCENE-8849 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Reporter: David Smiley >Priority: Minor > Attachments: LUCENE-8849.patch > > > The DocValuesRewriteMethod implements the QueryVisitor API (visit method) in > a way that surprises me. It does not visit the wrapped MTQ query. Shouldn't > it? Here is what I think it should do, similar to other query wrappers: > {code:java} > @Override > public void visit(QueryVisitor visitor) { > query.visit(visitor.getSubVisitor(BooleanClause.Occur.MUST, this)); > } > {code} > CC [~romseygeek] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8849) DocValuesRewriteMethod.visit should visit the MTQ
[ https://issues.apache.org/jira/browse/LUCENE-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17050460#comment-17050460 ]

Michele Palmia commented on LUCENE-8849:

I added a silly one-line patch to fix this. I chose FILTER instead of MUST, as the rewritten query is really a filter. I tried to add a test but failed badly: _TestBooleanQuery_, for instance, has a _testQueryVisitor()_ method that only tests the correctness of the visit, not whether the visit actually takes place at all. I would have tested this by mocking the _QueryVisitor_, but from what I could gather _Mockito_ is not available in core. Any suggestion on how to test this would be a great help!
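One way to check that a visit takes place without a mocking library is a hand-written recording visitor. The sketch below uses deliberately simplified, hypothetical types (`ToyVisitor`, `ToyQuery` and friends) rather than Lucene's _QueryVisitor_ API; in a real test the recorder would presumably be a small _QueryVisitor_ subclass instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-ins for a visitor API; NOT Lucene's QueryVisitor/Query. */
interface ToyVisitor {
    void visitLeaf(ToyQuery query);
}

interface ToyQuery {
    void visit(ToyVisitor visitor);
}

final class ToyLeafQuery implements ToyQuery {
    @Override
    public void visit(ToyVisitor visitor) {
        visitor.visitLeaf(this);
    }
}

/** Wrapper that forwards the visit to the wrapped query (the desired behaviour). */
final class ForwardingWrapper implements ToyQuery {
    private final ToyQuery inner;

    ForwardingWrapper(ToyQuery inner) {
        this.inner = inner;
    }

    @Override
    public void visit(ToyVisitor visitor) {
        inner.visit(visitor);
    }
}

/** Wrapper that never visits the wrapped query (models the reported behaviour). */
final class SilentWrapper implements ToyQuery {
    private final ToyQuery inner;

    SilentWrapper(ToyQuery inner) {
        this.inner = inner;
    }

    @Override
    public void visit(ToyVisitor visitor) {
        // intentionally empty: the wrapped query is never reached
    }
}

/** Records every leaf it is shown, so a test can assert that a visit happened. */
final class RecordingVisitor implements ToyVisitor {
    final List<ToyQuery> visited = new ArrayList<>();

    @Override
    public void visitLeaf(ToyQuery query) {
        visited.add(query);
    }
}
```

A test then asserts on `visited`: it is empty for the silent wrapper and contains the leaf for the forwarding one, which distinguishes "visited correctly" from "not visited at all".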
[jira] [Updated] (LUCENE-8849) DocValuesRewriteMethod.visit should visit the MTQ
[ https://issues.apache.org/jira/browse/LUCENE-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michele Palmia updated LUCENE-8849: --- Attachment: LUCENE-8849.patch
[jira] [Updated] (LUCENE-9258) DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
[ https://issues.apache.org/jira/browse/LUCENE-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michele Palmia updated LUCENE-9258: --- Status: Patch Available (was: Open) > DocTermsIndexDocValues should not assume it's operating on a SortedDocValues > field > -- > > Key: LUCENE-9258 > URL: https://issues.apache.org/jira/browse/LUCENE-9258 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 8.4 >Reporter: Michele Palmia >Priority: Minor > Attachments: LUCENE-9258.patch > > > When requesting a new _ValueSourceScorer_ (with _getRangeScorer_) from > _DocTermsIndexDocValues_ , the latter instantiates a new iterator on > _SortedDocValues_ regardless of the fact that the underlying field can > actually be of a different type (e.g. a _SortedSetDocValues_ processed > through a _SortedSetSelector_). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9258) DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
[ https://issues.apache.org/jira/browse/LUCENE-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049738#comment-17049738 ] Michele Palmia commented on LUCENE-9258: I added a patch with the fix together with a(n addition to a) test that fails with the current implementation. Any advice on improving the testing would be greatly appreciated (is it ok to test the Scorer independently? Should I mock the Weight?).
[jira] [Updated] (LUCENE-9258) DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
[ https://issues.apache.org/jira/browse/LUCENE-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michele Palmia updated LUCENE-9258: --- Attachment: LUCENE-9258.patch Lucene Fields: New,Patch Available (was: New) Review Patch?: Yes
[jira] [Created] (LUCENE-9258) DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field
Michele Palmia created LUCENE-9258: -- Summary: DocTermsIndexDocValues should not assume it's operating on a SortedDocValues field Key: LUCENE-9258 URL: https://issues.apache.org/jira/browse/LUCENE-9258 Project: Lucene - Core Issue Type: Bug Affects Versions: 8.4 Reporter: Michele Palmia When requesting a new _ValueSourceScorer_ (with _getRangeScorer_) from _DocTermsIndexDocValues_ , the latter instantiates a new iterator on _SortedDocValues_ regardless of the fact that the underlying field can actually be of a different type (e.g. a _SortedSetDocValues_ processed through a _SortedSetSelector_). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-8674) UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal
[ https://issues.apache.org/jira/browse/LUCENE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049681#comment-17049681 ]

Michele Palmia edited comment on LUCENE-8674 at 3/2/20 9:49 PM:

This is due to a _VectorValueSource_ being fed to a _FunctionRangeQuery_, which therefore tries to use its _floatVal_. By default, requesting the _floatVal(int doc)_ of a _VectorValueSource_ throws an _UnsupportedOperationException_, since no algorithm for merging the (possibly multiple) values is implemented.

For reference, the query Solr tries to run is the following:
{code:java}
new ConstantScoreQuery(
    new FunctionRangeQuery(
        new VectorValueSource(
            new BytesRefFieldSource("any_field"),
            new SortedSetFieldSource("another_field")
        ),
        0, 100, true, true));
{code}
It always throws an exception if there are documents in the index.

From the way it's implemented (with the _UnsupportedOperationException_), it doesn't look like this kind of inconsistency is meant to be fixed in Lucene, but I'm not sure about that. Any suggestions are appreciated!

was (Author: micpalmia):
This is due to a _VectorValueSource_ being fed to a _FunctionRangeQuery_, that is therefore trying to use its _floatVal_. By default, requesting the _floatVal(int doc)_ of a _VectorValueSource_ throws an _UnsupportedOperationException_, since no algorithm for merging the (possibly multiple) values is implemented.

For reference, the query Solr tries to do is the following,
{code:java}
final ConstantScoreQuery query = new ConstantScoreQuery(
    new FunctionRangeQuery(
        new VectorValueSource(
            new BytesRefFieldSource("any_field"),
            new SortedSetFieldSource("another_field")
        ),
        0, 100, true, true));
{code}
that always throws an exception if there are documents in the index.

From the way it's implemented (with the _UnsupportedOperationException_) it doesn't look like this kind of inconsistencies are meant to be fixed in Lucene. But not sure about that. Any suggestions are appreciated!
> UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal > -- > > Key: LUCENE-8674 > URL: https://issues.apache.org/jira/browse/LUCENE-8674 > Project: Lucene - Core > Issue Type: Bug > Components: core/query/scoring >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. Building the collection and reproducing the bug > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > curl -v “URL_BUG” > {noformat} > Please check the issue description below to find the “URL_BUG” that will > allow you to reproduce the issue reported. 
>Reporter: Johannes Kloos >Priority: Minor > Labels: diffblue, newdev > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fq={!frange%20l=10%20u=100}or_version_s,directed_by > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.UnsupportedOperationException > at > org.apache.lucene.queries.function.FunctionValues.floatVal(FunctionValues.java:47) > at >
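The failure mode described in the comment (a base class whose _floatVal_ throws by default, and a vector-like subclass that does not override it) can be modelled in a few lines. This is a toy sketch with hypothetical classes (`ToyFunctionValues`, `ToyVectorValues`, `RangeSketch`); it mirrors the shape of the stack trace but is not Lucene's implementation.

```java
/** Toy base class: float access is unsupported unless a subclass provides it. */
class ToyFunctionValues {
    float floatVal(int doc) {
        throw new UnsupportedOperationException();
    }
}

/** Single-valued source: overrides floatVal, so range checks work. */
final class ToyFloatValues extends ToyFunctionValues {
    private final float[] values;

    ToyFloatValues(float[] values) {
        this.values = values;
    }

    @Override
    float floatVal(int doc) {
        return values[doc];
    }
}

/** Multi-valued ("vector") source: no single float per document, so no override. */
final class ToyVectorValues extends ToyFunctionValues {
}

final class RangeSketch {
    /** Inclusive range check that relies on floatVal, as a range scorer would. */
    static boolean matches(ToyFunctionValues values, int doc, float lower, float upper) {
        float value = values.floatVal(doc); // throws for ToyVectorValues
        return value >= lower && value <= upper;
    }
}
```

In this model the exception only surfaces when a document is actually checked, which is consistent with the observation that the query fails only if there are documents in the index.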
[jira] [Comment Edited] (LUCENE-8674) UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal
[ https://issues.apache.org/jira/browse/LUCENE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049681#comment-17049681 ] Michele Palmia edited comment on LUCENE-8674 at 3/2/20 9:47 PM: This is due to a _VectorValueSource_ being fed to a _FunctionRangeQuery_, that is therefore trying to use its _floatVal_. By default, requesting the _floatVal(int doc)_ of a _VectorValueSource_ throws an _UnsupportedOperationException_, since no algorithm for merging the (possibly multiple) values is implemented. For reference, the query Solr tries to do is the following, {code:java} final ConstantScoreQuery query = new ConstantScoreQuery( new FunctionRangeQuery( new VectorValueSource( new BytesRefFieldSource("any_field"), new SortedSetFieldSource("another_field") ), 0, 100, true, true)); {code} that always throws an exception if there are documents in the index. >From the way it's implemented (with the _UnsupportedOperationException_) it >doesn't look like this kind of inconsistencies are meant to be fixed in >Lucene. But not sure about that. Any suggestions are appreciated! was (Author: micpalmia): This is due to a `VectorValueSource` being fed to a `FunctionRangeQuery`, that is therefore trying to use its `floatVal`. By default, requesting the `floatVal(int doc)` of a `VectorValueSource` throws an `UnsupportedOperationException`, since no algorithm for merging the (possibly multiple) values is implemented. For reference, the query Solr tries to do is the following, ```java final ConstantScoreQuery query = new ConstantScoreQuery( new FunctionRangeQuery( new VectorValueSource( new BytesRefFieldSource("any_field"), new SortedSetFieldSource("another_field") ), 0, 100, true, true)); ``` that always throws an exception if there are documents in the index. >From the way it's implemented (with the `UnsupportedOperationException`) it >doesn't look like this kind of inconsistencies are meant to be fixed in >Lucene. But not sure about that. 
Any suggestions are appreciated! > UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal > -- > > Key: LUCENE-8674 > URL: https://issues.apache.org/jira/browse/LUCENE-8674 > Project: Lucene - Core > Issue Type: Bug > Components: core/query/scoring >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. Building the collection and reproducing the bug > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > curl -v “URL_BUG” > {noformat} > Please check the issue description below to find the “URL_BUG” that will > allow you to reproduce the issue reported. 
>Reporter: Johannes Kloos >Priority: Minor > Labels: diffblue, newdev > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fq={!frange%20l=10%20u=100}or_version_s,directed_by > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.UnsupportedOperationException > at >
[jira] [Commented] (LUCENE-8674) UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal
[ https://issues.apache.org/jira/browse/LUCENE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049681#comment-17049681 ] Michele Palmia commented on LUCENE-8674: This is due to a `VectorValueSource` being fed to a `FunctionRangeQuery`, that is therefore trying to use its `floatVal`. By default, requesting the `floatVal(int doc)` of a `VectorValueSource` throws an `UnsupportedOperationException`, since no algorithm for merging the (possibly multiple) values is implemented. For reference, the query Solr tries to do is the following, ```java final ConstantScoreQuery query = new ConstantScoreQuery( new FunctionRangeQuery( new VectorValueSource( new BytesRefFieldSource("any_field"), new SortedSetFieldSource("another_field") ), 0, 100, true, true)); ``` that always throws an exception if there are documents in the index. >From the way it's implemented (with the `UnsupportedOperationException`) it >doesn't look like this kind of inconsistencies are meant to be fixed in >Lucene. But not sure about that. Any suggestions are appreciated! > UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal > -- > > Key: LUCENE-8674 > URL: https://issues.apache.org/jira/browse/LUCENE-8674 > Project: Lucene - Core > Issue Type: Bug > Components: core/query/scoring >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. 
Building the collection and reproducing the bug > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > curl -v “URL_BUG” > {noformat} > Please check the issue description below to find the “URL_BUG” that will > allow you to reproduce the issue reported. 
>Reporter: Johannes Kloos >Priority: Minor > Labels: diffblue, newdev > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fq={!frange%20l=10%20u=100}or_version_s,directed_by > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.UnsupportedOperationException > at > org.apache.lucene.queries.function.FunctionValues.floatVal(FunctionValues.java:47) > at > org.apache.lucene.queries.function.FunctionValues$3.matches(FunctionValues.java:188) > at > org.apache.lucene.queries.function.ValueSourceScorer$1.matches(ValueSourceScorer.java:53) > at > org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator.doNext(TwoPhaseIterator.java:89) > at > org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator.nextDoc(TwoPhaseIterator.java:77) > at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:261) > at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:214) > at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:652) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443) > at org.apache.solr.search.DocSetUtil.createDocSetGeneric(DocSetUtil.java:151) > at org.apache.solr.search.DocSetUtil.createDocSet(DocSetUtil.java:140) > at > org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1177) > at >