[jira] [Commented] (LUCENE-8860) LatLonShapeBoundingBoxQuery could make more decisions on inner nodes
[ https://issues.apache.org/jira/browse/LUCENE-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914594#comment-16914594 ]

Igor Motov commented on LUCENE-8860:

Thanks! It is clear now. I think I understand the part about bounding box queries, and I opened a PR based on it. Unfortunately, I don't see how to extend this to polygon queries. In fig3, the bounding box of the red query completely encloses the green polygon's bounding box, and yet we cannot draw any conclusion about their intersection from this information alone.

!fig3.png!

> LatLonShapeBoundingBoxQuery could make more decisions on inner nodes
>
> Key: LUCENE-8860
> URL: https://issues.apache.org/jira/browse/LUCENE-8860
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: fig1.png, fig2.png, fig3.png
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently LatLonShapeBoundingBoxQuery with the INTERSECTS relation only
> returns CELL_INSIDE_QUERY if the query contains ALL minimum bounding
> rectangles of the indexed triangles.
> I think we could return CELL_INSIDE_QUERY if the box contains either of the
> edges of all MBRs of indexed triangles since triangles are guaranteed to
> touch all edges of their MBR by definition. In some cases this would help
> save decoding triangles and running costly point-in-triangle computations.

--
This message was sent by Atlassian Jira (v8.3.2#803003)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
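The polygon case can be illustrated with a small, self-contained sketch. It does not use any Lucene code: it relies on the JDK's java.awt.geom classes, and the class name, helper names and coordinates are all made up for illustration. It builds an L-shaped query polygon and a small triangle sitting in the notch of the L, then shows that bounding box containment holds while the shapes themselves do not intersect.

```java
import java.awt.geom.Area;
import java.awt.geom.Path2D;

final class BoundingBoxCounterexample {

    /** Builds a closed polygon from x0, y0, x1, y1, ... (hypothetical helper). */
    static Path2D.Double polygon(double... xy) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(xy[0], xy[1]);
        for (int i = 2; i < xy.length; i += 2) {
            path.lineTo(xy[i], xy[i + 1]);
        }
        path.closePath();
        return path;
    }

    /** True if the bounding box of outer fully contains the bounding box of inner. */
    static boolean bboxContains(Path2D outer, Path2D inner) {
        return outer.getBounds2D().contains(inner.getBounds2D());
    }

    /** True if the two polygons share any interior area. */
    static boolean intersects(Path2D a, Path2D b) {
        Area overlap = new Area(a);
        overlap.intersect(new Area(b));
        return !overlap.isEmpty();
    }

    public static void main(String[] args) {
        // L-shaped query polygon: a horizontal bar plus a vertical bar.
        Path2D.Double query = polygon(0, 0, 10, 0, 10, 2, 2, 2, 2, 10, 0, 10);
        // Triangle placed in the empty notch of the L.
        Path2D.Double inNotch = polygon(5, 5, 8, 5, 5, 8);
        // Triangle overlapping the corner of the L.
        Path2D.Double onCorner = polygon(1, 1, 3, 1, 1, 3);

        System.out.println(bboxContains(query, inNotch));  // box containment holds
        System.out.println(intersects(query, inNotch));    // but the shapes are disjoint
        System.out.println(bboxContains(query, onCorner));
        System.out.println(intersects(query, onCorner));
    }
}
```

Both triangles have bounding boxes inside the query's bounding box, yet only one of them intersects the query, so box containment alone cannot decide the relation.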
[jira] [Updated] (LUCENE-8860) LatLonShapeBoundingBoxQuery could make more decisions on inner nodes
[ https://issues.apache.org/jira/browse/LUCENE-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Motov updated LUCENE-8860:

Attachment: fig3.png
[jira] [Commented] (LUCENE-8860) LatLonShapeBoundingBoxQuery could make more decisions on inner nodes
[ https://issues.apache.org/jira/browse/LUCENE-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16910787#comment-16910787 ]

Igor Motov commented on LUCENE-8860:

I looked into this issue, but I don't think we have enough information in an inner node to make such a determination. For example, suppose I index two polygons: an L-shaped polygon and a small triangle placed inside the L-shape (see the blue and green tessellated versions in fig1):

!fig1.png!

All I have at the inner node level are minPackedValue and maxPackedValue (depicted as purple rectangles in fig2). This doesn't give me enough information to determine whether my query bounding box (the red rectangle in fig2) intersects a blue triangle or not.

!fig2.png!

So, unless I misunderstood the proposal, I am not sure how to achieve that at the inner node level.
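For the bounding-box case, one possible reading of the proposal in the issue description can be sketched as follows. This is not Lucene code and makes several assumptions: that an inner node exposes, for each of the four MBR coordinates (minX, minY, maxX, maxY), the lowest and highest value found among the triangles below it, and that plain doubles stand in for the packed byte values. The class, the Range type and the method name are hypothetical. The idea is that if the query box fully contains, say, the bottom edge of every MBR, then every triangle touches the query, because each triangle touches every edge of its own MBR.

```java
final class InnerNodeEdgeCheck {

    /** Closed interval [lo, hi] of the values one MBR coordinate takes below a node. */
    static final class Range {
        final double lo;
        final double hi;

        Range(double lo, double hi) {
            this.lo = lo;
            this.hi = hi;
        }

        /** True if every value in this range lies inside [min, max]. */
        boolean within(double min, double max) {
            return lo >= min && hi <= max;
        }
    }

    /**
     * Returns true if the query box is guaranteed to contain one whole edge
     * (the same one: bottom, top, left or right) of every triangle MBR below the node.
     */
    static boolean allTrianglesIntersect(Range minX, Range minY, Range maxX, Range maxY,
                                         double qMinX, double qMinY, double qMaxX, double qMaxY) {
        // Horizontal extent of every MBR is inside the query's x range.
        boolean xSpanInside = minX.lo >= qMinX && maxX.hi <= qMaxX;
        // Vertical extent of every MBR is inside the query's y range.
        boolean ySpanInside = minY.lo >= qMinY && maxY.hi <= qMaxY;

        boolean bottomEdges = xSpanInside && minY.within(qMinY, qMaxY);
        boolean topEdges = xSpanInside && maxY.within(qMinY, qMaxY);
        boolean leftEdges = ySpanInside && minX.within(qMinX, qMaxX);
        boolean rightEdges = ySpanInside && maxX.within(qMinX, qMaxX);
        return bottomEdges || topEdges || leftEdges || rightEdges;
    }

    public static void main(String[] args) {
        // MBRs below the node: minX in [2,3], minY in [1,2], maxX in [5,6], maxY in [8,20].
        Range minX = new Range(2, 3);
        Range minY = new Range(1, 2);
        Range maxX = new Range(5, 6);
        Range maxY = new Range(8, 20);

        // The query x:[0,10], y:[0,5] does not contain the MBRs (they reach y=20),
        // but it contains all of their bottom edges.
        System.out.println(allTrianglesIntersect(minX, minY, maxX, maxY, 0, 0, 10, 5));
        // A narrower query x:[0,4] no longer contains any complete edge.
        System.out.println(allTrianglesIntersect(minX, minY, maxX, maxY, 0, 0, 4, 5));
    }
}
```

A check like this only ever answers "all triangles intersect" or "unknown"; when it returns false the traversal would still have to descend into the node.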
[jira] [Updated] (LUCENE-8860) LatLonShapeBoundingBoxQuery could make more decisions on inner nodes
[ https://issues.apache.org/jira/browse/LUCENE-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Motov updated LUCENE-8860:

Attachment: fig1.png
            fig2.png
[jira] [Updated] (LUCENE-8860) LatLonShapeBoundingBoxQuery could make more decisions on inner nodes
[ https://issues.apache.org/jira/browse/LUCENE-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Motov updated LUCENE-8860:

Attachment: (was: fig2.png)
[jira] [Updated] (LUCENE-8860) LatLonShapeBoundingBoxQuery could make more decisions on inner nodes
[ https://issues.apache.org/jira/browse/LUCENE-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Motov updated LUCENE-8860:

Attachment: (was: fig1.png)
[jira] [Updated] (LUCENE-8860) LatLonShapeBoundingBoxQuery could make more decisions on inner nodes
[ https://issues.apache.org/jira/browse/LUCENE-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Motov updated LUCENE-8860:

Attachment: fig1.png
            fig2.png
[jira] [Updated] (LUCENE-8614) ArrayIndexOutOfBoundsException in ByteBlockPool
[ https://issues.apache.org/jira/browse/LUCENE-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Motov updated LUCENE-8614:

Attachment: LUCENE-8614.patch

> ArrayIndexOutOfBoundsException in ByteBlockPool
>
> Key: LUCENE-8614
> URL: https://issues.apache.org/jira/browse/LUCENE-8614
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 7.5
> Reporter: Igor Motov
> Priority: Major
> Attachments: LUCENE-8614.patch
>
> A field with a very large number of small tokens can cause an
> ArrayIndexOutOfBoundsException in ByteBlockPool due to an arithmetic overflow
> in ByteBlockPool.
> The issue was originally reported in
> [https://github.com/elastic/elasticsearch/issues/23670] where, due to the
> indexing settings, the geo_shape generated a very large number of tokens and
> caused the indexing operation to fail with the following exception:
> {noformat}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -65531
>   at org.apache.lucene.util.ByteBlockPool.setBytesRef(ByteBlockPool.java:308) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.util.BytesRefHash.equals(BytesRefHash.java:183) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.util.BytesRefHash.findHash(BytesRefHash.java:337) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:255) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:149) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:766) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:417) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:373) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:231) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:478) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1575) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1320) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
> {noformat}
> I was able to reproduce the issue and somewhat reduce the test that
> reproduces it (see enclosed patch), but unfortunately it still requires 12G of
> heap to run.
> The issue seems to be caused by an arithmetic overflow in the {{byteOffset}}
> calculation when {{ByteBlockPool}} advances to the next buffer on the last
> line of the
> [nextBuffer()|https://github.com/apache/lucene-solr/blob/e386ec973b8a4ec2de2bfc43f51df511a365d60f/lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java#L207]
> method, but it doesn't manifest itself until much later, when this offset is
> used to calculate the
> [bytesStart|https://github.com/apache/lucene-solr/blob/e386ec973b8a4ec2de2bfc43f51df511a365d60f/lucene/core/src/java/org/apache/lucene/util/BytesRefHash.java#L277]
> in {{BytesRefHash}}, which in turn causes the ArrayIndexOutOfBoundsException
> back in the {{ByteBlockPool}}
> [setBytesRef()|https://github.com/apache/lucene-solr/blob/e386ec973b8a4ec2de2bfc43f51df511a365d60f/lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java#L308]
> method, where it is used to find the term's buffer.
> I realize that it's unreasonable to expect Lucene to index such fields, but I
> wonder if an overflow check should be added to {{ByteBlockPool.nextBuffer}}
> in order to handle such a condition more gracefully.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
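The overflow described above can be reproduced in isolation with plain int arithmetic. The sketch below is not the Lucene class: it only borrows the name byteOffset from the description and assumes 32 KB blocks (a shift of 15), which is consistent with the index -65531 in the trace. The class and method names are hypothetical, and Math.addExact is shown as one possible way to fail early, not as the fix that was actually applied.

```java
final class ByteOffsetOverflow {

    // Assumed block geometry: 2^15 bytes per block.
    static final int BYTE_BLOCK_SHIFT = 15;
    static final int BYTE_BLOCK_SIZE = 1 << BYTE_BLOCK_SHIFT;

    /** Advances the offset the way a plain int addition would, silently wrapping around. */
    static int advanceUnchecked(int byteOffset) {
        return byteOffset + BYTE_BLOCK_SIZE;
    }

    /** Advances the offset but throws ArithmeticException instead of wrapping around. */
    static int advanceChecked(int byteOffset) {
        return Math.addExact(byteOffset, BYTE_BLOCK_SIZE);
    }

    /** Index of the buffer that a global byte offset falls into. */
    static int bufferIndex(int byteOffset) {
        return byteOffset >> BYTE_BLOCK_SHIFT;
    }

    public static void main(String[] args) {
        // Start of the last block whose offset still fits in an int (block 65535).
        int offset = 65535 * BYTE_BLOCK_SIZE;

        // One more block wraps the offset around to Integer.MIN_VALUE.
        int wrapped = advanceUnchecked(offset);
        System.out.println(wrapped + " -> buffer " + bufferIndex(wrapped));

        // Five blocks later the buffer index matches the one in the stack trace.
        for (int i = 0; i < 5; i++) {
            wrapped = advanceUnchecked(wrapped);
        }
        System.out.println(wrapped + " -> buffer " + bufferIndex(wrapped));

        // The checked variant reports the problem at the point where it happens.
        try {
            advanceChecked(offset);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

With the unchecked addition the bad offset is only noticed much later, when it is turned into a negative array index; the checked variant surfaces it at the moment the pool runs out of addressable space.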
[jira] [Created] (LUCENE-8614) ArrayIndexOutOfBoundsException in ByteBlockPool
Igor Motov created LUCENE-8614:

Summary: ArrayIndexOutOfBoundsException in ByteBlockPool
Key: LUCENE-8614
URL: https://issues.apache.org/jira/browse/LUCENE-8614
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Affects Versions: 7.5
Reporter: Igor Motov
[jira] [Commented] (LUCENE-7148) Support boolean subset matching
[ https://issues.apache.org/jira/browse/LUCENE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226434#comment-15226434 ]

Igor Motov commented on LUCENE-7148:

I just want to mention that I have also seen many questions and requests for this feature on different Elasticsearch forums. Here are a couple of examples from Stack Overflow:
- http://stackoverflow.com/questions/31258959/elasticsearch-documents-that-only-have-terms-intersecting-a-list-of-terms-but-no
- http://stackoverflow.com/questions/32580295/elasticsearch-match-all-words-from-document-in-the-search-query

It seems to me that there is a need for such functionality.

> Support boolean subset matching
>
> Key: LUCENE-7148
> URL: https://issues.apache.org/jira/browse/LUCENE-7148
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/search
> Affects Versions: 5.x
> Reporter: Otmar Caduff
> Labels: newbie
>
> In Lucene, I know of the possibility of Occur.SHOULD, Occur.MUST and the
> “minimum should match” setting on the boolean query.
> Now, when querying, I want to
> - (1) match the documents which either contain all the terms of the query
> (Occur.MUST for all terms would do that) or,
> - (2) if all terms for a given field of a document are a subset of the query
> terms, that document should match as well.
> Example:
> Document d has field f with terms A, B, C
> Query with the following terms should match that document:
> A
> B
> A B
> A B C
> A B C D
> Query with the following terms should not match:
> D
> A B D

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
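The matching rule requested in the issue can be written down as a small set-based sketch. It only models the semantics on in-memory sets of terms and says nothing about how such a query would be executed against an index; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class SubsetMatchSketch {

    /**
     * A document matches if it contains every query term (rule 1),
     * or if all of its terms are among the query terms (rule 2).
     */
    static boolean matches(Set<String> docTerms, Set<String> queryTerms) {
        return docTerms.containsAll(queryTerms) || queryTerms.containsAll(docTerms);
    }

    /** Convenience helper for building a term set. */
    static Set<String> terms(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    public static void main(String[] args) {
        Set<String> doc = terms("A", "B", "C");

        // Queries from the issue that should match.
        System.out.println(matches(doc, terms("A")));
        System.out.println(matches(doc, terms("A", "B")));
        System.out.println(matches(doc, terms("A", "B", "C", "D")));

        // Queries from the issue that should not match.
        System.out.println(matches(doc, terms("D")));
        System.out.println(matches(doc, terms("A", "B", "D")));
    }
}
```

The query "A B D" fails both rules: the document lacks D, and the query lacks C.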
[jira] [Updated] (LUCENE-5502) equals method of TermsFilter might equate two different filters
[ https://issues.apache.org/jira/browse/LUCENE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Motov updated LUCENE-5502:

Attachment: LUCENE-5502.patch

Updated patch with ArrayUtil.equals

> equals method of TermsFilter might equate two different filters
>
> Key: LUCENE-5502
> URL: https://issues.apache.org/jira/browse/LUCENE-5502
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/query/scoring
> Affects Versions: 4.7
> Reporter: Igor Motov
> Attachments: LUCENE-5502.patch, LUCENE-5502.patch, LUCENE-5502.patch
>
> If two terms filters 1) have the same number of terms, 2) use the same field
> in all these terms and 3) have term values that happen to have the same hash
> codes, these two filters are considered to be equal as long as the first term
> is the same in both filters.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (LUCENE-5502) equals method of TermsFilter might equate two different filters
[ https://issues.apache.org/jira/browse/LUCENE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Motov updated LUCENE-5502:

Attachment: LUCENE-5502.patch

Thanks Adrien. You are right, I missed offsets. Here is an updated version. I cannot use Arrays.equals for termsBytes and offsets because we compare only parts of the arrays, but I can switch to ArrayUtil.equals if you think it would make more sense.
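The point about comparing only parts of the arrays can be sketched as follows. This is not the patch: the array layout is an assumption (terms stored back to back in one byte array, with offsets[i] marking where term i starts and offsets[termCount] marking the end of the last term, and both arrays possibly over-allocated), and the class and method names are hypothetical. The sketch uses the range overloads of java.util.Arrays.equals, which exist in Java 9 and later and so were not an option when this comment was written.

```java
import java.util.Arrays;

final class PartialArrayEquals {

    /**
     * Compares only the used prefixes of the offsets and bytes arrays,
     * ignoring any spare capacity left behind by over-allocation.
     */
    static boolean sameTerms(byte[] bytesA, int[] offsetsA,
                             byte[] bytesB, int[] offsetsB, int termCount) {
        // Number of meaningful bytes in each array (assumed layout, see above).
        int usedA = offsetsA[termCount];
        int usedB = offsetsB[termCount];
        return Arrays.equals(offsetsA, 0, termCount + 1, offsetsB, 0, termCount + 1)
            && Arrays.equals(bytesA, 0, usedA, bytesB, 0, usedB);
    }

    public static void main(String[] args) {
        // Two terms of two bytes each; the first pair of arrays has unused trailing slots.
        byte[] bytesA = {1, 2, 3, 4, 9, 9};
        int[] offsetsA = {0, 2, 4, 77};
        byte[] bytesB = {1, 2, 3, 4};
        int[] offsetsB = {0, 2, 4};

        // Whole-array comparison is misled by the spare capacity.
        System.out.println(Arrays.equals(bytesA, bytesB));
        // Range comparison looks only at the used part.
        System.out.println(sameTerms(bytesA, offsetsA, bytesB, offsetsB, 2));

        // Same bytes split at a different boundary are different terms.
        int[] offsetsC = {0, 1, 4};
        System.out.println(sameTerms(bytesA, offsetsA, bytesB, offsetsC, 2));
    }
}
```

Comparing the offsets as well as the bytes matters because the same byte sequence can be split into different terms.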
[jira] [Updated] (LUCENE-5502) equals method of TermsFilter might equate two different filters
[ https://issues.apache.org/jira/browse/LUCENE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Motov updated LUCENE-5502:

Attachment: LUCENE-5502.patch

Test and patch for the issue.
[jira] [Created] (LUCENE-5502) equals method of TermsFilter might equate two different filters
Igor Motov created LUCENE-5502:

Summary: equals method of TermsFilter might equate two different filters
Key: LUCENE-5502
URL: https://issues.apache.org/jira/browse/LUCENE-5502
Project: Lucene - Core
Issue Type: Bug
Components: core/query/scoring
Affects Versions: 4.7
Reporter: Igor Motov
[jira] [Created] (SOLR-3724) No highlighting for phrases with stop words when FVH is used
Igor Motov created SOLR-3724:

Summary: No highlighting for phrases with stop words when FVH is used
Key: SOLR-3724
URL: https://issues.apache.org/jira/browse/SOLR-3724
Project: Solr
Issue Type: Bug
Components: highlighter
Affects Versions: 3.6.1
Reporter: Igor Motov

To reproduce:
- Index the text "foo and bar" into the field "message" with the following schema:
{code:xml} {code}
- Search for {{message:"foo and bar"}} with highlighting enabled and {{hl.useFastVectorHighlighter=true}}
- The text is not highlighted

The standard highlighter works fine. If I set {{enablePositionIncrements=false}} in the analyzer, FVH starts to highlight the entire phrase.

You can find the complete schema and test data files that I used to reproduce this issue here: https://gist.github.com/3279879

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
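As background for the {{enablePositionIncrements}} observation, the sketch below shows how token positions differ when a removed stop word does or does not leave a gap. It is a plain Java model, not Solr or Lucene analysis code: the whitespace tokenization, the stop word list and all names are assumptions, and it does not claim to explain why FVH fails, only what the setting changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class StopWordPositions {

    /**
     * Splits text on spaces, drops stop words, and labels each remaining token
     * with its position. With position increments enabled, a dropped stop word
     * leaves a gap in the position sequence; otherwise positions stay contiguous.
     */
    static List<String> positions(String text, List<String> stopWords,
                                  boolean enablePositionIncrements) {
        List<String> result = new ArrayList<>();
        int position = -1;  // the first token ends up at position 0
        int skipped = 0;    // stop words dropped since the last emitted token
        for (String token : text.split(" ")) {
            if (stopWords.contains(token)) {
                skipped++;
                continue;
            }
            position += 1 + (enablePositionIncrements ? skipped : 0);
            skipped = 0;
            result.add(token + "@" + position);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> stopWords = Arrays.asList("and");
        // Gap preserved: bar is two positions after foo.
        System.out.println(positions("foo and bar", stopWords, true));
        // Gap removed: bar directly follows foo.
        System.out.println(positions("foo and bar", stopWords, false));
    }
}
```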