[jira] [Commented] (LUCENE-8860) LatLonShapeBoundingBoxQuery could make more decisions on inner nodes

2019-08-23 Thread Igor Motov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914594#comment-16914594
 ] 

Igor Motov commented on LUCENE-8860:


Thanks! It is clear now. I think I understood the part about the bounding box 
queries and I opened a PR based on it. Unfortunately, I don't see how to extend 
this to polygon queries. If we take a look at fig3, the bounding box 
of the red query completely encloses the green polygon's bounding box, and 
yet we cannot draw any conclusion about their intersection based on this 
information. 

              !fig3.png!
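
To make the fig3 argument concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class name, the helper names and the coordinates are all made up for illustration, and it uses two triangles instead of the actual shapes from the figure. The query's bounding box contains the indexed shape's bounding box, yet a separating-axis test shows that the two shapes are disjoint.

```java
/** Illustrative only: names and coordinates are hypothetical, not part of Lucene. */
final class BBoxVsShapeSketch {

  /** Returns {minX, minY, maxX, maxY} for an array of {x, y} points. */
  static double[] bbox(double[][] pts) {
    double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
    double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
    for (double[] p : pts) {
      minX = Math.min(minX, p[0]);
      minY = Math.min(minY, p[1]);
      maxX = Math.max(maxX, p[0]);
      maxY = Math.max(maxY, p[1]);
    }
    return new double[] {minX, minY, maxX, maxY};
  }

  /** True if box outer fully contains box inner (both as {minX, minY, maxX, maxY}). */
  static boolean contains(double[] outer, double[] inner) {
    return outer[0] <= inner[0] && outer[1] <= inner[1]
        && inner[2] <= outer[2] && inner[3] <= outer[3];
  }

  /** Projects all points onto the axis (nx, ny) and returns {min, max}. */
  static double[] project(double[][] pts, double nx, double ny) {
    double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
    for (double[] p : pts) {
      double d = p[0] * nx + p[1] * ny;
      min = Math.min(min, d);
      max = Math.max(max, d);
    }
    return new double[] {min, max};
  }

  /** True if some edge normal of convex polygon p separates p from q. */
  static boolean hasSeparatingAxis(double[][] p, double[][] q) {
    for (int i = 0; i < p.length; i++) {
      double[] u = p[i], v = p[(i + 1) % p.length];
      double nx = u[1] - v[1], ny = v[0] - u[0]; // normal of edge u -> v
      double[] rp = project(p, nx, ny), rq = project(q, nx, ny);
      if (rp[1] < rq[0] || rq[1] < rp[0]) {
        return true;
      }
    }
    return false;
  }

  /** Separating axis test for two convex polygons (here: triangles). */
  static boolean intersects(double[][] a, double[][] b) {
    return !hasSeparatingAxis(a, b) && !hasSeparatingAxis(b, a);
  }

  public static void main(String[] args) {
    double[][] query = {{0, 0}, {10, 0}, {0, 10}};  // stands in for the red query shape
    double[][] indexed = {{7, 7}, {9, 7}, {9, 9}};  // stands in for the green indexed shape
    System.out.println(contains(bbox(query), bbox(indexed))); // bounding boxes are nested
    System.out.println(intersects(query, indexed));           // shapes themselves are disjoint
  }
}
```

The bounding boxes alone cannot distinguish this case from one where the smaller shape really sits inside the query shape.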

> LatLonShapeBoundingBoxQuery could make more decisions on inner nodes
> 
>
> Key: LUCENE-8860
> URL: https://issues.apache.org/jira/browse/LUCENE-8860
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: fig1.png, fig2.png, fig3.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently LatLonShapeBoundingBoxQuery with the INTERSECTS relation only 
> returns CELL_INSIDE_QUERY if the query contains ALL minimum bounding 
> rectangles of the indexed triangles.
> I think we could return CELL_INSIDE_QUERY if the box contains either of the 
> edges of all MBRs of indexed triangles since triangles are guaranteed to 
> touch all edges of their MBR by definition. In some cases this would help 
> save decoding triangles and running costly point-in-triangle computations.
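
The idea in the quoted description can be sketched for a single MBR as follows. This is only an illustration with hypothetical names (`Box`, `containsMbr`, `containsAnyEdge`), not Lucene's classes or the code from the PR. A real inner-node check would have to apply the same condition to the cell's min/max packed values rather than to one rectangle, which is not shown here.

```java
/** Illustrative only: hypothetical names, not Lucene's actual classes or methods. */
final class MbrEdgeSketch {

  /** Axis-aligned rectangle, used for both the query box and a triangle's MBR. */
  static final class Box {
    final double minX, minY, maxX, maxY;

    Box(double minX, double minY, double maxX, double maxY) {
      this.minX = minX;
      this.minY = minY;
      this.maxX = maxX;
      this.maxY = maxY;
    }

    boolean containsPoint(double x, double y) {
      return minX <= x && x <= maxX && minY <= y && y <= maxY;
    }
  }

  /** Current rule as described: the query must contain the whole MBR. */
  static boolean containsMbr(Box q, Box m) {
    return q.containsPoint(m.minX, m.minY) && q.containsPoint(m.maxX, m.maxY);
  }

  /** Proposed rule: the query contains at least one full edge of the MBR. */
  static boolean containsAnyEdge(Box q, Box m) {
    boolean bottomLeft = q.containsPoint(m.minX, m.minY);
    boolean bottomRight = q.containsPoint(m.maxX, m.minY);
    boolean topLeft = q.containsPoint(m.minX, m.maxY);
    boolean topRight = q.containsPoint(m.maxX, m.maxY);
    // The query box is convex, so an edge is inside it iff both endpoints are.
    return (bottomLeft && bottomRight) || (topLeft && topRight)
        || (bottomLeft && topLeft) || (bottomRight && topRight);
  }

  public static void main(String[] args) {
    Box mbr = new Box(0, 0, 10, 10);
    Box thinQuery = new Box(-1, -1, 1, 11); // covers only the left edge of the MBR
    System.out.println(containsMbr(thinQuery, mbr));     // whole-MBR rule cannot decide
    System.out.println(containsAnyEdge(thinQuery, mbr)); // edge rule can decide
  }
}
```

Because a triangle touches every edge of its MBR, a query box that contains a full edge must touch the triangle, so the edge rule accepts cases that the whole-MBR rule leaves undecided.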



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8860) LatLonShapeBoundingBoxQuery could make more decisions on inner nodes

2019-08-23 Thread Igor Motov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Motov updated LUCENE-8860:
---
Attachment: fig3.png




[jira] [Commented] (LUCENE-8860) LatLonShapeBoundingBoxQuery could make more decisions on inner nodes

2019-08-19 Thread Igor Motov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16910787#comment-16910787
 ] 

Igor Motov commented on LUCENE-8860:


I tried looking into this issue, but I don't think we have enough information in 
the inner node to make such a determination. For example, suppose I index two 
polygons: one L-shaped polygon and a small triangle placed inside the 
L-shape (see the blue and green tessellated versions in fig 1):
         !fig1.png!

Then all I have at the inner node level are minPackedValue and maxPackedValue 
(depicted as purple rectangles in fig 2). This doesn't give me enough 
information to determine whether my query bounding box (red rectangle in fig 2) 
intersects a blue triangle or not. 
         !fig2.png!

So, unless I misunderstood the proposal, I am not really sure how to achieve 
that on the inner node level. 

> LatLonShapeBoundingBoxQuery could make more decisions on inner nodes
> 
>
> Key: LUCENE-8860
> URL: https://issues.apache.org/jira/browse/LUCENE-8860
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: fig1.png, fig2.png
>
>
> Currently LatLonShapeBoundingBoxQuery with the INTERSECTS relation only 
> returns CELL_INSIDE_QUERY if the query contains ALL minimum bounding 
> rectangles of the indexed triangles.
> I think we could return CELL_INSIDE_QUERY if the box contains either of the 
> edges of all MBRs of indexed triangles since triangles are guaranteed to 
> touch all edges of their MBR by definition. In some cases this would help 
> save decoding triangles and running costly point-in-triangle computations.






[jira] [Updated] (LUCENE-8860) LatLonShapeBoundingBoxQuery could make more decisions on inner nodes

2019-08-19 Thread Igor Motov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Motov updated LUCENE-8860:
---
Attachment: fig1.png
fig2.png




[jira] [Updated] (LUCENE-8860) LatLonShapeBoundingBoxQuery could make more decisions on inner nodes

2019-08-19 Thread Igor Motov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Motov updated LUCENE-8860:
---
Attachment: (was: fig2.png)

> LatLonShapeBoundingBoxQuery could make more decisions on inner nodes
> 
>
> Key: LUCENE-8860
> URL: https://issues.apache.org/jira/browse/LUCENE-8860
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
>
> Currently LatLonShapeBoundingBoxQuery with the INTERSECTS relation only 
> returns CELL_INSIDE_QUERY if the query contains ALL minimum bounding 
> rectangles of the indexed triangles.
> I think we could return CELL_INSIDE_QUERY if the box contains either of the 
> edges of all MBRs of indexed triangles since triangles are guaranteed to 
> touch all edges of their MBR by definition. In some cases this would help 
> save decoding triangles and running costly point-in-triangle computations.






[jira] [Updated] (LUCENE-8860) LatLonShapeBoundingBoxQuery could make more decisions on inner nodes

2019-08-19 Thread Igor Motov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Motov updated LUCENE-8860:
---
Attachment: (was: fig1.png)




[jira] [Updated] (LUCENE-8860) LatLonShapeBoundingBoxQuery could make more decisions on inner nodes

2019-08-19 Thread Igor Motov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Motov updated LUCENE-8860:
---
Attachment: fig1.png
fig2.png




[jira] [Updated] (LUCENE-8614) ArrayIndexOutOfBoundsException in ByteBlockPool

2018-12-18 Thread Igor Motov (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Motov updated LUCENE-8614:
---
Attachment: LUCENE-8614.patch

> ArrayIndexOutOfBoundsException in ByteBlockPool
> ---
>
> Key: LUCENE-8614
> URL: https://issues.apache.org/jira/browse/LUCENE-8614
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 7.5
> Reporter: Igor Motov
> Priority: Major
> Attachments: LUCENE-8614.patch
>
>
> A field with a very large number of small tokens can cause 
> ArrayIndexOutOfBoundsException in ByteBlockPool due to an arithmetic overflow 
> in ByteBlockPool.
> The issue was originally reported in 
> [https://github.com/elastic/elasticsearch/issues/23670] where due to the 
> indexing settings the geo_shape generated a very large number of tokens and 
> caused the indexing operation to fail with the following exception: 
> {noformat}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -65531
>   at org.apache.lucene.util.ByteBlockPool.setBytesRef(ByteBlockPool.java:308) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.util.BytesRefHash.equals(BytesRefHash.java:183) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.util.BytesRefHash.findHash(BytesRefHash.java:337) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:255) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:149) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:766) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:417) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:373) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:231) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:478) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1575) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
>   at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1320) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
> {noformat}
> I was able to reproduce the issue and somewhat reduce the test that 
> reproduces it (see enclosed patch) but unfortunately it still requires 12G of 
> heap to run.
> The issue seems to be caused by arithmetic overflow in the {{byteOffset}} 
> calculation when {{ByteBlockPool}} advances to the next buffer on the last 
> line of the 
> [nextBuffer()|https://github.com/apache/lucene-solr/blob/e386ec973b8a4ec2de2bfc43f51df511a365d60f/lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java#L207]
>  method, but it doesn't manifest itself until much later when this offset is 
> used to calculate the 
> [bytesStart|https://github.com/apache/lucene-solr/blob/e386ec973b8a4ec2de2bfc43f51df511a365d60f/lucene/core/src/java/org/apache/lucene/util/BytesRefHash.java#L277]
>  in {{BytesRefHash}}, which in turn causes an ArrayIndexOutOfBoundsException 
> back in the {{ByteBlockPool}} 
> [setBytesRef()|https://github.com/apache/lucene-solr/blob/e386ec973b8a4ec2de2bfc43f51df511a365d60f/lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java#L308]
>  method where it is used to find the term's buffer.
> I realize that it's unreasonable to expect Lucene to index such fields, but I 
> wonder if an overflow check should be added to {{ByteBlockPool.nextBuffer}} 
> in order to handle such a condition more gracefully. 
>   
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (LUCENE-8614) ArrayIndexOutOfBoundsException in ByteBlockPool

2018-12-18 Thread Igor Motov (JIRA)
Igor Motov created LUCENE-8614:
--

Summary: ArrayIndexOutOfBoundsException in ByteBlockPool
Key: LUCENE-8614
URL: https://issues.apache.org/jira/browse/LUCENE-8614
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Affects Versions: 7.5
Reporter: Igor Motov


A field with a very large number of small tokens can cause 
ArrayIndexOutOfBoundsException in ByteBlockPool due to an arithmetic overflow 
in ByteBlockPool.

The issue was originally reported in 
[https://github.com/elastic/elasticsearch/issues/23670] where due to the 
indexing settings the geo_shape generated a very large number of tokens and 
caused the indexing operation to fail with the following exception: 
{noformat}
Caused by: java.lang.ArrayIndexOutOfBoundsException: -65531
    at org.apache.lucene.util.ByteBlockPool.setBytesRef(ByteBlockPool.java:308) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
    at org.apache.lucene.util.BytesRefHash.equals(BytesRefHash.java:183) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
    at org.apache.lucene.util.BytesRefHash.findHash(BytesRefHash.java:337) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
    at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:255) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
    at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:149) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:766) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:417) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:373) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:231) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:478) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1575) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1320) ~[lucene-core-6.4.0.jar:6.4.0 bbe4b08cc1fb673d0c3eb4b8455f23ddc1364124 - jim - 2017-01-17 15:57:29]
{noformat}
I was able to reproduce the issue and somewhat reduce the test that reproduces 
it (see enclosed patch) but unfortunately it still requires 12G of heap to run.

The issue seems to be caused by arithmetic overflow in the {{byteOffset}} 
calculation when {{ByteBlockPool}} advances to the next buffer on the last 
line of the 
[nextBuffer()|https://github.com/apache/lucene-solr/blob/e386ec973b8a4ec2de2bfc43f51df511a365d60f/lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java#L207]
 method, but it doesn't manifest itself until much later when this offset is 
used to calculate the 
[bytesStart|https://github.com/apache/lucene-solr/blob/e386ec973b8a4ec2de2bfc43f51df511a365d60f/lucene/core/src/java/org/apache/lucene/util/BytesRefHash.java#L277]
 in {{BytesRefHash}}, which in turn causes an ArrayIndexOutOfBoundsException 
back in the {{ByteBlockPool}} 
[setBytesRef()|https://github.com/apache/lucene-solr/blob/e386ec973b8a4ec2de2bfc43f51df511a365d60f/lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java#L308]
 method where it is used to find the term's buffer.

I realize that it's unreasonable to expect Lucene to index such fields, but I 
wonder if an overflow check should be added to {{ByteBlockPool.nextBuffer}} in 
order to handle such a condition more gracefully. 
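
As a rough illustration of the kind of check I have in mind, here is a standalone sketch. It is not the actual {{ByteBlockPool}} code and not the enclosed patch: the class and method names are made up, and the block size of 1 << 15 is assumed for the example. It only shows how an {{int}} offset wraps around silently when a block size is added, and how {{Math.addExact}} turns that into an immediate exception.

```java
/** Illustrative only: a simplified stand-in for the offset bookkeeping, not ByteBlockPool itself. */
final class ByteOffsetOverflowSketch {
  // Assumed block size for this example.
  static final int BYTE_BLOCK_SHIFT = 15;
  static final int BYTE_BLOCK_SIZE = 1 << BYTE_BLOCK_SHIFT; // 32768 bytes per buffer

  /** Unchecked advance: silently wraps around to a negative value on overflow. */
  static int advanceUnchecked(int byteOffset) {
    return byteOffset + BYTE_BLOCK_SIZE;
  }

  /** Checked advance: throws ArithmeticException as soon as the offset overflows. */
  static int advanceChecked(int byteOffset) {
    return Math.addExact(byteOffset, BYTE_BLOCK_SIZE);
  }

  public static void main(String[] args) {
    // Start offset of the last block that still fits into a non-negative int.
    int lastValid = Integer.MAX_VALUE - BYTE_BLOCK_SIZE + 1;
    System.out.println(advanceUnchecked(lastValid)); // wraps to Integer.MIN_VALUE
    try {
      advanceChecked(lastValid);
    } catch (ArithmeticException e) {
      System.out.println("overflow detected: " + e.getMessage());
    }
  }
}
```

With a check like this the failure would surface where the overflow happens instead of much later as a negative array index.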
  

 






[jira] [Commented] (LUCENE-7148) Support boolean subset matching

2016-04-05 Thread Igor Motov (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226434#comment-15226434
 ] 

Igor Motov commented on LUCENE-7148:


I just want to mention that I have also seen many questions and requests for 
this feature on various Elasticsearch forums. Here are a couple of examples 
from Stack Overflow: 

- 
http://stackoverflow.com/questions/31258959/elasticsearch-documents-that-only-have-terms-intersecting-a-list-of-terms-but-no
- 
http://stackoverflow.com/questions/32580295/elasticsearch-match-all-words-from-document-in-the-search-query

It seems to me that there is a need for such functionality. 

> Support boolean subset matching
> ---
>
> Key: LUCENE-7148
> URL: https://issues.apache.org/jira/browse/LUCENE-7148
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/search
> Affects Versions: 5.x
> Reporter: Otmar Caduff
> Labels: newbie
>
> In Lucene, I know of the possibility of Occur.SHOULD, Occur.MUST and the 
> “minimum should match” setting on the boolean query.
> Now, when querying, I want to
> - (1)  match the documents which either contain all the terms of the query 
> (Occur.MUST for all terms would do that) or,
> - (2)  if all terms for a given field of a document are a subset of the query 
> terms, that document should match as well.
> Example:
> Document d has field f with terms A, B, C
> Query with the following terms should match that document:
> A
> B
> A B
> A B C
> A B C D
> Query with the following terms should not match:
> D
> A B D
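
The matching rule from the quoted description can be expressed with plain Java sets. This is only a sketch of the requested semantics, not a Lucene query implementation; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: models the requested semantics with sets, not a Lucene query. */
final class SubsetMatchSketch {

  /**
   * Matches if the document contains all query terms (rule 1), or if all
   * document terms are a subset of the query terms (rule 2).
   */
  static boolean matches(Set<String> docTerms, Set<String> queryTerms) {
    return docTerms.containsAll(queryTerms) || queryTerms.containsAll(docTerms);
  }

  static Set<String> terms(String... values) {
    return new HashSet<>(Arrays.asList(values));
  }

  public static void main(String[] args) {
    Set<String> doc = terms("A", "B", "C");
    String[][] queries = {
      {"A"}, {"B"}, {"A", "B"}, {"A", "B", "C"}, {"A", "B", "C", "D"}, {"D"}, {"A", "B", "D"}
    };
    for (String[] q : queries) {
      System.out.println(Arrays.toString(q) + " -> " + matches(doc, terms(q)));
    }
  }
}
```

For the document with terms A, B, C this accepts the first five example queries and rejects D and A B D, as listed in the description.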



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Updated] (LUCENE-5502) equals method of TermsFilter might equate two different filters

2014-03-10 Thread Igor Motov (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Motov updated LUCENE-5502:
---

Attachment: LUCENE-5502.patch

Updated patch with ArrayUtil.equals

> equals method of TermsFilter might equate two different filters
> ---
>
> Key: LUCENE-5502
> URL: https://issues.apache.org/jira/browse/LUCENE-5502
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/query/scoring
> Affects Versions: 4.7
> Reporter: Igor Motov
> Attachments: LUCENE-5502.patch, LUCENE-5502.patch, LUCENE-5502.patch
>
>
> If two terms filters 1) have the same number of terms, 2) use the same field 
> in all these terms, and 3) have term values that happen to produce the same 
> hash code, the two filters are considered equal as long as the first term is 
> the same in both filters.
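
The failure mode in the quoted description can be reproduced in isolation. The sketch below is not the {{TermsFilter}} code: the names are hypothetical and it ignores the field check. It only relies on the fact that the Java strings "Aa" and "BB" have the same hash code.

```java
import java.util.Arrays;

/** Illustrative only: mimics the described failure mode, not TermsFilter itself. */
final class TermsEqualsSketch {

  /** Flawed comparison: same length, same hash code, and only the first term compared. */
  static boolean flawedEquals(String[] a, String[] b) {
    return a.length == b.length
        && Arrays.hashCode(a) == Arrays.hashCode(b)
        && (a.length == 0 || a[0].equals(b[0]));
  }

  /** Full comparison: every term has to match. */
  static boolean fullEquals(String[] a, String[] b) {
    return Arrays.equals(a, b);
  }

  public static void main(String[] args) {
    // "Aa" and "BB" collide in String.hashCode(), so the array hash codes collide too.
    String[] first = {"x", "Aa"};
    String[] second = {"x", "BB"};
    System.out.println(flawedEquals(first, second)); // different term lists treated as equal
    System.out.println(fullEquals(first, second));   // full comparison tells them apart
  }
}
```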



--
This message was sent by Atlassian JIRA
(v6.2#6252)




[jira] [Updated] (LUCENE-5502) equals method of TermsFilter might equate two different filters

2014-03-09 Thread Igor Motov (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Motov updated LUCENE-5502:
---

Attachment: LUCENE-5502.patch

Thanks Adrien. You are right, I missed offsets. Here is an updated version. I 
cannot use Arrays.equals for termsBytes and offsets because we compare only 
parts of the arrays, but I can switch to ArrayUtil.equals if you think it would 
make more sense.
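
To illustrate why a whole-array comparison does not work here, consider two over-allocated arrays where only a prefix is in use. The helper below is a hand-rolled stand-in written for this example (it is not Lucene's ArrayUtil and not the code from the patch), and the array contents are made up.

```java
import java.util.Arrays;

/** Illustrative only: hand-rolled range comparison, not Lucene's ArrayUtil. */
final class RangeEqualsSketch {

  /** Compares len bytes of a starting at aOff with len bytes of b starting at bOff. */
  static boolean rangeEquals(byte[] a, int aOff, byte[] b, int bOff, int len) {
    for (int i = 0; i < len; i++) {
      if (a[aOff + i] != b[bOff + i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Only the first three bytes are in use; the rest is unused capacity.
    byte[] a = {1, 2, 3, 0, 0};
    byte[] b = {1, 2, 3, 9, 9, 9};
    int used = 3;
    System.out.println(Arrays.equals(a, b));               // also compares the unused tail
    System.out.println(rangeEquals(a, 0, b, 0, used));     // compares only the used part
  }
}
```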

> equals method of TermsFilter might equate two different filters
> ---
>
> Key: LUCENE-5502
> URL: https://issues.apache.org/jira/browse/LUCENE-5502
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/query/scoring
> Affects Versions: 4.7
> Reporter: Igor Motov
> Attachments: LUCENE-5502.patch, LUCENE-5502.patch
>
>
> If two terms filters 1) have the same number of terms, 2) use the same field 
> in all these terms, and 3) have term values that happen to produce the same 
> hash code, the two filters are considered equal as long as the first term is 
> the same in both filters.






[jira] [Updated] (LUCENE-5502) equals method of TermsFilter might equate two different filters

2014-03-07 Thread Igor Motov (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Motov updated LUCENE-5502:
---

Attachment: LUCENE-5502.patch

Test and patch for the issue.

> equals method of TermsFilter might equate two different filters
> ---
>
> Key: LUCENE-5502
> URL: https://issues.apache.org/jira/browse/LUCENE-5502
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/query/scoring
> Affects Versions: 4.7
> Reporter: Igor Motov
> Attachments: LUCENE-5502.patch
>
>
> If two terms filters 1) have the same number of terms, 2) use the same field 
> in all these terms, and 3) have term values that happen to produce the same 
> hash code, the two filters are considered equal as long as the first term is 
> the same in both filters.






[jira] [Created] (LUCENE-5502) equals method of TermsFilter might equate two different filters

2014-03-07 Thread Igor Motov (JIRA)
Igor Motov created LUCENE-5502:
--

Summary: equals method of TermsFilter might equate two different filters
Key: LUCENE-5502
URL: https://issues.apache.org/jira/browse/LUCENE-5502
Project: Lucene - Core
Issue Type: Bug
Components: core/query/scoring
Affects Versions: 4.7
Reporter: Igor Motov


If two terms filters 1) have the same number of terms, 2) use the same field in 
all these terms, and 3) have term values that happen to produce the same hash 
code, the two filters are considered equal as long as the first term is the 
same in both filters.






[jira] [Created] (SOLR-3724) No highlighting for phrases with stop words when FVH is used

2012-08-09 Thread Igor Motov (JIRA)
Igor Motov created SOLR-3724:


Summary: No highlighting for phrases with stop words when FVH is used
Key: SOLR-3724
URL: https://issues.apache.org/jira/browse/SOLR-3724
Project: Solr
Issue Type: Bug
Components: highlighter
Affects Versions: 3.6.1
Reporter: Igor Motov


To reproduce:
- Index the text "foo and bar" into the field "message" with the following schema:
{code:xml}

  


  



  
  




  


  
  



  
  

{code}
- Search for {{message:"foo and bar"}} with highlighting enabled and 
{{hl.useFastVectorHighlighter=true}}
- The text is not highlighted

The standard highlighter works fine. If I set {{enablePositionIncrements=false}} in 
the analyzer, FVH starts to highlight the entire phrase. You can find the complete 
schema and test data files that I used to reproduce this issue here: 
https://gist.github.com/3279879 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


