[jira] [Resolved] (LUCENE-8976) Use exact distance between point and bounding rectangle in FloatPointNearestNeighbor

2019-09-11 Thread Ignacio Vera (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-8976.
--
Fix Version/s: 8.3
 Assignee: Ignacio Vera
   Resolution: Fixed

> Use exact distance between point and bounding rectangle in 
> FloatPointNearestNeighbor
> 
>
> Key: LUCENE-8976
> URL: https://issues.apache.org/jira/browse/LUCENE-8976
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Minor
> Fix For: 8.3
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The minimum distance between a point and a bounding rectangle can be 
> computed quite efficiently. This allows the FloatPointNearestNeighbor 
> algorithm to discard inner nodes based on that calculation.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8976) Use exact distance between point and bounding rectangle in FloatPointNearestNeighbor

2019-09-11 Thread Ignacio Vera (Jira)
Ignacio Vera created LUCENE-8976:


 Summary: Use exact distance between point and bounding rectangle 
in FloatPointNearestNeighbor
 Key: LUCENE-8976
 URL: https://issues.apache.org/jira/browse/LUCENE-8976
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ignacio Vera


The minimum distance between a point and a bounding rectangle can be 
computed quite efficiently. This allows the FloatPointNearestNeighbor algorithm 
to discard inner nodes based on that calculation.
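
The calculation described above can be sketched as follows. This is only a minimal illustration, not the code committed for this issue: it assumes Euclidean distance in float space and an axis-aligned rectangle given as per-dimension min/max arrays, and the class and method names are hypothetical.

```java
final class PointToBoxDistance {
  /**
   * Squared Euclidean distance from a point to the closest point of the
   * axis-aligned box [min, max]. Returns 0 when the point lies inside the box.
   */
  static double minSquaredDistance(float[] point, float[] min, float[] max) {
    double sum = 0;
    for (int d = 0; d < point.length; d++) {
      double diff = 0;
      if (point[d] < min[d]) {
        diff = (double) min[d] - point[d]; // point is below the box in this dimension
      } else if (point[d] > max[d]) {
        diff = (double) point[d] - max[d]; // point is above the box in this dimension
      }
      sum += diff * diff;
    }
    return sum;
  }
}
```

Because the result is a lower bound on the distance to anything stored under a node, a nearest-neighbor search could skip an inner node whenever this value exceeds the distance of the current k-th best hit.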






[jira] [Resolved] (LUCENE-8968) Improve performance of WITHIN and DISJOINT queries for Shape queries

2019-09-10 Thread Ignacio Vera (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-8968.
--
Fix Version/s: 8.3
 Assignee: Ignacio Vera
   Resolution: Fixed

> Improve performance of WITHIN and DISJOINT queries for Shape queries
> 
>
> Key: LUCENE-8968
> URL: https://issues.apache.org/jira/browse/LUCENE-8968
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: 8.3
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We are currently walking the tree twice for INTERSECTS and WITHIN queries in 
> ShapeQuery when we can do it in just one pass. Still, most of the time we 
> need to visit all documents to remove false positives due to multi-shapes, 
> except in the case where all documents up to maxDoc are in the tree.
> This issue refactors that class and tries to improve the strategy for such 
> cases.






[jira] [Resolved] (LUCENE-8964) Allow GeoJSON parser to properly skip string arrays

2019-09-10 Thread Ignacio Vera (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-8964.
--
Fix Version/s: 8.3
   Resolution: Fixed

Thanks [~aleree]

> Allow GeoJSON parser to properly skip string arrays
> ---
>
> Key: LUCENE-8964
> URL: https://issues.apache.org/jira/browse/LUCENE-8964
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Affects Versions: trunk
>Reporter: Alexander Reelsen
>Assignee: Ignacio Vera
>Priority: Trivial
> Fix For: 8.3
>
> Attachments: lucene-parse-geojson-arrays-0.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The GeoJSON parser throws an exception when trying to parse an array of 
> strings, which is somewhat common in some free GeoJSON services like 
> [https://whosonfirst.org|https://whosonfirst.org/]
> An example file can be seen at 
> [https://data.whosonfirst.org/101/748/479/101748479.geojson]
> This fixes the parser to also parse a string array.






[jira] [Updated] (LUCENE-8973) XYRectangle2D should work on float space

2019-09-10 Thread Ignacio Vera (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera updated LUCENE-8973:
-
Description: 
While working on CONTAINS support for shapes I came across errors in XYShape 
when querying with a bounding box. After looking into the errors, it is clear 
that the issue is that XYRectangle2D is working in the encoding space. In this 
case, the XYShape encoding is not linear and shapes lose their spatial 
relationship. 

XYRectangle2D should therefore work in the float space.

> XYRectangle2D should work on float space
> 
>
> Key: LUCENE-8973
> URL: https://issues.apache.org/jira/browse/LUCENE-8973
> Project: Lucene - Core
>  Issue Type: Bug
> Environment: 
>Reporter: Ignacio Vera
>Priority: Major
>
> While working on CONTAINS support for shapes I came across errors in XYShape 
> when querying with a bounding box. After looking into the errors, it is clear 
> that the issue is that XYRectangle2D is working in the encoding space. In 
> this case, the XYShape encoding is not linear and shapes lose their spatial 
> relationship. 
> XYRectangle2D should therefore work in the float space.
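
The non-linearity mentioned above can be illustrated with a small sketch. It assumes the encoding is the usual sortable-int transform of the IEEE 754 float bits; that is an assumption made for illustration, and the class and method names are hypothetical rather than taken from XYShape. The transform preserves ordering but not distances, so a midpoint computed in the encoding space is not the midpoint in float space.

```java
final class FloatEncodingDemo {
  /** Sortable-int transform of the float bits: order-preserving but not linear. */
  static int encode(float value) {
    int bits = Float.floatToIntBits(value);
    return bits ^ ((bits >> 31) & 0x7fffffff);
  }

  /** Inverse of encode. */
  static float decode(int encoded) {
    return Float.intBitsToFloat(encoded ^ ((encoded >> 31) & 0x7fffffff));
  }

  public static void main(String[] args) {
    // Midpoint of 1.0 and 3.0 computed in the encoding space.
    long mid = ((long) encode(1f) + encode(3f)) / 2;
    System.out.println(decode((int) mid)); // prints 1.75, not 2.0
  }
}
```

Any geometric test that relies on proportions (for example, where a line crosses a rectangle edge) can therefore give a different answer in the encoding space than in float space.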






[jira] [Updated] (LUCENE-8973) XYRectangle2D should work on float space

2019-09-10 Thread Ignacio Vera (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera updated LUCENE-8973:
-
Environment: 




  was:
While working on CONTAINS support for shapes I came across errors in XYShape 
when querying with a bounding box. After looking into the errors, it is clear 
that the issue is that XYRectangle2D is working in the encoding space. In this 
case, the XYShape encoding is not linear and shapes lose their spatial 
relationship. 

XYRectangle2D should therefore work in the float space.




> XYRectangle2D should work on float space
> 
>
> Key: LUCENE-8973
> URL: https://issues.apache.org/jira/browse/LUCENE-8973
> Project: Lucene - Core
>  Issue Type: Bug
> Environment: 
>Reporter: Ignacio Vera
>Priority: Major
>







[jira] [Created] (LUCENE-8973) XYRectangle2D should work on float space

2019-09-10 Thread Ignacio Vera (Jira)
Ignacio Vera created LUCENE-8973:


 Summary: XYRectangle2D should work on float space
 Key: LUCENE-8973
 URL: https://issues.apache.org/jira/browse/LUCENE-8973
 Project: Lucene - Core
  Issue Type: Bug
 Environment: While working on CONTAINS support for shapes I came 
across errors in XYShape when querying with a bounding box. After looking into 
the errors, it is clear that the issue is that XYRectangle2D is working in the 
encoding space. In this case, the XYShape encoding is not linear and shapes 
lose their spatial relationship. 

XYRectangle2D should therefore work in the float space.


Reporter: Ignacio Vera









[jira] [Commented] (LUCENE-8964) Allow GeoJSON parser to properly skip string arrays

2019-09-06 Thread Ignacio Vera (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924220#comment-16924220
 ] 

Ignacio Vera commented on LUCENE-8964:
--

Thanks Alex! The patch looks good; I will commit it soon.

> Allow GeoJSON parser to properly skip string arrays
> ---
>
> Key: LUCENE-8964
> URL: https://issues.apache.org/jira/browse/LUCENE-8964
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Affects Versions: trunk
>Reporter: Alexander Reelsen
>Assignee: Ignacio Vera
>Priority: Trivial
> Attachments: lucene-parse-geojson-arrays-0.patch
>
>
> The GeoJSON parser throws an exception when trying to parse an array of 
> strings, which is somewhat common in some free GeoJSON services like 
> [https://whosonfirst.org|https://whosonfirst.org/]
> An example file can be seen at 
> [https://data.whosonfirst.org/101/748/479/101748479.geojson]
> This fixes the parser to also parse a string array.






[jira] [Assigned] (LUCENE-8964) Allow GeoJSON parser to properly skip string arrays

2019-09-06 Thread Ignacio Vera (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera reassigned LUCENE-8964:


Assignee: Ignacio Vera

> Allow GeoJSON parser to properly skip string arrays
> ---
>
> Key: LUCENE-8964
> URL: https://issues.apache.org/jira/browse/LUCENE-8964
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Affects Versions: trunk
>Reporter: Alexander Reelsen
>Assignee: Ignacio Vera
>Priority: Trivial
> Attachments: lucene-parse-geojson-arrays-0.patch
>
>
> The GeoJSON parser throws an exception when trying to parse an array of 
> strings, which is somewhat common in some free GeoJSON services like 
> [https://whosonfirst.org|https://whosonfirst.org/]
> An example file can be seen at 
> [https://data.whosonfirst.org/101/748/479/101748479.geojson]
> This fixes the parser to also parse a string array.






[jira] [Created] (LUCENE-8968) Improve performance of WITHIN and DISJOINT queries for Shape queries

2019-09-05 Thread Ignacio Vera (Jira)
Ignacio Vera created LUCENE-8968:


 Summary: Improve performance of WITHIN and DISJOINT queries for 
Shape queries
 Key: LUCENE-8968
 URL: https://issues.apache.org/jira/browse/LUCENE-8968
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ignacio Vera


We are currently walking the tree twice for INTERSECTS and WITHIN queries in 
ShapeQuery when we can do it in just one pass. Still, most of the time we need 
to visit all documents to remove false positives due to multi-shapes, except in 
the case where all documents up to maxDoc are in the tree.

This issue refactors that class and tries to improve the strategy for such 
cases.






[jira] [Resolved] (LUCENE-8960) Add LatLonDocValuesPointInPolygonQuery

2019-09-03 Thread Ignacio Vera (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-8960.
--
Fix Version/s: 8.3
 Assignee: Ignacio Vera
   Resolution: Fixed

> Add LatLonDocValuesPointInPolygonQuery
> --
>
> Key: LUCENE-8960
> URL: https://issues.apache.org/jira/browse/LUCENE-8960
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Minor
> Fix For: 8.3
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently LatLonDocValuesField contains queries for bounding box and circle. 
> This issue adds a polygon query as well.






[jira] [Created] (LUCENE-8960) Add LatLonDocValuesPointInPolygonQuery

2019-09-02 Thread Ignacio Vera (Jira)
Ignacio Vera created LUCENE-8960:


 Summary: Add LatLonDocValuesPointInPolygonQuery
 Key: LUCENE-8960
 URL: https://issues.apache.org/jira/browse/LUCENE-8960
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ignacio Vera


Currently LatLonDocValuesField contains queries for bounding box and circle. 
This issue adds a polygon query as well.
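
For readers unfamiliar with what such a query has to check per document, the per-point test can be sketched with the classic ray-casting (even-odd) rule. This is not the implementation added by this issue; it is a minimal illustration that treats latitude/longitude as planar coordinates, ignores holes and dateline crossing, and uses hypothetical names.

```java
final class PointInPolygonDemo {
  /**
   * Even-odd ray casting: count how many polygon edges a ray starting at the
   * point and heading towards increasing longitude crosses. An odd count
   * means the point is inside.
   */
  static boolean contains(double[] polyLats, double[] polyLons, double lat, double lon) {
    boolean inside = false;
    int n = polyLats.length;
    for (int i = 0, j = n - 1; i < n; j = i++) {
      // Does the edge (j -> i) straddle the point's latitude?
      boolean straddles = (polyLats[i] > lat) != (polyLats[j] > lat);
      if (straddles) {
        // Longitude at which the edge reaches the point's latitude.
        double lonAtLat = polyLons[i]
            + (lat - polyLats[i]) * (polyLons[j] - polyLons[i]) / (polyLats[j] - polyLats[i]);
        if (lon < lonAtLat) {
          inside = !inside;
        }
      }
    }
    return inside;
  }
}
```

A doc-values based query would apply a test of this kind to each candidate document's stored point, which is why it suits cases where another clause has already narrowed the candidates.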







[jira] [Commented] (LUCENE-8955) Move compare logic to IntersectVisitor in NearestNeighbor

2019-08-27 Thread Ignacio Vera (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916777#comment-16916777
 ] 

Ignacio Vera commented on LUCENE-8955:
--

Nice speedup of the 10 nearest points query on the geo benchmarks (65%):

https://home.apache.org/~mikemccand/geobench.html#search-nearest_10

> Move compare logic to IntersectVisitor in NearestNeighbor
> -
>
> Key: LUCENE-8955
> URL: https://issues.apache.org/jira/browse/LUCENE-8955
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Minor
> Fix For: 8.3
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Similar to LUCENE-8914, move compare logic to the IntersectVisitor so we can 
> take advantage of the improvement added in LUCENE-7862. I ran the 
> geoBenchmark for nearest 10 locally and the change provides an improvement of 
> around 30%. 






[jira] [Resolved] (LUCENE-8955) Move compare logic to IntersectVisitor in NearestNeighbor

2019-08-26 Thread Ignacio Vera (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-8955.
--
Fix Version/s: 8.3
 Assignee: Ignacio Vera
   Resolution: Fixed

> Move compare logic to IntersectVisitor in NearestNeighbor
> -
>
> Key: LUCENE-8955
> URL: https://issues.apache.org/jira/browse/LUCENE-8955
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Minor
> Fix For: 8.3
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Similar to LUCENE-8914, move compare logic to the IntersectVisitor so we can 
> take advantage of the improvement added in LUCENE-7862. I ran the 
> geoBenchmark for nearest 10 locally and the change provides an improvement of 
> around 30%. 






[jira] [Resolved] (LUCENE-8952) Use a sort key instead of true distance in NearestNeighbors.

2019-08-23 Thread Ignacio Vera (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-8952.
--
Fix Version/s: 8.3
 Assignee: Ignacio Vera
   Resolution: Fixed

> Use a sort key instead of true distance in NearestNeighbors.
> 
>
> Key: LUCENE-8952
> URL: https://issues.apache.org/jira/browse/LUCENE-8952
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Julie Tibshirani
>Assignee: Ignacio Vera
>Priority: Minor
> Fix For: 8.3
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The NearestNeighbors class contains a TODO to switch to 
> SloppyMath.haversinSortKey when comparing candidate nearest neighbors. This 
> change is not high priority, but could be a nice way to get more familiar 
> with the kNN search implementation.
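
The idea behind a sort key can be sketched as follows. This is a self-contained illustration and not Lucene's SloppyMath code: it computes the inner haversine term, which grows monotonically with the great-circle distance, so candidates can be compared without the final square root and arcsine. The class name, method names, and the Earth radius constant are assumptions made for the example.

```java
final class HaversineSortKeyDemo {
  // Assumed mean Earth radius in meters, used only to convert a key to a distance.
  static final double EARTH_RADIUS_METERS = 6_371_008.8;

  /** Haversine term h in [0, 1]; ordering by h equals ordering by distance. */
  static double sortKey(double lat1, double lon1, double lat2, double lon2) {
    double sinHalfLat = Math.sin(Math.toRadians(lat2 - lat1) / 2);
    double sinHalfLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
    return sinHalfLat * sinHalfLat
        + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
            * sinHalfLon * sinHalfLon;
  }

  /** Converts a sort key to meters; only needed for the hits that are finally returned. */
  static double toMeters(double sortKey) {
    return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(Math.min(1.0, sortKey)));
  }
}
```

With this split, the priority queue of candidate neighbors compares keys only, and the more expensive conversion runs once per returned hit.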






[jira] [Comment Edited] (SOLR-13452) Update the lucene-solr build from Ivy+Ant+Maven (shadow build) to Gradle.

2019-08-22 Thread Ignacio Vera (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913088#comment-16913088
 ] 

Ignacio Vera edited comment on SOLR-13452 at 8/22/19 7:55 AM:
--

I had a look at the implementation and found that the new Lucene modules 
monitor and Luke seem not to be included in the Gradle build.

And a question: what is the Gradle command equivalent to ant precommit?


was (Author: ivera):
I had a look to the implementation and I found that the new Lucene modules 
monitor and Luke seems not to be included in the cradle build.

And a question: which is the griddle command similar to ant precommit?

> Update the lucene-solr build from Ivy+Ant+Maven (shadow build) to Gradle.
> -
>
> Key: SOLR-13452
> URL: https://issues.apache.org/jira/browse/SOLR-13452
> Project: Solr
>  Issue Type: Improvement
>  Components: Build
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: gradle-build.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I took some things from the great work that Dat did in 
> [https://github.com/apache/lucene-solr/tree/jira/gradle] and took the ball a 
> little further.
>  
> When working with gradle in sub modules directly, I recommend 
> [https://github.com/dougborg/gdub]
> This gradle branch uses the following plugin for version locking, version 
> configuration and version consistency across modules: 
> [https://github.com/palantir/gradle-consistent-versions]
>  
> https://github.com/apache/lucene-solr/tree/jira/SOLR-13452_gradle_5






[jira] [Commented] (SOLR-13452) Update the lucene-solr build from Ivy+Ant+Maven (shadow build) to Gradle.

2019-08-22 Thread Ignacio Vera (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913088#comment-16913088
 ] 

Ignacio Vera commented on SOLR-13452:
-

I had a look at the implementation and found that the new Lucene modules 
monitor and Luke seem not to be included in the Gradle build.

And a question: what is the Gradle command equivalent to ant precommit?

> Update the lucene-solr build from Ivy+Ant+Maven (shadow build) to Gradle.
> -
>
> Key: SOLR-13452
> URL: https://issues.apache.org/jira/browse/SOLR-13452
> Project: Solr
>  Issue Type: Improvement
>  Components: Build
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: gradle-build.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I took some things from the great work that Dat did in 
> [https://github.com/apache/lucene-solr/tree/jira/gradle] and took the ball a 
> little further.
>  
> When working with gradle in sub modules directly, I recommend 
> [https://github.com/dougborg/gdub]
> This gradle branch uses the following plugin for version locking, version 
> configuration and version consistency across modules: 
> [https://github.com/palantir/gradle-consistent-versions]
>  
> https://github.com/apache/lucene-solr/tree/jira/SOLR-13452_gradle_5






[jira] [Created] (LUCENE-8955) Move compare logic to IntersectVisitor in NearestNeighbor

2019-08-22 Thread Ignacio Vera (Jira)
Ignacio Vera created LUCENE-8955:


 Summary: Move compare logic to IntersectVisitor in NearestNeighbor
 Key: LUCENE-8955
 URL: https://issues.apache.org/jira/browse/LUCENE-8955
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ignacio Vera


Similar to LUCENE-8914, move compare logic to the IntersectVisitor so we can 
take advantage of the improvement added in LUCENE-7862. I ran the geoBenchmark 
for nearest 10 locally and the change provides an improvement of around 30%. 
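
A rough sketch of what moving the compare logic into the visitor can look like is given below. It is not the committed change: the Relation enum is a hypothetical two-value stand-in for the relation a visitor reports to the tree traversal, the points are plain 2D doubles, and all names are invented for the example. The visitor keeps the k best hits in a max-heap and tells the traversal to skip any cell whose closest possible point is already worse than the current k-th best.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

final class NearestVisitorSketch {
  enum Relation { CELL_OUTSIDE_QUERY, CELL_CROSSES_QUERY }

  private final double queryX;
  private final double queryY;
  private final int k;
  // Max-heap of squared distances: the head is the worst of the current k best hits.
  private final PriorityQueue<Double> best = new PriorityQueue<>(Comparator.reverseOrder());

  NearestVisitorSketch(double queryX, double queryY, int k) {
    this.queryX = queryX;
    this.queryY = queryY;
    this.k = k;
  }

  /** Called per tree cell: prune it if it cannot contain a better hit. */
  Relation compare(double minX, double minY, double maxX, double maxY) {
    if (best.size() < k) {
      return Relation.CELL_CROSSES_QUERY; // not enough hits yet, must descend
    }
    double dx = Math.max(0, Math.max(minX - queryX, queryX - maxX));
    double dy = Math.max(0, Math.max(minY - queryY, queryY - maxY));
    return dx * dx + dy * dy > best.peek()
        ? Relation.CELL_OUTSIDE_QUERY
        : Relation.CELL_CROSSES_QUERY;
  }

  /** Called per point in cells that were not pruned. */
  void visit(double x, double y) {
    double d = (x - queryX) * (x - queryX) + (y - queryY) * (y - queryY);
    if (best.size() < k) {
      best.add(d);
    } else if (d < best.peek()) {
      best.poll();
      best.add(d);
    }
  }

  /** Squared distance of the current k-th best hit, or infinity if the heap is empty. */
  double worstDistanceSquared() {
    return best.isEmpty() ? Double.POSITIVE_INFINITY : best.peek();
  }
}
```

Keeping the pruning decision inside compare lets the generic tree traversal do the walking, so the nearest-neighbor code needs no traversal loop of its own.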






[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values

2019-07-26 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893763#comment-16893763
 ] 

Ignacio Vera commented on LUCENE-8928:
--

I tried to see the effect running on 1D ranges and it is the same as above. So 
+1 to apply this change only when numDims > 2 as it seems the right trade off.

> BKDWriter could make splitting decisions based on the actual range of values
> 
>
> Key: LUCENE-8928
> URL: https://issues.apache.org/jira/browse/LUCENE-8928
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> Currently BKDWriter assumes that splitting on one dimension has no effect on 
> values in other dimensions. While this may be ok for geo points, this is 
> usually not true for ranges (or geo shapes, which are ranges too). Maybe we 
> could get better indexing by re-computing the range of values on each 
> dimension before making the choice of the split dimension?
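
The proposal can be sketched roughly as follows. This is a simplified illustration, not BKDWriter code: it uses int coordinates instead of packed byte values, and the class and method names are hypothetical. The point is that the min/max per dimension is recomputed over the points that actually fall in the current node, rather than being inherited from the parent's bounds.

```java
final class SplitDimensionDemo {
  /**
   * Chooses the dimension with the widest actual value range among
   * points[from, to). Ties go to the lowest dimension.
   */
  static int chooseSplitDim(int[][] points, int from, int to) {
    int numDims = points[from].length;
    int bestDim = 0;
    long bestRange = -1;
    for (int dim = 0; dim < numDims; dim++) {
      int min = Integer.MAX_VALUE;
      int max = Integer.MIN_VALUE;
      for (int i = from; i < to; i++) {
        min = Math.min(min, points[i][dim]);
        max = Math.max(max, points[i][dim]);
      }
      long range = (long) max - min; // widen to long to avoid int overflow
      if (range > bestRange) {
        bestRange = range;
        bestDim = dim;
      }
    }
    return bestDim;
  }
}
```

The extra scan per node is the cost side of the trade-off discussed in this thread: it can give better splits for correlated dimensions such as ranges, at the price of slower indexing.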



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values

2019-07-26 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893763#comment-16893763
 ] 

Ignacio Vera edited comment on LUCENE-8928 at 7/26/19 12:02 PM:


I tried to see the effect running on 1D ranges and it is the same as above. So 
+1 to apply this change only when numDims > 2 as it seems the right trade off.


was (Author: ivera):
I tried to see the effect running on 1D ranges and it is the same as above. So 
+1 to apply this change only when numDims > 2 as it seems the right tree off.

> BKDWriter could make splitting decisions based on the actual range of values
> 
>
> Key: LUCENE-8928
> URL: https://issues.apache.org/jira/browse/LUCENE-8928
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> Currently BKDWriter assumes that splitting on one dimension has no effect on 
> values in other dimensions. While this may be ok for geo points, this is 
> usually not true for ranges (or geo shapes, which are ranges too). Maybe we 
> could get better indexing by re-computing the range of values on each 
> dimension before making the choice of the split dimension?






[jira] [Commented] (LUCENE-8369) Remove the spatial module as it is obsolete

2019-07-26 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893752#comment-16893752
 ] 

Ignacio Vera commented on LUCENE-8369:
--

My best example is LUCENE-8746: I am trying to refactor the classes that 
contain the spatial logic and having them in different packages makes it very 
difficult.

In addition, how do we assess what is common and what is exotic? Maybe 
pointInBox (which is a range anyway) is the most common case but pointInPolygon 
might start moving into the exotic area.

> Remove the spatial module as it is obsolete
> ---
>
> Key: LUCENE-8369
> URL: https://issues.apache.org/jira/browse/LUCENE-8369
> Project: Lucene - Core
>  Issue Type: Task
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Attachments: LUCENE-8369.patch
>
>
> The "spatial" module is at this juncture nearly empty with only a couple 
> utilities that aren't used by anything in the entire codebase -- 
> GeoRelationUtils, and MortonEncoder.  Perhaps it should have been removed 
> earlier in LUCENE-7664 which was the removal of GeoPointField which was 
> essentially why the module existed.  Better late than never.






[jira] [Commented] (LUCENE-8369) Remove the spatial module as it is obsolete

2019-07-25 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892785#comment-16892785
 ] 

Ignacio Vera commented on LUCENE-8369:
--

My fear of having LatLonPoint in a different package from other spatial fields 
is code sharing. Most of the code used by LatLonPoint is reused when dealing 
with more complex shapes, and having them in different packages hurts the API 
(objects that should be package-private become public because of the need to 
reuse them). This has already been the case when working in the sandbox.

> Remove the spatial module as it is obsolete
> ---
>
> Key: LUCENE-8369
> URL: https://issues.apache.org/jira/browse/LUCENE-8369
> Project: Lucene - Core
>  Issue Type: Task
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Attachments: LUCENE-8369.patch
>
>
> The "spatial" module is at this juncture nearly empty with only a couple 
> utilities that aren't used by anything in the entire codebase -- 
> GeoRelationUtils, and MortonEncoder.  Perhaps it should have been removed 
> earlier in LUCENE-7664 which was the removal of GeoPointField which was 
> essentially why the module existed.  Better late than never.






[jira] [Comment Edited] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values

2019-07-22 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890173#comment-16890173
 ] 

Ignacio Vera edited comment on LUCENE-8928 at 7/22/19 6:11 PM:
---

I ran this approach locally. It also helps quite a bit in the case of Geo3D 
(the 3-dimension case). I tried different approaches to make indexing faster 
but so far no luck:

 
||Approach||Index time (sec)||Index time (sec)|| ||Force merge time (sec)||Force merge time (sec)|| ||Index size (GB)||Index size (GB)|| ||Reader heap (MB)||Reader heap (MB)|| ||
|| ||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|points|181.1s|124.4s|46%|76.9s|53.5s|44%|0.55|0.55|-0%|1.57|1.57|0%|
|shapes|327.4s|215.4s|52%|168.9s|120.2s|40%|1.28|1.29|-1%|1.62|1.61|0%|
|geo3d|211.9s|154.7s|37%|94.3s|66.4s|42%|0.75|0.75|-0%|1.58|1.58|0%|

 


 
||Approach||Shape||M hits/sec||M hits/sec|| ||QPS||QPS|| ||Hit count||Hit count|| ||
|| || ||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|points|box|94.34|94.84|-1%|95.99|96.50|-1%|221118844|221118844| 0%|
|points|polyRussia|20.07|20.46|-2%|5.72|5.83|-2%|3508846|3508846| 0%|
|points|poly 10|88.64|87.56| 1%|56.05|55.37| 1%|355809475|355809475| 0%|
|points|polyMedium|10.47|10.54|-1%|128.26|129.15|-1%|2693559|2693559| 0%|
|points|box|94.34|94.84|-1%|95.99|96.50|-1%|221118844|221118844| 0%|
|points|distance|93.48|95.96|-3%|54.92|56.38|-3%|382961957|382961957| 0%|
|points|nearest 10|0.10|0.09|11%|9687.24|8755.72|11%|60844404|60844404| 0%|
|points|sort|43.12|43.04| 0%|43.88|43.80| 0%|221118844|221118844| 0%|
|shapes|box|66.02|52.23|26%|67.18|53.15|26%|221118844|221118844| 0%|
|shapes|polyRussia|11.57|9.85|17%|3.30|2.81|17%|3508846|3508846| 0%|
|shapes|poly 10|54.98|47.08|17%|34.77|29.77|17%|355809475|355809475| 0%|
|shapes|polyMedium|5.31|4.52|17%|65.01|55.39|17%|2693559|2693559| 0%|
|shapes|box|66.02|52.23|26%|67.18|53.15|26%|221118844|221118844| 0%|
|geo3d|box|79.17|66.22|20%|80.56|67.38|20%|221118844|221118844| 0%|
|geo3d|polyRussia|0.95|0.90| 5%|0.27|0.26| 5%|3508671|3508671| 0%|
|geo3d|poly 10|77.26|57.16|35%|48.85|36.14|35%|355855227|355855227| 0%|
|geo3d|polyMedium|0.95|0.69|37%|11.62|8.50|37%|2693545|2693545| 0%|
|geo3d|box|79.17|66.22|20%|80.56|67.38|20%|221118844|221118844| 0%|
|geo3d|distance|95.35|76.17|25%|55.96|44.70|25%|383371884|383371884| 0%|

 


was (Author: ivera):
I ran this approach locally. It also helps quite a bit in the case of Geo3D 
(the 3-dimension case). I tried different approaches to make indexing faster 
but so far no luck:

 
||Approach||Index time (sec)||Index time (sec)||Force merge time (sec)||Force 
merge time (sec)||Index size (GB)||Index size (GB)||Reader heap (MB)||Reader 
heap (MB)||
|| ||Dev||Base||Diff||Dev||Base||diff||Dev||Base||Diff||Dev||Base||Diff||
|points|181.1s|124.4s|46%|76.9s|53.5s|44%|0.55|0.55|-0%|1.57|1.57|0%|
|shapes|327.4s|215.4s|52%|168.9s|120.2s|40%|1.28|1.29|-1%|1.62|1.61|0%|
|geo3d|211.9s|154.7s|37%|94.3s|66.4s|42%|0.75|0.75|-0%|1.58|1.58|0%|

 


 
||Approach||Shape||M hits/sec||M hits/sec|| ||QPS  ||QPS ||   ||Hit 
count  ||Hit count|| 
 ||  ||  ||Dev||Base ||Diff||Dev||Base||Diff||Dev||Base||Diff||
|points|box|94.34|94.84|-1%|95.99|96.50|-1%|221118844|221118844| 0%|
|points|polyRussia|20.07|20.46|-2%|5.72|5.83|-2%|3508846|3508846| 0%|
|points|poly 10|88.64|87.56| 1%|56.05|55.37| 1%|355809475|355809475| 0%|
|points|polyMedium|10.47|10.54|-1%|128.26|129.15|-1%|2693559|2693559| 0%|
|points|box|94.34|94.84|-1%|95.99|96.50|-1%|221118844|221118844| 0%|
|points|distance|93.48|95.96|-3%|54.92|56.38|-3%|382961957|382961957| 0%|
|points|nearest 10|0.10|0.09|11%|9687.24|8755.72|11%|60844404|60844404| 0%|
|points|sort|43.12|43.04| 0%|43.88|43.80| 0%|221118844|221118844| 0%|
|shapes|box|66.02|52.23|26%|67.18|53.15|26%|221118844|221118844| 0%|
|shapes|polyRussia|11.57|9.85|17%|3.30|2.81|17%|3508846|3508846| 0%|
|shapes|poly 10|54.98|47.08|17%|34.77|29.77|17%|355809475|355809475| 0%|
|shapes|polyMedium|5.31|4.52|17%|65.01|55.39|17%|2693559|2693559| 0%|
|shapes|box|66.02|52.23|26%|67.18|53.15|26%|221118844|221118844| 0%|
|geo3d|box|79.17|66.22|20%|80.56|67.38|20%|221118844|221118844| 0%|
|geo3d|polyRussia|0.95|0.90| 5%|0.27|0.26| 5%|3508671|3508671| 0%|
|geo3d|poly 10|77.26|57.16|35%|48.85|36.14|35%|355855227|355855227| 0%|
|geo3d|polyMedium|0.95|0.69|37%|11.62|8.50|37%|2693545|2693545| 0%|
|geo3d|box|79.17|66.22|20%|80.56|67.38|20%|221118844|221118844| 0%|
|geo3d|distance|95.35|76.17|25%|55.96|44.70|25%|383371884|383371884| 0%|

 

> BKDWriter could make splitting decisions based on the actual range of values
> 
>
> Key: LUCENE-8928
> URL: https://issues.apache.org/jira/browse/LUCENE-8928
> Project: Lucene - Core
> 

[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values

2019-07-22 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890214#comment-16890214
 ] 

Ignacio Vera commented on LUCENE-8928:
--

1-dimensional ranges (2D in total). 

In fact the shapes benchmark above is a 2-dimensional range (4D in total).

 

> BKDWriter could make splitting decisions based on the actual range of values
> 
>
> Key: LUCENE-8928
> URL: https://issues.apache.org/jira/browse/LUCENE-8928
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> Currently BKDWriter assumes that splitting on one dimension has no effect on 
> values in other dimensions. While this may be ok for geo points, this is 
> usually not true for ranges (or geo shapes, which are ranges too). Maybe we 
> could get better indexing by re-computing the range of values on each 
> dimension before making the choice of the split dimension?






[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values

2019-07-22 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890194#comment-16890194
 ] 

Ignacio Vera commented on LUCENE-8928:
--

I would expect some increase in QPS for 2D when there is correlation between 
the dimensions, e.g. range fields.

> BKDWriter could make splitting decisions based on the actual range of values
> 
>
> Key: LUCENE-8928
> URL: https://issues.apache.org/jira/browse/LUCENE-8928
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> Currently BKDWriter assumes that splitting on one dimension has no effect on 
> values in other dimensions. While this may be ok for geo points, this is 
> usually not true for ranges (or geo shapes, which are ranges too). Maybe we 
> could get better indexing by re-computing the range of values on each 
> dimension before making the choice of the split dimension?






[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values

2019-07-22 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890173#comment-16890173
 ] 

Ignacio Vera commented on LUCENE-8928:
--

I ran this approach locally. It also helps quite a bit in the Geo3D case (3 
dimensions). I tried different approaches to make indexing faster, but so far 
no luck:

 
||Approach||Index time (sec)||Index time (sec)|| ||Force merge time (sec)||Force merge time (sec)|| ||Index size (GB)||Index size (GB)|| ||Reader heap (MB)||Reader heap (MB)|| ||
|| ||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|points|181.1s|124.4s|46%|76.9s|53.5s|44%|0.55|0.55|-0%|1.57|1.57|0%|
|shapes|327.4s|215.4s|52%|168.9s|120.2s|40%|1.28|1.29|-1%|1.62|1.61|0%|
|geo3d|211.9s|154.7s|37%|94.3s|66.4s|42%|0.75|0.75|-0%|1.58|1.58|0%|

 


 
||Approach||Shape||M hits/sec||M hits/sec|| ||QPS||QPS|| ||Hit count||Hit count|| ||
|| || ||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|points|box|94.34|94.84|-1%|95.99|96.50|-1%|221118844|221118844| 0%|
|points|polyRussia|20.07|20.46|-2%|5.72|5.83|-2%|3508846|3508846| 0%|
|points|poly 10|88.64|87.56| 1%|56.05|55.37| 1%|355809475|355809475| 0%|
|points|polyMedium|10.47|10.54|-1%|128.26|129.15|-1%|2693559|2693559| 0%|
|points|box|94.34|94.84|-1%|95.99|96.50|-1%|221118844|221118844| 0%|
|points|distance|93.48|95.96|-3%|54.92|56.38|-3%|382961957|382961957| 0%|
|points|nearest 10|0.10|0.09|11%|9687.24|8755.72|11%|60844404|60844404| 0%|
|points|sort|43.12|43.04| 0%|43.88|43.80| 0%|221118844|221118844| 0%|
|shapes|box|66.02|52.23|26%|67.18|53.15|26%|221118844|221118844| 0%|
|shapes|polyRussia|11.57|9.85|17%|3.30|2.81|17%|3508846|3508846| 0%|
|shapes|poly 10|54.98|47.08|17%|34.77|29.77|17%|355809475|355809475| 0%|
|shapes|polyMedium|5.31|4.52|17%|65.01|55.39|17%|2693559|2693559| 0%|
|shapes|box|66.02|52.23|26%|67.18|53.15|26%|221118844|221118844| 0%|
|geo3d|box|79.17|66.22|20%|80.56|67.38|20%|221118844|221118844| 0%|
|geo3d|polyRussia|0.95|0.90| 5%|0.27|0.26| 5%|3508671|3508671| 0%|
|geo3d|poly 10|77.26|57.16|35%|48.85|36.14|35%|355855227|355855227| 0%|
|geo3d|polyMedium|0.95|0.69|37%|11.62|8.50|37%|2693545|2693545| 0%|
|geo3d|box|79.17|66.22|20%|80.56|67.38|20%|221118844|221118844| 0%|
|geo3d|distance|95.35|76.17|25%|55.96|44.70|25%|383371884|383371884| 0%|

 

> BKDWriter could make splitting decisions based on the actual range of values
> 
>
> Key: LUCENE-8928
> URL: https://issues.apache.org/jira/browse/LUCENE-8928
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> Currently BKDWriter assumes that splitting on one dimension has no effect on 
> values in other dimensions. While this may be ok for geo points, this is 
> usually not true for ranges (or geo shapes, which are ranges too). Maybe we 
> could get better indexing by re-computing the range of values on each 
> dimension before making the choice of the split dimension?
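
For illustration only, the idea in the description could be sketched as below. This is not the BKDWriter code: the class and method names are made up, and points are plain int arrays rather than packed bytes. The sketch re-computes the per-dimension bounds from the points that actually fall in a cell and picks the dimension with the widest range.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Lucene): picks a split dimension from actual values. */
final class SplitDimensionSketch {

  /** Returns the dimension whose values in points[from, to) span the widest range. */
  static int widestDimension(int[][] points, int from, int to, int numDims) {
    int[] min = new int[numDims];
    int[] max = new int[numDims];
    Arrays.fill(min, Integer.MAX_VALUE);
    Arrays.fill(max, Integer.MIN_VALUE);
    // Re-compute the bounds from the points that really live in this cell,
    // instead of reusing bounds inherited from the parent cell.
    for (int i = from; i < to; i++) {
      for (int d = 0; d < numDims; d++) {
        min[d] = Math.min(min[d], points[i][d]);
        max[d] = Math.max(max[d], points[i][d]);
      }
    }
    int best = 0;
    long bestRange = -1;
    for (int d = 0; d < numDims; d++) {
      long range = (long) max[d] - min[d]; // widen to long to avoid int overflow
      if (range > bestRange) {
        bestRange = range;
        best = d;
      }
    }
    return best;
  }
}
```

With correlated dimensions, a sub-cell can be narrow in a dimension where the whole data set is wide, so the choice may differ between the full set and a subset of the points.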






[jira] [Resolved] (LUCENE-8913) Reproducing failure in various TestLatLon* equals/hashcode tests

2019-07-17 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-8913.
--
   Resolution: Fixed
 Assignee: Ignacio Vera
Fix Version/s: 8.3
   8.2
   master (9.0)

> Reproducing failure in various TestLatLon*  equals/hashcode tests
> -
>
> Key: LUCENE-8913
> URL: https://issues.apache.org/jira/browse/LUCENE-8913
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/other
>Affects Versions: master (9.0)
>Reporter: Gus Heck
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: master (9.0), 8.2, 8.3
>
>
> Bumped into this while running tests locally
> ant clean test -Dtests.seed=41D0C5A80C823307 -Dtests.slow=true 
> -Dtests.badapples=true -Dtests.locale=es-CL 
> -Dtests.timezone=Pacific/Rarotonga -Dtests.asserts=true 
> -Dtests.file.encoding=UTF-8
> reliably produces:
>  
> {code:java}
> Tests with failures [seed: 41D0C5A80C823307]:
>[junit4]   - 
> org.apache.lucene.document.TestLatLonPointShapeQueries.testBoxQueryEqualsAndHashcode
>[junit4]   - 
> org.apache.lucene.document.TestLatLonMultiPointShapeQueries.testBoxQueryEqualsAndHashcode
>[junit4]   - 
> org.apache.lucene.document.TestLatLonLineShapeQueries.testBoxQueryEqualsAndHashcode
>[junit4]   - 
> org.apache.lucene.document.TestLatLonPolygonShapeQueries.testBoxQueryEqualsAndHashcode
>[junit4]   - 
> org.apache.lucene.document.TestLatLonMultiPolygonShapeQueries.testBoxQueryEqualsAndHashcode
>[junit4]   - 
> org.apache.lucene.document.TestLatLonMultiLineShapeQueries.testBoxQueryEqualsAndHashcode{code}
>  






[jira] [Commented] (LUCENE-8913) Reproducing failure in various TestLatLon* equals/hashcode tests

2019-07-17 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887627#comment-16887627
 ] 

Ignacio Vera commented on LUCENE-8913:
--

I will fix this one, as it is a trivial test bug.

> Reproducing failure in various TestLatLon*  equals/hashcode tests
> -
>
> Key: LUCENE-8913
> URL: https://issues.apache.org/jira/browse/LUCENE-8913
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/other
>Affects Versions: master (9.0)
>Reporter: Gus Heck
>Priority: Major
>
> Bumped into this while running tests locally
> ant clean test -Dtests.seed=41D0C5A80C823307 -Dtests.slow=true 
> -Dtests.badapples=true -Dtests.locale=es-CL 
> -Dtests.timezone=Pacific/Rarotonga -Dtests.asserts=true 
> -Dtests.file.encoding=UTF-8
> reliably produces:
>  
> {code:java}
> Tests with failures [seed: 41D0C5A80C823307]:
>[junit4]   - 
> org.apache.lucene.document.TestLatLonPointShapeQueries.testBoxQueryEqualsAndHashcode
>[junit4]   - 
> org.apache.lucene.document.TestLatLonMultiPointShapeQueries.testBoxQueryEqualsAndHashcode
>[junit4]   - 
> org.apache.lucene.document.TestLatLonLineShapeQueries.testBoxQueryEqualsAndHashcode
>[junit4]   - 
> org.apache.lucene.document.TestLatLonPolygonShapeQueries.testBoxQueryEqualsAndHashcode
>[junit4]   - 
> org.apache.lucene.document.TestLatLonMultiPolygonShapeQueries.testBoxQueryEqualsAndHashcode
>[junit4]   - 
> org.apache.lucene.document.TestLatLonMultiLineShapeQueries.testBoxQueryEqualsAndHashcode{code}
>  






[jira] [Resolved] (LUCENE-8914) Small improvement in FloatPointNearestNeighbor

2019-07-17 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-8914.
--
   Resolution: Fixed
 Assignee: Ignacio Vera
Fix Version/s: 8.3
   master (9.0)

> Small improvement in FloatPointNearestNeighbor
> --
>
> Key: LUCENE-8914
> URL: https://issues.apache.org/jira/browse/LUCENE-8914
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Minor
> Fix For: master (9.0), 8.3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the logic to visit inner nodes of the BKD tree in 
> FloatPointNearestNeighbor lives in the custom tree-traversal code instead of 
> in the IntersectVisitor. This approach misses the improvement added in 
> LUCENE-7862, which in my experiments gives a performance improvement of 
> around 10% for a high number of dimensions.
> This change proposes to move the logic for discarding inner nodes to the 
> IntersectVisitor.






[jira] [Commented] (LUCENE-8923) Release procedure does not add new version in CHANGES.txt in master

2019-07-17 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886967#comment-16886967
 ] 

Ignacio Vera commented on LUCENE-8923:
--

I added the entries. I am leaving the issue open so we can clarify whether the 
current procedure needs to be updated to add these entries.

> Release procedure does not add new version in CHANGES.txt in master
> ---
>
> Key: LUCENE-8923
> URL: https://issues.apache.org/jira/browse/LUCENE-8923
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Minor
> Attachments: LUCENE-8923.patch
>
>
> This issue is just to track something that may be missing in the release 
> procedure. It currently adds a new version to CHANGES.txt in the minor 
> version branch, but not in master.






[jira] [Updated] (LUCENE-8923) Release procedure does not add new version in CHANGES.txt in master

2019-07-17 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera updated LUCENE-8923:
-
Attachment: LUCENE-8923.patch
Status: Open  (was: Open)

In the meantime I propose to create them manually. I have attached a patch.

[~tomoko] I am moving your issues to Lucene 8.3 in master; let me know if that 
is correct.

> Release procedure does not add new version in CHANGES.txt in master
> ---
>
> Key: LUCENE-8923
> URL: https://issues.apache.org/jira/browse/LUCENE-8923
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Minor
> Attachments: LUCENE-8923.patch
>
>
> This issue is just to track something that may be missing in the release 
> procedure. It currently adds a new version to CHANGES.txt in the minor 
> version branch, but not in master.






[jira] [Created] (LUCENE-8923) Release procedure does not add new version in CHANGES.txt in master

2019-07-17 Thread Ignacio Vera (JIRA)
Ignacio Vera created LUCENE-8923:


 Summary: Release procedure does not add new version in CHANGES.txt 
in master
 Key: LUCENE-8923
 URL: https://issues.apache.org/jira/browse/LUCENE-8923
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Ignacio Vera


This issue is just to track something that may be missing in the release 
procedure. It currently adds a new version to CHANGES.txt in the minor version 
branch, but not in master.






[jira] [Commented] (LUCENE-8911) Backport LUCENE-8778 (improved analysis SPI name handling) to 8.x

2019-07-16 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886058#comment-16886058
 ] 

Ignacio Vera commented on LUCENE-8911:
--

I have my doubts about this change making it into 8.2. Elasticsearch CI has 
reported a failure that I think is related to this change:
{code:java}
ant test  -Dtestcase=TestFactories -Dtests.method=test 
-Dtests.seed=FEA8D71DFC111060 -Dtests.slow=true -Dtests.badapples=true 
-Dtests.locale=zh-TW -Dtests.timezone=Europe/Guernsey -Dtests.asserts=true 
-Dtests.file.encoding=ISO-8859-1{code}

> Backport LUCENE-8778 (improved analysis SPI name handling) to 8.x
> -
>
> Key: LUCENE-8911
> URL: https://issues.apache.org/jira/browse/LUCENE-8911
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In LUCENE-8907 I reverted LUCENE-8778 from the 8x branch.
> Can we backport it to 8x branch again, with transparent backwards 
> compatibility (by emulating the factory loading method of Lucene 8.1)?
> I am not sure whether it would be better to backport the changes; 
> however, it may be good for Solr to have SOLR-13593 without waiting for 
> release 9.0.






[jira] [Commented] (LUCENE-8894) Add APIs to tokenizer/charfilter/tokenfilter factories to get their SPI names from concrete classes

2019-07-16 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886022#comment-16886022
 ] 

Ignacio Vera commented on LUCENE-8894:
--

In master the entry in CHANGES.txt is under Lucene 9.0.0, but in branch_8x the 
entry is under Lucene 8.3.0. Is that correct?

 

> Add APIs to tokenizer/charfilter/tokenfilter factories to get their SPI names 
> from concrete classes
> ---
>
> Key: LUCENE-8894
> URL: https://issues.apache.org/jira/browse/LUCENE-8894
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Minor
> Fix For: master (9.0), 8.2
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, reflection tricks are needed to obtain the SPI name (this is now 
> stored in a static NAME field in each factory class) from a concrete factory 
> class. While it is easy to implement that logic, it would be much better to 
> provide unified APIs to get the SPI name from a factory class. In other words, 
> the APIs would provide the "inverse" operation of the {{lookupClass(String)}} method.
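
To make the "reflection trick" mentioned above concrete, here is a minimal sketch. It only assumes what the description states, namely that each factory class exposes a static NAME field; `SpiNameSketch`, `ExampleFactory` and `readSpiName` are hypothetical names used for this example and are not Lucene APIs.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

/** Hypothetical illustration (not a Lucene API) of reading a factory's static NAME field. */
final class SpiNameSketch {

  /** Stand-in factory class, defined here only so the example is self-contained. */
  static final class ExampleFactory {
    public static final String NAME = "example";
  }

  /** Returns the value of the public static String field NAME declared by the class. */
  static String readSpiName(Class<?> factoryClass) {
    try {
      Field field = factoryClass.getField("NAME");
      if (!Modifier.isStatic(field.getModifiers()) || field.getType() != String.class) {
        throw new IllegalArgumentException("NAME is not a static String field");
      }
      return (String) field.get(null); // null receiver because the field is static
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("No accessible NAME field on " + factoryClass, e);
    }
  }
}
```

A unified API as proposed in the issue would hide this lookup behind a single method, so that callers do not need to repeat the reflection code.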






[jira] [Created] (LUCENE-8914) Small improvement in FloatPointNearestNeighbor

2019-07-14 Thread Ignacio Vera (JIRA)
Ignacio Vera created LUCENE-8914:


 Summary: Small improvement in FloatPointNearestNeighbor
 Key: LUCENE-8914
 URL: https://issues.apache.org/jira/browse/LUCENE-8914
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ignacio Vera


Currently the logic to visit inner nodes of the BKD tree in 
FloatPointNearestNeighbor lives in the custom tree-traversal code instead of in 
the IntersectVisitor. This approach misses the improvement added in 
LUCENE-7862, which in my experiments gives a performance improvement of around 
10% for a high number of dimensions.

This change proposes to move the logic for discarding inner nodes to the 
IntersectVisitor.
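
A rough sketch of what a visitor-side test for discarding inner nodes could look like. Assumptions: the class and method names are hypothetical, and the pruning criterion (squared distance from the query point to the cell's bounding box, compared with the worst distance currently kept among the nearest hits) is chosen for illustration; the description above does not spell out the criterion.

```java
/** Hypothetical sketch (not the Lucene class) of a cell-pruning test for nearest-neighbour search. */
final class NearestNeighborPruningSketch {

  /** Squared Euclidean distance from a point to the closest point of the box [min, max]. */
  static double minSquaredDistance(float[] point, float[] min, float[] max) {
    double sum = 0;
    for (int d = 0; d < point.length; d++) {
      double diff = 0; // stays 0 when the point lies inside the box along this dimension
      if (point[d] < min[d]) {
        diff = min[d] - point[d];
      } else if (point[d] > max[d]) {
        diff = point[d] - max[d];
      }
      sum += diff * diff;
    }
    return sum;
  }

  /** A cell can be skipped when even its closest corner or edge is farther than the worst kept hit. */
  static boolean canSkipCell(float[] origin, float[] cellMin, float[] cellMax,
                             double worstSquaredDistance) {
    return minSquaredDistance(origin, cellMin, cellMax) > worstSquaredDistance;
  }
}
```

In a visitor, a check like `canSkipCell` would be the place to report that a cell lies outside the query, so the tree traversal never descends into it.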






[jira] [Commented] (LUCENE-8898) TestRamUsageEstimator.testMap failures

2019-07-04 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878593#comment-16878593
 ] 

Ignacio Vera commented on LUCENE-8898:
--

[~ab] Can we resolve this issue?

> TestRamUsageEstimator.testMap failures
> --
>
> Key: LUCENE-8898
> URL: https://issues.apache.org/jira/browse/LUCENE-8898
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Andrzej Bialecki 
>Priority: Blocker
> Fix For: 8.2
>
>
> Here is an example failure:
> {noformat}
> 4 tests failed.
> FAILED:  org.apache.lucene.util.TestRamUsageEstimator.testMap
> Error Message:
> expected:<25152.0> but was:<30184.0>
> Stack Trace:
> java.lang.AssertionError: expected:<25152.0> but was:<30184.0>
> at 
> __randomizedtesting.SeedInfo.seed([ED7055A14021EA69:CD56E1725ADAF91B]:0)
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:834)
> at org.junit.Assert.assertEquals(Assert.java:553)
> at org.junit.Assert.assertEquals(Assert.java:683)
> at 
> org.apache.lucene.util.TestRamUsageEstimator.testMap(TestRamUsageEstimator.java:136)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
> at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
> at 
> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapte

[jira] [Updated] (LUCENE-8903) Add LatLonShape point query

2019-07-04 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera updated LUCENE-8903:
-
Description: Adds a query to LatLonShape that filters by a provided point.  
 (was: Add a query to LatLonShape that filters by a provided point. )

> Add LatLonShape point query
> ---
>
> Key: LUCENE-8903
> URL: https://issues.apache.org/jira/browse/LUCENE-8903
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Ignacio Vera
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Adds a query to LatLonShape that filters by a provided point. 






[jira] [Created] (LUCENE-8903) Add LatLonShape point query

2019-07-04 Thread Ignacio Vera (JIRA)
Ignacio Vera created LUCENE-8903:


 Summary: Add LatLonShape point query
 Key: LUCENE-8903
 URL: https://issues.apache.org/jira/browse/LUCENE-8903
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Ignacio Vera


Add a query to LatLonShape that filters by a provided point. 






[jira] [Resolved] (LUCENE-8888) Improve distribution of points with data dimension in BKD tree leaves

2019-07-04 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-.
--
   Resolution: Fixed
 Assignee: Ignacio Vera
Fix Version/s: 8.2
   master (9.0)

> Improve distribution of points with data dimension in BKD tree leaves
> -
>
> Key: LUCENE-
> URL: https://issues.apache.org/jira/browse/LUCENE-
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: master (9.0), 8.2
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> In LUCENE-8688 a new storage strategy was introduced for leaves that contain 
> duplicated points. This works well with indexed dimensions, as the process of 
> partitioning the space and the final sorting of leaves group together points 
> with equal indexed dimensions.
> This is not always the case if the points contain data dimensions. If two 
> points have the same indexed dimensions but different data dimensions, the 
> distribution in the leaves may not be optimal.
> A good example is a user indexing a bounding box using LatLonShape. 
> The resulting tessellation of a bounding box is two triangles with the same 
> indexed dimensions but different data dimensions. If two documents index the 
> same bounding box, the leaf contains the triangles from one document followed 
> by the triangles of the second document. This is because the current 
> sorting/selection algorithms use one indexed dimension and tie-break on the 
> docID.
> The optimal distribution in the case above is to group the equal triangles 
> together. Therefore, what is proposed here is to update the selection/sorting 
> algorithms to use the data dimensions, when they exist, as tie-breakers 
> before using the docID.
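
The proposed ordering can be sketched as a comparison function. This is a simplified illustration, not the BKD sorting/selection code: `TieBreakSketch` and `Point` are hypothetical, values are ints instead of packed bytes, and a single indexed value stands in for the indexed dimension being sorted on.

```java
/** Hypothetical sketch of the proposed tie-breaking order; not the Lucene implementation. */
final class TieBreakSketch {

  /** Simplified point: one indexed value, some data values and the owning document. */
  static final class Point {
    final int indexed;
    final int[] data;
    final int docID;

    Point(int indexed, int[] data, int docID) {
      this.indexed = indexed;
      this.data = data;
      this.docID = docID;
    }
  }

  /** Orders by indexed value, then by data values, and only then by docID. */
  static int compare(Point a, Point b) {
    int cmp = Integer.compare(a.indexed, b.indexed);
    if (cmp != 0) {
      return cmp;
    }
    // Tie on the indexed value: consult the data dimensions before the docID.
    // Both points are assumed to carry the same number of data values.
    for (int i = 0; i < a.data.length; i++) {
      cmp = Integer.compare(a.data[i], b.data[i]);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.docID, b.docID);
  }
}
```

With this order, two documents that index the same bounding box end up with their equal triangles next to each other in the leaf, instead of all triangles of the first document followed by all triangles of the second.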






[jira] [Resolved] (LUCENE-8896) Override default implementation of IntersectVisitor#visit(DocIDSetBuilder, byte[]) for several queries

2019-07-02 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-8896.
--
   Resolution: Fixed
 Assignee: Ignacio Vera
Fix Version/s: 8.2
   master (9.0)

> Override default implementation of IntersectVisitor#visit(DocIDSetBuilder, 
> byte[]) for several queries
> --
>
> Key: LUCENE-8896
> URL: https://issues.apache.org/jira/browse/LUCENE-8896
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: master (9.0), 8.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> LUCENE-8885 introduced a new method on the {{IntersectVisitor}} interface. 
> It has a default implementation, but queries can override it and therefore 
> benefit when several documents in a leaf are associated with the same 
> point.
> In this issue the following queries are proposed to override the default 
> implementation:
> * LatLonShapeQuery
> * RangeFieldQuery
> * LatLonPointInPolygonQuery
> * LatLonPointDistanceQuery
> * PointRangeQuery
> * PointInSetQuery






[jira] [Commented] (LUCENE-8896) Override default implementation of IntersectVisitor#visit(DocIDSetBuilder, byte[]) for several queries

2019-07-01 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876373#comment-16876373
 ] 

Ignacio Vera commented on LUCENE-8896:
--

[~atris] I am not sure what you mean. I have opened a PR with the change; I 
hope it makes sense to you.

> Override default implementation of IntersectVisitor#visit(DocIDSetBuilder, 
> byte[]) for several queries
> --
>
> Key: LUCENE-8896
> URL: https://issues.apache.org/jira/browse/LUCENE-8896
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> LUCENE-8885 introduced a new method on the {{IntersectVisitor}} interface. 
> It has a default implementation, but queries can override it and therefore 
> benefit when several documents in a leaf are associated with the same 
> point.
> In this issue the following queries are proposed to override the default 
> implementation:
> * LatLonShapeQuery
> * RangeFieldQuery
> * LatLonPointInPolygonQuery
> * LatLonPointDistanceQuery
> * PointRangeQuery
> * PointInSetQuery






[jira] [Created] (LUCENE-8896) Override default implementation of IntersectVisitor#visit(DocIDSetBuilder, byte[]) for several queries

2019-07-01 Thread Ignacio Vera (JIRA)
Ignacio Vera created LUCENE-8896:


 Summary: Override default implementation of 
IntersectVisitor#visit(DocIDSetBuilder, byte[]) for several queries
 Key: LUCENE-8896
 URL: https://issues.apache.org/jira/browse/LUCENE-8896
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ignacio Vera


LUCENE-8885 introduced a new method on the {{IntersectVisitor}} interface. 
It has a default implementation, but queries can override it and therefore 
benefit when several documents in a leaf are associated with the same point.

In this issue the following queries are proposed to override the default 
implementation:

* LatLonShapeQuery
* RangeFieldQuery
* LatLonPointInPolygonQuery
* LatLonPointDistanceQuery
* PointRangeQuery
* PointInSetQuery
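
As a sketch of what such an override buys, the example below models a cut-down visitor with a bulk method whose signature follows the int[]-based variant discussed in LUCENE-8885. The `Visitor` interface, `RangeVisitor` and the single-byte value encoding are assumptions made for this example; they are not the Lucene interface or any of the queries listed above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, self-contained model of overriding a bulk visit method. */
final class BulkVisitSketch {

  /** Cut-down stand-in for the visitor interface; only the two relevant methods are modelled. */
  interface Visitor {
    void visit(int docID, byte[] packedValue);

    /** Default behaviour: fall back to one call, and therefore one check, per document. */
    default void visit(int[] docIDs, int offset, int numberDocs, byte[] packedValue) {
      for (int i = offset; i < offset + numberDocs; i++) {
        visit(docIDs[i], packedValue);
      }
    }
  }

  /** One-dimensional range check on a single unsigned byte, for illustration only. */
  static final class RangeVisitor implements Visitor {
    final int lower;
    final int upper;
    final List<Integer> hits = new ArrayList<>();
    int comparisons = 0;

    RangeVisitor(int lower, int upper) {
      this.lower = lower;
      this.upper = upper;
    }

    private boolean inRange(byte[] packedValue) {
      comparisons++;
      int value = packedValue[0] & 0xFF;
      return value >= lower && value <= upper;
    }

    @Override
    public void visit(int docID, byte[] packedValue) {
      if (inRange(packedValue)) {
        hits.add(docID);
      }
    }

    /** Override: test the shared value once, then accept every document of the run. */
    @Override
    public void visit(int[] docIDs, int offset, int numberDocs, byte[] packedValue) {
      if (inRange(packedValue)) {
        for (int i = offset; i < offset + numberDocs; i++) {
          hits.add(docIDs[i]);
        }
      }
    }
  }
}
```

The `comparisons` counter is only there to make the saving visible: for a run of documents sharing one value, the override performs a single range check regardless of the run length.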








[jira] [Resolved] (LUCENE-8885) Optimise BKD reader by exploiting cardinality information stored on leaves

2019-06-30 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-8885.
--
   Resolution: Fixed
 Assignee: Ignacio Vera
Fix Version/s: 8.2
   master (9.0)

> Optimise BKD reader by exploiting cardinality information stored on leaves
> --
>
> Key: LUCENE-8885
> URL: https://issues.apache.org/jira/browse/LUCENE-8885
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: master (9.0), 8.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In LUCENE-8688 a new storage strategy was introduced for leaves that contain 
> duplicated points. In that case the points are stored together with their 
> cardinality. We still call the IntersectVisitor once per document, so 
> we check the same point against the query many times. The idea is to 
> check the point once and then add all the documents.
> The API of the IntersectVisitor does not allow that, so to exploit 
> this property we need to either change the API or extend it. Here are the 
> possibilities I can think of:
> 1) Modify the API by replacing the method IntersectVisitor#visit(int, byte[]) 
> by the following method:
> {code:java}
> /** Called for leaf cells that intersect the query, to test whether the point
>  *  matches the query. If it matches, the caller must call
>  *  {@link IntersectVisitor#visit(int)} for each document associated with
>  *  this point. */
> boolean matches(byte[] packedValue) throws IOException;
> {code}
> This will allow the BKD reader to check if a point matches the query and if 
> true then Coll the method IntersectVisitor#visit(int) for all documents 
> associated with that point.
> The drawback of this approach is backwards compatibility and the need to 
> update all classes implement this interface.
> 2) Extends the API by adding a new default method in the IntersectVisitor 
> interface:
> {code:java}
>  /** Called for documents in a leaf cell that crosses the query.  The consumer
>  *  should scrutinize the packedValue to decide whether to accept it.  If 
> accepted it should
>  *  consider only the {@code numberDocs} documents starting at {@code 
> offset} In the 1D case,
>  *  values are visited in increasing order, and in the case of ties, in 
> increasing
>  *  docID order. */
> default void visit(int[] docID, int offset, int numberDocs, byte[] 
> packedValue) throws IOException {
>   for ( int i =offset; i < offset + numberDocs; i++) {
> visit(docID[i], packedValue);
>   }
> }
> {code}
> The merit of this approach is that is backwards compatible and it is up to 
> the implementors to override this method and get the benefits for this 
> optimisation.The biggest downside is that it assumes that the codec has doc 
> IDs available in an int[] slice as opposed to streaming them from disk 
> directly to the IntersectVisitor for instance as [~jpountz] noted.
> Maybe there are more options I did not think about so looking forward to 
> hearing opining if we should do this change at all and if so, how to approach 
> it. My +1 goes to 1).






[jira] [Updated] (LUCENE-8831) LatLonShapeBoundingBoxQuery hashcode is wrong

2019-06-28 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera updated LUCENE-8831:
-
Fix Version/s: 8.1.2

> LatLonShapeBoundingBoxQuery hashcode is wrong 
> --
>
> Key: LUCENE-8831
> URL: https://issues.apache.org/jira/browse/LUCENE-8831
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: master (9.0), 8.2, 8.1.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the hashCode implementation for LatLonShapeBoundingBoxQuery always
> returns a different value. Therefore the query cannot be cached.
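
The issue does not say why the hash code differs between calls. One common cause in Java is hashing an array field by identity rather than by content; that this is the cause here is only an assumption. A minimal sketch with a hypothetical `BoxQuerySketch` class (not the Lucene query) shows a content-based `equals`/`hashCode` pair that would let equal queries hit a cache:

```java
import java.util.Arrays;

/** Sketch only: a hypothetical stand-in, not LatLonShapeBoundingBoxQuery. */
final class BoxQuerySketch {
  private final String field;
  private final byte[] bbox; // encoded bounding box (hypothetical layout)

  BoxQuerySketch(String field, byte[] bbox) {
    this.field = field;
    this.bbox = bbox.clone();
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof BoxQuerySketch)) {
      return false;
    }
    BoxQuerySketch other = (BoxQuerySketch) o;
    return field.equals(other.field) && Arrays.equals(bbox, other.bbox);
  }

  @Override
  public int hashCode() {
    // Arrays.hashCode hashes the array contents; calling bbox.hashCode()
    // would use the identity hash and differ for every instance.
    return 31 * field.hashCode() + Arrays.hashCode(bbox);
  }
}
```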






[jira] [Commented] (LUCENE-8831) LatLonShapeBoundingBoxQuery hashcode is wrong

2019-06-28 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874910#comment-16874910
 ] 

Ignacio Vera commented on LUCENE-8831:
--

I was thinking the same

> LatLonShapeBoundingBoxQuery hashcode is wrong 
> --
>
> Key: LUCENE-8831
> URL: https://issues.apache.org/jira/browse/LUCENE-8831
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: master (9.0), 8.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the hashCode implementation for LatLonShapeBoundingBoxQuery always
> returns a different value. Therefore the query cannot be cached.






[jira] [Updated] (LUCENE-8886) TestMutablePointsReaderUtils not doing what it is expected

2019-06-27 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera updated LUCENE-8886:
-
   Resolution: Fixed
 Assignee: Ignacio Vera
Fix Version/s: 8.2
   master (9.0)
   Status: Resolved  (was: Patch Available)

> TestMutablePointsReaderUtils not doing what it is expected
> --
>
> Key: LUCENE-8886
> URL: https://issues.apache.org/jira/browse/LUCENE-8886
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: master (9.0), 8.2
>
> Attachments: LUCENE-8886.patch
>
>
> TestMutablePointsReaderUtils is not actually doing what is expected.
> The problem is that we construct Point objects without copying the bytes
> provided, so the test always works with arrays of 0 values.






[jira] [Created] (LUCENE-8888) Improve distribution of points with data dimension in BKD tree leaves

2019-06-27 Thread Ignacio Vera (JIRA)
Ignacio Vera created LUCENE-8888:


 Summary: Improve distribution of points with data dimension in BKD 
tree leaves
 Key: LUCENE-8888
 URL: https://issues.apache.org/jira/browse/LUCENE-8888
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ignacio Vera


In LUCENE-8688 a new storing strategy was introduced for leaves that contain
duplicated points. This works well with indexed dimensions, as the process of
partitioning the space and the final sorting of leaves group points with equal
indexed dimensions together.

This is not always the case if the points contain data dimensions. If two
points have the same indexed dimensions but different data dimensions, the
distribution on the leaves might not be optimal.

A good example is a user indexing a bounding box using LatLonShape. The
resulting tessellation of a bounding box is two triangles with the same
indexed dimensions but different data dimensions. If two documents index the
same bounding box, the result in the leaf is the triangles from one document
followed by the triangles of the second document. This is because the current
sorting/selection algorithms use one indexed dimension and tie-break on the
docID.

The optimal distribution in the case above is to group the equal triangles
together. Therefore what is proposed here is to update the selection/sorting
algorithms to use the data dimensions, when they exist, as tie-breakers before
using the docID.
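
The proposed ordering can be sketched as a comparator. `Point`, `CURRENT` and `PROPOSED` below are hypothetical names for illustration, not the BKD writer's classes, and the sketch compares all indexed bytes at once rather than a single indexed dimension:

```java
import java.util.Comparator;

/** Sketch only: Point and both comparators are hypothetical, not Lucene classes. */
final class TieBreakSketch {

  static final class Point {
    final byte[] indexed; // bytes of the indexed dimensions
    final byte[] data;    // bytes of the data dimensions
    final int docID;

    Point(byte[] indexed, byte[] data, int docID) {
      this.indexed = indexed;
      this.data = data;
      this.docID = docID;
    }
  }

  /** Lexicographic comparison treating each byte as unsigned. */
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  /** Behaviour described as current: indexed bytes, then docID. */
  static final Comparator<Point> CURRENT = (p, q) -> {
    int c = compareUnsigned(p.indexed, q.indexed);
    return c != 0 ? c : Integer.compare(p.docID, q.docID);
  };

  /** Proposed: indexed bytes, then data bytes, then docID. */
  static final Comparator<Point> PROPOSED = (p, q) -> {
    int c = compareUnsigned(p.indexed, q.indexed);
    if (c == 0) {
      c = compareUnsigned(p.data, q.data);
    }
    return c != 0 ? c : Integer.compare(p.docID, q.docID);
  };
}
```

With two documents that each contribute the same two triangles, sorting with `PROPOSED` places the equal triangles of both documents next to each other, which is the grouping the low cardinality leaf format can exploit.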







[jira] [Commented] (LUCENE-8885) Optimise BKD reader by exploiting cardinality information stored on leaves

2019-06-26 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873860#comment-16873860
 ] 

Ignacio Vera commented on LUCENE-8885:
--

I have opened a PR with [~jpountz]'s suggestion.

> Optimise BKD reader by exploiting cardinality information stored on leaves
> --
>
> Key: LUCENE-8885
> URL: https://issues.apache.org/jira/browse/LUCENE-8885
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> LUCENE-8688 introduced a new storing strategy for leaves that contain
> duplicated points. In that case the points are stored together with their
> cardinality. We still call the IntersectVisitor once per document, so we
> check the same point against the query many times. The idea is to check the
> point once and then add all the documents.
> The API of the IntersectVisitor does not allow that, so to exploit that
> property we need to either change the API or extend it. Here are the
> possibilities I can think of:
> 1) Modify the API by replacing the method IntersectVisitor#visit(int, byte[])
> with the following method:
> {code:java}
>  /** Called for leaf cells that intersect the query, to test whether the
>   *  point matches the query. If it matches, {@link IntersectVisitor#visit(int)}
>   *  is then called for each document associated with this point. */
>  boolean matches(byte[] packedValue) throws IOException;
> {code}
> This will allow the BKD reader to check whether a point matches the query and,
> if it does, call the method IntersectVisitor#visit(int) for all documents
> associated with that point.
> The drawback of this approach is backwards compatibility and the need to
> update all classes that implement this interface.
> 2) Extend the API by adding a new default method to the IntersectVisitor
> interface:
> {code:java}
>  /** Called for documents in a leaf cell that crosses the query.  The consumer
>   *  should scrutinize the packedValue to decide whether to accept it.  If
>   *  accepted, it should consider only the {@code numberDocs} documents
>   *  starting at {@code offset}.  In the 1D case, values are visited in
>   *  increasing order, and in the case of ties, in increasing docID order. */
>  default void visit(int[] docID, int offset, int numberDocs, byte[] packedValue) throws IOException {
>    for (int i = offset; i < offset + numberDocs; i++) {
>      visit(docID[i], packedValue);
>    }
>  }
> {code}
> The merit of this approach is that it is backwards compatible and it is up to
> the implementors to override this method and get the benefits of this
> optimisation. The biggest downside is that it assumes that the codec has doc
> IDs available in an int[] slice, as opposed to streaming them from disk
> directly to the IntersectVisitor for instance, as [~jpountz] noted.
> Maybe there are more options I did not think about, so I am looking forward to
> hearing opinions on whether we should make this change at all and, if so, how
> to approach it. My +1 goes to 1).






[jira] [Updated] (LUCENE-8886) TestMutablePointsReaderUtils not doing what it is expected

2019-06-26 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera updated LUCENE-8886:
-
Attachment: LUCENE-8886.patch
Status: Open  (was: Open)

> TestMutablePointsReaderUtils not doing what it is expected
> --
>
> Key: LUCENE-8886
> URL: https://issues.apache.org/jira/browse/LUCENE-8886
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Ignacio Vera
>Priority: Major
> Attachments: LUCENE-8886.patch
>
>
> TestMutablePointsReaderUtils is not actually doing what is expected.
> The problem is that we construct Point objects without copying the bytes
> provided, so the test always works with arrays of 0 values.






[jira] [Updated] (LUCENE-8886) TestMutablePointsReaderUtils not doing what it is expected

2019-06-26 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera updated LUCENE-8886:
-
Status: Patch Available  (was: Open)

> TestMutablePointsReaderUtils not doing what it is expected
> --
>
> Key: LUCENE-8886
> URL: https://issues.apache.org/jira/browse/LUCENE-8886
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Ignacio Vera
>Priority: Major
> Attachments: LUCENE-8886.patch
>
>
> TestMutablePointsReaderUtils is not actually doing what is expected.
> The problem is that we construct Point objects without copying the bytes
> provided, so the test always works with arrays of 0 values.






[jira] [Updated] (LUCENE-8886) TestMutablePointsReaderUtils not doing what it is expected

2019-06-26 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera updated LUCENE-8886:
-
Issue Type: Test  (was: Improvement)

> TestMutablePointsReaderUtils not doing what it is expected
> --
>
> Key: LUCENE-8886
> URL: https://issues.apache.org/jira/browse/LUCENE-8886
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Ignacio Vera
>Priority: Major
>
> TestMutablePointsReaderUtils is not actually doing what is expected.
> The problem is that we construct Point objects without copying the bytes
> provided, so the test always works with arrays of 0 values.






[jira] [Created] (LUCENE-8886) TestMutablePointsReaderUtils not doing what it is expected

2019-06-26 Thread Ignacio Vera (JIRA)
Ignacio Vera created LUCENE-8886:


 Summary: TestMutablePointsReaderUtils not doing what it is expected
 Key: LUCENE-8886
 URL: https://issues.apache.org/jira/browse/LUCENE-8886
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ignacio Vera


TestMutablePointsReaderUtils is not actually doing what is expected.
The problem is that we construct Point objects without copying the bytes
provided, so the test always works with arrays of 0 values.
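
The test's actual Point class is not shown in this message, so the following is only a guess at the shape of the bug: a constructor that allocates an array of the right length but never copies the input into it. `BrokenPoint` and `FixedPoint` are hypothetical names.

```java
/** Sketch only: hypothetical classes illustrating the described bug pattern. */
final class PointCopySketch {

  static final class BrokenPoint {
    final byte[] packedValue;

    BrokenPoint(byte[] bytes) {
      // Bug pattern: the array is allocated but the input is never copied,
      // so every point ends up holding only zeros.
      this.packedValue = new byte[bytes.length];
    }
  }

  static final class FixedPoint {
    final byte[] packedValue;

    FixedPoint(byte[] bytes) {
      // Defensive copy of the provided bytes.
      this.packedValue = bytes.clone();
    }
  }
}
```

A test built on the broken variant would still pass its sorting and partitioning checks, because an array of identical all-zero points is trivially sorted.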







[jira] [Commented] (LUCENE-8885) Optimise BKD reader by exploiting cardinality information stored on leaves

2019-06-26 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873395#comment-16873395
 ] 

Ignacio Vera commented on LUCENE-8885:
--

With 1) you would have to track the last visited point and associate it with
the corresponding call to visit(int docID), which is not nice and relies on the
behaviour of the reader.

2) is more practical and less intrusive, and using a DocIdSetIterator will make
the API cleaner ++
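
A rough sketch of the iterator-based variant of 2), assuming a stand-in `Visitor` interface and using `java.util.PrimitiveIterator.OfInt` in place of Lucene's DocIdSetIterator so that the example is self-contained. The names are hypothetical and this is not the signature from the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PrimitiveIterator;

/** Sketch only: PrimitiveIterator.OfInt stands in for a doc ID iterator. */
final class IteratorVisitSketch {

  interface Visitor {
    void visit(int docID, byte[] packedValue);

    /** Default keeps old implementors working: one call per streamed doc ID. */
    default void visit(PrimitiveIterator.OfInt docs, byte[] packedValue) {
      while (docs.hasNext()) {
        visit(docs.nextInt(), packedValue);
      }
    }
  }

  /** Accepts points whose first byte equals wanted; counts how often it tests. */
  static final class EqualsVisitor implements Visitor {
    final byte wanted;
    final List<Integer> hits = new ArrayList<>();
    int checks = 0;

    EqualsVisitor(byte wanted) {
      this.wanted = wanted;
    }

    @Override
    public void visit(int docID, byte[] packedValue) {
      checks++;
      if (packedValue[0] == wanted) {
        hits.add(docID);
      }
    }

    @Override
    public void visit(PrimitiveIterator.OfInt docs, byte[] packedValue) {
      checks++; // the point is tested once for the whole run of documents
      if (packedValue[0] == wanted) {
        while (docs.hasNext()) {
          hits.add(docs.nextInt());
        }
      }
    }
  }
}
```

Because the doc IDs are consumed as they are produced, the reader does not need them materialised in an int[] slice, and the visitor does not have to remember the last visited point.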

> Optimise BKD reader by exploiting cardinality information stored on leaves
> --
>
> Key: LUCENE-8885
> URL: https://issues.apache.org/jira/browse/LUCENE-8885
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
>
> LUCENE-8688 introduced a new storing strategy for leaves that contain
> duplicated points. In that case the points are stored together with their
> cardinality. We still call the IntersectVisitor once per document, so we
> check the same point against the query many times. The idea is to check the
> point once and then add all the documents.
> The API of the IntersectVisitor does not allow that, so to exploit that
> property we need to either change the API or extend it. Here are the
> possibilities I can think of:
> 1) Modify the API by replacing the method IntersectVisitor#visit(int, byte[])
> with the following method:
> {code:java}
>  /** Called for leaf cells that intersect the query, to test whether the
>   *  point matches the query. If it matches, {@link IntersectVisitor#visit(int)}
>   *  is then called for each document associated with this point. */
>  boolean matches(byte[] packedValue) throws IOException;
> {code}
> This will allow the BKD reader to check whether a point matches the query and,
> if it does, call the method IntersectVisitor#visit(int) for all documents
> associated with that point.
> The drawback of this approach is backwards compatibility and the need to
> update all classes that implement this interface.
> 2) Extend the API by adding a new default method to the IntersectVisitor
> interface:
> {code:java}
>  /** Called for documents in a leaf cell that crosses the query.  The consumer
>   *  should scrutinize the packedValue to decide whether to accept it.  If
>   *  accepted, it should consider only the {@code numberDocs} documents
>   *  starting at {@code offset}.  In the 1D case, values are visited in
>   *  increasing order, and in the case of ties, in increasing docID order. */
>  default void visit(int[] docID, int offset, int numberDocs, byte[] packedValue) throws IOException {
>    for (int i = offset; i < offset + numberDocs; i++) {
>      visit(docID[i], packedValue);
>    }
>  }
> {code}
> The merit of this approach is that it is backwards compatible and it is up to
> the implementors to override this method and get the benefits of this
> optimisation. The biggest downside is that it assumes that the codec has doc
> IDs available in an int[] slice, as opposed to streaming them from disk
> directly to the IntersectVisitor for instance, as [~jpountz] noted.
> Maybe there are more options I did not think about, so I am looking forward to
> hearing opinions on whether we should make this change at all and, if so, how
> to approach it. My +1 goes to 1).






[jira] [Commented] (LUCENE-8867) Optimise BKD tree for low cardinality leaves

2019-06-26 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873310#comment-16873310
 ] 

Ignacio Vera commented on LUCENE-8867:
--

I have opened https://issues.apache.org/jira/browse/LUCENE-8885 to track the 
second optimisation.

> Optimise BKD tree for low cardinality leaves
> 
>
> Key: LUCENE-8867
> URL: https://issues.apache.org/jira/browse/LUCENE-8867
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, if a leaf of the BKD tree contains only a few values, the leaf
> is treated the same way as if all values were different. In many cases it can
> be much more efficient to store the distinct values with their cardinality.
> In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is
> called n times with the same byte array but a different docID. This issue
> proposes adding a new method to the interface that accepts an array of docs,
> so that it can be overridden by implementors to gain search performance.






[jira] [Created] (LUCENE-8885) Optimise BKD reader by exploiting cardinality information stored on leaves

2019-06-26 Thread Ignacio Vera (JIRA)
Ignacio Vera created LUCENE-8885:


 Summary: Optimise BKD reader by exploiting cardinality information 
stored on leaves
 Key: LUCENE-8885
 URL: https://issues.apache.org/jira/browse/LUCENE-8885
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ignacio Vera


LUCENE-8688 introduced a new storing strategy for leaves that contain
duplicated points. In that case the points are stored together with their
cardinality. We still call the IntersectVisitor once per document, so we
check the same point against the query many times. The idea is to check the
point once and then add all the documents.

The API of the IntersectVisitor does not allow that, so to exploit that
property we need to either change the API or extend it. Here are the
possibilities I can think of:

1) Modify the API by replacing the method IntersectVisitor#visit(int, byte[])
with the following method:
{code:java}
 /** Called for leaf cells that intersect the query, to test whether the
  *  point matches the query. If it matches, {@link IntersectVisitor#visit(int)}
  *  is then called for each document associated with this point. */
 boolean matches(byte[] packedValue) throws IOException;
{code}
This will allow the BKD reader to check whether a point matches the query and,
if it does, call the method IntersectVisitor#visit(int) for all documents
associated with that point.
The drawback of this approach is backwards compatibility and the need to update
all classes that implement this interface.


2) Extend the API by adding a new default method to the IntersectVisitor
interface:
{code:java}
 /** Called for documents in a leaf cell that crosses the query.  The consumer
  *  should scrutinize the packedValue to decide whether to accept it.  If
  *  accepted, it should consider only the {@code numberDocs} documents
  *  starting at {@code offset}.  In the 1D case, values are visited in
  *  increasing order, and in the case of ties, in increasing docID order. */
 default void visit(int[] docID, int offset, int numberDocs, byte[] packedValue) throws IOException {
   for (int i = offset; i < offset + numberDocs; i++) {
     visit(docID[i], packedValue);
   }
 }
{code}

The merit of this approach is that it is backwards compatible and it is up to
the implementors to override this method and get the benefits of this
optimisation. The biggest downside is that it assumes that the codec has doc IDs
available in an int[] slice, as opposed to streaming them from disk directly to
the IntersectVisitor for instance, as [~jpountz] noted.


Maybe there are more options I did not think about, so I am looking forward to
hearing opinions on whether we should make this change at all and, if so, how
to approach it. My +1 goes to 1).
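
To make option 2) more concrete, here is a minimal sketch of how an implementor could override the bulk method so that the point is tested once per run of documents. `Visitor` and `RangeVisitor` are hypothetical stand-ins written for this example, not Lucene's IntersectVisitor, and the "query" is just a range check on the first byte.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: Visitor is a hypothetical stand-in, not Lucene's IntersectVisitor. */
final class BulkVisitSketch {

  interface Visitor {
    /** Called once per (document, point) pair. */
    void visit(int docID, byte[] packedValue);

    /** Default bulk method: falls back to per-document calls. */
    default void visit(int[] docIDs, int offset, int numberDocs, byte[] packedValue) {
      for (int i = offset; i < offset + numberDocs; i++) {
        visit(docIDs[i], packedValue);
      }
    }
  }

  /** Accepts points whose first byte lies in [min, max]; counts the checks made. */
  static final class RangeVisitor implements Visitor {
    final int min;
    final int max;
    final List<Integer> hits = new ArrayList<>();
    int checks = 0;

    RangeVisitor(int min, int max) {
      this.min = min;
      this.max = max;
    }

    private boolean matches(byte[] packedValue) {
      checks++;
      int v = packedValue[0] & 0xFF;
      return v >= min && v <= max;
    }

    @Override
    public void visit(int docID, byte[] packedValue) {
      if (matches(packedValue)) {
        hits.add(docID);
      }
    }

    @Override
    public void visit(int[] docIDs, int offset, int numberDocs, byte[] packedValue) {
      // Test the shared point once, then accept the whole slice of documents.
      if (matches(packedValue)) {
        for (int i = offset; i < offset + numberDocs; i++) {
          hits.add(docIDs[i]);
        }
      }
    }
  }
}
```

An implementor that does not override the bulk method keeps the old behaviour through the default, which is the backwards compatibility argument made above.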







[jira] [Resolved] (LUCENE-8868) New storing strategy for BKD tree leaves with low cardinality

2019-06-26 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-8868.
--
   Resolution: Fixed
 Assignee: Ignacio Vera
Fix Version/s: 8.2
   master (9.0)

> New storing strategy for BKD tree leaves with low cardinality
> -
>
> Key: LUCENE-8868
> URL: https://issues.apache.org/jira/browse/LUCENE-8868
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: master (9.0), 8.2
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Currently, if a leaf of the BKD tree contains only a few distinct values, the
> leaf is treated the same way as if all values were different. In many cases it
> can be much more efficient to store the distinct values with their cardinality.
> The strategy is the following:
> 1. When writing a leaf block, the cardinality is computed.
> 2. Perform a naive calculation to decide whether it is better to store the
> leaf as a low cardinality leaf. The storage costs are calculated as follows:
> * Low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2),
> where 2 is the estimated size of storing the cardinality. This is an
> overestimation, as in some cases only one byte is needed to store the
> cardinality.
> * High cardinality: count * (packedBytesLength - prefixLenSum). We are not
> taking the run-length compression into account.
> 3. If the leaf has low cardinality, then we set the compressed dim to -2. Note
> that -1 is used when all values are equal.
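
The cost comparison in step 2 can be written out as a small sketch. The class and method names are hypothetical, and whether the real writer uses a strict or non-strict comparison is an assumption here; only the two formulas come from the description.

```java
/** Sketch only: hypothetical helper mirroring the two cost formulas above. */
final class LeafCostSketch {

  /** Estimated bytes for storing one cardinality (described as an overestimate). */
  static final int CARDINALITY_BYTES = 2;

  static long lowCardinalityCost(int leafCardinality, int packedBytesLength, int prefixLenSum) {
    return (long) leafCardinality * (packedBytesLength - prefixLenSum + CARDINALITY_BYTES);
  }

  static long highCardinalityCost(int count, int packedBytesLength, int prefixLenSum) {
    // Run-length compression is ignored, as in the description.
    return (long) count * (packedBytesLength - prefixLenSum);
  }

  /** Assumption: the low cardinality layout is chosen only when strictly cheaper. */
  static boolean preferLowCardinality(
      int leafCardinality, int count, int packedBytesLength, int prefixLenSum) {
    return lowCardinalityCost(leafCardinality, packedBytesLength, prefixLenSum)
        < highCardinalityCost(count, packedBytesLength, prefixLenSum);
  }
}
```

For example, with count = 512, packedBytesLength = 8 and prefixLenSum = 2, a leaf with 10 distinct values costs 10 * 8 = 80 in the low cardinality layout against 512 * 6 = 3072 otherwise, while a leaf with 512 distinct values costs 4096 and stays in the regular layout.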






[jira] [Resolved] (LUCENE-8879) Add tests for BKDRadixSelector#sort

2019-06-26 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-8879.
--
   Resolution: Fixed
 Assignee: Ignacio Vera
Fix Version/s: 8.2
   master (9.0)

> Add tests for BKDRadixSelector#sort
> ---
>
> Key: LUCENE-8879
> URL: https://issues.apache.org/jira/browse/LUCENE-8879
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Minor
> Fix For: master (9.0), 8.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This issue just adds some tests specifically for the sorting capability of
> class BKDRadixSelector and improves the existing ones for the selection
> capabilities.






[jira] [Updated] (LUCENE-8775) Tessellator: Improve the election of diagonals when splitting the polygon

2019-06-26 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera updated LUCENE-8775:
-
Fix Version/s: 8.1.2

> Tessellator: Improve the election of diagonals when splitting the polygon
> -
>
> Key: LUCENE-8775
> URL: https://issues.apache.org/jira/browse/LUCENE-8775
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: master (9.0), 8.2, 8.1.2
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> There are some cases where polygon tessellation fails, and it seems to be due
> to a bad choice of the diagonal when splitting the polygon. Here I propose
> a patch that makes sure, when splitting a polygon, that the resulting polygons
> are valid CW polygons.
> In addition, this patch adds a few tests to check the functionality of the
> tessellator and throws an error if the polygon cannot be split, instead of
> just emptying the current tessellation.
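
How the patch validates the winding of the split polygons is not shown in this message. A generic way to test whether a ring is clockwise is the sign of its shoelace (signed) area; the sketch below uses that technique with hypothetical names and assumes x grows to the right and y grows upwards.

```java
/** Sketch only: generic shoelace-based winding test, not the Tessellator's code. */
final class WindingSketch {

  /** Signed area of a ring; positive for counter-clockwise, negative for clockwise. */
  static double signedArea(double[] x, double[] y) {
    double sum = 0;
    // j is the previous vertex of i; the ring wraps around from last to first.
    for (int i = 0, j = x.length - 1; i < x.length; j = i++) {
      sum += x[j] * y[i] - x[i] * y[j];
    }
    return sum / 2;
  }

  /** A zero area (degenerate ring) is reported as not clockwise. */
  static boolean isClockwise(double[] x, double[] y) {
    return signedArea(x, y) < 0;
  }
}
```

A split along a candidate diagonal could then be rejected when either resulting ring fails the check, which is one way to surface an error instead of silently emptying the tessellation.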






[jira] [Commented] (SOLR-13413) jetty IdleTimeout bugs with Http2SolrClient, cause sprious timeouts on intranode requests

2019-06-25 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872533#comment-16872533
 ] 

Ignacio Vera commented on SOLR-13413:
-

[~caomanhdat] yes currently the branch 8.1 is broken due to a missing sha1.

> jetty IdleTimeout bugs with Http2SolrClient, cause sprious timeouts on 
> intranode requests
> -
>
> Key: SOLR-13413
> URL: https://issues.apache.org/jira/browse/SOLR-13413
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 8.0
>Reporter: Hoss Man
>Assignee: Cao Manh Dat
>Priority: Major
> Fix For: master (9.0), 8.2, 8.1.2
>
> Attachments: SOLR-13413.patch, 
> nocommit_TestDistributedStatsComponentCardinality_trivial-no-http2.patch
>
>
> There is evidence in some recent jenkins failures that we may have some manner
> of bug in our http2 client/server code that can cause intra-node query
> requests to stall / timeout non-reproducibly.
> In at least one known case, forcing the jetty & SolrClients used in the test
> to use http1.1 seems to prevent these test failures.






[jira] [Created] (LUCENE-8879) Add tests for BKDRadixSelector#sort

2019-06-24 Thread Ignacio Vera (JIRA)
Ignacio Vera created LUCENE-8879:


 Summary: Add tests for BKDRadixSelector#sort
 Key: LUCENE-8879
 URL: https://issues.apache.org/jira/browse/LUCENE-8879
 Project: Lucene - Core
  Issue Type: Test
Reporter: Ignacio Vera


This issue just adds some tests specifically for the sorting capability of class
BKDRadixSelector and improves the existing ones for the selection capabilities.






[jira] [Resolved] (LUCENE-8838) Tessellator: Remove support for Steiner points

2019-06-24 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-8838.
--
   Resolution: Fixed
 Assignee: Ignacio Vera
Fix Version/s: 8.2
   master (9.0)

> Tessellator: Remove support for Steiner points
> --
>
> Key: LUCENE-8838
> URL: https://issues.apache.org/jira/browse/LUCENE-8838
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: master (9.0), 8.2
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Tessellator has support for Steiner points, which comes from the original
> port of MapBox's earcut algorithm to Java. We are not using such
> points and therefore it would be better to remove it.
> In addition, it actually introduces a bug when a polygon hole is a line with
> all coplanar points. In some cases the hole can be reduced to a point and then
> treated as a Steiner point. This looks wrong, and in those cases we
> should throw an error.






[jira] [Commented] (LUCENE-8868) New storing strategy for BKD tree leaves with low cardinality

2019-06-20 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868354#comment-16868354
 ] 

Ignacio Vera commented on LUCENE-8868:
--

It is small and I think the 1D case will benefit quite a lot. Just wanted to 
raise the issue so we think over it.

> New storing strategy for BKD tree leaves with low cardinality
> -
>
> Key: LUCENE-8868
> URL: https://issues.apache.org/jira/browse/LUCENE-8868
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, if a leaf of the BKD tree contains only a few distinct values, the
> leaf is treated the same way as if all values were different. In many cases it
> can be much more efficient to store the distinct values with their cardinality.
> The strategy is the following:
> 1. When writing a leaf block, the cardinality is computed.
> 2. Perform a naive calculation to decide whether it is better to store the
> leaf as a low cardinality leaf. The storage costs are calculated as follows:
> * Low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2),
> where 2 is the estimated size of storing the cardinality. This is an
> overestimation, as in some cases only one byte is needed to store the
> cardinality.
> * High cardinality: count * (packedBytesLength - prefixLenSum). We are not
> taking the run-length compression into account.
> 3. If the leaf has low cardinality, then we set the compressed dim to -2. Note
> that -1 is used when all values are equal.






[jira] [Commented] (LUCENE-8868) New storing strategy for BKD tree leaves with low cardinality

2019-06-19 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867327#comment-16867327
 ] 

Ignacio Vera commented on LUCENE-8868:
--

I have some doubts about this optimisation in 1D. There is not much difference 
in storage cost, as runLen will do a good job compressing the values, and we 
are adding some overhead at indexing time because we need to compute the 
cardinality. All in all it does not hurt too much, but I just want to raise 
that concern.

> New storing strategy for BKD tree leaves with low cardinality
> -
>
> Key: LUCENE-8868
> URL: https://issues.apache.org/jira/browse/LUCENE-8868
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, if a leaf on the BKD tree contains only a few distinct values, the 
> leaf is treated the same way as if all values were different. In many cases it 
> can be much more efficient to store the distinct values with their cardinality.
> The strategy is the following:
> 1. When writing a leaf block, the cardinality is computed.
> 2. Perform a naive calculation to decide whether it is better to store the 
> leaf as a low cardinality leaf. The storage costs are calculated as follows:
> * Low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2), 
> where two is the estimated size of storing the cardinality. This is an 
> overestimation, as in some cases only one byte is needed to store the 
> cardinality.
> * High cardinality: count * (packedBytesLength - prefixLenSum). We are not 
> taking into account the runlen compression.
> 3. If the leaf has low cardinality then we set the compressed dim to -2. Note 
> that -1 is when all values are equal.






[jira] [Commented] (LUCENE-8867) Optimise BKD tree for low cardinality leaves

2019-06-19 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867326#comment-16867326
 ] 

Ignacio Vera commented on LUCENE-8867:
--

I have opened https://issues.apache.org/jira/browse/LUCENE-8868 to track the 
first optimisation.

> Optimise BKD tree for low cardinality leaves
> 
>
> Key: LUCENE-8867
> URL: https://issues.apache.org/jira/browse/LUCENE-8867
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, if a leaf on the BKD tree contains only a few distinct values, the 
> leaf is treated the same way as if all values were different. In many cases it 
> can be much more efficient to store the distinct values with their cardinality.
> In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is 
> called n times with the same byte array but a different docID. This issue 
> proposes to add a new method to the interface that accepts an array of docs, 
> so it can be overridden by implementors to gain search performance.






[jira] [Updated] (LUCENE-8868) New storing strategy for BKD tree leaves with low cardinality

2019-06-19 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera updated LUCENE-8868:
-
Description: 
Currently if a leaf on the BKD tree contains only few values, then the leaf is 
treated the same way as it all values are different. It many cases it can be 
much more efficient to store the distinct values with the cardinality.

The strategy is the following:

1.  When writing a leaf block the cardinality is computed.
2. Perform some naive calculation to compute if it is better to store the leaf 
as a low cardinality leaf. The storage cost are calculated as follows:
*   low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2) 
where two is the estimated size of storing the cardinality. This is an 
overestimation as in some cases you will only need one byte to store the 
cardinality.
* High cardinality: count * (packedBytesLength - prefixLenSum). We are not 
taking into account the runlen compression.

3. If the tree has low cardinality then we set the compressed dim to -2. Note 
that -1 is when all values are equal.




  was:
Currently if a leaf on the BKD tree contains only few values, then the leaf is 
treated the same way as it all values are different. It many cases it can be 
much more efficient to store the distinct values with the cardinality.

The strategy is the following:

1.  When writing a leaf block the cardinality is computed.
2. Perform some naive calculation to compute if it is better to store the leaf 
as a low cardinality leaf. The storage cost are calculated as follows:
*   low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2) 
where two is the estimated size of storing the cardinality. This is an 
overestimation as in some cases you will only need one byte to store the 
cardinality.
* High cardinality: count * (packedBytesLength - prefixLenSum). We are not 
taking into account the runlen compression.
3.  If the tree has low cardinality then we set the compressed dim to -2. Note 
that -1 is when all values are equal.





> New storing strategy for BKD tree leaves with low cardinality
> -
>
> Key: LUCENE-8868
> URL: https://issues.apache.org/jira/browse/LUCENE-8868
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
>
> Currently, if a leaf on the BKD tree contains only a few distinct values, the 
> leaf is treated the same way as if all values were different. In many cases it 
> can be much more efficient to store the distinct values with their cardinality.
> The strategy is the following:
> 1. When writing a leaf block, the cardinality is computed.
> 2. Perform a naive calculation to decide whether it is better to store the 
> leaf as a low cardinality leaf. The storage costs are calculated as follows:
> * Low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2), 
> where two is the estimated size of storing the cardinality. This is an 
> overestimation, as in some cases only one byte is needed to store the 
> cardinality.
> * High cardinality: count * (packedBytesLength - prefixLenSum). We are not 
> taking into account the runlen compression.
> 3. If the leaf has low cardinality then we set the compressed dim to -2. Note 
> that -1 is when all values are equal.






[jira] [Created] (LUCENE-8868) New storing strategy for BKD tree leaves with low cardinality

2019-06-19 Thread Ignacio Vera (JIRA)
Ignacio Vera created LUCENE-8868:


 Summary: New storing strategy for BKD tree leaves with low 
cardinality
 Key: LUCENE-8868
 URL: https://issues.apache.org/jira/browse/LUCENE-8868
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ignacio Vera


Currently, if a leaf on the BKD tree contains only a few distinct values, the 
leaf is treated the same way as if all values were different. In many cases it 
can be much more efficient to store the distinct values with their cardinality.

The strategy is the following:

1. When writing a leaf block, the cardinality is computed.
2. Perform a naive calculation to decide whether it is better to store the leaf 
as a low cardinality leaf. The storage costs are calculated as follows:
* Low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2), 
where two is the estimated size of storing the cardinality. This is an 
overestimation, as in some cases only one byte is needed to store the 
cardinality.
* High cardinality: count * (packedBytesLength - prefixLenSum). We are not 
taking into account the runlen compression.
3. If the leaf has low cardinality then we set the compressed dim to -2. Note 
that -1 is when all values are equal.
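
The cost comparison in step 2 could be sketched roughly as follows. This is 
only an illustration: the class LeafCostEstimate and its method names are 
hypothetical, and choosing the low cardinality layout only when it is strictly 
cheaper is an assumption. Only the two formulas come from the description 
above.

```java
/** Hypothetical helper for illustration; this is not code from Lucene's BKD writer. */
final class LeafCostEstimate {
    private LeafCostEstimate() {}

    /** Each distinct value is stored once, plus an estimated 2 bytes for its cardinality. */
    static long lowCardinalityCost(int leafCardinality, int packedBytesLength, int prefixLenSum) {
        return (long) leafCardinality * (packedBytesLength - prefixLenSum + 2);
    }

    /** Every value is stored; run-length compression is deliberately ignored. */
    static long highCardinalityCost(int count, int packedBytesLength, int prefixLenSum) {
        return (long) count * (packedBytesLength - prefixLenSum);
    }

    /** Assumption: pick the low cardinality layout only when it is strictly cheaper. */
    static boolean preferLowCardinality(int count, int leafCardinality,
                                        int packedBytesLength, int prefixLenSum) {
        return lowCardinalityCost(leafCardinality, packedBytesLength, prefixLenSum)
            < highCardinalityCost(count, packedBytesLength, prefixLenSum);
    }
}
```

For a leaf of 512 points with 4 distinct 8-byte values sharing a 2-byte prefix, 
the estimates are 4 * (8 - 2 + 2) = 32 bytes against 512 * (8 - 2) = 3072 
bytes, so the low cardinality layout would be chosen.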









[jira] [Commented] (LUCENE-8867) Optimise BKD tree for low cardinality leaves

2019-06-18 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866868#comment-16866868
 ] 

Ignacio Vera commented on LUCENE-8867:
--

{quote}
Right, this is what I had in mind when I said this is only a problem if you 
have data dimensions. Because if you don't, then you could call 
IntersectVisitor.compare(A, A) as a way to know whether value A matches, and we 
wouldn't need any new API?
{quote}

True, that would not work when you have data dimensions. In addition, 
IntersectVisitor.compare(A, A) is intended to compare the query with a range, 
which is normally more expensive than a comparison with a point, so it would 
defeat the purpose of the optimisation.

I propose to break this change in two so we can work on the storage 
optimisation first and then think about the right API to make 
IntersectVisitor more efficient in these cases.

> Optimise BKD tree for low cardinality leaves
> 
>
> Key: LUCENE-8867
> URL: https://issues.apache.org/jira/browse/LUCENE-8867
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, if a leaf on the BKD tree contains only a few distinct values, the 
> leaf is treated the same way as if all values were different. In many cases it 
> can be much more efficient to store the distinct values with their cardinality.
> In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is 
> called n times with the same byte array but a different docID. This issue 
> proposes to add a new method to the interface that accepts an array of docs, 
> so it can be overridden by implementors to gain search performance.






[jira] [Comment Edited] (LUCENE-8867) Optimise BKD tree for low cardinality leaves

2019-06-18 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866798#comment-16866798
 ] 

Ignacio Vera edited comment on LUCENE-8867 at 6/18/19 4:16 PM:
---

{quote}
This is only an issue in the case that not all dimensions are indexed, right? 
Otherwise you could figure out that all values are equal in 
IntersectVisitor#compare?
{quote}

I think this is a generic issue. The problem here is not when all values are 
equal but when you have a very low cardinality on the leaf nodes. In this case 
we can save lots of space by storing the values in the proposed way.


{quote}
One concern I have with the patch is that it assumes that the codec has doc IDs 
available in an int[] slice as opposed to streaming them from disk directly to 
the IntersectVisitor for instance.
{quote}

I see your concern. Another option would be to change the interface more 
radically and add a matches(byte[]) method that returns a boolean, and then 
use the visit(docID) method.
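
A minimal sketch of that alternative, assuming a simplified visitor: the value 
is tested once through matches(byte[]) and the doc IDs are then streamed one by 
one to visit(docID), so no int[] slice is required. The names MatchingVisitor, 
LowCardinalityLeafSketch and CollectingVisitor are hypothetical and are not 
part of Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PrimitiveIterator;

/** Hypothetical visitor shape for illustration; not Lucene's IntersectVisitor. */
interface MatchingVisitor {
    /** Decides once whether a packed value is accepted by the query. */
    boolean matches(byte[] packedValue);

    /** Collects a doc ID whose value has already been accepted. */
    void visit(int docID);
}

final class LowCardinalityLeafSketch {
    private LowCardinalityLeafSketch() {}

    /** Visits one run of doc IDs that all share the same packed value. */
    static void visitRun(MatchingVisitor visitor, byte[] packedValue,
                         PrimitiveIterator.OfInt docIDs) {
        boolean accepted = visitor.matches(packedValue); // one comparison per distinct value
        while (docIDs.hasNext()) {
            int docID = docIDs.nextInt(); // doc IDs are consumed even when rejected
            if (accepted) {
                visitor.visit(docID);
            }
        }
    }
}

/** Example visitor that accepts exactly one target value and counts comparisons. */
final class CollectingVisitor implements MatchingVisitor {
    private final byte[] target;
    final List<Integer> hits = new ArrayList<>();
    int matchCalls = 0;

    CollectingVisitor(byte[] target) {
        this.target = target.clone();
    }

    @Override
    public boolean matches(byte[] packedValue) {
        matchCalls++;
        return Arrays.equals(target, packedValue);
    }

    @Override
    public void visit(int docID) {
        hits.add(docID);
    }
}
```

With this shape the number of value comparisons depends on the number of 
distinct values in the leaf rather than on the number of documents.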





was (Author: ivera):
{quote}
This is only an issue in the case that not all dimensions are indexed, right? 
Otherwise you could figure out that all values are equal in 
IntersectVisitor#compare?
{quote}

I think this is generic issue. The problem here is not when are values are 
equal but when you have a very low cardinality on the leaf nodes. In this case 
the can safe lots of space by storing the values in the proposed way.


{quote}
One concern I have with the patch is that it assumes that the codec has doc IDs 
available in an int[] slice as opposed to streaming them from disk directly to 
the IntersectVisitor for instance.
{quote}

I see your concern , another option would be to change more radically the 
interface and add a matches(byte[]) method and then use the visit(docID) method.




> Optimise BKD tree for low cardinality leaves
> 
>
> Key: LUCENE-8867
> URL: https://issues.apache.org/jira/browse/LUCENE-8867
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, if a leaf on the BKD tree contains only a few distinct values, the 
> leaf is treated the same way as if all values were different. In many cases it 
> can be much more efficient to store the distinct values with their cardinality.
> In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is 
> called n times with the same byte array but a different docID. This issue 
> proposes to add a new method to the interface that accepts an array of docs, 
> so it can be overridden by implementors to gain search performance.






[jira] [Commented] (LUCENE-8867) Optimise BKD tree for low cardinality leaves

2019-06-18 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866798#comment-16866798
 ] 

Ignacio Vera commented on LUCENE-8867:
--

{quote}
This is only an issue in the case that not all dimensions are indexed, right? 
Otherwise you could figure out that all values are equal in 
IntersectVisitor#compare?
{quote}

I think this is a generic issue. The problem here is not when all values are 
equal but when you have a very low cardinality on the leaf nodes. In this case 
we can save lots of space by storing the values in the proposed way.


{quote}
One concern I have with the patch is that it assumes that the codec has doc IDs 
available in an int[] slice as opposed to streaming them from disk directly to 
the IntersectVisitor for instance.
{quote}

I see your concern. Another option would be to change the interface more 
radically and add a matches(byte[]) method, and then use the visit(docID) 
method.




> Optimise BKD tree for low cardinality leaves
> 
>
> Key: LUCENE-8867
> URL: https://issues.apache.org/jira/browse/LUCENE-8867
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, if a leaf on the BKD tree contains only a few distinct values, the 
> leaf is treated the same way as if all values were different. In many cases it 
> can be much more efficient to store the distinct values with their cardinality.
> In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is 
> called n times with the same byte array but a different docID. This issue 
> proposes to add a new method to the interface that accepts an array of docs, 
> so it can be overridden by implementors to gain search performance.






[jira] [Updated] (LUCENE-8867) Optimise BKD tree for low cardinality leaves

2019-06-18 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera updated LUCENE-8867:
-
Description: 
Currently if a leaf on the BKD tree contains only few values, then the leaf is 
treated the same way as it all values are different. It many cases it can be 
much more efficient to store the distinct values with the cardinality.

In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is 
called n times with the same byte array but different docID. This issue 
proposes to add a new method to the interface that accepts an array of docs so 
it can be override by implementors and gain search performance.

  was:
Currently if a leaf on the BKD tree contains only few values, then the leaf is 
treated the same way as it all values are different. It many cases it can be 
much more efficient to store the distinct values with the cardinality.

In addition, in this cases the method IntersectVisitor#visit(docId, byte[]) is 
called n times with the same byte array but different docID. This issue 
proposes to add a new method to the interface that accepts an array of docs so 
it can be override by implementors and gain search performance.


> Optimise BKD tree for low cardinality leaves
> 
>
> Key: LUCENE-8867
> URL: https://issues.apache.org/jira/browse/LUCENE-8867
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, if a leaf on the BKD tree contains only a few distinct values, the 
> leaf is treated the same way as if all values were different. In many cases it 
> can be much more efficient to store the distinct values with their cardinality.
> In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is 
> called n times with the same byte array but a different docID. This issue 
> proposes to add a new method to the interface that accepts an array of docs, 
> so it can be overridden by implementors to gain search performance.






[jira] [Created] (LUCENE-8867) Optimise BKD tree for low cardinality leaves

2019-06-18 Thread Ignacio Vera (JIRA)
Ignacio Vera created LUCENE-8867:


 Summary: Optimise BKD tree for low cardinality leaves
 Key: LUCENE-8867
 URL: https://issues.apache.org/jira/browse/LUCENE-8867
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ignacio Vera


Currently, if a leaf on the BKD tree contains only a few distinct values, the 
leaf is treated the same way as if all values were different. In many cases it 
can be much more efficient to store the distinct values with their cardinality.

In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is 
called n times with the same byte array but a different docID. This issue 
proposes to add a new method to the interface that accepts an array of docs, 
so it can be overridden by implementors to gain search performance.
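
To make the proposal concrete, here is a small sketch of what such a bulk 
method might look like. The interface BulkVisitorSketch, the signature 
visit(int[], int, byte[]) and the class CountingBulkVisitor are assumptions 
made for illustration only; they are not the actual Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the visitor interface; not Lucene's IntersectVisitor. */
interface BulkVisitorSketch {
    /** Existing style: one call per (docID, value) pair. */
    void visit(int docID, byte[] packedValue);

    /**
     * Proposed style: the first count entries of docIDs all share packedValue.
     * The default keeps existing implementors working by delegating per document.
     */
    default void visit(int[] docIDs, int count, byte[] packedValue) {
        for (int i = 0; i < count; i++) {
            visit(docIDs[i], packedValue);
        }
    }
}

/** Example implementor that overrides the bulk method to compare the value once. */
final class CountingBulkVisitor implements BulkVisitorSketch {
    private final byte[] target;
    final List<Integer> hits = new ArrayList<>();
    int comparisons = 0;

    CountingBulkVisitor(byte[] target) {
        this.target = target.clone();
    }

    @Override
    public void visit(int docID, byte[] packedValue) {
        comparisons++;
        if (Arrays.equals(target, packedValue)) {
            hits.add(docID);
        }
    }

    @Override
    public void visit(int[] docIDs, int count, byte[] packedValue) {
        comparisons++; // a single comparison covers the whole run of documents
        if (Arrays.equals(target, packedValue)) {
            for (int i = 0; i < count; i++) {
                hits.add(docIDs[i]);
            }
        }
    }
}
```

An implementor that does not override the bulk method pays the same cost as 
today, while one that does can test the shared value a single time.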






[jira] [Commented] (LUCENE-8769) Range Query Type With Logically Connected Ranges

2019-06-11 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860665#comment-16860665
 ] 

Ignacio Vera commented on LUCENE-8769:
--

I agree with Adrien: in the case of AND we can just compute the intersection 
of all provided ranges and execute it as a single range query.

On the other hand, it is a good start towards supporting logically connected 
ranges that cannot be merged into a single range.
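
As a rough illustration of the AND case, the sketch below intersects inclusive 
one-dimensional long ranges. It is a simplification: LongRange and 
RangeIntersection are hypothetical names, and a real point range query works 
on packed, possibly multi-dimensional values.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical inclusive 1D range used only for this illustration. */
final class LongRange {
    final long min;
    final long max;

    LongRange(long min, long max) {
        if (min > max) {
            throw new IllegalArgumentException("min must be <= max");
        }
        this.min = min;
        this.max = max;
    }
}

final class RangeIntersection {
    private RangeIntersection() {}

    /** Returns the range matched by all inputs, or empty if the conjunction matches nothing. */
    static Optional<LongRange> intersectAll(List<LongRange> ranges) {
        if (ranges.isEmpty()) {
            throw new IllegalArgumentException("at least one range is required");
        }
        long min = Long.MIN_VALUE;
        long max = Long.MAX_VALUE;
        for (LongRange r : ranges) {
            min = Math.max(min, r.min); // the largest lower bound wins
            max = Math.min(max, r.max); // the smallest upper bound wins
        }
        return min <= max ? Optional.of(new LongRange(min, max)) : Optional.empty();
    }
}
```

When the result is present it can be executed as one ordinary range query. 
When it is empty, the conjunction cannot match any document.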

> Range Query Type With Logically Connected Ranges
> 
>
> Key: LUCENE-8769
> URL: https://issues.apache.org/jira/browse/LUCENE-8769
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8769.patch, LUCENE-8769.patch
>
>
> Today, we visit BKD tree for each range specified for PointRangeQuery. It 
> would be good to have a range query type which can take multiple ranges 
> logically ANDed or ORed.






[jira] [Created] (LUCENE-8850) Creating a polygon with no area or an invalid area should throw an error

2019-06-11 Thread Ignacio Vera (JIRA)
Ignacio Vera created LUCENE-8850:


 Summary: Creating a polygon with no area or an invalid area should 
throw an error
 Key: LUCENE-8850
 URL: https://issues.apache.org/jira/browse/LUCENE-8850
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ignacio Vera


Working with some polygon data that has data quality issues, I found two 
cases where we can throw an error because we know the polygon is invalid:

1) We are already computing the signed area in the Polygon constructor, 
therefore if the value is 0 we know we are building a polygon with no area 
and we can throw an error, as it is an invalid polygon.

2) We can calculate the total area of the polygon and its holes. If the area 
is less than or equal to 0, then the holes are equal to or bigger than the 
polygon, which makes it an invalid polygon.

In addition I propose a new method to calculate the signed area which requires 
fewer arithmetic operations and therefore introduces fewer numerical errors.
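
The two checks could look roughly like the sketch below. It uses the standard 
shoelace formula for the signed area, which is not necessarily the method 
proposed in this issue. The class PolygonAreaSketch and its methods are 
hypothetical, and rings are assumed to be closed (first point repeated as the 
last).

```java
/** Hypothetical helper for illustration; not Lucene's Polygon class. */
final class PolygonAreaSketch {
    private PolygonAreaSketch() {}

    /** Shoelace formula on a closed ring; positive for counter-clockwise order. */
    static double signedArea(double[] x, double[] y) {
        double sum = 0;
        for (int i = 0; i < x.length - 1; i++) {
            sum += x[i] * y[i + 1] - x[i + 1] * y[i];
        }
        return sum / 2.0;
    }

    /** Returns the shell area minus the hole areas, rejecting the two invalid cases. */
    static double requireValidArea(double[] x, double[] y,
                                   double[][] holesX, double[][] holesY) {
        double shell = Math.abs(signedArea(x, y));
        if (shell == 0) {
            throw new IllegalArgumentException("polygon has no area");
        }
        double total = shell;
        for (int i = 0; i < holesX.length; i++) {
            total -= Math.abs(signedArea(holesX[i], holesY[i]));
        }
        if (total <= 0) {
            throw new IllegalArgumentException("holes are as large as or larger than the polygon");
        }
        return total;
    }
}
```

For example, a 4 x 4 square with a 1 x 1 square hole yields 16 - 1 = 15, 
while a ring whose points all lie on one line yields 0 and is rejected.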

 






[jira] [Resolved] (LUCENE-8775) Tessellator: Improve the election of diagonals when splitting the polygon

2019-06-10 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-8775.
--
Resolution: Fixed

> Tessellator: Improve the election of diagonals when splitting the polygon
> -
>
> Key: LUCENE-8775
> URL: https://issues.apache.org/jira/browse/LUCENE-8775
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: master (9.0), 8.2
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> There are some cases where polygon tessellation fails, and it seems to be due 
> to a bad election of the diagonal when splitting the polygon. Here I propose 
> a patch that makes sure, when splitting a polygon, that the resulting polygons 
> are valid CW polygons. 
> In addition this patch adds a few tests to check the functionality of the 
> tessellator and throws an error if the polygon cannot be split, instead of 
> just emptying the current tessellation.  






[jira] [Reopened] (LUCENE-8775) Tessellator: Improve the election of diagonals when splitting the polygon

2019-06-09 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera reopened LUCENE-8775:
--
  Assignee: Ignacio Vera

I am reopening the issue as I have realised that in some cases holes are 
removed from the tessellation. I have opened a new PR that fixes this and 
improves the test coverage by comparing the area of the polygon with the area 
of the tessellation.
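
The kind of check described here can be sketched as follows: sum the areas of 
the triangles and compare the result with the polygon area within a tolerance. 
TessellationAreaCheck is a hypothetical helper, and each triangle is assumed 
to be given as six doubles (ax, ay, bx, by, cx, cy). This is not the test code 
from the PR.

```java
/** Hypothetical check for illustration; not the actual Lucene test. */
final class TessellationAreaCheck {
    private TessellationAreaCheck() {}

    /** Unsigned area of a closed ring (first point repeated last), via the shoelace formula. */
    static double ringArea(double[] x, double[] y) {
        double sum = 0;
        for (int i = 0; i < x.length - 1; i++) {
            sum += x[i] * y[i + 1] - x[i + 1] * y[i];
        }
        return Math.abs(sum) / 2.0;
    }

    /** Unsigned area of one triangle stored as {ax, ay, bx, by, cx, cy}. */
    static double triangleArea(double[] t) {
        double cross = (t[2] - t[0]) * (t[5] - t[1]) - (t[4] - t[0]) * (t[3] - t[1]);
        return Math.abs(cross) / 2.0;
    }

    /** True when the triangles cover the same area as the ring, within eps. */
    static boolean sameArea(double[] x, double[] y, double[][] triangles, double eps) {
        double sum = 0;
        for (double[] t : triangles) {
            sum += triangleArea(t);
        }
        return Math.abs(sum - ringArea(x, y)) <= eps;
    }
}
```

A dropped hole or a missing triangle changes the summed area, so this 
comparison catches errors that a check on the triangle count alone would miss.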

> Tessellator: Improve the election of diagonals when splitting the polygon
> -
>
> Key: LUCENE-8775
> URL: https://issues.apache.org/jira/browse/LUCENE-8775
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: master (9.0), 8.2
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> There are some cases where polygon tessellation fails, and it seems to be due 
> to a bad election of the diagonal when splitting the polygon. Here I propose 
> a patch that makes sure, when splitting a polygon, that the resulting polygons 
> are valid CW polygons. 
> In addition this patch adds a few tests to check the functionality of the 
> tessellator and throws an error if the polygon cannot be split, instead of 
> just emptying the current tessellation.  






[jira] [Created] (LUCENE-8838) Tessellator: Remove support for Steiner points

2019-06-06 Thread Ignacio Vera (JIRA)
Ignacio Vera created LUCENE-8838:


 Summary: Tessellator: Remove support for Steiner points
 Key: LUCENE-8838
 URL: https://issues.apache.org/jira/browse/LUCENE-8838
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Ignacio Vera


Tessellator has support for Steiner points, which comes from the original 
port of MapBox's earcut algorithm to Java. We are not using such points 
and therefore it would be better to remove it.

In addition, it actually introduces a bug when a polygon hole is a line with 
all coplanar points. In some cases it can be reduced to a point and then 
treated as a Steiner point. This looks wrong, and in those cases we should 
throw an error.






[jira] [Resolved] (LUCENE-8775) Tessellator: Improve the election of diagonals when splitting the polygon

2019-06-06 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-8775.
--
   Resolution: Fixed
Fix Version/s: 8.2
   master (9.0)

> Tessellator: Improve the election of diagonals when splitting the polygon
> -
>
> Key: LUCENE-8775
> URL: https://issues.apache.org/jira/browse/LUCENE-8775
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Major
> Fix For: master (9.0), 8.2
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> There are some cases where polygon tessellation fails, and it seems to be due 
> to a bad election of the diagonal when splitting the polygon. Here I propose 
> a patch that makes sure, when splitting a polygon, that the resulting polygons 
> are valid CW polygons. 
> In addition this patch adds a few tests to check the functionality of the 
> tessellator and throws an error if the polygon cannot be split, instead of 
> just emptying the current tessellation.  






[jira] [Updated] (LUCENE-8620) Add CONTAINS support for LatLonShape

2019-06-05 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera updated LUCENE-8620:
-
Fix Version/s: (was: 8.1)
   (was: master (9.0))

> Add CONTAINS support for LatLonShape
> 
>
> Key: LUCENE-8620
> URL: https://issues.apache.org/jira/browse/LUCENE-8620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/sandbox
>Reporter: Ignacio Vera
>Priority: Major
> Attachments: LUCENE-8620.patch, LUCENE-8620.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Currently the only spatial operation that cannot be performed using 
> {{LatLonShape}} is CONTAINS. This issue will add such capability by tracking 
> if an edge of a generated triangle from the {{Tessellator}} is an edge of the 
> polygon.






[jira] [Resolved] (LUCENE-8831) LatLonShapeBoundingBoxQuery hashcode is wrong

2019-06-05 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-8831.
--
   Resolution: Fixed
 Assignee: Ignacio Vera
Fix Version/s: 8.2
   master (9.0)

> LatLonShapeBoundingBoxQuery hashcode is wrong 
> --
>
> Key: LUCENE-8831
> URL: https://issues.apache.org/jira/browse/LUCENE-8831
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: master (9.0), 8.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the hashCode implementation for LatLonShapeBoundingBoxQuery always 
> returns a different value. Therefore the query cannot be cached.






[jira] [Created] (LUCENE-8831) LatLonShapeBoundingBoxQuery hashcode is wrong

2019-06-05 Thread Ignacio Vera (JIRA)
Ignacio Vera created LUCENE-8831:


 Summary: LatLonShapeBoundingBoxQuery hashcode is wrong 
 Key: LUCENE-8831
 URL: https://issues.apache.org/jira/browse/LUCENE-8831
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Ignacio Vera


Currently the hashCode implementation for LatLonShapeBoundingBoxQuery always 
returns a different value. Therefore the query cannot be cached.
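
For a cache lookup to work, two equal queries must produce the same hash. The 
sketch below shows a value-based equals/hashCode pair on a hypothetical class 
BoxQueryKey. It illustrates the contract only and is not the fix applied to 
LatLonShapeBoundingBoxQuery.

```java
import java.util.Objects;

/** Hypothetical value class for illustration; not the actual Lucene query. */
final class BoxQueryKey {
    private final String field;
    private final double minLat;
    private final double maxLat;
    private final double minLon;
    private final double maxLon;

    BoxQueryKey(String field, double minLat, double maxLat, double minLon, double maxLon) {
        this.field = field;
        this.minLat = minLat;
        this.maxLat = maxLat;
        this.minLon = minLon;
        this.maxLon = maxLon;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof BoxQueryKey)) {
            return false;
        }
        BoxQueryKey other = (BoxQueryKey) o;
        return field.equals(other.field)
            && Double.compare(minLat, other.minLat) == 0
            && Double.compare(maxLat, other.maxLat) == 0
            && Double.compare(minLon, other.minLon) == 0
            && Double.compare(maxLon, other.maxLon) == 0;
    }

    @Override
    public int hashCode() {
        // Derived only from the fields used in equals, so equal keys hash alike.
        return Objects.hash(field, minLat, maxLat, minLon, maxLon);
    }
}
```

A hash that includes anything identity-based, such as the default hash of an 
object or array that does not override hashCode, breaks this property.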






[jira] [Commented] (LUCENE-8819) org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure

2019-06-04 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855900#comment-16855900
 ] 

Ignacio Vera commented on LUCENE-8819:
--

I had a look into the error in TestRandomRegExp2 and it looks more concerning.

The test executes two equivalent queries and expects the same results. The 
problem is that it uses two different random searchers, and in this case one 
uses multiple threads (executor != null) while the other is single 
threaded (executor == null). The effect is the same as above: documents get 
different shard_index values and the results differ.

It sounds wrong to me that, depending on how you construct your searcher, you 
might get different results.

 

> org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure
> 
>
> Key: LUCENE-8819
> URL: https://issues.apache.org/jira/browse/LUCENE-8819
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Major
> Attachments: LUCENE-8819.patch
>
>
> It can be reproduced with:
>  
> {code:java}
> ant test  -Dtestcase=TestTopDocsMerge -Dtests.method=testSort_1 
> -Dtests.seed=E916688CE5BC9122 -Dtests.multiplier=3 -Dtests.slow=true 
> -Dtests.locale=es-US -Dtests.timezone=Pacific/Johnston -Dtests.asserts=true 
> -Dtests.file.encoding=ISO-8859-1{code}
>  
> Test fails in master and branch 8.x but it does not fail in branch 8.1. 






[jira] [Commented] (LUCENE-8819) org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure

2019-06-04 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855682#comment-16855682
 ] 

Ignacio Vera commented on LUCENE-8819:
--

The second test worked for me adding the following VM configuration:
{code:java}
-Dtests.nightly=true -Dtests.seed=215B9655D6767594 -Dtests.multiplier=2 {code}
Note the nightly flag.

 

 

The new issue makes sense to me, thanks!

 

> org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure
> 
>
> Key: LUCENE-8819
> URL: https://issues.apache.org/jira/browse/LUCENE-8819
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Major
> Attachments: LUCENE-8819.patch
>
>
> It can be reproduced with:
>  
> {code:java}
> ant test  -Dtestcase=TestTopDocsMerge -Dtests.method=testSort_1 
> -Dtests.seed=E916688CE5BC9122 -Dtests.multiplier=3 -Dtests.slow=true 
> -Dtests.locale=es-US -Dtests.timezone=Pacific/Johnston -Dtests.asserts=true 
> -Dtests.file.encoding=ISO-8859-1{code}
>  
> Test fails in master and branch 8.x but it does not fail in branch 8.1. 






[jira] [Commented] (LUCENE-8819) org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure

2019-06-03 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854586#comment-16854586
 ] 

Ignacio Vera commented on LUCENE-8819:
--

I set these two arguments in the VM configuration:
{code:java}
-Dtests.seed=E916688CE5BC9122 -Dtests.multiplier=3 {code}

> org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure
> 
>
> Key: LUCENE-8819
> URL: https://issues.apache.org/jira/browse/LUCENE-8819
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Major
> Attachments: LUCENE-8819.patch
>
>
> It can be reproduced with:
>  
> {code:java}
> ant test  -Dtestcase=TestTopDocsMerge -Dtests.method=testSort_1 
> -Dtests.seed=E916688CE5BC9122 -Dtests.multiplier=3 -Dtests.slow=true 
> -Dtests.locale=es-US -Dtests.timezone=Pacific/Johnston -Dtests.asserts=true 
> -Dtests.file.encoding=ISO-8859-1{code}
>  
> Test fails in master and branch 8.x but it does not fail in branch 8.1. 






[jira] [Commented] (LUCENE-8819) org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure

2019-06-03 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854562#comment-16854562
 ] 

Ignacio Vera commented on LUCENE-8819:
--

Note that you need to add the seed and the multiplier in IntelliJ. In that case 
I am able to reproduce it in the debugger.

What I observe is that documents come in a different order, and naively my best 
guess is that it is due to the order of execution that LUCENE-8757 has 
introduced, but I might be totally wrong :). 

> org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure
> 
>
> Key: LUCENE-8819
> URL: https://issues.apache.org/jira/browse/LUCENE-8819
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Major
> Attachments: LUCENE-8819.patch
>
>
> It can be reproduced with:
>  
> {code:java}
> ant test  -Dtestcase=TestTopDocsMerge -Dtests.method=testSort_1 
> -Dtests.seed=E916688CE5BC9122 -Dtests.multiplier=3 -Dtests.slow=true 
> -Dtests.locale=es-US -Dtests.timezone=Pacific/Johnston -Dtests.asserts=true 
> -Dtests.file.encoding=ISO-8859-1{code}
>  
> Test fails in master and branch 8.x but it does not fail in branch 8.1. 






[jira] [Commented] (LUCENE-8819) org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure

2019-06-03 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854400#comment-16854400
 ] 

Ignacio Vera commented on LUCENE-8819:
--

Thanks [~atris]!

Unfortunately the patch does not apply to the latest master. I believe Adrien 
might have made some minor edits to your earlier patch before applying it, and 
they conflict with this patch. Maybe you want to pull the latest master and then 
redo the changes.

Sorry for the hassle.

 

> org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure
> 
>
> Key: LUCENE-8819
> URL: https://issues.apache.org/jira/browse/LUCENE-8819
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Major
> Attachments: LUCENE-8819.patch
>
>
> It can be reproduced with:
>  
> {code:java}
> ant test  -Dtestcase=TestTopDocsMerge -Dtests.method=testSort_1 
> -Dtests.seed=E916688CE5BC9122 -Dtests.multiplier=3 -Dtests.slow=true 
> -Dtests.locale=es-US -Dtests.timezone=Pacific/Johnston -Dtests.asserts=true 
> -Dtests.file.encoding=ISO-8859-1{code}
>  
> Test fails in master and branch 8.x but it does not fail in branch 8.1. 






[jira] [Commented] (LUCENE-8819) org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure

2019-06-03 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854297#comment-16854297
 ] 

Ignacio Vera commented on LUCENE-8819:
--

Just for the record, I see other tests failing and I believe it comes from the 
same issue, for example:

 
{code:java}
ant test  -Dtestcase=TestRegexpRandom2 -Dtests.method=testRegexps 
-Dtests.seed=215B9655D6767594 -Dtests.multiplier=2 -Dtests.nightly=true 
-Dtests.slow=true 
-Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-8.x/test-data/enwiki.random.lines.txt
 -Dtests.locale=es-PY -Dtests.timezone=Asia/Kuala_Lumpur -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8{code}

> org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure
> 
>
> Key: LUCENE-8819
> URL: https://issues.apache.org/jira/browse/LUCENE-8819
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Major
>
> It can be reproduced with:
>  
> {code:java}
> ant test  -Dtestcase=TestTopDocsMerge -Dtests.method=testSort_1 
> -Dtests.seed=E916688CE5BC9122 -Dtests.multiplier=3 -Dtests.slow=true 
> -Dtests.locale=es-US -Dtests.timezone=Pacific/Johnston -Dtests.asserts=true 
> -Dtests.file.encoding=ISO-8859-1{code}
>  
> Test fails in master and branch 8.x but it does not fail in branch 8.1. 






[jira] [Commented] (LUCENE-8819) org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure

2019-06-02 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854271#comment-16854271
 ] 

Ignacio Vera commented on LUCENE-8819:
--

I think this test error was introduced by LUCENE-8757

> org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure
> 
>
> Key: LUCENE-8819
> URL: https://issues.apache.org/jira/browse/LUCENE-8819
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Major
>
> It can be reproduced with:
>  
> {code:java}
> ant test  -Dtestcase=TestTopDocsMerge -Dtests.method=testSort_1 
> -Dtests.seed=E916688CE5BC9122 -Dtests.multiplier=3 -Dtests.slow=true 
> -Dtests.locale=es-US -Dtests.timezone=Pacific/Johnston -Dtests.asserts=true 
> -Dtests.file.encoding=ISO-8859-1{code}
>  
> Test fails in master and branch 8.x but it does not fail in branch 8.1. 






[jira] [Created] (LUCENE-8819) org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure

2019-06-02 Thread Ignacio Vera (JIRA)
Ignacio Vera created LUCENE-8819:


 Summary: org.apache.lucene.search.TestTopDocsMerge.testSort_1 
failure
 Key: LUCENE-8819
 URL: https://issues.apache.org/jira/browse/LUCENE-8819
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Ignacio Vera


It can be reproduced with:

 
{code:java}
ant test  -Dtestcase=TestTopDocsMerge -Dtests.method=testSort_1 
-Dtests.seed=E916688CE5BC9122 -Dtests.multiplier=3 -Dtests.slow=true 
-Dtests.locale=es-US -Dtests.timezone=Pacific/Johnston -Dtests.asserts=true 
-Dtests.file.encoding=ISO-8859-1{code}
 

Test fails in master and branch 8.x but it does not fail in branch 8.1. 






[jira] [Created] (LUCENE-8775) Tessellator: Improve the election of diagonals when splitting the polygon

2019-04-23 Thread Ignacio Vera (JIRA)
Ignacio Vera created LUCENE-8775:


 Summary: Tessellator: Improve the election of diagonals when 
splitting the polygon
 Key: LUCENE-8775
 URL: https://issues.apache.org/jira/browse/LUCENE-8775
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Ignacio Vera


There are some cases where polygon tessellation fails, and it seems to be due 
to a bad choice of the diagonal when splitting the polygon. Here I propose a 
patch that makes sure, when splitting a polygon, that the resulting polygons 
are valid CW polygons. 

In addition, this patch adds a few tests to check the functionality of the 
tessellator and throws an error if the polygon cannot be split, instead of 
just emptying the current tessellation.
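
As an illustration of what "valid CW polygon" can mean in practice, here is a minimal sketch of an orientation check based on the signed (shoelace) area. It is not taken from the patch: the class and method names are hypothetical, and it assumes a y-up coordinate system in which a negative signed area means clockwise winding.

```java
class OrientationSketch {
  /**
   * Signed area of a simple polygon via the shoelace formula.
   * Positive for counter-clockwise winding, negative for clockwise (y-up axes).
   * The ring may be open or closed; a repeated last vertex contributes zero.
   */
  static double signedArea(double[] xs, double[] ys) {
    double sum = 0;
    for (int i = 0; i < xs.length; i++) {
      int j = (i + 1) % xs.length; // next vertex, wrapping around
      sum += xs[i] * ys[j] - xs[j] * ys[i];
    }
    return sum / 2.0;
  }

  /** True if the ring winds clockwise; degenerate (zero area) rings are rejected. */
  static boolean isClockwise(double[] xs, double[] ys) {
    return signedArea(xs, ys) < 0;
  }
}
```

A split could then be accepted only when both resulting rings pass such a check; whether the patch does exactly this is not stated in the issue.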






[jira] [Commented] (LUCENE-8736) LatLonShapePolygonQuery returning incorrect WITHIN results with shared boundaries

2019-04-14 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16817579#comment-16817579
 ] 

Ignacio Vera commented on LUCENE-8736:
--

[~nknize] it would be good to have an example where the determinant overflows, 
as I haven't been able to trigger such an error.

Regarding the pnpoly algorithm, I see the properties and simplicity that 
[~rcmuir] likes. One interesting thing to notice is that the algorithm cannot 
really be used on a finite space. For example, in the geo case you have 
problems for points with longitude = 180 or points with latitude = 90, as those 
points cannot be contained by a polygon. This is actually corrected by the 
encoding, as points are pulled southwards and westwards and there are no points 
in the index with those values. The penalty is that you get false positives 
(points pulled inside the polygon) and false negatives (points pulled outside 
of the polygon).

This is ok in the case of points, as the effect is very small and difficult to 
notice. On the other hand, this effect is pretty big when working with shapes. 
Following the example of country shapes, imagine we are now indexing those 
countries. If a user executes a query for shapes that intersect one country, 
the user would expect to get all neighbouring countries. In this case, due to 
the encoding effect, some of those polygons will be pulled inside the query 
polygon and some away from it, so you only get a partial result (neighbouring 
countries to the north and east), which is no good. My feeling is that we need 
to eliminate the situation where we get false negatives when working with 
shapes.

One of the possibilities here is to quantise the query polygon, as in theory it 
should remove false negatives. The problem with doing that is that the pnpoly 
algorithm won't work anymore, because now you again have problems for points 
with longitude = 180 and points with latitude = 90. This is the reason I like 
Nick's approach, as it removes that problem. 

Maybe one possibility is to use different algorithms for points and for shapes, 
although that means that results will differ depending on whether you index 
points as points or as shapes. 
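
The boundary behaviour described above can be reproduced with a minimal sketch of the even-odd crossing test usually referred to as pnpoly. This is a standalone illustration, not the Lucene implementation; the class name and the sample box are assumptions chosen to show that points on the maximum-longitude and maximum-latitude edges are reported as outside.

```java
class PnPolySketch {
  /**
   * Even-odd crossing test. xs/ys hold the polygon vertices (open or closed ring).
   * Points on the minimum-x and minimum-y edges count as inside,
   * points on the maximum-x and maximum-y edges count as outside.
   */
  static boolean contains(double[] xs, double[] ys, double x, double y) {
    boolean inside = false;
    for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
      // Does the edge (j -> i) straddle the horizontal line through y?
      if ((ys[i] > y) != (ys[j] > y)) {
        // x coordinate where the edge meets that horizontal line
        double crossX = (xs[j] - xs[i]) * (y - ys[i]) / (ys[j] - ys[i]) + xs[i];
        if (x < crossX) {
          inside = !inside;
        }
      }
    }
    return inside;
  }

  // Sample box touching the edge of the space: longitude 170..180, latitude 80..90.
  static final double[] BOX_X = {170, 180, 180, 170};
  static final double[] BOX_Y = {80, 80, 90, 90};
}
```

For the sample box, a point at longitude 180 or latitude 90 is rejected even though it lies on the box, while the opposite edges are accepted, which matches the asymmetry the comment describes.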





> LatLonShapePolygonQuery returning incorrect WITHIN results with shared 
> boundaries
> -
>
> Key: LUCENE-8736
> URL: https://issues.apache.org/jira/browse/LUCENE-8736
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Nicholas Knize
>Assignee: Nicholas Knize
>Priority: Major
> Fix For: 8.1, master (9.0)
>
> Attachments: LUCENE-8736.patch, LUCENE-8736.patch, 
> adaptive-decoding.patch
>
>
> Triangles that are {{WITHIN}} a target polygon query that also share a 
> boundary with the polygon are incorrectly reported as {{CROSSES}} instead of 
> {{INSIDE}}. This leads to incorrect {{WITHIN}} query results  as demonstrated 
> in the following test:
> {code:java}
>   public void testWithinFailure() throws Exception {
> Directory dir = newDirectory();
> RandomIndexWriter w = new RandomIndexWriter(random(), dir);
> // test polygons:
> Polygon indexPoly1 = new Polygon(new double[] {4d, 4d, 3d, 3d, 4d}, new 
> double[] {3d, 4d, 4d, 3d, 3d});
> Polygon indexPoly2 = new Polygon(new double[] {2d, 2d, 1d, 1d, 2d}, new 
> double[] {6d, 7d, 7d, 6d, 6d});
> Polygon indexPoly3 = new Polygon(new double[] {1d, 1d, 0d, 0d, 1d}, new 
> double[] {3d, 4d, 4d, 3d, 3d});
> Polygon indexPoly4 = new Polygon(new double[] {2d, 2d, 1d, 1d, 2d}, new 
> double[] {0d, 1d, 1d, 0d, 0d});
> // index polygons:
> Document doc;
> addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly1);
> w.addDocument(doc);
> addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly2);
> w.addDocument(doc);
> addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly3);
> w.addDocument(doc);
> addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly4);
> w.addDocument(doc);
> / search //
> IndexReader reader = w.getReader();
> w.close();
> IndexSearcher searcher = newSearcher(reader);
> Polygon[] searchPoly = new Polygon[] {new Polygon(new double[] {4d, 4d, 
> 0d, 0d, 4d}, new double[] {0d, 7d, 7d, 0d, 0d})};
> Query q = LatLonShape.newPolygonQuery(FIELDNAME, QueryRelation.WITHIN, 
> searchPoly);
> assertEquals(4, searcher.count(q));
> IOUtils.close(w, reader, dir);
>   }
> {code}






[jira] [Commented] (LUCENE-8736) LatLonShapePolygonQuery returning incorrect WITHIN results with shared boundaries

2019-04-11 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815272#comment-16815272
 ] 

Ignacio Vera commented on LUCENE-8736:
--

I have run my own benchmarks with this change and they look like:

||Approach||Shape||M hits/sec dev||M hits/sec base||M hits/sec diff||QPS dev||QPS base||QPS diff||Hit count dev||Hit count base||Hit count diff||
|points|box|77.42|75.92| 2%|78.78|77.25| 2%|221118844|221118844| 0%|
|points|polyRussia|15.99|18.42|-13%|4.56|5.25|-13%|3508846|3508846| 0%|
|points|poly 10|76.02|76.57|-1%|48.07|48.42|-1%|355809475|355809475| 0%|
|points|polyMedium|8.97|9.28|-3%|109.93|113.64|-3%|2693559|2693559| 0%|
|shapes|box|35.51|36.40|-2%|36.13|37.04|-2%|221118844|221118844| 0%|
|shapes|polyRussia|6.22|2.78|124%|1.77|0.79|124%|3508846|3508846| 0%|
|shapes|poly 10|26.91|19.73|36%|17.01|12.48|36%|355809475|355809475| 0%|
|shapes|polyMedium|2.71|1.06|156%|33.26|12.99|156%|2693559|2693559| 0%|

In addition, I ran similar benchmarks indexing the points as lines and as 
polygons:

||Approach||Shape||M hits/sec dev||M hits/sec base||M hits/sec diff||QPS dev||QPS base||QPS diff||Hit count dev||Hit count base||Hit count diff||
|line|box|35.42|35.56|-0%|35.91|36.05|-0%|221924270|221924270| 0%|
|line|polyRussia|3.65|2.52|45%|1.04|0.72|45%|3510913|3510913| 0%|
|line|poly 10|22.87|18.65|23%|14.43|11.76|23%|356664874|356664874| 0%|
|line|polyMedium|1.09|0.73|50%|12.70|8.49|50%|2820569|2820569| 0%|
|polygon|box|27.20|24.93| 9%|27.58|25.27| 9%|221925638|221925638| 0%|
|polygon|polyRussia|1.85|1.77| 4%|0.53|0.51| 4%|3511839|3511839| 0%|
|polygon|poly 10|13.11|14.05|-7%|8.26|8.85|-7%|357135836|357135836| 0%|
|polygon|polyMedium|0.49|0.49| 1%|5.67|5.62| 1%|2857655|2857655| 0%|
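
For readers checking the tables: the diff columns appear to be the relative change of dev against base, (dev - base) / base, rounded to a whole percent. That formula is an assumption inferred from the rows, not something stated by the benchmark tool, and the helper below is hypothetical.

```java
class BenchDiffSketch {
  /** Relative change of dev against base, as a whole percentage. */
  static long percentDiff(double dev, double base) {
    return Math.round(100.0 * (dev - base) / base);
  }
}
```

Applied to the shapes/polyRussia row (6.22 against 2.78) this gives 124, and to the points/polyRussia row (15.99 against 18.42) it gives -13, in agreement with the reported values.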


+1. I like this approach. Benchmarks show that adjusting the logic depending 
on the type of triangle really speeds up points and lines, with a small 
hit on polygons (which might be due to the change in how contains is computed). 
Could we add a comment where we call tree#crossesBox explaining why we choose 
to include the boundaries? 



> LatLonShapePolygonQuery returning incorrect WITHIN results with shared 
> boundaries
> -
>
> Key: LUCENE-8736
> URL: https://issues.apache.org/jira/browse/LUCENE-8736
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Nicholas Knize
>Priority: Major
> Attachments: LUCENE-8736.patch, LUCENE-8736.patch, 
> adaptive-decoding.patch
>
>
> Triangles that are {{WITHIN}} a target polygon query that also share a 
> boundary with the polygon are incorrectly reported as {{CROSSES}} instead of 
> {{INSIDE}}. This leads to incorrect {{WITHIN}} query results  as demonstrated 
> in the following test:
> {code:java}
>   public void testWithinFailure() throws Exception {
> Directory dir = newDirectory();
> RandomIndexWriter w = new RandomIndexWriter(random(), dir);
> // test polygons:
> Polygon indexPoly1 = new Polygon(new double[] {4d, 4d, 3d, 3d, 4d}, new 
> double[] {3d, 4d, 4d, 3d, 3d});
> Polygon indexPoly2 = new Polygon(new double[] {2d, 2d, 1d, 1d, 2d}, new 
> double[] {6d, 7d, 7d, 6d, 6d});
> Polygon indexPoly3 = new Polygon(new double[] {1d, 1d, 0d, 0d, 1d}, new 
> double[] {3d, 4d, 4d, 3d, 3d});
> Polygon indexPoly4 = new Polygon(new double[] {2d, 2d, 1d, 1d, 2d}, new 
> double[] {0d, 1d, 1d, 0d, 0d});
> // index polygons:
> Document doc;
> addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly1);
> w.addDocument(doc);
> addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly2);
> w.addDocument(doc);
> addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly3);
> w.addDocument(doc);
> addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly4);
> w.addDocument(doc);
> / search //
> IndexReader reader = w.getReader();
> w.close();
> IndexSearcher searcher = newSearcher(reader);
> Polygon[] searchPoly = new Polygon[] {new Polygon(new double[] {4d, 4d, 
> 0d, 0d, 4d}, new double[] {0d, 7d, 7d, 0d, 0d})};
> Query q = LatLonShape.newPolygonQuery(FIELDNAME, QueryRelation.WITHIN, 
> searchPoly);
> assertEquals(4, searcher.count(q));
> IOUtils.close(w, reader, dir);
>   }
> {code}






[jira] [Commented] (LUCENE-8758) Class Field levelN is not populated correctly in QuadPrefixTree

2019-04-10 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815114#comment-16815114
 ] 

Ignacio Vera commented on LUCENE-8758:
--

+1 to remove those arrays. They do not seem to be used and their content is 
buggy. That info can be deduced from the level anyway.
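
A minimal sketch of both points, the overflow and the derivation from the level. It is not the QuadPrefixTree code: it assumes level 0 holds 4 cells and that each level multiplies the count by 4, and the class and method names are hypothetical.

```java
class LevelCountSketch {
  /** Mirrors the int recurrence from the issue; the multiplication wraps silently. */
  static int cellsAsInt(int level) {
    int n = 4; // assumption: 4 cells at level 0
    for (int i = 1; i <= level; i++) {
      n = n * 4;
    }
    return n;
  }

  /** Derives the count directly from the level: 4^(level + 1) = 2^(2 * (level + 1)). */
  static long cellsAsLong(int level) {
    if (level < 0 || level > 30) {
      // 4^32 = 2^64 no longer fits in a signed 64-bit long
      throw new IllegalArgumentException("level out of range for long: " + level);
    }
    return 1L << (2 * (level + 1));
  }
}
```

Under these assumptions the int version is still correct at level 14 (1073741824) and wraps to 0 at level 15. A long only postpones the problem: with up to 50 levels the count exceeds 64 bits as well, which is one more argument for dropping the array rather than widening it.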

> Class Field levelN is not populated correctly in QuadPrefixTree
> ---
>
> Key: LUCENE-8758
> URL: https://issues.apache.org/jira/browse/LUCENE-8758
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial-extras
>Affects Versions: 4.0, 5.0, 6.0, 7.0, 8.0
>Reporter: Dominic Page
>Priority: Trivial
>  Labels: beginner
> Fix For: 8.x
>
>
> QuadPrefixTree in Lucene prepopulates these arrays:
> {{levelW = new double[maxLevels];}}
> {{levelH = new double[maxLevels];}}
> {{*levelS = new int[maxLevels];*}}
> {{*levelN = new int[maxLevels];*}}
> Like this
> {{for (int i = 1; i < levelW.length; i++) {}}
> {{ levelW[i] = levelW[i - 1] / 2.0;}}
> {{ levelH[i] = levelH[i - 1] / 2.0;}}
> {{ *levelS[i] = levelS[i - 1] * 2;*}}
> {{ *levelN[i] = levelN[i - 1] * 4;*}}
> {{}}}
> The field
> {{levelN[]}}
> overflows after level 14 = 1073741824 where maxLevels is limited to 
> {{MAX_LEVELS_POSSIBLE = 50;}}
> The field levelN appears not to be used anywhere. Likewise, the field
> {{levelS[] }}
> is only used in the 
> {{printInfo}}
> method. I would propose either to remove both 
> {{levelN[],}}{{levelS[]}}
> or to change the datatype
> {{levelN = new long[maxLevels];}}






[jira] [Created] (LUCENE-8761) RandomGeoPolygonTest.testCompareSmallPolygons test failure

2019-04-10 Thread Ignacio Vera (JIRA)
Ignacio Vera created LUCENE-8761:


 Summary: RandomGeoPolygonTest.testCompareSmallPolygons test failure
 Key: LUCENE-8761
 URL: https://issues.apache.org/jira/browse/LUCENE-8761
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/spatial3d
Reporter: Ignacio Vera


Reproduce with:
{code:java}
ant test  -Dtestcase=RandomGeoPolygonTest 
-Dtests.method=testCompareSmallPolygons -Dtests.seed=5616B8AF18E73F5 
-Dtests.slow=true -Dtests.badapples=true -Dtests.locale=it-IT 
-Dtests.timezone=Africa/Luanda -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8{code}
 

Output:
{code:java}
   [junit4] FAILURE 0.25s | RandomGeoPolygonTest.testCompareSmallPolygons 
{seed=[5616B8AF18E73F5:3F0F8C29D1934F55]} <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: 
   [junit4]    > Standard polygon: GeoCompositePolygon: {[GeoConvexPolygon: 
{planetmodel=PlanetModel.WGS84, points=[[lat=4.0E-323, 
lon=-1.5642641387776646([X=0.0065394500758205526, Y=-1.0010974954578205, 
Z=4.0E-323])], [lat=-4.1204793742327684E-7, 
lon=-1.5642638751890106([X=0.0065397139537611325, Y=-1.0010974937339752, 
Z=-4.125089589031438E-7])], [lat=4.7042128367572116E-7, 
lon=-1.5642630821305357([X=0.006540507882610503, Y=-1.0010974885472588, 
Z=4.709476164070912E-7])], [lat=1.0079428885548814E-6, 
lon=-1.5642647153663134([X=0.006538872854363873, Y=-1.0010974992277146, 
Z=1.0090706294797576E-6])]], internalEdges={3}}, GeoConvexPolygon: 
{planetmodel=PlanetModel.WGS84, points=[[lat=4.0E-323, 
lon=-1.5642641387776646([X=0.0065394500758205526, Y=-1.0010974954578205, 
Z=4.0E-323])], [lat=1.0079428885548814E-6, 
lon=-1.5642647153663134([X=0.006538872854363873, Y=-1.0010974992277146, 
Z=1.0090706294797576E-6])], [lat=5.018093706393593E-7, 
lon=-1.5642651810798487([X=0.006538406629710141, Y=-1.0010975022732327, 
Z=5.02370822057141E-7])]], internalEdges={0}}]}
   [junit4]    > Large polygon: GeoComplexPolygon: 
{planetmodel=PlanetModel.WGS84, number of shapes=1, address=24da76d0, 
testPoint=[lat=3.1362512108941996E-7, 
lon=-1.5642641985086745([X=0.006539390279255702, Y=-1.0010974958483772, 
Z=3.139760218082874E-7])], testPointInSet=true, shapes={ 
{[lat=5.018093706393593E-7, lon=-1.5642651810798487([X=0.006538406629710141, 
Y=-1.0010975022732327, Z=5.02370822057141E-7])], [lat=4.0E-323, 
lon=-1.5642641387776646([X=0.0065394500758205526, Y=-1.0010974954578205, 
Z=4.0E-323])], [lat=-4.1204793742327684E-7, 
lon=-1.5642638751890106([X=0.0065397139537611325, Y=-1.0010974937339752, 
Z=-4.125089589031438E-7])], [lat=4.7042128367572116E-7, 
lon=-1.5642630821305357([X=0.006540507882610503, Y=-1.0010974885472588, 
Z=4.709476164070912E-7])], [lat=1.0079428885548814E-6, 
lon=-1.5642647153663134([X=0.006538872854363873, Y=-1.0010974992277146, 
Z=1.0090706294797576E-6])]}}
   [junit4]    > Point: [lat=2.8705346198541964E-8, 
lon=7.537339947889447E-73([X=1.0011188539924787, Y=7.545773130782812E-73, 
Z=2.8737463289741695E-8])]
   [junit4]    > WKT: POLYGON((-89.62573319562668 2.263E-321,-89.62571809310927 
-2.3608607771424413E-5,-89.62567265420576 
2.6953154147745272E-5,-89.62576623172276 
5.775087350441979E-5,-89.62579291514281 2.875155905775134E-5,-89.62573319562668 
2.263E-321))
   [junit4]    > WKT: POINT(4.318577677694212E-71 1.6446951866383562E-6)
   [junit4]    > normal polygon: false
   [junit4]    > large polygon: true{code}






[jira] [Commented] (LUCENE-8708) Can we simplify conjunctions of range queries automatically?

2019-04-09 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813239#comment-16813239
 ] 

Ignacio Vera commented on LUCENE-8708:
--

Just an idea, maybe biased by my background.

One of the issues here is that we visit the tree for each range and this is 
what we are trying to improve. Maybe adding a query that can accept more than 
one range with a logical relationship ('AND', 'OR',...) might be less invasive 
and encapsulates the logic.
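
A minimal sketch of the idea, assuming closed long ranges represented as {lower, upper} pairs with Long.MIN_VALUE and Long.MAX_VALUE standing in for open bounds. The class, the enum and both methods are hypothetical and only illustrate evaluating several ranges in one pass; they are not an existing Lucene query.

```java
class MultiRangeSketch {
  enum Relation { AND, OR }

  /** Tests one value against all ranges in a single pass. */
  static boolean matches(long value, Relation relation, long[][] ranges) {
    for (long[] r : ranges) {
      boolean in = value >= r[0] && value <= r[1];
      if (relation == Relation.AND && !in) {
        return false; // one miss is enough to reject a conjunction
      }
      if (relation == Relation.OR && in) {
        return true; // one hit is enough to accept a disjunction
      }
    }
    return relation == Relation.AND;
  }

  /** For AND, the ranges collapse into their intersection, or null if it is empty. */
  static long[] intersect(long[][] ranges) {
    long lo = Long.MIN_VALUE;
    long hi = Long.MAX_VALUE;
    for (long[] r : ranges) {
      lo = Math.max(lo, r[0]);
      hi = Math.min(hi, r[1]);
    }
    return lo <= hi ? new long[] {lo, hi} : null;
  }
}
```

For the example from the issue, the ranges [5 TO *] and [* TO 20] intersect to [5 TO 20], so a single tree visit with the merged bounds would be enough in the AND case.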

> Can we simplify conjunctions of range queries automatically?
> 
>
> Key: LUCENE-8708
> URL: https://issues.apache.org/jira/browse/LUCENE-8708
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: interval_range_clauses_merging0704.patch
>
>
> BooleanQuery#rewrite already has some logic to make queries more efficient, 
> such as deduplicating filters or rewriting boolean queries that wrap a single 
> positive clause to that clause.
> It would be nice to also simplify conjunctions of range queries, so that eg. 
> {{foo: [5 TO *] AND foo:[* TO 20]}} would be rewritten to {{foo:[5 TO 20]}}. 
> When constructing queries manually or via the classic query parser, it feels 
> unnecessary as this is something that the user can fix easily. However if you 
> want to implement a query parser that only allows specifying one bound at 
> once, such as Gmail ({{after:2018-12-31}} 
> https://support.google.com/mail/answer/7190?hl=en) or GitHub 
> ({{updated:>=2018-12-31}} 
> https://help.github.com/en/articles/searching-issues-and-pull-requests#search-by-when-an-issue-or-pull-request-was-created-or-last-updated)
>  then you might end up with inefficient queries if the end user specifies 
> both an upper and a lower bound. It would be nice if we optimized those 
> automatically.






[jira] [Created] (LUCENE-8746) Make EdgeTree (aka ComponentTree) support different type of components

2019-03-29 Thread Ignacio Vera (JIRA)
Ignacio Vera created LUCENE-8746:


 Summary: Make EdgeTree (aka ComponentTree) support different type 
of components
 Key: LUCENE-8746
 URL: https://issues.apache.org/jira/browse/LUCENE-8746
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ignacio Vera


Currently the class {{EdgeTree}} is a bit confusing as it is in reality a tree 
of components. The inner class {{Edge}} is the one that builds a tree of edges 
which is used by Polygon2D and Line2D to represent their structure.

The proposal is:

1) Create a new class called {{ComponentTree}}, which is in fact the current 
{{EdgeTree}}.

2) Modify {{EdgeTree}} to be in fact the inner class {{Edge}}.

3) Extract a {{Component}} interface so we can have different types of 
components in the same tree. This allows us to support heterogeneous trees of 
components.

4) Make {{Polygon2D}} and {{Line2D}} implementations of the {{Component}} 
interface.

5) With this change, {{LatLonShapePolygonQuery}} and {{LatLonShapeLineQuery}} 
can be replaced with one {{LatLonShapeComponentQuery}}.
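
The steps above could look roughly like the following sketch. Everything in it is hypothetical: the interface methods, the two stand-in implementations (an axis-aligned box instead of a polygon, a single segment instead of a line) and the flat list used in place of a balanced tree are assumptions made to keep the example short, not the API proposed in the patch.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical common contract for heterogeneous geometry components. */
interface Component {
  double minX();
  double maxX();
  double minY();
  double maxY();
  /** True if the point lies in or on this component. */
  boolean contains(double x, double y);
}

/** Stand-in for an area component such as a polygon. */
final class BoxComponent implements Component {
  private final double minX, maxX, minY, maxY;

  BoxComponent(double minX, double maxX, double minY, double maxY) {
    this.minX = minX;
    this.maxX = maxX;
    this.minY = minY;
    this.maxY = maxY;
  }

  public double minX() { return minX; }
  public double maxX() { return maxX; }
  public double minY() { return minY; }
  public double maxY() { return maxY; }

  public boolean contains(double x, double y) {
    return x >= minX && x <= maxX && y >= minY && y <= maxY;
  }
}

/** Stand-in for a line component: a single segment from (ax, ay) to (bx, by). */
final class SegmentComponent implements Component {
  private final double ax, ay, bx, by;

  SegmentComponent(double ax, double ay, double bx, double by) {
    this.ax = ax;
    this.ay = ay;
    this.bx = bx;
    this.by = by;
  }

  public double minX() { return Math.min(ax, bx); }
  public double maxX() { return Math.max(ax, bx); }
  public double minY() { return Math.min(ay, by); }
  public double maxY() { return Math.max(ay, by); }

  public boolean contains(double x, double y) {
    // Collinear with the segment and inside its bounding box.
    double cross = (bx - ax) * (y - ay) - (by - ay) * (x - ax);
    return cross == 0 && x >= minX() && x <= maxX() && y >= minY() && y <= maxY();
  }
}

/** Flat list standing in for the tree; a real tree would prune by bounding box per node. */
final class ComponentSetSketch {
  private final List<Component> components;

  ComponentSetSketch(Component... components) {
    this.components = Arrays.asList(components);
  }

  boolean contains(double x, double y) {
    for (Component c : components) {
      if (x < c.minX() || x > c.maxX() || y < c.minY() || y > c.maxY()) {
        continue; // cheap bounding-box rejection before the exact test
      }
      if (c.contains(x, y)) {
        return true;
      }
    }
    return false;
  }
}
```

The point of the shared interface is that the container and a single query class only ever talk to {{Component}}, so polygons and lines can be mixed freely.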

 

 

 






[jira] [Created] (LUCENE-8744) TestTessellator#testLinesIntersect failure

2019-03-28 Thread Ignacio Vera (JIRA)
Ignacio Vera created LUCENE-8744:


 Summary: TestTessellator#testLinesIntersect failure
 Key: LUCENE-8744
 URL: https://issues.apache.org/jira/browse/LUCENE-8744
 Project: Lucene - Core
  Issue Type: Test
Reporter: Ignacio Vera


Reproduce with:
{code:java}
ant test  -Dtestcase=TestTessellator -Dtests.method=testLinesIntersect 
-Dtests.seed=D8AE5A1A4CA3A81D -Dtests.slow=true -Dtests.badapples=true 
-Dtests.locale=ar-IQ -Dtests.timezone=Europe/Sarajevo -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8{code}






[jira] [Comment Edited] (LUCENE-8736) LatLonShapePolygonQuery returning incorrect WITHIN results with shared boundaries

2019-03-27 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802637#comment-16802637
 ] 

Ignacio Vera edited comment on LUCENE-8736 at 3/27/19 10:43 AM:


One thing that worries me about this approach is that it is adding quite a lot 
of complexity and a performance penalty on the {{relate}} methods, even for 
INTERSECTS queries. I think the problem here is not so much the mathematical 
accuracy to know if an edge terminates in another but the encoding/decoding 
distortion of those edges.

In your original example we take an edge of an indexed polygon, for example 
LINESTRING(1 0, 2 0), then after encoding/decoding the edge becomes 
LINESTRING(0.9999999403953552 0,1.9999999646097422 0). On the other hand the 
query polygon contains the edge LINESTRING(0 0, 7 0). Therefore indexed and 
query polygons still share the edge because the latitude value does not change 
during encoding/decoding, and the logic you propose handles the situation. I 
think this is an exception more than the general case.

If the indexed edge is LINESTRING(1 1, 2 1), then after encoding/decoding the 
edge becomes LINESTRING(0.9999999403953552 
0.9999999823048711,1.9999999646097422 0.9999999823048711). The query polygon 
contains the edge LINESTRING(1 0, 1 7), which is not a shared edge anymore 
because the latitude value has changed, and the new logic has no effect. I 
think this is the general case.

I am more inclined to leave the {{relate}} logic simple and fast and solve 
these edge cases in a different way. One of the ideas I am thinking about is to 
use some adaptive decoding. In this case INTERSECTS and DISJOINT queries work 
as they do now. For WITHIN queries we decode the indexed triangles minimising 
the area they cover by rounding up min values and rounding down max values. 
This seems to work well when triangles have an area (not points or lines).
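
To make the distortion and the adaptive decoding idea concrete, here is a minimal sketch. It assumes a 32-bit floor quantisation of latitude over a 180 degree span, which is in the spirit of the encoding discussed here but is not copied from Lucene, and the {{shrink}} method is only one possible reading of "rounding up min values and rounding down max values". All names are hypothetical.

```java
class QuantizationSketch {
  static final int BITS = 32;
  /** Width of one quantisation cell in degrees (assumption: 180 degrees over 2^32 cells). */
  static final double LAT_STEP = 180.0 / (1L << BITS);

  /** Floor quantisation: every value is pulled towards the lower cell boundary. */
  static long encode(double lat) {
    return (long) Math.floor(lat / LAT_STEP);
  }

  /** Standard decoding: lower boundary of the cell. */
  static double decode(long encoded) {
    return encoded * LAT_STEP;
  }

  /** Rounded-up decoding: upper boundary of the cell. */
  static double decodeUp(long encoded) {
    return (encoded + 1) * LAT_STEP;
  }

  /**
   * Adaptive decoding of an interval for WITHIN: the minimum is rounded up and the
   * maximum is rounded down, so the decoded interval never exceeds the original one.
   */
  static double[] shrink(double min, double max) {
    return new double[] {decodeUp(encode(min)), decode(encode(max))};
  }
}
```

With these assumptions a latitude of 1 decodes to a value slightly below 1, which is why an edge at latitude 1 stops being shared with the query polygon, while the shrunken interval for [1, 2] stays inside [1, 2].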

 

 


was (Author: ivera):
One thing it worries me about this approach is that it is adding quite a lot of 
complexity and a performance penalty on the {{relate}} methods, even for 
INTERSECTS queries. I think the problem here is not so much the mathematical 
accuracy to know if an edge terminates in another but encoding/decoding 
distortion of those edges.

In your original example we take an edge of an indexed polygon, for example 
LINESTRING(1 0, 2 0), then after encoding and decoding the edge becomes 
LINESTRING(0.9999999403953552 0,1.9999999646097422 0). On the other hand the 
query polygon contains the edge LINESTRING(0 0, 7 0). Therefore indexed and 
query polygons shared the edge because the latitude value does not change one 
decoding and the logic you propose handles the situation. I think this is an 
exception more than the general case.

For example If the indexed edge is  LINESTRING(1 1, 2 1), then after encoding 
and decoding the edge becomes LINESTRING( 0.9999999403953552 
0.9999999823048711,1.9999999646097422 0.9999999823048711). The query polygon 
contains the edge  LINESTRING(1 0, 1 7) which is not a shared edge anymore 
because the latitude value has changed and the new logic has no effect. I think 
this is the general case.

I am more inclined to leave the {{relate}} logic simple and fast and solve this 
edge cases in a different way. One of the ideas I am thinking about is to use 
some adaptive decoding. In this case INTERSECTS and DISJOINT queries work as 
they do now. For WITHIN queries we decode the indexed triangles minimising the 
area they cover by rounding up min values and rounding down max values. This 
seems to work well when triangles have an area (not points or lines).

 

 


[jira] [Commented] (LUCENE-8736) LatLonShapePolygonQuery returning incorrect WITHIN results with shared boundaries

2019-03-27 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802637#comment-16802637
 ] 

Ignacio Vera commented on LUCENE-8736:
--

One thing that worries me about this approach is that it adds quite a lot of 
complexity and a performance penalty to the {{relate}} methods, even for 
INTERSECTS queries. I think the problem here is not so much the mathematical 
accuracy needed to know whether an edge terminates on another, but the 
encoding/decoding distortion of those edges.

In your original example we take an edge of an indexed polygon, for example 
LINESTRING(1 0, 2 0), then after encoding and decoding the edge becomes 
LINESTRING(0.9999999403953552 0, 1.9999999646097422 0). On the other hand the 
query polygon contains the edge LINESTRING(0 0, 7 0). Therefore the indexed and 
query polygons share the edge because the latitude value does not change on 
decoding, and the logic you propose handles the situation. I think this is an 
exception rather than the general case.

For example, if the indexed edge is LINESTRING(1 1, 2 1), then after encoding 
and decoding the edge becomes LINESTRING(0.9999999403953552 
0.9999999823048711, 1.9999999646097422 0.9999999823048711). The query polygon 
contains the edge LINESTRING(1 0, 1 7), which is no longer a shared edge 
because the latitude value has changed, so the new logic has no effect. I think 
this is the general case.
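
The distortion described above can be reproduced with a small standalone 
sketch. It assumes coordinates are quantized to 32-bit integers by taking the 
floor of the value divided by a fixed step, which is how I understand the 
encoding to work; the class name and constants below are illustrative and are 
not the Lucene API, and boundary values such as latitude 90 are not handled.

```java
/** Illustrative sketch of 32-bit lat/lon quantization (not the Lucene classes). */
final class QuantizationSketch {
  private static final int BITS = 32;

  // Degrees covered by one quantization step on each axis (assumed layout).
  static final double LAT_STEP = 180.0 / (1L << BITS);
  static final double LON_STEP = 360.0 / (1L << BITS);

  // Encoding rounds down to the start of the cell that contains the value.
  static int encodeLatitude(double lat) {
    return (int) Math.floor(lat / LAT_STEP);
  }

  static int encodeLongitude(double lon) {
    return (int) Math.floor(lon / LON_STEP);
  }

  // Decoding returns the lower bound of the cell, so it never exceeds the input.
  static double decodeLatitude(int encoded) {
    return encoded * LAT_STEP;
  }

  static double decodeLongitude(int encoded) {
    return encoded * LON_STEP;
  }

  public static void main(String[] args) {
    // Latitude 0 sits exactly on a cell boundary and survives the round trip.
    System.out.println(decodeLatitude(encodeLatitude(0.0)));
    // Latitude 1 and longitude 1 do not, so both decode to slightly less than 1.
    System.out.println(decodeLatitude(encodeLatitude(1.0)));
    System.out.println(decodeLongitude(encodeLongitude(1.0)));
  }
}
```

Under these assumptions an edge at latitude 0 stays on the query boundary, 
while an edge at latitude 1 moves just below it, which is why the shared-edge 
logic only helps in the first case.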

I am more inclined to leave the {{relate}} logic simple and fast and solve 
these edge cases in a different way. One of the ideas I am thinking about is to 
use some adaptive decoding. In this case INTERSECTS and DISJOINT queries work 
as they do now. For WITHIN queries we decode the indexed triangles minimising 
the area they cover by rounding up min values and rounding down max values. 
This seems to work well when triangles have an area (not points or lines).
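
A minimal sketch of what such an adaptive decoding could look like on a single 
axis, assuming the same floor-based 32-bit quantization as above. This is only 
one reading of the idea, not code from any patch; the class and method names 
({{decodeShrunk}} and friends) are hypothetical.

```java
/** Hypothetical sketch of "shrinking" decoding for WITHIN queries (one axis only). */
final class ShrinkingDecodeSketch {
  // Degrees of latitude covered by one quantization step (assumed 32-bit layout).
  static final double LAT_STEP = 180.0 / (1L << 32);

  static int encode(double lat) {
    return (int) Math.floor(lat / LAT_STEP);
  }

  // Usual decoding: lower bound of the cell.
  static double decodeDown(int encoded) {
    return encoded * LAT_STEP;
  }

  // Alternative decoding: upper bound of the cell.
  static double decodeUp(int encoded) {
    return (encoded + 1L) * LAT_STEP;
  }

  /**
   * Decodes an encoded [min, max] interval so that it covers the smallest
   * possible range: the min value is rounded up and the max value is rounded down.
   */
  static double[] decodeShrunk(int encodedMin, int encodedMax) {
    return new double[] {decodeUp(encodedMin), decodeDown(encodedMax)};
  }

  public static void main(String[] args) {
    // An indexed shape spanning latitudes 1..2, tested against a query starting at 1.
    double[] shrunk = decodeShrunk(encode(1.0), encode(2.0));
    System.out.println(decodeDown(encode(1.0)) >= 1.0); // usual decoding falls outside
    System.out.println(shrunk[0] >= 1.0 && shrunk[1] <= 2.0); // shrunk interval stays inside
  }
}
```

Note that when min and max fall in the same cell the shrunk interval is empty 
(min ends up above max), which is consistent with the remark that this only 
works when triangles have an area.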

 

 

> LatLonShapePolygonQuery returning incorrect WITHIN results with shared 
> boundaries
> -
>
> Key: LUCENE-8736
> URL: https://issues.apache.org/jira/browse/LUCENE-8736
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Nicholas Knize
>Priority: Major
> Attachments: LUCENE-8736.patch
>
>
> Triangles that are {{WITHIN}} a target polygon query that also share a 
> boundary with the polygon are incorrectly reported as {{CROSSES}} instead of 
> {{INSIDE}}. This leads to incorrect {{WITHIN}} query results  as demonstrated 
> in the following test:
> {code:java}
>   public void testWithinFailure() throws Exception {
>     Directory dir = newDirectory();
>     RandomIndexWriter w = new RandomIndexWriter(random(), dir);
>     // test polygons:
>     Polygon indexPoly1 = new Polygon(new double[] {4d, 4d, 3d, 3d, 4d},
>         new double[] {3d, 4d, 4d, 3d, 3d});
>     Polygon indexPoly2 = new Polygon(new double[] {2d, 2d, 1d, 1d, 2d},
>         new double[] {6d, 7d, 7d, 6d, 6d});
>     Polygon indexPoly3 = new Polygon(new double[] {1d, 1d, 0d, 0d, 1d},
>         new double[] {3d, 4d, 4d, 3d, 3d});
>     Polygon indexPoly4 = new Polygon(new double[] {2d, 2d, 1d, 1d, 2d},
>         new double[] {0d, 1d, 1d, 0d, 0d});
>     // index polygons:
>     Document doc;
>     addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly1);
>     w.addDocument(doc);
>     addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly2);
>     w.addDocument(doc);
>     addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly3);
>     w.addDocument(doc);
>     addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly4);
>     w.addDocument(doc);
>     // search:
>     IndexReader reader = w.getReader();
>     w.close();
>     IndexSearcher searcher = newSearcher(reader);
>     Polygon[] searchPoly = new Polygon[] {new Polygon(
>         new double[] {4d, 4d, 0d, 0d, 4d}, new double[] {0d, 7d, 7d, 0d, 0d})};
>     Query q = LatLonShape.newPolygonQuery(FIELDNAME, QueryRelation.WITHIN,
>         searchPoly);
>     assertEquals(4, searcher.count(q));
>     IOUtils.close(w, reader, dir);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8736) LatLonShapePolygonQuery returning incorrect WITHIN results with shared boundaries

2019-03-26 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801882#comment-16801882
 ] 

Ignacio Vera commented on LUCENE-8736:
--

+1 I will look into the logic in more detail.







[jira] [Comment Edited] (LUCENE-8736) LatLonShapePolygonQuery returning incorrect WITHIN results with shared boundaries

2019-03-25 Thread Ignacio Vera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800456#comment-16800456
 ] 

Ignacio Vera edited comment on LUCENE-8736 at 3/25/19 8:25 AM:
---

Thanks [~nknize] for sharing the algorithm, it looks pretty powerful.

I had a look at the patch and the first thing I noticed is that the test 
{{testLUCENE8669}} is failing. It seems that the indexed polygons are never 
added to the index, so the fix is trivial (the call to {{w.addDocument(doc)}} 
has been removed in the patch):
{code:java}
-Field[] fields = LatLonShape.createIndexableFields("test", indexPoly1);
-for (Field f : fields) {
-  doc.add(f);
-}
-fields = LatLonShape.createIndexableFields("test", indexPoly2);
-for (Field f : fields) {
-  doc.add(f);
-}
-w.addDocument(doc);
+addPolygonsToDoc(FIELDNAME, doc, indexPoly1);
+addPolygonsToDoc(FIELDNAME, doc, indexPoly2);
 w.forceMerge(1);{code}
 

Regarding the new approach, it seems there is still something missing: making 
the methods more precise does not help if we do not take any action regarding 
the distortion of polygons due to quantization.

If we change the test by translating the polygons one degree north and one 
degree east, the change has no effect due to the encoding of the indexed 
polygons:

 
{code:java}
public void testWithinFailure() throws Exception {
  Directory dir = newDirectory();
  RandomIndexWriter w = new RandomIndexWriter(random(), dir);

  // test polygons:
  Polygon indexPoly1 = new Polygon(
      new double[] {4d + 1d, 4d + 1d, 3d + 1d, 3d + 1d, 4d + 1d},
      new double[] {3d + 1d, 4d + 1d, 4d + 1d, 3d + 1d, 3d + 1d});
  Polygon indexPoly2 = new Polygon(
      new double[] {2d + 1d, 2d + 1d, 1d + 1d, 1d + 1d, 2d + 1d},
      new double[] {6d + 1d, 7d + 1d, 7d + 1d, 6d + 1d, 6d + 1d});
  Polygon indexPoly3 = new Polygon(
      new double[] {1d + 1d, 1d + 1d, 0d + 1d, 0d + 1d, 1d + 1d},
      new double[] {3d + 1d, 4d + 1d, 4d + 1d, 3d + 1d, 3d + 1d});
  Polygon indexPoly4 = new Polygon(
      new double[] {2d + 1d, 2d + 1d, 1d + 1d, 1d + 1d, 2d + 1d},
      new double[] {0d + 1d, 1d + 1d, 1d + 1d, 0d + 1d, 0d + 1d});

  // index polygons:
  Document doc;
  addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly1);
  w.addDocument(doc);
  addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly2);
  w.addDocument(doc);
  addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly3);
  w.addDocument(doc);
  addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly4);
  w.addDocument(doc);

  // search:
  IndexReader reader = w.getReader();
  w.close();
  IndexSearcher searcher = newSearcher(reader);

  Polygon[] searchPoly = new Polygon[] {new Polygon(
      new double[] {4d + 1d, 4d + 1d, 0d + 1d, 0d + 1d, 4d + 1d},
      new double[] {0d + 1d, 7d + 1d, 7d + 1d, 0d + 1d, 0d + 1d})};

  Query q = LatLonShape.newPolygonQuery(FIELDNAME, QueryRelation.WITHIN,
      searchPoly);
  assertEquals(4, searcher.count(q));
  IOUtils.close(w, reader, dir);
}{code}
 

I have tried to quantize the query polygon but that seems to add other issues.


was (Author: ivera):
Thanks [~nknize] for sharing the algorithm, it looks pretty powerful.

I had a look at the patch and the first thing I noticed is that the test 
{{testLUCENE8669}} is failing. It seems that the indexed polygons are never 
added to the index, so the fix is trivial (the call to {{w.addDocument(doc)}} 
has been removed in the patch):
{code:java}
-Field[] fields = LatLonShape.createIndexableFields("test", indexPoly1);
-for (Field f : fields) {
-  doc.add(f);
-}
-fields = LatLonShape.createIndexableFields("test", indexPoly2);
-for (Field f : fields) {
-  doc.add(f);
-}
-w.addDocument(doc);
+addPolygonsToDoc(FIELDNAME, doc, indexPoly1);
+addPolygonsToDoc(FIELDNAME, doc, indexPoly2);
 w.forceMerge(1);{code}
 

Regarding the new approach, it seems there is still something missing. If we 
change the test by translating the polygons one degree north and one degree 
east, the change has no effect due to the encoding of the indexed polygons:

 
{code:java}
public void testWithinFailure() throws Exception {
Directory dir = newDirectory();
RandomIndexWriter w = new RandomIndexWriter(random(), dir);

// test polygons:
Polygon indexPoly1 = new Polygon(new double[] {4d + 1d, 4d + 1d, 3d + 1d, 
3d + 1d, 4d + 1d}, new double[] {3d + 1d, 4d + 1d, 4d + 1d, 3d + 1d, 3d + 1d});
Polygon indexPoly2 = new Polygon(new double[] {2d + 1d, 2d + 1d, 1d + 1d, 
1d + 1d, 2d + 1d}, new double[] {6d + 1d, 7d + 1d, 7d + 1d, 6d + 1d, 6d + 1d});
Polygon indexPoly3 = new Polygon(new double[] {1d + 1d, 1d + 1d, 0d + 1d, 
0d + 1d, 1d + 1d}, new double[] {3d + 1d, 4d + 1d, 4d + 1d, 3d + 1d, 3d + 1d});
Polygon indexPoly4 = new Polygon(new double[] {2d + 1d, 2d + 1d, 1d + 1d, 
1d + 1d, 2d + 1d
