[jira] [Resolved] (LUCENE-8976) Use exact distance between point and bounding rectangle in FloatPointNearestNeighbor
[ https://issues.apache.org/jira/browse/LUCENE-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-8976. -- Fix Version/s: 8.3 Assignee: Ignacio Vera Resolution: Fixed > Use exact distance between point and bounding rectangle in > FloatPointNearestNeighbor > > > Key: LUCENE-8976 > URL: https://issues.apache.org/jira/browse/LUCENE-8976 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Minor > Fix For: 8.3 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > The minimum distance between a point and a bounding rectangle can be > computed quite efficiently. This allows the FloatPointNearestNeighbor > algorithm to discard inner nodes based on that calculation. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8976) Use exact distance between point and bounding rectangle in FloatPointNearestNeighbor
Ignacio Vera created LUCENE-8976: Summary: Use exact distance between point and bounding rectangle in FloatPointNearestNeighbor Key: LUCENE-8976 URL: https://issues.apache.org/jira/browse/LUCENE-8976 Project: Lucene - Core Issue Type: Improvement Reporter: Ignacio Vera The minimum distance between a point and a bounding rectangle can be computed quite efficiently. This allows the FloatPointNearestNeighbor algorithm to discard inner nodes based on that calculation.
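The pruning idea described in LUCENE-8976 can be sketched as follows. This is a minimal illustration and not the code from the patch: the class `PointBoxDistance` and its methods are hypothetical names, and plain Euclidean distance over `float[]` coordinates is assumed.

```java
/** Hypothetical helper (not Lucene code): exact distance from a point to an axis-aligned box. */
final class PointBoxDistance {
  private PointBoxDistance() {}

  /**
   * Returns the squared Euclidean distance between a point and the closest
   * point of the box [min, max]. The result is 0 when the point lies inside.
   */
  static double minDistanceSquared(float[] point, float[] min, float[] max) {
    double sum = 0;
    for (int d = 0; d < point.length; d++) {
      double diff = 0;
      if (point[d] < min[d]) {
        diff = (double) min[d] - point[d]; // point is below the box in this dimension
      } else if (point[d] > max[d]) {
        diff = (double) point[d] - max[d]; // point is above the box in this dimension
      }
      sum += diff * diff;
    }
    return sum;
  }

  /**
   * An inner node can be skipped when even its closest possible point is
   * farther away than the worst hit currently kept in the result queue.
   */
  static boolean canDiscard(float[] point, float[] min, float[] max, double worstDistanceSquared) {
    return minDistanceSquared(point, min, max) > worstDistanceSquared;
  }
}
```

Comparing squared distances avoids a square root per node; the comparison gives the same answer because squaring is monotonic for non-negative values.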
[jira] [Resolved] (LUCENE-8968) Improve performance of WITHIN and DISJOINT queries for Shape queries
[ https://issues.apache.org/jira/browse/LUCENE-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-8968. -- Fix Version/s: 8.3 Assignee: Ignacio Vera Resolution: Fixed > Improve performance of WITHIN and DISJOINT queries for Shape queries > > > Key: LUCENE-8968 > URL: https://issues.apache.org/jira/browse/LUCENE-8968 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Major > Fix For: 8.3 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > We are currently walking the tree twice for INTERSECTS and WITHIN queries in > ShapeQuery when we can do it in just one pass. Still, most of the time we > need to visit all documents to remove false positives due to multi-shapes, > except in the case where all documents up to maxDoc are in the tree. > This issue refactors that class and tries to improve the strategy for such > cases.
[jira] [Resolved] (LUCENE-8964) Allow GeoJSON parser to properly skip string arrays
[ https://issues.apache.org/jira/browse/LUCENE-8964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-8964. -- Fix Version/s: 8.3 Resolution: Fixed Thanks [~aleree] > Allow GeoJSON parser to properly skip string arrays > --- > > Key: LUCENE-8964 > URL: https://issues.apache.org/jira/browse/LUCENE-8964 > Project: Lucene - Core > Issue Type: Improvement > Components: core/other >Affects Versions: trunk >Reporter: Alexander Reelsen >Assignee: Ignacio Vera >Priority: Trivial > Fix For: 8.3 > > Attachments: lucene-parse-geojson-arrays-0.patch > > Time Spent: 20m > Remaining Estimate: 0h > > The Geo JSON parser throws an exception when trying to parse an array of > strings, which is somewhat common in some free geojson services like > [https://whosonfirst.org|https://whosonfirst.org/] > An example file can be seen at > [https://data.whosonfirst.org/101/748/479/101748479.geojson] > This fixes the parser to also parse a string array.
[jira] [Updated] (LUCENE-8973) XYRectangle2D should work on float space
[ https://issues.apache.org/jira/browse/LUCENE-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera updated LUCENE-8973: - Description: While working on CONTAINS support for shapes, I came across errors in XYShape when querying with a bounding box. After looking into the errors, it is clear that the issue is that XYRectangle2D is working in the encoding space. In this case, the XYShape encoding is not linear and shapes lose their spatial relationship. XYRectangle2D should therefore work in the float space. > XYRectangle2D should work on float space > > > Key: LUCENE-8973 > URL: https://issues.apache.org/jira/browse/LUCENE-8973 > Project: Lucene - Core > Issue Type: Bug > Environment: >Reporter: Ignacio Vera >Priority: Major > > While working on CONTAINS support for shapes, I came across errors in XYShape > when querying with a bounding box. After looking into the errors, it is clear > that the issue is that XYRectangle2D is working in the encoding space. In > this case, the XYShape encoding is not linear and shapes lose their spatial > relationship. > XYRectangle2D should therefore work in the float space.
[jira] [Updated] (LUCENE-8973) XYRectangle2D should work on float space
[ https://issues.apache.org/jira/browse/LUCENE-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera updated LUCENE-8973: - Environment: was: While working on CONTAINS support for shapes, I came across errors in XYShape when querying with a bounding box. After looking into the errors, it is clear that the issue is that XYRectangle2D is working in the encoding space. In this case, the XYShape encoding is not linear and shapes lose their spatial relationship. XYRectangle2D should therefore work in the float space. > XYRectangle2D should work on float space > > > Key: LUCENE-8973 > URL: https://issues.apache.org/jira/browse/LUCENE-8973 > Project: Lucene - Core > Issue Type: Bug > Environment: >Reporter: Ignacio Vera >Priority: Major >
[jira] [Created] (LUCENE-8973) XYRectangle2D should work on float space
Ignacio Vera created LUCENE-8973: Summary: XYRectangle2D should work on float space Key: LUCENE-8973 URL: https://issues.apache.org/jira/browse/LUCENE-8973 Project: Lucene - Core Issue Type: Bug Environment: While working on CONTAINS support for shapes, I came across errors in XYShape when querying with a bounding box. After looking into the errors, it is clear that the issue is that XYRectangle2D is working in the encoding space. In this case, the XYShape encoding is not linear and shapes lose their spatial relationship. XYRectangle2D should therefore work in the float space. Reporter: Ignacio Vera
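The remark in LUCENE-8973 that a non-linear encoding loses spatial relationships can be illustrated with a small sketch. It assumes the encoding is an order-preserving "sortable int" mapping of the float bits; the class `SortableFloatEncoding` is a hypothetical stand-in written for this example, not the XYShape code itself.

```java
/** Hypothetical stand-in for an order-preserving float-to-int encoding (assumption, not the XYShape code). */
final class SortableFloatEncoding {
  private SortableFloatEncoding() {}

  /** Maps a float to an int so that comparing the ints gives the same order as comparing the floats. */
  static int encode(float value) {
    int bits = Float.floatToIntBits(value);
    // Flip the low 31 bits of negative values so that they sort correctly as signed ints.
    return bits ^ ((bits >> 31) & 0x7fffffff);
  }

  /** Inverse of encode: the same bit flip applied to the encoded value. */
  static float decode(int encoded) {
    return Float.intBitsToFloat(encoded ^ ((encoded >> 31) & 0x7fffffff));
  }

  /** Midpoint of two values computed in the encoded space, then mapped back to a float. */
  static float encodedMidpoint(float a, float b) {
    long sum = (long) encode(a) + encode(b); // long arithmetic avoids int overflow
    return decode((int) (sum / 2));
  }
}
```

Order is preserved, but equal steps in float space are not equal steps in the encoded space: 1 to 2 and 2 to 3 are the same distance as floats, yet the encoded gaps differ by a factor of two, and the encoded midpoint of 1 and 3 decodes to 1.75 rather than 2. Geometric predicates that rely on linearity (for example, whether a point lies on a segment) can therefore give different answers in the two spaces.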
[jira] [Commented] (LUCENE-8964) Allow GeoJSON parser to properly skip string arrays
[ https://issues.apache.org/jira/browse/LUCENE-8964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924220#comment-16924220 ] Ignacio Vera commented on LUCENE-8964: -- Thanks Alex! The patch looks good, I will commit it soon. > Allow GeoJSON parser to properly skip string arrays > --- > > Key: LUCENE-8964 > URL: https://issues.apache.org/jira/browse/LUCENE-8964 > Project: Lucene - Core > Issue Type: Improvement > Components: core/other >Affects Versions: trunk >Reporter: Alexander Reelsen >Assignee: Ignacio Vera >Priority: Trivial > Attachments: lucene-parse-geojson-arrays-0.patch > > > The Geo JSON parser throws an exception when trying to parse an array of > strings, which is somewhat common in some free geojson services like > [https://whosonfirst.org|https://whosonfirst.org/] > An example file can be seen at > [https://data.whosonfirst.org/101/748/479/101748479.geojson] > This fixes the parser to also parse a string array.
[jira] [Assigned] (LUCENE-8964) Allow GeoJSON parser to properly skip string arrays
[ https://issues.apache.org/jira/browse/LUCENE-8964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera reassigned LUCENE-8964: Assignee: Ignacio Vera > Allow GeoJSON parser to properly skip string arrays > --- > > Key: LUCENE-8964 > URL: https://issues.apache.org/jira/browse/LUCENE-8964 > Project: Lucene - Core > Issue Type: Improvement > Components: core/other >Affects Versions: trunk >Reporter: Alexander Reelsen >Assignee: Ignacio Vera >Priority: Trivial > Attachments: lucene-parse-geojson-arrays-0.patch > > > The Geo JSON parser throws an exception when trying to parse an array of > strings, which is somewhat common in some free geojson services like > [https://whosonfirst.org|https://whosonfirst.org/] > An example file can be seen at > [https://data.whosonfirst.org/101/748/479/101748479.geojson] > This fixes the parser to also parse a string array.
[jira] [Created] (LUCENE-8968) Improve performance of WITHIN and DISJOINT queries for Shape queries
Ignacio Vera created LUCENE-8968: Summary: Improve performance of WITHIN and DISJOINT queries for Shape queries Key: LUCENE-8968 URL: https://issues.apache.org/jira/browse/LUCENE-8968 Project: Lucene - Core Issue Type: Improvement Reporter: Ignacio Vera We are currently walking the tree twice for INTERSECTS and WITHIN queries in ShapeQuery when we can do it in just one pass. Still, most of the time we need to visit all documents to remove false positives due to multi-shapes, except in the case where all documents up to maxDoc are in the tree. This issue refactors that class and tries to improve the strategy for such cases.
[jira] [Resolved] (LUCENE-8960) Add LatLonDocValuesPointInPolygonQuery
[ https://issues.apache.org/jira/browse/LUCENE-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-8960. -- Fix Version/s: 8.3 Assignee: Ignacio Vera Resolution: Fixed > Add LatLonDocValuesPointInPolygonQuery > -- > > Key: LUCENE-8960 > URL: https://issues.apache.org/jira/browse/LUCENE-8960 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Minor > Fix For: 8.3 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently LatLonDocValuesField contains queries for bounding box and circle. > This issue adds a polygon query as well.
[jira] [Created] (LUCENE-8960) Add LatLonDocValuesPointInPolygonQuery
Ignacio Vera created LUCENE-8960: Summary: Add LatLonDocValuesPointInPolygonQuery Key: LUCENE-8960 URL: https://issues.apache.org/jira/browse/LUCENE-8960 Project: Lucene - Core Issue Type: Improvement Reporter: Ignacio Vera Currently LatLonDocValuesField contains queries for bounding box and circle. This issue adds a polygon query as well.
[jira] [Commented] (LUCENE-8955) Move compare logic to IntersectVisitor in NearestNeighbor
[ https://issues.apache.org/jira/browse/LUCENE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916777#comment-16916777 ] Ignacio Vera commented on LUCENE-8955: -- Nice speed-up of the 10 nearest points query on the geo benchmarks (65%): https://home.apache.org/~mikemccand/geobench.html#search-nearest_10 > Move compare logic to IntersectVisitor in NearestNeighbor > - > > Key: LUCENE-8955 > URL: https://issues.apache.org/jira/browse/LUCENE-8955 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Minor > Fix For: 8.3 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Similar to LUCENE-8914, move compare logic to the IntersectVisitor so we can > take advantage of the improvement added in LUCENE-7862. I ran the > geoBenchmark for nearest 10 locally and the change provides an improvement of > around 30%.
[jira] [Resolved] (LUCENE-8955) Move compare logic to IntersectVisitor in NearestNeighbor
[ https://issues.apache.org/jira/browse/LUCENE-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-8955. -- Fix Version/s: 8.3 Assignee: Ignacio Vera Resolution: Fixed > Move compare logic to IntersectVisitor in NearestNeighbor > - > > Key: LUCENE-8955 > URL: https://issues.apache.org/jira/browse/LUCENE-8955 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Minor > Fix For: 8.3 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Similar to LUCENE-8914, move compare logic to the IntersectVisitor so we can > take advantage of the improvement added in LUCENE-7862. I ran the > geoBenchmark for nearest 10 locally and the change provides an improvement of > around 30%.
[jira] [Resolved] (LUCENE-8952) Use a sort key instead of true distance in NearestNeighbors.
[ https://issues.apache.org/jira/browse/LUCENE-8952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-8952. -- Fix Version/s: 8.3 Assignee: Ignacio Vera Resolution: Fixed > Use a sort key instead of true distance in NearestNeighbors. > > > Key: LUCENE-8952 > URL: https://issues.apache.org/jira/browse/LUCENE-8952 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Julie Tibshirani >Assignee: Ignacio Vera >Priority: Minor > Fix For: 8.3 > > Time Spent: 50m > Remaining Estimate: 0h > > The NearestNeighbors class contains a TODO to switch to > SloppyMath.haversinSortKey when comparing candidate nearest neighbors. This > change is not high priority, but could be a nice way to get more familiar > with the kNN search implementation.
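The sort-key idea behind LUCENE-8952 can be sketched in plain Java. This is an illustration of the general technique and not the SloppyMath implementation: the class `HaversineSortKey`, its method names, and the earth radius constant are assumptions made for this example.

```java
/** Hypothetical illustration of a haversine sort key (not SloppyMath itself). */
final class HaversineSortKey {
  /** Mean earth radius in meters; the exact constant is an assumption of this sketch. */
  static final double EARTH_RADIUS_METERS = 6_371_008.8;

  private HaversineSortKey() {}

  /**
   * Returns the haversine term h = sin^2(dLat/2) + cos(lat1) * cos(lat2) * sin^2(dLon/2).
   * The true distance is 2 * R * asin(sqrt(h)), which grows monotonically with h,
   * so h alone is enough to order candidates by distance.
   */
  static double sortKey(double lat1, double lon1, double lat2, double lon2) {
    double phi1 = Math.toRadians(lat1);
    double phi2 = Math.toRadians(lat2);
    double sinHalfDLat = Math.sin((phi2 - phi1) / 2);
    double sinHalfDLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
    return sinHalfDLat * sinHalfDLat + Math.cos(phi1) * Math.cos(phi2) * sinHalfDLon * sinHalfDLon;
  }

  /** Converts a sort key to meters; only needed for the hits that are finally returned. */
  static double toMeters(double sortKey) {
    return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(Math.min(1.0, sortKey)));
  }
}
```

During the search, candidates are compared by `sortKey` only, which skips the `asin` and `sqrt` calls for every rejected point; `toMeters` is applied once per reported hit.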
[jira] [Comment Edited] (SOLR-13452) Update the lucene-solr build from Ivy+Ant+Maven (shadow build) to Gradle.
[ https://issues.apache.org/jira/browse/SOLR-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913088#comment-16913088 ] Ignacio Vera edited comment on SOLR-13452 at 8/22/19 7:55 AM: -- I had a look at the implementation and found that the new Lucene modules monitor and Luke do not seem to be included in the gradle build. And a question: which gradle command is equivalent to ant precommit? was (Author: ivera): I had a look to the implementation and I found that the new Lucene modules monitor and Luke seems not to be included in the cradle build. And a question: which is the griddle command similar to ant precommit? > Update the lucene-solr build from Ivy+Ant+Maven (shadow build) to Gradle. > - > > Key: SOLR-13452 > URL: https://issues.apache.org/jira/browse/SOLR-13452 > Project: Solr > Issue Type: Improvement > Components: Build >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Major > Fix For: master (9.0) > > Attachments: gradle-build.pdf > > Time Spent: 10m > Remaining Estimate: 0h > > I took some things from the great work that Dat did in > [https://github.com/apache/lucene-solr/tree/jira/gradle] and took the ball a > little further. > > When working with gradle in sub modules directly, I recommend > [https://github.com/dougborg/gdub] > This gradle branch uses the following plugin for version locking, version > configuration and version consistency across modules: > [https://github.com/palantir/gradle-consistent-versions] > > https://github.com/apache/lucene-solr/tree/jira/SOLR-13452_gradle_5
[jira] [Commented] (SOLR-13452) Update the lucene-solr build from Ivy+Ant+Maven (shadow build) to Gradle.
[ https://issues.apache.org/jira/browse/SOLR-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913088#comment-16913088 ] Ignacio Vera commented on SOLR-13452: - I had a look at the implementation and found that the new Lucene modules monitor and Luke do not seem to be included in the gradle build. And a question: which gradle command is equivalent to ant precommit? > Update the lucene-solr build from Ivy+Ant+Maven (shadow build) to Gradle. > - > > Key: SOLR-13452 > URL: https://issues.apache.org/jira/browse/SOLR-13452 > Project: Solr > Issue Type: Improvement > Components: Build >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Major > Fix For: master (9.0) > > Attachments: gradle-build.pdf > > Time Spent: 10m > Remaining Estimate: 0h > > I took some things from the great work that Dat did in > [https://github.com/apache/lucene-solr/tree/jira/gradle] and took the ball a > little further. > > When working with gradle in sub modules directly, I recommend > [https://github.com/dougborg/gdub] > This gradle branch uses the following plugin for version locking, version > configuration and version consistency across modules: > [https://github.com/palantir/gradle-consistent-versions] > > https://github.com/apache/lucene-solr/tree/jira/SOLR-13452_gradle_5
[jira] [Created] (LUCENE-8955) Move compare logic to IntersectVisitor in NearestNeighbor
Ignacio Vera created LUCENE-8955: Summary: Move compare logic to IntersectVisitor in NearestNeighbor Key: LUCENE-8955 URL: https://issues.apache.org/jira/browse/LUCENE-8955 Project: Lucene - Core Issue Type: Improvement Reporter: Ignacio Vera Similar to LUCENE-8914, move compare logic to the IntersectVisitor so we can take advantage of the improvement added in LUCENE-7862. I ran the geoBenchmark for nearest 10 locally and the change provides an improvement of around 30%.
[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values
[ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893763#comment-16893763 ] Ignacio Vera commented on LUCENE-8928: -- I tried to see the effect running on 1D ranges and it is the same as above. So +1 to applying this change only when numDims > 2, as it seems the right trade-off. > BKDWriter could make splitting decisions based on the actual range of values > > > Key: LUCENE-8928 > URL: https://issues.apache.org/jira/browse/LUCENE-8928 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > Currently BKDWriter assumes that splitting on one dimension has no effect on > values in other dimensions. While this may be ok for geo points, this is > usually not true for ranges (or geo shapes, which are ranges too). Maybe we > could get better indexing by re-computing the range of values on each > dimension before making the choice of the split dimension? -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values
[ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893763#comment-16893763 ] Ignacio Vera edited comment on LUCENE-8928 at 7/26/19 12:02 PM: I tried to see the effect running on 1D ranges and it is the same as above. So +1 to applying this change only when numDims > 2, as it seems the right trade-off. was (Author: ivera): I tried to see the effect running on 1D ranges and it is the same as above. So +1 to apply this change only when numDims > 2 as it seems the right tree off. > BKDWriter could make splitting decisions based on the actual range of values > > > Key: LUCENE-8928 > URL: https://issues.apache.org/jira/browse/LUCENE-8928 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > Currently BKDWriter assumes that splitting on one dimension has no effect on > values in other dimensions. While this may be ok for geo points, this is > usually not true for ranges (or geo shapes, which are ranges too). Maybe we > could get better indexing by re-computing the range of values on each > dimension before making the choice of the split dimension?
[jira] [Commented] (LUCENE-8369) Remove the spatial module as it is obsolete
[ https://issues.apache.org/jira/browse/LUCENE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893752#comment-16893752 ] Ignacio Vera commented on LUCENE-8369: -- My best example is LUCENE-8746: I am trying to refactor the classes that contain the spatial logic, and having them in different packages makes it very difficult. In addition, how do we assess what is common and what is exotic? Maybe pointInBox (which is a range anyway) is the most common case, but pointInPolygon might start moving into the exotic area. > Remove the spatial module as it is obsolete > --- > > Key: LUCENE-8369 > URL: https://issues.apache.org/jira/browse/LUCENE-8369 > Project: Lucene - Core > Issue Type: Task > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Attachments: LUCENE-8369.patch > > > The "spatial" module is at this juncture nearly empty with only a couple > utilities that aren't used by anything in the entire codebase -- > GeoRelationUtils, and MortonEncoder. Perhaps it should have been removed > earlier in LUCENE-7664 which was the removal of GeoPointField which was > essentially why the module existed. Better late than never.
[jira] [Commented] (LUCENE-8369) Remove the spatial module as it is obsolete
[ https://issues.apache.org/jira/browse/LUCENE-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892785#comment-16892785 ] Ignacio Vera commented on LUCENE-8369: -- My fear about having LatLonPoint in a different package from the other spatial fields is code sharing. Most of the code used by LatLonPoint is reused when dealing with more complex shapes, and having them in different packages hurts the API (objects that should be package-private become public because of the need to reuse them). This has already been the case when working in the sandbox. > Remove the spatial module as it is obsolete > --- > > Key: LUCENE-8369 > URL: https://issues.apache.org/jira/browse/LUCENE-8369 > Project: Lucene - Core > Issue Type: Task > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Attachments: LUCENE-8369.patch > > > The "spatial" module is at this juncture nearly empty with only a couple > utilities that aren't used by anything in the entire codebase -- > GeoRelationUtils, and MortonEncoder. Perhaps it should have been removed > earlier in LUCENE-7664 which was the removal of GeoPointField which was > essentially why the module existed. Better late than never.
[jira] [Comment Edited] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values
[ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890173#comment-16890173 ] Ignacio Vera edited comment on LUCENE-8928 at 7/22/19 6:11 PM: --- I ran this approach locally. It also helps quite a bit in the case of Geo3D (the 3-dimension case). I tried different approaches to make indexing faster, but no luck so far:
||Approach||Index time (sec)||Index time (sec)|| ||Force merge time (sec)||Force merge time (sec)|| ||Index size (GB)||Index size (GB)|| ||Reader heap (MB)||Reader heap (MB)|| ||
|| ||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|points|181.1s|124.4s|46%|76.9s|53.5s|44%|0.55|0.55|-0%|1.57|1.57|0%|
|shapes|327.4s|215.4s|52%|168.9s|120.2s|40%|1.28|1.29|-1%|1.62|1.61|0%|
|geo3d|211.9s|154.7s|37%|94.3s|66.4s|42%|0.75|0.75|-0%|1.58|1.58|0%|
||Approach||Shape||M hits/sec||M hits/sec|| ||QPS||QPS|| ||Hit count||Hit count|| ||
|| || ||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|points|box|94.34|94.84|-1%|95.99|96.50|-1%|221118844|221118844| 0%|
|points|polyRussia|20.07|20.46|-2%|5.72|5.83|-2%|3508846|3508846| 0%|
|points|poly 10|88.64|87.56| 1%|56.05|55.37| 1%|355809475|355809475| 0%|
|points|polyMedium|10.47|10.54|-1%|128.26|129.15|-1%|2693559|2693559| 0%|
|points|box|94.34|94.84|-1%|95.99|96.50|-1%|221118844|221118844| 0%|
|points|distance|93.48|95.96|-3%|54.92|56.38|-3%|382961957|382961957| 0%|
|points|nearest 10|0.10|0.09|11%|9687.24|8755.72|11%|60844404|60844404| 0%|
|points|sort|43.12|43.04| 0%|43.88|43.80| 0%|221118844|221118844| 0%|
|shapes|box|66.02|52.23|26%|67.18|53.15|26%|221118844|221118844| 0%|
|shapes|polyRussia|11.57|9.85|17%|3.30|2.81|17%|3508846|3508846| 0%|
|shapes|poly 10|54.98|47.08|17%|34.77|29.77|17%|355809475|355809475| 0%|
|shapes|polyMedium|5.31|4.52|17%|65.01|55.39|17%|2693559|2693559| 0%|
|shapes|box|66.02|52.23|26%|67.18|53.15|26%|221118844|221118844| 0%|
|geo3d|box|79.17|66.22|20%|80.56|67.38|20%|221118844|221118844| 0%|
|geo3d|polyRussia|0.95|0.90| 5%|0.27|0.26| 5%|3508671|3508671| 0%|
|geo3d|poly 10|77.26|57.16|35%|48.85|36.14|35%|355855227|355855227| 0%|
|geo3d|polyMedium|0.95|0.69|37%|11.62|8.50|37%|2693545|2693545| 0%|
|geo3d|box|79.17|66.22|20%|80.56|67.38|20%|221118844|221118844| 0%|
|geo3d|distance|95.35|76.17|25%|55.96|44.70|25%|383371884|383371884| 0%|
was (Author: ivera): I ran this approach locally. It also helps quite a bit in the case of Geo3D (the 3-dimension case). I tried different approaches to make indexing faster, but no luck so far:
||Approach||Index time (sec)||Index time (sec)||Force merge time (sec)||Force merge time (sec)||Index size (GB)||Index size (GB)||Reader heap (MB)||Reader heap (MB)||
|| ||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|points|181.1s|124.4s|46%|76.9s|53.5s|44%|0.55|0.55|-0%|1.57|1.57|0%|
|shapes|327.4s|215.4s|52%|168.9s|120.2s|40%|1.28|1.29|-1%|1.62|1.61|0%|
|geo3d|211.9s|154.7s|37%|94.3s|66.4s|42%|0.75|0.75|-0%|1.58|1.58|0%|
||Approach||Shape||M hits/sec||M hits/sec|| ||QPS||QPS|| ||Hit count||Hit count|| ||
|| || ||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|points|box|94.34|94.84|-1%|95.99|96.50|-1%|221118844|221118844| 0%|
|points|polyRussia|20.07|20.46|-2%|5.72|5.83|-2%|3508846|3508846| 0%|
|points|poly 10|88.64|87.56| 1%|56.05|55.37| 1%|355809475|355809475| 0%|
|points|polyMedium|10.47|10.54|-1%|128.26|129.15|-1%|2693559|2693559| 0%|
|points|box|94.34|94.84|-1%|95.99|96.50|-1%|221118844|221118844| 0%|
|points|distance|93.48|95.96|-3%|54.92|56.38|-3%|382961957|382961957| 0%|
|points|nearest 10|0.10|0.09|11%|9687.24|8755.72|11%|60844404|60844404| 0%|
|points|sort|43.12|43.04| 0%|43.88|43.80| 0%|221118844|221118844| 0%|
|shapes|box|66.02|52.23|26%|67.18|53.15|26%|221118844|221118844| 0%|
|shapes|polyRussia|11.57|9.85|17%|3.30|2.81|17%|3508846|3508846| 0%|
|shapes|poly 10|54.98|47.08|17%|34.77|29.77|17%|355809475|355809475| 0%|
|shapes|polyMedium|5.31|4.52|17%|65.01|55.39|17%|2693559|2693559| 0%|
|shapes|box|66.02|52.23|26%|67.18|53.15|26%|221118844|221118844| 0%|
|geo3d|box|79.17|66.22|20%|80.56|67.38|20%|221118844|221118844| 0%|
|geo3d|polyRussia|0.95|0.90| 5%|0.27|0.26| 5%|3508671|3508671| 0%|
|geo3d|poly 10|77.26|57.16|35%|48.85|36.14|35%|355855227|355855227| 0%|
|geo3d|polyMedium|0.95|0.69|37%|11.62|8.50|37%|2693545|2693545| 0%|
|geo3d|box|79.17|66.22|20%|80.56|67.38|20%|221118844|221118844| 0%|
|geo3d|distance|95.35|76.17|25%|55.96|44.70|25%|383371884|383371884| 0%|
> BKDWriter could make splitting decisions based on the actual range of values > > > Key: LUCENE-8928 > URL: https://issues.apache.org/jira/browse/LUCENE-8928 > Project: Lucene - Core >
[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values
[ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890214#comment-16890214 ] Ignacio Vera commented on LUCENE-8928: -- 1-dimensional ranges (2D in total). In fact, the shapes benchmark above is a 2-dimensional range (4D in total). > BKDWriter could make splitting decisions based on the actual range of values > > > Key: LUCENE-8928 > URL: https://issues.apache.org/jira/browse/LUCENE-8928 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > Currently BKDWriter assumes that splitting on one dimension has no effect on > values in other dimensions. While this may be ok for geo points, this is > usually not true for ranges (or geo shapes, which are ranges too). Maybe we > could get better indexing by re-computing the range of values on each > dimension before making the choice of the split dimension?
[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values
[ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890194#comment-16890194 ] Ignacio Vera commented on LUCENE-8928: -- I would expect some increase in QPS for 2D when there is correlation between the dimensions, e.g. range fields. > BKDWriter could make splitting decisions based on the actual range of values > > > Key: LUCENE-8928 > URL: https://issues.apache.org/jira/browse/LUCENE-8928 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > Currently BKDWriter assumes that splitting on one dimension has no effect on > values in other dimensions. While this may be ok for geo points, this is > usually not true for ranges (or geo shapes, which are ranges too). Maybe we > could get better indexing by re-computing the range of values on each > dimension before making the choice of the split dimension?
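The proposal in LUCENE-8928, re-computing the range of values on each dimension before choosing the split dimension, can be sketched as below. This is a simplified illustration and not BKDWriter code: the class `SplitDimChooser` is hypothetical, points are modelled as `long[]` rows rather than packed bytes, and values are assumed small enough that `max - min` does not overflow.

```java
/** Hypothetical sketch (not BKDWriter): pick the split dimension from the values actually present. */
final class SplitDimChooser {
  private SplitDimChooser() {}

  /**
   * Scans the points in [from, to) and returns the dimension whose values span
   * the widest range. Ties go to the lowest dimension.
   */
  static int chooseSplitDim(long[][] points, int from, int to, int numDims) {
    if (from >= to) {
      throw new IllegalArgumentException("empty slice: from=" + from + " to=" + to);
    }
    int bestDim = 0;
    long bestRange = -1;
    for (int dim = 0; dim < numDims; dim++) {
      long min = Long.MAX_VALUE;
      long max = Long.MIN_VALUE;
      for (int i = from; i < to; i++) {
        min = Math.min(min, points[i][dim]);
        max = Math.max(max, points[i][dim]);
      }
      long range = max - min; // actual spread of this slice, not the inherited cell bounds
      if (range > bestRange) {
        bestRange = range;
        bestDim = dim;
      }
    }
    return bestDim;
  }
}
```

The extra scan over the slice is what makes indexing slower in the benchmark above, while the tighter bounds are what can help queries when dimensions are correlated, as with range fields.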
[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values
[ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890173#comment-16890173 ] Ignacio Vera commented on LUCENE-8928: --
I ran this approach locally. It also helps quite a bit in the Geo3D case (3 dimensions). I tried different approaches to make indexing faster, but so far no luck:

||Approach||Index time (sec) Dev||Index time (sec) Base||Index time (sec) Diff||Force merge time (sec) Dev||Force merge time (sec) Base||Force merge time (sec) Diff||Index size (GB) Dev||Index size (GB) Base||Index size (GB) Diff||Reader heap (MB) Dev||Reader heap (MB) Base||Reader heap (MB) Diff||
|points|181.1s|124.4s|46%|76.9s|53.5s|44%|0.55|0.55|-0%|1.57|1.57|0%|
|shapes|327.4s|215.4s|52%|168.9s|120.2s|40%|1.28|1.29|-1%|1.62|1.61|0%|
|geo3d|211.9s|154.7s|37%|94.3s|66.4s|42%|0.75|0.75|-0%|1.58|1.58|0%|

||Approach||Shape||M hits/sec Dev||M hits/sec Base||M hits/sec Diff||QPS Dev||QPS Base||QPS Diff||Hit count Dev||Hit count Base||Hit count Diff||
|points|box|94.34|94.84|-1%|95.99|96.50|-1%|221118844|221118844| 0%|
|points|polyRussia|20.07|20.46|-2%|5.72|5.83|-2%|3508846|3508846| 0%|
|points|poly 10|88.64|87.56| 1%|56.05|55.37| 1%|355809475|355809475| 0%|
|points|polyMedium|10.47|10.54|-1%|128.26|129.15|-1%|2693559|2693559| 0%|
|points|box|94.34|94.84|-1%|95.99|96.50|-1%|221118844|221118844| 0%|
|points|distance|93.48|95.96|-3%|54.92|56.38|-3%|382961957|382961957| 0%|
|points|nearest 10|0.10|0.09|11%|9687.24|8755.72|11%|60844404|60844404| 0%|
|points|sort|43.12|43.04| 0%|43.88|43.80| 0%|221118844|221118844| 0%|
|shapes|box|66.02|52.23|26%|67.18|53.15|26%|221118844|221118844| 0%|
|shapes|polyRussia|11.57|9.85|17%|3.30|2.81|17%|3508846|3508846| 0%|
|shapes|poly 10|54.98|47.08|17%|34.77|29.77|17%|355809475|355809475| 0%|
|shapes|polyMedium|5.31|4.52|17%|65.01|55.39|17%|2693559|2693559| 0%|
|shapes|box|66.02|52.23|26%|67.18|53.15|26%|221118844|221118844| 0%|
|geo3d|box|79.17|66.22|20%|80.56|67.38|20%|221118844|221118844| 0%|
|geo3d|polyRussia|0.95|0.90| 5%|0.27|0.26| 5%|3508671|3508671| 0%|
|geo3d|poly 10|77.26|57.16|35%|48.85|36.14|35%|355855227|355855227| 0%|
|geo3d|polyMedium|0.95|0.69|37%|11.62|8.50|37%|2693545|2693545| 0%|
|geo3d|box|79.17|66.22|20%|80.56|67.38|20%|221118844|221118844| 0%|
|geo3d|distance|95.35|76.17|25%|55.96|44.70|25%|383371884|383371884| 0%|

> BKDWriter could make splitting decisions based on the actual range of values > > > Key: LUCENE-8928 > URL: https://issues.apache.org/jira/browse/LUCENE-8928 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > Currently BKDWriter assumes that splitting on one dimension has no effect on > values in other dimensions. While this may be ok for geo points, this is > usually not true for ranges (or geo shapes, which are ranges too). Maybe we > could get better indexing by re-computing the range of values on each > dimension before making the choice of the split dimension? -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
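The idea in the LUCENE-8928 description, re-computing the range of values on each dimension before choosing the split dimension, can be illustrated with a minimal sketch. This is not the BKDWriter code: the class name SplitDimensionSketch, the use of int[] points instead of packed byte[] values, and the "largest spread wins" rule are all assumptions made for illustration only.

```java
import java.util.List;

/** Illustrative only: picks a split dimension from the actual value range of a partition. */
final class SplitDimensionSketch {

  /**
   * Returns the dimension whose actual (max - min) spread over the given points is largest.
   * Ties are resolved in favour of the lowest dimension (an arbitrary choice for this sketch).
   */
  static int chooseSplitDim(List<int[]> points, int numDims) {
    if (points.isEmpty()) {
      throw new IllegalArgumentException("partition must not be empty");
    }
    int bestDim = 0;
    long bestSpread = -1;
    for (int dim = 0; dim < numDims; dim++) {
      int min = Integer.MAX_VALUE;
      int max = Integer.MIN_VALUE;
      // Scan the points that are really in this partition instead of
      // reusing the bounds inherited from the parent cell.
      for (int[] point : points) {
        min = Math.min(min, point[dim]);
        max = Math.max(max, point[dim]);
      }
      long spread = (long) max - min;
      if (spread > bestSpread) {
        bestSpread = spread;
        bestDim = dim;
      }
    }
    return bestDim;
  }
}
```

For correlated dimensions such as the min and max of a range, a split on one dimension also narrows the other, which is why the spread is measured on the points themselves. The extra scan is one plausible source of the slower indexing reported in the table above, although the comment does not say so.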
[jira] [Resolved] (LUCENE-8913) Reproducing failure in various TestLatLon* equals/hashcode tests
[ https://issues.apache.org/jira/browse/LUCENE-8913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-8913. -- Resolution: Fixed Assignee: Ignacio Vera Fix Version/s: 8.3 8.2 master (9.0) > Reproducing failure in various TestLatLon* equals/hashcode tests > - > > Key: LUCENE-8913 > URL: https://issues.apache.org/jira/browse/LUCENE-8913 > Project: Lucene - Core > Issue Type: Bug > Components: core/other >Affects Versions: master (9.0) >Reporter: Gus Heck >Assignee: Ignacio Vera >Priority: Major > Fix For: master (9.0), 8.2, 8.3 > > > Bumped into this while running tests locally > ant clean test -Dtests.seed=41D0C5A80C823307 -Dtests.slow=true > -Dtests.badapples=true -Dtests.locale=es-CL > -Dtests.timezone=Pacific/Rarotonga -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 > reliably produces: > > {code:java} > Tests with failures [seed: 41D0C5A80C823307]: >[junit4] - > org.apache.lucene.document.TestLatLonPointShapeQueries.testBoxQueryEqualsAndHashcode >[junit4] - > org.apache.lucene.document.TestLatLonMultiPointShapeQueries.testBoxQueryEqualsAndHashcode >[junit4] - > org.apache.lucene.document.TestLatLonLineShapeQueries.testBoxQueryEqualsAndHashcode >[junit4] - > org.apache.lucene.document.TestLatLonPolygonShapeQueries.testBoxQueryEqualsAndHashcode >[junit4] - > org.apache.lucene.document.TestLatLonMultiPolygonShapeQueries.testBoxQueryEqualsAndHashcode >[junit4] - > org.apache.lucene.document.TestLatLonMultiLineShapeQueries.testBoxQueryEqualsAndHashcode{code} > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8913) Reproducing failure in various TestLatLon* equals/hashcode tests
[ https://issues.apache.org/jira/browse/LUCENE-8913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887627#comment-16887627 ] Ignacio Vera commented on LUCENE-8913: -- I will fix this one as it is a trivial test bug > Reproducing failure in various TestLatLon* equals/hashcode tests > - > > Key: LUCENE-8913 > URL: https://issues.apache.org/jira/browse/LUCENE-8913 > Project: Lucene - Core > Issue Type: Bug > Components: core/other >Affects Versions: master (9.0) >Reporter: Gus Heck >Priority: Major > > Bumped into this while running tests locally > ant clean test -Dtests.seed=41D0C5A80C823307 -Dtests.slow=true > -Dtests.badapples=true -Dtests.locale=es-CL > -Dtests.timezone=Pacific/Rarotonga -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 > reliably produces: > > {code:java} > Tests with failures [seed: 41D0C5A80C823307]: >[junit4] - > org.apache.lucene.document.TestLatLonPointShapeQueries.testBoxQueryEqualsAndHashcode >[junit4] - > org.apache.lucene.document.TestLatLonMultiPointShapeQueries.testBoxQueryEqualsAndHashcode >[junit4] - > org.apache.lucene.document.TestLatLonLineShapeQueries.testBoxQueryEqualsAndHashcode >[junit4] - > org.apache.lucene.document.TestLatLonPolygonShapeQueries.testBoxQueryEqualsAndHashcode >[junit4] - > org.apache.lucene.document.TestLatLonMultiPolygonShapeQueries.testBoxQueryEqualsAndHashcode >[junit4] - > org.apache.lucene.document.TestLatLonMultiLineShapeQueries.testBoxQueryEqualsAndHashcode{code} > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8914) Small improvement in FloatPointNearestNeighbor
[ https://issues.apache.org/jira/browse/LUCENE-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-8914. -- Resolution: Fixed Assignee: Ignacio Vera Fix Version/s: 8.3 master (9.0) > Small improvement in FloatPointNearestNeighbor > -- > > Key: LUCENE-8914 > URL: https://issues.apache.org/jira/browse/LUCENE-8914 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Minor > Fix For: master (9.0), 8.3 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently the logic to visit inner nodes of the BKD tree in > FloatPointNearestNeighbor is in the custom tree traversing logic instead of > in the IntersectVisitor. This approach is missing the improvement added in > LUCENE-7862, which my experiments show can give a performance improvement of > around 10% for a high number of dimensions. > This change proposes to move the logic for discarding inner nodes to the > IntersectVisitor. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8923) Release procedure does not add new version in CHANGES.txt in master
[ https://issues.apache.org/jira/browse/LUCENE-8923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886967#comment-16886967 ] Ignacio Vera commented on LUCENE-8923: -- I added the entries. I am leaving the issue open so we can clarify whether the current procedure needs to be updated to add these entries. > Release procedure does not add new version in CHANGES.txt in master > --- > > Key: LUCENE-8923 > URL: https://issues.apache.org/jira/browse/LUCENE-8923 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Minor > Attachments: LUCENE-8923.patch > > > This issue is just to track something that may be missing in the release > procedure. It currently adds a new version to CHANGES.txt in the minor > version branch but it does not do it in master. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8923) Release procedure does not add new version in CHANGES.txt in master
[ https://issues.apache.org/jira/browse/LUCENE-8923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera updated LUCENE-8923: - Attachment: LUCENE-8923.patch Status: Open (was: Open) In the meantime I propose creating them manually; a patch is attached. [~tomoko] I am moving your issues to Lucene 8.3 in master; let me know if that is correct. > Release procedure does not add new version in CHANGES.txt in master > --- > > Key: LUCENE-8923 > URL: https://issues.apache.org/jira/browse/LUCENE-8923 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Minor > Attachments: LUCENE-8923.patch > > > This issue is just to track something that may be missing in the release > procedure. It currently adds a new version to CHANGES.txt in the minor > version branch but it does not do it in master. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8923) Release procedure does not add new version in CHANGES.txt in master
Ignacio Vera created LUCENE-8923: Summary: Release procedure does not add new version in CHANGES.txt in master Key: LUCENE-8923 URL: https://issues.apache.org/jira/browse/LUCENE-8923 Project: Lucene - Core Issue Type: Bug Reporter: Ignacio Vera This issue is just to track something that may be missing in the release procedure. It currently adds a new version to CHANGES.txt in the minor version branch but it does not do it in master. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8911) Backport LUCENE-8778 (improved analysis SPI name handling) to 8.x
[ https://issues.apache.org/jira/browse/LUCENE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886058#comment-16886058 ] Ignacio Vera commented on LUCENE-8911: -- I have doubts about this change making 8.2. Elasticsearch CI has reported a failure that I think is related to this change: {code:java} ant test -Dtestcase=TestFactories -Dtests.method=test -Dtests.seed=FEA8D71DFC111060 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=zh-TW -Dtests.timezone=Europe/Guernsey -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1{code} > Backport LUCENE-8778 (improved analysis SPI name handling) to 8.x > - > > Key: LUCENE-8911 > URL: https://issues.apache.org/jira/browse/LUCENE-8911 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > In LUCENE-8907 I reverted LUCENE-8778 from the 8x branch. > Can we backport it to 8x branch again, with transparent backwards > compatibility (by emulating the factory loading method of Lucene 8.1)? > I am not sure whether it would be better to backport the changes or not; > however, it may be good for Solr to have SOLR-13593 without waiting for > release 9.0. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8894) Add APIs to tokenizer/charfilter/tokenfilter factories to get their SPI names from concrete classes
[ https://issues.apache.org/jira/browse/LUCENE-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886022#comment-16886022 ] Ignacio Vera commented on LUCENE-8894: -- In master the entry in CHANGES.txt is under Lucene 9.0.0, but in branch_8x the entry is under Lucene 8.3.0. Is that correct? > Add APIs to tokenizer/charfilter/tokenfilter factories to get their SPI names > from concrete classes > --- > > Key: LUCENE-8894 > URL: https://issues.apache.org/jira/browse/LUCENE-8894 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Minor > Fix For: master (9.0), 8.2 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently, reflection tricks are needed to obtain SPI name (this is now > stored in static NAME fields in each factory class) from a concrete factory > class. While it is easy to implement that logic, it would be much better to > provide unified APIs to get SPI name from a factory class. In other words, > the APIs would provide "inverse" operation of {{lookupClass(String)}} method. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8914) Small improvement in FloatPointNearestNeighbor
Ignacio Vera created LUCENE-8914: Summary: Small improvement in FloatPointNearestNeighbor Key: LUCENE-8914 URL: https://issues.apache.org/jira/browse/LUCENE-8914 Project: Lucene - Core Issue Type: Improvement Reporter: Ignacio Vera Currently the logic to visit inner nodes of the BKD tree in FloatPointNearestNeighbor is in the custom tree traversing logic instead of in the IntersectVisitor. This approach is missing the improvement added in LUCENE-7862, which my experiments show can give a performance improvement of around 10% for a high number of dimensions. This change proposes to move the logic for discarding inner nodes to the IntersectVisitor. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
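As a rough illustration of what "discarding inner nodes in the visitor" could look like, here is a minimal sketch. It is not the FloatPointNearestNeighbor code: the class NearestVisitorSketch, its Relation enum (a stand-in, not the Lucene enum), the float[] cell bounds and the pruning bound worstKeptDistanceSquared are all hypothetical names chosen for this example, and the distance test itself is an assumption about how a cell might be rejected.

```java
/** Illustrative only: the visitor itself decides whether a tree cell can be skipped. */
final class NearestVisitorSketch {

  /** Stand-in for a cell/query relation; not the Lucene enum. */
  enum Relation { CELL_OUTSIDE_QUERY, CELL_CROSSES_QUERY }

  private final float[] origin;
  // Hypothetical pruning bound: squared distance of the worst hit currently kept.
  private double worstKeptDistanceSquared = Double.POSITIVE_INFINITY;

  NearestVisitorSketch(float[] origin) {
    this.origin = origin.clone();
  }

  void setWorstKeptDistanceSquared(double value) {
    this.worstKeptDistanceSquared = value;
  }

  /** Called with the bounds of an inner node or leaf before its contents are read. */
  Relation compare(float[] minCell, float[] maxCell) {
    double distanceSquared = 0;
    for (int dim = 0; dim < origin.length; dim++) {
      // The gap along this axis is zero when the origin lies between the cell bounds.
      double gap = Math.max(0, Math.max(minCell[dim] - origin[dim], origin[dim] - maxCell[dim]));
      distanceSquared += gap * gap;
    }
    // A cell whose closest possible point is already worse than the kept hits can be skipped.
    return distanceSquared > worstKeptDistanceSquared
        ? Relation.CELL_OUTSIDE_QUERY
        : Relation.CELL_CROSSES_QUERY;
  }
}
```

Keeping this decision inside the visitor means any generic tree traversal that consults compare() before descending gets the pruning for free, rather than duplicating it in custom traversal code.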
[jira] [Commented] (LUCENE-8898) TestRamUsageEstimator.testMap failures
[ https://issues.apache.org/jira/browse/LUCENE-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878593#comment-16878593 ] Ignacio Vera commented on LUCENE-8898: -- [~ab]Can we resolve this issue? > TestRamUsageEstimator.testMap failures > -- > > Key: LUCENE-8898 > URL: https://issues.apache.org/jira/browse/LUCENE-8898 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Assignee: Andrzej Bialecki >Priority: Blocker > Fix For: 8.2 > > > Here is an example failure: > {noformat} > 4 tests failed. > FAILED: org.apache.lucene.util.TestRamUsageEstimator.testMap > Error Message: > expected:<25152.0> but was:<30184.0> > Stack Trace: > java.lang.AssertionError: expected:<25152.0> but was:<30184.0> > at > __randomizedtesting.SeedInfo.seed([ED7055A14021EA69:CD56E1725ADAF91B]:0) > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:553) > at org.junit.Assert.assertEquals(Assert.java:683) > at > org.apache.lucene.util.TestRamUsageEstimator.testMap(TestRamUsageEstimator.java:136) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988) > at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) > at > 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) > at > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) > at > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at > org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapte
[jira] [Updated] (LUCENE-8903) Add LatLonShape point query
[ https://issues.apache.org/jira/browse/LUCENE-8903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera updated LUCENE-8903: - Description: Adds a query to LatLonShape that filters by a provided point. (was: Add a query to LatLonShape that filters by a provided point. ) > Add LatLonShape point query > --- > > Key: LUCENE-8903 > URL: https://issues.apache.org/jira/browse/LUCENE-8903 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Ignacio Vera >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Adds a query to LatLonShape that filters by a provided point. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8903) Add LatLonShape point query
Ignacio Vera created LUCENE-8903: Summary: Add LatLonShape point query Key: LUCENE-8903 URL: https://issues.apache.org/jira/browse/LUCENE-8903 Project: Lucene - Core Issue Type: New Feature Reporter: Ignacio Vera Add a query to LatLonShape that filters by a provided point. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8888) Improve distribution of points with data dimension in BKD tree leaves
[ https://issues.apache.org/jira/browse/LUCENE-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-8888. -- Resolution: Fixed Assignee: Ignacio Vera Fix Version/s: 8.2 master (9.0) > Improve distribution of points with data dimension in BKD tree leaves > - > > Key: LUCENE-8888 > URL: https://issues.apache.org/jira/browse/LUCENE-8888 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Major > Fix For: master (9.0), 8.2 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > LUCENE-8688 introduced a new storage strategy for leaves that contain > duplicated points. This works well with indexed dimensions, because > partitioning the space and the final sorting of the leaves group together > points with equal indexed dimensions. > This is not always the case if the points contain data dimensions. If two > points have the same indexed dimensions but different data dimensions, their > distribution on the leaves may not be optimal. > A good example is a user indexing a bounding box using LatLonShape. > The resulting tessellation of a bounding box is two triangles with the same > indexed dimensions but different data dimensions. If there are two documents > indexing the same bounding box, the result in the leaf is the triangles from > one document followed by the triangles of the second document. This is > because the current sorting/selection algorithms use one indexed dimension > and tie-break on the docID. > The optimal distribution in the case above is to group the equal triangles > together. Therefore this issue proposes updating the selection/sorting > algorithms to use the data dimensions, when they exist, as tie-breakers > before using the docID. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8896) Override default implementation of IntersectVisitor#visit(DocIDSetBuilder, byte[]) for several queries
[ https://issues.apache.org/jira/browse/LUCENE-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-8896. -- Resolution: Fixed Assignee: Ignacio Vera Fix Version/s: 8.2 master (9.0) > Override default implementation of IntersectVisitor#visit(DocIDSetBuilder, > byte[]) for several queries > -- > > Key: LUCENE-8896 > URL: https://issues.apache.org/jira/browse/LUCENE-8896 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Major > Fix For: master (9.0), 8.2 > > Time Spent: 40m > Remaining Estimate: 0h > > LUCENE-8885 introduced a new method on the {{IntersectVisitor}} > interface. It has a default implementation, but queries can override it > and thereby benefit when several documents on a leaf are associated with > the same point. > This issue proposes that the following queries override the default > implementation: > * LatLonShapeQuery > * RangeFieldQuery > * LatLonPointInPolygonQuery > * LatLonPointDistanceQuery > * PointRangeQuery > * PointInSetQuery -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8896) Override default implementation of IntersectVisitor#visit(DocIDSetBuilder, byte[]) for several queries
[ https://issues.apache.org/jira/browse/LUCENE-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876373#comment-16876373 ] Ignacio Vera commented on LUCENE-8896: -- [~atris] I am not sure what you mean. I have opened a PR with the change; I hope it makes sense to you. > Override default implementation of IntersectVisitor#visit(DocIDSetBuilder, > byte[]) for several queries > -- > > Key: LUCENE-8896 > URL: https://issues.apache.org/jira/browse/LUCENE-8896 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > LUCENE-8885 introduced a new method on the {{IntersectVisitor}} > interface. It has a default implementation, but queries can override it > and thereby benefit when several documents on a leaf are associated with > the same point. > This issue proposes that the following queries override the default > implementation: > * LatLonShapeQuery > * RangeFieldQuery > * LatLonPointInPolygonQuery > * LatLonPointDistanceQuery > * PointRangeQuery > * PointInSetQuery -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8896) Override default implementation of IntersectVisitor#visit(DocIDSetBuilder, byte[]) for several queries
Ignacio Vera created LUCENE-8896: Summary: Override default implementation of IntersectVisitor#visit(DocIDSetBuilder, byte[]) for several queries Key: LUCENE-8896 URL: https://issues.apache.org/jira/browse/LUCENE-8896 Project: Lucene - Core Issue Type: Improvement Reporter: Ignacio Vera LUCENE-8885 introduced a new method on the {{IntersectVisitor}} interface. It has a default implementation, but queries can override it and thereby benefit when several documents on a leaf are associated with the same point. This issue proposes that the following queries override the default implementation: * LatLonShapeQuery * RangeFieldQuery * LatLonPointInPolygonQuery * LatLonPointDistanceQuery * PointRangeQuery * PointInSetQuery -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8885) Optimise BKD reader by exploiting cardinality information stored on leaves
[ https://issues.apache.org/jira/browse/LUCENE-8885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-8885. -- Resolution: Fixed Assignee: Ignacio Vera Fix Version/s: 8.2 master (9.0) > Optimise BKD reader by exploiting cardinality information stored on leaves > -- > > Key: LUCENE-8885 > URL: https://issues.apache.org/jira/browse/LUCENE-8885 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Major > Fix For: master (9.0), 8.2 > > Time Spent: 1h 40m > Remaining Estimate: 0h
>
> LUCENE-8688 introduced a new storage strategy for leaves that contain duplicated points. In that case the points are stored together with their cardinality. We still call the IntersectVisitor once per document, so we check the same point against the query many times. The idea is to check the point once and then add all the documents.
> The API of the IntersectVisitor does not allow that, so to exploit this property we need to either change the API or extend it. Here are the possibilities I can think of:
> 1) Modify the API by replacing the method IntersectVisitor#visit(int, byte[]) with the following method:
> {code:java}
> /** Called for leaf cells that intersect the query, to test whether the point matches the query.
>  * If it matches, {@link IntersectVisitor#visit(int)} must be called for the
>  * documents associated with this point. */
> boolean matches(byte[] packedValue) throws IOException;
> {code}
> This will allow the BKD reader to check whether a point matches the query and, if it does, call the method IntersectVisitor#visit(int) for all documents associated with that point.
> The drawback of this approach is backwards compatibility and the need to update all classes that implement this interface.
> 2) Extend the API by adding a new default method to the IntersectVisitor interface:
> {code:java}
> /** Called for documents in a leaf cell that crosses the query. The consumer
>  * should scrutinize the packedValue to decide whether to accept it. If accepted it should
>  * consider only the {@code numberDocs} documents starting at {@code offset}. In the 1D case,
>  * values are visited in increasing order, and in the case of ties, in increasing
>  * docID order. */
> default void visit(int[] docID, int offset, int numberDocs, byte[] packedValue) throws IOException {
>   for (int i = offset; i < offset + numberDocs; i++) {
>     visit(docID[i], packedValue);
>   }
> }
> {code}
> The merit of this approach is that it is backwards compatible, and it is up to the implementors to override this method and get the benefits of this optimisation. The biggest downside is that it assumes that the codec has doc IDs available in an int[] slice, as opposed to streaming them from disk directly to the IntersectVisitor for instance, as [~jpountz] noted.
> Maybe there are more options I did not think about, so I am looking forward to hearing opinions on whether we should make this change at all and, if so, how to approach it. My +1 goes to 1).
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
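To make option 2 in the LUCENE-8885 description concrete, here is a small self-contained sketch of a visitor that overrides the bulk method so the point is checked once for a whole run of documents. The Visitor interface below is a simplified stand-in written for this example (no IOException, no compare method), not the Lucene IntersectVisitor, and CountingVisitor with its first-byte threshold test is purely hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class BulkVisitSketch {

  /** Simplified stand-in for the visitor interface discussed in the issue. */
  interface Visitor {
    /** Accepts a document without checking its value. */
    void visit(int docID);

    /** Checks the value and accepts the document if it matches. */
    void visit(int docID, byte[] packedValue);

    /** Default bulk variant: falls back to one check per document. */
    default void visit(int[] docIDs, int offset, int numberDocs, byte[] packedValue) {
      for (int i = offset; i < offset + numberDocs; i++) {
        visit(docIDs[i], packedValue);
      }
    }
  }

  /** Hypothetical query: matches values whose first byte is at least the threshold. */
  static final class CountingVisitor implements Visitor {
    final List<Integer> hits = new ArrayList<>();
    int checks = 0; // how many times the value was tested against the query
    private final int threshold;

    CountingVisitor(int threshold) {
      this.threshold = threshold;
    }

    private boolean matches(byte[] packedValue) {
      checks++;
      return (packedValue[0] & 0xFF) >= threshold;
    }

    @Override
    public void visit(int docID) {
      hits.add(docID);
    }

    @Override
    public void visit(int docID, byte[] packedValue) {
      if (matches(packedValue)) {
        visit(docID);
      }
    }

    @Override
    public void visit(int[] docIDs, int offset, int numberDocs, byte[] packedValue) {
      // All documents in the slice share packedValue, so one check is enough.
      if (matches(packedValue)) {
        for (int i = offset; i < offset + numberDocs; i++) {
          visit(docIDs[i]);
        }
      }
    }
  }
}
```

A visitor that does not override the bulk method still works through the default loop, which is the backwards-compatibility property the description attributes to this option.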
[jira] [Updated] (LUCENE-8831) LatLonShapeBoundingBoxQuery hashcode is wrong
[ https://issues.apache.org/jira/browse/LUCENE-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera updated LUCENE-8831: - Fix Version/s: 8.1.2 > LatLonShapeBoundingBoxQuery hashcode is wrong > -- > > Key: LUCENE-8831 > URL: https://issues.apache.org/jira/browse/LUCENE-8831 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Major > Fix For: master (9.0), 8.2, 8.1.2 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently the hashCode implementation for LatLonShapeBoundingBoxQuery always > returns a different value. Therefore the query cannot be cached. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8831) LatLonShapeBoundingBoxQuery hashcode is wrong
[ https://issues.apache.org/jira/browse/LUCENE-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874910#comment-16874910 ] Ignacio Vera commented on LUCENE-8831: -- I was thinking the same. > LatLonShapeBoundingBoxQuery hashcode is wrong > -- > > Key: LUCENE-8831 > URL: https://issues.apache.org/jira/browse/LUCENE-8831 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Major > Fix For: master (9.0), 8.2 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently the hashCode implementation for LatLonShapeBoundingBoxQuery always > returns a different value. Therefore the query cannot be cached. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
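The LUCENE-8831 description says only that the hash code changes between calls, which defeats caching. The sketch below shows the general contract a cacheable query object needs: equal field values give equal objects and equal hash codes. BoxQuerySketch and its fields are hypothetical and are not taken from LatLonShapeBoundingBoxQuery; the actual fix is not shown in this thread.

```java
import java.util.Objects;

/** Illustrative only: a value-like query whose hashCode depends solely on its fields. */
final class BoxQuerySketch {
  private final String field;
  private final double minLat;
  private final double maxLat;
  private final double minLon;
  private final double maxLon;

  BoxQuerySketch(String field, double minLat, double maxLat, double minLon, double maxLon) {
    this.field = field;
    this.minLat = minLat;
    this.maxLat = maxLat;
    this.minLon = minLon;
    this.maxLon = maxLon;
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof BoxQuerySketch)) {
      return false;
    }
    BoxQuerySketch that = (BoxQuerySketch) other;
    // Double.compare treats NaN and signed zeros consistently with Double.hashCode.
    return field.equals(that.field)
        && Double.compare(minLat, that.minLat) == 0
        && Double.compare(maxLat, that.maxLat) == 0
        && Double.compare(minLon, that.minLon) == 0
        && Double.compare(maxLon, that.maxLon) == 0;
  }

  @Override
  public int hashCode() {
    // Derived only from immutable state, so it is stable across calls and instances.
    return Objects.hash(field, minLat, maxLat, minLon, maxLon);
  }
}
```

With this contract, two queries built from the same box can be found under the same key in a hash-based cache, which is exactly what an unstable hash code prevents.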
[jira] [Updated] (LUCENE-8886) TestMutablePointsReaderUtils not doing what it is expected
[ https://issues.apache.org/jira/browse/LUCENE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ignacio Vera updated LUCENE-8886:
--------------------------------
    Resolution: Fixed
    Assignee: Ignacio Vera
    Fix Version/s: 8.2, master (9.0)
    Status: Resolved (was: Patch Available)

> TestMutablePointsReaderUtils not doing what it is expected
> ----------------------------------------------------------
>
> Key: LUCENE-8886
> URL: https://issues.apache.org/jira/browse/LUCENE-8886
> Project: Lucene - Core
> Issue Type: Test
> Reporter: Ignacio Vera
> Assignee: Ignacio Vera
> Priority: Major
> Fix For: master (9.0), 8.2
> Attachments: LUCENE-8886.patch
>
> TestMutablePointsReaderUtils is not actually doing what is expected.
> The problem is that we are constructing Point objects but not copying the
> bytes provided, so the test is always working with arrays of 0 values.
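The issue does not include the test code, so this is only an illustrative sketch of the kind of mistake described: a holder that allocates a buffer of the right length but never copies the provided bytes ends up with an all-zero array. The `Point` class and its `copy` flag are hypothetical.

```java
import java.util.Arrays;

final class PointCopySketch {
  /** Hypothetical test point that owns its packed value. */
  static final class Point {
    final byte[] packedValue;
    final int doc;

    Point(byte[] source, int doc, boolean copy) {
      // Without the copy, the freshly allocated array keeps its default zero bytes.
      this.packedValue = copy ? Arrays.copyOf(source, source.length) : new byte[source.length];
      this.doc = doc;
    }
  }

  public static void main(String[] args) {
    byte[] value = {1, 2, 3, 4};
    System.out.println(Arrays.toString(new Point(value, 0, false).packedValue)); // prints "[0, 0, 0, 0]"
    System.out.println(Arrays.toString(new Point(value, 0, true).packedValue));  // prints "[1, 2, 3, 4]"
  }
}
```

A test built on the first form compares only zero-filled values, so it exercises very little of the code under test.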
[jira] [Created] (LUCENE-8888) Improve distribution of points with data dimension in BKD tree leaves
Ignacio Vera created LUCENE-:

Summary: Improve distribution of points with data dimension in BKD tree leaves
Key: LUCENE-
URL: https://issues.apache.org/jira/browse/LUCENE-
Project: Lucene - Core
Issue Type: Improvement
Reporter: Ignacio Vera

In LUCENE-8868 a new storing strategy was introduced for leaves that contain duplicated points. This works well with indexed dimensions, as the process of partitioning the space and the final sorting of leaves groups points with equal indexed dimensions. This is not always the case if the points contain data dimensions. If two points have the same indexed dimensions but different data dimensions, their distribution across the leaves may not be optimal.

A good example is a user indexing a bounding box using LatLonShape. The resulting tessellation of a bounding box is two triangles with the same indexed dimensions but different data dimensions. If two documents index the same bounding box, the result in the leaf is the triangles from one document followed by the triangles of the second document. This is because the current sorting/selection algorithms use one indexed dimension and tie-break on the docID. The optimal distribution in the case above is to group the equal triangles together.

Therefore what is proposed here is to update the selection/sorting algorithms to use the data dimensions, when they exist, as tie-breakers before using the docID.
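The proposed ordering can be sketched as a comparator that looks at the indexed bytes first, then the data bytes, and only then the docID. This is a simplified illustration, not the BKD writer's selection or sorting code: the `Point` class, the single packed array with indexed bytes followed by data bytes, and comparing all indexed bytes at once are assumptions. It needs Java 9 or later for `Arrays.compareUnsigned`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class TieBreakSketch {
  /** Hypothetical point: indexed bytes first, then data bytes, plus the owning docID. */
  static final class Point {
    final byte[] packed;
    final int docID;

    Point(byte[] packed, int docID) {
      this.packed = packed;
      this.docID = docID;
    }
  }

  /** Orders by the indexed bytes, then by the data bytes, and only then by docID. */
  static Comparator<Point> byIndexedThenDataThenDoc(int indexedLength) {
    return (a, b) -> {
      int cmp = Arrays.compareUnsigned(a.packed, 0, indexedLength, b.packed, 0, indexedLength);
      if (cmp != 0) return cmp;
      cmp = Arrays.compareUnsigned(a.packed, indexedLength, a.packed.length,
                                   b.packed, indexedLength, b.packed.length);
      if (cmp != 0) return cmp;
      return Integer.compare(a.docID, b.docID);
    };
  }

  /** Two documents index the same box: each contributes triangles T1 and T2. */
  static int[] sortedDocIDs() {
    List<Point> points = new ArrayList<>();
    for (int doc = 0; doc < 2; doc++) {
      points.add(new Point(new byte[] {5, 1}, doc)); // T1: indexed byte 5, data byte 1
      points.add(new Point(new byte[] {5, 2}, doc)); // T2: indexed byte 5, data byte 2
    }
    points.sort(byIndexedThenDataThenDoc(1));
    return points.stream().mapToInt(p -> p.docID).toArray();
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(sortedDocIDs())); // prints "[0, 1, 0, 1]"
  }
}
```

With the data bytes as a tie-breaker, both copies of T1 end up adjacent, followed by both copies of T2, which is the grouping the issue asks for.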
[jira] [Commented] (LUCENE-8885) Optimise BKD reader by exploiting cardinality information stored on leaves
[ https://issues.apache.org/jira/browse/LUCENE-8885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873860#comment-16873860 ]

Ignacio Vera commented on LUCENE-8885:
--------------------------------------

I have opened a PR with [~jpountz]'s suggestion.

> Optimise BKD reader by exploiting cardinality information stored on leaves
> --------------------------------------------------------------------------
>
> Key: LUCENE-8885
> URL: https://issues.apache.org/jira/browse/LUCENE-8885
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In LUCENE-8868 a new storing strategy was introduced for leaves that contain
> duplicated points. In such a case the points are stored together with the
> cardinality. We still call the IntersectVisitor once per document, and
> therefore we check the same point against the query many times. The idea is
> to check the point once and then add all the documents.
> The API of the IntersectVisitor does not allow that, and therefore to exploit
> that property we need to either change the API or extend it. Here are the
> possibilities I can think of:
> 1) Modify the API by replacing the method IntersectVisitor#visit(int, byte[])
> with the following method:
> {code:java}
> /** Called for points in leaf cells that cross the query, to test whether the
>  * point matches the query. If it matches, {@link IntersectVisitor#visit(int)}
>  * must then be called with the documents associated with this point. */
> boolean matches(byte[] packedValue) throws IOException;
> {code}
> This will allow the BKD reader to check if a point matches the query and, if
> true, call the method IntersectVisitor#visit(int) for all documents
> associated with that point.
> The drawback of this approach is the loss of backwards compatibility and the
> need to update all classes that implement this interface.
> 2) Extend the API by adding a new default method to the IntersectVisitor
> interface:
> {code:java}
> /** Called for documents in a leaf cell that crosses the query. The consumer
>  * should scrutinize the packedValue to decide whether to accept it. If
>  * accepted, it should consider only the {@code numberDocs} documents
>  * starting at {@code offset}. In the 1D case, values are visited in
>  * increasing order, and in the case of ties, in increasing docID order. */
> default void visit(int[] docID, int offset, int numberDocs, byte[] packedValue) throws IOException {
>   for (int i = offset; i < offset + numberDocs; i++) {
>     visit(docID[i], packedValue);
>   }
> }
> {code}
> The merit of this approach is that it is backwards compatible, and it is up
> to the implementors to override this method and get the benefits of this
> optimisation. The biggest downside is that it assumes that the codec has doc
> IDs available in an int[] slice, as opposed to streaming them from disk
> directly to the IntersectVisitor for instance, as [~jpountz] noted.
> Maybe there are more options I did not think about, so I am looking forward
> to hearing opinions on whether we should make this change at all and, if so,
> how to approach it. My +1 goes to 1).
[jira] [Updated] (LUCENE-8886) TestMutablePointsReaderUtils not doing what it is expected
[ https://issues.apache.org/jira/browse/LUCENE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ignacio Vera updated LUCENE-8886:
--------------------------------
    Attachment: LUCENE-8886.patch
    Status: Open (was: Open)

> TestMutablePointsReaderUtils not doing what it is expected
> ----------------------------------------------------------
>
> Key: LUCENE-8886
> URL: https://issues.apache.org/jira/browse/LUCENE-8886
> Project: Lucene - Core
> Issue Type: Test
> Reporter: Ignacio Vera
> Priority: Major
> Attachments: LUCENE-8886.patch
>
> TestMutablePointsReaderUtils is not actually doing what is expected.
> The problem is that we are constructing Point objects but not copying the
> bytes provided, so the test is always working with arrays of 0 values.
[jira] [Updated] (LUCENE-8886) TestMutablePointsReaderUtils not doing what it is expected
[ https://issues.apache.org/jira/browse/LUCENE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ignacio Vera updated LUCENE-8886:
--------------------------------
    Status: Patch Available (was: Open)

> TestMutablePointsReaderUtils not doing what it is expected
> ----------------------------------------------------------
>
> Key: LUCENE-8886
> URL: https://issues.apache.org/jira/browse/LUCENE-8886
> Project: Lucene - Core
> Issue Type: Test
> Reporter: Ignacio Vera
> Priority: Major
> Attachments: LUCENE-8886.patch
>
> TestMutablePointsReaderUtils is not actually doing what is expected.
> The problem is that we are constructing Point objects but not copying the
> bytes provided, so the test is always working with arrays of 0 values.
[jira] [Updated] (LUCENE-8886) TestMutablePointsReaderUtils not doing what it is expected
[ https://issues.apache.org/jira/browse/LUCENE-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ignacio Vera updated LUCENE-8886:
--------------------------------
    Issue Type: Test (was: Improvement)

> TestMutablePointsReaderUtils not doing what it is expected
> ----------------------------------------------------------
>
> Key: LUCENE-8886
> URL: https://issues.apache.org/jira/browse/LUCENE-8886
> Project: Lucene - Core
> Issue Type: Test
> Reporter: Ignacio Vera
> Priority: Major
>
> TestMutablePointsReaderUtils is not actually doing what is expected.
> The problem is that we are constructing Point objects but not copying the
> bytes provided, so the test is always working with arrays of 0 values.
[jira] [Created] (LUCENE-8886) TestMutablePointsReaderUtils not doing what it is expected
Ignacio Vera created LUCENE-8886:
---------------------------------

Summary: TestMutablePointsReaderUtils not doing what it is expected
Key: LUCENE-8886
URL: https://issues.apache.org/jira/browse/LUCENE-8886
Project: Lucene - Core
Issue Type: Improvement
Reporter: Ignacio Vera

TestMutablePointsReaderUtils is not actually doing what is expected. The problem is that we are constructing Point objects but not copying the bytes provided, so the test is always working with arrays of 0 values.
[jira] [Commented] (LUCENE-8885) Optimise BKD reader by exploiting cardinality information stored on leaves
[ https://issues.apache.org/jira/browse/LUCENE-8885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873395#comment-16873395 ]

Ignacio Vera commented on LUCENE-8885:
--------------------------------------

With 1) you would have to track the last visited point and associate it with the corresponding call to visit(int docID), which is not nice and relies on the behaviour of the reader. 2) is more practical and less intrusive, and using a DocIdSetIterator will make the API cleaner. ++

> Optimise BKD reader by exploiting cardinality information stored on leaves
> --------------------------------------------------------------------------
>
> Key: LUCENE-8885
> URL: https://issues.apache.org/jira/browse/LUCENE-8885
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Priority: Major
>
> In LUCENE-8868 a new storing strategy was introduced for leaves that contain
> duplicated points. In such a case the points are stored together with the
> cardinality. We still call the IntersectVisitor once per document, and
> therefore we check the same point against the query many times. The idea is
> to check the point once and then add all the documents.
> The API of the IntersectVisitor does not allow that, and therefore to exploit
> that property we need to either change the API or extend it. Here are the
> possibilities I can think of:
> 1) Modify the API by replacing the method IntersectVisitor#visit(int, byte[])
> with the following method:
> {code:java}
> /** Called for points in leaf cells that cross the query, to test whether the
>  * point matches the query. If it matches, {@link IntersectVisitor#visit(int)}
>  * must then be called with the documents associated with this point. */
> boolean matches(byte[] packedValue) throws IOException;
> {code}
> This will allow the BKD reader to check if a point matches the query and, if
> true, call the method IntersectVisitor#visit(int) for all documents
> associated with that point.
> The drawback of this approach is the loss of backwards compatibility and the
> need to update all classes that implement this interface.
> 2) Extend the API by adding a new default method to the IntersectVisitor
> interface:
> {code:java}
> /** Called for documents in a leaf cell that crosses the query. The consumer
>  * should scrutinize the packedValue to decide whether to accept it. If
>  * accepted, it should consider only the {@code numberDocs} documents
>  * starting at {@code offset}. In the 1D case, values are visited in
>  * increasing order, and in the case of ties, in increasing docID order. */
> default void visit(int[] docID, int offset, int numberDocs, byte[] packedValue) throws IOException {
>   for (int i = offset; i < offset + numberDocs; i++) {
>     visit(docID[i], packedValue);
>   }
> }
> {code}
> The merit of this approach is that it is backwards compatible, and it is up
> to the implementors to override this method and get the benefits of this
> optimisation. The biggest downside is that it assumes that the codec has doc
> IDs available in an int[] slice, as opposed to streaming them from disk
> directly to the IntersectVisitor for instance, as [~jpountz] noted.
> Maybe there are more options I did not think about, so I am looking forward
> to hearing opinions on whether we should make this change at all and, if so,
> how to approach it. My +1 goes to 1).
[jira] [Commented] (LUCENE-8867) Optimise BKD tree for low cardinality leaves
[ https://issues.apache.org/jira/browse/LUCENE-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873310#comment-16873310 ]

Ignacio Vera commented on LUCENE-8867:
--------------------------------------

I have opened https://issues.apache.org/jira/browse/LUCENE-8885 to track the second optimisation.

> Optimise BKD tree for low cardinality leaves
> --------------------------------------------
>
> Key: LUCENE-8867
> URL: https://issues.apache.org/jira/browse/LUCENE-8867
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Currently, if a leaf of the BKD tree contains only a few distinct values, the
> leaf is treated the same way as if all values were different. In many cases it
> can be much more efficient to store the distinct values together with their
> cardinality.
> In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is
> called n times with the same byte array but a different docID. This issue
> proposes to add a new method to the interface that accepts an array of docs,
> so it can be overridden by implementors to gain search performance.
[jira] [Created] (LUCENE-8885) Optimise BKD reader by exploiting cardinality information stored on leaves
Ignacio Vera created LUCENE-8885:
---------------------------------

Summary: Optimise BKD reader by exploiting cardinality information stored on leaves
Key: LUCENE-8885
URL: https://issues.apache.org/jira/browse/LUCENE-8885
Project: Lucene - Core
Issue Type: Improvement
Reporter: Ignacio Vera

In LUCENE-8868 a new storing strategy was introduced for leaves that contain duplicated points. In such a case the points are stored together with the cardinality. We still call the IntersectVisitor once per document, and therefore we check the same point against the query many times. The idea is to check the point once and then add all the documents.

The API of the IntersectVisitor does not allow that, and therefore to exploit that property we need to either change the API or extend it. Here are the possibilities I can think of:

1) Modify the API by replacing the method IntersectVisitor#visit(int, byte[]) with the following method:
{code:java}
/** Called for points in leaf cells that cross the query, to test whether the
 * point matches the query. If it matches, {@link IntersectVisitor#visit(int)}
 * must then be called with the documents associated with this point. */
boolean matches(byte[] packedValue) throws IOException;
{code}
This will allow the BKD reader to check if a point matches the query and, if true, call the method IntersectVisitor#visit(int) for all documents associated with that point.

The drawback of this approach is the loss of backwards compatibility and the need to update all classes that implement this interface.

2) Extend the API by adding a new default method to the IntersectVisitor interface:
{code:java}
/** Called for documents in a leaf cell that crosses the query. The consumer
 * should scrutinize the packedValue to decide whether to accept it. If
 * accepted, it should consider only the {@code numberDocs} documents
 * starting at {@code offset}. In the 1D case, values are visited in
 * increasing order, and in the case of ties, in increasing docID order. */
default void visit(int[] docID, int offset, int numberDocs, byte[] packedValue) throws IOException {
  for (int i = offset; i < offset + numberDocs; i++) {
    visit(docID[i], packedValue);
  }
}
{code}
The merit of this approach is that it is backwards compatible, and it is up to the implementors to override this method and get the benefits of this optimisation. The biggest downside is that it assumes that the codec has doc IDs available in an int[] slice, as opposed to streaming them from disk directly to the IntersectVisitor for instance, as [~jpountz] noted.

Maybe there are more options I did not think about, so I am looking forward to hearing opinions on whether we should make this change at all and, if so, how to approach it. My +1 goes to 1).
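To make the benefit of option 2) concrete, here is a minimal sketch of a visitor that overrides the bulk method so the point is tested once for all documents that share it. The `Visitor` interface below is a simplified, hypothetical stand-in rather than Lucene's IntersectVisitor, and `RangeVisitor` with its single-byte "value >= min" test is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class BulkVisitSketch {
  /** Hypothetical, simplified visitor; not Lucene's IntersectVisitor. */
  interface Visitor {
    void visit(int docID, byte[] packedValue);

    /** Default keeps old implementations working: one check per document. */
    default void visit(int[] docIDs, int offset, int numberDocs, byte[] packedValue) {
      for (int i = offset; i < offset + numberDocs; i++) {
        visit(docIDs[i], packedValue);
      }
    }
  }

  /** Accepts points whose first byte is at least {@code min} and counts the checks made. */
  static final class RangeVisitor implements Visitor {
    final int min;
    final List<Integer> hits = new ArrayList<>();
    int checks;

    RangeVisitor(int min) {
      this.min = min;
    }

    private boolean matches(byte[] packedValue) {
      checks++;
      return packedValue[0] >= min;
    }

    @Override
    public void visit(int docID, byte[] packedValue) {
      if (matches(packedValue)) hits.add(docID);
    }

    @Override
    public void visit(int[] docIDs, int offset, int numberDocs, byte[] packedValue) {
      // The point is tested once; every document sharing it is then collected.
      if (matches(packedValue)) {
        for (int i = offset; i < offset + numberDocs; i++) hits.add(docIDs[i]);
      }
    }
  }

  public static void main(String[] args) {
    RangeVisitor v = new RangeVisitor(3);
    v.visit(new int[] {7, 8, 9}, 0, 3, new byte[] {5});
    System.out.println(v.hits + " after " + v.checks + " check(s)"); // prints "[7, 8, 9] after 1 check(s)"
  }
}
```

An implementation that does not override the bulk method still gets correct results through the default, only without the saving.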
[jira] [Resolved] (LUCENE-8868) New storing strategy for BKD tree leaves with low cardinality
[ https://issues.apache.org/jira/browse/LUCENE-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ignacio Vera resolved LUCENE-8868.
---------------------------------
    Resolution: Fixed
    Assignee: Ignacio Vera
    Fix Version/s: 8.2, master (9.0)

> New storing strategy for BKD tree leaves with low cardinality
> -------------------------------------------------------------
>
> Key: LUCENE-8868
> URL: https://issues.apache.org/jira/browse/LUCENE-8868
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Assignee: Ignacio Vera
> Priority: Major
> Fix For: master (9.0), 8.2
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> Currently, if a leaf of the BKD tree contains only a few distinct values, the
> leaf is treated the same way as if all values were different. In many cases it
> can be much more efficient to store the distinct values together with their
> cardinality.
> The strategy is the following:
> 1. When writing a leaf block, the cardinality is computed.
> 2. Perform a naive calculation to decide whether it is better to store the
> leaf as a low cardinality leaf. The storage costs are calculated as follows:
> * Low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2),
> where two is the estimated size of storing the cardinality. This is an
> overestimation, as in some cases only one byte is needed to store the
> cardinality.
> * High cardinality: count * (packedBytesLength - prefixLenSum). We are not
> taking into account the run-length compression.
> 3. If the tree has low cardinality, then we set the compressed dim to -2. Note
> that -1 is when all values are equal.
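The two cost estimates from step 2 can be written down directly. This sketch uses the variable names from the issue text; the class name, the method names, and the strict "less than" comparison in `preferLowCardinality` are assumptions, since the issue only says the costs are compared.

```java
final class LeafCostSketch {
  /** leafCardinality * (packedBytesLength - prefixLenSum + 2); the 2 estimates the bytes for the cardinality. */
  static long lowCardinalityCost(int leafCardinality, int packedBytesLength, int prefixLenSum) {
    return (long) leafCardinality * (packedBytesLength - prefixLenSum + 2);
  }

  /** count * (packedBytesLength - prefixLenSum); run-length compression is ignored. */
  static long highCardinalityCost(int count, int packedBytesLength, int prefixLenSum) {
    return (long) count * (packedBytesLength - prefixLenSum);
  }

  /** Assumed decision rule: pick the low cardinality layout when its estimate is smaller. */
  static boolean preferLowCardinality(int leafCardinality, int count, int packedBytesLength, int prefixLenSum) {
    return lowCardinalityCost(leafCardinality, packedBytesLength, prefixLenSum)
        < highCardinalityCost(count, packedBytesLength, prefixLenSum);
  }

  public static void main(String[] args) {
    // A leaf of 512 points with 10 distinct values, 8 packed bytes, 2 shared prefix bytes.
    System.out.println(lowCardinalityCost(10, 8, 2));        // prints "80"
    System.out.println(highCardinalityCost(512, 8, 2));      // prints "3072"
    System.out.println(preferLowCardinality(10, 512, 8, 2)); // prints "true"
  }
}
```

When every value in the leaf is distinct (leafCardinality equal to count), the extra two bytes per value make the low cardinality estimate larger, so the regular layout wins.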
[jira] [Resolved] (LUCENE-8879) Add tests for BKDRadixSelector#sort
[ https://issues.apache.org/jira/browse/LUCENE-8879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ignacio Vera resolved LUCENE-8879.
---------------------------------
    Resolution: Fixed
    Assignee: Ignacio Vera
    Fix Version/s: 8.2, master (9.0)

> Add tests for BKDRadixSelector#sort
> -----------------------------------
>
> Key: LUCENE-8879
> URL: https://issues.apache.org/jira/browse/LUCENE-8879
> Project: Lucene - Core
> Issue Type: Test
> Reporter: Ignacio Vera
> Assignee: Ignacio Vera
> Priority: Minor
> Fix For: master (9.0), 8.2
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> This issue just adds some tests specifically for the sorting capability of
> the class BKDRadixSelector and improves the existing ones for the selection
> capabilities.
[jira] [Updated] (LUCENE-8775) Tessellator: Improve the election of diagonals when splitting the polygon
[ https://issues.apache.org/jira/browse/LUCENE-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ignacio Vera updated LUCENE-8775:
--------------------------------
    Fix Version/s: 8.1.2

> Tessellator: Improve the election of diagonals when splitting the polygon
> -------------------------------------------------------------------------
>
> Key: LUCENE-8775
> URL: https://issues.apache.org/jira/browse/LUCENE-8775
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Ignacio Vera
> Assignee: Ignacio Vera
> Priority: Major
> Fix For: master (9.0), 8.2, 8.1.2
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> There are some cases where polygon tessellation fails, and it seems to be due
> to a bad choice of the diagonal when splitting the polygon. Here I propose
> a patch that makes sure, when splitting a polygon, that the resulting polygons
> are valid CW polygons.
> In addition, this patch adds a few tests to check the functionality of the
> tessellator and throws an error if the polygon cannot be split, instead of
> just emptying the current tessellation.
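The issue does not describe how the patch verifies that a split produces valid CW polygons. One standard way to test winding order, shown here purely as an illustration and not as the Tessellator's code, is the sign of the shoelace (signed area) sum. The class and method names are hypothetical, and the sketch assumes x grows to the right and y grows upwards.

```java
final class WindingSketch {
  /** Shoelace formula: positive for counter-clockwise rings, negative for clockwise ones. */
  static double signedArea(double[] x, double[] y) {
    double sum = 0;
    int n = x.length;
    // j trails i by one vertex, wrapping around from the last vertex to the first.
    for (int i = 0, j = n - 1; i < n; j = i++) {
      sum += x[j] * y[i] - x[i] * y[j];
    }
    return sum / 2;
  }

  static boolean isClockwise(double[] x, double[] y) {
    return signedArea(x, y) < 0;
  }

  public static void main(String[] args) {
    // Unit square listed counter-clockwise, then the same square listed clockwise.
    System.out.println(isClockwise(new double[] {0, 1, 1, 0}, new double[] {0, 0, 1, 1})); // prints "false"
    System.out.println(isClockwise(new double[] {0, 0, 1, 1}, new double[] {0, 1, 1, 0})); // prints "true"
  }
}
```

Winding order is only one ingredient of validity; a real check would also have to rule out self-intersections and degenerate (zero area) pieces.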
[jira] [Commented] (SOLR-13413) jetty IdleTimeout bugs with Http2SolrClient, cause sprious timeouts on intranode requests
[ https://issues.apache.org/jira/browse/SOLR-13413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872533#comment-16872533 ]

Ignacio Vera commented on SOLR-13413:
-------------------------------------

[~caomanhdat] Yes, currently the branch 8.1 is broken due to a missing sha1.

> jetty IdleTimeout bugs with Http2SolrClient, cause sprious timeouts on intranode requests
> -----------------------------------------------------------------------------------------
>
> Key: SOLR-13413
> URL: https://issues.apache.org/jira/browse/SOLR-13413
> Project: Solr
> Issue Type: Bug
> Affects Versions: 8.0
> Reporter: Hoss Man
> Assignee: Cao Manh Dat
> Priority: Major
> Fix For: master (9.0), 8.2, 8.1.2
> Attachments: SOLR-13413.patch,
> nocommit_TestDistributedStatsComponentCardinality_trivial-no-http2.patch
>
> There is evidence in some recent jenkins failures that we may have some manner
> of bug in our http2 client/server code that can cause intra-node query
> requests to stall / timeout non-reproducibly.
> In at least one known case, forcing the jetty & SolrClients used in the test
> to use http1.1 seems to prevent these test failures.
[jira] [Created] (LUCENE-8879) Add tests for BKDRadixSelector#sort
Ignacio Vera created LUCENE-8879:
---------------------------------

Summary: Add tests for BKDRadixSelector#sort
Key: LUCENE-8879
URL: https://issues.apache.org/jira/browse/LUCENE-8879
Project: Lucene - Core
Issue Type: Test
Reporter: Ignacio Vera

This issue just adds some tests specifically for the sorting capability of the class BKDRadixSelector and improves the existing ones for the selection capabilities.
[jira] [Resolved] (LUCENE-8838) Tessellator: Remove support for Steiner points
[ https://issues.apache.org/jira/browse/LUCENE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ignacio Vera resolved LUCENE-8838.
---------------------------------
    Resolution: Fixed
    Assignee: Ignacio Vera
    Fix Version/s: 8.2, master (9.0)

> Tessellator: Remove support for Steiner points
> ----------------------------------------------
>
> Key: LUCENE-8838
> URL: https://issues.apache.org/jira/browse/LUCENE-8838
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Ignacio Vera
> Assignee: Ignacio Vera
> Priority: Major
> Fix For: master (9.0), 8.2
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Tessellator has support for Steiner points, which comes from the original
> porting of MapBox's earcut algorithm to Java. We are not using such
> points and therefore it would be better to remove it.
> In addition, it actually introduces a bug when a polygon hole is a line with
> all coplanar points. In some cases it can be reduced to a point and then
> treated as a Steiner point. This looks to be wrong, and in those cases we
> should throw an error.
[jira] [Commented] (LUCENE-8868) New storing strategy for BKD tree leaves with low cardinality
[ https://issues.apache.org/jira/browse/LUCENE-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868354#comment-16868354 ]

Ignacio Vera commented on LUCENE-8868:
--------------------------------------

It is small and I think the 1D case will benefit quite a lot. I just wanted to raise the issue so we think it over.

> New storing strategy for BKD tree leaves with low cardinality
> -------------------------------------------------------------
>
> Key: LUCENE-8868
> URL: https://issues.apache.org/jira/browse/LUCENE-8868
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Currently, if a leaf of the BKD tree contains only a few distinct values, the
> leaf is treated the same way as if all values were different. In many cases it
> can be much more efficient to store the distinct values together with their
> cardinality.
> The strategy is the following:
> 1. When writing a leaf block, the cardinality is computed.
> 2. Perform a naive calculation to decide whether it is better to store the
> leaf as a low cardinality leaf. The storage costs are calculated as follows:
> * Low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2),
> where two is the estimated size of storing the cardinality. This is an
> overestimation, as in some cases only one byte is needed to store the
> cardinality.
> * High cardinality: count * (packedBytesLength - prefixLenSum). We are not
> taking into account the run-length compression.
> 3. If the tree has low cardinality, then we set the compressed dim to -2. Note
> that -1 is when all values are equal.
[jira] [Commented] (LUCENE-8868) New storing strategy for BKD tree leaves with low cardinality
[ https://issues.apache.org/jira/browse/LUCENE-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867327#comment-16867327 ]

Ignacio Vera commented on LUCENE-8868:
--------------------------------------

I have some doubts about this optimisation in 1D. There is not much difference in storage cost, as runLen will do a good job compressing the values, and we are adding some overhead at indexing time as we need to compute the cardinality. All in all it does not hurt too much, but I just want to raise that concern.

> New storing strategy for BKD tree leaves with low cardinality
> -------------------------------------------------------------
>
> Key: LUCENE-8868
> URL: https://issues.apache.org/jira/browse/LUCENE-8868
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, if a leaf of the BKD tree contains only a few distinct values, the
> leaf is treated the same way as if all values were different. In many cases it
> can be much more efficient to store the distinct values together with their
> cardinality.
> The strategy is the following:
> 1. When writing a leaf block, the cardinality is computed.
> 2. Perform a naive calculation to decide whether it is better to store the
> leaf as a low cardinality leaf. The storage costs are calculated as follows:
> * Low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2),
> where two is the estimated size of storing the cardinality. This is an
> overestimation, as in some cases only one byte is needed to store the
> cardinality.
> * High cardinality: count * (packedBytesLength - prefixLenSum). We are not
> taking into account the run-length compression.
> 3. If the tree has low cardinality, then we set the compressed dim to -2. Note
> that -1 is when all values are equal.
[jira] [Commented] (LUCENE-8867) Optimise BKD tree for low cardinality leaves
[ https://issues.apache.org/jira/browse/LUCENE-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867326#comment-16867326 ]

Ignacio Vera commented on LUCENE-8867:
--------------------------------------

I have opened https://issues.apache.org/jira/browse/LUCENE-8868 to track the first optimisation.

> Optimise BKD tree for low cardinality leaves
> --------------------------------------------
>
> Key: LUCENE-8867
> URL: https://issues.apache.org/jira/browse/LUCENE-8867
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Currently, if a leaf of the BKD tree contains only a few distinct values, the
> leaf is treated the same way as if all values were different. In many cases it
> can be much more efficient to store the distinct values together with their
> cardinality.
> In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is
> called n times with the same byte array but a different docID. This issue
> proposes to add a new method to the interface that accepts an array of docs,
> so it can be overridden by implementors to gain search performance.
[jira] [Updated] (LUCENE-8868) New storing strategy for BKD tree leaves with low cardinality
[ https://issues.apache.org/jira/browse/LUCENE-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ignacio Vera updated LUCENE-8868:
--------------------------------
    Description:
Currently, if a leaf of the BKD tree contains only a few distinct values, the leaf is treated the same way as if all values were different. In many cases it can be much more efficient to store the distinct values together with their cardinality.

The strategy is the following:
1. When writing a leaf block, the cardinality is computed.
2. Perform a naive calculation to decide whether it is better to store the leaf as a low cardinality leaf. The storage costs are calculated as follows:
 * Low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2), where two is the estimated size of storing the cardinality. This is an overestimation, as in some cases only one byte is needed to store the cardinality.
 * High cardinality: count * (packedBytesLength - prefixLenSum). We are not taking into account the run-length compression.
3. If the tree has low cardinality, then we set the compressed dim to -2. Note that -1 is when all values are equal.

  was:
Currently, if a leaf of the BKD tree contains only a few distinct values, the leaf is treated the same way as if all values were different. In many cases it can be much more efficient to store the distinct values together with their cardinality.

The strategy is the following:
1. When writing a leaf block, the cardinality is computed.
2. Perform a naive calculation to decide whether it is better to store the leaf as a low cardinality leaf. The storage costs are calculated as follows:
 * Low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2), where two is the estimated size of storing the cardinality. This is an overestimation, as in some cases only one byte is needed to store the cardinality.
 * High cardinality: count * (packedBytesLength - prefixLenSum). We are not taking into account the run-length compression.
3. If the tree has low cardinality, then we set the compressed dim to -2. Note that -1 is when all values are equal.

> New storing strategy for BKD tree leaves with low cardinality
> -------------------------------------------------------------
>
> Key: LUCENE-8868
> URL: https://issues.apache.org/jira/browse/LUCENE-8868
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Priority: Major
>
> Currently, if a leaf of the BKD tree contains only a few distinct values, the
> leaf is treated the same way as if all values were different. In many cases it
> can be much more efficient to store the distinct values together with their
> cardinality.
> The strategy is the following:
> 1. When writing a leaf block, the cardinality is computed.
> 2. Perform a naive calculation to decide whether it is better to store the
> leaf as a low cardinality leaf. The storage costs are calculated as follows:
> * Low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2),
> where two is the estimated size of storing the cardinality. This is an
> overestimation, as in some cases only one byte is needed to store the
> cardinality.
> * High cardinality: count * (packedBytesLength - prefixLenSum). We are not
> taking into account the run-length compression.
> 3. If the tree has low cardinality, then we set the compressed dim to -2. Note
> that -1 is when all values are equal.
[jira] [Created] (LUCENE-8868) New storing strategy for BKD tree leaves with low cardinality
Ignacio Vera created LUCENE-8868: Summary: New storing strategy for BKD tree leaves with low cardinality Key: LUCENE-8868 URL: https://issues.apache.org/jira/browse/LUCENE-8868 Project: Lucene - Core Issue Type: Improvement Reporter: Ignacio Vera Currently, if a leaf of the BKD tree contains only a few distinct values, the leaf is treated the same way as if all values were different. In many cases it can be much more efficient to store the distinct values together with their cardinality. The strategy is the following: 1. When writing a leaf block, the cardinality is computed. 2. Perform a naive calculation to decide whether it is better to store the leaf as a low cardinality leaf. The storage costs are calculated as follows: * Low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2), where 2 is the estimated number of bytes needed to store the cardinality. This is an overestimate, as in some cases only one byte is needed to store the cardinality. * High cardinality: count * (packedBytesLength - prefixLenSum). Run-length compression is not taken into account. 3. If the leaf has low cardinality, the compressed dim is set to -2. Note that -1 is used when all values are equal.
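The cost comparison in step 2 can be sketched as follows. This is only an illustration of the two estimates quoted in the description, not the code of the actual patch; the class and method names are hypothetical, and only leafCardinality, count, packedBytesLength and prefixLenSum come from the issue text.

```java
// Hypothetical helper, not taken from Lucene: compares the two storage estimates from the issue.
final class LeafCostSketch {

    /**
     * Returns true when the low cardinality estimate is strictly smaller than the
     * high cardinality estimate. Run-length compression is ignored, as in the issue.
     */
    static boolean preferLowCardinality(int count, int leafCardinality,
                                        int packedBytesLength, int prefixLenSum) {
        // Bytes left per value once the common prefixes are removed.
        long suffixBytes = packedBytesLength - prefixLenSum;
        // Each distinct value is stored once, plus an estimated 2 bytes for its cardinality.
        long lowCardinalityCost = (long) leafCardinality * (suffixBytes + 2);
        // Every value is stored individually.
        long highCardinalityCost = (long) count * suffixBytes;
        return lowCardinalityCost < highCardinalityCost;
    }
}
```

For example, a leaf with count = 512, leafCardinality = 4, packedBytesLength = 8 and prefixLenSum = 2 gives 4 * (6 + 2) = 32 against 512 * 6 = 3072, so the low cardinality layout would be chosen under this estimate.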
[jira] [Commented] (LUCENE-8867) Optimise BKD tree for low cardinality leaves
[ https://issues.apache.org/jira/browse/LUCENE-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866868#comment-16866868 ] Ignacio Vera commented on LUCENE-8867: -- {quote} Right, this is what I had in mind when I said this is only a problem if you have data dimensions. Because if you don't, then you could call IntersectVisitor.compare(A, A) as a way to know whether value A matches, and we wouldn't need any new API? {quote} True, that would not work when you have data dimensions. In addition, IntersectVisitor.compare(A, A) is intended to compare the query with a range, which is normally more expensive than a comparison with a point, so it would defeat the purpose of the optimisation. I propose to split this change in two, so we can work on the storage optimisation first and then think about the right API to make IntersectVisitor more efficient in these cases. > Optimise BKD tree for low cardinality leaves > > > Key: LUCENE-8867 > URL: https://issues.apache.org/jira/browse/LUCENE-8867 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Currently, if a leaf of the BKD tree contains only a few distinct values, the leaf is treated the same way as if all values were different. > In many cases it can be much more efficient to store the distinct values together with their cardinality. > In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is called n times with the same byte array but a different docID. > This issue proposes adding a new method to the interface that accepts an array of docs, so that it can be overridden by implementors to gain search performance.
[jira] [Comment Edited] (LUCENE-8867) Optimise BKD tree for low cardinality leaves
[ https://issues.apache.org/jira/browse/LUCENE-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866798#comment-16866798 ] Ignacio Vera edited comment on LUCENE-8867 at 6/18/19 4:16 PM: --- {quote} This is only an issue in the case that not all dimensions are indexed, right? Otherwise you could figure out that all values are equal in IntersectVisitor#compare? {quote} I think this is a generic issue. The problem here is not when all values are equal but when you have a very low cardinality on the leaf nodes. In this case we can save a lot of space by storing the values in the proposed way. {quote} One concern I have with the patch is that it assumes that the codec has doc IDs available in an int[] slice as opposed to streaming them from disk directly to the IntersectVisitor for instance. {quote} I see your concern. Another option would be to change the interface more radically and add a matches(byte[]) method that returns a boolean, and then use the visit(docID) method. was (Author: ivera): {quote} This is only an issue in the case that not all dimensions are indexed, right? Otherwise you could figure out that all values are equal in IntersectVisitor#compare? {quote} I think this is a generic issue. The problem here is not when all values are equal but when you have a very low cardinality on the leaf nodes. In this case we can save a lot of space by storing the values in the proposed way. {quote} One concern I have with the patch is that it assumes that the codec has doc IDs available in an int[] slice as opposed to streaming them from disk directly to the IntersectVisitor for instance. {quote} I see your concern. Another option would be to change the interface more radically and add a matches(byte[]) method, and then use the visit(docID) method. > Optimise BKD tree for low cardinality leaves > > > Key: LUCENE-8867 > URL: https://issues.apache.org/jira/browse/LUCENE-8867 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Currently, if a leaf of the BKD tree contains only a few distinct values, the leaf is treated the same way as if all values were different. > In many cases it can be much more efficient to store the distinct values together with their cardinality. > In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is called n times with the same byte array but a different docID. > This issue proposes adding a new method to the interface that accepts an array of docs, so that it can be overridden by implementors to gain search performance.
[jira] [Commented] (LUCENE-8867) Optimise BKD tree for low cardinality leaves
[ https://issues.apache.org/jira/browse/LUCENE-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866798#comment-16866798 ] Ignacio Vera commented on LUCENE-8867: -- {quote} This is only an issue in the case that not all dimensions are indexed, right? Otherwise you could figure out that all values are equal in IntersectVisitor#compare? {quote} I think this is a generic issue. The problem here is not when all values are equal but when you have a very low cardinality on the leaf nodes. In this case we can save a lot of space by storing the values in the proposed way. {quote} One concern I have with the patch is that it assumes that the codec has doc IDs available in an int[] slice as opposed to streaming them from disk directly to the IntersectVisitor for instance. {quote} I see your concern. Another option would be to change the interface more radically and add a matches(byte[]) method, and then use the visit(docID) method. > Optimise BKD tree for low cardinality leaves > > > Key: LUCENE-8867 > URL: https://issues.apache.org/jira/browse/LUCENE-8867 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Currently, if a leaf of the BKD tree contains only a few distinct values, the leaf is treated the same way as if all values were different. > In many cases it can be much more efficient to store the distinct values together with their cardinality. > In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is called n times with the same byte array but a different docID. > This issue proposes adding a new method to the interface that accepts an array of docs, so that it can be overridden by implementors to gain search performance.
[jira] [Updated] (LUCENE-8867) Optimise BKD tree for low cardinality leaves
[ https://issues.apache.org/jira/browse/LUCENE-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera updated LUCENE-8867: - Description: Currently, if a leaf of the BKD tree contains only a few distinct values, the leaf is treated the same way as if all values were different. In many cases it can be much more efficient to store the distinct values together with their cardinality. In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is called n times with the same byte array but a different docID. This issue proposes adding a new method to the interface that accepts an array of docs, so that it can be overridden by implementors to gain search performance. > Optimise BKD tree for low cardinality leaves > > > Key: LUCENE-8867 > URL: https://issues.apache.org/jira/browse/LUCENE-8867 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Created] (LUCENE-8867) Optimise BKD tree for low cardinality leaves
Ignacio Vera created LUCENE-8867: Summary: Optimise BKD tree for low cardinality leaves Key: LUCENE-8867 URL: https://issues.apache.org/jira/browse/LUCENE-8867 Project: Lucene - Core Issue Type: Improvement Reporter: Ignacio Vera Currently, if a leaf of the BKD tree contains only a few distinct values, the leaf is treated the same way as if all values were different. In many cases it can be much more efficient to store the distinct values together with their cardinality. In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is called n times with the same byte array but a different docID. This issue proposes adding a new method to the interface that accepts an array of docs, so that it can be overridden by implementors to gain search performance.
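A rough sketch of the kind of interface change proposed here, assuming a bulk method that receives all doc IDs sharing one value. The interface and class below are hypothetical stand-ins and do not reproduce Lucene's IntersectVisitor; the bulk signature (int[] plus a count) is an assumption made only for illustration.

```java
import java.util.Arrays;

// Hypothetical interface for illustration; this is not Lucene's IntersectVisitor.
interface LeafVisitorSketch {

    /** Called once per document, in the spirit of visit(docId, byte[]) in the issue. */
    void visit(int docID, byte[] packedValue);

    /** Assumed bulk variant: docIDs[0..count) all share the same packedValue. */
    default void visit(int[] docIDs, int count, byte[] packedValue) {
        // The default keeps existing implementors working by delegating per document.
        for (int i = 0; i < count; i++) {
            visit(docIDs[i], packedValue);
        }
    }
}

// Example implementor that overrides the bulk method to compare the shared value only once.
final class ExactMatchVisitor implements LeafVisitorSketch {
    private final byte[] target;
    int comparisons = 0;
    int matches = 0;

    ExactMatchVisitor(byte[] target) {
        this.target = target.clone();
    }

    @Override
    public void visit(int docID, byte[] packedValue) {
        comparisons++;
        if (Arrays.equals(target, packedValue)) {
            matches++;
        }
    }

    @Override
    public void visit(int[] docIDs, int count, byte[] packedValue) {
        comparisons++; // one comparison covers every doc ID in the run
        if (Arrays.equals(target, packedValue)) {
            matches += count;
        }
    }
}
```

With the override, a run of n documents sharing a value costs one comparison instead of n, which is the search-time gain the issue is after.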
[jira] [Commented] (LUCENE-8769) Range Query Type With Logically Connected Ranges
[ https://issues.apache.org/jira/browse/LUCENE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860665#comment-16860665 ] Ignacio Vera commented on LUCENE-8769: -- I agree with Adrien: in the case of AND we can just compute the intersection of all provided ranges and execute it as a single range query. On the other hand, it is a good start towards supporting logically connected ranges that cannot be merged into a single range. > Range Query Type With Logically Connected Ranges > > > Key: LUCENE-8769 > URL: https://issues.apache.org/jira/browse/LUCENE-8769 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8769.patch, LUCENE-8769.patch > > > Today, we visit the BKD tree for each range specified for PointRangeQuery. It > would be good to have a range query type which can take multiple ranges > logically ANDed or ORed.
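The AND case mentioned in the comment can be sketched in one dimension as below. This is a minimal illustration with inclusive long bounds; the class and method names are hypothetical and are not part of the attached patch.

```java
// Hypothetical helper: collapses ANDed inclusive ranges into a single range.
final class RangeIntersectionSketch {

    /**
     * Each range is {min, max}, both inclusive. Returns the intersection as
     * {min, max}, or null when the ranges have no value in common.
     */
    static long[] intersect(long[][] ranges) {
        long min = Long.MIN_VALUE;
        long max = Long.MAX_VALUE;
        for (long[] range : ranges) {
            min = Math.max(min, range[0]); // the tightest lower bound wins
            max = Math.min(max, range[1]); // the tightest upper bound wins
        }
        return min <= max ? new long[] {min, max} : null;
    }
}
```

A null result means the conjunction cannot match anything, so no tree visit would be needed at all. ORed ranges that do not overlap cannot be collapsed this way.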
[jira] [Created] (LUCENE-8850) Creating a polygon with no area or an invalid area should throw an error
Ignacio Vera created LUCENE-8850: Summary: Creating a polygon with no area or an invalid area should throw an error Key: LUCENE-8850 URL: https://issues.apache.org/jira/browse/LUCENE-8850 Project: Lucene - Core Issue Type: Improvement Reporter: Ignacio Vera Working with some polygon data that has data quality issues, I found two cases where we can throw an error because we know the polygon is invalid: 1) We already compute the signed area in the Polygon constructor. If the value is 0, we know we are building a polygon with no area, so we can throw an error as it is an invalid polygon. 2) We can calculate the total area of the polygon and its holes. If the area is less than or equal to 0, the holes are as big as or bigger than the polygon, which makes it an invalid polygon. In addition, I propose a new method to calculate the signed area that requires fewer arithmetic operations and therefore introduces fewer numerical errors.
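The two checks can be sketched as follows. The issue does not spell out the new signed-area method, so this sketch assumes the standard shoelace formula; the class, method names and error messages are hypothetical and not taken from Lucene's Polygon class.

```java
// Hypothetical helper, assuming closed rings (first vertex repeated as the last one).
final class PolygonAreaSketch {

    /** Shoelace formula: positive for counter-clockwise rings, negative for clockwise. */
    static double signedArea(double[] xs, double[] ys) {
        double sum = 0;
        for (int i = 0; i < xs.length - 1; i++) {
            sum += xs[i] * ys[i + 1] - xs[i + 1] * ys[i];
        }
        return sum / 2;
    }

    /** Throws if the shell has no area or if the holes cover at least the whole shell. */
    static void checkValid(double shellSignedArea, double[] holeSignedAreas) {
        if (shellSignedArea == 0) {
            throw new IllegalArgumentException("polygon has no area");
        }
        double remaining = Math.abs(shellSignedArea);
        for (double hole : holeSignedAreas) {
            remaining -= Math.abs(hole);
        }
        if (remaining <= 0) {
            throw new IllegalArgumentException("holes are as big as or bigger than the polygon");
        }
    }
}
```

For the unit square (0,0), (1,0), (1,1), (0,1), (0,0) the signed area is 1, while a degenerate ring such as (0,0), (1,1), (2,2), (0,0) yields 0 and would be rejected by the first check.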
[jira] [Resolved] (LUCENE-8775) Tessellator: Improve the election of diagonals when splitting the polygon
[ https://issues.apache.org/jira/browse/LUCENE-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-8775. -- Resolution: Fixed > Tessellator: Improve the election of diagonals when splitting the polygon > - > > Key: LUCENE-8775 > URL: https://issues.apache.org/jira/browse/LUCENE-8775 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Major > Fix For: master (9.0), 8.2 > > Time Spent: 2.5h > Remaining Estimate: 0h > > There are some cases where polygon tessellation fails, and it seems to be due to a bad choice of diagonal when splitting the polygon. > Here I propose a patch that makes sure, when splitting a polygon, that the resulting polygons are valid CW polygons. > In addition, this patch adds a few tests to check the functionality of the tessellator and throws an error if the polygon cannot be split, instead of just emptying the current tessellation.
[jira] [Reopened] (LUCENE-8775) Tessellator: Improve the election of diagonals when splitting the polygon
[ https://issues.apache.org/jira/browse/LUCENE-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera reopened LUCENE-8775: -- Assignee: Ignacio Vera I am reopening the issue as I have realised that in some cases holes are removed from the tessellation. I have opened a new PR that fixes this and improves the test coverage by comparing the area of the polygon with the area of the tessellation. > Tessellator: Improve the election of diagonals when splitting the polygon > - > > Key: LUCENE-8775 > URL: https://issues.apache.org/jira/browse/LUCENE-8775 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Major > Fix For: master (9.0), 8.2 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > There are some cases where polygon tessellation fails, and it seems to be due to a bad choice of diagonal when splitting the polygon. > Here I propose a patch that makes sure, when splitting a polygon, that the resulting polygons are valid CW polygons. > In addition, this patch adds a few tests to check the functionality of the tessellator and throws an error if the polygon cannot be split, instead of just emptying the current tessellation.
[jira] [Created] (LUCENE-8838) Tessellator: Remove support for Steiner points
Ignacio Vera created LUCENE-8838: Summary: Tessellator: Remove support for Steiner points Key: LUCENE-8838 URL: https://issues.apache.org/jira/browse/LUCENE-8838 Project: Lucene - Core Issue Type: Bug Reporter: Ignacio Vera Tessellator has support for Steiner points, which comes from the original port of MapBox's earcut algorithm to Java. We are not using such points, and therefore it would be better to remove it. In addition, it actually introduces a bug when a polygon hole is a line, with all of its points collinear. In some cases the hole can be reduced to a point and then treated as a Steiner point. This looks wrong, and in those cases we should throw an error.
[jira] [Resolved] (LUCENE-8775) Tessellator: Improve the election of diagonals when splitting the polygon
[ https://issues.apache.org/jira/browse/LUCENE-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-8775. -- Resolution: Fixed Fix Version/s: 8.2 master (9.0) > Tessellator: Improve the election of diagonals when splitting the polygon > - > > Key: LUCENE-8775 > URL: https://issues.apache.org/jira/browse/LUCENE-8775 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Major > Fix For: master (9.0), 8.2 > > Time Spent: 2h > Remaining Estimate: 0h > > There are some cases where polygon tessellation fails, and it seems to be due to a bad choice of diagonal when splitting the polygon. > Here I propose a patch that makes sure, when splitting a polygon, that the resulting polygons are valid CW polygons. > In addition, this patch adds a few tests to check the functionality of the tessellator and throws an error if the polygon cannot be split, instead of just emptying the current tessellation.
[jira] [Updated] (LUCENE-8620) Add CONTAINS support for LatLonShape
[ https://issues.apache.org/jira/browse/LUCENE-8620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera updated LUCENE-8620: - Fix Version/s: (was: 8.1) (was: master (9.0)) > Add CONTAINS support for LatLonShape > > > Key: LUCENE-8620 > URL: https://issues.apache.org/jira/browse/LUCENE-8620 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/sandbox >Reporter: Ignacio Vera >Priority: Major > Attachments: LUCENE-8620.patch, LUCENE-8620.patch > > Time Spent: 4h 40m > Remaining Estimate: 0h > > Currently the only spatial operation that cannot be performed using > {{LatLonShape}} is CONTAINS. This issue will add such capability by tracking > if an edge of a generated triangle from the {{Tessellator}} is an edge of the > polygon. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8831) LatLonShapeBoundingBoxQuery hashcode is wrong
[ https://issues.apache.org/jira/browse/LUCENE-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera resolved LUCENE-8831. -- Resolution: Fixed Assignee: Ignacio Vera Fix Version/s: 8.2 master (9.0) > LatLonShapeBoundingBoxQuery hashcode is wrong > -- > > Key: LUCENE-8831 > URL: https://issues.apache.org/jira/browse/LUCENE-8831 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Assignee: Ignacio Vera >Priority: Major > Fix For: master (9.0), 8.2 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently the hashcode implementation for LatLonShapeBoundingBoxQuery always returns > a different value, even for equal queries. Therefore the query cannot be cached.
[jira] [Created] (LUCENE-8831) LatLonShapeBoundingBoxQuery hashcode is wrong
Ignacio Vera created LUCENE-8831: Summary: LatLonShapeBoundingBoxQuery hashcode is wrong Key: LUCENE-8831 URL: https://issues.apache.org/jira/browse/LUCENE-8831 Project: Lucene - Core Issue Type: Bug Reporter: Ignacio Vera Currently the hashcode implementation for LatLonShapeBoundingBoxQuery always returns a different value, even for equal queries. Therefore the query cannot be cached.
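The issue does not say what makes the hash code unstable. One common cause, assumed here purely for illustration, is hashing an array field by identity instead of by content. The class below is a hypothetical stand-in with made-up field names, not the Lucene query.

```java
import java.util.Arrays;

// Hypothetical query-like class; the field names are assumptions, not Lucene's.
final class BoundingBoxQuerySketch {
    private final String field;
    private final byte[] encodedBox;

    BoundingBoxQuerySketch(String field, byte[] encodedBox) {
        this.field = field;
        this.encodedBox = encodedBox.clone();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof BoundingBoxQuerySketch)) {
            return false;
        }
        BoundingBoxQuerySketch other = (BoundingBoxQuerySketch) o;
        return field.equals(other.field) && Arrays.equals(encodedBox, other.encodedBox);
    }

    @Override
    public int hashCode() {
        // Arrays.hashCode hashes the contents. Calling encodedBox.hashCode() instead would be
        // identity based and would differ between two queries built from the same box.
        return 31 * field.hashCode() + Arrays.hashCode(encodedBox);
    }
}
```

A cache keyed on the query relies on equal queries producing equal hash codes, which is why an unstable hashCode prevents caching.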
[jira] [Commented] (LUCENE-8819) org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure
[ https://issues.apache.org/jira/browse/LUCENE-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855900#comment-16855900 ] Ignacio Vera commented on LUCENE-8819: -- I had a look at the error in TestRegexpRandom2 and it looks more concerning. The test executes two equivalent queries and expects the same results. The problem is that it uses two different random searchers, and in this case one uses multiple threads (executor != null) and the other one is single threaded (executor == null). The effect is the same as above: documents get a different shard_index and the results are different. It sounds wrong to me that, depending on how you construct your searcher, you might get different results. > org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure > > > Key: LUCENE-8819 > URL: https://issues.apache.org/jira/browse/LUCENE-8819 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Major > Attachments: LUCENE-8819.patch > > > It can be reproduced with: > > {code:java} > ant test -Dtestcase=TestTopDocsMerge -Dtests.method=testSort_1 > -Dtests.seed=E916688CE5BC9122 -Dtests.multiplier=3 -Dtests.slow=true > -Dtests.locale=es-US -Dtests.timezone=Pacific/Johnston -Dtests.asserts=true > -Dtests.file.encoding=ISO-8859-1{code} > > Test fails in master and branch 8.x but it does not fail in branch 8.1.
[jira] [Commented] (LUCENE-8819) org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure
[ https://issues.apache.org/jira/browse/LUCENE-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855682#comment-16855682 ] Ignacio Vera commented on LUCENE-8819: -- The second test worked for me adding the following VM configuration: {code:java} -Dtests.nightly=true -Dtests.seed=215B9655D6767594 -Dtests.multiplier=2 {code} Note the nightly flag. The new issue makes sense to me, thanks! > org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure > > > Key: LUCENE-8819 > URL: https://issues.apache.org/jira/browse/LUCENE-8819 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Major > Attachments: LUCENE-8819.patch > > > It can be reproduced with: > > {code:java} > ant test -Dtestcase=TestTopDocsMerge -Dtests.method=testSort_1 > -Dtests.seed=E916688CE5BC9122 -Dtests.multiplier=3 -Dtests.slow=true > -Dtests.locale=es-US -Dtests.timezone=Pacific/Johnston -Dtests.asserts=true > -Dtests.file.encoding=ISO-8859-1{code} > > Test fails in master and branch 8.x but it does not fail in branch 8.1. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8819) org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure
[ https://issues.apache.org/jira/browse/LUCENE-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854586#comment-16854586 ] Ignacio Vera commented on LUCENE-8819: -- I set these two arguments in the VM configuration: {code:java} -Dtests.seed=E916688CE5BC9122 -Dtests.multiplier=3 {code} > org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure > > > Key: LUCENE-8819 > URL: https://issues.apache.org/jira/browse/LUCENE-8819 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Major > Attachments: LUCENE-8819.patch > > > It can be reproduced with: > > {code:java} > ant test -Dtestcase=TestTopDocsMerge -Dtests.method=testSort_1 > -Dtests.seed=E916688CE5BC9122 -Dtests.multiplier=3 -Dtests.slow=true > -Dtests.locale=es-US -Dtests.timezone=Pacific/Johnston -Dtests.asserts=true > -Dtests.file.encoding=ISO-8859-1{code} > > Test fails in master and branch 8.x but it does not fail in branch 8.1. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8819) org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure
[ https://issues.apache.org/jira/browse/LUCENE-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854562#comment-16854562 ] Ignacio Vera commented on LUCENE-8819: -- Note that you need to add the seed and the multiplier in IntelliJ. With those set I am able to reproduce it in the debugger. What I observe is that documents come in a different order, and naively my best guess is that it is due to the order of execution that LUCENE-8757 has introduced, but I might be totally wrong :). > org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure > > > Key: LUCENE-8819 > URL: https://issues.apache.org/jira/browse/LUCENE-8819 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Major > Attachments: LUCENE-8819.patch > > > It can be reproduced with: > > {code:java} > ant test -Dtestcase=TestTopDocsMerge -Dtests.method=testSort_1 > -Dtests.seed=E916688CE5BC9122 -Dtests.multiplier=3 -Dtests.slow=true > -Dtests.locale=es-US -Dtests.timezone=Pacific/Johnston -Dtests.asserts=true > -Dtests.file.encoding=ISO-8859-1{code} > > Test fails in master and branch 8.x but it does not fail in branch 8.1.
[jira] [Commented] (LUCENE-8819) org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure
[ https://issues.apache.org/jira/browse/LUCENE-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854400#comment-16854400 ] Ignacio Vera commented on LUCENE-8819: -- Thanks [~atris]! Unluckily the patch does not apply with the latest master. I believe Adrien might have made some minor edits to your patch before applying it and it is conflicting with this patch. Maybe you want to pull the latest master and then do the changes. Sorry for the hassle. > org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure > > > Key: LUCENE-8819 > URL: https://issues.apache.org/jira/browse/LUCENE-8819 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Major > Attachments: LUCENE-8819.patch > > > It can be reproduced with: > > {code:java} > ant test -Dtestcase=TestTopDocsMerge -Dtests.method=testSort_1 > -Dtests.seed=E916688CE5BC9122 -Dtests.multiplier=3 -Dtests.slow=true > -Dtests.locale=es-US -Dtests.timezone=Pacific/Johnston -Dtests.asserts=true > -Dtests.file.encoding=ISO-8859-1{code} > > Test fails in master and branch 8.x but it does not fail in branch 8.1. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8819) org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure
[ https://issues.apache.org/jira/browse/LUCENE-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854297#comment-16854297 ] Ignacio Vera commented on LUCENE-8819: -- Just for the record, I see other tests failing and I believe they come from the same issue, for example: {code:java} ant test -Dtestcase=TestRegexpRandom2 -Dtests.method=testRegexps -Dtests.seed=215B9655D6767594 -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-8.x/test-data/enwiki.random.lines.txt -Dtests.locale=es-PY -Dtests.timezone=Asia/Kuala_Lumpur -Dtests.asserts=true -Dtests.file.encoding=UTF-8{code} > org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure > > > Key: LUCENE-8819 > URL: https://issues.apache.org/jira/browse/LUCENE-8819 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Major > > It can be reproduced with: > > {code:java} > ant test -Dtestcase=TestTopDocsMerge -Dtests.method=testSort_1 > -Dtests.seed=E916688CE5BC9122 -Dtests.multiplier=3 -Dtests.slow=true > -Dtests.locale=es-US -Dtests.timezone=Pacific/Johnston -Dtests.asserts=true > -Dtests.file.encoding=ISO-8859-1{code} > > Test fails in master and branch 8.x but it does not fail in branch 8.1.
[jira] [Commented] (LUCENE-8819) org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure
[ https://issues.apache.org/jira/browse/LUCENE-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854271#comment-16854271 ] Ignacio Vera commented on LUCENE-8819: -- I think this test error was introduced by LUCENE-8757 > org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure > > > Key: LUCENE-8819 > URL: https://issues.apache.org/jira/browse/LUCENE-8819 > Project: Lucene - Core > Issue Type: Bug >Reporter: Ignacio Vera >Priority: Major > > It can be reproduced with: > > {code:java} > ant test -Dtestcase=TestTopDocsMerge -Dtests.method=testSort_1 > -Dtests.seed=E916688CE5BC9122 -Dtests.multiplier=3 -Dtests.slow=true > -Dtests.locale=es-US -Dtests.timezone=Pacific/Johnston -Dtests.asserts=true > -Dtests.file.encoding=ISO-8859-1{code} > > Test fails in master and branch 8.x but it does not fail in branch 8.1. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8819) org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure
Ignacio Vera created LUCENE-8819: Summary: org.apache.lucene.search.TestTopDocsMerge.testSort_1 failure Key: LUCENE-8819 URL: https://issues.apache.org/jira/browse/LUCENE-8819 Project: Lucene - Core Issue Type: Bug Reporter: Ignacio Vera It can be reproduced with: {code:java} ant test -Dtestcase=TestTopDocsMerge -Dtests.method=testSort_1 -Dtests.seed=E916688CE5BC9122 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=es-US -Dtests.timezone=Pacific/Johnston -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1{code} Test fails in master and branch 8.x but it does not fail in branch 8.1. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8775) Tessellator: Improve the election of diagonals when splitting the polygon
Ignacio Vera created LUCENE-8775: Summary: Tessellator: Improve the election of diagonals when splitting the polygon Key: LUCENE-8775 URL: https://issues.apache.org/jira/browse/LUCENE-8775 Project: Lucene - Core Issue Type: Bug Reporter: Ignacio Vera There are some cases where polygon tessellation fails, and it seems to be due to a bad choice of diagonal when splitting the polygon. Here I propose a patch that makes sure, when splitting a polygon, that the resulting polygons are valid CW polygons. In addition, this patch adds a few tests to check the functionality of the tessellator, and it throws an error if the polygon cannot be split instead of just emptying the current tessellation.
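As an illustration of the "valid CW polygon" check mentioned above, here is a minimal sketch that decides the winding order of a closed ring from its signed area (shoelace formula). This is not the Tessellator code from the patch; the class and method names are hypothetical, and it assumes x grows to the right and y grows upwards (for example x = longitude, y = latitude), without repeating the first vertex at the end.

```java
// Hypothetical helper, not part of Lucene: winding order from the signed area.
final class WindingOrderSketch {

  /** Signed area via the shoelace formula; positive means counter-clockwise. */
  static double signedArea(double[] x, double[] y) {
    double sum = 0;
    // j is the previous vertex, i the current one; the ring closes implicitly.
    for (int i = 0, j = x.length - 1; i < x.length; j = i++) {
      sum += x[j] * y[i] - x[i] * y[j];
    }
    return sum / 2;
  }

  /** A ring with negative signed area is wound clockwise. */
  static boolean isClockwise(double[] x, double[] y) {
    return signedArea(x, y) < 0;
  }
}
```

A split along a diagonal could then be accepted only when both resulting rings report the expected orientation and a non-zero area.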
[jira] [Commented] (LUCENE-8736) LatLonShapePolygonQuery returning incorrect WITHIN results with shared boundaries
[ https://issues.apache.org/jira/browse/LUCENE-8736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16817579#comment-16817579 ] Ignacio Vera commented on LUCENE-8736: -- [~nknize] it would be good to have an example where the determinant overflows, as I haven't been able to exercise such an error. Regarding the pnpoly algorithm, I see the properties and simplicity that [~rcmuir] likes. One interesting thing to notice is that the algorithm cannot really be used on a finite space. For example, in the geo case you have problems for points with longitude = 180 or points with latitude = 90, as those points cannot be contained by a polygon. This is actually corrected by the encoding, as points are pulled southwards and westwards and there are no points in the index with those values. The penalty is that you get false positives (points pulled inside the polygon) and false negatives (points pulled outside of the polygon). This is ok in the case of points as the effect is very small and difficult to notice. On the other hand, this effect is pretty big when working with shapes. Following the example of country shapes, imagine we are now indexing those countries. If a user executes a query for shapes that intersect one country, the user would expect to get all neighbour countries. In this case, due to the encoding effect, some of those polygons will be pulled inside the query polygon and some away from it, so you only get a partial result (neighbour countries to the north and east), which is not good. My feeling is that we need to eliminate the situation where we get false negatives when working with shapes. One of the possibilities here is to quantise the query polygon, as in theory it should remove false negatives. The problem with doing that is that the pnpoly algorithm won't work anymore because now you have problems again for points with longitude = 180 and points with latitude = 90. 
This is the reason I like the approach of Nick as it removes that problem. Maybe one possibility is to use different algorithms for points and for shapes although that means that results will differ depending if you index points as points or as shapes. > LatLonShapePolygonQuery returning incorrect WITHIN results with shared > boundaries > - > > Key: LUCENE-8736 > URL: https://issues.apache.org/jira/browse/LUCENE-8736 > Project: Lucene - Core > Issue Type: Bug >Reporter: Nicholas Knize >Assignee: Nicholas Knize >Priority: Major > Fix For: 8.1, master (9.0) > > Attachments: LUCENE-8736.patch, LUCENE-8736.patch, > adaptive-decoding.patch > > > Triangles that are {{WITHIN}} a target polygon query that also share a > boundary with the polygon are incorrectly reported as {{CROSSES}} instead of > {{INSIDE}}. This leads to incorrect {{WITHIN}} query results as demonstrated > in the following test: > {code:java} > public void testWithinFailure() throws Exception { > Directory dir = newDirectory(); > RandomIndexWriter w = new RandomIndexWriter(random(), dir); > // test polygons: > Polygon indexPoly1 = new Polygon(new double[] {4d, 4d, 3d, 3d, 4d}, new > double[] {3d, 4d, 4d, 3d, 3d}); > Polygon indexPoly2 = new Polygon(new double[] {2d, 2d, 1d, 1d, 2d}, new > double[] {6d, 7d, 7d, 6d, 6d}); > Polygon indexPoly3 = new Polygon(new double[] {1d, 1d, 0d, 0d, 1d}, new > double[] {3d, 4d, 4d, 3d, 3d}); > Polygon indexPoly4 = new Polygon(new double[] {2d, 2d, 1d, 1d, 2d}, new > double[] {0d, 1d, 1d, 0d, 0d}); > // index polygons: > Document doc; > addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly1); > w.addDocument(doc); > addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly2); > w.addDocument(doc); > addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly3); > w.addDocument(doc); > addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly4); > w.addDocument(doc); > / search // > IndexReader reader = w.getReader(); > w.close(); > IndexSearcher searcher = 
newSearcher(reader); > Polygon[] searchPoly = new Polygon[] {new Polygon(new double[] {4d, 4d, > 0d, 0d, 4d}, new double[] {0d, 7d, 7d, 0d, 0d})}; > Query q = LatLonShape.newPolygonQuery(FIELDNAME, QueryRelation.WITHIN, > searchPoly); > assertEquals(4, searcher.count(q)); > IOUtils.close(w, reader, dir); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
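To make the remark about longitude = 180 and latitude = 90 in the comment above concrete, here is a minimal sketch of the classic crossing-number point-in-polygon test (commonly known as PNPOLY). It is a plain transcription of the textbook algorithm, not Lucene's implementation; the class name is hypothetical and x/y stand for longitude/latitude.

```java
// Textbook crossing-number test; hypothetical class, not Lucene's code.
final class PnpolySketch {

  /** Returns true when (x, y) is inside the ring described by vx/vy. */
  static boolean contains(double[] vx, double[] vy, double x, double y) {
    boolean inside = false;
    for (int i = 0, j = vx.length - 1; i < vx.length; j = i++) {
      // Only edges that straddle the horizontal line through y can be crossed.
      boolean straddles = (vy[i] > y) != (vy[j] > y);
      if (straddles && x < (vx[j] - vx[i]) * (y - vy[i]) / (vy[j] - vy[i]) + vx[i]) {
        inside = !inside; // each crossing to the right of the point flips the state
      }
    }
    return inside;
  }
}
```

With a ring that covers the whole world, the strict comparisons treat the minimum boundaries as inside and the maximum boundaries as outside, so a point at longitude 180 or latitude 90 is never reported as contained, while points at longitude -180 or latitude -90 are.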
[jira] [Commented] (LUCENE-8736) LatLonShapePolygonQuery returning incorrect WITHIN results with shared boundaries
[ https://issues.apache.org/jira/browse/LUCENE-8736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815272#comment-16815272 ] Ignacio Vera commented on LUCENE-8736: -- I have run my own benchmarks with this change and they look like:

||Approach||Shape||M hits/sec dev||M hits/sec base||M hits/sec diff||QPS dev||QPS base||QPS diff||Hit count dev||Hit count base||Hit count diff||
|points|box|77.42|75.92|2%|78.78|77.25|2%|221118844|221118844|0%|
|points|polyRussia|15.99|18.42|-13%|4.56|5.25|-13%|3508846|3508846|0%|
|points|poly 10|76.02|76.57|-1%|48.07|48.42|-1%|355809475|355809475|0%|
|points|polyMedium|8.97|9.28|-3%|109.93|113.64|-3%|2693559|2693559|0%|
|shapes|box|35.51|36.40|-2%|36.13|37.04|-2%|221118844|221118844|0%|
|shapes|polyRussia|6.22|2.78|124%|1.77|0.79|124%|3508846|3508846|0%|
|shapes|poly 10|26.91|19.73|36%|17.01|12.48|36%|355809475|355809475|0%|
|shapes|polyMedium|2.71|1.06|156%|33.26|12.99|156%|2693559|2693559|0%|

In addition, I ran similar benchmarks indexing the points as lines and as polygons:

||Approach||Shape||M hits/sec dev||M hits/sec base||M hits/sec diff||QPS dev||QPS base||QPS diff||Hit count dev||Hit count base||Hit count diff||
|line|box|35.42|35.56|-0%|35.91|36.05|-0%|221924270|221924270|0%|
|line|polyRussia|3.65|2.52|45%|1.04|0.72|45%|3510913|3510913|0%|
|line|poly 10|22.87|18.65|23%|14.43|11.76|23%|356664874|356664874|0%|
|line|polyMedium|1.09|0.73|50%|12.70|8.49|50%|2820569|2820569|0%|
|polygon|box|27.20|24.93|9%|27.58|25.27|9%|221925638|221925638|0%|
|polygon|polyRussia|1.85|1.77|4%|0.53|0.51|4%|3511839|3511839|0%|
|polygon|poly 10|13.11|14.05|-7%|8.26|8.85|-7%|357135836|357135836|0%|
|polygon|polyMedium|0.49|0.49|1%|5.67|5.62|1%|2857655|2857655|0%|

+1. I like this approach. 
Benchmarks show that adjusting the logic depending on the type of triangle really speeds up points and lines, with a small hit on polygons (which might be due to the change in how contains is computed). Could we add a comment where we call tree#crossesBox explaining why we choose to include the boundaries? > LatLonShapePolygonQuery returning incorrect WITHIN results with shared > boundaries > - > > Key: LUCENE-8736 > URL: https://issues.apache.org/jira/browse/LUCENE-8736 > Project: Lucene - Core > Issue Type: Bug >Reporter: Nicholas Knize >Priority: Major > Attachments: LUCENE-8736.patch, LUCENE-8736.patch, > adaptive-decoding.patch > > > Triangles that are {{WITHIN}} a target polygon query that also share a > boundary with the polygon are incorrectly reported as {{CROSSES}} instead of > {{INSIDE}}. This leads to incorrect {{WITHIN}} query results as demonstrated > in the following test: > {code:java} > public void testWithinFailure() throws Exception { > Directory dir = newDirectory(); > RandomIndexWriter w = new RandomIndexWriter(random(), dir); > // test polygons: > Polygon indexPoly1 = new Polygon(new double[] {4d, 4d, 3d, 3d, 4d}, new > double[] {3d, 4d, 4d, 3d, 3d}); > Polygon indexPoly2 = new Polygon(new double[] {2d, 2d, 1d, 1d, 2d}, new > double[] {6d, 7d, 7d, 6d, 6d}); > Polygon indexPoly3 = new Polygon(new double[] {1d, 1d, 0d, 0d, 1d}, new > double[] {3d, 4d, 4d, 3d, 3d}); > Polygon indexPoly4 = new Polygon(new double[] {2d, 2d, 1d, 1d, 2d}, new > double[] {0d, 1d, 1d, 0d, 0d}); > // index polygons: > Document doc; > addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly1); > w.addDocument(doc); > addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly2); > w.addDocument(doc); > addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly3); > w.addDocument(doc); > addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly4); > w.addDocument(doc); > / search // > IndexReader reader = w.getReader(); > w.close(); > IndexSearcher searcher = 
newSearcher(reader); > Polygon[] searchPoly = new Polygon[] {new Polygon(new double[] {4d, 4d, > 0d, 0d, 4d}, new double[] {0d, 7d, 7d, 0d, 0d})}; > Query q = LatLonShape.newPolygonQuery(FIELDNAME, QueryRelation.WITHIN, > searchPoly); > assertEquals(4, searcher.count(q)); > IOUtils.close(w, reader, dir); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8758) Class Field levelN is not populated correctly in QuadPrefixTree
[ https://issues.apache.org/jira/browse/LUCENE-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815114#comment-16815114 ] Ignacio Vera commented on LUCENE-8758: -- +1 to remove those arrays. They do not seem to be used and their content is buggy. That info can be deduced from the level anyway. > Class Field levelN is not populated correctly in QuadPrefixTree > --- > > Key: LUCENE-8758 > URL: https://issues.apache.org/jira/browse/LUCENE-8758 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/spatial-extras >Affects Versions: 4.0, 5.0, 6.0, 7.0, 8.0 >Reporter: Dominic Page >Priority: Trivial > Labels: beginner > Fix For: 8.x > > > QuadPrefixTree in Lucene prepopulates these arrays: > {{levelW = new double[maxLevels];}} > {{levelH = new double[maxLevels];}} > {{*levelS = new int[maxLevels];*}} > {{*levelN = new int[maxLevels];*}} > Like this > {{for (int i = 1; i < levelW.length; i++) {}} > {{ levelW[i] = levelW[i - 1] / 2.0;}} > {{ levelH[i] = levelH[i - 1] / 2.0;}} > {{ *levelS[i] = levelS[i - 1] * 2;*}} > {{ *levelN[i] = levelN[i - 1] * 4;*}} > {{}}} > The field > {{levelN[]}} > overflows after level 14 = 1073741824 where maxLevels is limited to > {{MAX_LEVELS_POSSIBLE = 50;}} > The field levelN appears not to be used anywhere. Likewise, the field > {{levelS[] }} > is only used in the > {{printInfo}} > method. I would propose either to remove both > {{levelN[],}}{{levelS[]}} > or to change the datatype > {{levelN = new long[maxLevels];}}
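The overflow described in the issue, and the point that the count can be derived from the level, can be sketched as follows. This is not the QuadPrefixTree code; the class and method names are hypothetical, and the starting value of 4 for index 0 is an assumption chosen to agree with the figure quoted in the issue (1073741824 at index 14).

```java
// Hypothetical demonstration, not Lucene code.
final class LevelCountSketch {

  /** First array index at which an int cell count would overflow, or -1 if none. */
  static int firstIntOverflowIndex(int maxLevels) {
    int cells = 4; // assumed content of levelN[0]
    for (int i = 1; i < maxLevels; i++) {
      try {
        cells = Math.multiplyExact(cells, 4); // throws instead of wrapping silently
      } catch (ArithmeticException e) {
        return i;
      }
    }
    return -1;
  }

  /** Cell count derived directly from the index: 4^(index + 1) = 2^(2 * (index + 1)). */
  static long cellsAtIndex(int index) {
    if (index < 0 || index > 30) {
      throw new IllegalArgumentException("a long can only hold indexes 0..30, got " + index);
    }
    return 1L << (2 * (index + 1));
  }
}
```

Note that a long array would only postpone the problem: with 50 levels the count exceeds the long range from index 31 onwards, which is one more argument for removing the arrays.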
[jira] [Created] (LUCENE-8761) RandomGeoPolygonTest.testCompareSmallPolygons test failure
Ignacio Vera created LUCENE-8761: Summary: RandomGeoPolygonTest.testCompareSmallPolygons test failure Key: LUCENE-8761 URL: https://issues.apache.org/jira/browse/LUCENE-8761 Project: Lucene - Core Issue Type: Bug Components: modules/spatial3d Reporter: Ignacio Vera Reproduce with: {code:java} ant test -Dtestcase=RandomGeoPolygonTest -Dtests.method=testCompareSmallPolygons -Dtests.seed=5616B8AF18E73F5 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=it-IT -Dtests.timezone=Africa/Luanda -Dtests.asserts=true -Dtests.file.encoding=UTF-8{code} Output: {code:java} [junit4] FAILURE 0.25s | RandomGeoPolygonTest.testCompareSmallPolygons {seed=[5616B8AF18E73F5:3F0F8C29D1934F55]} <<< [junit4] > Throwable #1: java.lang.AssertionError: [junit4] > Standard polygon: GeoCompositePolygon: {[GeoConvexPolygon: {planetmodel=PlanetModel.WGS84, points=[[lat=4.0E-323, lon=-1.5642641387776646([X=0.0065394500758205526, Y=-1.0010974954578205, Z=4.0E-323])], [lat=-4.1204793742327684E-7, lon=-1.5642638751890106([X=0.0065397139537611325, Y=-1.0010974937339752, Z=-4.125089589031438E-7])], [lat=4.7042128367572116E-7, lon=-1.5642630821305357([X=0.006540507882610503, Y=-1.0010974885472588, Z=4.709476164070912E-7])], [lat=1.0079428885548814E-6, lon=-1.5642647153663134([X=0.006538872854363873, Y=-1.0010974992277146, Z=1.0090706294797576E-6])]], internalEdges={3}}, GeoConvexPolygon: {planetmodel=PlanetModel.WGS84, points=[[lat=4.0E-323, lon=-1.5642641387776646([X=0.0065394500758205526, Y=-1.0010974954578205, Z=4.0E-323])], [lat=1.0079428885548814E-6, lon=-1.5642647153663134([X=0.006538872854363873, Y=-1.0010974992277146, Z=1.0090706294797576E-6])], [lat=5.018093706393593E-7, lon=-1.5642651810798487([X=0.006538406629710141, Y=-1.0010975022732327, Z=5.02370822057141E-7])]], internalEdges={0}}]} [junit4] > Large polygon: GeoComplexPolygon: {planetmodel=PlanetModel.WGS84, number of shapes=1, address=24da76d0, testPoint=[lat=3.1362512108941996E-7, lon=-1.5642641985086745([X=0.006539390279255702, 
Y=-1.0010974958483772, Z=3.139760218082874E-7])], testPointInSet=true, shapes={ {[lat=5.018093706393593E-7, lon=-1.5642651810798487([X=0.006538406629710141, Y=-1.0010975022732327, Z=5.02370822057141E-7])], [lat=4.0E-323, lon=-1.5642641387776646([X=0.0065394500758205526, Y=-1.0010974954578205, Z=4.0E-323])], [lat=-4.1204793742327684E-7, lon=-1.5642638751890106([X=0.0065397139537611325, Y=-1.0010974937339752, Z=-4.125089589031438E-7])], [lat=4.7042128367572116E-7, lon=-1.5642630821305357([X=0.006540507882610503, Y=-1.0010974885472588, Z=4.709476164070912E-7])], [lat=1.0079428885548814E-6, lon=-1.5642647153663134([X=0.006538872854363873, Y=-1.0010974992277146, Z=1.0090706294797576E-6])]}} [junit4] > Point: [lat=2.8705346198541964E-8, lon=7.537339947889447E-73([X=1.0011188539924787, Y=7.545773130782812E-73, Z=2.8737463289741695E-8])] [junit4] > WKT: POLYGON((-89.62573319562668 2.263E-321,-89.62571809310927 -2.3608607771424413E-5,-89.62567265420576 2.6953154147745272E-5,-89.62576623172276 5.775087350441979E-5,-89.62579291514281 2.875155905775134E-5,-89.62573319562668 2.263E-321)) [junit4] > WKT: POINT(4.318577677694212E-71 1.6446951866383562E-6) [junit4] > normal polygon: false [junit4] > large polygon: true{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8708) Can we simplify conjunctions of range queries automatically?
[ https://issues.apache.org/jira/browse/LUCENE-8708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813239#comment-16813239 ] Ignacio Vera commented on LUCENE-8708: -- Just an idea, maybe biased by my background. One of the issues here is that we visit the tree once for each range, and this is what we are trying to improve. Maybe adding a query that can accept more than one range with a logical relationship ('AND', 'OR', ...) might be less invasive and would encapsulate the logic. > Can we simplify conjunctions of range queries automatically? > > > Key: LUCENE-8708 > URL: https://issues.apache.org/jira/browse/LUCENE-8708 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Attachments: interval_range_clauses_merging0704.patch > > > BooleanQuery#rewrite already has some logic to make queries more efficient, > such as deduplicating filters or rewriting boolean queries that wrap a single > positive clause to that clause. > It would be nice to also simplify conjunctions of range queries, so that eg. > {{foo: [5 TO *] AND foo:[* TO 20]}} would be rewritten to {{foo:[5 TO 20]}}. > When constructing queries manually or via the classic query parser, it feels > unnecessary as this is something that the user can fix easily. However if you > want to implement a query parser that only allows specifying one bound at > once, such as Gmail ({{after:2018-12-31}} > https://support.google.com/mail/answer/7190?hl=en) or GitHub > ({{updated:>=2018-12-31}} > https://help.github.com/en/articles/searching-issues-and-pull-requests#search-by-when-an-issue-or-pull-request-was-created-or-last-updated) > then you might end up with inefficient queries if the end user specifies > both an upper and a lower bound. It would be nice if we optimized those > automatically. 
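The rewrite discussed above, where {{foo:[5 TO *] AND foo:[* TO 20]}} becomes {{foo:[5 TO 20]}}, boils down to intersecting intervals. Here is a minimal sketch on plain long values, independent of any Lucene query class; the names are hypothetical, bounds are inclusive, and an open bound is modelled with Long.MIN_VALUE or Long.MAX_VALUE.

```java
// Hypothetical sketch of merging a conjunction of ranges on one field.
final class RangeMergeSketch {

  /** Inclusive range on a single field. */
  static final class LongRange {
    final long min;
    final long max;

    LongRange(long min, long max) {
      this.min = min;
      this.max = max;
    }
  }

  /** Intersects all ranges; returns null when the conjunction matches nothing. */
  static LongRange intersect(LongRange... ranges) {
    long min = Long.MIN_VALUE;
    long max = Long.MAX_VALUE;
    for (LongRange r : ranges) {
      min = Math.max(min, r.min); // tightest lower bound wins
      max = Math.min(max, r.max); // tightest upper bound wins
    }
    return min <= max ? new LongRange(min, max) : null;
  }
}
```

A single merged range means the tree is visited once instead of once per clause. A query that accepts several ranges with an 'OR' relationship would need a different merge (a union of intervals), which this sketch does not cover.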
[jira] [Created] (LUCENE-8746) Make EdgeTree (aka ComponentTree) support different type of components
Ignacio Vera created LUCENE-8746: Summary: Make EdgeTree (aka ComponentTree) support different type of components Key: LUCENE-8746 URL: https://issues.apache.org/jira/browse/LUCENE-8746 Project: Lucene - Core Issue Type: Improvement Reporter: Ignacio Vera Currently the class {{EdgeTree}} is a bit confusing as it is in reality a tree of components. The inner class {{Edge}} is the one that builds a tree of edges, which is used by Polygon2D and Line2D to represent their structure. Here is what is proposed:
1) Create a new class called {{ComponentTree}}, which is in fact the current {{EdgeTree}}.
2) Modify {{EdgeTree}} to be in fact the inner class {{Edge}}.
3) Extract a {{Component}} interface so we can have different types of components in the same tree. This allows us to support heterogeneous trees of components.
4) Make {{Polygon2D}} and {{Line2D}} instances of the {{Component}} interface.
5) With this change, {{LatLonShapePolygonQuery}} and {{LatLonShapeLineQuery}} can be replaced with one {{LatLonShapeComponentQuery}}.
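To illustrate what a heterogeneous tree of components could look like, here is a small sketch. It is not the proposed patch and not Lucene's API: the {{Component}} interface below, the two stand-in implementations (a box in place of a polygon, a horizontal segment in place of a line) and the tree layout are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of a tree that mixes different component types.
final class ComponentTreeSketch {

  /** Hypothetical interface; the real one would also need relate/crosses logic. */
  interface Component {
    double minX();
    double maxX();
    boolean contains(double x, double y);
  }

  /** Axis-aligned box, standing in for a polygon component. */
  static final class Box implements Component {
    final double minX, maxX, minY, maxY;

    Box(double minX, double maxX, double minY, double maxY) {
      this.minX = minX; this.maxX = maxX; this.minY = minY; this.maxY = maxY;
    }

    public double minX() { return minX; }
    public double maxX() { return maxX; }
    public boolean contains(double x, double y) {
      return x >= minX && x <= maxX && y >= minY && y <= maxY;
    }
  }

  /** Horizontal segment, standing in for a line component. */
  static final class HorizontalSegment implements Component {
    final double minX, maxX, y;

    HorizontalSegment(double minX, double maxX, double y) {
      this.minX = minX; this.maxX = maxX; this.y = y;
    }

    public double minX() { return minX; }
    public double maxX() { return maxX; }
    public boolean contains(double x, double y) {
      return y == this.y && x >= minX && x <= maxX;
    }
  }

  /** Balanced binary tree over components sorted by minX. */
  static final class Node {
    final Component component;
    final Node left, right;
    final double subtreeMaxX; // largest maxX of this node and its descendants

    Node(Component[] sorted, int lo, int hi) { // lo and hi are inclusive
      int mid = (lo + hi) >>> 1;
      component = sorted[mid];
      left = lo < mid ? new Node(sorted, lo, mid - 1) : null;
      right = mid < hi ? new Node(sorted, mid + 1, hi) : null;
      double max = component.maxX();
      if (left != null) max = Math.max(max, left.subtreeMaxX);
      if (right != null) max = Math.max(max, right.subtreeMaxX);
      subtreeMaxX = max;
    }

    boolean contains(double x, double y) {
      if (x > subtreeMaxX) return false;          // nothing below reaches x
      if (left != null && left.contains(x, y)) return true;
      if (x < component.minX()) return false;     // this node and the right side start after x
      if (component.contains(x, y)) return true;
      return right != null && right.contains(x, y);
    }
  }

  static Node build(Component... components) {
    if (components.length == 0) {
      throw new IllegalArgumentException("at least one component is required");
    }
    Component[] sorted = components.clone();
    Arrays.sort(sorted, Comparator.comparingDouble(Component::minX));
    return new Node(sorted, 0, sorted.length - 1);
  }
}
```

The tree only depends on the interface, so polygons and lines can live in the same structure, which is what would make a single component query possible.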
[jira] [Created] (LUCENE-8744) TestTessellator#testLinesIntersect failure
Ignacio Vera created LUCENE-8744: Summary: TestTessellator#testLinesIntersect failure Key: LUCENE-8744 URL: https://issues.apache.org/jira/browse/LUCENE-8744 Project: Lucene - Core Issue Type: Test Reporter: Ignacio Vera Reproduce with: {code:java} ant test -Dtestcase=TestTessellator -Dtests.method=testLinesIntersect -Dtests.seed=D8AE5A1A4CA3A81D -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=ar-IQ -Dtests.timezone=Europe/Sarajevo -Dtests.asserts=true -Dtests.file.encoding=UTF-8{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-8736) LatLonShapePolygonQuery returning incorrect WITHIN results with shared boundaries
[ https://issues.apache.org/jira/browse/LUCENE-8736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802637#comment-16802637 ] Ignacio Vera edited comment on LUCENE-8736 at 3/27/19 10:43 AM: One thing that worries me about this approach is that it adds quite a lot of complexity and a performance penalty to the {{relate}} methods, even for INTERSECTS queries. I think the problem here is not so much the mathematical accuracy needed to know whether an edge terminates on another, but the encoding/decoding distortion of those edges. In your original example we take an edge of an indexed polygon, for example LINESTRING(1 0, 2 0); after encoding/decoding the edge becomes LINESTRING(0.999403953552 0,1.999646097422 0). On the other hand, the query polygon contains the edge LINESTRING(0 0, 7 0). Therefore the indexed and query polygons share the edge because the latitude value does not change during encoding/decoding, and the logic you propose handles the situation. I think this is an exception rather than the general case. If the indexed edge is LINESTRING(1 1, 2 1), then after encoding/decoding the edge becomes LINESTRING( 0.999403953552 0.999823048711,1.999646097422 0.999823048711). The query polygon contains the edge LINESTRING(1 0, 1 7), which is not a shared edge anymore because the latitude value has changed, and the new logic has no effect. I think this is the general case. I am more inclined to leave the {{relate}} logic simple and fast and solve these edge cases in a different way. One of the ideas I am thinking about is to use some adaptive decoding. In this case INTERSECTS and DISJOINT queries work as they do now. For WITHIN queries we decode the indexed triangles minimising the area they cover by rounding up min values and rounding down max values. This seems to work well when triangles have an area (not points or lines). 
was (Author: ivera): One thing it worries me about this approach is that it is adding quite a lot of complexity and a performance penalty on the {{relate}} methods, even for INTERSECTS queries. I think the problem here is not so much the mathematical accuracy to know if an edge terminates in another but encoding/decoding distortion of those edges. In your original example we take an edge of an indexed polygon, for example LINESTRING(1 0, 2 0), then after encoding and decoding the edge becomes LINESTRING(0.999403953552 0,1.999646097422 0). On the other hand the query polygon contains the edge LINESTRING(0 0, 7 0). Therefore indexed and query polygons shared the edge because the latitude value does not change one decoding and the logic you propose handles the situation. I think this is an exception more than the general case. For example If the indexed edge is LINESTRING(1 1, 2 1), then after encoding and decoding the edge becomes LINESTRING( 0.999403953552 0.999823048711,1.999646097422 0.999823048711). The query polygon contains the edge LINESTRING(1 0, 1 7) which is not a shared edge anymore because the latitude value has changed and the new logic has no effect. I think this is the general case. I am more inclined to leave the {{relate}} logic simple and fast and solve this edge cases in a different way. One of the ideas I am thinking about is to use some adaptive decoding. In this case INTERSECTS and DISJOINT queries work as they do now. For WITHIN queries we decode the indexed triangles minimising the area they cover by rounding up min values and rounding down max values. This seems to work well when triangles have an area (not points or lines). 
> LatLonShapePolygonQuery returning incorrect WITHIN results with shared > boundaries > - > > Key: LUCENE-8736 > URL: https://issues.apache.org/jira/browse/LUCENE-8736 > Project: Lucene - Core > Issue Type: Bug >Reporter: Nicholas Knize >Priority: Major > Attachments: LUCENE-8736.patch > > > Triangles that are {{WITHIN}} a target polygon query that also share a > boundary with the polygon are incorrectly reported as {{CROSSES}} instead of > {{INSIDE}}. This leads to incorrect {{WITHIN}} query results as demonstrated > in the following test: > {code:java} > public void testWithinFailure() throws Exception { > Directory dir = newDirectory(); > RandomIndexWriter w = new RandomIndexWriter(random(), dir); > // test polygons: > Polygon indexPoly1 = new Polygon(new double[] {4d, 4d, 3d, 3d, 4d}, new > double[] {3d, 4d, 4d, 3d, 3d}); > Polygon indexPoly2 = new Polygon(new double[] {2d, 2d, 1d, 1d, 2d}, new > double[] {6d, 7d, 7d, 6d, 6d}); > Polygon indexPoly3 = new Polygon(new double[] {1d, 1d, 0d, 0d, 1d}, new > double[] {3d, 4d, 4d, 3d, 3d}); >
[jira] [Commented] (LUCENE-8736) LatLonShapePolygonQuery returning incorrect WITHIN results with shared boundaries
[ https://issues.apache.org/jira/browse/LUCENE-8736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802637#comment-16802637 ] Ignacio Vera commented on LUCENE-8736: -- One thing that worries me about this approach is that it adds quite a lot of complexity and a performance penalty to the {{relate}} methods, even for INTERSECTS queries. I think the problem here is not so much the mathematical accuracy needed to know whether an edge terminates on another, but the encoding/decoding distortion of those edges. In your original example we take an edge of an indexed polygon, for example LINESTRING(1 0, 2 0); after encoding and decoding the edge becomes LINESTRING(0.999403953552 0,1.999646097422 0). On the other hand, the query polygon contains the edge LINESTRING(0 0, 7 0). Therefore the indexed and query polygons share the edge because the latitude value does not change on decoding, and the logic you propose handles the situation. I think this is an exception rather than the general case. For example, if the indexed edge is LINESTRING(1 1, 2 1), then after encoding and decoding the edge becomes LINESTRING( 0.999403953552 0.999823048711,1.999646097422 0.999823048711). The query polygon contains the edge LINESTRING(1 0, 1 7), which is not a shared edge anymore because the latitude value has changed, and the new logic has no effect. I think this is the general case. I am more inclined to leave the {{relate}} logic simple and fast and solve these edge cases in a different way. One of the ideas I am thinking about is to use some adaptive decoding. In this case INTERSECTS and DISJOINT queries work as they do now. For WITHIN queries we decode the indexed triangles minimising the area they cover by rounding up min values and rounding down max values. This seems to work well when triangles have an area (not points or lines). 
> LatLonShapePolygonQuery returning incorrect WITHIN results with shared > boundaries > - > > Key: LUCENE-8736 > URL: https://issues.apache.org/jira/browse/LUCENE-8736 > Project: Lucene - Core > Issue Type: Bug >Reporter: Nicholas Knize >Priority: Major > Attachments: LUCENE-8736.patch > > > Triangles that are {{WITHIN}} a target polygon query that also share a > boundary with the polygon are incorrectly reported as {{CROSSES}} instead of > {{INSIDE}}. This leads to incorrect {{WITHIN}} query results as demonstrated > in the following test: > {code:java} > public void testWithinFailure() throws Exception { > Directory dir = newDirectory(); > RandomIndexWriter w = new RandomIndexWriter(random(), dir); > // test polygons: > Polygon indexPoly1 = new Polygon(new double[] {4d, 4d, 3d, 3d, 4d}, new > double[] {3d, 4d, 4d, 3d, 3d}); > Polygon indexPoly2 = new Polygon(new double[] {2d, 2d, 1d, 1d, 2d}, new > double[] {6d, 7d, 7d, 6d, 6d}); > Polygon indexPoly3 = new Polygon(new double[] {1d, 1d, 0d, 0d, 1d}, new > double[] {3d, 4d, 4d, 3d, 3d}); > Polygon indexPoly4 = new Polygon(new double[] {2d, 2d, 1d, 1d, 2d}, new > double[] {0d, 1d, 1d, 0d, 0d}); > // index polygons: > Document doc; > addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly1); > w.addDocument(doc); > addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly2); > w.addDocument(doc); > addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly3); > w.addDocument(doc); > addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly4); > w.addDocument(doc); > / search // > IndexReader reader = w.getReader(); > w.close(); > IndexSearcher searcher = newSearcher(reader); > Polygon[] searchPoly = new Polygon[] {new Polygon(new double[] {4d, 4d, > 0d, 0d, 4d}, new double[] {0d, 7d, 7d, 0d, 0d})}; > Query q = LatLonShape.newPolygonQuery(FIELDNAME, QueryRelation.WITHIN, > searchPoly); > assertEquals(4, searcher.count(q)); > IOUtils.close(w, reader, dir); > } > {code} -- This message was sent by Atlassian 
JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
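The adaptive decoding idea from the comment above can be sketched on a single coordinate. This is only one possible reading of "rounding up min values and rounding down max values", not the content of any attached patch: the class name, the cell size and the floor-based encoding are assumptions made for the example, and the decoded values it produces are not meant to match the figures quoted in the comment.

```java
// Hypothetical sketch of floor-based quantization with two decoding directions.
final class AdaptiveDecodingSketch {

  /** Assumed width of one quantization cell, in degrees. */
  static final double CELL = 180.0 / (1L << 32);

  /** Floor-based encoding: every value in a cell maps to the cell's index. */
  static long encode(double value) {
    return (long) Math.floor(value / CELL);
  }

  /** Lower edge of the cell: never larger than the original value. */
  static double decodeDown(long encoded) {
    return encoded * CELL;
  }

  /** Upper edge of the cell: always larger than the original value. */
  static double decodeUp(long encoded) {
    return (encoded + 1) * CELL;
  }

  /** Shrunk extent for WITHIN: min is rounded up and max is rounded down. */
  static double[] decodeForWithin(long encodedMin, long encodedMax) {
    return new double[] {decodeUp(encodedMin), decodeDown(encodedMax)};
  }
}
```

Under these assumptions the shrunk extent always lies inside the original one, so it cannot poke outside a query polygon because of quantization alone. When min and max fall in the same cell, as for a point or an axis-aligned line, the shrunk extent is inverted, which is consistent with the remark that the idea works well only when triangles have an area.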
[jira] [Commented] (LUCENE-8736) LatLonShapePolygonQuery returning incorrect WITHIN results with shared boundaries
[ https://issues.apache.org/jira/browse/LUCENE-8736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801882#comment-16801882 ] Ignacio Vera commented on LUCENE-8736: -- +1 I will look into the logic in more detail. > LatLonShapePolygonQuery returning incorrect WITHIN results with shared > boundaries > - > > Key: LUCENE-8736 > URL: https://issues.apache.org/jira/browse/LUCENE-8736 > Project: Lucene - Core > Issue Type: Bug >Reporter: Nicholas Knize >Priority: Major > Attachments: LUCENE-8736.patch > > > Triangles that are {{WITHIN}} a target polygon query that also share a > boundary with the polygon are incorrectly reported as {{CROSSES}} instead of > {{INSIDE}}. This leads to incorrect {{WITHIN}} query results as demonstrated > in the following test: > {code:java} > public void testWithinFailure() throws Exception { > Directory dir = newDirectory(); > RandomIndexWriter w = new RandomIndexWriter(random(), dir); > // test polygons: > Polygon indexPoly1 = new Polygon(new double[] {4d, 4d, 3d, 3d, 4d}, new > double[] {3d, 4d, 4d, 3d, 3d}); > Polygon indexPoly2 = new Polygon(new double[] {2d, 2d, 1d, 1d, 2d}, new > double[] {6d, 7d, 7d, 6d, 6d}); > Polygon indexPoly3 = new Polygon(new double[] {1d, 1d, 0d, 0d, 1d}, new > double[] {3d, 4d, 4d, 3d, 3d}); > Polygon indexPoly4 = new Polygon(new double[] {2d, 2d, 1d, 1d, 2d}, new > double[] {0d, 1d, 1d, 0d, 0d}); > // index polygons: > Document doc; > addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly1); > w.addDocument(doc); > addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly2); > w.addDocument(doc); > addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly3); > w.addDocument(doc); > addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly4); > w.addDocument(doc); > / search // > IndexReader reader = w.getReader(); > w.close(); > IndexSearcher searcher = newSearcher(reader); > Polygon[] searchPoly = new Polygon[] {new Polygon(new double[] {4d, 4d, > 0d, 0d, 
4d}, new double[] {0d, 7d, 7d, 0d, 0d})}; > Query q = LatLonShape.newPolygonQuery(FIELDNAME, QueryRelation.WITHIN, > searchPoly); > assertEquals(4, searcher.count(q)); > IOUtils.close(w, reader, dir); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-8736) LatLonShapePolygonQuery returning incorrect WITHIN results with shared boundaries
[ https://issues.apache.org/jira/browse/LUCENE-8736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800456#comment-16800456 ] Ignacio Vera edited comment on LUCENE-8736 at 3/25/19 8:25 AM: --- Thanks [~nknize] for sharing the algorithm, it looks pretty powerful. I had a look at the patch and the first thing I notice is that the test {{testLUCENE8669}} is failing. It seems that the indexed polygons are never added to the index, so the fix is trivial (the call to {{w.addDocument(doc)}} has been removed in the patch):
{code:java}
-Field[] fields = LatLonShape.createIndexableFields("test", indexPoly1);
-for (Field f : fields) {
-  doc.add(f);
-}
-fields = LatLonShape.createIndexableFields("test", indexPoly2);
-for (Field f : fields) {
-  doc.add(f);
-}
-w.addDocument(doc);
+addPolygonsToDoc(FIELDNAME, doc, indexPoly1);
+addPolygonsToDoc(FIELDNAME, doc, indexPoly2);
 w.forceMerge(1);
{code}
Regarding the new approach, it seems there is still something missing: making the methods more precise does not help if we do not take any action regarding the distortion of polygons due to quantization. 
If we change the test by translating the polygons one degree north and one degree east, the change has no effect because of the encoding of the indexed polygons:

{code:java}
public void testWithinFailure() throws Exception {
  Directory dir = newDirectory();
  RandomIndexWriter w = new RandomIndexWriter(random(), dir);
  // test polygons:
  Polygon indexPoly1 = new Polygon(new double[] {4d + 1d, 4d + 1d, 3d + 1d, 3d + 1d, 4d + 1d}, new double[] {3d + 1d, 4d + 1d, 4d + 1d, 3d + 1d, 3d + 1d});
  Polygon indexPoly2 = new Polygon(new double[] {2d + 1d, 2d + 1d, 1d + 1d, 1d + 1d, 2d + 1d}, new double[] {6d + 1d, 7d + 1d, 7d + 1d, 6d + 1d, 6d + 1d});
  Polygon indexPoly3 = new Polygon(new double[] {1d + 1d, 1d + 1d, 0d + 1d, 0d + 1d, 1d + 1d}, new double[] {3d + 1d, 4d + 1d, 4d + 1d, 3d + 1d, 3d + 1d});
  Polygon indexPoly4 = new Polygon(new double[] {2d + 1d, 2d + 1d, 1d + 1d, 1d + 1d, 2d + 1d}, new double[] {0d + 1d, 1d + 1d, 1d + 1d, 0d + 1d, 0d + 1d});
  // index polygons:
  Document doc;
  addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly1);
  w.addDocument(doc);
  addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly2);
  w.addDocument(doc);
  addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly3);
  w.addDocument(doc);
  addPolygonsToDoc(FIELDNAME, doc = new Document(), indexPoly4);
  w.addDocument(doc);
  // search:
  IndexReader reader = w.getReader();
  w.close();
  IndexSearcher searcher = newSearcher(reader);
  Polygon[] searchPoly = new Polygon[] {new Polygon(new double[] {4d + 1d, 4d + 1d, 0d + 1d, 0d + 1d, 4d + 1d}, new double[] {0d + 1d, 7d + 1d, 7d + 1d, 0d + 1d, 0d + 1d})};
  Query q = LatLonShape.newPolygonQuery(FIELDNAME, QueryRelation.WITHIN, searchPoly);
  assertEquals(4, searcher.count(q));
  IOUtils.close(w, reader, dir);
}
{code}

I have tried to quantize the query polygon, but that seems to introduce other issues.

was (Author: ivera):
Thanks [~nknize] for sharing the algorithm, looks pretty powerful.
I had a look into the patch and the first thing I notice is that the test {{testLUCENE8669}} is failing. It seems that the indexed polygons are never added to the index so fix is trivial (Method {{w.addDocument(doc)}} has been removed in the patch): {code:java} -Field[] fields = LatLonShape.createIndexableFields("test", indexPoly1); -for (Field f : fields) { - doc.add(f); -} -fields = LatLonShape.createIndexableFields("test", indexPoly2); -for (Field f : fields) { - doc.add(f); -} -w.addDocument(doc); +addPolygonsToDoc(FIELDNAME, doc, indexPoly1); +addPolygonsToDoc(FIELDNAME, doc, indexPoly2); w.forceMerge(1);{code} Regarding the new approach, it seems there is still something missing. If we change the test by translating the polygons one degree north and 1 degree east, the change does not have effect due to the encoding of the indexed polygons: {code:java} public void testWithinFailure() throws Exception { Directory dir = newDirectory(); RandomIndexWriter w = new RandomIndexWriter(random(), dir); // test polygons: Polygon indexPoly1 = new Polygon(new double[] {4d + 1d, 4d + 1d, 3d + 1d, 3d + 1d, 4d + 1d}, new double[] {3d + 1d, 4d + 1d, 4d + 1d, 3d + 1d, 3d + 1d}); Polygon indexPoly2 = new Polygon(new double[] {2d + 1d, 2d + 1d, 1d + 1d, 1d + 1d, 2d + 1d}, new double[] {6d + 1d, 7d + 1d, 7d + 1d, 6d + 1d, 6d + 1d}); Polygon indexPoly3 = new Polygon(new double[] {1d + 1d, 1d + 1d, 0d + 1d, 0d + 1d, 1d + 1d}, new double[] {3d + 1d, 4d + 1d, 4d + 1d, 3d + 1d, 3d + 1d}); Polygon indexPoly4 = new Polygon(new double[] {2d + 1d, 2d + 1d, 1d + 1d, 1d + 1d, 2d + 1d