[
https://issues.apache.org/jira/browse/LUCENE-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578674#comment-17578674
]
Nick Knize commented on LUCENE-10654:
-------------------------------------
Nightly test failure on XY bounding box:
{code:java}
Reproduce with: gradlew :lucene:core:test --tests
"org.apache.lucene.document.TestShapeDocValues.testXYPolygonBBox"
-Ptests.jvms=4 -Ptests.haltonfailure=false
-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=ABDF070B81479950
-Ptests.multiplier=2 -Ptests.nightly=true -Ptests.badapples=false
-Ptests.gui=true -Ptests.file.encoding=ISO-8859-1
-Ptests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene/Lucene-NightlyTests-main/test-data/enwiki.random.lines.txt
{code}
{code:java}
1 tests failed.
FAILED: org.apache.lucene.document.TestShapeDocValues.testXYPolygonBBox
Error Message:
java.lang.AssertionError: expected:<-2.028229934961692E32> but
was:<-2.026382696309321E32>
{code}
The failure occurs because {{TestUtil.nextPolygon}} produced a polygon with an
extruded collinear self-intersecting vertex, and {{BaseXYShapeTestCase}} did
not reject it as invalid because the test case uses {{randomBoolean}} to decide
whether to validate. The simple fix is to switch the test case to always throw
an exception on invalid polygons so we never test with a non-compliant polygon.
The queries passed because the tessellator filters out the dirty vertex. This
test failed because the dirty vertex happened to be the minimum X value. So
this does expose an inconsistency: an invalid polygon can have a bounding box
that is inconsistent with the raw geometry. I think that's okay because we have
API guardrails to enable or disable strict validation, and I don't think that
should be removed.
I will open a PR to switch the base test cases over to strict geometry
validation instead of random validation.
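To illustrate the inconsistency, here is a minimal standalone sketch in plain Java. It does not use Lucene and is not the tessellator's actual filtering logic. The class {{SpikeBBoxSketch}} and its helpers ({{minX}}, {{isCollinearSpike}}, {{dropSpikes}}) are hypothetical names made up for this example, and the coordinates are illustrative, not taken from the failing seed. It only shows how a collinear spike vertex can set the minimum X of the raw vertex list while being absent from the cleaned geometry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpikeBBoxSketch {

  /** Minimum X over the given vertices (each vertex is {x, y}). */
  static double minX(List<double[]> ring) {
    double min = Double.POSITIVE_INFINITY;
    for (double[] v : ring) {
      min = Math.min(min, v[0]);
    }
    return min;
  }

  /**
   * True if a, b, c are collinear and the path reverses direction at b,
   * i.e. b sticks out as a zero-area spike. Exact comparison with 0 is
   * only safe here because the example uses small integer coordinates.
   */
  static boolean isCollinearSpike(double[] a, double[] b, double[] c) {
    double abx = b[0] - a[0], aby = b[1] - a[1];
    double bcx = c[0] - b[0], bcy = c[1] - b[1];
    double cross = abx * bcy - aby * bcx;
    double dot = abx * bcx + aby * bcy;
    return cross == 0.0 && dot < 0.0;
  }

  /** Single pass over an open ring (no repeated closing vertex), dropping spikes. */
  static List<double[]> dropSpikes(List<double[]> ring) {
    int n = ring.size();
    List<double[]> cleaned = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      double[] prev = ring.get((i + n - 1) % n);
      double[] cur = ring.get(i);
      double[] next = ring.get((i + 1) % n);
      if (!isCollinearSpike(prev, cur, next)) {
        cleaned.add(cur);
      }
    }
    return cleaned;
  }

  public static void main(String[] args) {
    // A 10x10 square whose top edge overshoots to x = -5 before returning to x = 0.
    List<double[]> ring = Arrays.asList(
        new double[] {0, 0},
        new double[] {10, 0},
        new double[] {10, 10},
        new double[] {-5, 10}, // the "dirty" spike vertex
        new double[] {0, 10});

    System.out.println("raw minX     = " + minX(ring));
    System.out.println("cleaned minX = " + minX(dropSpikes(ring)));
  }
}
```

In this sketch the bounding box computed from the raw vertices starts at x = -5, while the one computed from the cleaned ring starts at x = 0. That is the same kind of mismatch the test assertion tripped over, and with strict validation such a polygon would be rejected up front.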
> New companion doc value format for LatLonShape and XYShape field types
> ----------------------------------------------------------------------
>
> Key: LUCENE-10654
> URL: https://issues.apache.org/jira/browse/LUCENE-10654
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Nick Knize
> Priority: Major
> Fix For: 9.4
>
> Time Spent: 7h 20m
> Remaining Estimate: 0h
>
> {{XYDocValuesField}} provides doc value support for {{XYPoint}}.
> {{LatLonDocValuesField}} provides doc value support for {{LatLonPoint}}.
> However, neither {{LatLonShape}} nor {{XYShape}} currently has a doc value
> format.
> This lack of doc value support for shapes means facets, aggregations, and
> IndexOrDocValues queries are currently not possible for Shape field types.
> This gap needs to be closed in Lucene.
> To support IndexOrDocValues queries along with various geometry aggregations
> and facets, the ability to compute the spatial relation with the doc value is
> needed. This is straightforward with {{XYPoint}} and {{LatLonPoint}} since
> the doc value encoding is nothing more than a simple 2D integer encoding of
> the x,y and lat,lon dimensional components. Accomplishing the same with a
> naive integer encoded binary representation for N-vertex shapes would be
> costly.
> {{ComponentTree}} already provides an efficient in memory structure for
> quickly computing spatial relations over Shape types based on a binary tree
> of tessellated triangles provided by the {{Tessellator}}. Furthermore, this
> tessellation is already computed at index time. If we create an on-disk
> representation of {{ComponentTree}} 's binary tree of tessellated triangles
> and use this as the doc value {{binaryValue}} format we will be able to
> efficiently compute spatial relations with this binary representation and
> achieve the same facet/aggregation result over shapes as we can with points
> today (e.g., grid facets, centroid, area, etc).