[
https://issues.apache.org/jira/browse/SEDONA-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617661#comment-17617661
]
Martin Andersson commented on SEDONA-178:
-----------------------------------------
I think the minimum change would be to update the documentation regarding
distance joins and make Circle throw an exception for geometries other than
points.
I see the following options:
1. Not support distance joins for geometries other than points, and instruct
users to work around that with st_buffer.
2. Replace Circle with BufferedGeometry. Somewhat hacky: BufferedGeometry is
not a valid geometry recognized by standards, and only some operations work.
3. Make DistanceJoinExec.scala rewrite 'st_distance(a, b) <= x' to
'st_intersects(st_buffer(a, x), b)'. This would give wrong results in edge
cases, since the buffer is approximated with a polygon. That could be fixed by
adding a margin to st_buffer and then filtering the results with the actual
predicate:
'st_intersects(st_buffer(a, x + margin), b) and st_distance(a, b) <= x'
4. Other...
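A minimal sketch of the filter-then-refine idea behind option 3, assuming
plain java.awt.geom types as stand-ins for JTS geometries and an expanded
bounding box as a stand-in for the polygon produced by st_buffer. The class
and method names are hypothetical and are not part of Sedona. The point is
only that a generous coarse test followed by the exact distance predicate
cannot return false matches:

```java
import java.awt.geom.Line2D;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;

// Hypothetical illustration only; not Sedona code.
class FilterRefineSketch {

    // Coarse filter: is b inside the bounding box of a, grown by (x + margin)?
    // Stand-in for st_intersects(st_buffer(a, x + margin), b).
    static boolean coarseMatch(Line2D a, Point2D b, double x, double margin) {
        Rectangle2D box = a.getBounds2D();
        double d = x + margin;
        return b.getX() >= box.getMinX() - d && b.getX() <= box.getMaxX() + d
            && b.getY() >= box.getMinY() - d && b.getY() <= box.getMaxY() + d;
    }

    // Refinement: the exact predicate, st_distance(a, b) <= x.
    static boolean exactMatch(Line2D a, Point2D b, double x) {
        return a.ptSegDist(b) <= x;
    }

    // Candidates must pass the coarse filter and then the exact predicate.
    static boolean matches(Line2D a, Point2D b, double x, double margin) {
        return coarseMatch(a, b, x, margin) && exactMatch(a, b, x);
    }

    public static void main(String[] args) {
        // Segment and point from case 1 of the issue: actual distance is 1.
        Line2D vertical = new Line2D.Double(1, 1, 1, 4);
        System.out.println(matches(vertical, new Point2D.Double(1, 5), 1.4, 0.01));

        // A point inside the grown box but far from the segment itself.
        Line2D diagonal = new Line2D.Double(0, 0, 4, 4);
        System.out.println(matches(diagonal, new Point2D.Double(4, 0), 0.1, 0.01));
    }
}
```

The first call matches because the point passes both tests. The second passes
the coarse filter but is rejected by the exact predicate, which is the kind of
false positive the refinement step removes.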
I'm just thinking out loud here about buffering for partitioning and then
executing the actual predicate.
Adding an optional buffer parameter to the spatial partitioner wouldn't work
on its own, because deduplication needs the extent of the buffered geometry as
well. See JudgementHelper.java:34
Another option would be to add a wrapper to geometries. PostGIS has taken this
route, and I think H2 has as well. The wrapper would be passed around to the
partitioner and join classes instead of the geometry. This would also make it
easier to add support for prepared geometries. You can read more about prepared
geometries here:
https://blog.cleverelephant.ca/2008/10/postgis-performance-prepared-geometry.html
wrapper:
class SedonaGeometry {
    // Used for partitioning and deduplication. Extended for distance joins;
    // otherwise it is the envelope of the geometry.
    Envelope envelope;
    // Pure JTS geometry.
    Geometry geometry;
}
The join predicate could be an actual predicate (like
java.util.function.BiPredicate<Geometry, Geometry>) instead of an enum.
This is essentially how PostGIS uses prepared geometries. I think the prepared
geometries are only computed if the cost estimate says that would be faster.
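To make the wrapper and predicate idea concrete, here is a rough sketch. It
assumes java.awt.geom types in place of JTS Envelope and Geometry so that it
runs on the JDK alone, and every name in it (Wrapped, forDistanceJoin, join)
is hypothetical rather than an existing Sedona API. The envelope is extended
by the join distance for partitioning and deduplication, while the predicate
still sees the untouched geometry:

```java
import java.awt.geom.Line2D;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.function.BiPredicate;

// Hypothetical illustration only; not Sedona code.
class WrappedGeometrySketch {

    // Envelope used for partitioning/deduplication plus the original geometry.
    static final class Wrapped<G> {
        final Rectangle2D envelope;
        final G geometry;

        Wrapped(Rectangle2D envelope, G geometry) {
            this.envelope = envelope;
            this.geometry = geometry;
        }
    }

    // Distance joins extend the envelope by the join distance on every side.
    static Wrapped<Line2D> forDistanceJoin(Line2D line, double distance) {
        Rectangle2D r = line.getBounds2D();
        Rectangle2D grown = new Rectangle2D.Double(
                r.getX() - distance, r.getY() - distance,
                r.getWidth() + 2 * distance, r.getHeight() + 2 * distance);
        return new Wrapped<>(grown, line);
    }

    // For other joins the envelope is just the envelope of the geometry.
    static Wrapped<Point2D> of(Point2D p) {
        return new Wrapped<>(new Rectangle2D.Double(p.getX(), p.getY(), 0, 0), p);
    }

    // Inclusive overlap test, so zero-area envelopes (points) still count.
    static boolean envelopesOverlap(Rectangle2D a, Rectangle2D b) {
        return a.getMinX() <= b.getMaxX() && b.getMinX() <= a.getMaxX()
            && a.getMinY() <= b.getMaxY() && b.getMinY() <= a.getMaxY();
    }

    // Envelopes select the candidates; the predicate decides on real geometries.
    static <A, B> boolean join(Wrapped<A> left, Wrapped<B> right,
                               BiPredicate<A, B> predicate) {
        return envelopesOverlap(left.envelope, right.envelope)
            && predicate.test(left.geometry, right.geometry);
    }

    public static void main(String[] args) {
        double distance = 1.4;
        BiPredicate<Line2D, Point2D> within = (l, p) -> l.ptSegDist(p) < distance;
        Wrapped<Line2D> a = forDistanceJoin(new Line2D.Double(1, 1, 1, 4), distance);
        System.out.println(join(a, of(new Point2D.Double(1, 5)), within));
        System.out.println(join(a, of(new Point2D.Double(1, 6)), within));
    }
}
```

With the segment and point from case 1 of the issue, the first join matches.
The second point lies outside the extended envelope, so it is dropped before
the predicate is evaluated.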
wrapper:
class PostgisGeometry {
    Geometry geom;
    PreparedGeometry preparedGeom; // may be null
}
intersects-predicate:
boolean test(PostgisGeometry g1, PostgisGeometry g2) {
    if (g1.preparedGeom != null) {
        return g1.preparedGeom.intersects(g2.geom);
    } else {
        return g1.geom.intersects(g2.geom);
    }
}
A while ago I did a hacky test in Sedona with prepared geometries. They can be
used in both indexed and non-indexed joins. For large, complex polygons the
join was 5 times faster.
> Correctness issue in distance join queries
> ------------------------------------------
>
> Key: SEDONA-178
> URL: https://issues.apache.org/jira/browse/SEDONA-178
> Project: Apache Sedona
> Issue Type: Bug
> Reporter: Martin Andersson
> Priority: Major
>
> Hi,
> We are seeing erroneous results for some distance join queries.
>
> Case 1:
> The following query gives no output even though the distance between the
> geometries is 1.
> {{select *}}
> {{from (select ST_LineFromText('Linestring(1 1, 1 4)') as geom) a}}
> {{join (select st_point(1.0,5.0) as geom) b}}
> {{on st_distance(a.geom, b.geom) < 1.4}}
> I think the issue boils down to a misuse of/error in Circle class.
> DistanceJoinExec.scala:60 will create a Circle from the linestring and a
> radius of 1.4.
> Circle will compute the center point (1 2.5) and radius (1.5) for the
> linestring. The actual radius used is max(radius(linestring), 1.4). See
> Circle.java:80
> For the query to work the Circle needs a radius 1.4 _larger_ than the
> linestring. Like this:
> circle = new Circle(geom, 0.0);
> circle.setRadius(circle.getRadius() + 1.4)
>
> Case 2:
> The following query matches the geometries even though the distance is not
> less than 0.1. Actual distance is 1.
> {{select *}}
> {{from (select ST_LineFromText('Linestring(1 1, 1 3, 3 3)') as geom) a}}
> {{join (select st_point(2.0,2.0) as geom) b}}
> {{on st_distance(a.geom, b.geom) < 0.1}}
> Pseudo code for the join condition:
> new Circle(a.geom, 0.1).covers(b.geom)
> The circle does cover the point, but the linestring is further away from the
> point than 0.1.
>