Kontinuation commented on code in PR #1208:
URL: https://github.com/apache/sedona/pull/1208#discussion_r1467198376
##########
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/JoinQueryDetector.scala:
##########
@@ -140,6 +134,18 @@ class JoinQueryDetector(sparkSession: SparkSession)
extends Strategy {
case Some(And(extraCondition, predicate: RS_Predicate)) =>
getRasterJoinDetection(left, right, predicate, Some(extraCondition))
// For distance joins we execute the actual predicate (condition) and not only extraConditions.
+ case Some(ST_DWithin(Seq(leftShape, rightShape, distance))) =>
+ Some(JoinQueryDetection(left, right, leftShape, rightShape, SpatialPredicate.INTERSECTS, isGeography = false, condition, Some(distance)))
+ case Some(And(ST_DWithin(Seq(leftShape, rightShape, distance)), _)) =>
+ Some(JoinQueryDetection(left, right, leftShape, rightShape, SpatialPredicate.INTERSECTS, isGeography = false, condition, Some(distance)))
+ case Some(And(_, ST_DWithin(Seq(leftShape, rightShape, distance)))) =>
+ Some(JoinQueryDetection(left, right, leftShape, rightShape, SpatialPredicate.INTERSECTS, isGeography = false, condition, Some(distance)))
+ case Some(ST_DWithin(Seq(leftShape, rightShape, distance, useSpheroid))) =>
+ Some(JoinQueryDetection(left, right, leftShape, rightShape, SpatialPredicate.INTERSECTS, isGeography = useSpheroid.eval().asInstanceOf[Boolean], condition, Some(distance)))
+ case Some(And(ST_DWithin(Seq(leftShape, rightShape, distance, useSpheroid)), _)) =>
+ Some(JoinQueryDetection(left, right, leftShape, rightShape, SpatialPredicate.INTERSECTS, isGeography = useSpheroid.eval().asInstanceOf[Boolean], condition, Some(distance)))
+ case Some(And(_, ST_DWithin(Seq(leftShape, rightShape, distance, useSpheroid)))) =>
+ Some(JoinQueryDetection(left, right, leftShape, rightShape, SpatialPredicate.INTERSECTS, isGeography = useSpheroid.eval().asInstanceOf[Boolean], condition, Some(distance)))
Review Comment:
`useSpheroid.eval()` will raise an exception when `useSpheroid` cannot be constant-folded to a literal, for instance when it references a column. Here is an example:
```python
df_point = spark.range(10).withColumn("pt", expr("ST_Point(id, id)"))
df_polygon = spark.range(10).withColumn("poly", expr("ST_Point(id, id + 0.01)"))
df_point.alias("a").join(df_polygon.alias("b"), expr("ST_DWithin(pt, poly, 10000, a.`id` % 2 = 0)")).show()
```
This query fails with the following message:
```
Py4JJavaError: An error occurred while calling o400.showString.
: org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot evaluate expression: id#1026L
    at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
    at org.apache.spark.SparkException$.internalError(SparkException.scala:96)
    at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotEvaluateExpressionError(QueryExecutionErrors.scala:65)
    at org.apache.spark.sql.catalyst.expressions.Unevaluable.eval(Expression.scala:385)
    at org.apache.spark.sql.catalyst.expressions.Unevaluable.eval$(Expression.scala:384)
    at org.apache.spark.sql.catalyst.expressions.AttributeReference.eval(namedExpressions.scala:260)
    at org.apache.spark.sql.catalyst.expressions.DivModLike.eval(arithmetic.scala:670)
    at org.apache.spark.sql.catalyst.expressions.DivModLike.eval$(arithmetic.scala:664)
    at org.apache.spark.sql.catalyst.expressions.Remainder.eval(arithmetic.scala:930)
    at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:664)
    at org.apache.spark.sql.sedona_sql.strategy.join.JoinQueryDetector.apply(JoinQueryDetector.scala:144)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63)
We can fall back to a cartesian join in this case. Such queries may not be common, but we should not prevent them from running.
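Since `useSpheroid` is a Catalyst `Expression`, one way to implement this fallback is to check `Expression.foldable` before calling `eval()`. The sketch below models the idea with simplified stand-in types (`Expr`, `Lit`, `ColumnRef` are hypothetical, not Spark's real classes): when the argument is not foldable, return `None` so the planner can decline the optimized distance join and let Spark fall back to a cartesian join.

```scala
// Simplified stand-ins for Spark's Expression API (illustration only).
sealed trait Expr {
  def foldable: Boolean // true when the expression can be evaluated without input rows
  def eval(): Any
}
case class Lit(value: Any) extends Expr {
  val foldable = true
  def eval(): Any = value
}
case class ColumnRef(name: String) extends Expr {
  val foldable = false // depends on row data, cannot be evaluated at planning time
  def eval(): Any =
    throw new UnsupportedOperationException(s"Cannot evaluate expression: $name")
}

// Only evaluate useSpheroid when it folds to a constant; otherwise signal
// the caller to fall back instead of throwing at planning time.
def resolveUseSpheroid(useSpheroid: Expr): Option[Boolean] =
  if (useSpheroid.foldable) Some(useSpheroid.eval().asInstanceOf[Boolean])
  else None
```

With this guard, `resolveUseSpheroid(Lit(true))` yields `Some(true)`, while a non-constant argument such as `ColumnRef("a.id % 2 = 0")` yields `None` instead of raising the internal error shown above.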
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]