Kristin Cowalcijk created SEDONA-560:
----------------------------------------
Summary: Spatial join involving a DataFrame with 0 partitions throws an exception
Key: SEDONA-560
URL: https://issues.apache.org/jira/browse/SEDONA-560
Project: Apache Sedona
Issue Type: Bug
Affects Versions: 1.6.0
Reporter: Kristin Cowalcijk
Sedona does not properly handle DataFrames containing 0 partitions when
performing a spatial join. For example:
{code:python}
from pyspark.sql.functions import broadcast, expr
from pyspark.sql.types import IntegerType, StructField, StructType

# Assumes `spark` is an active SparkSession with Sedona registered.
schema = StructType([
    StructField("id", IntegerType(), True)
])
# Create an empty RDD (it has 0 partitions)
empty_rdd = spark.sparkContext.emptyRDD()
# Create an empty DataFrame from it
empty_df = spark.createDataFrame(empty_rdd, schema)
df_point = spark.range(0, 10).toDF("id").withColumn("geom", expr("ST_Point(id, id)"))
df_poly = empty_df.withColumn("poly", expr("ST_Buffer(ST_Point(id, id), 2)")).drop("geom")
spark.conf.set("sedona.join.autoBroadcastJoinThreshold", "-1")
df_point.join(broadcast(df_poly), expr("ST_Intersects(poly, geom)")).count()
{code}
This fails with the following error:
{code:java}
Py4JJavaError: An error occurred while calling o107.showString.
: java.util.NoSuchElementException: next on empty iterator
at scala.collection.Iterator$$anon$2.next(Iterator.scala:41)
at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
at scala.collection.IterableLike.head(IterableLike.scala:109)
at scala.collection.IterableLike.head$(IterableLike.scala:108)
at scala.collection.AbstractIterable.head(Iterable.scala:56)
at org.apache.spark.sql.sedona_sql.strategy.join.SpatialIndexExec.doExecuteBroadcast(SpatialIndexExec.scala:63)
{code}
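The trace shows a `head` call on an empty collection inside `SpatialIndexExec.doExecuteBroadcast`. The plain-Python analogue below sketches that failure pattern next to a defensive alternative. It is only an illustration: `first_or_none` and the two lists are hypothetical stand-ins, not Sedona code.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def first_or_none(items: Iterable[T]) -> Optional[T]:
    """Return the first element, or None when the iterable is empty."""
    return next(iter(items), None)


# Hypothetical stand-ins for the per-partition results of a DataFrame.
no_partitions: list = []
one_partition = ["index-for-partition-0"]

# Unguarded access fails on empty input, much like head on an empty iterator.
try:
    next(iter(no_partitions))
    unguarded_failed = False
except StopIteration:
    unguarded_failed = True

print(unguarded_failed)              # True
print(first_or_none(no_partitions))  # None
print(first_or_none(one_partition))  # index-for-partition-0
```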
This is not limited to broadcast joins; range joins fail as well:
{code:python}
df_point.join(df_poly, expr("ST_Intersects(poly, geom)")).count()
24/05/27 10:50:13.989 WARN RangeJoinExec: [SedonaSQL] Join dominant side partition number 8 is larger than 1/2 of the dominant side count 10
24/05/27 10:50:13.989 WARN RangeJoinExec: [SedonaSQL] Try to use follower side partition number 0
Number of partitions must be >= 0
{code}
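Until this is fixed, one possible caller-side workaround is to skip the join when either input has no partitions, since an inner join against an empty side yields no rows anyway. The sketch below assumes an inner join and has not been verified against Sedona; `has_zero_partitions` and `guarded_join_count` are hypothetical helper names, and `condition` is whatever expression the caller would otherwise pass to `DataFrame.join`.

```python
def has_zero_partitions(df) -> bool:
    """True when the DataFrame's underlying RDD has no partitions."""
    return df.rdd.getNumPartitions() == 0


def guarded_join_count(left, right, condition) -> int:
    """Count the rows of an inner join, skipping the join on a 0-partition input.

    An inner join with an empty side yields no rows, so 0 is returned directly.
    """
    if has_zero_partitions(left) or has_zero_partitions(right):
        return 0
    return left.join(right, condition).count()
```

With the DataFrames from the example above, `guarded_join_count(df_point, df_poly, expr("ST_Intersects(poly, geom)"))` should return 0 without reaching the spatial join.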