Hi, I've upgraded from Sedona 1.3.1-incubating to 1.4.0 but am still seeing a significant slowdown in spatial joins as the job progresses.
For instance, with a spatial partition count of 4102, the first ~3000 tasks each complete in under 3 seconds, but later tasks get progressively slower for the same amount of shuffle read records. The final 100 tasks take over 1 hour each to complete.

I've tried disabling the global index, which seemed to reduce failures, but I still see the performance degradation:

// Sedona indexing properties
sparkSession.conf.set("sedona.join.gridtype", "kdbtree")
sparkSession.conf.set("sedona.global.index", "false")
sparkSession.conf.set("sedona.join.indexbuildside", "left")

if (appConf.getNrInputPartitions > 0) {
  sparkSession.conf.set("spark.sql.shuffle.partitions", appConf.getNrInputPartitions.toString)
  sparkSession.conf.set("sedona.join.numpartition", appConf.getNrInputPartitions.toString)
  sparkSession.conf.set("spark.default.parallelism", appConf.getNrInputPartitions.toString)
  LOG.info(s"Set spark.default.parallelism, spark.sql.shuffle.partitions to: ${appConf.getNrInputPartitions}")
}

I am using a RangeJoin with the DataFrame API:

st_intersects(geom, ST_Collect(ST_Point(start_lon, start_lat), ST_Point(end_lon, end_lat)))

Is this a bug, or are there steps or settings I could use to get more stable performance?

Thanks
Trang
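P.S. For reference, a simplified sketch of how the range join is issued. The table names (`polygons`, `trips`) and id columns are placeholders; the predicate is the one from my job:

```sql
-- Hypothetical shape of the join; Sedona plans the ON predicate as a RangeJoin.
-- ST_Collect of two points builds a MultiPoint from the trip's start/end coordinates.
SELECT a.id, b.trip_id
FROM polygons a
JOIN trips b
  ON ST_Intersects(a.geom,
                   ST_Collect(ST_Point(b.start_lon, b.start_lat),
                              ST_Point(b.end_lon, b.end_lat)))
```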