Hi Pietro,
As you can see from our conversation, for the time being you can disable
Spark Adaptive Query Execution (AQE) by setting
"spark.sql.adaptive.enabled=false". I believe this will avoid the issue.
Adam and I will dig into this issue and fix the bug.
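For reference, here is a minimal sketch of one way to pass the setting above, using the `--conf key=value` option of spark-submit. The helper name `conf_flags` and the script name `my_job.py` are hypothetical and only for illustration; the only thing taken from this thread is the property `spark.sql.adaptive.enabled`.

```python
# Sketch only: build spark-submit arguments that turn off AQE.
# `conf_flags` and `my_job.py` are hypothetical names used for illustration.

def conf_flags(settings):
    """Render a dict of Spark properties as repeated `--conf key=value` args."""
    flags = []
    for key, value in settings.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


# The workaround from this thread: disable Adaptive Query Execution.
workaround = {"spark.sql.adaptive.enabled": "false"}

command = ["spark-submit", *conf_flags(workaround), "my_job.py"]
print(" ".join(command))
```

If you build the session in code instead, the same property can be set with `.config("spark.sql.adaptive.enabled", "false")` on `SparkSession.builder`.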
Thanks,
Jia
On Thu, Aug 5, 2021 at 3:10 PM Adam
I don't think that's the issue. The join detection is the same for both
broadcast and non-broadcast, so the same match statement needs to run
either way. I created an issue for what I found from the stack trace (don't
have a copy of the stack trace to share easily):
Hi Adam,
I believe the issue is caused by this chunk of code:
https://github.com/apache/incubator-sedona/blob/master/sql/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/JoinQueryDetector.scala#L84-L109
If we move the broadcast join detection to the first part of the detector
and set
Okay I actually did encounter it today. It happens when you have AQE
enabled. Looked into it a little bit and might have to rework the
SpatialIndexExec node to extend BroadcastExchangeLike or maybe even
directly BroadcastExchangeExec, but that might only be compatible with
Spark 3+, so not sure
I haven't encountered any issues with it but I can investigate with the
full stacktrace. Also which version of Spark is this with?
Adam
On Tue, Aug 3, 2021 at 4:25 AM Jia Yu wrote:
> Hi Pietro,
>
> Can you please share the full stacktrace of this scala.MatchError? I tried
> a couple test cases
Hi Pietro,
Can you please share the full stacktrace of this scala.MatchError? I tried
a couple test cases but wasn't able to reproduce this error on my end. In
fact, another user complained about the same issue a while back. I suspect
there is a bug for this part.
I also CCed the contributor of
Hello Jia,
thank you so much for your support.
We have been able to complete our task and to perform a few runs with
different numbers of partitions.
So far, we have obtained the best performance when running on 20 nodes with
the number of partitions set to 2000. With this configuration,
Hi Pietro,
A few tips to optimize your join:
1. Mix the DataFrame (DF) and RDD APIs, and use the RDD API for the join
part. See the example here:
https://github.com/apache/incubator-sedona/blob/master/binder/ApacheSedonaSQL_SpatialJoin_AirportsPerCountry.ipynb
2. When use
To whom it may concern,
we would like to report the following Sedona behaviour and ask your
opinion on how we can optimize it.
Our aim is to perform an inner spatial join between points_df and
polygons_df, keeping the pairs where a point in points_df is contained in a
polygon from polygons_df.
Below you can