Which Spark version are you using?
SPARK-36444[1] and SPARK-38138[2] may be related. Please test with a
patched version, or disable dynamic partition pruning (DPP) by setting
spark.sql.optimizer.dynamicPartitionPruning.enabled=false, to see if it
helps.
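In case it is useful, here is a minimal sketch of how the DPP setting could be applied to the SparkSession used by a unit test. The object name, app name and local master are placeholders I made up for illustration; only the configuration key comes from the text above.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical test harness; the names here are illustrative only.
object DisableDppSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")            // assumption: the test runs in local mode
      .appName("dpp-disabled-test")  // placeholder name
      // Turn off dynamic partition pruning for this session.
      .config("spark.sql.optimizer.dynamicPartitionPruning.enabled", "false")
      .getOrCreate()

    // Print the effective value to confirm the setting was picked up.
    println(spark.conf.get("spark.sql.optimizer.dynamicPartitionPruning.enabled"))

    spark.stop()
  }
}
```

If the test becomes noticeably faster with this setting, that would point towards the DPP-related tickets.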
[1] https://issues.apache.org/jira/browse/SPARK-36444
[2]
Hi Tanin,
Running your test with the option "spark.sql.planChangeLog.level" set to
"info" or "warn" (depending on your Spark log level) will give you
insight into the planning (which rules are applied, how long rules
take, how many iterations are done).
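A rough sketch of how this could look in a test, assuming a Spark version that supports this option. The two small DataFrames and the object name are made up for illustration and stand in for your real job.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example; the data and names are placeholders.
object PlanChangeLogSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("plan-change-log")
      .getOrCreate()
    import spark.implicits._

    // Log plan changes at WARN so they show up even with a WARN root logger.
    spark.conf.set("spark.sql.planChangeLog.level", "warn")

    val left = Seq((1, "a"), (2, "b")).toDF("id", "l")
    val right = Seq((1, "x"), (2, "y")).toDF("id", "r")
    val joined = left.join(right, "id")

    // explain() forces analysis, optimization and physical planning,
    // so the plan change log is written without running the job.
    joined.explain()

    spark.stop()
  }
}
```

The log output can get very large for a plan with many joins, so redirecting it to a file is probably a good idea.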
Hoping this helps,
Enrico
On 25.10.22
Hi All,
Our data job is very complex (e.g. 100+ joins), and we recently switched
from RDD to Dataset.
Since then, the unit test takes much longer. We profiled it and found
that the planning phase is slow, not the execution.
I wonder if anyone has encountered this issue
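For reference, one way to separate planning time from execution time is sketched below. This is not the actual job: the chain of ten small joins, the `timed` helper and all names are hypothetical stand-ins. With adaptive query execution enabled, part of the planning happens at run time, so the split is only approximate.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical measurement harness; not the real job.
object PlanningVsExecutionSketch {
  // Runs a block and returns its result with the elapsed wall-clock time in ms.
  def timed[T](block: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = block
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("planning-vs-execution")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for a complex job: a chain of joins over tiny DataFrames.
    val base = Seq((1, "a"), (2, "b")).toDF("id", "v0")
    val joined = (1 to 10).foldLeft(base) { (acc, i) =>
      acc.join(Seq((1, s"a$i"), (2, s"b$i")).toDF("id", s"v$i"), "id")
    }

    // Accessing executedPlan forces analysis, optimization and physical planning.
    val (_, planningMs) = timed(joined.queryExecution.executedPlan)
    // collect() reuses the already computed plan, so this is mostly execution.
    val (rows, executionMs) = timed(joined.collect())

    println(s"planning: $planningMs ms, execution: $executionMs ms, rows: ${rows.length}")

    spark.stop()
  }
}
```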