Which Spark version are you using? SPARK-36444 [1] and SPARK-38138 [2] may be related. Please test with a patched version, or disable DPP (dynamic partition pruning) by setting spark.sql.optimizer.dynamicPartitionPruning.enabled=false, to see if it helps.
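
In case it is useful, here is a minimal sketch of how the two settings mentioned in this thread could be turned into spark-submit arguments. It assumes Python 3.8+ and uses only the standard library. The names PLANNING_DEBUG_CONF and to_submit_args are made up for this example, and "warn" is just one of the log levels suggested in the quoted message; pick whatever your logger actually prints.

```python
import shlex

# Settings taken from this thread. The values are what we want to try
# while debugging slow planning, not Spark's defaults.
PLANNING_DEBUG_CONF = {
    # Turn off dynamic partition pruning to check whether it slows planning.
    "spark.sql.optimizer.dynamicPartitionPruning.enabled": "false",
    # Log plan changes at a level the test logger is assumed to print.
    "spark.sql.planChangeLog.level": "warn",
}


def to_submit_args(conf):
    """Render a conf dict as a flat list of `--conf key=value` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Prints one shell-safe line that can be pasted after `spark-submit`.
    print(shlex.join(to_submit_args(PLANNING_DEBUG_CONF)))
```

In a unit test the same key/value pairs can instead be passed one by one to SparkSession.builder.config(key, value) before the session is created, so the test can be run once with DPP enabled and once without and the planning times compared.
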

[1] https://issues.apache.org/jira/browse/SPARK-36444
[2] https://issues.apache.org/jira/browse/SPARK-38138

Thanks,
Cheng Pan


On Nov 2, 2022 at 00:14:34, Enrico Minack <i...@enrico.minack.dev> wrote:

> Hi Tanin,
>
> running your test with option "spark.sql.planChangeLog.level" set to
> "info" or "warn" (depending on your Spark log level) will show you
> insights into the planning (which rules are applied, how long rules
> take, how many iterations are done).
>
> Hoping this helps,
> Enrico
>
>
> On 25.10.22 at 21:54, Tanin Na Nakorn wrote:
> > Hi All,
> >
> > Our data job is very complex (e.g. 100+ joins), and we have switched
> > from RDD to Dataset recently.
> >
> > We've found that the unit test takes much longer. We profiled it and
> > have found that it's the planning phase that is slow, not execution.
> >
> > I wonder if anyone has encountered this issue before and if there's a
> > way to make the planning phase faster (e.g. maybe disabling certain
> > optimizers).
> >
> > Any thoughts or input would be appreciated.
> >
> > Thank you,
> > Tanin
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org