Re: The Dataset unit test is much slower than the RDD unit test (in Scala)

2022-11-01 Thread Cheng Pan
Which Spark version are you using? SPARK-36444[1] and SPARK-38138[2] may be related. Please test with the patched version, or disable DPP (dynamic partition pruning) by setting spark.sql.optimizer.dynamicPartitionPruning.enabled=false, to see if that helps.

[1] https://issues.apache.org/jira/browse/SPARK-36444
[2]
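A minimal sketch of the second suggestion, assuming the unit test builds its own local SparkSession. The object name DppDisabledSession and its members are hypothetical. Because spark.sql.optimizer.dynamicPartitionPruning.enabled is a runtime SQL setting rather than a static one, calling spark.conf.set(...) on an already-running test session should work as well.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical test helper: builds a local session with DPP turned off.
object DppDisabledSession {
  val DppKey = "spark.sql.optimizer.dynamicPartitionPruning.enabled"

  def build(): SparkSession =
    SparkSession.builder()
      .master("local[2]")
      .appName("dpp-disabled-test")
      .config(DppKey, "false") // disable dynamic partition pruning
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = build()
    // Confirm the setting before running the slow test, then compare its
    // timing with a run where DPP is left at its default.
    println(spark.conf.get(DppKey))
    spark.stop()
  }
}
```

Comparing wall-clock time of the same test with and without this setting is a quick way to see whether DPP is the source of the planning cost.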

Re: The Dataset unit test is much slower than the RDD unit test (in Scala)

2022-11-01 Thread Enrico Minack
Hi Tanin,

Running your test with the option "spark.sql.planChangeLog.level" set to "info" or "warn" (depending on your Spark log level) will give you insight into query planning: which rules are applied, how long each rule takes, and how many iterations are run.

Hope this helps,
Enrico
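A minimal sketch of enabling this log in a test session, assuming a Spark version that supports the spark.sql.planChangeLog.level option. The object PlanChangeLogDemo and the toy self-join in joinedFrame are hypothetical stand-ins for the real job. "warn" is used so the messages appear even if the test's logging hides INFO.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PlanChangeLogDemo {
  def session(): SparkSession =
    SparkSession.builder()
      .master("local[2]")
      .appName("plan-change-log-demo")
      // Log plan changes made by optimizer/planner rules at WARN level.
      .config("spark.sql.planChangeLog.level", "warn")
      .getOrCreate()

  // Hypothetical stand-in for the real job: three chained joins on toy data.
  def joinedFrame(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val base = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "v")
    (1 to 3).foldLeft(base) { (acc, i) =>
      acc.join(base.withColumnRenamed("v", s"v$i"), "id")
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = session()
    val joined = joinedFrame(spark)
    // Forcing the executed plan runs optimization and physical planning,
    // which is when the plan-change log entries are emitted.
    joined.queryExecution.executedPlan
    spark.stop()
  }
}
```

In a real test the same config line can simply be added to whatever builder the suite already uses, then the log inspected for the rules that dominate.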

The Dataset unit test is much slower than the RDD unit test (in Scala)

2022-10-25 Thread Tanin Na Nakorn
Hi all,

Our data job is very complex (for example, it has 100+ joins), and we recently switched from RDD to Dataset. Since then, the unit test takes much longer. We profiled it and found that the planning phase is slow, not execution. Has anyone else encountered this issue?
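One way to reproduce the planning-versus-execution split in a test is to time the phases separately. This is a minimal sketch, not the original poster's profiling method; PlanningVsExecution, timed and manyJoins are hypothetical names, and the chained joins are toy data standing in for the real job. Note that Dataset construction already runs analysis eagerly, so it is timed on its own. With adaptive query execution enabled, some re-planning can still happen during collect(), so the split is approximate.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PlanningVsExecution {
  // Runs `body` and returns its result together with elapsed wall-clock ms.
  def timed[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  // Hypothetical stand-in for a job with many joins: chains `n` joins.
  def manyJoins(spark: SparkSession, n: Int): DataFrame = {
    import spark.implicits._
    val base = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "v")
    (1 to n).foldLeft(base) { (acc, i) =>
      acc.join(base.withColumnRenamed("v", s"v$i"), "id")
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]").appName("planning-vs-execution").getOrCreate()
    // Building the Dataset includes eager analysis of each join.
    val (df, buildMs) = timed(manyJoins(spark, 20))
    // Accessing executedPlan forces optimization and physical planning.
    val (_, planMs) = timed(df.queryExecution.executedPlan)
    // collect() reuses the already-planned query, so this is mostly execution.
    val (rows, execMs) = timed(df.collect())
    println(s"build+analysis: $buildMs ms, planning: $planMs ms, " +
      s"execution: $execMs ms, rows: ${rows.length}")
    spark.stop()
  }
}
```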