Re: The Dataset unit test is much slower than the RDD unit test (in Scala)

Cheng Pan Tue, 01 Nov 2022 09:24:26 -0700

Which Spark version are you using?

SPARK-36444[1] and SPARK-38138[2] may be related. Please test with a
patched version, or disable dynamic partition pruning (DPP) by setting
spark.sql.optimizer.dynamicPartitionPruning.enabled=false, to see if it
helps.
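
For reference, here is a minimal sketch of how the DPP setting could be
applied to a test session, together with the plan change log option
suggested in the quoted reply below. The object name, app name and
local[2] master are only illustrative assumptions, not taken from your
job; the two config keys are the ones named in this thread, and whether
they are available depends on your Spark version.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper for unit tests; the names here are illustrative only.
object PlanningDebugSession {
  def build(): SparkSession =
    SparkSession.builder()
      .master("local[2]")          // assumption: tests run in local mode
      .appName("planning-debug")   // assumption: any app name works
      // Disable dynamic partition pruning to check whether it is the
      // source of the slow planning.
      .config("spark.sql.optimizer.dynamicPartitionPruning.enabled", "false")
      // Log the rules that change the plan at WARN level, so the output
      // is visible even when the test log level is WARN.
      .config("spark.sql.planChangeLog.level", "warn")
      .getOrCreate()
}
```

If the test suite already owns a session, the same keys can probably be
set on it at runtime instead, e.g.
spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", "false").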


[1] https://issues.apache.org/jira/browse/SPARK-36444
[2] https://issues.apache.org/jira/browse/SPARK-38138


Thanks,
Cheng Pan


On Nov 2, 2022 at 00:14:34, Enrico Minack <i...@enrico.minack.dev> wrote:

> Hi Tanin,
>
> Running your test with the option "spark.sql.planChangeLog.level" set
> to "info" or "warn" (depending on your Spark log level) will give you
> insight into the planning: which rules are applied, how long the rules
> take, and how many iterations are done.
>
> Hoping this helps,
> Enrico
>
>
> On 25.10.22 at 21:54, Tanin Na Nakorn wrote:
>
> Hi All,
>
> Our data job is very complex (e.g. 100+ joins), and we have switched
> from RDD to Dataset recently.
>
> We've found that the unit test takes much longer. We profiled it and
> have found that it's the planning phase that is slow, not execution.
>
> I wonder if anyone has encountered this issue before and if there's a
> way to make the planning phase faster (e.g. maybe disabling certain
> optimizers).
>
> Any thoughts or input would be appreciated.
>
> Thank you,
> Tanin
