Re: {EXT} Re: Spark sql slowness in Spark 3.0.1

2022-04-15 Thread Anil Dasari
Hello, the DF is checkpointed here, so it is written to HDFS. The DF is written in Parquet format and uses the default parallelism. Thanks. From: wilson Date: Thursday, April 14, 2022 at 2:54 PM To: user@spark.apache.org Subject: {EXT} Re: Spark sql slowness in Spark 3.0.1 just curious, where to write

Re: Spark sql slowness in Spark 3.0.1

2022-04-14 Thread wilson
Just curious, where is it being written to? Anil Dasari wrote: We are upgrading Spark from 2.4.7 to 3.0.1. We use Spark SQL (Hive) to checkpoint data frames (intermediate data). DF write is very slow in 3.0.1 compared to 2.4.7.

Re: Spark sql slowness in Spark 3.0.1

2022-04-14 Thread Sergey B.
The suggestion is to check:
1. The format used for the write
2. The parallelism used

On Thu, Apr 14, 2022 at 7:13 PM Anil Dasari wrote:
> Hello,
>
> We are upgrading Spark from 2.4.7 to 3.0.1. We use Spark SQL (Hive) to
> checkpoint data frames (intermediate data). DF write is very slow in 3.0.1

Spark sql slowness in Spark 3.0.1

2022-04-14 Thread Anil Dasari
Hello, We are upgrading Spark from 2.4.7 to 3.0.1. We use Spark SQL (Hive) to checkpoint data frames (intermediate data). DF write is very slow in 3.0.1 compared to 2.4.7. I have read the release notes, and there were no major changes except managed tables and adaptive scheduling. We are not