Hello,
The DF is checkpointed here, so it is written to HDFS. It is written in Parquet
format with the default parallelism.
Thanks.
From: wilson
Date: Thursday, April 14, 2022 at 2:54 PM
To: user@spark.apache.org
Subject: {EXT} Re: Spark sql slowness in Spark 3.0.1
Just curious, where are you writing to?
Anil Dasari wrote:
We are upgrading Spark from 2.4.7 to 3.0.1. We use Spark SQL (Hive) to
checkpoint data frames (intermediate data). The DF write is very slow in
3.0.1 compared to 2.4.7.
The suggestion is to check:
1. The format used for the write
2. The parallelism used for the write
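A minimal sketch of how both checks could be collected from PySpark, assuming `spark` is an active SparkSession and `df` is the data frame being checkpointed. The helper name `write_diagnostics` is hypothetical and not part of Spark; it only reads values through `df.rdd.getNumPartitions()`, `spark.sparkContext.defaultParallelism` and `spark.conf.get(...)`.

```python
def write_diagnostics(spark, df):
    """Collect settings that commonly influence DataFrame write speed.

    `spark` is expected to behave like a SparkSession and `df` like a
    DataFrame. `write_diagnostics` is a hypothetical helper, not a Spark API.
    """
    conf = spark.conf
    return {
        # Partition count of the DataFrame, i.e. the number of write tasks.
        "df_partitions": df.rdd.getNumPartitions(),
        # Default parallelism reported by the SparkContext.
        "default_parallelism": spark.sparkContext.defaultParallelism,
        # Number of partitions Spark SQL uses after a shuffle.
        "shuffle_partitions": conf.get("spark.sql.shuffle.partitions"),
        # Adaptive query execution; the fallback here assumes it is off.
        "adaptive_enabled": conf.get("spark.sql.adaptive.enabled", "false"),
        # Format used when a write does not name one explicitly.
        "default_format": conf.get("spark.sql.sources.default", "parquet"),
    }
```

Running `print(write_diagnostics(spark, df))` on both the 2.4.7 and the 3.0.1 cluster and comparing the two dictionaries may show whether the partition count or a configuration default differs between the versions.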
On Thu, Apr 14, 2022 at 7:13 PM Anil Dasari wrote:
> Hello,
>
> We are upgrading Spark from 2.4.7 to 3.0.1. We use Spark SQL (Hive) to
> checkpoint data frames (intermediate data). The DF write is very slow in 3.0.1
>
Hello,
We are upgrading Spark from 2.4.7 to 3.0.1. We use Spark SQL (Hive) to
checkpoint data frames (intermediate data). The DF write is very slow in 3.0.1
compared to 2.4.7.
I have read the release notes, and there were no major changes except managed
tables and adaptive scheduling. We are not