Re: [SQL] s.s.a.coalescePartitions.parallelismFirst true but recommends false

2021-09-07 Thread Jacek Laskowski
Thanks Wenchen. If it's ever asked on SO I'm simply gonna quote you :)

Regards,
Jacek Laskowski

https://about.me/JacekLaskowski
"The Internals Of" Online Books 
Follow me on https://twitter.com/jaceklaskowski




On Tue, Sep 7, 2021 at 6:58 AM Wenchen Fan wrote:

> This is correct. It's true by default so that AQE doesn't introduce a
> performance regression: in a benchmark, larger parallelism usually means
> better performance. However, it's recommended to set it to false so that
> AQE can give better resource utilization, which is good for a busy Spark
> cluster.


Re: [SQL] s.s.a.coalescePartitions.parallelismFirst true but recommends false

2021-09-06 Thread Wenchen Fan
This is correct. It's true by default so that AQE doesn't introduce a
performance regression: in a benchmark, larger parallelism usually means
better performance. However, it's recommended to set it to false so that
AQE can give better resource utilization, which is good for a busy Spark
cluster.
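
For anyone who wants to follow the recommendation, a minimal sketch in Scala of turning the property off. It assumes a Spark version that ships this property (3.2.0 or later, as far as I can tell) and a local session; the app name and `local[*]` master are placeholders for illustration, not anything from this thread.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("parallelism-first-demo") // placeholder name
  // The property only matters when adaptive query execution is on.
  .config("spark.sql.adaptive.enabled", "true")
  // Favor resource utilization over maximum parallelism when coalescing.
  .config("spark.sql.adaptive.coalescePartitions.parallelismFirst", "false")
  .getOrCreate()

// It is a runtime SQL config, so it can also be changed on an existing session.
spark.conf.set("spark.sql.adaptive.coalescePartitions.parallelismFirst", "false")

// Read the value back to confirm what the session is using.
println(spark.conf.get("spark.sql.adaptive.coalescePartitions.parallelismFirst"))
```

The same key can be passed with `--conf` on `spark-submit` or put in `spark-defaults.conf`. Whether `false` actually helps depends on how busy the cluster is, per the trade-off described above.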

On Fri, Sep 3, 2021 at 7:33 PM Jacek Laskowski wrote:

> Hi,
>
> Found this new spark.sql.adaptive.coalescePartitions.parallelismFirst
> config property [1] with the default value `true` but the description says
> the opposite:
>
> > It's recommended to set this config to false
>
> Is this OK, or am I misreading it?
>
> [1]
> https://github.com/apache/spark/blob/54cca7f82ecf23e062bb4f6d68697abec2dbcc5b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L519-L530


[SQL] s.s.a.coalescePartitions.parallelismFirst true but recommends false

2021-09-03 Thread Jacek Laskowski
Hi,

Found this new spark.sql.adaptive.coalescePartitions.parallelismFirst
config property [1] with the default value `true` but the description says
the opposite:

> It's recommended to set this config to false

Is this OK, or am I misreading it?

[1]
https://github.com/apache/spark/blob/54cca7f82ecf23e062bb4f6d68697abec2dbcc5b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L519-L530

Regards,
Jacek Laskowski
