Tell me more about
spark.sql.cbo.strategy

On Tue, 12 Dec 2023 at 00:25, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> Where exactly are you getting this information from?
>
> As far as I can tell, spark.sql.cbo.enabled has defaulted to false since
> it was introduced 7 years ago
> <https://github.com/apache/spark/commit/ae83c211257c508989c703d54f2aeec8b2b5f14d#diff-9ed2b0b7829b91eafb43e040a15247c90384e42fea1046864199fbad77527bb5R649>.
> It has never been enabled by default.
>
> And I cannot see mention of spark.sql.cbo.strategy anywhere at all in the
> code base.
>
> So again, where is this information coming from? Please link directly to
> your source.
>
>
>
> On Dec 11, 2023, at 5:45 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> You are right. By default CBO is not enabled. Whilst the CBO was the
> default optimizer in earlier versions of Spark, it has been replaced by
> the AQE in recent releases.
>
> spark.sql.cbo.strategy
>
> As I understand it, the spark.sql.cbo.strategy configuration property
> specifies the optimizer strategy used by Spark SQL to generate query
> execution plans. There are two main optimizer strategies available:
>
>    - CBO (Cost-Based Optimization): The default optimizer strategy, which
>      analyzes the query plan and estimates the execution costs associated with
>      each operation. It uses statistics to guide its decisions, selecting the
>      plan with the lowest estimated cost.
>
>    - CBO-Like (Cost-Based Optimization-Like): A simplified optimizer
>      strategy that mimics some of the CBO's logic, but without the ability to
>      estimate costs. This strategy is faster than CBO for simple queries, but
>      may not produce the most efficient plan for complex queries.
>
> The spark.sql.cbo.strategy property can be set to either CBO or CBO-Like.
> The default value is AUTO, which means that Spark will automatically
> choose the most appropriate strategy based on the complexity of the query
> and availability of statistics.
>
>
> Mich Talebzadeh,
> Distinguished Technologist, Solutions Architect & Engineer
> London
> United Kingdom
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Mon, 11 Dec 2023 at 17:11, Nicholas Chammas <nicholas.cham...@gmail.com>
> wrote:
>
>>
>> On Dec 11, 2023, at 6:40 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>> By default, the CBO is enabled in Spark.
>>
>>
>> Note that this is not correct. AQE is enabled by default
>> <https://github.com/apache/spark/blob/8235f1d56bf232bb713fe24ff6f2ffdaf49d2fcc/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L664-L669>,
>> but CBO isn’t
>> <https://github.com/apache/spark/blob/8235f1d56bf232bb713fe24ff6f2ffdaf49d2fcc/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2694-L2699>.
>>
>
>
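The point made in the thread that spark.sql.cbo.strategy is not mentioned
anywhere in the code base can be checked against a local checkout of the
Spark source. What follows is only a minimal sketch using the Python standard
library: the function name find_key, the list of file extensions, and the
checkout path in the usage note are all assumptions made for illustration,
not anything taken from Spark itself.

```python
import os


def find_key(root, key, extensions=(".scala", ".java", ".py", ".md")):
    """Return sorted paths (relative to root) of source files containing key.

    Hypothetical helper for illustration; it does a plain substring search,
    so it finds a key only where it is written out literally.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Skip files that are unlikely to declare or document a config key.
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # Ignore undecodable bytes rather than failing on odd files.
            with open(path, encoding="utf-8", errors="ignore") as handle:
                if key in handle.read():
                    hits.append(os.path.relpath(path, root))
    return sorted(hits)
```

Calling find_key("/path/to/spark", "spark.sql.cbo.enabled") on a checkout
(the path is a placeholder) should list the files that mention the key,
while an empty list for "spark.sql.cbo.strategy" would be consistent with
the observation quoted above.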
