Tell me more about spark.sql.cbo.strategy
On Tue, 12 Dec 2023 at 00:25, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Where exactly are you getting this information from?
>
> As far as I can tell, spark.sql.cbo.enabled has defaulted to false since
> it was introduced 7 years ago
> <https://github.com/apache/spark/commit/ae83c211257c508989c703d54f2aeec8b2b5f14d#diff-9ed2b0b7829b91eafb43e040a15247c90384e42fea1046864199fbad77527bb5R649>.
> It has never been enabled by default.
>
> And I cannot see mention of spark.sql.cbo.strategy anywhere at all in the
> code base.
>
> So again, where is this information coming from? Please link directly to
> your source.
>
>
>
> On Dec 11, 2023, at 5:45 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> You are right. By default CBO is not enabled. Whilst the CBO was the
> default optimizer in earlier versions of Spark, it has been replaced by
> the AQE in recent releases.
>
> spark.sql.cbo.strategy
>
> As I understand it, the spark.sql.cbo.strategy configuration property
> specifies the optimizer strategy used by Spark SQL to generate query
> execution plans. There are two main optimizer strategies available:
>
>    - CBO (Cost-Based Optimization): The default optimizer strategy, which
>      analyzes the query plan and estimates the execution costs associated
>      with each operation. It uses statistics to guide its decisions,
>      selecting the plan with the lowest estimated cost.
>    - CBO-Like (Cost-Based Optimization-Like): A simplified optimizer
>      strategy that mimics some of the CBO's logic, but without the ability
>      to estimate costs. This strategy is faster than CBO for simple
>      queries, but may not produce the most efficient plan for complex
>      queries.
>
> The spark.sql.cbo.strategy property can be set to either CBO or CBO-Like.
> The default value is AUTO, which means that Spark will automatically
> choose the most appropriate strategy based on the complexity of the query
> and availability of statistics.
>
>
> Mich Talebzadeh,
> Distinguished Technologist, Solutions Architect & Engineer
> London
> United Kingdom
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
> On Mon, 11 Dec 2023 at 17:11, Nicholas Chammas <nicholas.cham...@gmail.com>
> wrote:
>
>>
>> On Dec 11, 2023, at 6:40 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>> By default, the CBO is enabled in Spark.
>>
>>
>> Note that this is not correct. AQE is enabled
>> <https://github.com/apache/spark/blob/8235f1d56bf232bb713fe24ff6f2ffdaf49d2fcc/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L664-L669>
>> by default, but CBO isn’t
>> <https://github.com/apache/spark/blob/8235f1d56bf232bb713fe24ff6f2ffdaf49d2fcc/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2694-L2699>.
>>
>
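
For anyone who wants to check the settings discussed in this thread against their own session rather than take either side's word for it, here is a minimal sketch. The helper name report_optimizer_conf and the KEYS tuple are made up for illustration. It assumes only that the getter passed in returns a string for a key the session knows and raises for a key it does not, which is how I understand PySpark's spark.conf.get(key) to behave when called without a default. The exact exception type may differ between Spark versions, so the sketch catches broadly.

```python
from typing import Callable, Dict, Optional

# Keys mentioned in this thread. spark.sql.adaptive.enabled (AQE) and
# spark.sql.cbo.enabled (CBO) are the settings linked from SQLConf.scala;
# spark.sql.cbo.strategy is the key that could not be found in the code base.
KEYS = (
    "spark.sql.adaptive.enabled",
    "spark.sql.cbo.enabled",
    "spark.sql.cbo.strategy",
)


def report_optimizer_conf(get: Callable[[str], str]) -> Dict[str, Optional[str]]:
    """Look up each key with `get`; record None when the lookup raises.

    `get` is a hypothetical stand-in for something like `spark.conf.get`.
    Assumption: an unknown key raises instead of returning a value.
    """
    report: Dict[str, Optional[str]] = {}
    for key in KEYS:
        try:
            report[key] = get(key)
        except Exception:
            # Broad on purpose: the exception class raised for an unknown
            # key is not something this sketch relies on.
            report[key] = None
    return report


# With a live session (requires pyspark, which is not imported here):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   for key, value in report_optimizer_conf(spark.conf.get).items():
#       print(key, "=", value)
```

The lookup is deliberately done without a default value. As far as I can tell, passing a default to spark.conf.get makes it return that default for any key that has not been explicitly set, which would hide the built-in default that this thread is actually arguing about. A None in the report means only that the session did not recognise the key.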