[ https://issues.apache.org/jira/browse/SPARK-29335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969494#comment-16969494 ]
venkata yerubandi commented on SPARK-29335:
-------------------------------------------

Srini, even I am having this issue now. Were you able to fix it?
Thanks,
Venkat

> Cost Based Optimizer stats are not used while evaluating query plans in Spark SQL
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-29335
>                 URL: https://issues.apache.org/jira/browse/SPARK-29335
>             Project: Spark
>          Issue Type: Question
>          Components: Optimizer
>    Affects Versions: 2.3.0
>        Environment: We tried to execute the same using spark-sql and the Thrift server via SQLWorkbench, but we are not able to use the stats.
>            Reporter: Srini E
>            Priority: Major
>              Labels: Question, stack-overflow
>        Attachments: explain_plan_cbo_spark.txt
>
> We are trying to leverage the CBO to get better plan results for a few critical queries run through spark-sql or through the Thrift server using the JDBC driver.
> The following settings were added to spark-defaults.conf:
> {code}
> spark.sql.cbo.enabled true
> spark.experimental.extrastrategies intervaljoin
> spark.sql.cbo.joinreorder.enabled true
> {code}
> The tables that we are using are not partitioned.
> {code}
> spark-sql> analyze table arrow.t_fperiods_sundar compute statistics;
> spark-sql> analyze table arrow.t_fperiods_sundar compute statistics for columns eid, year, ptype, absref, fpid, pid;
> spark-sql> analyze table arrow.t_fdata_sundar compute statistics;
> spark-sql> analyze table arrow.t_fdata_sundar compute statistics for columns eid, fpid, absref;
> {code}
> Analyze completed successfully.
> Describe extended does not show column-level stats data, and queries are not leveraging table- or column-level stats.
> We are using Oracle as our Hive catalog store, not Glue.
> *When we run queries through spark-sql, we are not able to see the stats in use in the explain plan, and we are not sure whether the CBO is put to use.*
> *A quick response would be helpful.*
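> One thing worth checking from spark-shell is whether the flags and the ANALYZE results are actually visible to the SQL layer. Below is a minimal sketch, assuming Spark 2.3 and the table names from this report; note that the documented spelling of the join-reorder key is camelCase ({{spark.sql.cbo.joinReorder.enabled}}), and since conf keys are matched as plain strings, the lowercased spelling above may never reach the optimizer:
> {code}
> // Illustrative verification session, not from the original report.
> // Confirm the CBO flags as the SQL layer resolves them.
> println(spark.conf.get("spark.sql.cbo.enabled"))
> println(spark.conf.get("spark.sql.cbo.joinReorder.enabled"))
>
> // The optimized plan carries the statistics the optimizer will use.
> val plan = spark.table("arrow.t_fperiods_sundar").queryExecution.optimizedPlan
> println(plan.stats.rowCount)        // Some(n) once table-level stats exist
> println(plan.stats.attributeStats)  // non-empty once column-level stats exist
> {code}
> If {{rowCount}} comes back {{None}} even after ANALYZE, the statistics were never read from the metastore and the CBO has nothing to work with.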
>
> *Explain Plan:*
> The following explain command does not reference any statistics usage.
> {code}
> spark-sql> explain extended select a13.imnem,a13.fvalue,a12.ptype,a13.absref from arrow.t_fperiods_sundar a12, arrow.t_fdata_sundar a13 where a12.eid = a13.eid and a12.fpid = a13.fpid and a13.absref = 'Y2017' and a12.year = 2017 and a12.ptype = 'A' and a12.eid = 29940 and a12.PID is NULL;
>
> 19/09/05 14:15:15 INFO FileSourceStrategy: Pruning directories with:
> 19/09/05 14:15:15 INFO FileSourceStrategy: Post-Scan Filters: isnotnull(ptype#4546),isnotnull(year#4545),isnotnull(eid#4542),(year#4545 = 2017),(ptype#4546 = A),(eid#4542 = 29940),isnull(PID#4527),isnotnull(fpid#4523)
> 19/09/05 14:15:15 INFO FileSourceStrategy: Output Data Schema: struct<FPID: decimal(38,0), PID: string, EID: decimal(10,0), YEAR: int, PTYPE: string ... 3 more fields>
> 19/09/05 14:15:15 INFO FileSourceScanExec: Pushed Filters: IsNotNull(PTYPE),IsNotNull(YEAR),IsNotNull(EID),EqualTo(YEAR,2017),EqualTo(PTYPE,A),EqualTo(EID,29940),IsNull(PID),IsNotNull(FPID)
> 19/09/05 14:15:15 INFO FileSourceStrategy: Pruning directories with:
> 19/09/05 14:15:15 INFO FileSourceStrategy: Post-Scan Filters: isnotnull(absref#4569),(absref#4569 = Y2017),isnotnull(fpid#4567),isnotnull(eid#4566),(eid#4566 = 29940)
> 19/09/05 14:15:15 INFO FileSourceStrategy: Output Data Schema: struct<IMNEM: string, FVALUE: string, EID: decimal(10,0), FPID: decimal(10,0), ABSREF: string ... 3 more fields>
> 19/09/05 14:15:15 INFO FileSourceScanExec: Pushed Filters: IsNotNull(ABSREF),EqualTo(ABSREF,Y2017),IsNotNull(FPID),IsNotNull(EID),EqualTo(EID,29940)
>
> == Parsed Logical Plan ==
> 'Project ['a13.imnem, 'a13.fvalue, 'a12.ptype, 'a13.absref]
> +- 'Filter (((('a12.eid = 'a13.eid) && ('a12.fpid = 'a13.fpid)) && (('a13.absref = Y2017) && ('a12.year = 2017))) && ((('a12.ptype = A) && ('a12.eid = 29940)) && isnull('a12.PID)))
>    +- 'Join Inner
>       :- 'SubqueryAlias a12
>       :  +- 'UnresolvedRelation `arrow`.`t_fperiods_sundar`
>       +- 'SubqueryAlias a13
>          +- 'UnresolvedRelation `arrow`.`t_fdata_sundar`
>
> == Analyzed Logical Plan ==
> imnem: string, fvalue: string, ptype: string, absref: string
> Project [imnem#4548, fvalue#4552, ptype#4546, absref#4569]
> +- Filter ((((eid#4542 = eid#4566) && (cast(fpid#4523 as decimal(38,0)) = cast(fpid#4567 as decimal(38,0)))) && ((absref#4569 = Y2017) && (year#4545 = 2017))) && (((ptype#4546 = A) && (cast(eid#4542 as decimal(10,0)) = cast(cast(29940 as decimal(5,0)) as decimal(10,0)))) && isnull(PID#4527)))
>    +- Join Inner
>       :- SubqueryAlias a12
>       :  +- SubqueryAlias t_fperiods_sundar
>       :     +- Relation[FPID#4523,QTR#4524,ABSREF#4525,DSET#4526,PID#4527,NID#4528,CCD#4529,LOCKDATE#4530,LOCKUSER#4531,UPDUSER#4532,UPDDATE#4533,RESTATED#4534,DATADATE#4535,DATASTATE#4536,ACCSTD#4537,INDFMT#4538,DATAFMT#4539,CONSOL#4540,FYR#4541,EID#4542,FPSTATE#4543,BATCH_ID#4544L,YEAR#4545,PTYPE#4546] parquet
>       +- SubqueryAlias a13
>          +- SubqueryAlias t_fdata_sundar
>             +- Relation[FDID#4547,IMNEM#4548,DSET#4549,DSRS#4550,PID#4551,FVALUE#4552,FCOMP#4553,CONF#4554,UPDDATE#4555,UPDUSER#4556,SEQ#4557,CCD#4558,NID#4559,CONVTYPE#4560,DATADATE#4561,DCOMMENT#4562,UDOMAIN#4563,FPDID#4564L,VLP_CALC#4565,EID#4566,FPID#4567,BATCH_ID#4568L,ABSREF#4569] parquet
>
> == Optimized Logical Plan ==
> Project [imnem#4548, fvalue#4552, ptype#4546, absref#4569]
> +- Join Inner, ((eid#4542 = eid#4566) && (fpid#4523 = cast(fpid#4567 as decimal(38,0))))
>    :- Project [FPID#4523, EID#4542, PTYPE#4546]
>    :  +- Filter (((((((isnotnull(ptype#4546) && isnotnull(year#4545)) && isnotnull(eid#4542)) && (year#4545 = 2017)) && (ptype#4546 = A)) && (eid#4542 = 29940)) && isnull(PID#4527)) && isnotnull(fpid#4523))
>    :     +- Relation[FPID#4523,QTR#4524,ABSREF#4525,DSET#4526,PID#4527,NID#4528,CCD#4529,LOCKDATE#4530,LOCKUSER#4531,UPDUSER#4532,UPDDATE#4533,RESTATED#4534,DATADATE#4535,DATASTATE#4536,ACCSTD#4537,INDFMT#4538,DATAFMT#4539,CONSOL#4540,FYR#4541,EID#4542,FPSTATE#4543,BATCH_ID#4544L,YEAR#4545,PTYPE#4546] parquet
>    +- Project [IMNEM#4548, FVALUE#4552, EID#4566, FPID#4567, ABSREF#4569]
>       +- Filter ((((isnotnull(absref#4569) && (absref#4569 = Y2017)) && isnotnull(fpid#4567)) && isnotnull(eid#4566)) && (eid#4566 = 29940))
>          +- Relation[FDID#4547,IMNEM#4548,DSET#4549,DSRS#4550,PID#4551,FVALUE#4552,FCOMP#4553,CONF#4554,UPDDATE#4555,UPDUSER#4556,SEQ#4557,CCD#4558,NID#4559,CONVTYPE#4560,DATADATE#4561,DCOMMENT#4562,UDOMAIN#4563,FPDID#4564L,VLP_CALC#4565,EID#4566,FPID#4567,BATCH_ID#4568L,ABSREF#4569] parquet
>
> == Physical Plan ==
> *(2) Project [imnem#4548, fvalue#4552, ptype#4546, absref#4569]
> +- *(2) BroadcastHashJoin [eid#4542, fpid#4523], [eid#4566, cast(fpid#4567 as decimal(38,0))], Inner, BuildLeft
>    :- BroadcastExchange HashedRelationBroadcastMode(List(input[1, decimal(10,0), true], input[0, decimal(38,0), true]))
>    :  +- *(1) Project [FPID#4523, EID#4542, PTYPE#4546]
>    :     +- *(1) Filter (((((((isnotnull(ptype#4546) && isnotnull(year#4545)) && isnotnull(eid#4542)) && (year#4545 = 2017)) && (ptype#4546 = A)) && (eid#4542 = 29940)) && isnull(PID#4527)) && isnotnull(fpid#4523))
>    :        +- *(1) FileScan parquet arrow.t_fperiods_sundar[FPID#4523,PID#4527,EID#4542,YEAR#4545,PTYPE#4546] Batched: true, Format: Parquet, Location: InMemoryFileIndex[s3://spr-idf-uat-processed/replication/base/APPLICATION_SCHEMA/ARROW/FPERIODS_S..., PartitionFilters: [], PushedFilters: [IsNotNull(PTYPE), IsNotNull(YEAR), IsNotNull(EID), EqualTo(YEAR,2017), EqualTo(PTYPE,A), EqualTo..., ReadSchema: struct<FPID:decimal(38,0),PID:string,EID:decimal(10,0),YEAR:int,PTYPE:string>
>    +- *(2) Project [IMNEM#4548, FVALUE#4552, EID#4566, FPID#4567, ABSREF#4569]
>       +- *(2) Filter ((((isnotnull(absref#4569) && (absref#4569 = Y2017)) && isnotnull(fpid#4567)) && isnotnull(eid#4566)) && (eid#4566 = 29940))
>          +- *(2) FileScan parquet arrow.t_fdata_sundar[IMNEM#4548,FVALUE#4552,EID#4566,FPID#4567,ABSREF#4569] Batched: true, Format: Parquet, Location: InMemoryFileIndex[s3://spr-idf-uat-processed/replication/base/APPLICATION_SCHEMA/ARROW/FDATA_SUNDAR], PartitionFilters: [], PushedFilters: [IsNotNull(ABSREF), EqualTo(ABSREF,Y2017), IsNotNull(FPID), IsNotNull(EID), EqualTo(EID,29940)], ReadSchema: struct<IMNEM:string,FVALUE:string,EID:decimal(10,0),FPID:decimal(10,0),ABSREF:string>
>
> Time taken: 0.35 seconds, Fetched 1 row(s)
> {code}
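> For reference, plain {{explain extended}} in Spark 2.3 does not print row counts or cost estimates, so their absence in the output above is not by itself conclusive. One way to inspect the statistics the planner attached to each node of this query is to read them off the optimized plan; a minimal sketch, assuming a spark-shell session, with the query text copied from above:
> {code}
> // Illustrative check, not from the original report.
> val q = spark.sql("""
>   select a13.imnem, a13.fvalue, a12.ptype, a13.absref
>   from arrow.t_fperiods_sundar a12, arrow.t_fdata_sundar a13
>   where a12.eid = a13.eid and a12.fpid = a13.fpid
>     and a13.absref = 'Y2017' and a12.year = 2017
>     and a12.ptype = 'A' and a12.eid = 29940 and a12.pid is null
> """)
>
> // Print the Statistics carried by every node of the optimized plan;
> // with the CBO active, rowCount should be populated, not just sizeInBytes.
> q.queryExecution.optimizedPlan.foreach { node =>
>   println(s"${node.nodeName}: ${node.stats}")
> }
> {code}
> If only {{sizeInBytes}} shows up, the plan is being costed from file sizes alone, i.e. the ANALYZE statistics are not being consulted.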