[ https://issues.apache.org/jira/browse/SPARK-48660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856122#comment-17856122 ]
Wei Guo edited comment on SPARK-48660 at 6/19/24 4:18 AM:
----------------------------------------------------------

I am working on this, and thank you for the recommendation [~LuciferYang]

was (Author: wayne guo):
    I am working on this, and thank you for the recommendation [~yangjie01].

> The result of explain is incorrect for CreateTableAsSelect
> ----------------------------------------------------------
>
>                 Key: SPARK-48660
>                 URL: https://issues.apache.org/jira/browse/SPARK-48660
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0, 4.0.0, 3.5.1
>            Reporter: Yuming Wang
>            Priority: Major
>
> How to reproduce:
> {code:sql}
> CREATE TABLE order_history_version_audit_rno (
>   eventid STRING,
>   id STRING,
>   referenceid STRING,
>   type STRING,
>   referencetype STRING,
>   sellerid BIGINT,
>   buyerid BIGINT,
>   producerid STRING,
>   versionid INT,
>   changedocuments ARRAY<STRUCT<clientId: STRING, type: STRING, timestamp: BIGINT, changeDetails: STRING>>,
>   dt STRING,
>   hr STRING)
> USING parquet
> PARTITIONED BY (dt, hr);
> explain cost
> CREATE TABLE order_history_version_audit_rno
> USING parquet
> PARTITIONED BY (dt)
> CLUSTERED BY (id) INTO 1000 buckets
> AS SELECT * FROM order_history_version_audit_rno
> WHERE dt >= '2023-11-29';
> {code}
> {noformat}
> spark-sql (default)>
>                    > explain cost
>                    > CREATE TABLE order_history_version_audit_rno
>                    > USING parquet
>                    > PARTITIONED BY (dt)
>                    > CLUSTERED BY (id) INTO 1000 buckets
>                    > AS SELECT * FROM order_history_version_audit_rno
>                    > WHERE dt >= '2023-11-29';
> == Optimized Logical Plan ==
> CreateDataSourceTableAsSelectCommand `spark_catalog`.`default`.`order_history_version_audit_rno`, ErrorIfExists, [eventid, id, referenceid, type, referencetype, sellerid, buyerid, producerid, versionid, changedocuments, hr, dt]
>    +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, hr#16, dt#15]
>       +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, dt#15, hr#16]
>          +- Filter (dt#15 >= 2023-11-29)
>             +- SubqueryAlias spark_catalog.default.order_history_version_audit_rno
>                +- Relation spark_catalog.default.order_history_version_audit_rno[eventid#5,id#6,referenceid#7,type#8,referencetype#9,sellerid#10L,buyerid#11L,producerid#12,versionid#13,changedocuments#14,dt#15,hr#16] parquet
> == Physical Plan ==
> Execute CreateDataSourceTableAsSelectCommand
>    +- CreateDataSourceTableAsSelectCommand `spark_catalog`.`default`.`order_history_version_audit_rno`, ErrorIfExists, [eventid, id, referenceid, type, referencetype, sellerid, buyerid, producerid, versionid, changedocuments, hr, dt]
>          +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, hr#16, dt#15]
>             +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, dt#15, hr#16]
>                +- Filter (dt#15 >= 2023-11-29)
>                   +- SubqueryAlias spark_catalog.default.order_history_version_audit_rno
>                      +- Relation spark_catalog.default.order_history_version_audit_rno[eventid#5,id#6,referenceid#7,type#8,referencetype#9,sellerid#10L,buyerid#11L,producerid#12,versionid#13,changedocuments#14,dt#15,hr#16] parquet
> {noformat}