[jira] [Commented] (SPARK-34137) The tree string does not contain statistics for nested scalar sub queries
[ https://issues.apache.org/jira/browse/SPARK-34137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280220#comment-17280220 ]

Apache Spark commented on SPARK-34137:
--------------------------------------

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/31485

> The tree string does not contain statistics for nested scalar sub queries
> -------------------------------------------------------------------------
>
> Key: SPARK-34137
> URL: https://issues.apache.org/jira/browse/SPARK-34137
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Yuming Wang
> Priority: Major
>
> How to reproduce:
> {code:scala}
> spark.sql("create table t1 using parquet as select id as a, id as b from range(1000)")
> spark.sql("create table t2 using parquet as select id as c, id as d from range(2000)")
> spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS FOR ALL COLUMNS")
> spark.sql("ANALYZE TABLE t2 COMPUTE STATISTICS FOR ALL COLUMNS")
> spark.sql("set spark.sql.cbo.enabled=true")
> spark.sql(
>   """
>     |WITH max_store_sales AS
>     |  (SELECT max(csales) tpcds_cmax
>     |  FROM (SELECT
>     |    sum(b) csales
>     |  FROM t1 WHERE a < 100 ) x),
>     |best_ss_customer AS
>     |  (SELECT
>     |    c
>     |  FROM t2
>     |  WHERE d > (SELECT * FROM max_store_sales))
>     |
>     |SELECT c FROM best_ss_customer
>     |""".stripMargin).explain("cost")
> {code}
> Output:
> {noformat}
> == Optimized Logical Plan ==
> Project [c#4263L], Statistics(sizeInBytes=31.3 KiB, rowCount=2.00E+3)
> +- Filter (isnotnull(d#4264L) AND (d#4264L > scalar-subquery#4262 [])), Statistics(sizeInBytes=46.9 KiB, rowCount=2.00E+3)
>    :  +- Aggregate [max(csales#4260L) AS tpcds_cmax#4261L]
>    :     +- Aggregate [sum(b#4266L) AS csales#4260L]
>    :        +- Project [b#4266L]
>    :           +- Filter ((a#4265L < 100) AND isnotnull(a#4265L))
>    :              +- Relation default.t1[a#4265L,b#4266L] parquet, Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
>    +- Relation default.t2[c#4263L,d#4264L] parquet, Statistics(sizeInBytes=46.9 KiB, rowCount=2.00E+3)
> {noformat}
> Another case is TPC-DS q23a.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34137) The tree string does not contain statistics for nested scalar sub queries
[ https://issues.apache.org/jira/browse/SPARK-34137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17266610#comment-17266610 ]

Yuming Wang commented on SPARK-34137:
-------------------------------------

cc [~maxgekk]

> The tree string does not contain statistics for nested scalar sub queries
> -------------------------------------------------------------------------
>
> Key: SPARK-34137
> URL: https://issues.apache.org/jira/browse/SPARK-34137
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Yuming Wang
> Priority: Major
>
> How to reproduce:
> {code:scala}
> spark.sql("create table t1 using parquet as select id as a, id as b from range(1000)")
> spark.sql("create table t2 using parquet as select id as c, id as d from range(2000)")
> spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS FOR ALL COLUMNS")
> spark.sql("ANALYZE TABLE t2 COMPUTE STATISTICS FOR ALL COLUMNS")
> spark.sql("set spark.sql.cbo.enabled=true")
> spark.sql(
>   """
>     |WITH max_store_sales AS
>     |  (SELECT max(csales) tpcds_cmax
>     |  FROM (SELECT
>     |    sum(b) csales
>     |  FROM t1 WHERE a < 100 ) x),
>     |best_ss_customer AS
>     |  (SELECT
>     |    c
>     |  FROM t2
>     |  WHERE d > (SELECT * FROM max_store_sales))
>     |
>     |SELECT c FROM best_ss_customer
>     |""".stripMargin).explain("cost")
> {code}
> Output:
> {noformat}
> == Optimized Logical Plan ==
> Project [c#4263L], Statistics(sizeInBytes=31.3 KiB, rowCount=2.00E+3)
> +- Filter (isnotnull(d#4264L) AND (d#4264L > scalar-subquery#4262 [])), Statistics(sizeInBytes=46.9 KiB, rowCount=2.00E+3)
>    :  +- Aggregate [max(csales#4260L) AS tpcds_cmax#4261L]
>    :     +- Aggregate [sum(b#4266L) AS csales#4260L]
>    :        +- Project [b#4266L]
>    :           +- Filter ((a#4265L < 100) AND isnotnull(a#4265L))
>    :              +- Relation default.t1[a#4265L,b#4266L] parquet, Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
>    +- Relation default.t2[c#4263L,d#4264L] parquet, Statistics(sizeInBytes=46.9 KiB, rowCount=2.00E+3)
> {noformat}
> Another case is TPC-DS q23a.