[jira] [Commented] (SPARK-34137) The tree string does not contain statistics for nested scalar sub queries

2021-02-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280220#comment-17280220
 ] 

Apache Spark commented on SPARK-34137:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/31485

> The tree string does not contain statistics for nested scalar sub queries
> -
>
>                 Key: SPARK-34137
>                 URL: https://issues.apache.org/jira/browse/SPARK-34137
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> How to reproduce:
> {code:scala}
> spark.sql("create table t1 using parquet as select id as a, id as b from range(1000)")
> spark.sql("create table t2 using parquet as select id as c, id as d from range(2000)")
> spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS FOR ALL COLUMNS")
> spark.sql("ANALYZE TABLE t2 COMPUTE STATISTICS FOR ALL COLUMNS")
> spark.sql("set spark.sql.cbo.enabled=true")
> spark.sql(
>   """
>     |WITH max_store_sales AS
>     |  (SELECT max(csales) tpcds_cmax
>     |  FROM (SELECT
>     |    sum(b) csales
>     |  FROM t1 WHERE a < 100 ) x),
>     |best_ss_customer AS
>     |  (SELECT
>     |    c
>     |  FROM t2
>     |  WHERE d > (SELECT * FROM max_store_sales))
>     |
>     |SELECT c FROM best_ss_customer
>     |""".stripMargin).explain("cost")
> {code}
> Output:
> {noformat}
> == Optimized Logical Plan ==
> Project [c#4263L], Statistics(sizeInBytes=31.3 KiB, rowCount=2.00E+3)
> +- Filter (isnotnull(d#4264L) AND (d#4264L > scalar-subquery#4262 [])), Statistics(sizeInBytes=46.9 KiB, rowCount=2.00E+3)
>    :  +- Aggregate [max(csales#4260L) AS tpcds_cmax#4261L]
>    :     +- Aggregate [sum(b#4266L) AS csales#4260L]
>    :        +- Project [b#4266L]
>    :           +- Filter ((a#4265L < 100) AND isnotnull(a#4265L))
>    :              +- Relation default.t1[a#4265L,b#4266L] parquet, Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
>    +- Relation default.t2[c#4263L,d#4264L] parquet, Statistics(sizeInBytes=46.9 KiB, rowCount=2.00E+3)
> {noformat}
> Another case is TPC-DS q23a.
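
To make the gap in the quoted output easier to spot, here is a minimal sketch that scans an explain("cost") tree string and lists the plan nodes that carry no Statistics(...) annotation. It is plain string processing, not a Spark API: the names MissingStats, nodesWithoutStats and samplePlan are hypothetical and only serve this illustration, and the sketch assumes each plan node sits on a single, unwrapped line.

```scala
object MissingStats {
  // The optimized plan quoted in this issue, one node per line.
  val samplePlan: String =
    """== Optimized Logical Plan ==
      |Project [c#4263L], Statistics(sizeInBytes=31.3 KiB, rowCount=2.00E+3)
      |+- Filter (isnotnull(d#4264L) AND (d#4264L > scalar-subquery#4262 [])), Statistics(sizeInBytes=46.9 KiB, rowCount=2.00E+3)
      |   :  +- Aggregate [max(csales#4260L) AS tpcds_cmax#4261L]
      |   :     +- Aggregate [sum(b#4266L) AS csales#4260L]
      |   :        +- Project [b#4266L]
      |   :           +- Filter ((a#4265L < 100) AND isnotnull(a#4265L))
      |   :              +- Relation default.t1[a#4265L,b#4266L] parquet, Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
      |   +- Relation default.t2[c#4263L,d#4264L] parquet, Statistics(sizeInBytes=46.9 KiB, rowCount=2.00E+3)
      |""".stripMargin

  // Returns the node descriptions (tree-drawing prefix removed) of every
  // plan line that does not contain a "Statistics(" annotation.
  def nodesWithoutStats(treeString: String): Seq[String] =
    treeString.linesIterator
      .map(_.trim)
      .filter(_.nonEmpty)
      .filterNot(_.startsWith("=="))            // skip the section header
      .filterNot(_.contains("Statistics("))     // keep only unannotated nodes
      .map(_.dropWhile(ch => ch == ':' || ch == '+' || ch == '-' || ch.isWhitespace))
      .toSeq

  def main(args: Array[String]): Unit =
    nodesWithoutStats(samplePlan).foreach(println)
}
```

On the plan above this reports the two Aggregate nodes, the Project and the inner Filter, all of which sit under the scalar subquery; the Relation for t1 inside the subquery does carry statistics.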



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34137) The tree string does not contain statistics for nested scalar sub queries

2021-01-16 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17266610#comment-17266610
 ] 

Yuming Wang commented on SPARK-34137:
-

cc [~maxgekk]

> The tree string does not contain statistics for nested scalar sub queries
> -
>
>                 Key: SPARK-34137
>                 URL: https://issues.apache.org/jira/browse/SPARK-34137
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> How to reproduce:
> {code:scala}
> spark.sql("create table t1 using parquet as select id as a, id as b from range(1000)")
> spark.sql("create table t2 using parquet as select id as c, id as d from range(2000)")
> spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS FOR ALL COLUMNS")
> spark.sql("ANALYZE TABLE t2 COMPUTE STATISTICS FOR ALL COLUMNS")
> spark.sql("set spark.sql.cbo.enabled=true")
> spark.sql(
>   """
>     |WITH max_store_sales AS
>     |  (SELECT max(csales) tpcds_cmax
>     |  FROM (SELECT
>     |    sum(b) csales
>     |  FROM t1 WHERE a < 100 ) x),
>     |best_ss_customer AS
>     |  (SELECT
>     |    c
>     |  FROM t2
>     |  WHERE d > (SELECT * FROM max_store_sales))
>     |
>     |SELECT c FROM best_ss_customer
>     |""".stripMargin).explain("cost")
> {code}
> Output:
> {noformat}
> == Optimized Logical Plan ==
> Project [c#4263L], Statistics(sizeInBytes=31.3 KiB, rowCount=2.00E+3)
> +- Filter (isnotnull(d#4264L) AND (d#4264L > scalar-subquery#4262 [])), Statistics(sizeInBytes=46.9 KiB, rowCount=2.00E+3)
>    :  +- Aggregate [max(csales#4260L) AS tpcds_cmax#4261L]
>    :     +- Aggregate [sum(b#4266L) AS csales#4260L]
>    :        +- Project [b#4266L]
>    :           +- Filter ((a#4265L < 100) AND isnotnull(a#4265L))
>    :              +- Relation default.t1[a#4265L,b#4266L] parquet, Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
>    +- Relation default.t2[c#4263L,d#4264L] parquet, Statistics(sizeInBytes=46.9 KiB, rowCount=2.00E+3)
> {noformat}
> Another case is TPC-DS q23a.


