[jira] [Created] (SPARK-34137) The tree string does not contain statistics for nested scalar sub queries

Yuming Wang (Jira) Sat, 16 Jan 2021 06:57:22 -0800

Yuming Wang created SPARK-34137:
-----------------------------------

             Summary: The tree string does not contain statistics for nested scalar sub queries
                 Key: SPARK-34137
                 URL: https://issues.apache.org/jira/browse/SPARK-34137
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Yuming Wang



How to reproduce:
{code:scala}
spark.sql("create table t1 using parquet as select id as a, id as b from range(1000)")
spark.sql("create table t2 using parquet as select id as c, id as d from range(2000)")

spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS FOR ALL COLUMNS")
spark.sql("ANALYZE TABLE t2 COMPUTE STATISTICS FOR ALL COLUMNS")
spark.sql("set spark.sql.cbo.enabled=true")

spark.sql(
  """
    |WITH max_store_sales AS
    |  (SELECT max(csales) tpcds_cmax
    |  FROM (SELECT
    |    sum(b) csales
    |  FROM t1 WHERE a < 100 ) x),
    |best_ss_customer AS
    |  (SELECT
    |    c
    |  FROM t2
    |  WHERE d > (SELECT * FROM max_store_sales))
    |
    |SELECT c FROM best_ss_customer
    |""".stripMargin).explain("cost")
{code}

Output (the Aggregate, Project and Filter nodes under the scalar subquery carry no statistics):
{noformat}
== Optimized Logical Plan ==
Project [c#4263L], Statistics(sizeInBytes=31.3 KiB, rowCount=2.00E+3)
+- Filter (isnotnull(d#4264L) AND (d#4264L > scalar-subquery#4262 [])), Statistics(sizeInBytes=46.9 KiB, rowCount=2.00E+3)
   :  +- Aggregate [max(csales#4260L) AS tpcds_cmax#4261L]
   :     +- Aggregate [sum(b#4266L) AS csales#4260L]
   :        +- Project [b#4266L]
   :           +- Filter ((a#4265L < 100) AND isnotnull(a#4265L))
   :              +- Relation default.t1[a#4265L,b#4266L] parquet, Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
   +- Relation default.t2[c#4263L,d#4264L] parquet, Statistics(sizeInBytes=46.9 KiB, rowCount=2.00E+3)
{noformat}
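
To check whether estimates exist for the nested plan at all, the subquery can be walked by hand. The following is a minimal sketch, not part of the original reproduction: it assumes the same spark session and the t1/t2 tables created above, uses a simplified query, and relies on internal Catalyst classes (ScalarSubquery, LogicalPlan.stats) that may change between Spark versions.
{code:scala}
import org.apache.spark.sql.catalyst.expressions.ScalarSubquery

// Simplified query (an assumption, not the reporter's): one uncorrelated scalar subquery.
val df = spark.sql(
  "SELECT c FROM t2 WHERE d > (SELECT max(b) FROM t1 WHERE a < 100)")

// Visit every operator of the optimized plan, then every expression it holds.
df.queryExecution.optimizedPlan.foreach { node =>
  node.expressions.foreach { expr =>
    expr.foreach {
      // For each scalar subquery, print the estimated statistics of each nested operator.
      case s: ScalarSubquery =>
        s.plan.foreach(p => println(s"${p.nodeName}: ${p.stats}"))
      case _ => // not a scalar subquery, nothing to print
    }
  }
}
{code}
If this prints estimates for the nested operators, the statistics can be computed and are only missing from the tree string.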

TPC-DS q23a is another query that shows the same problem.



