AngersZhuuuu commented on a change in pull request #31485:
URL: https://github.com/apache/spark/pull/31485#discussion_r572535922



##########
File path: sql/core/src/test/resources/sql-tests/results/explain-cbo.sql.out
##########
@@ -0,0 +1,80 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 5
+
+
+-- !query
+CREATE TABLE t1(a INT, b INT) USING PARQUET
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+CREATE TABLE t2(c INT, d INT) USING PARQUET
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+ANALYZE TABLE t1 COMPUTE STATISTICS FOR ALL COLUMNS
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+ANALYZE TABLE t2 COMPUTE STATISTICS FOR ALL COLUMNS
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+EXPLAIN COST WITH max_store_sales AS
+(
+  SELECT max(csales) tpcds_cmax
+  FROM (
+    SELECT sum(b) csales
+    FROM t1 WHERE a < 100
+  ) x
+),
+best_ss_customer AS
+(
+  SELECT c
+  FROM t2
+  WHERE d > (SELECT * FROM max_store_sales)
+)
+SELECT c FROM best_ss_customer
+-- !query schema
+struct<plan:string>
+-- !query output
+== Optimized Logical Plan ==
+Project [c#x], Statistics(sizeInBytes=1.0 B, rowCount=0)
++- Filter (isnotnull(d#x) AND (cast(d#x as bigint) > scalar-subquery#x [])), Statistics(sizeInBytes=1.0 B, rowCount=0)
+   :  +- Aggregate [max(csales#xL) AS tpcds_cmax#xL], Statistics(sizeInBytes=16.0 B, rowCount=1)
+   :     +- Aggregate [sum(b#x) AS csales#xL], Statistics(sizeInBytes=16.0 B, rowCount=1)
+   :        +- Project [b#x], Statistics(sizeInBytes=1.0 B, rowCount=0)
+   :           +- Filter (isnotnull(a#x) AND (a#x < 100)), Statistics(sizeInBytes=1.0 B, rowCount=0)
+   :              +- Relation[a#x,b#x] parquet, Statistics(sizeInBytes=1.0 B, rowCount=0)
+   +- Relation[c#x,d#x] parquet, Statistics(sizeInBytes=1.0 B, rowCount=0)
+
+== Physical Plan ==
+AdaptiveSparkPlan isFinalPlan=false
++- Project [c#x]
+   +- Filter (isnotnull(d#x) AND (cast(d#x as bigint) > Subquery subquery#x, [id=#x]))
+      :  +- Subquery subquery#x, [id=#x]
+      :     +- AdaptiveSparkPlan isFinalPlan=false
+      :        +- HashAggregate(keys=[], functions=[max(csales#xL)], output=[tpcds_cmax#xL])
+      :           +- HashAggregate(keys=[], functions=[partial_max(csales#xL)], output=[max#xL])
+      :              +- HashAggregate(keys=[], functions=[sum(b#x)], output=[csales#xL])
+      :                 +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#x]
+      :                    +- HashAggregate(keys=[], functions=[partial_sum(b#x)], output=[sum#xL])
+      :                       +- Project [b#x]
+      :                          +- Filter (isnotnull(a#x) AND (a#x < 100))
+      :                             +- FileScan parquet default.t1[a#x,b#x] Batched: true, DataFilters: [isnotnull(a#x), (a#x < 100)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/yi.zhu/Documents/project/Angerszhuuuu/spark/sql/core/spark..., PartitionFilters: [], PushedFilters: [IsNotNull(a), LessThan(a,100)], ReadSchema: struct<a:int,b:int>

Review comment:
       > The Physical Plan should be normalized.
   
   Hmm, I see how it's used now; updated.
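
   For context, the normalization discussed here is why the golden output above shows stable placeholders like `c#x`, `csales#xL`, and `[id=#x]` instead of real expression and plan IDs, which differ between runs. A minimal sketch of that idea (the `PlanNormalizer` name and regex approach are illustrative assumptions, not Spark's actual implementation):

   ```scala
   // Hypothetical sketch: rewrite run-specific numeric IDs in a plan string
   // (exprIds like c#42, plan IDs like [id=#87]) to the stable placeholder
   // #x, so golden-file comparisons are deterministic across runs.
   object PlanNormalizer {
     private val IdPattern = "#\\d+".r

     def normalize(plan: String): String =
       IdPattern.replaceAllIn(plan, "#x")
   }
   ```

   For example, `normalize("Exchange SinglePartition, [id=#87]")` yields `"Exchange SinglePartition, [id=#x]"`; a typed reference such as `csales#42L` becomes `csales#xL` because only the numeric ID is replaced and the `L` type suffix is kept.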



