Xiao Li created SPARK-29092:
-------------------------------

             Summary: EXPLAIN FORMATTED does not work well with DPP
                 Key: SPARK-29092
                 URL: https://issues.apache.org/jira/browse/SPARK-29092
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Xiao Li


 
{code:java}
withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true",
  SQLConf.DYNAMIC_PARTITION_PRUNING_REUSE_BROADCAST.key -> "false") {
  withTable("df1", "df2") {
    spark.range(1000)
      .select(col("id"), col("id").as("k"))
      .write
      .partitionBy("k")
      .format(tableFormat)
      .mode("overwrite")
      .saveAsTable("df1")

    spark.range(100)
      .select(col("id"), col("id").as("k"))
      .write
      .partitionBy("k")
      .format(tableFormat)
      .mode("overwrite")
      .saveAsTable("df2")

    sql("EXPLAIN FORMATTED SELECT df1.id, df2.k FROM df1 JOIN df2 ON df1.k = df2.k AND df2.id < 2")
      .show(false)

    sql("EXPLAIN EXTENDED SELECT df1.id, df2.k FROM df1 JOIN df2 ON df1.k = df2.k AND df2.id < 2")
      .show(false)
  }
}
{code}
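For anyone reproducing this outside the test harness, here is a minimal spark-shell sketch of the same scenario. Assumptions: a running SparkSession named {{spark}} (as in spark-shell), and "parquet" standing in for the suite's {{tableFormat}}.
{code:java}
// Sketch only: assumes a running SparkSession `spark` (e.g. in spark-shell)
// and uses "parquet" in place of the test suite's `tableFormat`.
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.internal.SQLConf

spark.conf.set(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key, "true")
spark.conf.set(SQLConf.DYNAMIC_PARTITION_PRUNING_REUSE_BROADCAST.key, "false")

spark.range(1000).select(col("id"), col("id").as("k"))
  .write.partitionBy("k").format("parquet").mode("overwrite").saveAsTable("df1")
spark.range(100).select(col("id"), col("id").as("k"))
  .write.partitionBy("k").format("parquet").mode("overwrite").saveAsTable("df2")

// EXTENDED shows the dynamicpruningexpression on the df1 scan; FORMATTED does not.
spark.sql("EXPLAIN FORMATTED SELECT df1.id, df2.k FROM df1 JOIN df2 ON df1.k = df2.k AND df2.id < 2").show(false)
spark.sql("EXPLAIN EXTENDED SELECT df1.id, df2.k FROM df1 JOIN df2 ON df1.k = df2.k AND df2.id < 2").show(false)
{code}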
The output of EXPLAIN EXTENDED is as expected: the FileScan node for df1 carries a dynamicpruningexpression in its PartitionFilters.
{code:java}

== Physical Plan ==
*(2) Project [id#2721L, k#2724L]
+- *(2) BroadcastHashJoin [k#2722L], [k#2724L], Inner, BuildRight
   :- *(2) ColumnarToRow
   :  +- FileScan parquet default.df1[id#2721L,k#2722L] Batched: true, DataFilters: [], Format: Parquet, Location: PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., PartitionFilters: [isnotnull(k#2722L), dynamicpruningexpression(k#2722L IN subquery2741)], PushedFilters: [], ReadSchema: struct<id:bigint>
   :        +- Subquery subquery2741, [id=#358]
   :           +- *(2) HashAggregate(keys=[k#2724L], functions=[], output=[k#2724L#2740L])
   :              +- Exchange hashpartitioning(k#2724L, 5), true, [id=#354]
   :                 +- *(1) HashAggregate(keys=[k#2724L], functions=[], output=[k#2724L])
   :                    +- *(1) Project [k#2724L]
   :                       +- *(1) Filter (isnotnull(id#2723L) AND (id#2723L < 2))
   :                          +- *(1) ColumnarToRow
   :                             +- FileScan parquet default.df2[id#2723L,k#2724L] Batched: true, DataFilters: [isnotnull(id#2723L), (id#2723L < 2)], Format: Parquet, Location: PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., PartitionFilters: [isnotnull(k#2724L)], PushedFilters: [IsNotNull(id), LessThan(id,2)], ReadSchema: struct<id:bigint>
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true])), [id=#379]
      +- *(1) Project [k#2724L]
         +- *(1) Filter (isnotnull(id#2723L) AND (id#2723L < 2))
            +- *(1) ColumnarToRow
               +- FileScan parquet default.df2[id#2723L,k#2724L] Batched: true, DataFilters: [isnotnull(id#2723L), (id#2723L < 2)], Format: Parquet, Location: PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., PartitionFilters: [isnotnull(k#2724L)], PushedFilters: [IsNotNull(id), LessThan(id,2)], ReadSchema: struct<id:bigint>

{code}
However, the FileScan node in the output of EXPLAIN FORMATTED does not show the effect of DPP:
{code:java}
== Physical Plan ==
* Project (9)
+- * BroadcastHashJoin Inner BuildRight (8)
   :- * ColumnarToRow (2)
   :  +- Scan parquet default.df1 (1)
   +- BroadcastExchange (7)
      +- * Project (6)
         +- * Filter (5)
            +- * ColumnarToRow (4)
               +- Scan parquet default.df2 (3)

(1) Scan parquet default.df1
Output: [id#2716L, k#2717L]
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
