[ https://issues.apache.org/jira/browse/SPARK-29092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930237#comment-16930237 ]
Dilip Biswal commented on SPARK-29092:
--------------------------------------

I am looking into this.

EXPLAIN FORMATTED does not work well with DPP
---------------------------------------------

                Key: SPARK-29092
                URL: https://issues.apache.org/jira/browse/SPARK-29092
            Project: Spark
         Issue Type: Improvement
         Components: SQL
   Affects Versions: 3.0.0
           Reporter: Xiao Li
           Priority: Major

{code:java}
withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true",
    SQLConf.DYNAMIC_PARTITION_PRUNING_REUSE_BROADCAST.key -> "false") {
  withTable("df1", "df2") {
    spark.range(1000)
      .select(col("id"), col("id").as("k"))
      .write
      .partitionBy("k")
      .format(tableFormat)
      .mode("overwrite")
      .saveAsTable("df1")

    spark.range(100)
      .select(col("id"), col("id").as("k"))
      .write
      .partitionBy("k")
      .format(tableFormat)
      .mode("overwrite")
      .saveAsTable("df2")

    sql("EXPLAIN FORMATTED SELECT df1.id, df2.k FROM df1 JOIN df2 ON df1.k = df2.k AND df2.id < 2")
      .show(false)

    sql("EXPLAIN EXTENDED SELECT df1.id, df2.k FROM df1 JOIN df2 ON df1.k = df2.k AND df2.id < 2")
      .show(false)
  }
}
{code}

The output of EXPLAIN EXTENDED is as expected: the scan of df1 carries the dynamicpruningexpression partition filter.

{code:java}
== Physical Plan ==
*(2) Project [id#2721L, k#2724L]
+- *(2) BroadcastHashJoin [k#2722L], [k#2724L], Inner, BuildRight
   :- *(2) ColumnarToRow
   :  +- FileScan parquet default.df1[id#2721L,k#2722L] Batched: true, DataFilters: [], Format: Parquet, Location: PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., PartitionFilters: [isnotnull(k#2722L), dynamicpruningexpression(k#2722L IN subquery2741)], PushedFilters: [], ReadSchema: struct<id:bigint>
   :        +- Subquery subquery2741, [id=#358]
   :           +- *(2) HashAggregate(keys=[k#2724L], functions=[], output=[k#2724L#2740L])
   :              +- Exchange hashpartitioning(k#2724L, 5), true, [id=#354]
   :                 +- *(1) HashAggregate(keys=[k#2724L], functions=[], output=[k#2724L])
   :                    +- *(1) Project [k#2724L]
   :                       +- *(1) Filter (isnotnull(id#2723L) AND (id#2723L < 2))
   :                          +- *(1) ColumnarToRow
   :                             +- FileScan parquet default.df2[id#2723L,k#2724L] Batched: true, DataFilters: [isnotnull(id#2723L), (id#2723L < 2)], Format: Parquet, Location: PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., PartitionFilters: [isnotnull(k#2724L)], PushedFilters: [IsNotNull(id), LessThan(id,2)], ReadSchema: struct<id:bigint>
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true])), [id=#379]
      +- *(1) Project [k#2724L]
         +- *(1) Filter (isnotnull(id#2723L) AND (id#2723L < 2))
            +- *(1) ColumnarToRow
               +- FileScan parquet default.df2[id#2723L,k#2724L] Batched: true, DataFilters: [isnotnull(id#2723L), (id#2723L < 2)], Format: Parquet, Location: PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., PartitionFilters: [isnotnull(k#2724L)], PushedFilters: [IsNotNull(id), LessThan(id,2)], ReadSchema: struct<id:bigint>
{code}

However, the FileScan node in the output of EXPLAIN FORMATTED does not show the effect of DPP: the scan of df1 lists only its output columns and no dynamic pruning partition filter.

{code:java}
* Project (9)
+- * BroadcastHashJoin Inner BuildRight (8)
   :- * ColumnarToRow (2)
   :  +- Scan parquet default.df1 (1)
   +- BroadcastExchange (7)
      +- * Project (6)
         +- * Filter (5)
            +- * ColumnarToRow (4)
               +- Scan parquet default.df2 (3)

(1) Scan parquet default.df1
Output: [id#2716L, k#2717L]
{code}
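For reference, a minimal sketch of how the gap can be observed programmatically, assuming the df1/df2 tables from the reproduction above already exist and spark is the active SparkSession:

{code:java}
// Capture the EXPLAIN FORMATTED text and check whether the dynamic-pruning
// partition filter is surfaced. This is only an illustrative check, not part
// of the original report.
val formatted = spark.sql(
    "EXPLAIN FORMATTED SELECT df1.id, df2.k FROM df1 JOIN df2 " +
      "ON df1.k = df2.k AND df2.id < 2")
  .collect()
  .map(_.getString(0))
  .mkString("\n")

// With the current behaviour this prints false, even though EXPLAIN EXTENDED
// shows dynamicpruningexpression(...) in the PartitionFilters of the df1 scan.
println(formatted.contains("dynamicpruningexpression"))
{code}

Running the same check with EXPLAIN EXTENDED in place of EXPLAIN FORMATTED returns true, which is the behaviour EXPLAIN FORMATTED should also reflect.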