ulysses-you commented on PR #36845: URL: https://github.com/apache/spark/pull/36845#issuecomment-1153742980
I can reproduce it with:

```sql
CREATE TABLE t1(c1 int) USING PARQUET PARTITIONED BY (p1 string);
CREATE TABLE t2(c2 int) USING PARQUET PARTITIONED BY (p2 string);

SELECT * FROM (
  SELECT /*+ merge(t1) */ p1 FROM t1 JOIN t2 ON c1 = c2
) x JOIN t2 ON p1 = p2 WHERE c2 > 0
```

The reason is that with AQE + DPP, a broadcast exchange is inserted at the top of `AdaptiveSparkPlanExec` when it is reusable for broadcast. There is some hacky code handling this behavior during AQE `re-optimize`:

```scala
// When both enabling AQE and DPP, `PlanAdaptiveDynamicPruningFilters` rule will
// add the `BroadcastExchangeExec` node manually in the DPP subquery,
// not through `EnsureRequirements` rule. Therefore, when the DPP subquery is complicated
// and need to be re-optimized, AQE also need to manually insert the `BroadcastExchangeExec`
// node to prevent the loss of the `BroadcastExchangeExec` node in DPP subquery.
// Here, we also need to avoid to insert the `BroadcastExchangeExec` node when the newPlan
// is already the `BroadcastExchangeExec` plan after apply the `LogicalQueryStageStrategy` rule.
val finalPlan = currentPhysicalPlan match {
  case b: BroadcastExchangeLike if (!newPlan.isInstanceOf[BroadcastExchangeLike]) =>
    b.withNewChildren(Seq(newPlan))
  case _ => newPlan
}
```

However, this code does not match when the top-level broadcast exchange is wrapped in a query stage. That happens if the broadcast exchange added by DPP runs before the normal broadcast exchange (e.g. the one introduced by a join). So we can match `BroadcastQueryStage(_, ReusedExchangeExec, _)` and skip the optimization: there is no point in optimizing the child of a reused exchange that exists only for broadcast.
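To illustrate the shape of the fix, here is a minimal, self-contained sketch. It models plan nodes as a toy ADT (these case classes are hypothetical stand-ins, not Spark's real `SparkPlan` hierarchy) and shows how adding a `BroadcastQueryStage`-wrapping-`ReusedExchange` case before the existing `BroadcastExchangeLike` case makes re-optimization a no-op for that plan:

```scala
// Toy model of physical plan nodes -- hypothetical, NOT Spark's real API.
sealed trait Plan
case class BroadcastExchange(child: Plan) extends Plan  // stands in for BroadcastExchangeLike
case object ReusedExchange extends Plan                 // stands in for ReusedExchangeExec
case class BroadcastQueryStage(id: Int, plan: Plan) extends Plan
case class Leaf(name: String) extends Plan

// Sketch of the re-optimize plan selection with the proposed extra case.
def chooseFinalPlan(currentPhysicalPlan: Plan, newPlan: Plan): Plan =
  currentPhysicalPlan match {
    // Proposed fix: the broadcast exchange is wrapped in a query stage that
    // merely reuses another exchange, so skip re-optimization entirely.
    case stage @ BroadcastQueryStage(_, ReusedExchange) => stage
    // Existing behavior: re-attach the DPP-inserted broadcast exchange on top
    // of the re-optimized child, unless the new plan already broadcasts.
    case BroadcastExchange(_) if !newPlan.isInstanceOf[BroadcastExchange] =>
      BroadcastExchange(newPlan)
    case _ => newPlan
  }
```

Under this toy model, `chooseFinalPlan(BroadcastQueryStage(0, ReusedExchange), Leaf("reopt"))` returns the query stage unchanged, while `chooseFinalPlan(BroadcastExchange(Leaf("old")), Leaf("reopt"))` still wraps the new plan in a broadcast exchange as before.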