ulysses-you commented on PR #36845:
URL: https://github.com/apache/spark/pull/36845#issuecomment-1153742980

   I can reproduce it with:
   ```sql
   CREATE TABLE t1(c1 int) USING PARQUET PARTITIONED BY (p1 string);
   CREATE TABLE t2(c2 int) USING PARQUET PARTITIONED BY (p2 string);
   
   SELECT * from (
   SELECT /*+ merge(t1) */ p1 FROM t1 JOIN t2 ON c1 = c2
   ) x JOIN t2 ON p1 = p2
   WHERE
   c2 > 0
   ```
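
   For completeness, the same reproduction can be run from a Scala session. This
   is just a minimal sketch; both features involved are on by default in recent
   releases, and the configs below are only set explicitly to make the
   preconditions obvious:

   ```scala
   // AQE and DPP are enabled by default in Spark 3.2+, set explicitly here for clarity.
   spark.conf.set("spark.sql.adaptive.enabled", "true")
   spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")

   spark.sql("CREATE TABLE t1(c1 int) USING PARQUET PARTITIONED BY (p1 string)")
   spark.sql("CREATE TABLE t2(c2 int) USING PARQUET PARTITIONED BY (p2 string)")

   val df = spark.sql("""
     SELECT * FROM (
       SELECT /*+ merge(t1) */ p1 FROM t1 JOIN t2 ON c1 = c2
     ) x JOIN t2 ON p1 = p2
     WHERE c2 > 0
   """)
   df.explain(true)  // inspect the adaptive plan
   df.collect()      // trigger execution to reproduce the failure
   ```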
   
   The reason is that, when both AQE and DPP are enabled, a broadcast exchange is
   inserted on top of the `AdaptiveSparkPlanExec` in the DPP subquery when the
   exchange is reusable for the broadcast. There is some hacky code handling this
   behavior during AQE `re-optimize`:
   
   ```scala
   // When both enabling AQE and DPP, `PlanAdaptiveDynamicPruningFilters` rule will
   // add the `BroadcastExchangeExec` node manually in the DPP subquery,
   // not through `EnsureRequirements` rule. Therefore, when the DPP subquery is complicated
   // and need to be re-optimized, AQE also need to manually insert the `BroadcastExchangeExec`
   // node to prevent the loss of the `BroadcastExchangeExec` node in DPP subquery.
   // Here, we also need to avoid to insert the `BroadcastExchangeExec` node when the newPlan
   // is already the `BroadcastExchangeExec` plan after apply the `LogicalQueryStageStrategy` rule.
   val finalPlan = currentPhysicalPlan match {
     case b: BroadcastExchangeLike
       if (!newPlan.isInstanceOf[BroadcastExchangeLike]) =>
       b.withNewChildren(Seq(newPlan))
     case _ => newPlan
   }
   ```
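
   To make the intent of that guard concrete, here is a tiny, self-contained toy
   model of the re-wrapping it performs. The node classes below are invented
   stand-ins, not Spark classes; they only mirror the shape of the match:

   ```scala
   // Invented stand-ins for the real plan nodes, purely for illustration.
   sealed trait Node
   case class Broadcast(child: Node) extends Node   // plays BroadcastExchangeLike
   case class QueryStage(child: Node) extends Node  // plays a broadcast query stage wrapper
   case class Scan(name: String) extends Node       // any (re-optimized) subtree

   // Mirrors the guard above: keep a broadcast on top of the re-optimized plan
   // whenever the previous root was a broadcast exchange.
   def rewrap(currentRoot: Node, newPlan: Node): Node = currentRoot match {
     case b: Broadcast if !newPlan.isInstanceOf[Broadcast] => b.copy(child = newPlan)
     case _ => newPlan
   }

   // Works when the root is the broadcast exchange itself...
   assert(rewrap(Broadcast(Scan("t1")), Scan("t1_reopt")) == Broadcast(Scan("t1_reopt")))
   // ...but silently drops the broadcast when the exchange is hidden inside a
   // query-stage wrapper, which is exactly the case described next.
   assert(rewrap(QueryStage(Broadcast(Scan("t1"))), Scan("t1_reopt")) == Scan("t1_reopt"))
   ```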
   
   However, this code does not match if the top-level broadcast exchange is
   wrapped in a query stage. That can happen when the broadcast exchange added by
   DPP starts running before the normal broadcast exchange (e.g. the one
   introduced by the join).
   
   So we can match `BroadcastQueryStage(_, ReusedExchangeExec, _)` and skip the
   re-optimization: there is no point in optimizing the child of a reused exchange
   that exists only for the broadcast.
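
   A rough sketch of what that could look like in the snippet quoted above,
   assuming the `BroadcastQueryStageExec` and `ReusedExchangeExec` node types and
   the `currentPhysicalPlan`/`newPlan` values already in scope there; this is only
   meant to illustrate the proposal, not a final patch:

   ```scala
   val finalPlan = currentPhysicalPlan match {
     // The DPP-inserted broadcast exchange has already become a query stage that
     // merely reuses another broadcast exchange; there is nothing worth
     // re-optimizing underneath it, so keep the current plan unchanged.
     case q: BroadcastQueryStageExec if q.plan.isInstanceOf[ReusedExchangeExec] =>
       currentPhysicalPlan
     // Original behavior: re-attach the broadcast exchange on top of the
     // re-optimized plan so the DPP subquery keeps its broadcast.
     case b: BroadcastExchangeLike if !newPlan.isInstanceOf[BroadcastExchangeLike] =>
       b.withNewChildren(Seq(newPlan))
     case _ => newPlan
   }
   ```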

