[ https://issues.apache.org/jira/browse/SPARK-36809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-36809.
---------------------------------
    Fix Version/s: 3.3.0
       Resolution: Fixed

Issue resolved by pull request 34051
[https://github.com/apache/spark/pull/34051]

> Remove broadcast for InSubqueryExec used in DPP
> -----------------------------------------------
>
>                 Key: SPARK-36809
>                 URL: https://issues.apache.org/jira/browse/SPARK-36809
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: L. C. Hsieh
>            Assignee: Apache Spark
>            Priority: Major
>             Fix For: 3.3.0
>
>
> Currently, InSubqueryExec includes a broadcast variable that holds the
> filtering-side query result for dynamic partition pruning (DPP). This is odd
> because the result is never used on the executors; it is only needed on the
> driver during query planning. We already hold the original result, so at the
> moment we hold two copies of the query result.
>
> Relatedly, pruningHasBenefit estimates whether DPP is beneficial when the
> join type does not support broadcast. Because of the broadcast variable
> above, it also checks the filtering side against the config
> autoBroadcastJoinThreshold. That config is not meant for this purpose, and
> the join is not a broadcast join. Since the broadcast variable is
> unnecessary, we can remove this check and leave the benefit estimation to
> the overhead and the pruning size.
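To make the change to the benefit estimate concrete, here is a minimal sketch in Python. It is a simplified model of the decision described in the issue, not Spark's source: the class `PruningEstimate`, both function names, and all field names are hypothetical, and the exact form of the old threshold check is an assumption based only on the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PruningEstimate:
    """Hypothetical inputs to the benefit check; not a Spark class."""
    pruning_side_bytes: int    # estimated size of the partitioned side to prune
    filter_ratio: float        # estimated fraction of that side DPP would skip
    overhead_bytes: int        # estimated cost of evaluating the filtering side
    filtering_side_bytes: int  # estimated size of the filtering-side result


def has_benefit_with_threshold(e: PruningEstimate, broadcast_threshold: int) -> bool:
    """Assumed model of the old behaviour: the filtering side must also fit
    under the broadcast threshold, even though no broadcast join is planned."""
    fits_threshold = e.filtering_side_bytes <= broadcast_threshold
    return fits_threshold and e.filter_ratio * e.pruning_side_bytes > e.overhead_bytes


def has_benefit(e: PruningEstimate) -> bool:
    """Model of the proposed behaviour: compare only the estimated pruned
    size against the estimated overhead."""
    return e.filter_ratio * e.pruning_side_bytes > e.overhead_bytes


if __name__ == "__main__":
    MiB, GiB = 1024 ** 2, 1024 ** 3
    estimate = PruningEstimate(
        pruning_side_bytes=100 * GiB,
        filter_ratio=0.5,
        overhead_bytes=200 * MiB,
        filtering_side_bytes=50 * MiB,
    )
    # A filtering side larger than the threshold blocks pruning in the old
    # model, although the estimated saving far exceeds the overhead.
    print(has_benefit_with_threshold(estimate, broadcast_threshold=10 * MiB))
    print(has_benefit(estimate))
```

In this model, the two functions differ only when the filtering side exceeds the threshold, which is the case the issue argues should not be decided by a broadcast-join setting.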