yikf commented on issue #7807: URL: https://github.com/apache/incubator-gluten/issues/7807#issuecomment-2456698808
I will try to summarize this problem.

The symptom is that a build key's reference cannot be found in the child's output. During dynamic pruning, the execution plan is usually `ColumnarSubqueryBroadcastExec -> ColumnarBroadcastExchangeExec -> ...`, and the `buildKeys` and the child of `ColumnarSubqueryBroadcastExec` usually [come from the same side of the join](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PlanDynamicPruningFilters.scala#L54). Ideally, then, binding the keys by reference causes no problems. However, when reuse exchange is applied, the output of the child may change, and it may no longer contain the references of `buildKeys`.

When reuse exchange is applied, the name can be used for binding instead. The limitation is that this does not support a child whose output contains multiple attributes with the same name, which is an unreasonable restriction.

The root cause is that Gluten currently transforms the relation inside `ColumnarSubqueryBroadcastExec`. We should follow Spark's approach and [perform the transformation in the child](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L193). That way, even if reuse exchange occurs, the `buildKeys` (taken from the `BroadcastMode`, not from `SubqueryBroadcastExec`) and the output of the child always stay consistent. `ColumnarSubqueryBroadcastExec` then only needs to fetch the required value from the relation built by the child (which at that point holds the `buildKeys` output) [by index](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SubqueryBroadcastExec.scala#L93).

The permanent fix for this problem is divided into two stages:

1. Short-term solution: after reuse exchange, bind by name.
   This solution is similar to the earlier implementation in https://github.com/apache/incubator-gluten/pull/7704 but supports more cases. For example, it supports a key that contains multiple attributes. Its limitation is that, after reuse exchange, the child's output cannot contain multiple attributes with the same name.
2. Long-term solution: perform the transformation in the child, handling each case differently. For example, a regular `BroadcastHashJoinExec` still executes column-based and needs no additional transformation, whereas `ColumnarSubqueryBroadcastExec` does.

I filed a PR for the short-term solution. Could you help check it in your environment? @leoluan2009
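To make the binding problem concrete, here is a minimal, self-contained Scala sketch. It does not use Spark or Gluten classes. `Attr`, `bindByExprId`, `bindByName`, `bindKey`, `transformInChild`, and `valuesAtIndex` are hypothetical names invented for illustration, with `Attr` standing in for an attribute that has a name and an expression ID. The sketch models three ideas:

- Binding a build key by expression ID.
- Falling back to binding by name, rejecting ambiguous duplicate names as the short-term solution does.
- The long-term idea of projecting rows onto the build keys in the child, so that the subquery only selects a column by index.

```scala
// Hypothetical model for illustration only; these are not Spark or Gluten classes.
// An attribute is identified by its exprId; the name is only a label.
final case class Attr(name: String, exprId: Long)

// Normal binding: find the key in the child's output by exprId.
// After reuse exchange the child's output may carry different exprIds, so this can fail.
def bindByExprId(key: Attr, output: Seq[Attr]): Option[Int] = {
  val i = output.indexWhere(_.exprId == key.exprId)
  if (i >= 0) Some(i) else None
}

// Short-term fallback (hypothetical): bind by name, refusing outputs where the name is ambiguous.
def bindByName(key: Attr, output: Seq[Attr]): Either[String, Int] =
  output.zipWithIndex.filter { case (a, _) => a.name == key.name } match {
    case Seq((_, i)) => Right(i)
    case Seq()       => Left(s"key ${key.name} not found in child output")
    case _           => Left(s"child output has multiple attributes named ${key.name}")
  }

// Try exprId first, then fall back to the name.
def bindKey(key: Attr, output: Seq[Attr]): Either[String, Int] =
  bindByExprId(key, output) match {
    case Some(i) => Right(i)
    case None    => bindByName(key, output)
  }

// Long-term idea (hypothetical): the child projects each row onto the build-key ordinals,
// which are resolved where the child's own output is known.
def transformInChild(rows: Seq[Seq[Any]], keyOrdinals: Seq[Int]): Seq[Seq[Any]] =
  rows.map(row => keyOrdinals.map(i => row(i)))

// The subquery then only needs the position of its key among the build keys.
def valuesAtIndex(keyRows: Seq[Seq[Any]], index: Int): Seq[Any] =
  keyRows.map(row => row(index)).distinct
```

Consider a key `k` with exprId 1 and a reused child whose output is `v#8, k#7`. Here `bindByExprId` finds nothing, so `bindKey` falls back to the name and returns `Right(1)`. An output of `k#7, k#9` instead yields a `Left` because the name is ambiguous. In the index-based variant of this model, that ambiguity cannot arise, because each key's position among the build keys is fixed when the child builds the relation.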
