[GitHub] [spark] StevenChenDatabricks commented on pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

via GitHub Tue, 14 Mar 2023 11:27:29 -0700


StevenChenDatabricks commented on PR #40385:
URL: https://github.com/apache/spark/pull/40385#issuecomment-1468621742


   @cloud-fan Thanks for the idea and response!
   
   1. I don't think this issue affects `ReusedSubquery`, because of how 
it's processed and printed. The current algorithm finds all Subquery nodes 
(including `ReusedSubquery`) and, for each Subquery, traverses the Subquery 
subtree to generate the IDs if they are missing. 
   Furthermore, a `ReusedSubquery` does not print the details of the Subquery 
it reuses, whereas a `ReusedExchange` does print the ID of the Exchange being 
reused. For a `ReusedSubquery`, all that is printed is this line:
   ```Subquery:5 Hosting operator id = 50 Hosting Expression = ReusedSubquery 
Subquery scalar-subquery#31, [id=#32]```
   `Hosting operator id` is the parent operator that contains the 
`ReusedSubquery`. The subtree of the `ReusedSubquery` is not printed anywhere 
and the `ReusedSubquery` node itself is not printed in the main plan tree 
either. Even if there are non-existing children, the issue is not surfaced in 
the Explain plan by default.
   I guess there's still a chance it might affect the Spark UI, where the IDs 
in the subtree of a `ReusedSubquery` could be incorrect because they were 
generated in a previous AQE iteration, but I'm not sure. I think it's best to 
wait and see if a ticket/bug like this is ever reported. 
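
   To make the "generate the IDs if they are missing" step concrete, here is a 
minimal Python sketch of a traversal that only assigns IDs to nodes lacking 
one. It is a toy model and not Spark's actual implementation: the `Node` class 
and `assign_missing_ids` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a plan operator; not a Spark class."""
    name: str
    children: List["Node"] = field(default_factory=list)
    op_id: Optional[int] = None  # None means no ID has been generated yet


def assign_missing_ids(root: Node, next_id: int) -> int:
    """Post-order walk that gives an ID to every node lacking one.

    Returns the next unused ID so numbering can continue across subtrees.
    """
    for child in root.children:
        next_id = assign_missing_ids(child, next_id)
    if root.op_id is None:
        root.op_id = next_id
        next_id += 1
    return next_id


# A subquery subtree where the scan already has an ID from an earlier pass.
scan = Node("Scan", op_id=1)
agg = Node("HashAggregate", [scan])
subquery = Node("Subquery", [agg])
next_free = assign_missing_ids(subquery, next_id=2)
```

   In this sketch the pre-existing ID on `scan` is left alone and only `agg` 
and `subquery` receive new IDs, which is the behaviour described above.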
   
   2. My fix detects all the `ReusedExchange` nodes with non-existing children 
and generates IDs for them. I guess your question is: what if multiple 
`ReusedExchange` nodes reference the same non-existing `Exchange`? That's a 
good point, and I need to account for that edge case in the code in case it is 
possible.
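
   The edge case can be sketched as follows: when several `ReusedExchange` 
nodes point at the same non-existing `Exchange`, that `Exchange` should receive 
one ID rather than one per reference. This is a minimal Python model under 
assumptions, not the code in the PR; `Node`, `walk`, and 
`assign_ids_to_missing_targets` are hypothetical names, and the sketch only 
numbers the `Exchange` itself, not its subtree.

```python
class Node:
    """Hypothetical plan operator; `reuses` is set only on a ReusedExchange."""

    def __init__(self, name, children=None, reuses=None, op_id=None):
        self.name = name
        self.children = children or []
        self.reuses = reuses  # the Exchange this node reuses, if any
        self.op_id = op_id


def walk(root):
    """Yield every node reachable through `children`, parent first."""
    yield root
    for child in root.children:
        yield from walk(child)


def assign_ids_to_missing_targets(root, next_id):
    """Give one ID to each reused Exchange that is absent from the plan tree."""
    in_tree = {id(n) for n in walk(root)}
    seen = set()
    missing = []
    for node in walk(root):
        target = node.reuses
        if target is None or id(target) in in_tree or id(target) in seen:
            continue  # not a reuse, target is in the tree, or already handled
        seen.add(id(target))
        missing.append(target)
    for target in missing:
        if target.op_id is None:
            target.op_id = next_id
            next_id += 1
    return missing


# Two ReusedExchange nodes share one Exchange that is not in the plan tree.
orphan = Node("Exchange")
r1 = Node("ReusedExchange", reuses=orphan, op_id=1)
r2 = Node("ReusedExchange", reuses=orphan, op_id=2)
plan = Node("Join", [r1, r2], op_id=3)
missing = assign_ids_to_missing_targets(plan, next_id=4)
```

   Tracking targets by object identity means the shared `Exchange` is 
collected once, so it gets a single ID even though two nodes reference it.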
   
   With regard to your idea for a section of non-existing `Exchanges`: we 
already print each operator exactly once in the node details section. As 
shown in the PR description, I currently print the plan subtree of the 
non-existing `Exchange` below the `ReusedExchange` (since that subtree is not 
shown anywhere else), along with its node details, while maintaining uniqueness.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


