[I] The failure of spark's `ReusedExchange` rule optimization leads to data read amplification [hudi]

via GitHub Thu, 23 Oct 2025 05:14:10 -0700


TheR1sing3un opened a new issue, #14146:
URL: https://github.com/apache/hudi/issues/14146


   ### Bug Description
   
   **What happened:**
   
   Spark has a rule that reuses equivalent `Exchange` plans to avoid processing 
the same data repeatedly. However, in some scenarios, such as complex SQL, 
Spark's relation cache can become stale, and we may then create two Hoodie 
Relation instances for the same table. When the `ReusedExchange` check runs 
later, it compares each FileScanExec for equality after canonicalization. 
The current implementation treats these scans as unequal, so we cannot take 
advantage of Spark's reuse optimization, and the same data is read more than 
once.
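
To make the mechanism concrete, here is a minimal toy sketch in Python. It is not Spark or Hudi code: every class and function name below is hypothetical, and it assumes (the report does not say so) that the mismatch comes from the two relation instances being compared by object identity rather than by value. It only shows how an equality-based reuse check stops matching once two scans wrap distinct relation instances.

```python
from dataclasses import dataclass


class IdentityRelation:
    """Hypothetical relation with default, identity-based equality."""

    def __init__(self, table_path, schema):
        self.table_path = table_path
        self.schema = schema


@dataclass(frozen=True)
class ValueRelation:
    """Hypothetical relation compared field by field."""
    table_path: str
    schema: tuple


@dataclass(frozen=True)
class Scan:
    """Toy stand-in for a file scan node; equal only if its relation is equal."""
    relation: object


@dataclass(frozen=True)
class Exchange:
    """Toy stand-in for an exchange node wrapping a scan."""
    child: Scan


def count_reused(exchanges):
    """Count exchanges that compare equal to an earlier one in the list."""
    seen = []
    reused = 0
    for ex in exchanges:
        if any(ex == prev for prev in seen):
            reused += 1
        else:
            seen.append(ex)
    return reused


def build_plan(relation_cls):
    # Two separate instances describing the same table, as could happen
    # when a cached relation is dropped and then built again.
    a = relation_cls("/tmp/hudi_table", ("id", "name"))
    b = relation_cls("/tmp/hudi_table", ("id", "name"))
    return [Exchange(Scan(a)), Exchange(Scan(b))]


identity_reused = count_reused(build_plan(IdentityRelation))
value_reused = count_reused(build_plan(ValueRelation))
print(identity_reused, value_reused)
```

In this toy model, the identity-compared relations never match, so both exchanges would be executed, while the value-compared relations let the second exchange reuse the first.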
   
   
   
   
   **What you expected:**
   
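   As described above: scans over the same Hudi table should compare as equal 
after canonicalization, so that Spark's `ReusedExchange` optimization applies.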
   
   **Steps to reproduce:**
   1.
   2.
   3.
   
   
   ### Environment
   
   **Hudi version:**
   **Query engine:** (Spark/Flink/Trino etc)
   **Relevant configs:**
   
   
   ### Logs and Stack Trace
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
