[PR] fix: unpersist cached objects in SqlQueryEqualityPreCommitValidator [hudi]

via GitHub Sat, 17 Jan 2026 16:00:42 -0800


suryaprasanna opened a new pull request, #17931:
URL: https://github.com/apache/hudi/pull/17931


   ### Describe the issue this Pull Request addresses
   
   SqlQueryEqualityPreCommitValidator caches Spark Dataset objects but never 
unpersists them after use. The cached data stays resident, which can cause 
memory pressure in long-running validation operations.
   
   ### Summary and Changelog
   
   This PR ensures cached Spark Dataset objects are properly unpersisted after 
use in validation queries.
   
   **Changes:**
   - Added a try-finally block so that unpersist() is called on the cached 
datasets even when validation fails or an exception is thrown
   - Initialized prevRows and newRows to null before the try block so the 
finally block can release whichever datasets were actually created
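
The cleanup pattern described in the changes above can be sketched roughly as
follows. This is a minimal illustration, not the code from this PR: `CachedRows`,
`runQuery`, `validate`, and the `IllegalStateException` are hypothetical
stand-ins so that the example runs without Spark on the classpath. Only the
names `prevRows` and `newRows` and the null-initialization plus try-finally
shape come from the description.

```java
import java.util.ArrayList;
import java.util.List;

public class UnpersistSketch {

    /** Hypothetical stand-in for a cached Spark Dataset; not a Spark or Hudi class. */
    static final class CachedRows {
        private final List<String> rows;
        private boolean cached;

        CachedRows(List<String> rows) {
            this.rows = rows;
        }

        CachedRows cache() {
            cached = true;
            return this;
        }

        void unpersist() {
            cached = false;
        }

        boolean isCached() {
            return cached;
        }

        List<String> rows() {
            return rows;
        }
    }

    /** Records every stand-in created, so callers can check that none stays cached. */
    static final List<CachedRows> created = new ArrayList<>();

    /** Hypothetical helper: pretends to run a query and cache its result. */
    static CachedRows runQuery(List<String> result) {
        CachedRows rows = new CachedRows(result).cache();
        created.add(rows);
        return rows;
    }

    /** Compares two query results and always releases whatever was cached. */
    static void validate(List<String> before, List<String> after) {
        // Declared as null outside the try block so the finally block can see them
        // even if an exception is thrown before both are assigned.
        CachedRows prevRows = null;
        CachedRows newRows = null;
        try {
            prevRows = runQuery(before);
            newRows = runQuery(after);
            if (!prevRows.rows().equals(newRows.rows())) {
                throw new IllegalStateException("Query results differ before and after commit");
            }
        } finally {
            // Runs on success and on failure; the null checks cover partial initialization.
            if (prevRows != null) {
                prevRows.unpersist();
            }
            if (newRows != null) {
                newRows.unpersist();
            }
        }
    }

    public static void main(String[] args) {
        validate(List.of("a", "b"), List.of("a", "b"));
        try {
            validate(List.of("a"), List.of("b"));
        } catch (IllegalStateException e) {
            System.out.println("validation failed: " + e.getMessage());
        }
        boolean anyCached = created.stream().anyMatch(CachedRows::isCached);
        System.out.println("any still cached: " + anyCached);
    }
}
```

Declaring both references before the try block is what lets the finally block
distinguish "never created" from "created and still cached", so a failure in
the first query does not trigger a NullPointerException during cleanup.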
   
   ### Impact
   
   Reduces memory consumption during pre-commit validation by properly 
releasing cached Spark datasets. No public API changes or user-facing feature 
changes.
   
   ### Risk Level
   
   **Low** - Defensive change that adds proper resource cleanup. The validation 
logic remains unchanged; the PR only adds cleanup code to prevent memory leaks.
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

