[I] Use external spillable map for cachedAllInputFileSlices in BaseHoodieTableFileIndex [hudi]

via GitHub Sun, 30 Nov 2025 03:22:35 -0800


hudi-bot opened a new issue, #16781:
URL: https://github.com/apache/hudi/issues/16781


   `cachedAllInputFileSlices` in `BaseHoodieTableFileIndex` is backed by a 
HashMap. When multiple queries run against large tables (for example, 1000+ 
partitions with 20K file groups/slices each) on a single Spark driver, we 
cache all the file slices for the queried partitions in memory. This overwhelms 
the driver, especially when multiple downstream consumers read the same table 
and the ingestion/Spark SQL runs on the same driver.
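
To illustrate the idea in the title, here is a minimal sketch of a map that keeps a bounded number of entries on the heap and spills the rest to local disk. It is not Hudi's implementation: the class `SpillableMapSketch`, its entry-count threshold, and the one-temp-file-per-entry layout are all assumptions made for illustration, and values are assumed to be `Serializable`.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not a Hudi class: holds at most maxInMemoryEntries
 * values on the heap and writes every further value to its own temp file.
 */
final class SpillableMapSketch<K, V extends Serializable> implements AutoCloseable {
    private final int maxInMemoryEntries;
    private final Map<K, V> inMemory = new HashMap<>();
    private final Map<K, Path> spilled = new HashMap<>(); // key -> file holding the value
    private final Path spillDir;

    SpillableMapSketch(int maxInMemoryEntries) {
        this.maxInMemoryEntries = maxInMemoryEntries;
        try {
            this.spillDir = Files.createTempDirectory("file-slice-spill");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void put(K key, V value) {
        try {
            // Drop any stale on-disk copy of this key before storing the new value.
            Path old = spilled.remove(key);
            if (old != null) {
                Files.deleteIfExists(old);
            }
            // Keep the value on the heap while there is room (or the key is already there).
            if (inMemory.containsKey(key) || inMemory.size() < maxInMemoryEntries) {
                inMemory.put(key, value);
                return;
            }
            // Otherwise serialize the value to disk and remember only its path.
            Path file = Files.createTempFile(spillDir, "entry-", ".bin");
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
                out.writeObject(value);
            }
            spilled.put(key, file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    V get(K key) {
        V hot = inMemory.get(key);
        if (hot != null) {
            return hot;
        }
        Path file = spilled.get(key);
        if (file == null) {
            return null; // key is unknown
        }
        // Spilled values are read back and deserialized on every lookup.
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            @SuppressWarnings("unchecked")
            V value = (V) in.readObject();
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    int size() {
        return inMemory.size() + spilled.size();
    }

    int inMemoryCount() {
        return inMemory.size();
    }

    @Override
    public void close() {
        try {
            for (Path file : spilled.values()) {
                Files.deleteIfExists(file);
            }
            spilled.clear();
            inMemory.clear();
            Files.deleteIfExists(spillDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the key would be a partition path and the value a serializable list of file slices for that partition. The trade-off is that lookups for spilled partitions pay a disk read and deserialization, in exchange for a bounded heap footprint on the driver.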
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-8868
   - Type: Improvement


