hudi-bot opened a new issue, #16781:
URL: https://github.com/apache/hudi/issues/16781
{{cachedAllInputFileSlices}} in {{BaseHoodieTableFileIndex}} is backed by a
HashMap. When multiple queries run against large tables (for example, 1000+
partitions with 20K file groups/slices per partition) on a single Spark driver,
all the file slices for the queried partitions are cached. This overwhelms the
driver, especially when there are multiple downstream consumers for the same
table and ingestion/Spark SQL is running on the same driver.
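To give a rough sense of scale for the example above, here is a back-of-the-envelope sketch. It only multiplies the figures quoted in this issue. The class name, the method names, and especially the per-slice byte size are assumptions made for illustration, not values measured from Hudi.

```java
public class SliceCacheEstimate {
    // Number of file slices held if every slice of every queried partition is cached.
    static long totalSlices(long partitions, long slicesPerPartition) {
        return Math.multiplyExact(partitions, slicesPerPartition);
    }

    // Rough heap estimate. bytesPerSlice is a hypothetical average,
    // not a measured figure for Hudi's FileSlice objects.
    static long estimatedBytes(long slices, long bytesPerSlice) {
        return Math.multiplyExact(slices, bytesPerSlice);
    }

    public static void main(String[] args) {
        // 1000 partitions x 20K slices each, as in the example above.
        long slices = totalSlices(1_000, 20_000);
        long assumedBytesPerSlice = 500; // assumption for illustration only
        long bytes = estimatedBytes(slices, assumedBytesPerSlice);
        System.out.println("cached slices: " + slices);
        System.out.println("approx heap (GB): " + bytes / 1_000_000_000.0);
    }
}
```

With the numbers from the example, a single query touching every partition would hold 20 million entries in the map; the heap figure scales linearly with whatever the real per-slice size turns out to be.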
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-8868
- Type: Improvement