[ https://issues.apache.org/jira/browse/HUDI-5385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prashant Wason updated HUDI-5385:
---------------------------------
    Fix Version/s: 0.14.1
                       (was: 0.14.0)

> Make behavior of keeping File Writers open configurable
> -------------------------------------------------------
>
>                 Key: HUDI-5385
>                 URL: https://issues.apache.org/jira/browse/HUDI-5385
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 0.12.1
>            Reporter: Alexey Kudinkin
>            Priority: Critical
>             Fix For: 0.14.1
>
>
> Currently, when writing in Spark, we keep the File Writers for individual
> partitions open for as long as we are processing the batch. This means that
> all of the data written out is kept in memory (at least the last row-group,
> in the case of Parquet writers) until the batch is fully processed and all
> of the writers are closed.
> While this allows us to better control how many files are created in every
> partition (we keep the writer open and hence do not need to create a new
> file when a new record comes in), it carries the penalty of keeping all of
> the data in memory, potentially leading to OOMs, longer GC cycles, etc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
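To illustrate the trade-off the description outlines, here is a minimal, hypothetical sketch (plain Java, not Hudi's actual writer code): a per-partition writer cache gated by an assumed keepWritersOpen flag. The writer interface, factory, and flag name are all illustrative assumptions, not existing Hudi APIs or config keys.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal sketch (NOT Hudi's actual implementation) of the trade-off described
 * above: a per-partition writer cache that either keeps writers open for the
 * whole batch (fewer files, more buffered memory) or closes them eagerly
 * (more files, less memory), driven by a hypothetical keepWritersOpen flag.
 */
public class PartitionWriterCache {

  /** Stand-in for a file writer (e.g. Parquet) that buffers rows until closed. */
  interface PartitionWriter extends AutoCloseable {
    void write(String record) throws IOException;
    @Override void close() throws IOException;
  }

  /** Stand-in factory; creating a writer corresponds to creating a new file. */
  interface WriterFactory {
    PartitionWriter newWriter(String partitionPath) throws IOException;
  }

  // Hypothetical switch mirroring the behavior this ticket asks to make configurable.
  private final boolean keepWritersOpen;
  private final WriterFactory factory;
  private final Map<String, PartitionWriter> openWriters = new HashMap<>();

  public PartitionWriterCache(boolean keepWritersOpen, WriterFactory factory) {
    this.keepWritersOpen = keepWritersOpen;
    this.factory = factory;
  }

  public void write(String partitionPath, String record) throws IOException {
    PartitionWriter writer = openWriters.get(partitionPath);
    if (writer == null) {
      // No cached writer for this partition: a new file is created.
      writer = factory.newWriter(partitionPath);
      openWriters.put(partitionPath, writer);
    }
    writer.write(record);

    if (!keepWritersOpen) {
      // Eager close: flush buffered data now and free its memory, at the cost
      // of starting a new file the next time this partition is written to.
      writer.close();
      openWriters.remove(partitionPath);
    }
  }

  /** Called once the whole batch has been processed. */
  public void closeAll() throws IOException {
    for (PartitionWriter writer : openWriters.values()) {
      writer.close();
    }
    openWriters.clear();
  }
}
```

With keepWritersOpen set to true this mirrors the current behavior (every open writer holds its buffered row-group until closeAll at the end of the batch); with it set to false each write is flushed immediately, trading extra small files for a bounded memory footprint.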