Alexey Kudinkin created HUDI-5385:
-------------------------------------

             Summary: Make behavior of keeping File Writers open configurable
                 Key: HUDI-5385
                 URL: https://issues.apache.org/jira/browse/HUDI-5385
             Project: Apache Hudi
          Issue Type: Bug
          Components: spark
    Affects Versions: 0.12.1
            Reporter: Alexey Kudinkin
             Fix For: 0.13.0


Currently, when writing in Spark we keep the File Writers for individual
partitions open for as long as we're processing the batch. This means that all
of the data written out is held in memory (at least the last row-group, in the
case of Parquet writers) until the batch is fully processed and all of the
writers are closed.

While this gives us better control over how many files are created in each
partition (the writer stays open, so we don't need to create a new file every
time a new record comes in), it carries the penalty of keeping all of that
data in memory, potentially leading to OOMs, longer GC cycles, etc. We should
make this behavior configurable so that users can trade file count against
memory footprint.
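
For illustration, here is a minimal sketch of what such a toggle could look
like on the write path. The config flag, class, and method names below are
assumptions made up for this example, not Hudi's actual API:

{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical pool of per-partition writers; illustrative only.
public class PartitionWriterPool implements Closeable {

  // Mirrors an assumed config such as "hoodie.write.keep.file.writers.open".
  private final boolean keepWritersOpen;
  private final Map<String, Closeable> openWriters = new HashMap<>();

  public PartitionWriterPool(boolean keepWritersOpen) {
    this.keepWritersOpen = keepWritersOpen;
  }

  public void write(String partitionPath, byte[] record) throws IOException {
    Closeable writer =
        openWriters.computeIfAbsent(partitionPath, this::createWriter);
    // ... append the record via the writer ...
    if (!keepWritersOpen) {
      // Eager close: trades more (smaller) files per partition for a
      // bounded memory footprint, since no row-group lingers in memory.
      writer.close();
      openWriters.remove(partitionPath);
    }
  }

  private Closeable createWriter(String partitionPath) {
    // Placeholder for opening a Parquet (or log-file) writer.
    return () -> { };
  }

  @Override
  public void close() throws IOException {
    // Batch is fully processed: close whatever is still open.
    for (Closeable writer : openWriters.values()) {
      writer.close();
    }
    openWriters.clear();
  }
}
{code}

With keepWritersOpen=true this matches today's behavior; with false, writers
are flushed and closed eagerly, at the cost of producing more files.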



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
