bvaradar commented on issue #1939: URL: https://github.com/apache/hudi/issues/1939#issuecomment-671079032
Regarding the OOM errors, please check which Spark stage is causing the failure; you may need to tune parallelism for that stage. The size of the parquet files should not be the issue.

Regarding file sizing: how did you create the initial dataset? Did you change the limitFileSize parameter between commits? What is your average record size? During the initial commit, Hudi relies on `hoodie.copyonwrite.record.size.estimate` to estimate the average record size needed for file sizing. For subsequent commits, it auto-tunes based on the previous commit's metadata. It may be that your record size is really large and you need to tune this parameter the first time you write to the dataset.
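For reference, here is a minimal sketch of where these knobs plug into a Spark datasource write. The option keys (`hoodie.insert.shuffle.parallelism`, `hoodie.upsert.shuffle.parallelism`, `hoodie.parquet.max.file.size`, `hoodie.copyonwrite.record.size.estimate`) are standard Hudi configs; the table name, key fields, parallelism values, sizes, and base path below are placeholder assumptions you would tune for your own workload, and `inputDf` stands in for your DataFrame:

```scala
import org.apache.spark.sql.SaveMode

inputDf.write
  .format("hudi")
  .option("hoodie.table.name", "my_table")                       // hypothetical table name
  .option("hoodie.datasource.write.recordkey.field", "id")       // hypothetical record key
  .option("hoodie.datasource.write.precombine.field", "ts")      // hypothetical precombine field
  // OOM / stage tuning: raise shuffle parallelism for the failing insert/upsert stage
  .option("hoodie.insert.shuffle.parallelism", "200")
  .option("hoodie.upsert.shuffle.parallelism", "200")
  // File sizing: target max parquet file size in bytes; keep this consistent across commits
  .option("hoodie.parquet.max.file.size", String.valueOf(128 * 1024 * 1024))
  // First commit only: give Hudi a better average record size estimate (bytes) if records are large
  .option("hoodie.copyonwrite.record.size.estimate", "1024")
  .mode(SaveMode.Append)
  .save("/path/to/hudi/table")                                   // hypothetical base path
```

After the first commit, the record size estimate is derived from previous commit metadata, so this override mainly matters for the initial write.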