bvaradar commented on issue #1939: URL: https://github.com/apache/hudi/issues/1939#issuecomment-671079032
Regarding the OOM errors, please check which Spark stage is causing the failure; you may need to tune parallelism for that stage. The size of the parquet files should not be the issue.

Regarding file sizing: how did you create the initial dataset? Did you change the limitFileSize parameter between commits? What is your average record size? During the initial commit, Hudi relies on `hoodie.copyonwrite.record.size.estimate` to estimate the average record size needed for file sizing. For subsequent commits, it auto-tunes based on the previous commit's metadata. It may be that your record size is really large and you need to tune this parameter the first time you write to the dataset.
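For reference, here is a minimal sketch of where these knobs plug into a Spark datasource write. The option keys (`hoodie.insert.shuffle.parallelism`, `hoodie.upsert.shuffle.parallelism`, `hoodie.parquet.max.file.size`, `hoodie.copyonwrite.record.size.estimate`) are standard Hudi configs; the table name, key fields, parallelism values, sizes, and base path below are placeholder assumptions you would tune for your own workload, and `inputDf` stands in for your DataFrame:

```scala
import org.apache.spark.sql.SaveMode

inputDf.write
  .format("hudi")
  .option("hoodie.table.name", "my_table")                       // hypothetical table name
  .option("hoodie.datasource.write.recordkey.field", "id")       // hypothetical record key
  .option("hoodie.datasource.write.precombine.field", "ts")      // hypothetical precombine field
  // OOM / stage tuning: raise shuffle parallelism for the failing insert/upsert stage
  .option("hoodie.insert.shuffle.parallelism", "200")
  .option("hoodie.upsert.shuffle.parallelism", "200")
  // File sizing: target max parquet file size in bytes; keep this consistent across commits
  .option("hoodie.parquet.max.file.size", String.valueOf(128 * 1024 * 1024))
  // First commit only: give Hudi a better average record size estimate (bytes) if records are large
  .option("hoodie.copyonwrite.record.size.estimate", "1024")
  .mode(SaveMode.Append)
  .save("/path/to/hudi/table")                                   // hypothetical base path
```

After the first commit, the record size estimate is derived from previous commit metadata, so this override mainly matters for the initial write.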