[GitHub] [hudi] jtmzheng commented on issue #2408: [SUPPORT] OutOfMemory on upserting into MOR dataset
jtmzheng commented on issue #2408: URL: https://github.com/apache/hudi/issues/2408#issuecomment-774513919

@nsivabalan I have not encountered the issue again after temporarily lowering `hoodie.commits.archival.batch`, which cleared out the large commit files being loaded for archival. I believe @umehrot2 identified the right root cause/bug in https://github.com/apache/hudi/issues/2408#issuecomment-758320870 (the first one). I think these large commits were generated after I added the option `hoodie.cleaner.commits.retained: 1`, but I'm not sure (it lined up timeline-wise, and that change caused the dataset size to shrink drastically).

Some context:
- the dataset was always indexed with 0.6.0 (no upgrade)
- we are trying to productionize a dataset in a Hudi data lake, but it is not there yet
- this is also our first time working with Hudi

I think this issue can be closed as a support request, though it would be great to understand the different archival configs better (I couldn't find good documentation on these).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
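For readers landing here from search: the two configs discussed above are ordinary Hudi write options. A minimal sketch of how they might be passed to a Spark DataFrame writer follows; the table name and values are illustrative assumptions, not the reporter's production settings.

```python
# Hypothetical Hudi write options (illustrative values, not from the issue).
hudi_options = {
    "hoodie.table.name": "example_table",  # assumed name for illustration
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    # Lowering the archival batch size reduces how many commit metadata
    # files are loaded into memory per archival pass (the workaround above):
    "hoodie.commits.archival.batch": "5",
    # Retaining only one commit shrinks the dataset aggressively, but the
    # reporter suspects it coincided with very large commit files:
    "hoodie.cleaner.commits.retained": "1",
}

# Typical usage with a Spark DataFrame `df` and target `path`:
# df.write.format("hudi").options(**hudi_options).mode("append").save(path)
```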
jtmzheng commented on issue #2408: URL: https://github.com/apache/hudi/issues/2408#issuecomment-758360941

Thanks Udit! I'd tried setting `hoodie.commits.archival.batch` to 5 earlier today after going through the source code; that got my application back up and running again. The first bug definitely seems like the root cause: after turning on more verbose logging, I found several 300 MB commit files being loaded in for archival before the crash. (Re: the second bug, https://github.com/apache/hudi/blob/e3d3677b7e7899705b624925666317f0c074f7c7/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java#L353 clears the list, which isn't the most intuitive.) It seems these large commit files were generated when I set `hoodie.cleaner.commits.retained` to 1.

What is the trade-off in lowering `hoodie.keep.max.commits` and `hoodie.keep.min.commits`? I couldn't find much good documentation on the archival process/configs.
jtmzheng commented on issue #2408: URL: https://github.com/apache/hudi/issues/2408#issuecomment-756289589

Thank you. I have also tried setting `hoodie.write.buffer.limit.bytes` as per https://github.com/apache/hudi/issues/1491#issuecomment-610626104, but it still OOMs in the same way. (This config seems undocumented; it defaults to 4*1024*1024 bytes, and I'm not sure of the rationale for changing it to 131072 as per that comment.)
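For concreteness, the two buffer sizes mentioned work out as follows (simple arithmetic on the values quoted in the comment, nothing Hudi-specific):

```python
# Default noted above for hoodie.write.buffer.limit.bytes: 4 MiB.
DEFAULT_WRITE_BUFFER_LIMIT_BYTES = 4 * 1024 * 1024   # 4194304 bytes

# Value suggested in the linked comment on issue #1491: 128 KiB,
# i.e. 32x smaller than the default.
SUGGESTED_WRITE_BUFFER_LIMIT_BYTES = 131072          # 128 * 1024 bytes
```

Shrinking this buffer trades write throughput for a smaller per-writer memory footprint, which is presumably why it was suggested for an OOM, though as the reporter notes it did not help here.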