[GitHub] [hudi] jtmzheng commented on issue #2408: [SUPPORT] OutOfMemory on upserting into MOR dataset

2021-02-06 Thread GitBox


jtmzheng commented on issue #2408:
URL: https://github.com/apache/hudi/issues/2408#issuecomment-774513919


   @nsivabalan I have not encountered the issue again after temporarily 
lowering `hoodie.commits.archival.batch`, which cleared out the large commit 
files being loaded for archival. I believe @umehrot2 identified the right root 
cause/bug in https://github.com/apache/hudi/issues/2408#issuecomment-758320870 
(the first one). I think these large commits were generated after I added the 
option `hoodie.cleaner.commits.retained: 1`, but I'm not sure (it lined up 
timeline-wise, and that change caused the dataset size to shrink drastically).
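   
   For concreteness, a minimal sketch of how these options could be passed on the write path (assuming a Spark/Scala upsert; the table name, record key/precombine fields, and base path below are placeholders, not our actual job):
   
```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Sketch only: table name, fields, and path are illustrative placeholders.
def upsertWithWorkaround(df: DataFrame, basePath: String): Unit = {
  df.write
    .format("hudi")
    .option("hoodie.table.name", "example_table")
    .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
    .option("hoodie.datasource.write.operation", "upsert")
    .option("hoodie.datasource.write.recordkey.field", "id")
    .option("hoodie.datasource.write.precombine.field", "ts")
    // Workaround: archive commits in smaller batches so the archiver
    // does not load too many large commit files into memory at once.
    .option("hoodie.commits.archival.batch", "5")
    // The change that lined up with the large commit files appearing.
    .option("hoodie.cleaner.commits.retained", "1")
    .mode(SaveMode.Append)
    .save(basePath)
}
```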
   
   Some context:
   - the dataset was always indexed with 0.6.0 (no upgrade)
   - we are trying to productionize a dataset in a Hudi data lake, but it is not 
there yet
   - this is also our first time working with Hudi
   
   I think this issue can be closed as a support request, though it would be 
great to understand the different archival configs better (I couldn't find good 
documentation on these).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] jtmzheng commented on issue #2408: [SUPPORT] OutOfMemory on upserting into MOR dataset

2021-01-11 Thread GitBox


jtmzheng commented on issue #2408:
URL: https://github.com/apache/hudi/issues/2408#issuecomment-758360941


   Thanks Udit! I'd tried setting `hoodie.commits.archival.batch` to 5 earlier 
today after going through the source code - that got my application back up and 
running again.
   
   The first bug definitely seems like the root cause: after turning on more 
verbose logging, I found several 300 MB commit files being loaded in for archival 
before the crash (re: the second bug, 
https://github.com/apache/hudi/blob/e3d3677b7e7899705b624925666317f0c074f7c7/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java#L353
 clears the list, which isn't the most intuitive). It seems like these large 
commit files were generated when I set `hoodie.cleaner.commits.retained` to 1.
   
   What is the trade-off in lowering `hoodie.keep.max.commits` and 
`hoodie.keep.min.commits`? I couldn't find much good documentation on the 
archival process/configs.
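   
   My rough understanding from reading the code (please correct me if this is off): archival kicks in once the active timeline has more than `hoodie.keep.max.commits` instants and trims it down to `hoodie.keep.min.commits`, and `hoodie.cleaner.commits.retained` is expected to stay below `hoodie.keep.min.commits`. So lower values keep the active timeline small (less commit metadata to load on each write/archival pass) at the cost of retaining less commit history. An illustrative combination (values are examples only, not recommendations):
   
```scala
// Illustrative values only, as I understand the configs:
//  - the cleaner keeps file versions needed by the last N commits,
//  - archival trims the active timeline from above keep.max down to keep.min,
//  - cleaner.commits.retained should stay below keep.min.commits.
val archivalOpts = Map(
  "hoodie.cleaner.commits.retained" -> "10",
  "hoodie.keep.min.commits"         -> "20",
  "hoodie.keep.max.commits"         -> "30",
  "hoodie.commits.archival.batch"   -> "5"
)
// e.g. df.write.format("hudi").options(archivalOpts ++ otherOpts).mode("append").save(basePath)
```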







[GitHub] [hudi] jtmzheng commented on issue #2408: [SUPPORT] OutOfMemory on upserting into MOR dataset

2021-01-07 Thread GitBox


jtmzheng commented on issue #2408:
URL: https://github.com/apache/hudi/issues/2408#issuecomment-756289589


   Thank you, I have also tried setting `hoodie.write.buffer.limit.bytes` as 
per https://github.com/apache/hudi/issues/1491#issuecomment-610626104, but it 
still OOMs in the same way (this config seems undocumented and defaults to 
4*1024*1024 bytes; I'm not sure of the rationale for changing it to 131072 as per 
that comment).
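   
   For reference, a minimal sketch of how that config would be passed (131072 is just the value from the linked comment, and 4 * 1024 * 1024 bytes is the default mentioned above; the write call in the comment is a placeholder):
   
```scala
// Sketch: adding the buffer limit alongside the other Hudi write options.
val memoryOpts = Map(
  "hoodie.write.buffer.limit.bytes" -> "131072" // default is 4 * 1024 * 1024
)
// e.g. df.write.format("hudi").options(memoryOpts ++ otherHudiOpts).mode("append").save(basePath)
```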


