jtmzheng commented on issue #2408:
URL: https://github.com/apache/hudi/issues/2408#issuecomment-774513919


   @nsivabalan I have not encountered the issue again after temporarily 
lowering `hoodie.commits.archival.batch`, which cleared out the large commit 
files being loaded for archival. I believe @umehrot2 identified the right root 
cause/bug in https://github.com/apache/hudi/issues/2408#issuecomment-758320870 
(the first one). I think these large commits were generated after I added the 
option `hoodie.cleaner.commits.retained: 1`, but I'm not sure (it lined up 
timeline-wise, and that change caused the dataset size to shrink drastically).
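   
   For anyone landing here later, a minimal sketch of how these configs are passed on a Spark datasource write. This is an illustration, not the exact job from this thread: the table name, record key field, paths, and the retained/min/max/batch values are all hypothetical placeholders.
   
   ```scala
   // Sketch only: assumes hudi-spark-bundle (0.6.x) is on the classpath.
   import org.apache.spark.sql.{SaveMode, SparkSession}
   
   val spark = SparkSession.builder()
     .appName("hudi-archival-config-sketch")
     .getOrCreate()
   
   val df = spark.read.parquet("s3://bucket/input/")          // hypothetical input
   
   df.write.format("hudi")
     .option("hoodie.table.name", "my_table")                 // hypothetical table name
     .option("hoodie.datasource.write.recordkey.field", "id") // hypothetical key field
     // Aggressive cleaning, as described above (illustrative value):
     .option("hoodie.cleaner.commits.retained", "1")
     // Archival window: commits beyond the max are moved to the archived timeline:
     .option("hoodie.keep.min.commits", "20")
     .option("hoodie.keep.max.commits", "30")
     // Lowering the archival batch size reduces how many commit files are
     // loaded into memory per archival round (the workaround used here):
     .option("hoodie.commits.archival.batch", "5")
     .mode(SaveMode.Append)
     .save("s3://bucket/hudi/my_table")                       // hypothetical base path
   ```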
   
   Some context:
   - dataset was always indexed with 0.6.0 (no upgrade)
   - we are trying to productionize a dataset in our Hudi data lake, but it is 
not there yet
   - this is also our first time working with Hudi
   
   I think this issue can be closed as a support request, though it would be 
great to understand the different archival configs better (I couldn't find 
good documentation on them).
   

