WTa-hash commented on issue #2229:
URL: https://github.com/apache/hudi/issues/2229#issuecomment-722808651


   > Just want to make sure if you understood compaction vs cleaning in Hudi. 
Why do you want to wait for 30 days before running compaction ? Do you mean 
cleaning the old versions 30 days back ?
   
   
   Hello. I was just giving an example of one of my concerns with the 
number-of-commits-to-trigger-compaction approach.
   
   Here is my scenario:
   1) A Spark structured streaming query reads from Kinesis.
   2) Spark processes a batch.
   3) The batch contains data for tables X, Y, and Z. My foreachBatch logic 
groups the records by table, then loops over the tables and runs a Hudi write 
for each one sequentially (three writes in this case). Hudi has 
INLINE_COMPACT_NUM_DELTA_COMMITS_PROP set to 10.
   4) The next 9 batches contain data for tables X and Y, but none of them 
contain data for table Z.
   
   Does this mean compaction will run for tables X and Y, compacting the data 
from batches 1 to 10, while table Z will not be compacted?
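   To make my concern concrete, here is a minimal Python sketch (the names and 
logic are illustrative, not actual Hudi code) that simulates what I understand 
the per-table trigger to mean: each table's delta-commit counter advances only 
when that table receives data, so a quiet table never reaches the threshold.

   ```python
   # Hypothetical simulation of per-table inline compaction scheduling.
   # NUM_DELTA_COMMITS_TO_TRIGGER mirrors INLINE_COMPACT_NUM_DELTA_COMMITS_PROP.
   NUM_DELTA_COMMITS_TO_TRIGGER = 10

   def replay_batches(batches, trigger=NUM_DELTA_COMMITS_TO_TRIGGER):
       """Return the set of tables whose delta-commit count reached the trigger.

       `batches` is a list of batches; each batch is the list of tables that
       received data in that micro-batch.
       """
       delta_commits = {}   # table -> delta commits since the last compaction
       compacted = set()
       for batch_tables in batches:
           for table in batch_tables:
               delta_commits[table] = delta_commits.get(table, 0) + 1
               if delta_commits[table] >= trigger:
                   compacted.add(table)      # compaction would be scheduled here
                   delta_commits[table] = 0  # counter resets after compaction
       return compacted

   # Batch 1 touches X, Y, and Z; the next 9 batches touch only X and Y.
   batches = [["X", "Y", "Z"]] + [["X", "Y"]] * 9
   print(replay_batches(batches))  # X and Y reach 10 commits; Z stays at 1
   ```

   Under this reading, tables X and Y get compacted after batch 10, while 
table Z sits at a single delta commit indefinitely, which is exactly the case 
I want to confirm.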


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

