hudi-bot opened a new issue, #15050:
URL: https://github.com/apache/hudi/issues/15050

   The record-level index is stored in the metadata table, which is a MOR 
(merge-on-read) table. 
   
   Each delta commit in the metadata table creates multiple HFile log blocks, 
so reading them requires opening multiple file handles, which can hurt read 
performance. To improve read performance, compaction can be run frequently; 
it merges all the log blocks into the base file and creates a new base file. 
Doing this frequently, however, causes write amplification.
   
   Instead of a full compaction that merges all the log blocks into the base 
file, a minor compaction can be run, which stitches the log blocks together 
into a single log block. 
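   
   To make the write-amplification argument concrete, here is a rough 
back-of-the-envelope sketch in Python. It is not Hudi code: the function name, 
the sizes, and the schedule are all hypothetical. It assumes the worst case 
for full compaction, where every logged record is a new key so the base file 
keeps growing, and assumes each minor compaction only stitches the log blocks 
written since the previous run.

```python
def mb_rewritten(base_mb, delta_mb, commits, every, include_base):
    """Total MB rewritten by compaction over `commits` delta commits.

    A compaction runs after every `every` delta commits (hypothetical schedule).
    include_base=True  models full compaction: base file + pending log data.
    include_base=False models minor compaction: pending log data only.
    """
    total = 0
    base = base_mb
    pending = 0
    for i in range(1, commits + 1):
        pending += delta_mb           # each delta commit appends log data
        if i % every == 0:
            if include_base:
                total += base + pending
                base += pending       # assumption: all logged records are new keys
            else:
                total += pending
            pending = 0
    return total


# Hypothetical numbers: 1000 MB base file, 10 MB per delta commit,
# 20 delta commits, compaction after every 5 of them.
full = mb_rewritten(1000, 10, 20, 5, include_base=True)
minor = mb_rewritten(1000, 10, 20, 5, include_base=False)
ingested = 10 * 20

print(f"full compaction:  {full} MB rewritten ({full / ingested}x of ingested)")
print(f"minor compaction: {minor} MB rewritten ({minor / ingested}x of ingested)")
```

   Under these assumptions the gap comes entirely from rewriting the base file 
on every full compaction; the log data rewritten is the same in both cases.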
   
   This can be achieved by adding a new action to Hudi called logcompaction, 
which operates at the log file level. Just as compaction creates base files 
and issues a .commit upon completion, minor compaction, which creates a new 
log block, can issue a .deltacommit on the timeline after completion.
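   
   The difference between the two actions can be illustrated with a toy model. 
This is only a sketch and does not use any Hudi API: `FileSlice`, 
`log_compact`, `full_compact`, the instant times, and the dict-based log 
blocks are all hypothetical, and deletes are ignored. Later log blocks are 
assumed to override earlier ones for the same key.

```python
from dataclasses import dataclass, field


@dataclass
class FileSlice:
    """Toy file slice: one base file plus an ordered list of log blocks."""
    base: dict
    log_blocks: list = field(default_factory=list)

    def read(self):
        # A reader merges the base file with every log block, oldest first.
        merged = dict(self.base)
        for block in self.log_blocks:
            merged.update(block)
        return merged


def log_compact(file_slice, timeline, instant):
    """Minor compaction: stitch all log blocks into one; base file untouched."""
    stitched = {}
    for block in file_slice.log_blocks:
        stitched.update(block)
    file_slice.log_blocks = [stitched]
    timeline.append(f"{instant}.deltacommit")


def full_compact(file_slice, timeline, instant):
    """Full compaction: merge all log blocks into a new base file."""
    file_slice.base = file_slice.read()
    file_slice.log_blocks = []
    timeline.append(f"{instant}.commit")


timeline = []
fs = FileSlice(
    base={"k1": "file-a"},
    log_blocks=[{"k2": "file-b"}, {"k1": "file-c"}, {"k3": "file-d"}],
)
before = fs.read()

log_compact(fs, timeline, "001")
after_minor = fs.read()
blocks_after_minor = len(fs.log_blocks)

full_compact(fs, timeline, "002")
after_full = fs.read()
```

   In this model a reader sees the same records before and after either 
action; minor compaction only reduces the number of log blocks that must be 
opened, while full compaction also rewrites the base file.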
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-3580
   - Type: Epic
   - Fix version(s):
     - 1.1.0

