Hi Surya,

This is a very interesting idea! I'll be looking forward to the RFC.

I have a few high-level questions:

1. Trying to understand the main goal: is it to balance the tradeoff
between read and write amplification for the metadata table, or is it
purely to optimize for reads?
2. Why do we need a separate action? Why can't any of the existing
compaction strategies (or a new one if needed) help to achieve this?
3. Is the proposed LogCompaction a replacement for regular compaction on
the metadata table, i.e., if LogCompaction is enabled then regular
compaction cannot be run?

Regards,
Sagar

On Thu, Mar 17, 2022 at 12:51 AM Surya Prasanna <[email protected]>
wrote:

> Hi Team,
>
>
> Record level index uses the metadata table, which is a merge-on-read
> (MOR) table.
>
> Each delta commit on the metadata table creates multiple HFile log blocks,
> so reading them requires opening multiple file handles, which can hurt
> read performance. To improve read performance, compaction can be run
> frequently, which merges all the log blocks into the base file and creates
> a new version of the base file. If this is done frequently, it causes
> write amplification.
>
> Instead of merging all the log blocks into the base file (a full
> compaction), a minor compaction can be done, which merges only the log
> blocks and writes a single new log block.
>
> This can be achieved by adding a new action to Hudi, called LogCompaction,
> and requires an RFC. Please let me know what you think.
>
>
> Thanks,
>
> Surya
>
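The tradeoff Surya describes can be sketched with a toy simulation. This is not Hudi code; the file sizes, commit count, and compaction interval below are made-up assumptions purely to illustrate why merging log blocks into the base file on every cycle (full compaction) writes far more data than merging only the log blocks (log compaction), while both keep the number of file handles per read bounded:

```python
# Illustrative sketch only (not Hudi code): compare total bytes written
# and worst-case file handles per read for full compaction vs. the
# proposed log compaction. All sizes/intervals are invented assumptions.

BASE_FILE_MB = 1024        # assumed size of the base file
LOG_BLOCK_MB = 8           # assumed size of each delta-commit log block
NUM_COMMITS = 100
COMPACT_EVERY = 10         # run a compaction every N delta commits

def full_compaction():
    """Merge base file + all log blocks into a new base file each cycle."""
    written_mb = 0
    max_handles = 0
    log_blocks = 0
    for commit in range(1, NUM_COMMITS + 1):
        written_mb += LOG_BLOCK_MB        # the delta commit itself
        log_blocks += 1
        if commit % COMPACT_EVERY == 0:
            # Rewrites the entire base file: high write amplification.
            written_mb += BASE_FILE_MB + log_blocks * LOG_BLOCK_MB
            log_blocks = 0
        max_handles = max(max_handles, 1 + log_blocks)  # base + logs
    return written_mb, max_handles

def log_compaction():
    """Merge only the log blocks into one new log block each cycle."""
    written_mb = 0
    max_handles = 0
    log_blocks = 0
    for commit in range(1, NUM_COMMITS + 1):
        written_mb += LOG_BLOCK_MB
        log_blocks += 1
        if commit % COMPACT_EVERY == 0:
            # Rewrites only the log blocks; the base file is untouched.
            written_mb += log_blocks * LOG_BLOCK_MB
            log_blocks = 1                # the single merged log block
        max_handles = max(max_handles, 1 + log_blocks)
    return written_mb, max_handles

full_written, full_handles = full_compaction()
log_written, log_handles = log_compaction()
print(f"full compaction: {full_written} MB written, <= {full_handles} handles/read")
print(f"log compaction:  {log_written} MB written, <= {log_handles} handles/read")
```

With these made-up numbers, full compaction rewrites the large base file every cycle, while log compaction writes only a few small log blocks, at the cost of readers opening roughly one extra handle for the merged log block.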
