[
https://issues.apache.org/jira/browse/HUDI-8654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17910743#comment-17910743
]
Y Ethan Guo commented on HUDI-8654:
-----------------------------------
h1. Problem
With NBCC (non-blocking concurrency control) or async compaction, the record
positions written into log blocks can be inaccurate for a later snapshot read
or compaction, because file slicing may change in the meantime.
The heart of the problem is that positions are generated against the base file
available at write time, while a later snapshot read or compaction can rebase
the log file onto a new base file under completion-time-based file slicing. If
the new base file was generated from the old base file plus log files that
contain deletes, row positions shift, so the recorded positions are wrong and
the merging results are wrong as well.
Take the following example. When {{.fg1_ts6.log}} is written, compaction
{{ts7}} has been requested but has not completed, so the base file used to
fetch positions for updates/deletes is {{fg1_ts1.parquet}}. After the
compaction completes and generates {{fg1_ts7.parquet}}, and {{.fg1_ts6.log}}
gets completion time {{ts8}}, {{fg1_ts7.parquet}} and {{.fg1_ts6.log}} belong
to the latest file slice based on completion time, but the positions in
{{.fg1_ts6.log}} cannot be used for merging against {{fg1_ts7.parquet}}. Note
that there is no issue with positions for {{.fg1_ts2.log}} and
{{.fg1_ts4.log}}: the base file attached to their file slice does not change
over time, so their positions can still be used for merging correctly.
{code:java}
fg1_ts1.parquet                                    (fg1_ts7.parquet)
                                                   from compaction
  .fg1_ts2.log           .fg1_ts4.log           .fg1_ts6.log
  (completion time ts3)  (completion time ts5)  (completion time ts8)
                                                written before fg1_ts7.parquet is
                                                generated
{code}
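To make the position shift concrete, here is a minimal, self-contained sketch. It is not Hudi code: the class {{PositionShiftDemo}}, the method {{compact}}, and the record keys are hypothetical, and a base file is modeled as a plain list of record keys whose index is the record position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class PositionShiftDemo {
    // Hypothetical stand-in for compaction: rewrites the base file and drops
    // the rows at the given positions (deletes recorded in earlier log files).
    static List<String> compact(List<String> baseRecordKeys, Set<Integer> deletedPositions) {
        List<String> newBase = new ArrayList<>();
        for (int pos = 0; pos < baseRecordKeys.size(); pos++) {
            if (!deletedPositions.contains(pos)) {
                newBase.add(baseRecordKeys.get(pos));
            }
        }
        return newBase;
    }

    public static void main(String[] args) {
        // Models fg1_ts1.parquet: the index in the list is the record position.
        List<String> ts1Base = List.of("k0", "k1", "k2", "k3");

        // .fg1_ts6.log updates "k3"; its position is looked up in fg1_ts1.parquet.
        int positionInLog = ts1Base.indexOf("k3");

        // Compaction ts7 applies a delete of position 1 from an earlier log file.
        List<String> ts7Base = compact(ts1Base, Set.of(1));

        System.out.println("position recorded in log block: " + positionInLog);            // 3
        System.out.println("position of k3 in new base:     " + ts7Base.indexOf("k3"));    // 2
        System.out.println("old position valid in new base: "
            + (positionInLog < ts7Base.size()));                                           // false
    }
}
```

In this toy model the position recorded for {{k3}} no longer points at {{k3}} once the base file has been rewritten without the deleted row, which is the mismatch described above.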
h1. Proposal
We should always write positions and let the merger decide whether to use
positional merging, so that correctness is preserved.
h2. Design Option 1
Add to the log block header the instant time of the base file against which
the positions were generated.
When merging, if the base instant time recorded for the positions does not
match the instant time of the base file being merged, do not use positions for
merging the records in this log block. This is simple and straightforward, and
it avoids any ambiguity when the file slicing, particularly the base file, has
changed for a log file and its blocks.
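The check in Option 1 can be sketched as follows. This is only an illustration under stated assumptions, not the actual Hudi implementation: the class {{PositionMergeCheck}}, the header key {{BASE_INSTANT_TIME_OF_POSITIONS}}, and the method {{canUsePositions}} are hypothetical names, the header is modeled as a plain string map, and treating a missing header as "do not use positions" is an assumption of this sketch.

```java
import java.util.Map;

class PositionMergeCheck {
    // Hypothetical header key; the actual key name is not specified here.
    static final String BASE_INSTANT_TIME_OF_POSITIONS = "BASE_INSTANT_TIME_OF_POSITIONS";

    // Returns true only if the positions in this log block were generated
    // against the base file that is being merged now.
    static boolean canUsePositions(Map<String, String> logBlockHeader, String baseFileInstantTime) {
        String positionsBaseInstant = logBlockHeader.get(BASE_INSTANT_TIME_OF_POSITIONS);
        // Assumption: a block without the header is merged without positions.
        return positionsBaseInstant != null && positionsBaseInstant.equals(baseFileInstantTime);
    }

    public static void main(String[] args) {
        // .fg1_ts6.log was written while fg1_ts1.parquet was the latest base file.
        Map<String, String> ts6Header = Map.of(BASE_INSTANT_TIME_OF_POSITIONS, "ts1");

        // Latest file slice now has fg1_ts7.parquet as base: skip positions.
        System.out.println(canUsePositions(ts6Header, "ts7")); // false

        // Merging against the original base file: positions are still valid.
        System.out.println(canUsePositions(ts6Header, "ts1")); // true
    }
}
```

With a per-block check like this, blocks such as those in {{.fg1_ts2.log}} and {{.fg1_ts4.log}} would keep using positions, and only blocks whose recorded base instant differs from the current base file would be merged without them.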
h2. Design Option 2
No new metadata. Rely on the relationship between base instant time, log file
instant time, and completion time to determine the base instant time for the
positions on the fly and whether to use positional merging.
In this case, we need to determine the base instant time for the positions
written in {{.fg1_ts6.log}} on the fly. There are two drawbacks:
* The instant times of the new base file ({{fg1_ts7.parquet}}) and the log
file ({{.fg1_ts6.log}}) may not reflect the order in which these files are
actually written, e.g., {{fg1_ts7.parquet}} can still be written before
{{.fg1_ts6.log}}. So we'll need a slightly complex condition to determine the
base instant time for the positions written in the log block, which is
error-prone.
* We need to look up the completion time here, potentially reading the LSM
timeline, which adds overhead.
> Support correct merging results with record positions in log blocks generated
> during pending compaction
> -------------------------------------------------------------------------------------------------------
>
> Key: HUDI-8654
> URL: https://issues.apache.org/jira/browse/HUDI-8654
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Y Ethan Guo
> Priority: Blocker
> Fix For: 1.0.1
>
> Original Estimate: 20h
> Remaining Estimate: 20h
>
> When there is a pending compaction, the new base files to be generated by
> the compaction are not available during this transaction. Given that the log
> files in MOR from this transaction can be attached to the base file generated
> by the compaction in the latest file slice, the accurate record positions may
> not be derived. However, the log files written in later delta commits after
> the compaction completes have accurate positions.
> Similarly, for NBCC, the compaction can be scheduled during an inflight
> deltacommit, and in this case the log file generated by the inflight
> deltacommit is associated with the new base file from the compaction, which
> may have different positions because of deletes.
> We need to make sure that the file group reader with position-based merging
> generates the correct results with such a mix of log blocks.