[jira] [Commented] (HUDI-8654) Support correct merging results with record positions in log blocks generated during pending compaction

Y Ethan Guo (Jira) Tue, 07 Jan 2025 10:17:26 -0800


    [ 
https://issues.apache.org/jira/browse/HUDI-8654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17910743#comment-17910743
 ]


Y Ethan Guo commented on HUDI-8654:
-----------------------------------

h1. Problem
With non-blocking concurrency control (NBCC) or async compaction, the record 
positions written into a log block can be inaccurate for a later snapshot read 
or compaction, because the log file may end up in a different file slice.
 
The heart of the problem is that positions are generated against the base file 
available at write time, whereas a later snapshot read or compaction can rebase 
the log file onto a new base file under completion-time-based file slicing. If 
the new base file was generated from the old base file plus log files 
containing deletes, the record positions shift, so the stored positions are 
wrong and the merging results are wrong as well.
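 
To make the position shift concrete, here is a minimal sketch. It assumes a toy model in which compaction copies the base file's records in order and drops deleted keys; the class, method, and record keys are hypothetical and are not Hudi APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names are Hudi APIs.
class PositionShiftDemo {
    // Toy compaction: keep the base file's record order, drop deleted keys.
    static List<String> compact(List<String> baseRecords, List<String> deletedKeys) {
        List<String> result = new ArrayList<>(baseRecords);
        result.removeAll(deletedKeys);
        return result;
    }

    public static void main(String[] args) {
        List<String> oldBase = List.of("k0", "k1", "k2", "k3");
        // An earlier log block deleted k1; compaction folds the delete into the new base file.
        List<String> newBase = compact(oldBase, List.of("k1"));

        // A log block written while compaction is pending records an update to k2
        // using k2's position in the old base file.
        int position = oldBase.indexOf("k2");

        System.out.println("old base at position " + position + ": " + oldBase.get(position));
        System.out.println("new base at position " + position + ": " + newBase.get(position));
    }
}
```

In this model the stored position still resolves to {{k2}} in the old base file but to {{k3}} in the compacted one, so a positional merge would apply the update to the wrong record.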
 
Take the following example. When {{.fg1_ts6.log}} is written, compaction 
{{ts7}} has been requested but has not completed, so the base file used to 
fetch positions for updates/deletes is {{fg1_ts1.parquet}}. After the 
compaction completes and generates {{fg1_ts7.parquet}}, and {{.fg1_ts6.log}} 
completes at {{ts8}}, {{fg1_ts7.parquet}} and {{.fg1_ts6.log}} belong to the 
latest file slice based on completion time, but the positions in 
{{.fg1_ts6.log}} cannot be used for merging against {{fg1_ts7.parquet}}. Note 
that there is no issue with the positions in {{.fg1_ts2.log}} and 
{{.fg1_ts4.log}}: the base file attached to their file slice does not change 
over time, so their positions can still be used for merging correctly.
{code:java}
fg1_ts1.parquet                         (fg1_ts7.parquet)
                                         from compaction
              .fg1_ts2.log  .fg1_ts4.log                  .fg1_ts6.log
     (completion time ts3) (completion time ts5)      (completion time ts8)
                                          written before fg1_ts7.parquet is generated
 {code}
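The file slicing in this example can be sketched as follows. This assumes a simplified rule inferred from the example, namely that a log file joins the file slice whose base instant time is the largest one earlier than the log file's completion time. Instant times are modeled as plain integers, and the class and method names are hypothetical, not Hudi's actual file-slicing code.

```java
import java.util.List;

// Hypothetical model of completion-time-based file slicing; not Hudi's implementation.
class CompletionTimeSlicing {
    // Returns the base instant time of the file slice a log file joins:
    // the largest base instant time earlier than the log file's completion time,
    // or -1 if no base file qualifies.
    static int baseInstantFor(int logCompletionTime, List<Integer> baseInstantTimes) {
        int chosen = -1;
        for (int t : baseInstantTimes) {
            if (t < logCompletionTime && t > chosen) {
                chosen = t;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Integer> bases = List.of(1, 7); // fg1_ts1.parquet, fg1_ts7.parquet
        System.out.println(".fg1_ts2.log -> base ts" + baseInstantFor(3, bases));
        System.out.println(".fg1_ts4.log -> base ts" + baseInstantFor(5, bases));
        System.out.println(".fg1_ts6.log -> base ts" + baseInstantFor(8, bases));
    }
}
```

Under this rule {{.fg1_ts2.log}} and {{.fg1_ts4.log}} stay with {{fg1_ts1.parquet}}, while {{.fg1_ts6.log}} moves to {{fg1_ts7.parquet}} even though its positions were computed against {{fg1_ts1.parquet}}.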
h1. Proposal
We should always write positions, and let the merger decide whether to use 
positional merging for correctness.
h2. Design Option 1
Add to the log block header the base instant time against which the positions 
were generated.
When merging, if the base instant time recorded for the positions does not 
match the instant time of the base file in the file slice, do not use positions 
to merge the records in this log block. This is simple and straightforward, and 
it avoids confusion when the file slicing, in particular the base file, has 
changed for a log file or block.
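A minimal sketch of the check in Option 1 follows. It assumes the header value is exposed as an optional string and that a block without the header is treated as not eligible for positional merging. Both are assumptions for illustration, and the class and method names are hypothetical rather than Hudi APIs.

```java
import java.util.Optional;

// Hypothetical sketch of the Option 1 check; not Hudi's actual reader code.
class PositionalMergeDecision {
    // Use positions only when the log block header records a base instant time
    // and it equals the instant time of the base file in the current file slice.
    static boolean usePositions(Optional<String> positionsBaseInstantTime,
                                String baseFileInstantTime) {
        return positionsBaseInstantTime.isPresent()
            && positionsBaseInstantTime.get().equals(baseFileInstantTime);
    }

    public static void main(String[] args) {
        // .fg1_ts2.log: positions generated against ts1, merged against fg1_ts1.parquet.
        System.out.println(usePositions(Optional.of("ts1"), "ts1"));
        // .fg1_ts6.log: positions generated against ts1, merged against fg1_ts7.parquet.
        System.out.println(usePositions(Optional.of("ts1"), "ts7"));
        // Block without the header (assumed behavior: do not use positions).
        System.out.println(usePositions(Optional.empty(), "ts7"));
    }
}
```

The decision is made per log block, so blocks whose positions are still valid keep positional merging while the others are merged without positions.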
h2. Design Option 2
No new metadata. Rely on the relationship between base instant time, log file 
instant time, and completion time to determine the base instant time for the 
positions on the fly and whether to use positional merging.
In this case, we need to determine the base instant time for the positions 
written in {{.fg1_ts6.log}} on the fly. There are two drawbacks:
 * The instant times of the new base file ({{fg1_ts7.parquet}}) and the log 
file ({{.fg1_ts6.log}}) may not indicate the order in which these files are 
written, e.g., {{fg1_ts7.parquet}} can still be written before 
{{.fg1_ts6.log}}. So we'll need a slightly complex condition to determine the 
base instant time for the positions written in the log block, which is 
error-prone.
 * We need to look up the completion time here, potentially reading the LSM 
timeline, which adds overhead.

> Support correct merging results with record positions in log blocks generated 
> during pending compaction
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-8654
>                 URL: https://issues.apache.org/jira/browse/HUDI-8654
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Y Ethan Guo
>            Priority: Blocker
>             Fix For: 1.0.1
>
>   Original Estimate: 20h
>  Remaining Estimate: 20h
>
> When there is a pending compaction, the new base files to be generated by 
> compaction are not available during this transaction. Given the log files in 
> MOR from this transaction can be attached to the base file generated by the 
> compaction in the latest file slice, the accurate record positions may not be 
> derived.  However, the log files written in later delta commits after 
> completed compaction have accurate positions.
> Similarly, for NBCC, the compaction can be scheduled during an inflight 
> deltacommit, and in this case the log file generated by the inflight 
> deltacommit is associated with the new base file from the compaction, which 
> may have different positions because of deletes.
> We need to make sure that the file group reader with position-based merging 
> generates the correct results with such a mix of log blocks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
