[ https://issues.apache.org/jira/browse/HUDI-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Guo updated HUDI-5176:
----------------------------
    Status: In Progress  (was: Open)

> Incremental source may miss commits if there are inflight commits before completed commits
> ------------------------------------------------------------------------------------------
>
>                 Key: HUDI-5176
>                 URL: https://issues.apache.org/jira/browse/HUDI-5176
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: incremental-query
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Blocker
>             Fix For: 0.12.2
>
>
> Consider the following scenario of concurrent writers. Writer 1 starts a
> commit at t1 and later writer 2 starts another commit at t2 (t2 > t1). Commit
> t2 finishes earlier than t1.
> {code:java}
> ---------------------------------------------------------> t
> instant t1 |------------------------------| (writer 1)
> instant t2        |--------------| (writer 2)
> {code}
> This leaves an inflight commit (t1) before a completed commit (t2) on the
> Hudi timeline. Given that the incremental pull uses only completed commits
> to determine the start and end instants for incremental query and advance the
> checkpoint, the data for the inflight commits may never be pulled from the
> incremental source.
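
The scenario above can be reproduced with a small toy model. This is a simplified sketch, not Hudi's implementation: the names `Instant` and `incremental_pull` are hypothetical, and it assumes the checkpoint is simply the largest completed instant time seen so far.

```python
from dataclasses import dataclass


@dataclass
class Instant:
    time: int         # instant (start) time; orders the timeline
    completed: bool   # False while the commit is still inflight


def incremental_pull(timeline, checkpoint):
    """Return (instant times pulled, new checkpoint), using completed commits only."""
    pulled = sorted(i.time for i in timeline if i.completed and i.time > checkpoint)
    new_checkpoint = pulled[-1] if pulled else checkpoint
    return pulled, new_checkpoint


# Writer 1 started t1=1 and is still inflight; writer 2 started t2=2 and has completed.
timeline = [Instant(time=1, completed=False), Instant(time=2, completed=True)]
first_pull, checkpoint = incremental_pull(timeline, checkpoint=0)

# t1 completes later, but its instant time is already behind the checkpoint.
timeline[0].completed = True
second_pull, checkpoint_after = incremental_pull(timeline, checkpoint)

print(first_pull, checkpoint)         # [2] 2
print(second_pull, checkpoint_after)  # [] 2
```

In this model the first pull advances the checkpoint to t2, so the second pull never returns t1 even after it completes, which is the missed commit described in the issue.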