[ https://issues.apache.org/jira/browse/HUDI-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Kudinkin updated HUDI-2751:
----------------------------------
    Status: Open  (was: In Progress)

> To avoid the duplicates for streaming read MOR table
> ----------------------------------------------------
>
>                 Key: HUDI-2751
>                 URL: https://issues.apache.org/jira/browse/HUDI-2751
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: Common Core
>            Reporter: Danny Chen
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
> Imagine there are commits on the timeline:
> {code:java}
>                   inflight compaction                 complete compaction
>                           |                                    |
> ----- instant 99 ----- instant 100 ----- 101 ----- 102 ----- instant 100 ----------
>      first read ->|                                        second read ->|
> -- range 1 -------|------------------------ range 2 ---------------------|
> {code}
> Instants 99, 101, and 102 are successful non-compaction delta commits;
> instant 100 is a compaction instant.
> The first incremental read consumes up to instant 99, and the second read
> consumes from instant 100 to instant 102. The second read would therefore
> consume the commit files of instant 100, whose contents have already been
> consumed.
> Duplicate reads happen when this condition is met: a compaction instant is
> scheduled and then completes within *one* consume range.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
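
The condition described in the issue can be illustrated with a small, self-contained sketch. It does not use Hudi's API: the `Instant` class, the `Kind` enum, and the `completedInRange` helper are hypothetical names invented for this example, and the timeline is the one from the issue (99, 101, 102 as delta commits, 100 as a compaction that has completed by the time of the second read). It only shows how a compaction that completes inside a consume range shows up next to the genuinely new delta commits; it is not the fix adopted for HUDI-2751.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StreamingReadDuplicates {

    // Hypothetical instant kinds for this sketch; not Hudi's own action names.
    enum Kind { DELTA_COMMIT, COMPACTION }

    // Hypothetical, simplified stand-in for a timeline instant.
    static final class Instant {
        final long time;         // instant time, e.g. 99, 100, 101
        final Kind kind;
        final boolean completed; // false while the instant is still inflight

        Instant(long time, Kind kind, boolean completed) {
            this.time = time;
            this.kind = kind;
            this.completed = completed;
        }
    }

    // Timeline as seen by the second read: compaction 100 has completed by now.
    static List<Instant> sampleTimeline() {
        return Arrays.asList(
                new Instant(99, Kind.DELTA_COMMIT, true),
                new Instant(100, Kind.COMPACTION, true),
                new Instant(101, Kind.DELTA_COMMIT, true),
                new Instant(102, Kind.DELTA_COMMIT, true));
    }

    // Times of completed instants of the given kind in (afterExclusive, upToInclusive].
    static List<Long> completedInRange(List<Instant> timeline, long afterExclusive,
                                       long upToInclusive, Kind kind) {
        List<Long> result = new ArrayList<>();
        for (Instant i : timeline) {
            if (i.completed && i.kind == kind
                    && i.time > afterExclusive && i.time <= upToInclusive) {
                result.add(i.time);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Instant> timeline = sampleTimeline();
        // Range 2: the first read stopped at instant 99, the second reads up to 102.
        List<Long> deltas = completedInRange(timeline, 99, 102, Kind.DELTA_COMMIT);
        List<Long> compactions = completedInRange(timeline, 99, 102, Kind.COMPACTION);
        System.out.println("new delta commits in range 2: " + deltas);
        System.out.println("compactions completed in range 2: " + compactions);
    }
}
```

For the sample timeline, range 2 contains delta commits 101 and 102 plus the completed compaction 100. A reader that treats all three as new input would re-read whatever instant 100 rewrote, which is the duplicate the issue describes.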