[ https://issues.apache.org/jira/browse/HUDI-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan updated HUDI-2751:
--------------------------------------
    Fix Version/s: 0.12.0
                       (was: 0.11.0)

> To avoid the duplicates for streaming read MOR table
> ----------------------------------------------------
>
>                 Key: HUDI-2751
>                 URL: https://issues.apache.org/jira/browse/HUDI-2751
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: Common Core
>            Reporter: Danny Chen
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>             Fix For: 0.12.0
>
>
> Imagine there are commits on the timeline:
> {noformat}
> -----delta-99 ----- commit 100(include 99 delta data set) ----- delta-101 ----- delta-102 -----
>  first read ->| second read ->
> -- range 1 ---| ----------------------range 2 -------------------|
> {noformat}
> Instants 99, 101 and 102 are successful non-compaction delta commits;
> instant 100 is a successful compaction instant.
> The first incremental read consumes up to instant 99, and the second read
> consumes from instant 100 to instant 102. The second read would therefore
> consume the commit files of instant 100, whose data (the delta data of
> instant 99) has already been consumed by the first read.
> The duplicate reading happens when this condition is met: a compaction
> instant is scheduled and then completes within *one* consume range.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
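
The scenario in the issue description can be reproduced with a small, self-contained simulation. This is only a sketch under stated assumptions: the `Instant` class, the record keys, and the `incremental_read` helper are hypothetical names made up for illustration and are not part of Hudi's API. The model assumes a naive reader that emits every record found in the files written by each completed instant in its range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instant:
    ts: int              # instant time, e.g. 99
    action: str          # "deltacommit", or "commit" for a completed compaction
    records: frozenset   # record keys readable from the files this instant wrote


# Hypothetical timeline mirroring the example in the issue description.
TIMELINE = [
    Instant(99, "deltacommit", frozenset({"a", "b"})),
    Instant(100, "commit", frozenset({"a", "b"})),  # compaction rewrites delta-99 data
    Instant(101, "deltacommit", frozenset({"c"})),
    Instant(102, "deltacommit", frozenset({"d"})),
]


def incremental_read(timeline, start_exclusive, end_inclusive):
    """Naive read: emit the records of every instant in (start, end]."""
    out = []
    for inst in timeline:
        if start_exclusive < inst.ts <= end_inclusive:
            out.extend(sorted(inst.records))
    return out


first = incremental_read(TIMELINE, 0, 99)      # range 1: up to instant 99
second = incremental_read(TIMELINE, 99, 102)   # range 2: instants 100 to 102

# Records delivered by both reads are the duplicates described in the issue.
duplicates = sorted(set(first) & set(second))
print("first read :", first)
print("second read:", second)
print("duplicates :", duplicates)
```

In this model the duplicates come entirely from instant 100: the compaction commit falls inside the second consume range, but the data it carries was already delivered from delta-99 in the first range.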