[ 
https://issues.apache.org/jira/browse/HUDI-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yue Zhang updated HUDI-5517:
----------------------------
    Fix Version/s:     (was: 0.13.1)

> HoodieTimeline support filter instants by state transition time
> ---------------------------------------------------------------
>
>                 Key: HUDI-5517
>                 URL: https://issues.apache.org/jira/browse/HUDI-5517
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: core, incremental-query
>            Reporter: Hui An
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>
> The Hudi timeline can miss instants during incremental pulls from an 
> upstream Hudi table that is written by several writers.
> For example, say two writers are writing to the Hudi table, and the end 
> timestamp of the last successful incremental pull is 001.
> w1 is writing 002 and w2 is writing 003. If w2 finishes earlier than w1, 
> the incremental pull end timestamp is updated to 003, and w1's commit 002 
> is skipped because its instant time is earlier than w2's.
> We need to use the commit end time (state transition time) to filter 
> commits for incremental pulls. Because w2's state transition time is 
> earlier than w1's, w1's commit is not filtered out.
> This relates to HUDI-1623, but instead of appending the end time to each 
> commit, it uses `FileStatus.getModificationTime` to represent the end 
> time.
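
The two-writer scenario described above can be sketched in plain Java. This is only an illustration under stated assumptions, not Hudi code: the class `TransitionTimeFilterSketch`, its `Instant` type, both filter methods, and the numeric completion times (5, 10, 20) are hypothetical and stand in for the timeline and for the value that `FileStatus.getModificationTime` would supply.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the scenario; none of these names are Hudi APIs.
class TransitionTimeFilterSketch {

    // A completed instant: the instant time chosen when the write started,
    // and the (assumed) time at which it transitioned to the completed state.
    static final class Instant {
        final String instantTime;
        final long stateTransitionTime;

        Instant(String instantTime, long stateTransitionTime) {
            this.instantTime = instantTime;
            this.stateTransitionTime = stateTransitionTime;
        }
    }

    // Keeps instants whose instant time is strictly greater than the checkpoint.
    static List<String> afterInstantTime(List<Instant> completed, String checkpoint) {
        List<String> result = new ArrayList<>();
        for (Instant instant : completed) {
            if (instant.instantTime.compareTo(checkpoint) > 0) {
                result.add(instant.instantTime);
            }
        }
        return result;
    }

    // Keeps instants whose state transition time is strictly greater than the checkpoint.
    static List<String> afterTransitionTime(List<Instant> completed, long checkpoint) {
        List<String> result = new ArrayList<>();
        for (Instant instant : completed) {
            if (instant.stateTransitionTime > checkpoint) {
                result.add(instant.instantTime);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 001 completed at t=5 and was already consumed by the last pull.
        // w2's commit 003 completes at t=10, before w1's commit 002 at t=20.
        List<Instant> firstPull = new ArrayList<>();
        firstPull.add(new Instant("001", 5L));
        firstPull.add(new Instant("003", 10L));

        List<Instant> secondPull = new ArrayList<>(firstPull);
        secondPull.add(new Instant("002", 20L));

        // Filtering by instant time: the first pull sees 003 and moves the
        // checkpoint to "003", so the second pull no longer matches 002.
        System.out.println("instant time, pull 1: " + afterInstantTime(firstPull, "001"));
        System.out.println("instant time, pull 2: " + afterInstantTime(secondPull, "003"));

        // Filtering by state transition time: the checkpoint moves from 5 to 10,
        // and 002 (completed at t=20) is still picked up by the second pull.
        System.out.println("transition time, pull 1: " + afterTransitionTime(firstPull, 5L));
        System.out.println("transition time, pull 2: " + afterTransitionTime(secondPull, 10L));
    }
}
```

The only difference between the two filters is the field compared against the checkpoint, which is why a commit that starts early but finishes late is lost in one case and kept in the other.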



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
