[ https://issues.apache.org/jira/browse/HUDI-5990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-5990:
---------------------------------
    Labels: pull-request-available  (was: )

> Incremental queries on MOR sometimes miss data
> ----------------------------------------------
>
>                 Key: HUDI-5990
>                 URL: https://issues.apache.org/jira/browse/HUDI-5990
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark-sql
>    Affects Versions: 0.12.2, 0.13.0
>            Reporter: ruofan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>
> env: hudi-0.12.2 spark-3.2.0
> Currently, we have the following Hudi timeline and data files.
> {code:java}
> -rw-r--r-- 1 rfyu rfyu 1.5K Mar 26 09:58 20230326095758155.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:57 20230326095758155.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:57 20230326095758155.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.6K Mar 26 09:58 20230326095810406.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095810406.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095810406.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:58 20230326095811072.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095811072.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095811072.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:58 20230326095820974.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095820974.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095820974.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.8K Mar 26 09:58 20230326095830980.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095830980.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095830980.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.8K Mar 26 09:58 20230326095840978.compaction.requested
> -rw-r--r-- 1 rfyu rfyu 1.5K Mar 26 09:58 20230326095841125.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095841125.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095841125.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.6K Mar 26 09:59 20230326095850994.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095850994.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095850994.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:59 20230326095900988.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095900988.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095900988.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:59 20230326095910983.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095910983.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095910983.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095920986.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095920986.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.5K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.1_0-1-0
> -rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.2_0-1-0
> -rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.3_0-1-0
> -rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.4_0-1-0
> -rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.5_0-1-0
> -rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.1_0-1-0
> -rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:59 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.2_0-1-0
> -rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:59 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.3_0-1-0
> -rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:59 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.4_0-1-0
> {code}
> We use Spark to query this Hudi table incrementally. Data may go missing
> when the incremental range contains an incomplete (pending) compaction plan.
> Here is an example of an incremental query. Normally, 6 commits should be
> found between begin_instance_time and end_instance_time, but only 3 are
> returned.
> {code:java}
> sql:
> call copy_to_table(table=>'hudi_table',new_table=>'incremental_table',query_type=>'incremental',begin_instance_time=>'20230326095810406',end_instance_time=>'20230326095900988');
> select _hoodie_commit_time,count(*) from incremental_table group by _hoodie_commit_time order by _hoodie_commit_time desc;
>
> actual result:
> +-------------------+--------+
> |_hoodie_commit_time|count(1)|
> +-------------------+--------+
> |20230326095830980  |10      |
> |20230326095820974  |10      |
> |20230326095811072  |10      |
> +-------------------+--------+
>
> expected result:
> +-------------------+--------+
> |_hoodie_commit_time|count(1)|
> +-------------------+--------+
> |20230326095830980  |10      |
> |20230326095820974  |10      |
> |20230326095811072  |10      |
> |20230326095841125  |10      |
> |20230326095850994  |10      |
> |20230326095900988  |10      |
> +-------------------+--------+
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
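
The expected count of 6 can be cross-checked against the timeline listing with a small standalone script. This is only a sketch under stated assumptions: it treats the incremental range as exclusive of begin_instance_time and inclusive of end_instance_time (which is consistent with the 6 commits the reporter expects), it treats any `.compaction.requested` file as a pending compaction, and the helper names are hypothetical. It does not use or reproduce any Hudi API; it only compares instant times taken from the file names in the listing.

```python
# Timeline file names taken from the listing in the issue. The zero-byte
# .inflight/.requested markers of already completed deltacommits are omitted.
TIMELINE_FILES = [
    "20230326095758155.deltacommit",
    "20230326095810406.deltacommit",
    "20230326095811072.deltacommit",
    "20230326095820974.deltacommit",
    "20230326095830980.deltacommit",
    "20230326095840978.compaction.requested",
    "20230326095841125.deltacommit",
    "20230326095850994.deltacommit",
    "20230326095900988.deltacommit",
    "20230326095910983.deltacommit",
    "20230326095920986.deltacommit.inflight",
    "20230326095920986.deltacommit.requested",
]


def completed_deltacommits(files):
    """Instant times that have a completed .deltacommit file (hypothetical helper)."""
    return sorted(f.split(".")[0] for f in files if f.endswith(".deltacommit"))


def pending_compactions(files):
    """Instant times with a .compaction.requested file (simplification:
    every requested compaction in this listing is treated as pending)."""
    return sorted(f.split(".")[0] for f in files if f.endswith(".compaction.requested"))


def instants_in_range(instants, begin, end):
    """Assumption: begin is exclusive, end is inclusive. Instant times are
    fixed-width digit strings, so plain string comparison orders them correctly."""
    return [t for t in instants if begin < t <= end]


BEGIN = "20230326095810406"
END = "20230326095900988"

expected = instants_in_range(completed_deltacommits(TIMELINE_FILES), BEGIN, END)
first_pending = min(pending_compactions(TIMELINE_FILES))
before_pending = [t for t in expected if t < first_pending]

print(len(expected), expected)
print(len(before_pending), before_pending)
```

Under these assumptions, `expected` holds the six completed deltacommits in the range, while `before_pending` holds only the three instants earlier than the pending compaction at 20230326095840978. Those three are the same instants shown in the actual result above, which matches the reporter's observation that the pending compaction plan is where the data stops.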