[ 
https://issues.apache.org/jira/browse/HUDI-687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056186#comment-17056186
 ] 

Balaji Varadarajan commented on HUDI-687:
-----------------------------------------

cc [~vinothchandar] 

Just to be really clear, the potential race condition happens only when doing an
incremental read using the RO view (not RT) against a MOR table. In this case,
the incremental read will not make progress past the earliest pending compaction
time, to avoid any data loss.
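The rule can be sketched against a simplified timeline model. `Instant`, `incremental_read_instants`, the action labels and the "RO"/"RT" view strings below are hypothetical names for illustration, not Hudi's API, and integer instant times stand in for Hudi's timestamp strings. Only a MOR table read through the RO view is capped at the earliest pending compaction; RT and COW reads are left alone.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Instant:
    time: int        # instant time; real Hudi instants are timestamp strings
    action: str      # hypothetical labels: "commit", "deltacommit", "compaction"
    completed: bool

def incremental_read_instants(timeline: List[Instant], begin_time: int,
                              table_type: str, view: str) -> List[int]:
    """Completed instant times after begin_time that an incremental read may return."""
    times = [i.time for i in timeline if i.completed and i.time > begin_time]
    if table_type == "MERGE_ON_READ" and view == "RO":
        pending = [i.time for i in timeline
                   if i.action == "compaction" and not i.completed]
        if pending:
            # Stop before the earliest pending compaction: updates written before it
            # may exist only in log files, which the RO view does not read.
            earliest_pending = min(pending)
            times = [t for t in times if t < earliest_pending]
    return sorted(times)

# Timeline from the issue below: t2 is a compaction that is still running.
timeline = [
    Instant(0, "commit", True),
    Instant(1, "deltacommit", True),
    Instant(2, "compaction", False),
    Instant(3, "deltacommit", True),
]
ro = incremental_read_instants(timeline, 0, "MERGE_ON_READ", "RO")  # [1]
rt = incremental_read_instants(timeline, 0, "MERGE_ON_READ", "RT")  # [1, 3]
```

Once the compaction at time 2 is marked completed, the same RO call returns [1, 2, 3], so the reader resumes from where it stopped instead of having skipped past it.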

 

> incremental reads on MOR tables can lead to data loss
> -----------------------------------------------------
>
>                 Key: HUDI-687
>                 URL: https://issues.apache.org/jira/browse/HUDI-687
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>            Reporter: satish
>            Assignee: satish
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> example timeline:
> t0 -> create bucket1.parquet
> t1 -> create and append updates bucket1.log
> t2 -> request compaction 
> t3 -> create bucket2.parquet
> If the compaction requested at t2 takes a long time, incremental reads using
> HoodieParquetInputFormat can skip the data ingested at t1, leading to 'data loss'
> (the data is still on disk, but incremental readers won't see it because it's
> in a log file and the readers move on to t3).
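The failure can be reproduced with a toy model of an RO incremental reader that only sees base (parquet) files and advances its checkpoint to the newest completed instant. The dictionary layout and `naive_ro_incremental_read` are hypothetical, chosen only to mirror the timeline in the description; they are not HoodieParquetInputFormat's actual logic.

```python
from typing import Dict, List, Tuple

# instant time -> (completed?, writes); each write is (file_kind, record),
# where file_kind is "base" (parquet) or "log". Hypothetical model, not Hudi's API.
Timeline = Dict[int, Tuple[bool, List[Tuple[str, str]]]]

def naive_ro_incremental_read(timeline: Timeline, checkpoint: int) -> Tuple[List[str], int]:
    """Return records from base files of completed instants newer than checkpoint,
    plus the advanced checkpoint. Log files are invisible to the RO view."""
    records: List[str] = []
    new_checkpoint = checkpoint
    for time in sorted(timeline):
        completed, writes = timeline[time]
        if completed and time > checkpoint:
            records.extend(rec for kind, rec in writes if kind == "base")
            new_checkpoint = max(new_checkpoint, time)
    return records, new_checkpoint

# Timeline from the issue (t0 already consumed by the reader).
timeline: Timeline = {
    0: (True, [("base", "bucket1 v0")]),      # t0: create bucket1.parquet
    1: (True, [("log", "bucket1 update")]),   # t1: updates go to bucket1.log
    2: (False, []),                           # t2: compaction requested, still running
    3: (True, [("base", "bucket2 v0")]),      # t3: create bucket2.parquet
}

first, checkpoint = naive_ro_incremental_read(timeline, checkpoint=0)
# first == ["bucket2 v0"], checkpoint == 3: the reader has moved past t1 and t2.

# The compaction at t2 finally completes and writes the t1 updates into parquet.
timeline[2] = (True, [("base", "bucket1 update")])
second, checkpoint = naive_ro_incremental_read(timeline, checkpoint)
# second == []: t2 < checkpoint, so the t1 updates are never returned.
```

The updates are still on disk after compaction; they are lost only from the incremental reader's point of view, because its checkpoint already sits past the compaction's instant time.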
> To work around this problem, we want to stop returning data belonging to
> commits > t1. After the compaction completes, incremental readers will see
> updates in t2, t3, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
