[ https://issues.apache.org/jira/browse/HUDI-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Balaji Varadarajan updated HUDI-687:
------------------------------------
    Summary: incremental reads on MOR tables using RO view can lead to missing 
updates  (was: incremental reads on MOR RO tables can lead to data loss)

> incremental reads on MOR tables using RO view can lead to missing updates
> -------------------------------------------------------------------------
>
>                 Key: HUDI-687
>                 URL: https://issues.apache.org/jira/browse/HUDI-687
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>            Reporter: satish
>            Assignee: satish
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> example timeline:
> t0 -> create bucket1.parquet
> t1 -> create and append updates bucket1.log
> t2 -> request compaction 
> t3 -> create bucket2.parquet
> If the compaction requested at t2 takes a long time, incremental reads using
> HoodieParquetInputFormat can skip data ingested at t1, leading to 'data loss'.
> (The data is still on disk, but incremental readers won't see it: it is in
> the log file, which the RO view does not read, and the readers move on to t3.)
> To work around this problem, we want to stop returning data belonging to
> commits > t1 while the compaction at t2 is pending. After the compaction
> completes, incremental readers would see updates in t2, t3, and so on.
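The workaround above can be sketched as a small filter over the timeline. This is a minimal, hypothetical Python model, not Hudi's API. `Instant` and `incremental_commits` are illustrative names, instant times are assumed to sort lexicographically, and a completed compaction is modeled as a `compaction` instant with `completed=True`. An incremental read returns only completed instants after its begin time and strictly before the earliest pending compaction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instant:
    # Hypothetical, simplified stand-in for a timeline instant.
    time: str        # instant time; assumed to sort lexicographically
    action: str      # e.g. "deltacommit" or "compaction"
    completed: bool  # False while the action is requested/inflight


def incremental_commits(timeline: List[Instant], begin_time: str) -> List[str]:
    """Return completed instant times after begin_time that an incremental
    read may consume, stopping before the earliest pending compaction."""
    pending = [i.time for i in timeline
               if i.action == "compaction" and not i.completed]
    cutoff: Optional[str] = min(pending) if pending else None
    result = []
    for inst in sorted(timeline, key=lambda i: i.time):
        # Skip instants already consumed or not yet completed.
        if not inst.completed or inst.time <= begin_time:
            continue
        # Do not advance past a pending compaction; its output is not visible yet.
        if cutoff is not None and inst.time >= cutoff:
            break
        result.append(inst.time)
    return result


# Timeline from the example: compaction requested at t2 is still pending.
pending_timeline = [
    Instant("t0", "deltacommit", True),
    Instant("t1", "deltacommit", True),
    Instant("t2", "compaction", False),
    Instant("t3", "deltacommit", True),
]
# Same timeline after the compaction at t2 completes.
done_timeline = [
    Instant("t0", "deltacommit", True),
    Instant("t1", "deltacommit", True),
    Instant("t2", "compaction", True),
    Instant("t3", "deltacommit", True),
]
```

With t2 pending, reading from t0 yields only `["t1"]`, so the reader's checkpoint does not jump to t3. Once t2 completes, reading from t1 yields `["t2", "t3"]`.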



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
