[ https://issues.apache.org/jira/browse/HUDI-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

satish updated HUDI-687:
------------------------
    Summary: incremental reads on MOR RO tables can lead to data loss  (was: incremental reads on MOR tables can lead to data loss)

> incremental reads on MOR RO tables can lead to data loss
> --------------------------------------------------------
>
>                 Key: HUDI-687
>                 URL: https://issues.apache.org/jira/browse/HUDI-687
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>            Reporter: satish
>            Assignee: satish
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Example timeline:
> t0 -> create bucket1.parquet
> t1 -> create and append updates bucket1.log
> t2 -> request compaction
> t3 -> create bucket2.parquet
> If the compaction requested at t2 takes a long time, incremental reads using
> HoodieParquetInputFormat can skip the data ingested at t1, leading to 'data loss'.
> (The data will still be on disk, but incremental readers won't see it because it's
> in a log file and the readers move on to t3.)
> To work around this problem, we want to stop returning data belonging to
> commits > t1. After compaction is complete, the incremental reader would see the
> updates in t2, t3, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
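
The workaround described in the issue can be illustrated with a small sketch. It does not use Hudi's actual timeline classes: `IncrementalReadSketch`, `Instant`, `State`, and `visibleInstants` are hypothetical names, and instant times are modeled as plain integers (0 for t0, 1 for t1, and so on). The assumption is that an incremental reader is handed only completed instants that precede the earliest pending compaction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IncrementalReadSketch {

    // Hypothetical instant states; these are not Hudi's real timeline types.
    enum State { COMPLETED, COMPACTION_REQUESTED }

    static final class Instant {
        final int time;     // 0 stands for t0, 1 for t1, ...
        final State state;

        Instant(int time, State state) {
            this.time = time;
            this.state = state;
        }
    }

    // Returns the times of completed instants that come before the
    // earliest pending compaction. Later instants are withheld until
    // the compaction completes.
    static List<Integer> visibleInstants(List<Instant> timeline) {
        int earliestPending = Integer.MAX_VALUE;
        for (Instant i : timeline) {
            if (i.state == State.COMPACTION_REQUESTED) {
                earliestPending = Math.min(earliestPending, i.time);
            }
        }
        List<Integer> visible = new ArrayList<>();
        for (Instant i : timeline) {
            if (i.state == State.COMPLETED && i.time < earliestPending) {
                visible.add(i.time);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Timeline from the issue: compaction requested at t2 is still pending.
        List<Instant> pending = Arrays.asList(
                new Instant(0, State.COMPLETED),
                new Instant(1, State.COMPLETED),
                new Instant(2, State.COMPACTION_REQUESTED),
                new Instant(3, State.COMPLETED));
        System.out.println(visibleInstants(pending));   // prints [0, 1]

        // Same timeline once the compaction at t2 has completed.
        List<Instant> compacted = Arrays.asList(
                new Instant(0, State.COMPLETED),
                new Instant(1, State.COMPLETED),
                new Instant(2, State.COMPLETED),
                new Instant(3, State.COMPLETED));
        System.out.println(visibleInstants(compacted)); // prints [0, 1, 2, 3]
    }
}
```

While the compaction is pending, the reader stops at t1 and does not advance to t3, so it cannot move past the updates that still sit only in bucket1.log. Once the compaction completes, t2 and t3 become visible together.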