danny0405 commented on issue #11016: URL: https://github.com/apache/hudi/issues/11016#issuecomment-2066279351
> but the issue is that we can't access older data.

If your table is ingested with streaming `upsert`, you can just specify `read.start-commit` as the first commit instant time on the timeline and skip the compaction. Only instants that have not been cleaned can be consumed. It also depends on how the history dataset was written: `bulk_insert` does not guarantee the payload sequence of one key, so if the table was bootstrapped with `bulk_insert`, the only way is to consume from `earliest`.
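As a sketch, the streaming-read setup described above might look like the following Flink SQL (table name, schema, path, and the instant timestamp are placeholders, not from the original thread):

```sql
-- Hypothetical Hudi streaming source; replace path/schema with your own.
CREATE TABLE hudi_source (
  id STRING,
  ts TIMESTAMP(3),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs:///tmp/hudi_table',
  'table.type' = 'MERGE_ON_READ',
  'read.streaming.enabled' = 'true',
  -- start from the first instant still on the timeline;
  -- for a table bootstrapped with bulk_insert, use 'earliest' instead
  'read.start-commit' = '20240101000000000'
);
```

Note that `'read.start-commit' = 'earliest'` is the option the comment recommends for `bulk_insert`-bootstrapped tables, since only that mode replays the full history without relying on per-key payload ordering.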