Thanks for starting a thread, Jack! I have yet to go through the proposal. I recently came across a similar idea in BigQuery, which relies on a staleness threshold: https://cloud.google.com/blog/products/data-analytics/bigquery-gains-change-data-capture-functionality/
It would also be nice to check if there are any applicable ideas in Paimon: https://github.com/apache/incubator-paimon/

- Anton

> On Apr 26, 2023, at 11:32 AM, Jack Ye <[email protected]> wrote:
>
> Hi everyone,
>
> As we discussed in the community sync, it looks like we have some general
> interest in improving the CDC streaming process. Dan mentioned that Ryan has
> a proposal about an alternative CDC approach that has an accumulated
> changelog that is periodically synced to a target table.
>
> I have a very similar design doc I have been working on for quite some time
> to describe a set of improvements we could do to the Iceberg CDC use case,
> and it contains a very similar improvement (see improvement 3).
>
> I would appreciate feedback from the community about this doc, and I can
> organize some meetings to discuss our thoughts about this topic afterwards.
>
> Doc link:
> https://docs.google.com/document/d/1kyyJp4masbd1FrIKUHF1ED_z1hTARL8bNoKCgb7fhSQ/edit#
>
> Best,
> Jack Ye
