Thanks Jack for the great write-up. Good summary of the current landscape of CDC too. Left a few comments to discuss.
On Wed, Apr 26, 2023 at 11:38 AM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:

> Thanks for starting a thread, Jack! I am yet to go through the proposal.
>
> I recently came across a similar idea in BigQuery, which relies on a
> staleness threshold:
>
> https://cloud.google.com/blog/products/data-analytics/bigquery-gains-change-data-capture-functionality/
>
> It would also be nice to check if there are any applicable ideas in Paimon:
> https://github.com/apache/incubator-paimon/
>
> - Anton
>
> On Apr 26, 2023, at 11:32 AM, Jack Ye <yezhao...@gmail.com> wrote:
>
> Hi everyone,
>
> As we discussed in the community sync, it looks like we have some general
> interest in improving the CDC streaming process. Dan mentioned that Ryan
> has a proposal about an alternative CDC approach that has an accumulated
> changelog that is periodically synced to a target table.
>
> I have a very similar design doc I have been working on for quite some
> time to describe a set of improvements we could do to the Iceberg CDC use
> case, and it contains a very similar improvement (see improvement 3).
>
> I would appreciate feedback from the community about this doc, and I can
> organize some meetings to discuss our thoughts about this topic afterwards.
>
> Doc link:
> https://docs.google.com/document/d/1kyyJp4masbd1FrIKUHF1ED_z1hTARL8bNoKCgb7fhSQ/edit#
>
> Best,
> Jack Ye