> Consequently, we must throw on DELETE snapshots, even if users opt in to reading appends of OVERWRITE snapshots.
OVERWRITE snapshots themselves could still contain deletes. So in this regard, I don't see a difference between the DELETE and the OVERWRITE snapshots.

Maximilian Michels <m...@apache.org> wrote (on Thu, Jun 26, 2025, at 11:03):

> Thank you for your feedback Steven and Gyula!
>
> @Steven
>
>> the Flink streaming read only consumes `append`-only commits. This is a
>> snapshot commit `DataOperation` type. You were talking about row-level
>> appends, deletes, etc.
>
> Yes, there is no doubt that Flink streaming reads only process
> append snapshots. But the issue here is that they jump over "overwrite"
> snapshots, which contain both new data files and delete files. I don't
> think users are generally aware of this. Even worse, reading that way leads
> to an inconsistent view of the table state, which is a serious data
> integrity issue.
>
>> 2. Add an option to read appended data of overwrite snapshots to allow
>> users to de-duplicate downstream (opt-in config)
>
>> For updates it is true. But what about actual row deletion?
>
> Good point! I had only considered upserts, but deletes definitely cannot
> be reconstructed via downstream deduplication the way upserts can.
> Consequently, we must throw on DELETE snapshots, even if users opt in to
> reading appends of OVERWRITE snapshots.
>
>> It would be interesting to hear what people think about this. In some
>> scenarios, this is probably better.
>
>> In the scenario of primarily streaming append (and occasionally batch
>> overwrite), is this always a preferred behavior?
>
> I think so, but I'm also curious about what others think.
>
> @Gyula
>
>> There is a slight risk with 3 that it may break pipelines where the
>> current behaviour is well understood and expected, but that's probably a
>> very small minority of the users.
>
> Agreed, that is a possible risk, but if we consider the hidden risk of
> effectively "losing" data, it looks like a small risk to take.
>
> -Max
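[Editor's note: to make the semantics Max describes concrete, here is a minimal sketch using only the core Iceberg API (`Snapshot`, `DataOperations`, `FileIO`). The class name and the opt-in flag are hypothetical illustrations of options (2) and (3), not the actual Flink connector code.]

```java
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataOperations;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.io.FileIO;

class AppendAwareSnapshotDispatch {

  // Hypothetical opt-in config from option (2); the option name a user would
  // set is made up for illustration.
  private final boolean readOverwriteAppends;

  AppendAwareSnapshotDispatch(boolean readOverwriteAppends) {
    this.readOverwriteAppends = readOverwriteAppends;
  }

  Iterable<DataFile> plan(Snapshot snapshot, FileIO io) {
    switch (snapshot.operation()) {
      case DataOperations.APPEND:
        // Append snapshots are consumed as today.
        return snapshot.addedDataFiles(io);

      case DataOperations.OVERWRITE:
        if (readOverwriteAppends) {
          // Option (2): surface only the appended files; consumers must
          // de-duplicate downstream, since the delete side is not applied.
          return snapshot.addedDataFiles(io);
        }
        // Option (3): fail loudly instead of silently skipping the snapshot.
        throw new UnsupportedOperationException(
            "Overwrite snapshot " + snapshot.snapshotId()
                + " found in append-only scan; enable the opt-in config to read its appends");

      case DataOperations.DELETE:
        // Pure deletes cannot be reconstructed by downstream deduplication,
        // so this always fails, even with the opt-in enabled.
        throw new UnsupportedOperationException(
            "Delete snapshot " + snapshot.snapshotId()
                + " cannot be consumed in an append-only streaming read");

      default:
        throw new UnsupportedOperationException(
            "Unhandled operation: " + snapshot.operation());
    }
  }
}
```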
> On Thu, Jun 26, 2025 at 9:18 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
>
>> Hi Max!
>>
>> I like this proposal, especially since proper streaming reads of deletes
>> seem to be quite a bit of work based on recent efforts.
>>
>> Giving an option to include the append parts of OVERWRITE snapshots (2)
>> is a great quick improvement that will unblock use cases where the Iceberg
>> table is used to store insert/upsert-only data, which is pretty common when
>> storing metadata/enrichment information, for example. This can then later
>> be simply removed once proper streaming read is available.
>>
>> As for (3) as a default, I think it's reasonable, given that this is a
>> source of a lot of user confusion when they see that the source simply
>> doesn't read *anything* even if new snapshots are produced to the table.
>> There is a slight risk with 3 that it may break pipelines where the
>> current behaviour is well understood and expected, but that's probably a
>> very small minority of the users.
>>
>> Cheers,
>> Gyula
>>
>> On Wed, Jun 25, 2025 at 9:27 PM Steven Wu <stevenz...@gmail.com> wrote:
>>
>>> The Flink streaming read only consumes `append`-only commits. This is a
>>> snapshot commit `DataOperation` type. You were talking about row-level
>>> appends, deletes, etc.
>>>
>>> > 2. Add an option to read appended data of overwrite snapshots to allow
>>> > users to de-duplicate downstream (opt-in config)
>>>
>>> For updates it is true. But what about actual row deletion?
>>>
>>> > 3. Throw an error if overwrite snapshots are discovered in an
>>> > append-only scan and the option in (2) is not activated
>>>
>>> It would be interesting to hear what people think about this. In some
>>> scenarios, this is probably better.
>>>
>>> In the scenario of primarily streaming append (and occasionally batch
>>> overwrite), is this always a preferred behavior?
>>>
>>> On Wed, Jun 25, 2025 at 8:24 AM Maximilian Michels <m...@apache.org>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> It is well known that Flink and other Iceberg engines do not support
>>>> "merge-on-read" in streaming/incremental read mode. There are plans to
>>>> change that, see the "Improving Merge-On-Read Query Performance"
>>>> thread, but this is not what this message is about.
>>>>
>>>> I used to think that when we incrementally read from an Iceberg table
>>>> with Flink, we would simply not apply delete files, but still see
>>>> appends. That is not the case. Instead, we run an append-only table
>>>> scan [1], which silently skips over any "overwrite" [2] snapshots.
>>>> That means that the consumer will miss both appends and deletes if the
>>>> producer runs in "merge-on-read" mode.
>>>>
>>>> Now, it is debatable whether it is a good idea to skip applying
>>>> deletes, but I'm not sure ignoring "overwrite" snapshots entirely is a
>>>> great idea either. Not every upsert effectively results in a delete +
>>>> append. Many upserts are merely appends. Reading appends at least
>>>> gives consumers the chance to de-duplicate later on, but skipping the
>>>> entire snapshot definitely means data loss downstream.
>>>>
>>>> I think we have three options:
>>>>
>>>> 1. Implement "merge-on-read" efficiently for incremental/streaming
>>>> reads (currently a work in progress)
>>>> 2. Add an option to read appended data of overwrite snapshots to allow
>>>> users to de-duplicate downstream (opt-in config)
>>>> 3. Throw an error if overwrite snapshots are discovered in an
>>>> append-only scan and the option in (2) is not activated
>>>>
>>>> (2) and (3) are stopgaps until we have (1).
>>>>
>>>> What do you think?
>>>>
>>>> Cheers,
>>>> Max
>>>>
>>>> [1] https://github.com/apache/iceberg/blob/5b50afe8f2b4cf16af6c015625385021b44aca14/flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/source/FlinkSplitPlanner.java#L89
>>>> [2] https://github.com/apache/iceberg/blob/5b50afe8f2b4cf16af6c015625385021b44aca14/api/src/main/java/org/apache/iceberg/DataOperations.java#L32
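[Editor's note: to make the silent skipping referenced in [1] concrete, here is a minimal sketch against the core Iceberg `Table`/`Snapshot` API that reports which snapshots an append-only scan would pass over. The helper class is hypothetical and simplifies away the incremental-range bookkeeping a real streaming read performs.]

```java
import org.apache.iceberg.DataOperations;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;

class AppendOnlyScanGapReport {

  // Walk the table's snapshot log and report every snapshot that an
  // append-only incremental scan would pass over without emitting data.
  // (Simplification: a real incremental read would restrict this to the
  // snapshots between the last-consumed and the latest snapshot ID.)
  static void reportSkipped(Table table) {
    for (Snapshot snapshot : table.snapshots()) {
      if (!DataOperations.APPEND.equals(snapshot.operation())) {
        System.out.printf(
            "snapshot %d (operation=%s) is invisible to the append-only scan%n",
            snapshot.snapshotId(), snapshot.operation());
      }
    }
  }
}
```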