Hi Max! I like this proposal, especially since proper streaming reads of deletes seem to be quite a bit of work, judging by recent efforts.
Giving an option to include the append parts of OVERWRITE snapshots (2)
is a great quick improvement that will unblock use cases where the
Iceberg table is used to store insert/upsert-only data, which is pretty
common when storing metadata/enrichment information, for example. This
can then later simply be removed once proper streaming reads are
available.

As for (3) as a default, I think it's reasonable given that this is a
source of a lot of user confusion when they see that the source simply
doesn't read *anything* even though new snapshots are being produced
for the table. There is a slight risk with (3) that it may break
pipelines where the current behaviour is well understood and expected,
but that's probably a very small minority of users. To make (2) and
(3) more concrete, I put a rough sketch at the bottom of this mail.

Cheers,
Gyula

On Wed, Jun 25, 2025 at 9:27 PM Steven Wu <stevenz...@gmail.com> wrote:

> The Flink streaming read only consumes `append`-only commits. This is
> a snapshot commit `DataOperation` type. You were talking about
> row-level appends, deletes, etc.
>
> > 2. Add an option to read appended data of overwrite snapshots to
> > allow users to de-duplicate downstream (opt-in config)
>
> For updates it is true. But what about actual row deletion?
>
> > 3. Throw an error if overwrite snapshots are discovered in an
> > append-only scan and the option in (2) is not activated
>
> It would be interesting to hear what people think about this. In some
> scenarios, this is probably better.
>
> In the scenario of primarily streaming append (and occasionally batch
> overwrite), is this always the preferred behavior?
>
> On Wed, Jun 25, 2025 at 8:24 AM Maximilian Michels <m...@apache.org> wrote:
>
>> Hi,
>>
>> It is well known that Flink and other Iceberg engines do not support
>> "merge-on-read" in streaming/incremental read mode. There are plans
>> to change that, see the "Improving Merge-On-Read Query Performance"
>> thread, but this is not what this message is about.
>>
>> I used to think that when we incrementally read from an Iceberg
>> table with Flink, we would simply not apply delete files, but still
>> see appends. That is not the case. Instead, we run an append-only
>> table scan [1], which will silently skip over any "overwrite" [2]
>> snapshots. That means that the consumer will skip appends and
>> deletes if the producer runs in "merge-on-read" mode.
>>
>> Now, it is debatable whether it is a good idea to skip applying
>> deletes, but I'm not sure ignoring "overwrite" snapshots entirely is
>> a great idea either. Not every upsert effectively results in a
>> delete + append. Many upserts are merely appends. Reading appends at
>> least gives consumers the chance to de-duplicate later on, but
>> skipping the entire snapshot definitely means data loss downstream.
>>
>> I think we have three options:
>>
>> 1. Implement "merge-on-read" efficiently for incremental / streaming
>> reads (currently a work in progress)
>> 2. Add an option to read appended data of overwrite snapshots to
>> allow users to de-duplicate downstream (opt-in config)
>> 3. Throw an error if overwrite snapshots are discovered in an
>> append-only scan and the option in (2) is not activated
>>
>> (2) and (3) are stopgaps until we have (1).
>>
>> What do you think?
>>
>> Cheers,
>> Max
>>
>> [1]
>> https://github.com/apache/iceberg/blob/5b50afe8f2b4cf16af6c015625385021b44aca14/flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/source/FlinkSplitPlanner.java#L89
>> [2]
>> https://github.com/apache/iceberg/blob/5b50afe8f2b4cf16af6c015625385021b44aca14/api/src/main/java/org/apache/iceberg/DataOperations.java#L32
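
As promised above, here is a minimal sketch of what the snapshot
handling for (2) and (3) could look like. This is only an illustration:
the readAppendsOfOverwrites flag is a made-up name for the proposed
opt-in option, and a real implementation would live in the incremental
append scan / FlinkSplitPlanner rather than in a standalone helper like
this.

import java.util.ArrayList;
import java.util.List;

import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataOperations;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;
import org.apache.iceberg.util.SnapshotUtil;

class AppendOnlyRangePlanner {

  // Collects the data files added between two snapshots. APPEND
  // snapshots are always read. OVERWRITE snapshots either contribute
  // their added data files (option 2, opt-in) or fail the scan
  // (option 3, the proposed default).
  static List<DataFile> addedFilesBetween(
      Table table,
      long fromSnapshotId,
      long toSnapshotId,
      boolean readAppendsOfOverwrites) {
    List<DataFile> result = new ArrayList<>();
    for (Long snapshotId :
        SnapshotUtil.snapshotIdsBetween(table, fromSnapshotId, toSnapshotId)) {
      Snapshot snapshot = table.snapshot(snapshotId);
      String op = snapshot.operation();
      if (DataOperations.APPEND.equals(op)) {
        snapshot.addedDataFiles(table.io()).forEach(result::add);
      } else if (DataOperations.OVERWRITE.equals(op)) {
        if (readAppendsOfOverwrites) {
          // Option (2): surface the appended rows. Delete files are NOT
          // applied, so consumers must de-duplicate downstream.
          snapshot.addedDataFiles(table.io()).forEach(result::add);
        } else {
          // Option (3): fail loudly instead of silently dropping the
          // snapshot, as the append-only scan does today.
          throw new IllegalStateException(
              "Found OVERWRITE snapshot " + snapshotId
                  + " in an append-only incremental scan; enable the"
                  + " opt-in option to read its appended files.");
        }
      }
      // Other operations (replace, delete) are out of scope here.
    }
    return result;
  }
}

With the flag off this fails fast instead of silently dropping data,
which matches (3); with it on, consumers at least see the appended rows
of overwrite snapshots and get the chance to de-duplicate, which
matches (2).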