I would try to avoid breaking the current behaviour. Maybe after some grace period it could be OK, but not "suddenly".
<gyula.f...@gmail.com> wrote (on Fri, 27 Jun 2025, 15:28):

> Sounds good to me!
>
> Gyula
> Sent from my iPhone
>
> On 27 Jun 2025, at 13:48, Maximilian Michels <m...@apache.org> wrote:
>
> In my understanding, overwrite snapshots are specifically there to
> overwrite data, i.e. there is always an append for a delete. That is
> also the case for the Flink Iceberg sink. There may be other writers,
> which use overwrite snapshots differently. But point taken, we may
> need to also opt in to skipping over delete snapshots, in addition to
> overwrite snapshots.
>
> I've taken a look at how Spark handles this topic. Turns out, they
> came up with a similar approach:
>
>> [Spark] Iceberg only supports reading data from append snapshots.
>> Overwrite snapshots cannot be processed and will cause an exception
>> by default. Overwrites may be ignored by setting
>> streaming-skip-overwrite-snapshots=true. Similarly, delete snapshots
>> will cause an exception by default, and deletes may be ignored by
>> setting streaming-skip-delete-snapshots=true.
>
> https://iceberg.apache.org/docs/1.8.1/spark-structured-streaming/#streaming-reads
>
> Spark fails by default and has options to ignore overwrite and delete
> snapshots entirely. For Flink, perhaps we can make the following
> changes to align with Spark and also provide some flexibility for
> users.
>
> For Flink incremental / streaming reads:
> A. By default, throw on non-append snapshots (like Spark, important
> for maintaining an accurate table view)
> B. Add options to skip overwrite / delete snapshots (like Spark)
> C. Add an option to read append data of overwrite snapshots (allow
> down-the-line deduplication of upserts)
>
> What do you think?
>
> Thanks,
> Max
>
> On Thu, Jun 26, 2025 at 5:38 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
>
>> I agree with Peter, it would be weird to get an error on a DELETE
>> snapshot if you already explicitly opted in for reading the appends
>> of OVERWRITE snapshots.
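[Editor's sketch] To make options A-C concrete, here is a minimal toy model of the per-snapshot decision such a scan could make. This is illustrative only: `Op` and `plan` are invented names, not Iceberg or Flink API, and the keyword arguments are placeholders for whatever config keys would eventually be chosen. Following the position Max takes later in the thread, DELETE snapshots still throw even when reading appends of overwrites is enabled (a point Péter and Gyula push back on).

```python
from enum import Enum


class Op(Enum):
    """Snapshot operation types, mirroring Iceberg's DataOperations."""
    APPEND = "append"
    OVERWRITE = "overwrite"
    DELETE = "delete"


def plan(op, skip_overwrite=False, skip_delete=False,
         read_overwrite_appends=False):
    """Decide what an incremental scan does with one snapshot.

    A (default): throw on any non-append snapshot.
    B: skip_* options ignore overwrite/delete snapshots entirely.
    C: read_overwrite_appends reads only the appended files of an
       overwrite snapshot, leaving deduplication to the consumer.
    """
    if op is Op.APPEND:
        return "read"
    if op is Op.OVERWRITE:
        if read_overwrite_appends:
            return "read-appends"
        if skip_overwrite:
            return "skip"
    if op is Op.DELETE and skip_delete:
        return "skip"
    raise ValueError(
        f"cannot process {op.value} snapshot in append-only scan")
```

For example, `plan(Op.OVERWRITE)` raises by default (option A), while `plan(Op.OVERWRITE, read_overwrite_appends=True)` returns "read-appends" (option C).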
>> Users may not be able to control the type of snapshot to be
>> created, so this would otherwise render this feature useless.
>>
>> Gyula
>>
>> On Thu, Jun 26, 2025 at 5:34 PM Péter Váry <peter.vary.apa...@gmail.com> wrote:
>>
>>> > Consequently, we must throw on DELETE snapshots, even if users
>>> > opt in to reading appends of OVERWRITE snapshots.
>>>
>>> OVERWRITE snapshots themselves could still contain deletes. So in
>>> this regard, I don't see a difference between the DELETE and the
>>> OVERWRITE snapshots.
>>>
>>> Maximilian Michels <m...@apache.org> wrote (on Thu, 26 Jun 2025, 11:03):
>>>
>>>> Thank you for your feedback Steven and Gyula!
>>>>
>>>> @Steven
>>>>
>>>>> the Flink streaming read only consumes `append` only commits.
>>>>> This is a snapshot commit `DataOperation` type. You were talking
>>>>> about row-level appends, delete etc.
>>>>
>>>> Yes, there is no doubt that Flink streaming reads only process
>>>> append snapshots. But the issue here is that they jump over
>>>> "overwrite" snapshots, which contain both new data files and
>>>> delete files. I don't think users are generally aware of this.
>>>> Even worse, reading that way leads to an inconsistent view of the
>>>> table state, which is a serious data integrity issue.
>>>>
>>>>> 2. Add an option to read appended data of overwrite snapshots to
>>>>> allow users to de-duplicate downstream (opt-in config)
>>>>>
>>>>> For updates it is true. But what about actual row deletion?
>>>>
>>>> Good point! I had only considered upserts, but deletes can
>>>> definitely not be reconstructed via downstream deduplication the
>>>> way upserts can. Consequently, we must throw on DELETE snapshots,
>>>> even if users opt in to reading appends of OVERWRITE snapshots.
>>>>
>>>>> It would be interesting to hear what people think about this. In
>>>>> some scenarios, this is probably better.
>>>>> In the scenario of primarily streaming append (and occasionally
>>>>> batch overwrite), is this always a preferred behavior?
>>>>
>>>> I think so, but I'm also curious about what others think.
>>>>
>>>> @Gyula
>>>>
>>>>> There is a slight risk with 3 that it may break pipelines where
>>>>> the current behaviour is well understood and expected, but that's
>>>>> probably a very small minority of the users.
>>>>
>>>> Agreed, that is a possible risk, but if we consider the hidden
>>>> risk of effectively "losing" data, it looks like a small risk to
>>>> take.
>>>>
>>>> -Max
>>>>
>>>> On Thu, Jun 26, 2025 at 9:18 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>>
>>>>> Hi Max!
>>>>>
>>>>> I like this proposal, especially since proper streaming reads of
>>>>> deletes seem to be quite a bit of work based on recent efforts.
>>>>>
>>>>> Giving an option to include the append parts of OVERWRITE
>>>>> snapshots (2) is a great quick improvement that will unblock
>>>>> use-cases where the Iceberg table is used to store insert/upsert
>>>>> only data, which is pretty common when storing
>>>>> metadata/enrichment information, for example. This can then later
>>>>> simply be removed once proper streaming read is available.
>>>>>
>>>>> As for (3) as a default, I think it's reasonable given that this
>>>>> is a source of a lot of user confusion when they see that the
>>>>> source simply doesn't read *anything* even if new snapshots are
>>>>> produced to the table. There is a slight risk with 3 that it may
>>>>> break pipelines where the current behaviour is well understood
>>>>> and expected, but that's probably a very small minority of the
>>>>> users.
>>>>>
>>>>> Cheers,
>>>>> Gyula
>>>>>
>>>>> On Wed, Jun 25, 2025 at 9:27 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>
>>>>>> the Flink streaming read only consumes `append` only commits.
>>>>>> This is a snapshot commit `DataOperation` type. You were talking
>>>>>> about row-level appends, delete etc.
>>>>>>
>>>>>> > 2. Add an option to read appended data of overwrite snapshots
>>>>>> > to allow users to de-duplicate downstream (opt-in config)
>>>>>>
>>>>>> For updates it is true. But what about actual row deletion?
>>>>>>
>>>>>> > 3. Throw an error if overwrite snapshots are discovered in an
>>>>>> > append-only scan and the option in (2) is not activated
>>>>>>
>>>>>> It would be interesting to hear what people think about this. In
>>>>>> some scenarios, this is probably better.
>>>>>>
>>>>>> In the scenario of primarily streaming append (and occasionally
>>>>>> batch overwrite), is this always a preferred behavior?
>>>>>>
>>>>>> On Wed, Jun 25, 2025 at 8:24 AM Maximilian Michels <m...@apache.org> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> It is well known that Flink and other Iceberg engines do not
>>>>>>> support "merge-on-read" in streaming/incremental read mode.
>>>>>>> There are plans to change that, see the "Improving
>>>>>>> Merge-On-Read Query Performance" thread, but this is not what
>>>>>>> this message is about.
>>>>>>>
>>>>>>> I used to think that when we incrementally read from an Iceberg
>>>>>>> table with Flink, we would simply not apply delete files, but
>>>>>>> still see appends. That is not the case. Instead, we run an
>>>>>>> append-only table scan [1], which will silently skip over any
>>>>>>> "overwrite" [2] snapshots. That means that the consumer will
>>>>>>> skip appends and deletes if the producer runs in
>>>>>>> "merge-on-read" mode.
>>>>>>>
>>>>>>> Now, it is debatable whether it is a good idea to skip applying
>>>>>>> deletes, but I'm not sure ignoring "overwrite" snapshots
>>>>>>> entirely is a great idea either. Not every upsert effectively
>>>>>>> results in a delete + append. Many upserts are merely appends.
>>>>>>> Reading appends at least gives consumers the chance to
>>>>>>> de-duplicate later on, but skipping the entire snapshot
>>>>>>> definitely means data loss downstream.
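[Editor's sketch] The data-loss problem Max describes can be shown with a toy model. The snapshot dicts below are an invented structure, not the Iceberg manifest format; an upsert of key `k1` is committed as an OVERWRITE snapshot carrying one delete and one data file, and the append-only scan mimics the current Flink behaviour of skipping non-append snapshots.

```python
# Two commits: an initial append, then an upsert of k1 committed as an
# overwrite snapshot (delete the old row, append the new one).
snapshots = [
    {"op": "append",    "adds": [("k1", "v1")], "deletes": []},
    {"op": "overwrite", "adds": [("k1", "v2")], "deletes": ["k1"]},
]


def append_only_scan(snapshots, skip_overwrites=True):
    """Rows seen by an incremental consumer.

    skip_overwrites=True mimics the current Flink behaviour; False
    mimics proposed option (2), reading the appends of overwrites.
    """
    rows = []
    for s in snapshots:
        if s["op"] == "append" or not skip_overwrites:
            rows.extend(s["adds"])
    return rows


# Current behaviour: the updated value v2 is silently lost downstream.
print(append_only_scan(snapshots))  # [('k1', 'v1')]
# Option (2): both versions surface; the consumer can deduplicate.
print(append_only_scan(snapshots, skip_overwrites=False))  # [('k1', 'v1'), ('k1', 'v2')]
```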
>>>>>>>
>>>>>>> I think we have three options:
>>>>>>>
>>>>>>> 1. Implement "merge-on-read" efficiently for incremental /
>>>>>>> streaming reads (currently a work in progress)
>>>>>>> 2. Add an option to read appended data of overwrite snapshots
>>>>>>> to allow users to de-duplicate downstream (opt-in config)
>>>>>>> 3. Throw an error if overwrite snapshots are discovered in an
>>>>>>> append-only scan and the option in (2) is not activated
>>>>>>>
>>>>>>> (2) and (3) are stopgaps until we have (1).
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Max
>>>>>>>
>>>>>>> [1] https://github.com/apache/iceberg/blob/5b50afe8f2b4cf16af6c015625385021b44aca14/flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/source/FlinkSplitPlanner.java#L89
>>>>>>> [2] https://github.com/apache/iceberg/blob/5b50afe8f2b4cf16af6c015625385021b44aca14/api/src/main/java/org/apache/iceberg/DataOperations.java#L32
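[Editor's sketch] Option (2) relies on consumers deduplicating the appended rows downstream. A minimal sketch of such keyed deduplication (plain Python with invented names; in a Flink job this would typically be keyed state retaining the latest row per key). Note the limitation raised in the thread: this reconstructs upserts, but actual row deletions cannot be recovered from appends alone.

```python
def dedupe_latest(rows):
    """Keep only the latest value per key.

    rows: iterable of (key, value) pairs in commit order, so a later
    occurrence of a key supersedes earlier ones.
    """
    latest = {}
    for key, value in rows:
        latest[key] = value  # later rows win
    return latest


# Appends read across snapshots, including appends of overwrite
# snapshots (option 2): k1 was upserted from v1 to v2.
rows = [("k1", "v1"), ("k2", "a"), ("k1", "v2")]
print(dedupe_latest(rows))  # {'k1': 'v2', 'k2': 'a'}
```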