Re: Query regarding UPSERT Mode in Flink

2024-04-24 Thread Aditya Narayan Gupta
Hi Péter, Thanks a lot for explaining and sharing the reference thread, I will follow it. Regards, Aditya On Mon, Apr 22, 2024 at 8:39 PM Péter Váry wrote: > Very high change rate means that it is most probably worth it to rewrite > the data files from time to time. > The high change rate

Re: Query regarding UPSERT Mode in Flink

2024-04-22 Thread Péter Váry
Very high change rate means that it is most probably worth it to rewrite the data files from time to time. The high change rate itself causes write amplification, as every new version of the record is essentially a new row to the table (and a tombstone for the old data). The

Re: Query regarding UPSERT Mode in Flink

2024-04-22 Thread Aditya Narayan Gupta
Hi Péter, Thanks for the detailed answers, we have some CDC streams that have very high change rate, for such tables we were thinking to leverage ConvertEqualityDeleteFiles

Re: Query regarding UPSERT Mode in Flink

2024-04-22 Thread Péter Váry
Hi Aditya, See my answers below: Aditya Narayan Gupta ezt írta (időpont: 2024. ápr. 20., Szo, 11:05): > Hi Péter, Gabor, > > Thanks a lot for clarifying and providing additional information, I had > few followup queries- > 1. We want to ingest an CDC stream using Flink to Iceberg sink, if we

Re: Query regarding UPSERT Mode in Flink

2024-04-20 Thread Aditya Narayan Gupta
Hi Péter, Gabor, Thanks a lot for clarifying and providing additional information, I had few followup queries- 1. We want to ingest an CDC stream using Flink to Iceberg sink, if we have a RETRACT like CDC stream that uniquely distinguishes between INSERT / UPDATE (for eg contains events as

Re: Query regarding UPSERT Mode in Flink

2024-04-18 Thread Péter Váry
Hi Aditya, The definition of UPSERT is that we have 2 types of messages: - DELETE - we need to remove the old record with the given id. - UPSERT - we need to remove the old version of the record based on the id, and should add a new version See:

Re: Query regarding UPSERT Mode in Flink

2024-04-18 Thread Gabor Kaszab
Hey, I had the chance to explore this area of eq-deletes recently myself too. Apparently, this behavior is by design in Flink. The reason why it unconditionally writes an eq-delete too for each insert (only in upsert-mode, though) is to guarantee the uniqueness of the primary key. So it drops the

Query regarding UPSERT Mode in Flink

2024-04-17 Thread Aditya Gupta
Hi all, In Flink SQL, in UPSERT mode, I have observed that if I INSERT a new record with a new equality field Id, then a equality delete file is also created with the corresponding entry, for example I executed following commands in Flink SQL with Apache Iceberg- CREATE TABLE