Hi Lu,

Nice to hear from you here in the Iceberg community :)
We have built an internal service that streams upserts as position deletes;
it happens to have a lot in common with [1] and [2]. I believe this is a
viable approach to achieving second-level freshness.

[1] https://docs.google.com/document/d/1Jz4Fjt-6jRmwqbgHX_u0ohuyTB9ytDzfslS7lYraIjk
[2] https://www.mooncake.dev/whitepaper

Best,
Gang

On Wed, Jan 21, 2026 at 11:05 AM Lu Niu <[email protected]> wrote:

> Hi Iceberg community,
>
> What are the current best practices for streaming upserts into an Iceberg
> table?
>
> Today, we have the following setup in production to support CDC:
>
> 1. A Flink job that continuously appends CDC events into an append-only
> `raw` table
> 2. A periodically scheduled Spark job that upserts into the `current`
> table from the `raw` table
>
> We are exploring whether it’s feasible to stream upserts directly into an
> Iceberg table from Flink. This could simplify our architecture and
> potentially further reduce our data SLA. We’ve experimented with this
> approach before, but ran into reader-side performance issues due to the
> accumulation of equality deletes over time.
>
> From what I can gather, streaming upserts still seems to be an open design
> area:
>
> 1. (Please correct me if I’m wrong; this summary is partly based on ChatGPT
> 5.1.) The book “Apache Iceberg: The Definitive Guide” suggests the
> two-table pattern we’re currently using in production.
> 2. These threads:
> https://lists.apache.org/thread/gjjr30txq318qp6pff3x5fx1jmdnr6fv ,
> https://lists.apache.org/thread/xdkzllzt4p3tvcd3ft4t7jsvyvztr41j discuss
> the idea of outputting only positional deletes (no equality deletes) by
> introducing an index. However, this appears to still be under discussion
> and may be targeted for v4, with no concrete timeline yet.
> 3. This thread:
> https://lists.apache.org/thread/6fhpjszsfxd8p0vfzc3k5vw7zmcyv2mq talks
> about deprecating equality deletes, but I haven’t seen a clearly defined
> alternative come out of that discussion yet.
>
> Given all of the above, I’d really appreciate guidance from the community
> on:
>
> 1. Recommended patterns for streaming upserts with Flink into Iceberg
> today (it's good to know the long-term possibilities as well, but my focus
> is on what's possible in the near term).
> 2. Practical experiences or lessons learned from teams running streaming
> upserts in production.
>
> Thanks in advance for any insights and corrections.
>
> Best
> Lu
>
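To make the "upserts as position deletes" idea a bit more concrete, here is a
minimal Python sketch. It is a toy model, not the internal service mentioned
above and not an Iceberg API: the class `UpsertWriter`, the file name, and the
in-memory index are all hypothetical. It assumes the writer keeps a primary-key
index pointing at each key's latest (file, position), so an update can delete
the old row by position instead of emitting an equality delete.

```python
from dataclasses import dataclass, field


@dataclass
class UpsertWriter:
    """Toy writer that turns upserts into position deletes (hypothetical)."""

    data_file: str = "data-00001.parquet"  # hypothetical file name
    rows: list = field(default_factory=list)              # row position = list index
    position_deletes: list = field(default_factory=list)  # (file, position) pairs
    index: dict = field(default_factory=dict)             # primary key -> (file, position)

    def upsert(self, key, value):
        # If the key was written before, delete its old row by position.
        old = self.index.get(key)
        if old is not None:
            self.position_deletes.append(old)
        # Append the new row version and point the index at it.
        pos = len(self.rows)
        self.rows.append((key, value))
        self.index[key] = (self.data_file, pos)

    def live_rows(self):
        # The reader only skips deleted positions; it never compares keys.
        deleted = {pos for f, pos in self.position_deletes if f == self.data_file}
        return [row for pos, row in enumerate(self.rows) if pos not in deleted]


writer = UpsertWriter()
writer.upsert(1, "a")
writer.upsert(2, "b")
writer.upsert(1, "c")  # update of key 1: emits a position delete for row 0
```

The cost moves to the writer, which has to maintain the key index; in exchange
the reader applies deletes by position only, which is the reader-side problem
described in the quoted message.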
