Re: Discuss proposal - IRC APIs for Multi-Statement Multi-Table Transactions

Dov Alperin Sun, 09 Nov 2025 13:53:43 -0800

That generally aligns with my sensibilities as well (avoiding overriding
existing fields' meaning). The fact that adding a CSN requires changes to
the spec is notable. What's the process that would be required to get that
landed in v4?
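
For reference, here is a minimal sketch of the client-side resolution discussed in the quoted thread below. It assumes each snapshot-log entry carries a catalog-assigned CSN and that table history is linear (no branches or rollbacks). The names `SnapshotEntry`, `csn`, `resolve_snapshot`, `consistent_view` and `has_conflict` are hypothetical and do not come from the Iceberg spec or either proposal.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SnapshotEntry:
    snapshot_id: int
    csn: int  # hypothetical catalog-assigned commit sequence number


def resolve_snapshot(log: List[SnapshotEntry], reference_csn: int) -> Optional[SnapshotEntry]:
    """Return the newest entry whose CSN is <= reference_csn, or None."""
    visible = [e for e in log if e.csn <= reference_csn]
    return max(visible, key=lambda e: e.csn) if visible else None


def consistent_view(tables: Dict[str, List[SnapshotEntry]], reference_csn: int) -> Dict[str, Optional[int]]:
    """Pick one snapshot id per table, all as of the same reference CSN."""
    view: Dict[str, Optional[int]] = {}
    for name, log in tables.items():
        entry = resolve_snapshot(log, reference_csn)
        view[name] = entry.snapshot_id if entry else None
    return view


def has_conflict(log: List[SnapshotEntry], reference_csn: int) -> bool:
    """True if the table has a commit newer than the reference CSN."""
    return any(e.csn > reference_csn for e in log)
```

A transaction that started reading at some reference CSN could call `has_conflict` on each table it intends to modify and abort early if any returns true. This only illustrates the idea and is not a statement of how either proposal specifies it.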


On Sun, Nov 9, 2025 at 2:40 PM Ryan Blue <[email protected]> wrote:

> I am fairly strongly opposed to repurposing the timestamp field for this.
> To move forward, I'd recommend working on catalog sequence numbers.
>
> On Sat, Nov 8, 2025 at 6:54 PM Dov Alperin
> <[email protected]> wrote:
>
>> Hi Iceberg community!
>> (I initially opened this message as its own thread in error, sorry about
>> that)
>> I’m curious where this proposal landed? I work at Materialize
>> <http://materialize.com/> and we are keenly interested both in seeing this
>> proposal come to fruition and, possibly, in helping to implement it.
>>
>> I see there was a call in May, but I’m not sure what the conclusion was.
>> As spec v4 draws closer, I am curious which of the two proposals the
>> community favors here?
>>
>> Best,
>> Dov
>>
>> On Tue, May 27, 2025 at 01:09:05AM -0700, Maninderjit Singh wrote:
>> > Forgot to attach a link to the updated proposal
>> > <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#heading=h.ypbwvr181qn4>.
>> >
>> > On Tue, May 27, 2025 at 1:06 AM Maninderjit Singh <
>> > [email protected]> wrote:
>> >
>> > > Hi community,
>> > >
>> > >  I have updated the proposal with both options (overwriting the existing
>> > > timestamp-ms vs introducing a new sequence/timestamp field), as we have
>> > > initial consensus on using a catalog-authored sequence/timestamp. Jagdeep,
>> > > please review to ensure that the options are correctly captured. I have
>> > > also added arguments on why we can't assume the timestamp to be
>> > > "informational": it is used in critical paths, and
>> > > incorrect values can take the table offline.
>> > >
>> > > Also, I'm moving the meeting to Thursday to better accommodate conflicts.
>> > > I will also record the meeting in case anyone misses it and is interested in
>> > > the discussion.
>> > >
>> > > Sync for iceberg multi-table transactions
>> > > Thursday, May 29 · 9:00 – 10:00am
>> > > Time zone: America/Los_Angeles
>> > > Google Meet joining info
>> > > Video call link: https://meet.google.com/ffc-ttjs-vti
>> > >
>> > > Thanks,
>> > > Maninder
>> > >
>> > >
>> > >
>> > > On Mon, May 26, 2025 at 12:47 AM Péter Váry <
>> [email protected]>
>> > > wrote:
>> > >
>> > >> I'm interested but can't be there, so please record the meeting.
>> > >> Thanks,
>> > >> Peter
>> > >>
>> > >> Maninderjit Singh <[email protected]> wrote (on Sat, May 24, 2025,
>> > >> 2:30):
>> > >>
>> > >>> Hi dev community,
>> > >>> I was wondering if we could join a call next week to discuss the
>> > >>> multi-table transactions so we can make progress. I have shared a meeting
>> > >>> invite where anyone who's interested in the discussion can join. Please let
>> > >>> me know if this works.
>> > >>>
>> > >>> Thanks,
>> > >>> Maninder
>> > >>>
>> > >>> Sync for iceberg multi-table transactions
>> > >>> Friday, May 30 · 9:00 – 10:00am
>> > >>> Time zone: America/Los_Angeles
>> > >>> Google Meet joining info
>> > >>> Video call link: https://meet.google.com/ffc-ttjs-vti
>> > >>>
>> > >>>
>> > >>> On Wed, May 21, 2025 at 10:25 AM Maninderjit Singh <
>> > >>> [email protected]> wrote:
>> > >>>
>> > >>>> Hi dev community,
>> > >>>> Following up on the thread here to continue the discussion and get
>> > >>>> feedback, since we couldn't get to it in the sync. I think we have made some
>> > >>>> progress in the discussion that I want to capture, while highlighting the
>> > >>>> items where we need to build consensus, along with pros and cons. I would
>> > >>>> appreciate help adding clarity and making sure the arguments are captured
>> > >>>> correctly.
>> > >>>>
>> > >>>> *Things we agree on*
>> > >>>>
>> > >>>>    1. Don't maintain server-side state for tracking the transactions.
>> > >>>>    2. Need global (catalog-wide) ordering of snapshots via some
>> > >>>>    (hybrid/logical) clock/CSN
>> > >>>>    3. Optionally expose the catalog's clock/CSN information without
>> > >>>>    changing how tables load
>> > >>>>    4. Loading a consistent snapshot across multiple tables and
>> > >>>>    repeatable reads based on the reference clock/CSN
>> > >>>>
>> > >>>>
>> > >>>> *Things we disagree on*
>> > >>>>
>> > >>>>    1. Reuse the existing timestamp field vs introduce a new CSN field
>> > >>>>
>> > >>>>
>> > >>>> *Reusing timestamp field approach*
>> > >>>>
>> > >>>>    - Pros:
>> > >>>>
>> > >>>>
>> > >>>>    1. Backwards compatibility, no change to table metadata spec so
>> > >>>>    could be used by existing v2 tables.
>> > >>>>    2. Fixes existing time travel and ordering issues
>> > >>>>    3. Simplifies and clarifies the spec (no new id for snapshots)
>> > >>>>    4. Common notion of timestamp that could be used to evaluate causal
>> > >>>>    relationships in other proposals like events or commit reports.
>> > >>>>
>> > >>>>
>> > >>>>    - Cons:
>> > >>>>
>> > >>>>
>> > >>>>    1. Unique timestamp generation in milliseconds. Potential
>> > >>>>    mitigations:
>> > >>>>    https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&disco=AAABjwaxXeg
>> > >>>>    2. Concerns about client-side timestamp being overridden.
>> > >>>>
>> > >>>> *Adding new CSN field*
>> > >>>>
>> > >>>>    - Pros:
>> > >>>>
>> > >>>>
>> > >>>>    1. Flexibility to use logical or hybrid clocks. Not sure how
>> > >>>>    clients can generate a hybrid clock timestamp here without suffering from
>> > >>>>    clock skew (it would be good to clarify this).
>> > >>>>    2. No client-side overriding concerns.
>> > >>>>
>> > >>>>
>> > >>>>    - Cons:
>> > >>>>
>> > >>>>
>> > >>>>    1. Not backwards compatible, requires new field in table metadata
>> > >>>>    so need to wait for v4
>> > >>>>    2. Does not fix time travel and snapshot-log ordering issues
>> > >>>>    3. Adds another id for snapshots that clients need to generate and
>> > >>>>    reason about.
>> > >>>>    4. Could not be extended to use in other proposals for causal
>> > >>>>    reasoning.
>> > >>>>
>> > >>>>
>> > >>>> Thanks,
>> > >>>> Maninder
>> > >>>>
>> > >>>> On Tue, May 20, 2025 at 8:16 PM Maninderjit Singh <
>> > >>>> [email protected]> wrote:
>> > >>>>
>> > >>>>> Appreciate the feedback on the "catalog-authored timestamp" document
>> > >>>>> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0>!
>> > >>>>>
>> > >>>>> Ryan, I don't think we can get consistent time travel queries in
>> > >>>>> Iceberg without fixing the timestamp field, since it's what the spec
>> > >>>>> <https://iceberg.apache.org/spec/#point-in-time-reads-time-travel>
>> > >>>>> prescribes for time travel. Hence I took the liberty of reusing it for the
>> > >>>>> catalog timestamp, which ensures that the snapshot-log is correctly ordered for
>> > >>>>> time travel. Additionally, the timestamp field needs to be fixed to avoid
>> > >>>>> breaking commits to the table due to accidental large skews under the current
>> > >>>>> spec; the scenario is described in detail here
>> > >>>>> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#bookmark=id.6avx66vzo168>.
>> > >>>>> The other benefit of reusing the timestamp field is spec simplicity
>> > >>>>> and clarity on timestamp generation responsibilities, without the
>> > >>>>> need to manage yet another identifier (in addition to sequence number,
>> > >>>>> snapshot id and timestamp) for snapshots.
>> > >>>>>
>> > >>>>> Jagdeep, your concerns about overriding the timestamp field are valid,
>> > >>>>> but the reason I'm not too worried about it is that clients can't assume
>> > >>>>> a commit is successful until their request is acknowledged by the
>> > >>>>> catalog, which returns the CommitTableResponse
>> > >>>>> <https://github.com/apache/iceberg/blob/c2478968e65368c61799d8ca4b89506a61ca3e7c/open-api/rest-catalog-open-api.yaml#L3997>
>> > >>>>> with new metadata (that has catalog-authored timestamps in the proposal). I'm
>> > >>>>> happy to work with you to put something common together and get the best
>> > >>>>> out of the proposals.
>> > >>>>>
>> > >>>>> Thanks,
>> > >>>>> Maninder
>> > >>>>>
>> > >>>>>
>> > >>>>>
>> > >>>>>
>> > >>>>> On Tue, May 20, 2025 at 5:48 PM Jagdeep Sidhu <
>> [email protected]>
>> > >>>>> wrote:
>> > >>>>>
>> > >>>>>> Thank you Ryan, Maninder and the rest of the community for the feedback
>> > >>>>>> and ideas!
>> > >>>>>> Drew and I will take another pass and remove the catalog
>> > >>>>>> coordination requirement for the LoadTable API, and bring the proposal closer
>> > >>>>>> to "catalog-authored timestamp" in the sense that clients can use the CSN to
>> > >>>>>> find the right snapshot, but still leave it up to the Catalog what it wants to
>> > >>>>>> use for the CSN (a hybrid clock timestamp or another monotonically increasing
>> > >>>>>> number).
>> > >>>>>>
>> > >>>>>> If more folks have feedback, please leave it in the doc or email
>> > >>>>>> list, so we can address it as well in the document update.
>> > >>>>>>
>> > >>>>>> Maninder, one reason we proposed a new field for CommitSequenceNumber
>> > >>>>>> instead of using an existing field is backwards compatibility. Catalogs
>> > >>>>>> can start optionally exposing the new field, and interested clients can use
>> > >>>>>> it, while existing clients keep working as is. Existing and new
>> > >>>>>> clients can also keep working as is against the same tables in the
>> > >>>>>> same Catalog. My one worry is that having the Catalog override the timestamp
>> > >>>>>> field for commits may break some existing clients. Today, no Iceberg
>> > >>>>>> engine/client expects the timestamp field in the metadata/snapshot-log
>> > >>>>>> to be overwritten by the Catalog.
>> > >>>>>>
>> > >>>>>> How do you feel about taking the best from each proposal? That is,
>> > >>>>>> monotonically increasing commit sequence numbers (some catalogs can use
>> > >>>>>> timestamps, some can use a logical clock, but we don't have to enforce it;
>> > >>>>>> leave it up to the Catalog), but keep the client-side logic for resolving the
>> > >>>>>> right snapshot using sequence numbers instead of adding that functionality to
>> > >>>>>> the Catalog. Let me know!
>> > >>>>>>
>> > >>>>>> Thank you!
>> > >>>>>> -Jagdeep
>> > >>>>>>
>> > >>>>>> On Tue, May 20, 2025 at 2:45 PM Ryan Blue <[email protected]>
>> wrote:
>> > >>>>>>
>> > >>>>>>> Thanks for the proposals! There are things that I think are good
>> > >>>>>>> about both of them. I think that the catalog-authored timestamps proposal
>> > >>>>>>> misunderstands the purpose of the timestamp field, but does get right that
>> > >>>>>>> a monotonically increasing "time" field (really a sequence number) across
>> > >>>>>>> tables enables the coordination needed for snapshot isolated reads. I like
>> > >>>>>>> that the sequence number proposal leaves the meaning of the field to the
>> > >>>>>>> catalog for coordination, but it still proposes catalog coordination by
>> > >>>>>>> loading tables "at" some sequence number. Ideally, we would be able to
>> > >>>>>>> (optionally) expose this extra catalog information to clients and not need
>> > >>>>>>> to change how loading works.
>> > >>>>>>>
>> > >>>>>>> Ryan
>> > >>>>>>>
>> > >>>>>>> On Tue, May 20, 2025 at 9:45 AM Ryan Blue <[email protected]>
>> wrote:
>> > >>>>>>>
>> > >>>>>>>> Hi everyone,
>> > >>>>>>>>
>> > >>>>>>>> To avoid passing copies of a file around for comments, I put the
>> > >>>>>>>> doc for commit sequence numbers into Google so we can comment on a central
>> > >>>>>>>> copy:
>> > >>>>>>>> https://docs.google.com/document/d/1jr4Ah8oceOmo6fwxG_0II4vKDUHUKScb/edit?usp=sharing&ouid=100239850723655533404&rtpof=true&sd=true
>> > >>>>>>>>
>> > >>>>>>>> Ryan
>> > >>>>>>>>
>> > >>>>>>>> On Fri, May 16, 2025 at 2:51 AM Maninderjit Singh <
>> > >>>>>>>> [email protected]> wrote:
>> > >>>>>>>>
>> > >>>>>>>>> Thanks for the updated proposal Drew!
>> > >>>>>>>>> My preference for using the catalog-authored timestamp is to
>> > >>>>>>>>> minimize changes to the REST spec so we can have good backwards
>> > >>>>>>>>> compatibility. I have quickly put together a draft proposal on how this
>> > >>>>>>>>> should work. Looking forward to feedback and discussion.
>> > >>>>>>>>>
>> > >>>>>>>>>  Draft Proposal: Catalog‑Authored Timestamps for Apache Iceberg
>> > >>>>>>>>> REST Catalog
>> > >>>>>>>>> <https://drive.google.com/open?id=1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE>
>> > >>>>>>>>>
>> > >>>>>>>>> Thanks,
>> > >>>>>>>>> Maninder
>> > >>>>>>>>>
>> > >>>>>>>>> On Wed, May 14, 2025 at 6:12 PM Drew <[email protected]>
>> wrote:
>> > >>>>>>>>>
>> > >>>>>>>>>> Hi everyone,
>> > >>>>>>>>>>
>> > >>>>>>>>>> Thank you for the feedback on the MTT proposal and during the community
>> > >>>>>>>>>> sync. Based on it, Jagdeep and I have iterated on the document and added a
>> > >>>>>>>>>> second option to use *Catalog CommitSequenceNumbers*. Looking
>> > >>>>>>>>>> forward to getting more feedback on the proposal, on where to add more
>> > >>>>>>>>>> details, and on approaches/changes to consider. We appreciate everyone's
>> > >>>>>>>>>> time on this!
>> > >>>>>>>>>>
>> > >>>>>>>>>> The option introduces *Catalog CommitSequenceNumbers (CSNs)*,
>> > >>>>>>>>>> which allow clients/engines to read a consistent view of multiple tables
>> > >>>>>>>>>> without needing to register a transaction context with the catalog,
>> > >>>>>>>>>> thus removing the need for transaction bookkeeping on the catalog side. For
>> > >>>>>>>>>> aborting transactions early, clients can use LoadTable with and without a CSN
>> > >>>>>>>>>> to figure out if there is already a conflicting write on any of the tables
>> > >>>>>>>>>> being modified. We also removed the section where transactions were staging
>> > >>>>>>>>>> commits on the Catalog, and changed the proposal to align with Eduard's PR
>> > >>>>>>>>>> around staging changes locally before commit (
>> > >>>>>>>>>> https://github.com/apache/iceberg/pull/6948).
>> > >>>>>>>>>>
>> > >>>>>>>>>> Jagdeep also clarified, with an example in a previous email, how a
>> > >>>>>>>>>> workload may require multi-table snapshot isolation even if the tables are
>> > >>>>>>>>>> being updated without the multi-table commit API, though most MTT
>> > >>>>>>>>>> transactions will commit using the multi-table commit API.
>> > >>>>>>>>>>
>> > >>>>>>>>>> Maninder, for the approach of "common notion of time between
>> > >>>>>>>>>> clients and catalog": I spent some time thinking about it, but cannot find
>> > >>>>>>>>>> a feasible way to do this. Yes, the catalogs can use a high-precision
>> > >>>>>>>>>> clock, but clients cannot use the Catalog timestamp from API calls to set a
>> > >>>>>>>>>> local clock because of network latency for the request/response. For example,
>> > >>>>>>>>>> different requests to the same Catalog servers can return different
>> > >>>>>>>>>> timestamps based on network latency. Also, what if a client works with more
>> > >>>>>>>>>> than one Catalog?
>> > >>>>>>>>>> If you want to do a rough write-up or share a reference implementation that
>> > >>>>>>>>>> uses such an approach, I will be happy to brainstorm it more. Let us know!
>> > >>>>>>>>>>
>> > >>>>>>>>>> Here is the link to the updated proposal:
>> > >>>>>>>>>> <https://docs.google.com/document/d/1jr4Ah8oceOmo6fwxG_0II4vKDUHUKScb/edit?usp=sharing&ouid=100384647237395649950&rtpof=true&sd=true>
>> > >>>>>>>>>>
>> > >>>>>>>>>> Thanks again!
>> > >>>>>>>>>> - Drew
>> > >>>>>>>>>>
>> > >>>>>>>>>
>>
>
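
On the "unique timestamp generation in milliseconds" concern raised in the thread, the following is a minimal sketch of how a catalog could hand out strictly increasing values even when two commits land in the same millisecond or the wall clock steps backwards. It assumes a single, single-threaded catalog process. It is a simplified hybrid clock (local-event rule only), not the mitigation from the linked document. `HybridClock`, `tick` and the (milliseconds, counter) pair are hypothetical names for illustration.

```python
import time
from typing import Callable, Tuple


def _system_ms() -> int:
    """Current wall-clock time in milliseconds."""
    return int(time.time() * 1000)


class HybridClock:
    """Issues strictly increasing (physical_ms, counter) pairs."""

    def __init__(self, wall_clock_ms: Callable[[], int] = _system_ms):
        self._wall_clock_ms = wall_clock_ms
        self._last_ms = 0
        self._counter = 0

    def tick(self) -> Tuple[int, int]:
        now = self._wall_clock_ms()
        if now > self._last_ms:
            # Wall clock moved forward: adopt it and reset the counter.
            self._last_ms = now
            self._counter = 0
        else:
            # Same millisecond or clock went backwards: keep the last
            # physical value and bump the logical counter instead.
            self._counter += 1
        return (self._last_ms, self._counter)
```

Comparing the pairs as tuples gives a total order over commits issued by that one clock. A real catalog would additionally need locking or a durable counter, which this sketch leaves out.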
