RE: Re: Discuss proposal - IRC APIs for Multi-Statement Multi-Table Transactions

Dov Alperin Fri, 07 Nov 2025 13:01:07 -0800

Hi Iceberg community!
I’m curious where this proposal landed. I work at Materialize
<http://materialize.com/>, and we are keenly interested both in seeing this
proposal come to fruition and, possibly, in helping to implement it.


I see there was a call in May, but I’m not sure what the conclusion was. As
spec v4 draws closer, I am curious which of the two proposals the community
favors.

Best,
Dov

On 2025/05/27 08:09:05 Maninderjit Singh wrote:

Forgot to attach a link to the updated proposal:

<https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#heading=h.ypbwvr181qn4>


On Tue, May 27, 2025 at 1:06 AM Maninderjit Singh <

[email protected]> wrote:


> Hi community,

>

> I have updated the proposal with both options (overwriting the existing

> timestamp-ms vs. introducing a new sequence/timestamp field), as we have

> initial consensus on using a catalog-authored sequence/timestamp. Jagdeep,

> please review to ensure that the options are correctly captured. I have

> also added arguments on why we can't assume the timestamp to be

> "informational", since it is used in critical paths and

> incorrect values can take the table offline.

>

> Also, I'm moving the meeting to Thursday to better accommodate conflicts.

> I will also record the meeting for anyone who misses it and is interested in

> the discussion.

>

> Sync for iceberg multi-table transactions

> Thursday, May 29 · 9:00 – 10:00am

> Time zone: America/Los_Angeles

> Google Meet joining info

> Video call link: https://meet.google.com/ffc-ttjs-vti

>

> Thanks,

> Maninder

>

>

>

> On Mon, May 26, 2025 at 12:47 AM Péter Váry <[email protected]>

> wrote:

>

>> I'm interested but can't be there, so please record the meeting.

>> Thanks,

>> Peter

>>

>> Maninderjit Singh <[email protected]> wrote (on Sat, May 24,

>> 2025, 2:30):

>>

>>> Hi dev community,

>>> I was wondering if we could join a call next week to discuss the

>>> multi-table transactions so we can make progress. I have shared a meeting

>>> invite where anyone who's interested in the discussion can join. Please let

>>> me know if this works.

>>>

>>> Thanks,

>>> Maninder

>>>

>>> Sync for iceberg multi-table transactions

>>> Friday, May 30 · 9:00 – 10:00am

>>> Time zone: America/Los_Angeles

>>> Google Meet joining info

>>> Video call link: https://meet.google.com/ffc-ttjs-vti

>>>

>>>

>>> On Wed, May 21, 2025 at 10:25 AM Maninderjit Singh <

>>> [email protected]> wrote:

>>>

>>>> Hi dev community,

>>>> Following up on the thread here to continue the discussion and get

>>>> feedback, since we couldn't get to it in the sync. I think we have made

>>>> some progress in the discussion that I want to capture, while highlighting

>>>> the items where we need to build consensus, along with pros and cons. I

>>>> would need help to add clarity and to make sure the arguments are captured

>>>> correctly.

>>>>

>>>> *Things we agree on*

>>>>

>>>>    1. Don't maintain server side state for tracking the transactions.

>>>>    2. Need global (catalog-wide) ordering of snapshots via some

>>>>    (hybrid/logical) clock/CSN

>>>>    3. Optionally expose the catalog's clock/CSN information without

>>>>    changing how tables load

>>>>    4. Loading consistent snapshot across multiple tables and

>>>>    repeatable reads based on the reference clock/CSN

>>>>
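Item 4 can be illustrated with a minimal sketch, assuming each snapshot carries a catalog-assigned, monotonically increasing value. The `Snapshot` class, its `csn` field, and both helper functions are hypothetical names for illustration only; they are not part of the Iceberg spec or of either proposal.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    csn: int  # hypothetical catalog-assigned commit sequence number


def snapshot_as_of(snapshots: List[Snapshot], reference_csn: int) -> Optional[Snapshot]:
    """Return the newest snapshot committed at or before reference_csn, if any."""
    eligible = [s for s in snapshots if s.csn <= reference_csn]
    return max(eligible, key=lambda s: s.csn, default=None)


def consistent_view(tables: Dict[str, List[Snapshot]], reference_csn: int) -> Dict[str, Optional[Snapshot]]:
    """Resolve every table against the same reference CSN."""
    return {name: snapshot_as_of(snaps, reference_csn) for name, snaps in tables.items()}


tables = {
    "orders": [Snapshot(1, 10), Snapshot(2, 25)],
    "items": [Snapshot(7, 12), Snapshot(8, 30)],
}
view = consistent_view(tables, reference_csn=26)
```

Because every table is resolved against the same reference value, re-running the lookup with that value gives the same answer even if newer snapshots have been committed in the meantime, which is the repeatable-read property described above.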

>>>>

>>>> *Things we disagree on*

>>>>

>>>>    1. Reuse existing timestamp field vs introduce a new field CSN

>>>>

>>>>

>>>> *Reusing timestamp field approach*

>>>>

>>>>    - Pros:

>>>>

>>>>

>>>>    1. Backwards compatibility, no change to table metadata spec so

>>>>    could be used by existing v2 tables.

>>>>    2. Fixes existing time travel and ordering issues

>>>>    3. Simplifies and clarifies the spec (no new id for snapshots)

>>>>    4. Common notion of timestamp that could be used to evaluate causal

>>>>    relationships in other proposals like events or commit reports.

>>>>

>>>>

>>>>    - Cons

>>>>

>>>>

>>>>    1. Unique timestamp generation in milliseconds. Potential

>>>>    mitigations:

>>>>    https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&disco=AAABjwaxXeg

>>>>    2. Concerns about client side timestamp being overridden.

>>>>
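One common way to obtain unique, strictly increasing millisecond timestamps is to bump past the last issued value whenever the wall clock has not advanced. The sketch below shows that generic technique; it is an assumption for illustration, not necessarily the mitigation discussed in the linked comment, and `MonotonicMillisClock` is a hypothetical name. It is single-threaded; a real catalog would need a lock or a transactional counter.

```python
import time
from typing import Callable, Optional


class MonotonicMillisClock:
    """Issues strictly increasing millisecond timestamps (illustrative only)."""

    def __init__(self, now_ms: Optional[Callable[[], int]] = None) -> None:
        # Default to wall-clock milliseconds; tests can inject a fake clock.
        self._now_ms = now_ms or (lambda: time.time_ns() // 1_000_000)
        self._last = 0

    def next_timestamp_ms(self) -> int:
        # Never reuse or go below the previously issued value, even if the
        # wall clock stalls within one millisecond or jumps backwards.
        self._last = max(self._now_ms(), self._last + 1)
        return self._last


readings = iter([2000, 1500, 2001])  # second reading simulates backward skew
clock = MonotonicMillisClock(now_ms=lambda: next(readings))
issued = [clock.next_timestamp_ms() for _ in range(3)]
```

The trade-off is that under a sustained commit rate above one per millisecond, issued values drift ahead of wall-clock time.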

>>>> *Adding new CSN field*

>>>>

>>>>    - Pros:

>>>>

>>>>

>>>>    1. Flexibility to use logical or hybrid clocks. It is unclear how

>>>>    clients can generate a hybrid clock timestamp here without suffering

>>>>    from clock skew (it would be good to clarify this).

>>>>    2. No client side overriding concerns.

>>>>

>>>>

>>>>    - Cons:

>>>>

>>>>

>>>>    1. Not backwards compatible, requires new field in table metadata

>>>>    so need to wait for v4

>>>>    2. Does not fix time travel and snapshot-log ordering issues

>>>>    3. Adds another id for snapshots that clients need to generate and

>>>>    reason about.

>>>>    4. Could not be extended to use in other proposals for causal

>>>>    reasoning.

>>>>

>>>>

>>>> Thanks,

>>>> Maninder

>>>>

>>>> On Tue, May 20, 2025 at 8:16 PM Maninderjit Singh <

>>>> [email protected]> wrote:

>>>>

>>>>> Appreciate the feedback on the "catalog-authored timestamp" document

>>>>> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0>!

>>>>>

>>>>> Ryan, I don't think we can get consistent time travel queries in

>>>>> Iceberg without fixing the timestamp field, since it's what the spec

>>>>> <https://iceberg.apache.org/spec/#point-in-time-reads-time-travel>

>>>>> prescribes for time travel. Hence I took the liberty of reusing it for

>>>>> the catalog timestamp, which ensures that the snapshot-log is correctly

>>>>> ordered for time travel. Additionally, the timestamp field needs to be

>>>>> fixed to avoid breaking commits to the table due to accidental large

>>>>> skews under the current spec; the scenario is described in detail here

>>>>> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#bookmark=id.6avx66vzo168>.

>>>>> The other benefit of reusing the timestamp field is spec simplicity

>>>>> and clarity on timestamp generation responsibilities, without needing

>>>>> to manage yet another identifier (in addition to sequence number,

>>>>> snapshot id and timestamp) for snapshots.

>>>>>

>>>>> Jagdeep, your concerns about overriding the timestamp field are valid,

>>>>> but the reason I'm not too worried is that a client can't assume a

>>>>> commit is successful until its request has been acknowledged by the

>>>>> catalog, which returns the CommitTableResponse

>>>>> <https://github.com/apache/iceberg/blob/c2478968e65368c61799d8ca4b89506a61ca3e7c/open-api/rest-catalog-open-api.yaml#L3997>

>>>>> with new metadata (that has catalog authored timestamps in the

>>>>> proposal). I'm happy to work with you to put something common together

>>>>> and get the best out of the proposals.

>>>>>

>>>>> Thanks,

>>>>> Maninder

>>>>>

>>>>>

>>>>>

>>>>>

>>>>> On Tue, May 20, 2025 at 5:48 PM Jagdeep Sidhu <[email protected]>

>>>>> wrote:

>>>>>

>>>>>> Thank you Ryan, Maninder and the rest of the community for feedback

>>>>>> and ideas!

>>>>>> Drew and I will take another pass and remove the catalog

>>>>>> coordination requirement for the LoadTable API, and bring the proposal

>>>>>> closer to "catalog-authored timestamp" in the sense that clients can use

>>>>>> the CSN to find the right snapshot, while still leaving it up to the

>>>>>> Catalog what it wants to use for the CSN (a hybrid clock timestamp or

>>>>>> another monotonically increasing number).

>>>>>>

>>>>>> If more folks have feedback, please leave it in the doc or email

>>>>>> list, so we can address it as well in the document update.

>>>>>>

>>>>>> Maninder, one reason we proposed a new field for CommitSequenceNumber

>>>>>> instead of using an existing field is backwards compatibility. Catalogs

>>>>>> can start optionally exposing the new field, and interested clients can

>>>>>> use it, while existing clients keep working as is. Existing and new

>>>>>> clients can also keep working as is against the same tables in the

>>>>>> same Catalog. My one worry is that having the Catalog override the

>>>>>> timestamp field for commits may break some existing clients. Today, no

>>>>>> Iceberg engine/client expects the timestamp field in

>>>>>> metadata/snapshot-log to be overwritten by the Catalog.

>>>>>>

>>>>>> How do you feel about taking the best from each proposal? That is,

>>>>>> monotonically increasing commit sequence numbers (some catalogs can use

>>>>>> timestamps, some can use a logical clock, but we don't have to enforce

>>>>>> it; leave it up to the Catalog), while keeping the client-side logic for

>>>>>> resolving the right snapshot using sequence numbers instead of adding

>>>>>> that functionality to the Catalog. Let me know!

>>>>>>

>>>>>> Thank you!

>>>>>> -Jagdeep

>>>>>>

>>>>>> On Tue, May 20, 2025 at 2:45 PM Ryan Blue <[email protected]> wrote:

>>>>>>

>>>>>>> Thanks for the proposals! There are things that I think are good

>>>>>>> about both of them. I think that the catalog-authored timestamps

>>>>>>> proposal misunderstands the purpose of the timestamp field, but does get

>>>>>>> right that a monotonically increasing "time" field (really a sequence

>>>>>>> number) across tables enables the coordination needed for snapshot

>>>>>>> isolated reads. I like that the sequence number proposal leaves the

>>>>>>> meaning of the field to the catalog for coordination, but it still

>>>>>>> proposes catalog coordination by loading tables "at" some sequence

>>>>>>> number. Ideally, we would be able to (optionally) expose this extra

>>>>>>> catalog information to clients and not need to change how loading works.

>>>>>>>

>>>>>>> Ryan

>>>>>>>

>>>>>>> On Tue, May 20, 2025 at 9:45 AM Ryan Blue <[email protected]> wrote:

>>>>>>>

>>>>>>>> Hi everyone,

>>>>>>>>

>>>>>>>> To avoid passing copies of a file around for comments, I put the

>>>>>>>> doc for commit sequence numbers into Google so we can comment on a

>>>>>>>> central copy:

>>>>>>>> https://docs.google.com/document/d/1jr4Ah8oceOmo6fwxG_0II4vKDUHUKScb/edit?usp=sharing&ouid=100239850723655533404&rtpof=true&sd=true

>>>>>>>>

>>>>>>>> Ryan

>>>>>>>>

>>>>>>>> On Fri, May 16, 2025 at 2:51 AM Maninderjit Singh <

>>>>>>>> [email protected]> wrote:

>>>>>>>>

>>>>>>>>> Thanks for the updated proposal Drew!

>>>>>>>>> My preference for using the catalog authored timestamp is to

>>>>>>>>> minimize changes to the REST spec so we can have good backwards

>>>>>>>>> compatibility. I have quickly put together a draft proposal on how

>>>>>>>>> this should work. Looking forward to feedback and discussion.

>>>>>>>>>

>>>>>>>>> Draft Proposal: Catalog‑Authored Timestamps for Apache Iceberg

>>>>>>>>> REST Catalog

>>>>>>>>> <https://drive.google.com/open?id=1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE>

>>>>>>>>>

>>>>>>>>> Thanks,

>>>>>>>>> Maninder

>>>>>>>>>

>>>>>>>>> On Wed, May 14, 2025 at 6:12 PM Drew <[email protected]> wrote:

>>>>>>>>>

>>>>>>>>>> Hi everyone,

>>>>>>>>>>

>>>>>>>>>> Thank you for the feedback on the MTT proposal and during the

>>>>>>>>>> community sync. Based on it, Jagdeep and I have iterated on the

>>>>>>>>>> document and added a second option to use *Catalog

>>>>>>>>>> CommitSequenceNumbers*. We look forward to more feedback on the

>>>>>>>>>> proposal: where to add more details, or approaches/changes to

>>>>>>>>>> consider. We appreciate everyone's time on this!

>>>>>>>>>>

>>>>>>>>>> The option introduces *Catalog CommitSequenceNumbers (CSNs)*,

>>>>>>>>>> which allow clients/engines to read a consistent view of multiple

>>>>>>>>>> tables without registering a transaction context with the catalog,

>>>>>>>>>> thus removing the need for transaction bookkeeping on the catalog

>>>>>>>>>> side. To abort transactions early, clients can use LoadTable with and

>>>>>>>>>> without a CSN to figure out whether there is already a conflicting

>>>>>>>>>> write on any of the tables being modified. We also removed the section

>>>>>>>>>> where transactions staged commits on the Catalog, and changed the

>>>>>>>>>> proposal to align with Eduard's PR around staging changes locally

>>>>>>>>>> before commit (https://github.com/apache/iceberg/pull/6948).

>>>>>>>>>>

>>>>>>>>>> Jagdeep also clarified, with an example in a previous email, that a

>>>>>>>>>> workload may require multi-table snapshot isolation even if the tables

>>>>>>>>>> are being updated without the Multi-Table commit API, though most MTT

>>>>>>>>>> transactions will commit using the multi-table commit API.

>>>>>>>>>>

>>>>>>>>>> Maninder, for the approach of "common notion of time between

>>>>>>>>>> clients and catalog": I spent some time thinking about it, but cannot

>>>>>>>>>> find a feasible way to do this. Yes, the catalogs can use a

>>>>>>>>>> high-precision clock, but clients cannot use the Catalog timestamp

>>>>>>>>>> from API calls to set their local clock because of network latency in

>>>>>>>>>> the request/response. For example, different requests to the same

>>>>>>>>>> Catalog servers can return different timestamps based on network

>>>>>>>>>> latency. Also, what if a client works with more than one Catalog? If

>>>>>>>>>> you want to do a rough write-up or share a reference implementation

>>>>>>>>>> that uses such an approach, I will be happy to brainstorm it more.

>>>>>>>>>> Let us know!

>>>>>>>>>>

>>>>>>>>>> Here is the link to the updated proposal:

>>>>>>>>>>

>>>>>>>>>> <https://docs.google.com/document/d/1jr4Ah8oceOmo6fwxG_0II4vKDUHUKScb/edit?usp=sharing&ouid=100384647237395649950&rtpof=true&sd=true>

>>>>>>>>>> Thanks Again!

>>>>>>>>>> - Drew

>>>>>>>>>>

>>>>>>>>>
