Re: [DISCUSS] Multi-statement multi-transaction proposal for Apache REST Catalog

Dmitri Bourlatchkov Wed, 25 Jun 2025 09:31:30 -0700

Hi Maninder,

Thanks for adding a section on opaque IDs, and apologies for the delayed reply
from my side. I could not find a good place to fit my text in the doc, so
I'm sending it in this email :)

This option is most closely related to option 2 (CSN), but proposes to use
commit IDs (an alternative to CSNs) that are opaque to clients. This is the
same idea as the opaque ID section in the doc, but I hope the thoughts below
help clarify how it is intended to work. The main difference is that the
resolution of commit IDs to snapshots is delegated to catalog servers.

Catalog Servers are free to use any implementation for commit IDs,
including monotonically increasing numbers, but they are not limited to
CSNs.

Catalog Servers produce a commit ID for every change and expose it to
clients as a reasonably short string. All tables changed together in one
multi-table commit naturally get the same commit ID.

Commit IDs are part of REST Catalog responses, but do not have to be stored
in the metadata files. No Iceberg spec changes are required. REST API changes
are needed, but they are optional and transparent to clients unless a client
wants extra consistency guarantees.

Clients can request table metadata for any table using a particular commit
ID. This mechanism can be used to ensure consistency in time-travel queries.

An engine can proceed as follows while executing a multi-table change:
1. Load table A; receive its metadata and commit ID C1.
2. Load table B, providing C1 as a request parameter to the Catalog server.
3. Load table C, providing C1 as a request parameter to the Catalog server.
4. Process data in tables A, B, and C.
5. Update table A.
6. Update table B.
7. Submit the metadata updates for A and B to the Catalog, passing C1 as the
“base” commit ID. Additionally, submit table C as a “read but not changed”
table.
8. The Catalog server checks whether the change conflicts with anything
committed between C1 and the current state of the catalog (including
validating that C has not changed).
9. The Catalog commits the changes and returns commit ID C2 to the client
(this commit ID represents the committed state of the submitted metadata
changes).
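Steps 1 to 3 and 7 to 9 can be sketched on the server side as follows. This is a minimal, single-process Python illustration under assumed names (`Catalog`, `load`, `commit`, `CommitConflict` are hypothetical, not part of the REST Catalog API), with integer "metadata versions" standing in for real table metadata. A conflict is assumed to mean that some table the client wrote or declared as read was changed by a commit after the base commit ID; a real server would also have to make the check and the write atomic.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class CommitConflict(Exception):
    """Conflict error carrying the catalog's latest commit ID (C3) and the conflicting tables."""

    def __init__(self, latest_commit_id: str, tables: List[str]) -> None:
        super().__init__(f"conflict on {tables}; latest commit is {latest_commit_id}")
        self.latest_commit_id = latest_commit_id
        self.tables = tables


class Catalog:
    """Hypothetical in-memory catalog server; clients treat commit IDs as opaque strings."""

    def __init__(self, initial: Dict[str, int]) -> None:
        self._next = 0
        # Commit log in commit order: (commit_id, {table: metadata_version set by that commit}).
        self._log: List[Tuple[str, Dict[str, int]]] = []
        self._append(dict(initial))

    def _append(self, changes: Dict[str, int]) -> str:
        # The ID scheme is a server-internal choice; a counter is used here for simplicity.
        commit_id = f"c-{self._next:06x}"
        self._next += 1
        self._log.append((commit_id, changes))
        return commit_id

    def _index(self, commit_id: str) -> int:
        for i, (cid, _) in enumerate(self._log):
            if cid == commit_id:
                return i
        raise KeyError(f"unknown commit ID {commit_id!r}")

    def load(self, table: str, commit_id: Optional[str] = None) -> Tuple[int, str]:
        """Steps 1-3: return (metadata_version, commit_id) for `table` as of `commit_id`."""
        end = len(self._log) - 1 if commit_id is None else self._index(commit_id)
        # Walk back from the requested commit to the last one that touched `table`.
        for _, changes in reversed(self._log[: end + 1]):
            if table in changes:
                return changes[table], self._log[end][0]
        raise KeyError(f"table {table!r} does not exist at that commit")

    def commit(self, base: str, updates: Dict[str, int], read_only: Iterable[str] = ()) -> str:
        """Steps 7-9: validate against `base`, then record `updates` under one new commit ID.

        No per-transaction state is kept between requests; everything needed
        arrives with this call.
        """
        touched = set(updates) | set(read_only)
        later = self._log[self._index(base) + 1 :]
        conflicts = sorted({t for _, changes in later for t in changes if t in touched})
        if conflicts:
            raise CommitConflict(self._log[-1][0], conflicts)
        return self._append(dict(updates))
```

In this sketch, if another writer changes C after C1, the commit in step 7 fails with `CommitConflict`, and its `latest_commit_id` plays the role of C3.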

If the commit fails due to conflicts, the client receives a “conflict”
error and a commit ID C3, which represents the most up-to-date state of the
catalog (the state that conflicted with the submitted changes). The client
then reloads the tables based on C3 and retries its workflow.
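On the client side, this retry could look like the following Python sketch. The catalog operations are passed in as plain callables with assumed signatures: `load(table, commit_id)` returns metadata plus the resolved commit ID, and `commit(base, updates, read_only)` returns the new commit ID or raises a conflict carrying C3. These names are hypothetical, not an existing client API.

```python
from typing import Callable, Dict, List, Optional, Tuple


class CommitConflict(Exception):
    """Hypothetical conflict error carrying the catalog's latest commit ID (C3)."""

    def __init__(self, latest_commit_id: str) -> None:
        super().__init__(f"conflict; latest commit is {latest_commit_id}")
        self.latest_commit_id = latest_commit_id


LoadFn = Callable[[str, Optional[str]], Tuple[dict, str]]
CommitFn = Callable[[str, Dict[str, dict], List[str]], str]


def run_multi_table_change(
    load: LoadFn,
    commit: CommitFn,
    tables: List[str],
    compute: Callable[[Dict[str, dict]], Dict[str, dict]],
    max_attempts: int = 3,
) -> str:
    """Load `tables` at one consistent commit ID, apply `compute`, commit; retry on conflict."""
    pinned: Optional[str] = None  # None: let the catalog pick its latest commit (C1)
    for _ in range(max_attempts):
        first, base = load(tables[0], pinned)
        state = {tables[0]: first}
        for name in tables[1:]:
            state[name], _ = load(name, base)  # pin every later load to the same commit
        updates = compute(state)
        # Tables that were read but not written are sent so the server can validate them.
        read_only = [t for t in tables if t not in updates]
        try:
            return commit(base, updates, read_only)
        except CommitConflict as err:
            pinned = err.latest_commit_id  # reload everything as of C3 and redo the work
    raise RuntimeError(f"gave up after {max_attempts} conflicting attempts")
```

The cap on attempts is a policy choice in this sketch; the catalog only reports the conflict and C3.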

Load table responses for a request that specifies a commit ID do not have
to return all of the table's metadata. It is sufficient to return only the
most relevant snapshots (usually the current one plus its parent). This is
similar to the partial metadata loading proposal, but it is not critical for
the consistency guarantees. The critical part is that the Catalog tells
engines which snapshot is current as of a particular commit ID.

Resolving Time-Travel Queries: When a client executes a time-travel query,
it provides a timestamp when loading the first table included in the query.
The Catalog resolves the timestamp to a commit ID and includes it in the
response. The client then uses the returned commit ID to load the subsequent
tables.

Optionally, a new endpoint may be added to the REST Catalog API to handle
the resolution of timestamps to commit IDs.
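Timestamp resolution could be a binary search over the server's commit log that returns the last commit at or before the requested time. This Python sketch assumes the server records a commit timestamp in milliseconds for each opaque commit ID, in commit order; `CommitTimeline` and its methods are hypothetical names, and the same lookup would serve either the load-table path or a dedicated endpoint.

```python
import bisect
from typing import List


class CommitTimeline:
    """Hypothetical server-side index from commit timestamps (ms) to opaque commit IDs."""

    def __init__(self) -> None:
        self._times: List[int] = []
        self._ids: List[str] = []

    def record(self, timestamp_ms: int, commit_id: str) -> None:
        # Assumes commits are recorded in non-decreasing timestamp order.
        if self._times and timestamp_ms < self._times[-1]:
            raise ValueError("commit timestamps must not go backwards")
        self._times.append(timestamp_ms)
        self._ids.append(commit_id)

    def resolve(self, timestamp_ms: int) -> str:
        """Return the ID of the last commit at or before `timestamp_ms`."""
        i = bisect.bisect_right(self._times, timestamp_ms)
        if i == 0:
            raise LookupError(f"no commit tracked at or before {timestamp_ms}")
        return self._ids[i - 1]
```

The client would send the timestamp only with its first load, then pass the commit ID returned by `resolve` for every other table in the query.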

Caching Metadata on the Client Side: Reloading table metadata for a
particular snapshot could leverage the ETag mechanism to reduce network
traffic.
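A client-side cache could keep the last ETag per table and send it as `If-None-Match` when loading the table at a new commit ID; if the table's snapshot did not change, the server answers `304 Not Modified` and no metadata is transferred. The transport callable `send(path, headers)` and the URL shape with a `commit-id` parameter are hypothetical placeholders, not the actual REST Catalog routes.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport: (path, request_headers) -> (status, response_headers, body).
Send = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], Optional[dict]]]


class MetadataCache:
    """Client-side cache that revalidates table metadata with ETags."""

    def __init__(self, send: Send) -> None:
        self._send = send
        self._entries: Dict[str, Tuple[str, dict]] = {}  # table -> (etag, metadata)

    def load(self, table: str, commit_id: str) -> dict:
        # Hypothetical URL shape; the real REST path and parameter name may differ.
        path = f"tables/{table}?commit-id={commit_id}"
        cached = self._entries.get(table)
        headers = {"If-None-Match": cached[0]} if cached else {}
        status, resp_headers, body = self._send(path, headers)
        if status == 304 and cached:
            return cached[1]  # same snapshot as before: reuse the local copy
        if status != 200 or body is None:
            raise RuntimeError(f"unexpected response {status} for {path}")
        etag = resp_headers.get("ETag")
        if etag is not None:
            self._entries[table] = (etag, body)
        return body
```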

Servers do not need to keep any in-progress state for transactions. The
same multi-table commit mechanism that servers already have for the existing
commit endpoint can be extended to also produce commit IDs. Resolving
timestamps to commit IDs is an implementation detail, though some changes to
existing servers will probably be required for it. Conceptually, this problem
does not appear to be more complex than providing a monotonic CSN or
implementing the existing multi-table commit endpoint.

Retention of the data related to time travel is a server-side concern. If a
client wishes to time travel to a point that no longer has commit-tracking
information, an error is returned.
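Server-side retention could be as simple as expiring commit-tracking entries older than a cutoff and reporting a distinct error when a client asks for an expired commit. In this sketch the time-based retention policy, the per-commit `{table: snapshot_id}` map, and the `RetentionError` type are all assumptions for illustration.

```python
from typing import Dict, Tuple


class RetentionError(LookupError):
    """Hypothetical error returned when commit-tracking data has been expired."""


class CommitTracking:
    """Server-side commit tracking with an assumed time-based retention policy."""

    def __init__(self) -> None:
        # commit_id -> (commit timestamp in ms, {table: snapshot_id} as of that commit).
        # Dict insertion order doubles as commit order.
        self._commits: Dict[str, Tuple[int, Dict[str, int]]] = {}

    def record(self, commit_id: str, timestamp_ms: int, snapshots: Dict[str, int]) -> None:
        self._commits[commit_id] = (timestamp_ms, dict(snapshots))

    def expire_before(self, cutoff_ms: int) -> int:
        """Drop tracking data older than `cutoff_ms`, always keeping the newest commit."""
        ids = list(self._commits)
        expired = [cid for cid in ids[:-1] if self._commits[cid][0] < cutoff_ms]
        for cid in expired:
            del self._commits[cid]
        return len(expired)

    def snapshots_at(self, commit_id: str) -> Dict[str, int]:
        try:
            return self._commits[commit_id][1]
        except KeyError:
            raise RetentionError(f"commit {commit_id!r} is no longer tracked") from None
```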

WDYT?

Thanks,
Dmitri.

On Thu, Jun 19, 2025 at 6:24 PM Maninderjit Singh <
parmar.maninder...@gmail.com> wrote:

> Thanks Dmitri for the review!
>
> We have deliberately left out server-side implementation details, for
> brevity and to allow each vendor to choose the best option for them. That
> said, I have included a few papers that you can reference.
>
> I have also added a new
> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#bookmark=id.typa1ivjs7pw>
> section under alternatives to explore opaque IDs further. Could you validate
> it and fill in the details? There are a few open questions and
> dependencies that this proposal would require:
>
> - Why do we even need an opaque ID? Could we use tableIdentifier + sequence
>   number as an implicit opaque ID?
> - How are opaque IDs compared across tables and over time?
> - It is not clear who issues the timestamp for opaque IDs, and how it
>   achieves consistency beyond repeatable reads.
> - Would this require a dependency on the partial metadata load proposal
>   <https://docs.google.com/document/d/1eXnT0ZiFvdm_Zvk6fLGT_UxVWO-HsiqVywqu1Uk8s7E/edit?tab=t.0#heading=h.t6emwabb4tkr>?
>
> Regards,
> Maninder
>
> On Thu, Jun 19, 2025 at 12:12 PM Dmitri Bourlatchkov <di...@apache.org>
> wrote:
>
>> Thanks for the quick response, Jagdeep!
>>
>> I can certainly add a section to the doc. Could you clarify what you mean
>> by "chatty protocol", though? I did not find that term in the linked email
>> discussion :)
>>
>> Thanks,
>> Dmitri.
>>
>> On Thu, Jun 19, 2025 at 2:28 PM Jagdeep Sidhu <sidhujagde...@gmail.com>
>> wrote:
>>
>>> Hi Dmitri,
>>>
>>> Thank you for reviewing. As you said, we previously explored and dropped
>>> TransactionContext APIs with opaque IDs because they created a very chatty
>>> protocol and also led to complex transaction state management on the
>>> server side; see the link to the older thread below.
>>>
>>> Would you add a section to the existing document on the approach you are
>>> thinking of: opaque IDs without the chatty protocol and complex transaction
>>> state management on the Catalog Server? Then we can compare all of them and
>>> discuss the best path forward. Thank you!
>>>
>>> Older thread -
>>> https://lists.apache.org/thread/q7vgnfwdxng5q6mq45m0psghzy7553r7
>>>
>>> -Jagdeep
>>>
>>> On Thu, Jun 19, 2025 at 10:42 AM Dmitri Bourlatchkov <di...@apache.org>
>>> wrote:
>>>
>>>> Thanks for driving this proposal, Maninder!
>>>>
>>>> From my POV, the need for Catalogs to provide a monotonic sequence
>>>> number has deep implications for catalog implementations. I added a
>>>> related comment to the doc as well.
>>>>
>>>> The document does a good job of discussing the client-side operation. I'd
>>>> appreciate it if the server-side impact were considered in more depth too,
>>>> since the proposal implies changes on both sides.
>>>>
>>>> I know that an opaque "commit ID" was considered before; however, if
>>>> I'm not mistaken, previous discussions revolved around the idea of
>>>> a TransactionContext as an entity exposed via new APIs for sharing state
>>>> between clients/engines and the catalog. I'd like to revisit the idea of
>>>> opaque transaction IDs (managed by the catalog), but without the use
>>>> of TransactionContext. I made a brief comment about that in the doc, and
>>>> I'm willing to expand on it. I believe it can be implemented
>>>> without a durable context object to represent a transaction between
>>>> the client and the catalog.
>>>>
>>>> The main idea for "opaque commit IDs" is to allow more flexibility for
>>>> Catalog implementations, while keeping the same client-side guarantees
>>>> (snapshot isolation, causally consistent multi-table changes, etc.).
>>>>
>>>> Thanks,
>>>> Dmitri.
>>>>
>>>> On Mon, Jun 16, 2025 at 9:32 PM Maninderjit Singh <
>>>> parmar.maninder...@gmail.com> wrote:
>>>>
>>>>> Hi Iceberg dev community,
>>>>>
>>>>> We have been iterating on the Multi Table Transactions proposal,
>>>>> have merged the proposals for using catalog-authored timestamps and
>>>>> sequence numbers into one, and have also incorporated feedback from the
>>>>> community:
>>>>> Proposal: Multi-table multi-statement transactions for
>>>>> Apache Iceberg REST Catalog
>>>>> <https://drive.google.com/open?id=1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE>
>>>>>
>>>>> We have captured the tradeoffs involved in each approach as well as
>>>>> the reasoning behind our choices. We would love to hear your opinions
>>>>> on the consolidated proposal, and which approach better suits your
>>>>> requirements and why.
>>>>>
>>>>> Thank you in advance!
>>>>>
>>>>
