Re: [DISCUSS] Multi-statement multi-transaction proposal for Apache REST Catalog

Maninderjit Singh Thu, 26 Jun 2025 22:52:16 -0700

Thanks Dmitri!
I will add this to the doc. Also, it might be a good idea to discuss it in
a meeting so we can hash out the details.


On Wed, Jun 25, 2025, 9:23 AM Dmitri Bourlatchkov <[email protected]> wrote:

> Hi Maninder,
>
> Thanks for adding a section on opaque IDs, and apologies for the delayed
> reply from my side. I could not find a place to fit my text in the doc, so
> I'm sending it in this email :)
>
> This option is mostly related to option 2 (CSN) but proposes to use commit
> IDs (alternative to CSN) that are opaque to clients - this is the same as
> in your opaque ID section in the doc, but I hope that the thoughts below
> might help to clarify how it is intended to work. The main difference is
> delegating the resolution of commit IDs to snapshots to catalog servers.
>
> Catalog Servers are free to use any implementation for commit IDs,
> including monotonically increasing numbers (but they are not limited to
> CSN).
>
> Catalog Servers produce a commit ID for every change, which will be
> exposed to clients as a reasonably short string. Multi-table changes
> naturally get the same commit ID.
>
> Commit IDs are part of REST Catalog responses, but do not have to be in
> the metadata files. No Iceberg spec changes are required. REST API changes
> are needed, but they are optional and transparent to clients, unless the
> client wishes extra consistency guarantees.
>
> Clients can request table metadata for any table using a particular commit
> ID. This mechanism can be used to ensure consistency in time-travel queries.
>
> An engine can proceed as follows, while executing a multi-table change:
> 1. Load table A - receive metadata and commit ID C1
> 2. Load table B by providing C1 as a request parameter to the Catalog
> server
> 3. Load table C by providing C1 as a request parameter to the server
> 4. Process data in tables A, B, C
> 5. Update table A
> 6. Update table B
> 7. Submit metadata updates for A and B to the Catalog, passing C1 as the
> “base” commit ID to the server. Additionally submit the name C as a “read
> but not changed” table.
> 8. The Catalog server checks whether the change has any conflicts between
> C1 and the current state of the catalog (including validating that C has
> not changed)
> 9. The Catalog commits changes and returns commit ID C2 to the client
> (this commit ID represents the committed state of the submitted metadata
> changes).
>
> If the commit fails due to conflicts, the client receives a “conflict”
> error and a commit ID C3, which represents the most up-to-date state of the
> catalog (the state that was conflicting with the submitted changes). The
> client then re-loads tables based on C3 and retries its workflows.
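
The workflow above (pinned loads, commit against a base commit ID, conflict with a fresh commit ID) can be sketched with a toy in-memory catalog. This is only an illustration under stated assumptions: `InMemoryCatalog`, `CommitConflict`, `load_table`, and `commit` are hypothetical names rather than REST Catalog API, snapshots are plain strings, and conflict detection is simplified to "did any touched or read table change since the base commit".

```python
import uuid


class CommitConflict(Exception):
    """Raised on conflict; carries the catalog's most recent commit ID."""

    def __init__(self, latest_commit_id):
        super().__init__(f"conflict; latest commit is {latest_commit_id}")
        self.latest_commit_id = latest_commit_id


class InMemoryCatalog:
    """Toy catalog: each commit gets an opaque ID mapped to a table->snapshot state."""

    def __init__(self, tables):
        self._order = []    # commit IDs in commit order (server-private)
        self._states = {}   # commit ID -> {table name: snapshot}
        self._record(dict(tables))

    def _record(self, state):
        commit_id = uuid.uuid4().hex[:12]   # opaque to clients, not a CSN
        self._order.append(commit_id)
        self._states[commit_id] = state
        return commit_id

    def load_table(self, name, commit_id=None):
        """Return (snapshot, commit ID); the read is pinned to commit_id if given."""
        commit_id = commit_id or self._order[-1]
        return self._states[commit_id][name], commit_id

    def commit(self, base_commit_id, updates, read_only=()):
        """Apply all updates atomically, or raise CommitConflict (assumed rule:
        any updated or read-only table that changed since the base conflicts)."""
        base = self._states[base_commit_id]
        current = self._states[self._order[-1]]
        for name in list(updates) + list(read_only):
            if current[name] != base[name]:
                raise CommitConflict(self._order[-1])
        return self._record({**current, **updates})


catalog = InMemoryCatalog({"A": "a0", "B": "b0", "C": "c0"})
_, c1 = catalog.load_table("A")            # step 1: receive commit ID C1
catalog.load_table("B", c1)                # steps 2-3: loads pinned to C1
catalog.load_table("C", c1)
c2 = catalog.commit(c1, {"A": "a1", "B": "b1"}, read_only=["C"])  # steps 7-9
```

The server keeps no per-transaction state in this sketch: everything it needs to validate a commit arrives with the commit request itself.
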
>
> Load table responses when a commit ID is provided do not have to return
> all of the table's metadata. It is sufficient to return only the most
> relevant snapshots (usually the latest plus its parent). This is similar to
> the partial metadata loading proposal, but not critical for consistency
> guarantees. The critical part is that the Catalog communicates to engines
> what snapshot is current for a particular commit ID.
>
> Resolving Time Travel Queries: When a client executes a time travel
> query, the client provides a timestamp when loading the first table that is
> included in the query. The Catalog will resolve the timestamp to a commit
> ID and include it in the response. The client then uses the returned commit
> ID to load subsequent tables.
>
> Optionally a new endpoint may be added to the REST Catalog API to handle
> the resolution of timestamps to commit IDs.
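
One way a server might resolve a timestamp to a commit ID is a binary search over its commit log. The sketch below assumes the catalog assigns non-decreasing commit timestamps (in epoch milliseconds) and keeps them in commit order; `CommitLog`, `append`, and `resolve` are hypothetical names, and a real server is free to resolve timestamps in any other way.

```python
import bisect


class CommitLog:
    """Server-side log of (commit timestamp, opaque commit ID) in commit order."""

    def __init__(self):
        self._timestamps = []   # epoch millis, non-decreasing (assumption)
        self._commit_ids = []

    def append(self, timestamp_ms, commit_id):
        if self._timestamps and timestamp_ms < self._timestamps[-1]:
            raise ValueError("commit timestamps must be non-decreasing")
        self._timestamps.append(timestamp_ms)
        self._commit_ids.append(commit_id)

    def resolve(self, timestamp_ms):
        """Return the ID of the latest commit made at or before timestamp_ms."""
        idx = bisect.bisect_right(self._timestamps, timestamp_ms)
        if idx == 0:
            raise LookupError("no commit at or before the requested time")
        return self._commit_ids[idx - 1]


log = CommitLog()
log.append(1_000, "c-a")
log.append(2_000, "c-b")
pinned = log.resolve(1_500)   # the client would reuse this ID for later loads
```
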
>
> Caching Metadata on the Client Side: Reloading table metadata for a
> particular snapshot could leverage the ETag mechanism to reduce the amount
> of network traffic.
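
The ETag idea can be sketched as a client cache keyed by table and commit ID that revalidates with `If-None-Match` semantics. Everything here is an assumption for illustration: the hash-based ETag scheme, the `serve_load_table` stand-in for the server, and the `MetadataCache` class are hypothetical; only the 200/304 behaviour mirrors standard HTTP conditional requests.

```python
import hashlib
import json


def etag_for(metadata):
    """Derive an ETag from the serialized metadata (an assumed scheme)."""
    body = json.dumps(metadata, sort_keys=True).encode()
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def serve_load_table(metadata, if_none_match=None):
    """Stand-in for the server: (status, etag, body), with no body on 304."""
    etag = etag_for(metadata)
    if if_none_match == etag:
        return 304, etag, None
    return 200, etag, metadata


class MetadataCache:
    """Client cache: (table, commit ID) -> (etag, metadata)."""

    def __init__(self):
        self._entries = {}

    def load(self, table, commit_id, fetch):
        cached = self._entries.get((table, commit_id))
        status, etag, body = fetch(if_none_match=cached[0] if cached else None)
        if status == 304:
            return cached[1]          # unchanged: reuse the cached metadata
        self._entries[(table, commit_id)] = (etag, body)
        return body


metadata = {"table": "A", "current-snapshot-id": 1}
cache = MetadataCache()
first = cache.load("A", "c1", lambda if_none_match=None: serve_load_table(metadata, if_none_match))
```
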
>
> Servers do not need to keep any in-progress state for transactions. The
> same multi-table commit mechanism servers have for the existing commit
> endpoint can be extended to also produce commit IDs. Resolving timestamps
> to commit IDs is an implementation detail. Some changes in existing servers
> will probably be required for that. Conceptually this problem does not
> appear to be more complex than providing a monotonic CSN or implementing
> the existing multi-table commit endpoint.
>
> Retention of the data related to time-travel is a server-side concern. If
> a client wishes to time travel to a point that no longer has commit
> tracking information, an error is returned.
>
> WDYT?
>
> Thanks,
> Dmitri.
>
> On Thu, Jun 19, 2025 at 6:24 PM Maninderjit Singh <
> [email protected]> wrote:
>
>> Thanks Dmitri for the review!
>>
>> We have been deliberate about not including server side implementation
>> for brevity and to allow each vendor to choose the best option for them.
>> Having said that, I have included a few papers that you can reference.
>>
>> I have also added a new
>> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#bookmark=id.typa1ivjs7pw>
>> section under alternatives to explore opaque IDs further. Could you validate
>> and fill in the details? There are a few open questions and
>> dependencies that would be required for this proposal:
>>
>> Why do we even need an opaque ID? Could we use tableIdentifier + sequence
>> number as an implicit opaque ID?
>> How are opaque IDs compared across tables and over time?
>> Who issues the timestamp for opaque IDs, and how does this achieve
>> consistency beyond repeatable reads?
>> Would this require dependency on the partial metadata load proposal
>> <https://docs.google.com/document/d/1eXnT0ZiFvdm_Zvk6fLGT_UxVWO-HsiqVywqu1Uk8s7E/edit?tab=t.0#heading=h.t6emwabb4tkr>
>> ?
>>
>> Regards,
>> Maninder
>>
>> On Thu, Jun 19, 2025 at 12:12 PM Dmitri Bourlatchkov <[email protected]>
>> wrote:
>>
>>> Thanks for the quick response, Jagdeep!
>>>
>>> I can certainly add a section to the doc. Could you clarify what you
>>> mean by "chatty protocol", though? I did not find that term in the
>>> linked email discussion :)
>>>
>>> Thanks,
>>> Dmitri.
>>>
>>> On Thu, Jun 19, 2025 at 2:28 PM Jagdeep Sidhu <[email protected]>
>>> wrote:
>>>
>>>> Hi Dmitri,
>>>>
>>>> Thank you for reviewing. As you said, we previously explored and
>>>> dropped TransactionContext APIs with opaque IDs because they created a
>>>> very chatty protocol and also led to complex transaction state management
>>>> on the server side. A link to the old thread is below.
>>>>
>>>> Would you add a section to the existing document on the approach you
>>>> have in mind: opaque IDs without the chatty protocol and complex
>>>> transaction state management on the Catalog Server? Then we can compare
>>>> all of them and discuss the best path forward. Thank you!
>>>>
>>>> Older thread -
>>>> https://lists.apache.org/thread/q7vgnfwdxng5q6mq45m0psghzy7553r7
>>>>
>>>> -Jagdeep
>>>>
>>>> On Thu, Jun 19, 2025 at 10:42 AM Dmitri Bourlatchkov <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks for driving this proposal, Maninder!
>>>>>
>>>>> From my POV the need for Catalogs to provide a monotonic sequence
>>>>> number has deep implications on the catalog implementations. I added a
>>>>> related comment to the doc as well.
>>>>>
>>>>> The document does a good job at discussing the client operation. I'd
>>>>> appreciate it if the server-side impact were considered in more depth too,
>>>>> since the proposal implies changes on both sides.
>>>>>
>>>>> I know that an opaque "commit ID" was considered before, however, if
>>>>> I'm not mistaken previous discussions revolved around the idea of
>>>>> a TransactionContext as an entity exposed via new APIs for sharing state
>>>>> between clients/engines and the catalog. I'd like to revisit the idea of
>>>>> opaque transaction IDs (managed by the catalog) but without the use
>>>>> of TransactionContext. I made a brief comment about that in the doc, and
>>>>> I'm willing to expand on this. I believe it can be implemented
>>>>> without having a durable context object to represent a transaction between
>>>>> the client and the catalog.
>>>>>
>>>>> The main idea for "opaque commit IDs" is to allow more flexibility for
>>>>> Catalog implementations, while keeping the same client-side guarantees
>>>>> (snapshot isolation, causally consistent multi-table changes, etc.).
>>>>>
>>>>> Thanks,
>>>>> Dmitri.
>>>>>
>>>>> On Mon, Jun 16, 2025 at 9:32 PM Maninderjit Singh <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Iceberg dev community,
>>>>>>
>>>>>> We have been iterating on the Multi Table Transactions proposal and
>>>>>> have merged the proposals for using catalog authored timestamps and
>>>>>> sequence numbers together, as well as incorporated feedback from the
>>>>>> community:
>>>>>>  Proposal: Multi-table multi-statement transactions for
>>>>>> Apache Iceberg REST Catalog
>>>>>> <https://drive.google.com/open?id=1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE>
>>>>>>
>>>>>> We have captured the tradeoffs involved with each approach as well as
>>>>>> the reasoning for making those choices. We would love to hear your
>>>>>> opinions on the consolidated proposal and which approach is more
>>>>>> suitable for your requirements and why.
>>>>>>
>>>>>> Thank you in advance!
>>>>>>
>>>>>
