Re: [DISCUSS] Multi-statement multi-transaction proposal for Apache REST Catalog

Dmitri Bourlatchkov Fri, 27 Jun 2025 11:34:58 -0700

Thanks, Maninder! Good idea.

Has a meeting for this been scheduled yet?


Cheers,
Dmitri.

On Fri, Jun 27, 2025 at 1:52 AM Maninderjit Singh <
[email protected]> wrote:

> Thanks Dmitri!
> I will add this to the doc. Also, it might be a good idea to discuss it in
> a meeting so we can hash out the details.
>
> On Wed, Jun 25, 2025, 9:23 AM Dmitri Bourlatchkov <[email protected]>
> wrote:
>
>> Hi Maninder,
>>
>> Thanks for adding a section on opaque IDs, and apologies for the delayed
>> reply on my side. I could not find a good place to fit my text into the
>> doc, so I'm sending it in this email :)
>>
>> This option is mostly related to option 2 (CSN), but proposes using
>> commit IDs (an alternative to CSNs) that are opaque to clients. This is the
>> same idea as the opaque ID section in the doc, but I hope the thoughts
>> below help clarify how it is intended to work. The main difference is
>> that resolving commit IDs to snapshots is delegated to catalog servers.
>>
>> Catalog servers are free to implement commit IDs in any way, including
>> as monotonically increasing numbers, but they are not limited to CSNs.
>>
>> Catalog servers produce a commit ID for every change and expose it to
>> clients as a reasonably short string. All tables in a multi-table change
>> naturally get the same commit ID.
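One way a server might mint such IDs, sketched in Python under the assumption that it keeps an internal monotonically increasing counter and hands clients only a short base64url encoding of it. `CommitIdIssuer` and its methods are hypothetical names, not part of any REST Catalog API. The point is that every table in a single multi-table change receives the same ID.

```python
import base64
import itertools
import threading
from typing import Dict, List


class CommitIdIssuer:
    """Hypothetical server-side helper that mints opaque commit IDs.

    Internally it uses a monotonically increasing counter, but clients only
    see a short URL-safe string, so the server stays free to change the
    encoding (or the whole scheme) later.
    """

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._lock = threading.Lock()
        # table name -> commit ID of the most recent change to that table
        self.last_commit: Dict[str, str] = {}

    @staticmethod
    def _encode(n: int) -> str:
        # 8-byte big-endian counter -> 11-character base64url string.
        raw = n.to_bytes(8, "big")
        return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")

    def commit(self, tables: List[str]) -> str:
        """Record one atomic change touching `tables` and return its commit ID."""
        with self._lock:
            commit_id = self._encode(next(self._counter))
            for table in tables:
                # All tables in a multi-table change share the same ID.
                self.last_commit[table] = commit_id
            return commit_id


issuer = CommitIdIssuer()
c1 = issuer.commit(["db.a", "db.b"])  # one multi-table change
c2 = issuer.commit(["db.c"])          # a later single-table change
```

Because the base64url alphabet is not in ASCII order, comparing two of these strings lexicographically does not reliably reflect commit order, so clients cannot meaningfully interpret them, and the server could later switch schemes without breaking anyone.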
>>
>> Commit IDs are part of REST Catalog responses, but do not have to be
>> stored in the metadata files. No Iceberg spec changes are required. REST
>> API changes are needed, but they are optional and transparent to clients,
>> unless a client wants extra consistency guarantees.
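A sketch of why the REST change can stay optional, assuming the commit ID is carried as an extra `commit-id` field in the load-table response and as an optional `commit-id` query parameter on the request (both names are hypothetical, not in the current spec). The path follows the spec's existing load-table route with the optional `{prefix}` segment omitted.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode


def load_table_url(base: str, namespace: str, table: str,
                   commit_id: Optional[str] = None) -> str:
    """Build a load-table URL; the commit ID is sent only when the caller asks."""
    url = f"{base}/v1/namespaces/{namespace}/tables/{table}"
    if commit_id is not None:
        url += "?" + urlencode({"commit-id": commit_id})  # hypothetical parameter
    return url


def parse_load_table(body: str) -> Tuple[dict, Optional[str]]:
    """Return (table metadata, commit ID or None).

    A client that predates the change never reads the extra field, so the
    response stays compatible with it.
    """
    doc = json.loads(body)
    return doc["metadata"], doc.get("commit-id")  # hypothetical field
```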
>>
>> Clients can request the metadata of any table as of a particular commit
>> ID. This mechanism can be used to ensure consistency in time-travel
>> queries.
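A minimal sketch of the server-side resolution, assuming the server privately maps each opaque commit ID to an internal, ordered commit position and keeps, per table, the snapshot that was current after each commit touching it. `TableHistory`, `snapshot_at`, and the position mapping are hypothetical. The lookup returns the latest table change at or before the requested commit.

```python
import bisect
from typing import List, Optional


class TableHistory:
    """Hypothetical per-table record of which snapshot was current after each commit."""

    def __init__(self) -> None:
        self._positions: List[int] = []  # internal commit positions, ascending
        self._snapshots: List[int] = []  # snapshot ID current after that commit

    def record(self, position: int, snapshot_id: int) -> None:
        if self._positions and position <= self._positions[-1]:
            raise ValueError("commit positions must increase")
        self._positions.append(position)
        self._snapshots.append(snapshot_id)

    def snapshot_at(self, position: int) -> Optional[int]:
        """Snapshot current as of the given commit, or None if the table had
        no snapshot yet at that point."""
        i = bisect.bisect_right(self._positions, position)
        return self._snapshots[i - 1] if i > 0 else None


# The server privately maps the opaque ID "C1" to internal position 7.
opaque_to_position = {"C1": 7}
history = TableHistory()
history.record(3, 1001)
history.record(9, 1002)  # committed after C1, so invisible to readers at C1
print(history.snapshot_at(opaque_to_position["C1"]))  # prints 1001
```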
>>
>> An engine can proceed as follows while executing a multi-table change:
>> 1. Load table A - receive metadata and commit ID C1
>> 2. Load table B by providing C1 as a request parameter to the Catalog
>> server
>> 3. Load table C by providing C1 as a request parameter to the server
>> 4. Process data in tables A, B, C
>> 5. Update table A
>> 6. Update table B
>> 7. Submit metadata updates for A and B to the Catalog, passing C1 as the
>> “base” commit ID. Additionally, submit table C as a “read but not changed”
>> table.
>> 8. The Catalog server checks whether the change conflicts with anything
>> committed between C1 and the current state of the catalog (including
>> validating that C has not changed)
>> 9. The Catalog commits changes and returns commit ID C2 to the client
>> (this commit ID represents the committed state of the submitted metadata
>> changes).
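The nine steps above can be simulated with a toy in-memory catalog. This is a sketch under simplifying assumptions: commit IDs are plain integers for readability (a real server would expose them as opaque strings), snapshots are bare integers, and conflicts are detected per table, meaning a commit is rejected if any table it writes, or declares as read but not changed, was modified after the base commit. `ToyCatalog`, `load`, `commit`, and `CommitConflict` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


class CommitConflict(Exception):
    def __init__(self, latest: int) -> None:
        super().__init__(f"conflict; latest commit is {latest}")
        self.latest = latest


@dataclass
class ToyCatalog:
    head: int = 0
    # table -> [(commit ID, snapshot ID)] in commit order
    history: Dict[str, List[Tuple[int, int]]] = field(default_factory=dict)

    def load(self, table: str, at: Optional[int] = None) -> Tuple[Optional[int], int]:
        """Return (snapshot ID, commit ID) for `table` as of `at` (default: latest)."""
        at = self.head if at is None else at
        snapshot = None
        for commit_id, snap in self.history.get(table, []):
            if commit_id <= at:
                snapshot = snap
        return snapshot, at

    def _last_change(self, table: str) -> int:
        entries = self.history.get(table, [])
        return entries[-1][0] if entries else 0

    def commit(self, base: int, updates: Dict[str, int], read_only: List[str]) -> int:
        """Atomically apply `updates` unless a written or read table changed after `base`."""
        for table in list(updates) + read_only:
            if self._last_change(table) > base:
                raise CommitConflict(self.head)
        self.head += 1
        for table, snap in updates.items():
            self.history.setdefault(table, []).append((self.head, snap))
        return self.head


cat = ToyCatalog()
cat.commit(0, {"A": 1, "B": 10, "C": 100}, [])         # seed state
_, c1 = cat.load("A")                                    # step 1
snap_b, _ = cat.load("B", at=c1)                         # step 2
snap_c, _ = cat.load("C", at=c1)                         # step 3
# steps 4-6 (process data, compute new snapshots) happen in the engine
c2 = cat.commit(c1, {"A": 2, "B": 11}, read_only=["C"])  # steps 7-9
```

The catalog keeps no per-transaction state: validation and the head update both happen inside the single `commit` call.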
>>
>> If the commit fails due to conflicts, the client receives a “conflict”
>> error and a commit ID C3, which represents the most up-to-date state of the
>> catalog (the state that was conflicting with the submitted changes). The
>> client then reloads tables based on C3 and retries its workflow.
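A sketch of the client-side retry loop, assuming the conflict error carries the latest commit ID (C3) so the client can re-base without an extra round trip, and that the first load (step 1) has already produced the starting commit ID. `ConflictError`, `run_with_retries`, and the `load`/`work`/`commit` callables are hypothetical stand-ins for engine logic and REST calls.

```python
from typing import Callable, Dict, List


class ConflictError(Exception):
    """Hypothetical conflict error that carries the catalog's latest commit ID."""

    def __init__(self, latest_commit_id: str) -> None:
        super().__init__(f"conflict; latest commit is {latest_commit_id}")
        self.latest_commit_id = latest_commit_id


def run_with_retries(
    base: str,
    tables: List[str],
    load: Callable[[str, str], dict],
    work: Callable[[Dict[str, dict]], Dict[str, dict]],
    commit: Callable[[str, Dict[str, dict], List[str]], str],
    max_attempts: int = 3,
) -> str:
    """Run `work` on a consistent view of `tables` and commit, re-basing on conflict."""
    for _ in range(max_attempts):
        # Load every table as of the same commit ID.
        view = {t: load(t, base) for t in tables}
        updates = work(view)
        read_only = [t for t in tables if t not in updates]
        try:
            return commit(base, updates, read_only)  # new commit ID (C2)
        except ConflictError as err:
            # Re-base on the state we conflicted with (C3) and try again.
            base = err.latest_commit_id
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

Bounding the attempts keeps a heavily contended workload from retrying forever; what to do after giving up is left to the engine.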
>>
>> When a commit ID is provided, load table responses do not have to return
>> all of the table's metadata. It is sufficient to return only the most
>> relevant snapshots (usually the latest plus its parent). This is similar
>> to the partial metadata loading proposal, but not critical for the
>> consistency guarantees. The critical part is that the Catalog tells engines
>> which snapshot is current for a particular commit ID.
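A sketch of the trimming a server could apply, assuming snapshots are represented as in Iceberg table metadata, with `snapshot-id` and optional `parent-snapshot-id` fields. `trim_snapshots` is a hypothetical helper; a real response would also have to keep fields such as `current-snapshot-id` consistent with the trimmed list.

```python
from typing import List, Optional


def trim_snapshots(snapshots: List[dict], current_id: Optional[int]) -> List[dict]:
    """Keep only the snapshot current at the requested commit and its parent."""
    by_id = {s["snapshot-id"]: s for s in snapshots}
    current = by_id.get(current_id)
    if current is None:
        return []  # e.g. the table had no snapshot yet at that commit
    kept = [current]
    parent = by_id.get(current.get("parent-snapshot-id"))
    if parent is not None:
        kept.append(parent)
    return kept


snapshots = [
    {"snapshot-id": 1},
    {"snapshot-id": 2, "parent-snapshot-id": 1},
    {"snapshot-id": 3, "parent-snapshot-id": 2},
]
print([s["snapshot-id"] for s in trim_snapshots(snapshots, 2)])  # prints [2, 1]
```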
>>
>> Resolving Time Travel Queries: When a client executes a time-travel
>> query, it provides a timestamp when loading the first table included in the
>> query. The Catalog resolves the timestamp to a commit ID and includes it in
>> the response. The client then uses the returned commit ID to load the
>> subsequent tables.
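A sketch of the timestamp resolution on the server, assuming commit timestamps are assigned by the catalog and kept in an append-only log of `(commit_timestamp_ms, commit_id)` pairs sorted by time. `commit_id_as_of` is a hypothetical name. It picks the latest commit at or before the requested time, similar to how Iceberg resolves a timestamp to a snapshot for a single table, except that here the result pins every table in the query.

```python
import bisect
from typing import List, Tuple


def commit_id_as_of(log: List[Tuple[int, str]], timestamp_ms: int) -> str:
    """Return the ID of the latest commit made at or before `timestamp_ms`.

    `log` holds (commit_timestamp_ms, commit_id) pairs sorted by timestamp.
    A real server would index this rather than rebuild the key list per call.
    """
    timestamps = [ts for ts, _ in log]
    i = bisect.bisect_right(timestamps, timestamp_ms)
    if i == 0:
        raise LookupError(f"no commit at or before {timestamp_ms}")
    return log[i - 1][1]


log = [(1_000, "C1"), (2_000, "C2"), (3_000, "C3")]
print(commit_id_as_of(log, 2_500))  # prints C2
```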
>>
>> Optionally a new endpoint may be added to the REST Catalog API to handle
>> the resolution of timestamps to commit IDs.
>>
>> Caching Metadata on the Client Side: Reloading table metadata for a
>> particular snapshot could leverage the ETag mechanism to reduce the amount
>> of network traffic.
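A sketch of ETag-based reloading, assuming the server computes an ETag over each resolved load-table response and honors `If-None-Match` with `304 Not Modified`, as in standard HTTP conditional requests. The toy `RESOLVED` state, `toy_fetch`, and `MetadataCache` are hypothetical. The case it shows is a retry after a conflict: table B did not change between C1 and C3, so reloading it at C3 costs a 304 instead of a full metadata payload.

```python
import hashlib
import json
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Optional[dict], str]  # (HTTP status, body or None, ETag)

# Toy catalog state: table "B" resolved at two commit IDs. B did not change
# between C1 and C3, so both resolve to the same metadata.
RESOLVED = {
    ("B", "C1"): {"current-snapshot-id": 10},
    ("B", "C3"): {"current-snapshot-id": 10},
}


def toy_fetch(table: str, commit_id: str, if_none_match: Optional[str]) -> Response:
    """Stand-in for a load-table call that honors If-None-Match."""
    body = RESOLVED[(table, commit_id)]
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    etag = f'"{digest[:16]}"'
    if if_none_match == etag:
        return 304, None, etag  # Not Modified: no body on the wire
    return 200, body, etag


class MetadataCache:
    """Client-side cache holding the last metadata and ETag seen per table."""

    def __init__(self, fetch: Callable[[str, str, Optional[str]], Response]) -> None:
        self._fetch = fetch
        self._entries: Dict[str, Tuple[str, dict]] = {}
        self.full_responses = 0

    def load(self, table: str, commit_id: str) -> dict:
        cached = self._entries.get(table)
        status, body, etag = self._fetch(table, commit_id, cached[0] if cached else None)
        if status == 304 and cached is not None:
            return cached[1]  # unchanged since the last load: reuse the local copy
        self.full_responses += 1
        self._entries[table] = (etag, body)
        return body


cache = MetadataCache(toy_fetch)
cache.load("B", "C1")  # full response
cache.load("B", "C3")  # 304: B is unchanged, so nothing is re-sent
```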
>>
>> Servers do not need to keep any in-progress state for transactions. The
>> multi-table commit mechanism that servers already use for the existing
>> commit endpoint can be extended to produce commit IDs as well. Resolving
>> timestamps to commit IDs is an implementation detail, and will probably
>> require some changes in existing servers. Conceptually, this problem does
>> not appear to be more complex than providing a monotonic CSN or
>> implementing the existing multi-table commit endpoint.
>>
>> Retention of time-travel data is a server-side concern. If a client tries
>> to time travel to a point that no longer has commit tracking information,
>> an error is returned.
>>
>> WDYT?
>>
>> Thanks,
>> Dmitri.
>>
>> On Thu, Jun 19, 2025 at 6:24 PM Maninderjit Singh <
>> [email protected]> wrote:
>>
>>> Thanks Dmitri for the review!
>>>
>>> We deliberately left out server-side implementation details, for brevity
>>> and to allow each vendor to choose the best option for them. That said, I
>>> have included a few papers that you can reference.
>>>
>>> I have also added a new
>>> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#bookmark=id.typa1ivjs7pw>
>>> section under alternatives to explore opaque IDs further. Could you validate
>>> and fill in the details? There are a few open questions and
>>> dependencies that would be required for this proposal:
>>>
>>> 1. Why do we even need an opaque ID? Could we use tableIdentifier +
>>> sequence number as an implicit opaque ID?
>>> 2. How are opaque IDs compared across tables and over time?
>>> 3. Who issues the timestamp for opaque IDs, and how does this achieve
>>> consistency beyond repeatable reads?
>>> 4. Would this depend on the partial metadata load proposal
>>> <https://docs.google.com/document/d/1eXnT0ZiFvdm_Zvk6fLGT_UxVWO-HsiqVywqu1Uk8s7E/edit?tab=t.0#heading=h.t6emwabb4tkr>?
>>>
>>> Regards,
>>> Maninder
>>>
>>> On Thu, Jun 19, 2025 at 12:12 PM Dmitri Bourlatchkov <[email protected]>
>>> wrote:
>>>
>>>> Thanks for the quick response, Jagdeep!
>>>>
>>>> I can certainly add a section to the doc. Could you clarify what you
>>>> mean by "chatty protocol", though? I did not find that term in the
>>>> linked email discussion :)
>>>>
>>>> Thanks,
>>>> Dmitri.
>>>>
>>>> On Thu, Jun 19, 2025 at 2:28 PM Jagdeep Sidhu <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Dmitri,
>>>>>
>>>>> Thank you for reviewing. As you said, we previously explored and
>>>>> dropped TransactionContext APIs with opaque IDs because they created a
>>>>> very chatty protocol and led to complex transaction state management on
>>>>> the server side. A link to the old thread is below.
>>>>>
>>>>> Would you add a section to the existing document on the approach you
>>>>> have in mind: opaque IDs without the chatty protocol and complex
>>>>> transaction state management on the Catalog Server? Then we can compare
>>>>> all of them and discuss the best path forward. Thank you!
>>>>>
>>>>> Older thread -
>>>>> https://lists.apache.org/thread/q7vgnfwdxng5q6mq45m0psghzy7553r7
>>>>>
>>>>> -Jagdeep
>>>>>
>>>>> On Thu, Jun 19, 2025 at 10:42 AM Dmitri Bourlatchkov <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thanks for driving this proposal, Maninder!
>>>>>>
>>>>>> From my POV the need for Catalogs to provide a monotonic sequence
>>>>>> number has deep implications on the catalog implementations. I added a
>>>>>> related comment to the doc as well.
>>>>>>
>>>>>> The document does a good job of discussing the client operation. I'd
>>>>>> appreciate it if the server-side impact were considered in more depth
>>>>>> too, since the proposal implies changes on both sides.
>>>>>>
>>>>>> I know that an opaque "commit ID" was considered before. However, if
>>>>>> I'm not mistaken, previous discussions revolved around the idea of
>>>>>> a TransactionContext as an entity exposed via new APIs for sharing state
>>>>>> between clients/engines and the catalog. I'd like to revisit the idea of
>>>>>> opaque transaction IDs (managed by the catalog) but without the use
>>>>>> of TransactionContext. I made a brief comment about that in the doc, and
>>>>>> I'm willing to expand on it. I believe it can be implemented without
>>>>>> a durable context object to represent a transaction between the client
>>>>>> and the catalog.
>>>>>>
>>>>>> The main idea of "opaque commit IDs" is to give Catalog implementations
>>>>>> more flexibility while keeping the same client-side guarantees
>>>>>> (snapshot isolation, causally consistent multi-table changes, etc.).
>>>>>>
>>>>>> Thanks,
>>>>>> Dmitri.
>>>>>>
>>>>>> On Mon, Jun 16, 2025 at 9:32 PM Maninderjit Singh <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Iceberg dev community,
>>>>>>>
>>>>>>> We have been iterating on the Multi Table Transactions proposal, have
>>>>>>> merged the proposals for using catalog-authored timestamps and sequence
>>>>>>> numbers, and have incorporated feedback from the community:
>>>>>>>  Proposal: Multi-table multi-statement transactions for
>>>>>>> Apache Iceberg REST Catalog
>>>>>>> <https://drive.google.com/open?id=1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE>
>>>>>>>
>>>>>>> We have captured the tradeoffs involved in each approach as well
>>>>>>> as the reasoning behind those choices. We would love to hear your
>>>>>>> opinions on the consolidated proposal, which approach is more suitable
>>>>>>> for your requirements, and why.
>>>>>>>
>>>>>>> Thank you in advance!
>>>>>>>
>>>>>>
