Re: [DISCUSS] Multi-statement multi-transaction proposal for Apache REST Catalog

Maninderjit Singh Mon, 30 Jun 2025 17:38:53 -0700

Let me close the opaque id option with you and then I will send an invite
for broader discussion with the community (hopefully in the next week or so).


Thanks,
Maninder

On Fri, Jun 27, 2025 at 11:34 AM Dmitri Bourlatchkov <di...@apache.org>
wrote:

> Thanks, Maninder! Good idea.
>
> Is any meeting for this already scheduled?
>
> Cheers,
> Dmitri.
>
> On Fri, Jun 27, 2025 at 1:52 AM Maninderjit Singh <
> parmar.maninder...@gmail.com> wrote:
>
>> Thanks Dmitri!
>> I will add this to the doc. Also, it might be a good idea to discuss it
>> in a meeting so we can hash out the details.
>>
>> On Wed, Jun 25, 2025, 9:23 AM Dmitri Bourlatchkov <di...@apache.org>
>> wrote:
>>
>>> Hi Maninder,
>>>
>>> Thanks for adding a section on opaque IDs and apologies for the delayed
>>> reply from my side. I could not find a place to fit my text in the
>>> doc, so I'm sending it in this email :)
>>>
>>> This option is mostly related to option 2 (CSN) but proposes to use
>>> commit IDs (alternative to CSN) that are opaque to clients - this is the
>>> same as in your opaque ID section in the doc, but I hope that the thoughts
>>> below might help to clarify how it is intended to work. The main difference
>>> is delegating the resolution of commit IDs to snapshots to catalog servers.
>>>
>>> Catalog Servers are free to use any implementation for commit IDs,
>>> including monotonically increasing numbers (but they are not limited to
>>> CSN).
>>>
>>> Catalog Servers produce a commit ID for every change, which will be
>>> exposed to clients as a reasonably short string. Multi-table changes
>>> naturally get the same commit ID.
>>>
>>> Commit IDs are part of REST Catalog responses, but do not have to be in
>>> the metadata files. No Iceberg spec changes are required. REST API changes
>>> are needed, but they are optional and transparent to clients, unless the
>>> client wishes extra consistency guarantees.
>>>
>>> Clients can request table metadata for any table using a particular
>>> commit ID. This mechanism can be used to ensure consistency in time-travel
>>> queries.
>>>
>>> An engine can proceed as follows, while executing a multi-table change:
>>> 1. Load table A - receive metadata and commit ID C1
>>> 2. Load table B by providing C1 as a request parameter to the Catalog
>>> server
>>> 3. Load table C by providing C1 as a request parameter to the server
>>> 4. Process data in tables A, B, C
>>> 5. Update table A
>>> 6. Update table B
>>> 7. Submit metadata updates for A and B to the Catalog, passing C1 as the
>>> “base” commit ID to the server. Additionally submit the name C as a “read
>>> but not changed” table.
>>> 8. The Catalog server checks whether the change has any conflicts
>>> between C1 and the current state of the catalog (including validating that
>>> C has not changed)
>>> 9. The Catalog commits changes and returns commit ID C2 to the client
>>> (this commit ID represents the committed state of the submitted metadata
>>> changes).
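
A minimal sketch of steps 1-9 above, assuming an in-memory stand-in for the catalog server. `ToyCatalog`, `load_table`, `commit`, `read_only`, and the `c-0001` ID format are hypothetical names chosen for illustration; the thread does not define an API, and a real server is free to use any opaque ID scheme.

```python
import itertools


class ConflictError(Exception):
    """Raised when the base commit is stale; carries the latest commit ID."""

    def __init__(self, latest_commit_id):
        super().__init__(f"conflict; latest commit is {latest_commit_id}")
        self.latest_commit_id = latest_commit_id


class ToyCatalog:
    """In-memory stand-in for a catalog server (hypothetical, not the REST API)."""

    def __init__(self, tables):
        self._counter = itertools.count(1)
        # Each entry is (commit_id, {table name: current snapshot}).
        self._history = [(self._new_id(), dict(tables))]

    def _new_id(self):
        # Clients must treat this string as opaque; the format is arbitrary.
        return f"c-{next(self._counter):04d}"

    def _state_at(self, commit_id):
        for cid, state in self._history:
            if cid == commit_id:
                return state
        raise KeyError(commit_id)

    def load_table(self, name, commit_id=None):
        """Return (snapshot, commit_id); without commit_id, use the latest commit."""
        if commit_id is None:
            commit_id = self._history[-1][0]
        return self._state_at(commit_id)[name], commit_id

    def commit(self, base_commit_id, updates, read_only=()):
        """Apply updates atomically if no touched table changed since the base."""
        base = self._state_at(base_commit_id)
        latest_id, latest = self._history[-1]
        touched = set(updates) | set(read_only)
        if any(latest[t] != base[t] for t in touched):
            raise ConflictError(latest_id)
        new_id = self._new_id()
        self._history.append((new_id, {**latest, **updates}))
        return new_id


catalog = ToyCatalog({"A": "a-s1", "B": "b-s1", "C": "c-s1"})
snap_a, c1 = catalog.load_table("A")        # step 1: receive C1
snap_b, _ = catalog.load_table("B", c1)     # step 2: pinned to C1
snap_c, _ = catalog.load_table("C", c1)     # step 3: pinned to C1
# steps 4-7: compute new snapshots for A and B, submit with C1 as the base
c2 = catalog.commit(c1, {"A": "a-s2", "B": "b-s2"}, read_only=["C"])
```

Both updated tables share the single commit ID `c2`, and the server keeps no per-transaction state between the loads and the commit.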
>>>
>>> If the commit fails due to conflicts, the client receives a “conflict”
>>> error and a commit ID C3, which represents the most up-to-date state of the
>>> catalog (the state that was conflicting with the submitted changes). The
>>> client then re-loads tables based on C3 and retries its workflows.
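
The retry behaviour described above could look roughly like this on the client side. This is only a sketch: `commit_with_retry`, the callback signatures, and `max_attempts` are assumptions made for illustration, and the fake `load_tables` and `commit` functions stand in for real catalog calls.

```python
class ConflictError(Exception):
    """Hypothetical error carrying the catalog's most up-to-date commit ID (C3)."""

    def __init__(self, latest_commit_id):
        super().__init__("conflict")
        self.latest_commit_id = latest_commit_id


def commit_with_retry(load_tables, build_updates, commit, max_attempts=3):
    """Run load -> compute -> commit, re-loading from the conflict's commit ID.

    load_tables(commit_id) returns (tables, commit_id); None means "latest".
    commit(base_commit_id, updates) returns the new commit ID or raises.
    """
    base = None
    for _ in range(max_attempts):
        tables, base = load_tables(base)
        updates = build_updates(tables)
        try:
            return commit(base, updates)
        except ConflictError as err:
            base = err.latest_commit_id  # re-load everything as of C3
    raise RuntimeError("gave up after repeated conflicts")


# Fake catalog: the first load sees C1, but the catalog has moved on to C3.
versions = {"C1": {"A": 1}, "C3": {"A": 5}}
loads = []


def load_tables(commit_id):
    commit_id = commit_id or "C1"
    loads.append(commit_id)
    return versions[commit_id], commit_id


def commit(base, updates):
    if base != "C3":
        raise ConflictError("C3")
    return "C4"


result = commit_with_retry(load_tables, lambda t: {"A": t["A"] + 1}, commit)
```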
>>>
>>> Load table responses when a commit ID is provided do not have to return
>>> all of the table's metadata. It is sufficient to return only the most
>>> relevant snapshots (usually the latest plus its parent). This is similar to
>>> the partial metadata loading proposal, but not critical for consistency
>>> guarantees. The critical part is that the Catalog communicates to engines
>>> what snapshot is current for a particular commit ID.
>>>
>>> Resolving Time Travel Queries: When a client executes a time travel
>>> query, the client provides a timestamp when loading the first table that is
>>> included in the query. The Catalog will resolve the timestamp to a commit
>>> ID and include it in the response. The client then uses the returned commit
>>> ID to load subsequent tables.
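
One way a server might resolve a timestamp to a commit ID is a binary search over an ordered commit log. The sketch below assumes commits are recorded in time order and that "resolve" means "last commit at or before the timestamp"; `CommitLog`, `record`, and `resolve` are hypothetical names, and the thread leaves the actual mechanism to each implementation.

```python
import bisect


class CommitLog:
    """Hypothetical server-side index from commit time to opaque commit ID."""

    def __init__(self):
        self._timestamps = []  # commit times in ms, ascending
        self._commit_ids = []  # commit IDs, parallel to _timestamps

    def record(self, timestamp_ms, commit_id):
        # Assumes callers record commits in non-decreasing time order.
        self._timestamps.append(timestamp_ms)
        self._commit_ids.append(commit_id)

    def resolve(self, timestamp_ms):
        """Return the ID of the last commit at or before timestamp_ms."""
        i = bisect.bisect_right(self._timestamps, timestamp_ms)
        if i == 0:
            # Nothing retained that far back: the client gets an error.
            raise LookupError("no commit tracking information for that time")
        return self._commit_ids[i - 1]


log = CommitLog()
log.record(1000, "x1")
log.record(2000, "x2")
pinned = log.resolve(1500)  # the commit ID used to load every table in the query
```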
>>>
>>> Optionally a new endpoint may be added to the REST Catalog API to handle
>>> the resolution of timestamps to commit IDs.
>>>
>>> Caching Metadata on the Client Side: Reloading table metadata for a
>>> particular snapshot could leverage the ETag mechanism to reduce the amount
>>> of network traffic.
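
As a rough illustration of the ETag idea, a client could cache metadata per (table, commit ID) and revalidate it with a conditional request. The sketch below models only the standard HTTP semantics (`If-None-Match` answered with 304 when the ETag matches); `fake_server_load`, `MetadataCache`, and the hash-based ETag are assumptions, not part of the REST Catalog API.

```python
import hashlib
import json


def fake_server_load(metadata, if_none_match=None):
    """Return (status, etag, body); 304 with no body when the ETag matches."""
    payload = json.dumps(metadata, sort_keys=True).encode()
    etag = hashlib.sha256(payload).hexdigest()[:16]
    if if_none_match == etag:
        return 304, etag, None
    return 200, etag, metadata


class MetadataCache:
    """Hypothetical client-side cache keyed by (table, commit ID)."""

    def __init__(self):
        self._entries = {}  # (table, commit_id) -> (etag, metadata)

    def load(self, table, commit_id, server_metadata):
        cached = self._entries.get((table, commit_id))
        etag = cached[0] if cached else None
        status, new_etag, body = fake_server_load(server_metadata, etag)
        if status == 304:
            return cached[1], status  # reuse the cached copy, nothing re-sent
        self._entries[(table, commit_id)] = (new_etag, body)
        return body, status


cache = MetadataCache()
meta = {"current-snapshot-id": 42}
first = cache.load("A", "C1", meta)
second = cache.load("A", "C1", meta)
```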
>>>
>>> Servers do not need to keep any in-progress state for transactions. The
>>> same multi-table commit mechanism servers have for the existing commit
>>> endpoint can be extended to also produce commit IDs. Resolving timestamps
>>> to commit IDs is an implementation detail. Some changes in existing servers
>>> will probably be required for that. Conceptually this problem does not
>>> appear to be more complex than providing a monotonic CSN or implementing
>>> the existing multi-table commit endpoint.
>>>
>>> Retention of the data related to time-travel is a server-side concern.
>>> If a client wishes to time travel to a point that no longer has commit
>>> tracking information, an error is returned.
>>>
>>> WDYT?
>>>
>>> Thanks,
>>> Dmitri.
>>>
>>> On Thu, Jun 19, 2025 at 6:24 PM Maninderjit Singh <
>>> parmar.maninder...@gmail.com> wrote:
>>>
>>>> Thanks Dmitri for the review!
>>>>
>>>> We have been deliberate about not including server side implementation
>>>> for brevity and to allow each vendor to choose the best option for them.
>>>> Having said that, I have included a few papers that you can reference.
>>>>
>>>> I have also added a new
>>>> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#bookmark=id.typa1ivjs7pw>
>>>> section under alternative to explore opaque ids further. Could you validate
>>>> and fill in the details? There are a few open questions and
>>>> dependencies that would be required for this proposal:
>>>>
>>>> Why do we even need an opaque id, could we use tableIdentifier +
>>>> Sequence number as an implicit opaque id?
>>>> How are opaque ids compared across tables and with time?
>>>> Not clear on who issues the timestamp for opaque ids and how it
>>>> is achieving consistency beyond repeatable reads?
>>>> Would this require dependency on the partial metadata load proposal
>>>> <https://docs.google.com/document/d/1eXnT0ZiFvdm_Zvk6fLGT_UxVWO-HsiqVywqu1Uk8s7E/edit?tab=t.0#heading=h.t6emwabb4tkr>
>>>> ?
>>>>
>>>> Regards,
>>>> Maninder
>>>>
>>>> On Thu, Jun 19, 2025 at 12:12 PM Dmitri Bourlatchkov <di...@apache.org>
>>>> wrote:
>>>>
>>>>> Thanks for the quick response, Jagdeep!
>>>>>
>>>>> I can certainly add a section to the doc. Could you clarify what you
>>>>> mean by "chatty protocol", though? I did not find that term in the
>>>>> linked email discussion :)
>>>>>
>>>>> Thanks,
>>>>> Dmitri.
>>>>>
>>>>> On Thu, Jun 19, 2025 at 2:28 PM Jagdeep Sidhu <sidhujagde...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Dmitri,
>>>>>>
>>>>>> Thank you for reviewing. As you said, we previously explored and
>>>>>> dropped TransactionContext APIs with opaque IDs because they created a
>>>>>> very chatty protocol and also led to complex transaction state management
>>>>>> on the server side (link to the old thread below).
>>>>>>
>>>>>> Would you add a section to the existing document on the approach you
>>>>>> are thinking - Opaque IDs without the chatty protocol and complex
>>>>>> transaction state management on Catalog Server? Then we can compare all
>>>>>> of them and discuss the best path forward. Thank you!
>>>>>>
>>>>>> Older thread -
>>>>>> https://lists.apache.org/thread/q7vgnfwdxng5q6mq45m0psghzy7553r7
>>>>>>
>>>>>> -Jagdeep
>>>>>>
>>>>>> On Thu, Jun 19, 2025 at 10:42 AM Dmitri Bourlatchkov <
>>>>>> di...@apache.org> wrote:
>>>>>>
>>>>>>> Thanks for driving this proposal, Maninder!
>>>>>>>
>>>>>>> From my POV the need for Catalogs to provide a monotonic sequence
>>>>>>> number has deep implications on the catalog implementations. I added a
>>>>>>> related comment to the doc as well.
>>>>>>>
>>>>>>> The document does a good job at discussing the client operation. I'd
>>>>>>> appreciate it if the server-side impact were considered in more depth
>>>>>>> too, since the proposal implies changes on both sides.
>>>>>>>
>>>>>>> I know that an opaque "commit ID" was considered before, however, if
>>>>>>> I'm not mistaken previous discussions revolved around the idea of
>>>>>>> a TransactionContext as an entity exposed via new APIs for sharing state
>>>>>>> between clients/engines and the catalog. I'd like to revisit the idea of
>>>>>>> opaque transaction IDs (managed by the catalog) but without the use
>>>>>>> of TransactionContext. I made a brief comment about that in the doc, and
>>>>>>> I'm willing to expand on this. I believe it can be implemented
>>>>>>> without having a durable context object to represent a transaction
>>>>>>> between the client and the catalog.
>>>>>>>
>>>>>>> The main idea for "opaque commit IDs" is to allow more flexibility
>>>>>>> for Catalog implementations, while keeping the same client-side
>>>>>>> guarantees (snapshot isolation, causally consistent multi-table
>>>>>>> changes, etc.).
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dmitri.
>>>>>>>
>>>>>>> On Mon, Jun 16, 2025 at 9:32 PM Maninderjit Singh <
>>>>>>> parmar.maninder...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Iceberg dev community,
>>>>>>>>
>>>>>>>> We have been iterating on the Multi Table Transactions proposal and
>>>>>>>> have merged the proposals for using catalog authored timestamps and
>>>>>>>> sequence numbers together, as well as incorporated feedback from the
>>>>>>>> community:
>>>>>>>>  Proposal: Multi-table multi-statement transactions for
>>>>>>>> Apache Iceberg REST Catalog
>>>>>>>> <https://drive.google.com/open?id=1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE>
>>>>>>>>
>>>>>>>> We have captured the tradeoffs involved with each approach as well
>>>>>>>> as the reasoning for making those choices. We would love to hear your
>>>>>>>> opinions on the consolidated proposal and which approach is more
>>>>>>>> suitable for your requirements and why.
>>>>>>>>
>>>>>>>> Thank you in advance!
>>>>>>>>
>>>>>>>
