Hi Maninder,

Thanks for adding a section on opaque IDs, and apologies for the delayed reply on my side. I could not find a good place for my text in the doc, so I'm sending it in this email :)

This option is mostly related to option 2 (CSN), but it proposes using commit IDs (an alternative to CSN) that are opaque to clients. This is the same as in your opaque ID section in the doc, but I hope the thoughts below help to clarify how it is intended to work.

The main difference is delegating the resolution of commit IDs to snapshots to catalog servers. Catalog servers are free to use any implementation for commit IDs, including monotonically increasing numbers (but they are not limited to CSN).

Catalog servers produce a commit ID for every change, which is exposed to clients as a reasonably short string. Multi-table changes naturally get the same commit ID.

Commit IDs are part of REST Catalog responses, but do not have to be in the metadata files. No Iceberg spec changes are required. REST API changes are needed, but they are optional and transparent to clients, unless the client wishes extra consistency guarantees.

Clients can request table metadata for any table using a particular commit ID. This mechanism can be used to ensure consistency in time-travel queries.

An engine can proceed as follows while executing a multi-table change:

1. Load table A - receive metadata and commit ID C1
2. Load table B by providing C1 as a request parameter to the Catalog server
3. Load table C by providing C1 as a request parameter to the server
4. Process data in tables A, B, C
5. Update table A
6. Update table B
7. Submit metadata updates for A and B to the Catalog, passing C1 as the “base” commit ID to the server. Additionally, submit the name of C as a “read but not changed” table.
8. The Catalog server checks whether the change has any conflicts between C1 and the current state of the catalog (including validating that C has not changed)
9. The Catalog commits changes and returns commit ID C2 to the client (this commit ID represents the committed state of the submitted metadata changes).

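To make the steps above concrete, here is a minimal in-memory sketch of the flow in Python. It is only an illustration under assumptions: `ToyCatalog`, `load_table`, `commit`, `read_only`, and `CommitConflict` are hypothetical names, not part of the Iceberg REST Catalog API, and the counter-based commit IDs are just one possible server-side choice. Table metadata is reduced to a plain string.

```python
import itertools


class CommitConflict(Exception):
    """Hypothetical conflict error carrying the catalog's latest commit ID."""

    def __init__(self, latest_commit_id):
        super().__init__(f"conflict; latest commit is {latest_commit_id}")
        self.latest_commit_id = latest_commit_id


class ToyCatalog:
    """In-memory stand-in for a catalog server (assumption, not the real API)."""

    def __init__(self, tables):
        self._counter = itertools.count(1)
        # History of (commit_id, {table_name: metadata}) in commit order.
        self._history = [(self._new_id(), dict(tables))]

    def _new_id(self):
        # Opaque to clients. This toy uses a counter; a real server need not.
        return f"c-{next(self._counter)}"

    def _state_at(self, commit_id):
        for cid, state in self._history:
            if cid == commit_id:
                return state
        raise KeyError(commit_id)

    def load_table(self, name, commit_id=None):
        """Return (metadata, commit_id); pin to commit_id when one is given."""
        if commit_id is None:
            commit_id, state = self._history[-1]
        else:
            state = self._state_at(commit_id)
        return state[name], commit_id

    def commit(self, base_commit_id, updates, read_only=()):
        """Apply updates atomically if no touched or read table changed since base."""
        base = self._state_at(base_commit_id)
        latest_id, latest = self._history[-1]
        touched = set(updates) | set(read_only)
        if any(latest[t] != base[t] for t in touched):
            raise CommitConflict(latest_id)
        new_id = self._new_id()
        self._history.append((new_id, {**latest, **updates}))
        return new_id


# Steps 1-9 from the list above, with metadata reduced to strings.
catalog = ToyCatalog({"A": "a0", "B": "b0", "C": "c0"})
meta_a, c1 = catalog.load_table("A")                # step 1
meta_b, _ = catalog.load_table("B", commit_id=c1)   # step 2
meta_c, _ = catalog.load_table("C", commit_id=c1)   # step 3
c2 = catalog.commit(c1, {"A": "a1", "B": "b1"}, read_only=["C"])  # steps 7-9
```

In this sketch the conflict check only compares the base and latest state of tables the client updated or declared as read, so unrelated changes to other tables do not fail the commit. No per-transaction state is kept on the server side between the load and commit calls.
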
If the commit fails due to conflicts, the client receives a “conflict” error and a commit ID C3, which represents the most up-to-date state of the catalog (the state that conflicted with the submitted changes). The client then reloads the tables based on C3 and retries its workflow.

Load table responses do not have to return all of the table's metadata when a commit ID is provided. It is sufficient to return only the most relevant snapshots (usually the latest plus its parent). This is similar to the partial metadata loading proposal, but it is not critical for the consistency guarantees. The critical part is that the Catalog communicates to engines which snapshot is current for a particular commit ID.

Resolving Time Travel Queries:

When a client executes a time travel query, the client provides a timestamp when loading the first table that is included in the query. The Catalog resolves the timestamp to a commit ID and includes it in the response. The client then uses the returned commit ID to load subsequent tables. Optionally, a new endpoint may be added to the REST Catalog API to handle the resolution of timestamps to commit IDs.

Caching Metadata on the Client Side:

Reloading table metadata for a particular snapshot could leverage the ETag mechanism to reduce the amount of network traffic.

Servers do not need to keep any in-progress state for transactions. The same multi-table commit mechanism that servers have for the existing commit endpoint can be extended to also produce commit IDs.

Resolving timestamps to commit IDs is an implementation detail. Some changes in existing servers will probably be required for that. Conceptually, this problem does not appear to be more complex than providing a monotonic CSN or implementing the existing multi-table commit endpoint.

Retention of the data related to time travel is a server-side concern. If a client wishes to time travel to a point that no longer has commit tracking information, an error is returned.

WDYT?

Thanks,
Dmitri.

On Thu, Jun 19, 2025 at 6:24 PM Maninderjit Singh <parmar.maninder...@gmail.com> wrote:

> Thanks Dmitri for the review!
>
> We have been deliberate about not including server side implementation for
> brevity and to allow each vendor to choose the best option for them. Having
> said that, I have included a few papers that you can reference.
>
> I have also added a new
> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#bookmark=id.typa1ivjs7pw>
> section under alternative to explore opaque ids further. Could you validate
> and fill in the details? There are a few open questions and
> dependencies that would be required for this proposal:
>
> Why do we even need an opaque id, could we use tableIdentifier + Sequence
> number as an implicit opaque id?
> How are opaque ids compared across tables and with time?
> Not clear on who issues the timestamp for opaque ids and how it
> is achieving consistency beyond repeatable reads?
> Would this require dependency on the partial metadata load proposal
> <https://docs.google.com/document/d/1eXnT0ZiFvdm_Zvk6fLGT_UxVWO-HsiqVywqu1Uk8s7E/edit?tab=t.0#heading=h.t6emwabb4tkr>
> ?
>
> Regards,
> Maninder
>
> On Thu, Jun 19, 2025 at 12:12 PM Dmitri Bourlatchkov <di...@apache.org>
> wrote:
>
>> Thanks for the quick response, Jagdeep!
>>
>> I can certainly add a section to the doc. Could you clarify what you mean
>> by "chatty protocol", though. I did not find that term in the linked email
>> discussion :)
>>
>> Thanks,
>> Dmitri.
>>
>> On Thu, Jun 19, 2025 at 2:28 PM Jagdeep Sidhu <sidhujagde...@gmail.com>
>> wrote:
>>
>>> Hi Dmitri,
>>>
>>> Thank you for reviewing. As you said, we previously explored and dropped
>>> TransactionContext APIs with opaque IDs because it created a very chatty
>>> protocol and also led to complex transaction state management on Server
>>> side, link to old thread below.
>>>
>>> Would you add a section to the existing document on the approach you are
>>> thinking - Opaque IDs without the chatty protocol and complex transaction
>>> state management on Catalog Server? Then we can compare all of them and
>>> discuss the best path forward. Thank you!
>>>
>>> Older thread -
>>> https://lists.apache.org/thread/q7vgnfwdxng5q6mq45m0psghzy7553r7
>>>
>>> -Jagdeep
>>>
>>> On Thu, Jun 19, 2025 at 10:42 AM Dmitri Bourlatchkov <di...@apache.org>
>>> wrote:
>>>
>>>> Thanks for driving this proposal, Maninder!
>>>>
>>>> From my POV the need for Catalogs to provide a monotonic sequence
>>>> number has deep implications on the catalog implementations. I added a
>>>> related comment to the doc as well.
>>>>
>>>> The document does a good job at discussing the client operation. I'd
>>>> appreciate it if the server-side impact were considered in more depth too,
>>>> since the proposal implies changes on both sides.
>>>>
>>>> I know that an opaque "commit ID" was considered before, however, if
>>>> I'm not mistaken previous discussions revolved around the idea of
>>>> a TransactionContext as an entity exposed via new APIs for sharing state
>>>> between clients/engines and the catalog. I'd like to revisit the idea of
>>>> opaque transaction IDs (managed by the catalog) but without the use
>>>> of TransactionContext. I made a brief comment about that in the doc, and
>>>> I'm willing to expand on this. I believe it can be implemented
>>>> without having a durable context object to represent a transaction between
>>>> the client and the catalog.
>>>>
>>>> The main idea for "opaque commit IDs" is to allow more flexibility for
>>>> Catalog implementations, while keeping the same client-side guarantees
>>>> (snapshot isolation, causally consistent multi-table changes, etc.).
>>>>
>>>> Thanks,
>>>> Dmitri.
>>>>
>>>> On Mon, Jun 16, 2025 at 9:32 PM Maninderjit Singh <
>>>> parmar.maninder...@gmail.com> wrote:
>>>>
>>>>> Hi Iceberg dev community,
>>>>>
>>>>> We have been iterating on the Multi Table Transactions proposal and
>>>>> have merged the proposals for using catalog authored timestamps and
>>>>> sequence numbers together as well incorporated feedback from the
>>>>> community:
>>>>> Proposal: Multi-table multi-statement transactions for
>>>>> Apache Iceberg REST Catalog
>>>>> <https://drive.google.com/open?id=1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE>
>>>>>
>>>>> We have captured the tradeoffs involved with each approach as well as
>>>>> the reasoning for making those choices. We would love to hear your
>>>>> opinions on the consolidated proposal and which approach is more
>>>>> suitable for your requirements and why.
>>>>>
>>>>> Thank you in advance!
>>>>>
>>>>