Thanks Dmitri! I will add this to the doc. Also, it might be a good idea to discuss it in a meeting so we can hash out the details.
On Wed, Jun 25, 2025, 9:23 AM Dmitri Bourlatchkov <di...@apache.org> wrote:

> Hi Maninder,
>
> Thanks for adding a section on opaque IDs, and apologies for the delayed
> reply from my side. I could not find a place to fit my text in the doc,
> so I'm sending it in this email :)
>
> This option is mostly related to option 2 (CSN) but proposes to use
> commit IDs (an alternative to CSN) that are opaque to clients. This is
> the same as in your opaque ID section in the doc, but I hope the thoughts
> below help to clarify how it is intended to work. The main difference is
> delegating the resolution of commit IDs to snapshots to catalog servers.
>
> Catalog Servers are free to use any implementation for commit IDs,
> including monotonically increasing numbers (but they are not limited to
> CSN).
>
> Catalog Servers produce a commit ID for every change, which will be
> exposed to clients as a reasonably short string. Multi-table changes
> naturally get the same commit ID.
>
> Commit IDs are part of REST Catalog responses, but do not have to be in
> the metadata files. No Iceberg spec changes are required. REST API
> changes are needed, but they are optional and transparent to clients,
> unless the client wishes extra consistency guarantees.
>
> Clients can request table metadata for any table using a particular
> commit ID. This mechanism can be used to ensure consistency in
> time-travel queries.
>
> An engine can proceed as follows while executing a multi-table change:
> 1. Load table A - receive metadata and commit ID C1
> 2. Load table B by providing C1 as a request parameter to the Catalog
>    server
> 3. Load table C by providing C1 as a request parameter to the server
> 4. Process data in tables A, B, C
> 5. Update table A
> 6. Update table B
> 7. Submit metadata updates for A and B to the Catalog, passing C1 as the
>    “base” commit ID to the server. Additionally submit the name C as a
>    “read but not changed” table.
> 8. The Catalog server checks whether the change has any conflicts between
>    C1 and the current state of the catalog (including validating that C
>    has not changed)
> 9. The Catalog commits changes and returns commit ID C2 to the client
>    (this commit ID represents the committed state of the submitted
>    metadata changes).
>
> If the commit fails due to conflicts, the client receives a “conflict”
> error and a commit ID C3, which represents the most up-to-date state of
> the catalog (the state that was conflicting with the submitted changes).
> The client then re-loads tables based on C3 and retries its workflows.
>
> When a commit ID is provided, load table responses do not have to return
> all of the table's metadata. It is sufficient to return only the most
> relevant snapshots (usually the latest plus its parent). This is similar
> to the partial metadata loading proposal, but not critical for
> consistency guarantees. The critical part is that the Catalog
> communicates to engines what snapshot is current for a particular commit
> ID.
>
> Resolving Time Travel Queries: When a client executes a time travel
> query, the client provides a timestamp when loading the first table that
> is included in the query. The Catalog will resolve the timestamp to a
> commit ID and include it in the response. The client uses the returned
> commit ID to load subsequent tables.
>
> Optionally a new endpoint may be added to the REST Catalog API to handle
> the resolution of timestamps to commit IDs.
>
> Caching Metadata on the Client Side: Reloading table metadata for a
> particular snapshot could leverage the ETag mechanism to reduce the
> amount of network traffic.
>
> Servers do not need to keep any in-progress state for transactions. The
> same multi-table commit mechanism servers have for the existing commit
> endpoint can be extended to also produce commit IDs. Resolving timestamps
> to commit IDs is an implementation detail.
> Some changes in existing servers will probably be required for that.
> Conceptually this problem does not appear to be more complex than
> providing a monotonic CSN or implementing the existing multi-table commit
> endpoint.
>
> Retention of the data related to time travel is a server-side concern. If
> a client wishes to time travel to a point that no longer has commit
> tracking information, an error is returned.
>
> WDYT?
>
> Thanks,
> Dmitri.
>
> On Thu, Jun 19, 2025 at 6:24 PM Maninderjit Singh <
> parmar.maninder...@gmail.com> wrote:
>
>> Thanks Dmitri for the review!
>>
>> We have been deliberate about not including server-side implementation
>> for brevity and to allow each vendor to choose the best option for them.
>> Having said that, I have included a few papers that you can reference.
>>
>> I have also added a new
>> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#bookmark=id.typa1ivjs7pw>
>> section under alternative to explore opaque ids further. Could you
>> validate and fill in the details? There are a few open questions and
>> dependencies that would be required for this proposal:
>>
>> Why do we even need an opaque id, could we use tableIdentifier +
>> Sequence number as an implicit opaque id?
>> How are opaque ids compared across tables and with time?
>> Not clear on who issues the timestamp for opaque ids and how it
>> is achieving consistency beyond repeatable reads?
>> Would this require dependency on the partial metadata load proposal
>> <https://docs.google.com/document/d/1eXnT0ZiFvdm_Zvk6fLGT_UxVWO-HsiqVywqu1Uk8s7E/edit?tab=t.0#heading=h.t6emwabb4tkr>
>> ?
>>
>> Regards,
>> Maninder
>>
>> On Thu, Jun 19, 2025 at 12:12 PM Dmitri Bourlatchkov <di...@apache.org>
>> wrote:
>>
>>> Thanks for the quick response, Jagdeep!
>>>
>>> I can certainly add a section to the doc. Could you clarify what you
>>> mean by "chatty protocol", though?
>>> I did not find that term in the linked email discussion :)
>>>
>>> Thanks,
>>> Dmitri.
>>>
>>> On Thu, Jun 19, 2025 at 2:28 PM Jagdeep Sidhu <sidhujagde...@gmail.com>
>>> wrote:
>>>
>>>> Hi Dmitri,
>>>>
>>>> Thank you for reviewing. As you said, we previously explored and
>>>> dropped TransactionContext APIs with opaque IDs because they created a
>>>> very chatty protocol and also led to complex transaction state
>>>> management on the server side (link to the old thread below).
>>>>
>>>> Would you add a section to the existing document on the approach you
>>>> are thinking of - opaque IDs without the chatty protocol and complex
>>>> transaction state management on the Catalog Server? Then we can
>>>> compare all of them and discuss the best path forward. Thank you!
>>>>
>>>> Older thread -
>>>> https://lists.apache.org/thread/q7vgnfwdxng5q6mq45m0psghzy7553r7
>>>>
>>>> -Jagdeep
>>>>
>>>> On Thu, Jun 19, 2025 at 10:42 AM Dmitri Bourlatchkov <di...@apache.org>
>>>> wrote:
>>>>
>>>>> Thanks for driving this proposal, Maninder!
>>>>>
>>>>> From my POV the need for Catalogs to provide a monotonic sequence
>>>>> number has deep implications for catalog implementations. I added a
>>>>> related comment to the doc as well.
>>>>>
>>>>> The document does a good job at discussing the client operation. I'd
>>>>> appreciate it if the server-side impact were considered in more depth
>>>>> too, since the proposal implies changes on both sides.
>>>>>
>>>>> I know that an opaque "commit ID" was considered before; however, if
>>>>> I'm not mistaken, previous discussions revolved around the idea of
>>>>> a TransactionContext as an entity exposed via new APIs for sharing
>>>>> state between clients/engines and the catalog. I'd like to revisit
>>>>> the idea of opaque transaction IDs (managed by the catalog) but
>>>>> without the use of TransactionContext. I made a brief comment about
>>>>> that in the doc, and I'm willing to expand on this.
>>>>> I believe it can be implemented without having a durable context
>>>>> object to represent a transaction between the client and the catalog.
>>>>>
>>>>> The main idea for "opaque commit IDs" is to allow more flexibility for
>>>>> Catalog implementations, while keeping the same client-side guarantees
>>>>> (snapshot isolation, causally consistent multi-table changes, etc.).
>>>>>
>>>>> Thanks,
>>>>> Dmitri.
>>>>>
>>>>> On Mon, Jun 16, 2025 at 9:32 PM Maninderjit Singh <
>>>>> parmar.maninder...@gmail.com> wrote:
>>>>>
>>>>>> Hi Iceberg dev community,
>>>>>>
>>>>>> We have been iterating on the Multi Table Transactions proposal and
>>>>>> have merged the proposals for using catalog authored timestamps and
>>>>>> sequence numbers together, as well as incorporated feedback from the
>>>>>> community:
>>>>>> Proposal: Multi-table multi-statement transactions for
>>>>>> Apache Iceberg REST Catalog
>>>>>> <https://drive.google.com/open?id=1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE>
>>>>>>
>>>>>> We have captured the tradeoffs involved with each approach as well as
>>>>>> the reasoning for making those choices. We would love to hear your
>>>>>> opinions on the consolidated proposal and which approach is more
>>>>>> suitable for your requirements and why.
>>>>>>
>>>>>> Thank you in advance!
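
For discussion purposes, the commit-ID workflow in Dmitri's Jun 25 email (steps 1-9 plus the conflict retry) could be sketched roughly as below. This is only a toy in-memory model, not the Iceberg REST Catalog API: `ToyCatalog`, `CommitConflict`, `load_table`, `commit`, `read_only`, and `run_transaction` are all hypothetical names, table "metadata" is reduced to a plain integer, and the conflict check simply compares values between the base commit and the latest one.

```python
import uuid


class CommitConflict(Exception):
    """Hypothetical error carrying the catalog's latest commit ID (C3)."""

    def __init__(self, latest_commit_id):
        super().__init__(f"conflict; latest commit is {latest_commit_id}")
        self.latest_commit_id = latest_commit_id


class ToyCatalog:
    """Toy stand-in for a catalog server that issues opaque commit IDs."""

    def __init__(self, tables):
        # History of (commit_id, {table: metadata}) pairs, oldest first.
        self._history = [(self._new_id(), dict(tables))]

    @staticmethod
    def _new_id():
        # Opaque to clients; a server could equally use a counter or a hash.
        return uuid.uuid4().hex[:12]

    def _state_at(self, commit_id):
        for cid, state in self._history:
            if cid == commit_id:
                return state
        raise KeyError(f"unknown or expired commit ID: {commit_id}")

    def load_table(self, name, commit_id=None):
        # Without a commit ID, serve the latest state and report its ID.
        if commit_id is None:
            commit_id = self._history[-1][0]
        return self._state_at(commit_id)[name], commit_id

    def commit(self, base_commit_id, updates, read_only=()):
        base = self._state_at(base_commit_id)
        latest_id, latest = self._history[-1]
        # Step 8: every written or read table must be unchanged since base.
        for name in list(updates) + list(read_only):
            if latest[name] != base[name]:
                raise CommitConflict(latest_id)
        # Step 9: one new commit ID covers the whole multi-table change.
        new_id = self._new_id()
        self._history.append((new_id, {**latest, **updates}))
        return new_id


def run_transaction(catalog, max_attempts=3):
    """Engine side: read A, B, C at one commit ID, then update A and B."""
    pin = None  # None means "latest"; set to C3 after a conflict
    for _ in range(max_attempts):
        a, c1 = catalog.load_table("A", pin)   # step 1
        b, _ = catalog.load_table("B", c1)     # step 2
        c, _ = catalog.load_table("C", c1)     # step 3
        updates = {"A": a + c, "B": b + c}     # steps 4-6
        try:
            return catalog.commit(c1, updates, read_only=["C"])  # step 7
        except CommitConflict as err:
            pin = err.latest_commit_id         # reload based on C3, retry
    raise RuntimeError("gave up after repeated conflicts")
```

Note that the toy server keeps no per-transaction state: the engine only ever passes commit IDs back, which is the property the email is arguing for. Retention is modelled by the `KeyError` for unknown commit IDs.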