Hi Dmitri,

By "chatty protocol" in my last email, I meant it would lead to multiple
Catalog calls for simple read queries. I also wrote this down under
alternatives considered
<https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?tab=t.0#bookmark=id.k6o42bbncn3j>
in the document. For example, when a new transaction is started, the engine
may not know if it will be a multi-table transaction or not. So engines
would have to register every transaction context with the Catalog and later
delete it, leading to a lot more Catalog API calls.
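
To make the call-count concern concrete, here is a minimal Python sketch. The
CountingCatalog class and its register_context, load_table, and delete_context
methods are hypothetical names used only for illustration; they are not part
of the Iceberg REST Catalog API or of the proposal, and a real design could
batch or elide some of these calls.

```python
from dataclasses import dataclass, field


@dataclass
class CountingCatalog:
    """Hypothetical in-memory stand-in for a catalog; it only counts API calls."""

    calls: int = 0
    open_contexts: set = field(default_factory=set)
    next_id: int = 0

    def register_context(self) -> int:
        # Assumed call: create a transaction context on the catalog.
        self.calls += 1
        self.next_id += 1
        self.open_contexts.add(self.next_id)
        return self.next_id

    def load_table(self, name: str) -> str:
        # Assumed call: the actual read the query needs.
        self.calls += 1
        return name

    def delete_context(self, ctx_id: int) -> None:
        # Assumed call: clean up the context when the transaction ends.
        self.calls += 1
        self.open_contexts.discard(ctx_id)


def read_with_context(catalog: CountingCatalog, table: str) -> None:
    """The engine cannot tell up front whether the transaction will span
    several tables, so it registers a context even for a single read."""
    ctx = catalog.register_context()
    try:
        catalog.load_table(table)
    finally:
        catalog.delete_context(ctx)


def read_without_context(catalog: CountingCatalog, table: str) -> None:
    """Same read with no context to register or delete."""
    catalog.load_table(table)


with_ctx = CountingCatalog()
read_with_context(with_ctx, "db.orders")

without_ctx = CountingCatalog()
read_without_context(without_ctx, "db.orders")

print(with_ctx.calls, without_ctx.calls)  # prints "3 1"
```

Under these assumptions, a single-table read costs three Catalog calls instead
of one, and the Catalog also has to track the context between the first and
last call.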

This may be different from the option you are thinking of with opaque IDs,
so adding it to the document would help us discuss and compare the options.

Thank you!
-Jagdeep


On Fri, Jun 20, 2025 at 12:24 AM Maninderjit Singh <
parmar.maninder...@gmail.com> wrote:

> Thanks Dmitri for the review!
>
> We have deliberately left out server-side implementation details, both for
> brevity and to allow each vendor to choose the best option for them. Having
> said that, I have included a few papers that you can reference.
>
> I have also added a new
> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#bookmark=id.typa1ivjs7pw>
> section under alternatives to explore opaque ids further. Could you validate
> and fill in the details? There are a few open questions and
> dependencies that would be required for this proposal:
>
> - Why do we even need an opaque id? Could we use tableIdentifier + sequence
>   number as an implicit opaque id?
> - How are opaque ids compared across tables and over time?
> - Who issues the timestamp for opaque ids, and how does it achieve
>   consistency beyond repeatable reads?
> - Would this require a dependency on the partial metadata load proposal
>   <https://docs.google.com/document/d/1eXnT0ZiFvdm_Zvk6fLGT_UxVWO-HsiqVywqu1Uk8s7E/edit?tab=t.0#heading=h.t6emwabb4tkr>?
>
> Regards,
> Maninder
>
> On Thu, Jun 19, 2025 at 12:12 PM Dmitri Bourlatchkov <di...@apache.org>
> wrote:
>
>> Thanks for the quick response, Jagdeep!
>>
>> I can certainly add a section to the doc. Could you clarify what you mean
>> by "chatty protocol", though? I did not find that term in the linked email
>> discussion :)
>>
>> Thanks,
>> Dmitri.
>>
>> On Thu, Jun 19, 2025 at 2:28 PM Jagdeep Sidhu <sidhujagde...@gmail.com>
>> wrote:
>>
>>> Hi Dmitri,
>>>
>>> Thank you for reviewing. As you said, we previously explored and dropped
>>> TransactionContext APIs with opaque IDs because they created a very chatty
>>> protocol and also led to complex transaction state management on the server
>>> side. A link to the old thread is below.
>>>
>>> Would you add a section to the existing document on the approach you are
>>> thinking of: opaque IDs without the chatty protocol and complex transaction
>>> state management on the Catalog server? Then we can compare all of them and
>>> discuss the best path forward. Thank you!
>>>
>>> Older thread -
>>> https://lists.apache.org/thread/q7vgnfwdxng5q6mq45m0psghzy7553r7
>>>
>>> -Jagdeep
>>>
>>> On Thu, Jun 19, 2025 at 10:42 AM Dmitri Bourlatchkov <di...@apache.org>
>>> wrote:
>>>
>>>> Thanks for driving this proposal, Maninder!
>>>>
>>>> From my POV the need for Catalogs to provide a monotonic sequence
>>>> number has deep implications for catalog implementations. I added a
>>>> related comment to the doc as well.
>>>>
>>>> The document does a good job at discussing the client operation. I'd
>>>> appreciate it if the server-side impact were considered in more depth too,
>>>> since the proposal implies changes on both sides.
>>>>
>>>> I know that an opaque "commit ID" was considered before; however, if
>>>> I'm not mistaken previous discussions revolved around the idea of
>>>> a TransactionContext as an entity exposed via new APIs for sharing state
>>>> between clients/engines and the catalog. I'd like to revisit the idea of
>>>> opaque transaction IDs (managed by the catalog) but without the use
>>>> of TransactionContext. I made a brief comment about that in the doc, and
>>>> I'm willing to expand on this. I believe it can be implemented
>>>> without having a durable context object to represent a transaction between
>>>> the client and the catalog.
>>>>
>>>> The main idea for "opaque commit IDs" is to allow more flexibility for
>>>> Catalog implementations, while keeping the same client-side guarantees
>>>> (snapshot isolation, causally consistent multi-table changes, etc.).
>>>>
>>>> Thanks,
>>>> Dmitri.
>>>>
>>>> On Mon, Jun 16, 2025 at 9:32 PM Maninderjit Singh <
>>>> parmar.maninder...@gmail.com> wrote:
>>>>
>>>>> Hi Iceberg dev community,
>>>>>
>>>>> We have been iterating on the Multi Table Transactions proposal and
>>>>> have merged the proposals for using catalog authored timestamps and
>>>>> sequence numbers together, as well as incorporated feedback from the
>>>>> community:
>>>>>  Proposal: Multi-table multi-statement transactions for
>>>>> Apache Iceberg REST Catalog
>>>>> <https://drive.google.com/open?id=1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE>
>>>>>
>>>>> We have captured the tradeoffs involved with each approach as well as
>>>>> the reasoning for making those choices. We would love to hear your
>>>>> opinions on the consolidated proposal and which approach is more
>>>>> suitable for your requirements and why.
>>>>>
>>>>> Thank you in advance!
>>>>>
>>>>
