Hi Yufei,

Makes sense. While many other catalogs use an independent DB, not all of
them do. We don't have to modify the REST spec; let's add these
requirements to a different doc (probably the encryption.md I'm working on).
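
For illustration, here is a minimal sketch of the checksum idea discussed in
this thread: record a checksum of metadata.json at commit time in a trusted
store that is independent of the main storage, and verify it on load. All
names here (TrustedChecksumStore, commit_metadata, load_metadata) are
hypothetical, the "storage" is a plain dict standing in for an untrusted
object store, and SHA-256 is an assumed choice. None of this is part of the
REST spec or of any catalog implementation.

```python
import hashlib
import hmac


class TrustedChecksumStore:
    """Hypothetical stand-in for a trusted DB kept apart from table storage."""

    def __init__(self):
        self._checksums = {}

    def put(self, location: str, checksum: str) -> None:
        self._checksums[location] = checksum

    def get(self, location: str):
        return self._checksums.get(location)


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def commit_metadata(storage: dict, trusted: TrustedChecksumStore,
                    location: str, metadata_bytes: bytes) -> None:
    # Write the file to the (untrusted) main storage, then record its
    # checksum in the trusted store.
    storage[location] = metadata_bytes
    trusted.put(location, sha256_hex(metadata_bytes))


def load_metadata(storage: dict, trusted: TrustedChecksumStore,
                  location: str) -> bytes:
    # Read the file back and refuse it unless it matches the trusted checksum.
    data = storage[location]
    expected = trusted.get(location)
    if expected is None or not hmac.compare_digest(expected, sha256_hex(data)):
        raise ValueError(f"metadata integrity check failed for {location}")
    return data
```

An engine that reads metadata.json directly from storage would need to call
the same verification, which is why both catalog clients and servers need
access to the trusted checksum store.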

Cheers, Gidon


On Sun, Nov 16, 2025 at 12:30 AM Yufei Gu <[email protected]> wrote:

> Hi Gidon, sorry for the late reply.
>
> > Yep, that'd work, as long as the checksum is kept in a trusted
> independent storage/db.
> > Then I guess both catalog clients and servers would need access to the
> trusted storage of the checksums.
>
> Thanks for chiming in. These two lines actually change how I think about
> table metadata.json protection. I’m leaning toward the conclusion that we
> don’t need to add extra messages to the REST spec. A few reasons:
> 1. REST catalog isn't fundamentally different from other catalogs (HMS,
> Hadoop) in terms of table metadata.json security boundary.
> 2. The tamper-proof requirement should be exactly the same across
> different types of catalogs.
> 3. Each IRC impl. can still choose to add extra protections like the
> checksum I proposed.
> 4. And longer-term, if we ever manage to remove the requirement that table
> metadata.json must live in storage as a file, then we could revisit the
> spec and add more targeted guarantees at the API layer.
>
>
> Yufei
>
>
> On Thu, Nov 6, 2025 at 4:34 AM Gidon Gershinsky <[email protected]> wrote:
>
>> Hi Yufei, thank you.
>>
>> I'll start by saying: if the main storage is tamper-proof, then there
>> is no problem and no extra requirements for REST catalogs.
>> The rest of the mail refers to the scenarios where the main storage is
>> not tamper-proof.
>>
>> > For metadata.json integrity, the REST catalog can add a checksum to the
>> metadata.json file at the commit time and validate it while loading it back
>>
>> Yep, that'd work, as long as the checksum is kept in a trusted
>> independent storage/db.
>>
>> > There are certain use cases where engines may still load tables
>> directly from storage even when IRC is used for committing.
>>
>> Then I guess both catalog clients and servers would need access to the
>> trusted storage of the checksums.
>>
>> > It seems like a loophole, but IRC couldn't really do anything about it.
>> It's probably the system admin's responsibility to take care of it.
>>
>> Ok. As a baseline protection, the REST spec addition patch explicitly
>> states what is required of a catalog implementation/deployment to prevent a
>> compromise of encrypted tables. Maybe some IRC implementations will handle
>> this requirement (fully or partially) for untrusted main storage backends.
>> But I agree - ultimately, it is the admin's responsibility to make sure the
>> requirement is handled; e.g. by choosing a tamper-proof main storage
>> backend, or by deploying an independent storage / db for the metadata or
>> its checksums.
>>
>> > For metadata.json confidentiality, I thought the metadata.json itself
>> is encrypted as well, no?
>>
>> By broken confidentiality, I meant leaks in the data files and
>> manifest/list files. They are obviously confidential (values and stats). In
>> the community discussion on encryption, we've decided not to encrypt the
>> metadata.json file, for two reasons: metadata fields don't have
>> confidential info, and a loss of the metadata encryption key due to a
>> catalog glitch would mean a loss of the table.
>>
>> Cheers, Gidon
>>
>>
>> On Wed, Nov 5, 2025 at 11:38 PM Yufei Gu <[email protected]> wrote:
>>
>>> Thanks Gidon for raising this! It's great that we are starting to think
>>> through REST API support for encryption. We have been asked to support
>>> encryption in the Polaris community multiple times.
>>>
>>> For metadata.json integrity, the REST catalog can add a checksum to the
>>> metadata.json file at the commit time and validate it while loading it
>>> back. There are certain use cases where engines may still load tables
>>> directly from storage even when IRC is used for committing. It seems like a
>>> loophole, but IRC couldn't really do anything about it. It's probably the
>>> system admin's responsibility to take care of it.
>>>
>>> For metadata.json confidentiality, I thought the metadata.json itself is
>>> encrypted as well, no?
>>>
>>> Yufei
>>>
>>>
>>> On Wed, Nov 5, 2025 at 12:28 AM Gidon Gershinsky <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> REST catalog server implementations that keep the table metadata in
>>>> a JSON file in untrusted storage are not safe for table encryption [1].
>>>> Data confidentiality and integrity can be broken by malicious
>>>> modifications of the metadata.json.
>>>>
>>>> We propose a short addition to the REST spec [2] that requires
>>>> protection of the metadata integrity in catalog implementations that will
>>>> be used for encrypted tables.
>>>>
>>>> As a spec add-on, this is brought up for community discussion. All
>>>> comments are welcome.
>>>>
>>>> Thanks,
>>>> Gidon
>>>>
>>>>
>>>>
>>>> [1] thread starting at
>>>> https://github.com/apache/iceberg/pull/13225#discussion_r2465759567
>>>> [2] https://github.com/apache/iceberg/pull/14486
>>>>
>>>
