Hi Yufei, thank you.

I'll start by saying that if the main storage is tamper-proof, there is no
problem and there are no extra requirements for REST catalogs.
The rest of this mail refers to the scenarios where the main storage is not
tamper-proof.

> For metadata.json integrity, the REST catalog can add a checksum to the
> metadata.json file at the commit time and validate it while loading it back

Yep, that'd work, as long as the checksum is kept in a trusted independent
storage/db.
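
A minimal Python sketch of that scheme, under stated assumptions: the
`TrustedChecksumStore` class, the `record_on_commit` / `verify_on_load`
helpers and the use of SHA-256 are all hypothetical illustrations, not
something taken from the REST spec or the patch. The in-memory dict stands
in for the trusted storage/db that lives apart from the main table storage.

```python
import hashlib
import hmac


class TrustedChecksumStore:
    """Hypothetical stand-in for a trusted store kept apart from table storage."""

    def __init__(self):
        self._digests = {}

    def put(self, metadata_location, digest):
        self._digests[metadata_location] = digest

    def get(self, metadata_location):
        return self._digests.get(metadata_location)


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the raw metadata.json bytes."""
    return hashlib.sha256(data).hexdigest()


def record_on_commit(store, metadata_location, metadata_bytes):
    """At commit time, record the checksum in the trusted store."""
    store.put(metadata_location, sha256_hex(metadata_bytes))


def verify_on_load(store, metadata_location, metadata_bytes):
    """At load time, reject metadata whose checksum is missing or different."""
    expected = store.get(metadata_location)
    if expected is None:
        raise ValueError("no trusted checksum recorded for " + metadata_location)
    actual = sha256_hex(metadata_bytes)
    # Constant-time comparison of the two hex digests.
    if not hmac.compare_digest(expected, actual):
        raise ValueError("metadata.json checksum mismatch for " + metadata_location)
    return metadata_bytes
```

The check only helps if an attacker who can modify the main storage cannot
also modify the checksum store, which is why the two must be independent.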

> There are certain use cases where engines may still load tables directly
> from storage even when IRC is used for committing.

Then I guess both catalog clients and servers would need access to the
trusted storage of the checksums.

> It seems like a loophole, but IRC couldn't really do anything about it.
> It's probably the system admin's responsibility to take care of it.

Ok. As a baseline protection, the proposed REST spec addition explicitly
states what is required of a catalog implementation/deployment to prevent a
compromise of encrypted tables. Some IRC implementations may handle this
requirement (fully or partially) for untrusted main storage backends.
But I agree: ultimately, it is the admin's responsibility to make sure the
requirement is met, e.g. by choosing a tamper-proof main storage
backend, or by deploying an independent storage / db for the metadata or
its checksums.

> For metadata.json confidentiality, I thought the metadata.json itself is
> encrypted as well, no?

By broken confidentiality, I meant leaks from the data files and manifest/list
files, which obviously hold confidential information (values and stats). In the
community discussion on encryption, we decided not to encrypt the metadata.json
file, for two reasons: the metadata fields don't contain confidential info, and
losing the metadata encryption key due to a catalog glitch would mean losing
the table.

Cheers, Gidon


On Wed, Nov 5, 2025 at 11:38 PM Yufei Gu <[email protected]> wrote:

> Thanks Gidon for raising this! It's great that we start to think through
> REST API support for encryption. We have been asked to support Encryption
> in the Polaris community multiple times.
>
> For metadata.json integrity, the REST catalog can add a checksum to the
> metadata.json file at the commit time and validate it while loading it
> back. There are certain use cases where engines may still load tables
> directly from storage even when IRC is used for committing. It seems like a
> loophole, but IRC couldn't really do anything about it. It's probably the
> system admin's responsibility to take care of it.
>
> For metadata.json confidentiality, I thought the metadata.json itself is
> encrypted as well, no?
>
> Yufei
>
>
> On Wed, Nov 5, 2025 at 12:28 AM Gidon Gershinsky <[email protected]> wrote:
>
>> Hi all,
>>
>> The REST catalog server implementations that keep the table metadata in a
>> json file in an untrusted storage, are not safe for table encryption [1].
>> The data confidentiality and integrity can be broken by malicious
>> modifications of the metadata.json.
>>
>> We propose a short addition to the REST spec [2] that requires protection
>> of the metadata integrity in catalog implementations that will be used for
>> encrypted tables.
>>
>> Being a spec add-on, this is brought for a community discussion. All
>> comments are welcome.
>>
>> Thanks,
>> Gidon
>>
>>
>>
>> [1] thread starting at
>> https://github.com/apache/iceberg/pull/13225#discussion_r2465759567
>> [2] https://github.com/apache/iceberg/pull/14486
>>
>
