Hi Yufei, thank you. I'll start by saying: if the main storage is tamper-proof, then there is no problem and there are no extra requirements for REST catalogs. The rest of this mail refers to the scenarios where the main storage is not tamper-proof.

> For metadata.json integrity, the REST catalog can add a checksum to the metadata.json file at the commit time and validate it while loading it back

Yep, that'd work, as long as the checksum is kept in a trusted independent storage/db.

> There are certain use cases where engines may still load tables directly from storage even when IRC is used for committing.

Then I guess both catalog clients and servers would need access to the trusted storage of the checksums.

> It seems like a loophole, but IRC couldn't really do anything about it. It's probably the system admin's responsibility to take care of it.

Ok. As a baseline protection, the REST spec addition patch explicitly states what is required of a catalog implementation/deployment to prevent a compromise of encrypted tables. Maybe some IRC implementations will handle this requirement (fully or partially) for untrusted main storage backends. But I agree - eventually, it is the admin's responsibility to make sure the requirement is handled; e.g. by choosing a tamper-proof main storage backend, or by deploying an independent storage / db for the metadata or its checksums.

> For metadata.json confidentiality, I thought the metadata.json itself is encrypted as well, no?

By broken confidentiality, I meant leaks in the data files and manifest/list files. They are obviously confidential (values and stats). In the community discussion on encryption, we've decided not to encrypt the metadata.json file, for two reasons: metadata fields don't have confidential info, and a loss of the metadata encryption key due to a catalog glitch would mean a loss of the table.

Cheers, Gidon

On Wed, Nov 5, 2025 at 11:38 PM Yufei Gu <[email protected]> wrote:

> Thanks Gidon for raising this! It's great that we start to think through
> REST API support for encryption. We have been asked to support Encryption
> in the Polaris community multiple times.
>
> For metadata.json integrity, the REST catalog can add a checksum to the
> metadata.json file at the commit time and validate it while loading it
> back. There are certain use cases where engines may still load tables
> directly from storage even when IRC is used for committing. It seems like a
> loophole, but IRC couldn't really do anything about it. It's probably the
> system admin's responsibility to take care of it.
>
> For metadata.json confidentiality, I thought the metadata.json itself is
> encrypted as well, no?
>
> Yufei
>
>
> On Wed, Nov 5, 2025 at 12:28 AM Gidon Gershinsky <[email protected]> wrote:
>
>> Hi all,
>>
>> The REST catalog server implementations that keep the table metadata in a
>> json file in an untrusted storage, are not safe for table encryption [1].
>> The data confidentiality and integrity can be broken by malicious
>> modifications of the metadata.json.
>>
>> We propose a short addition to the REST spec [2] that requires protection
>> of the metadata integrity in catalog implementations that will be used for
>> encrypted tables.
>>
>> Being a spec add-on, this is brought for a community discussion. All
>> comments are welcome.
>>
>> Thanks,
>> Gidon
>>
>>
>>
>> [1] thread starting at
>> https://github.com/apache/iceberg/pull/13225#discussion_r2465759567
>> [2] https://github.com/apache/iceberg/pull/14486
>>
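
To make the checksum idea from this thread concrete, here is a minimal Python sketch of recording a digest at commit time and checking it at load time. It is only an illustration under assumptions: TrustedChecksumStore (an in-memory dict standing in for a trusted, independent db), the function names, the choice of SHA-256 and the example location are all hypothetical, and none of this is taken from the REST spec or from any catalog implementation.

```python
import hashlib
import hmac


class TrustedChecksumStore:
    """Hypothetical stand-in for a trusted, independent db (here: a dict)."""

    def __init__(self):
        self._digests = {}

    def put(self, location, digest):
        self._digests[location] = digest

    def get(self, location):
        return self._digests.get(location)


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of the raw metadata.json bytes."""
    return hashlib.sha256(data).hexdigest()


def record_commit(store, location, metadata_bytes):
    """At commit time: remember the digest of the metadata.json just written."""
    store.put(location, sha256_hex(metadata_bytes))


def verify_on_load(store, location, metadata_bytes):
    """At load time: reject metadata whose digest is unknown or differs."""
    expected = store.get(location)
    if expected is None:
        raise ValueError(f"no trusted checksum recorded for {location}")
    actual = sha256_hex(metadata_bytes)
    # Constant-time comparison of the two hex digests.
    if not hmac.compare_digest(expected, actual):
        raise ValueError(f"metadata.json checksum mismatch for {location}")
    return metadata_bytes


if __name__ == "__main__":
    store = TrustedChecksumStore()
    location = "s3://bucket/tbl/metadata/v1.metadata.json"  # hypothetical path
    committed = b'{"format-version": 3}'
    record_commit(store, location, committed)

    verify_on_load(store, location, committed)  # passes
    try:
        verify_on_load(store, location, b'{"format-version": 3, "x": 1}')
    except ValueError as err:
        print("rejected:", err)
```

An engine that loads tables directly from storage would need to run the same load-time check, which is why both clients and servers would need access to the trusted checksum store.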
