Hi Gidon, sorry for the late reply.

> Yep, that'd work, as long as the checksum is kept in a trusted independent storage/db.
> Then I guess both catalog clients and servers would need access to the trusted storage of the checksums.

Thanks for chiming in. These two lines actually change how I think about protecting the table metadata.json. I'm leaning toward the conclusion that we don't need to add extra messages to the REST spec. A few reasons:

1. The REST catalog isn't fundamentally different from other catalogs (HMS, Hadoop) in terms of the security boundary for the table metadata.json.
2. The tamper-proof requirement should be exactly the same across different types of catalogs.
3. Each IRC implementation can still choose to add extra protections, like the checksum I proposed.
4. Longer-term, if we ever manage to remove the requirement that the table metadata.json must live in storage as a file, we could revisit the spec and add more targeted guarantees at the API layer.

Yufei

On Thu, Nov 6, 2025 at 4:34 AM Gidon Gershinsky <[email protected]> wrote:

> Hi Yufei, thank you.
>
> I'll start with saying - if the main storage is tamper-proof, then there
> is no problem and no extra requirements for REST catalogs.
> The rest of the mail refers to the scenarios where the main storage is not
> tamper-proof.
>
> > For metadata.json integrity, the REST catalog can add a checksum to the metadata.json file at the commit time and validate it while loading it back
>
> Yep, that'd work, as long as the checksum is kept in a trusted independent
> storage/db.
>
> > There are certain use cases where engines may still load tables directly from storage even when IRC is used for committing.
>
> Then I guess both catalog clients and servers would need access to the
> trusted storage of the checksums.
>
> > It seems like a loophole, but IRC couldn't really do anything about it. It's probably the system admin's responsibility to take care of it.
>
> Ok. As a baseline protection, the REST spec addition patch explicitly
> states what is required of a catalog implementation/deployment to prevent a
> compromise of encrypted tables.
> Maybe some IRC implementations will handle
> this requirement (fully or partially) for untrusted main storage backends.
> But I agree - eventually, it is the admin's responsibility to make sure the
> requirement is handled; e.g. by choosing a tamper-proof main storage
> backend, or by deploying an independent storage / db for the metadata or
> its checksums.
>
> > For metadata.json confidentiality, I thought the metadata.json itself is encrypted as well, no?
>
> By broken confidentiality, I meant leaks in the data files and
> manifest/list files. They are obviously confidential (values and stats). In
> the community discussion on encryption, we've decided not to encrypt the
> metadata.json file, for two reasons: metadata fields don't have
> confidential info, and a loss of the metadata encryption key due to a
> catalog glitch would mean a loss of the table.
>
> Cheers, Gidon
>
>
> On Wed, Nov 5, 2025 at 11:38 PM Yufei Gu <[email protected]> wrote:
>
>> Thanks Gidon for raising this! It's great that we start to think through
>> REST API support for encryption. We have been asked to support Encryption
>> in the Polaris community multiple times.
>>
>> For metadata.json integrity, the REST catalog can add a checksum to the
>> metadata.json file at the commit time and validate it while loading it
>> back. There are certain use cases where engines may still load tables
>> directly from storage even when IRC is used for committing. It seems like a
>> loophole, but IRC couldn't really do anything about it. It's probably the
>> system admin's responsibility to take care of it.
>>
>> For metadata.json confidentiality, I thought the metadata.json itself is
>> encrypted as well, no?
>>
>> Yufei
>>
>>
>> On Wed, Nov 5, 2025 at 12:28 AM Gidon Gershinsky <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> The REST catalog server implementations that keep the table metadata in
>>> a json file in an untrusted storage, are not safe for table encryption [1].
>>> The data confidentiality and integrity can be broken by malicious
>>> modifications of the metadata.json.
>>>
>>> We propose a short addition to the REST spec [2] that requires
>>> protection of the metadata integrity in catalog implementations that will
>>> be used for encrypted tables.
>>>
>>> Being a spec add-on, this is brought for a community discussion. All
>>> comments are welcome.
>>>
>>> Thanks,
>>> Gidon
>>>
>>>
>>> [1] thread starting at
>>> https://github.com/apache/iceberg/pull/13225#discussion_r2465759567
>>> [2] https://github.com/apache/iceberg/pull/14486
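
For illustration only, the commit-time checksum idea discussed in this thread could look roughly like the sketch below. It is a minimal sketch under stated assumptions, not how any IRC implementation actually does it: `ChecksummedCatalog`, `MetadataTamperedError`, `object_store` and `trusted_store` are hypothetical names, and plain dicts stand in for the untrusted main storage and the trusted independent storage/db. SHA-256 is just one possible choice of checksum.

```python
import hashlib


class MetadataTamperedError(Exception):
    """Raised when metadata.json bytes do not match the trusted checksum."""


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


class ChecksummedCatalog:
    """Hypothetical catalog wrapper; not part of Iceberg or the REST spec."""

    def __init__(self, object_store: dict, trusted_store: dict):
        # object_store: stand-in for the untrusted main storage (files).
        # trusted_store: stand-in for the trusted independent storage/db.
        self.object_store = object_store
        self.trusted_store = trusted_store

    def commit(self, location: str, metadata_json: bytes) -> None:
        # Write the file to main storage, then record its checksum
        # in the trusted store at commit time.
        self.object_store[location] = metadata_json
        self.trusted_store[location] = sha256_hex(metadata_json)

    def load(self, location: str) -> bytes:
        # Read the file back and validate it against the trusted checksum.
        data = self.object_store[location]
        expected = self.trusted_store.get(location)
        if expected is None or expected != sha256_hex(data):
            raise MetadataTamperedError(location)
        return data
```

Note that this only protects loads that go through the catalog. An engine that reads metadata.json directly from storage would skip the check unless it also had access to the trusted store, which is the point Gidon raises above.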
