Hi Yufei,

Makes sense. While many other catalogs use an independent DB, not all of them do. We don't have to modify the REST spec; let's add these requirements to a different doc (probably the encryption.md I'm working on).

Cheers, Gidon

On Sun, Nov 16, 2025 at 12:30 AM Yufei Gu <[email protected]> wrote:

> Hi Gidon, sorry for the late reply.
>
> > Yep, that'd work, as long as the checksum is kept in a trusted
> > independent storage/db.
>
> > Then I guess both catalog clients and servers would need access to the
> > trusted storage of the checksums.
>
> Thanks for chiming in. These two lines actually change how I think about
> table metadata.json protection. I’m leaning toward the conclusion that we
> don’t need to add extra messages to the REST spec. A few reasons:
>
> 1. REST catalog isn't fundamentally different from other catalogs (HMS,
>    Hadoop) in terms of table metadata.json security boundary.
> 2. The tamper-proof requirement should be exactly the same across
>    different types of catalogs.
> 3. Each IRC impl. can still choose to add extra protections like the
>    checksum I proposed.
> 4. And longer-term, if we ever manage to remove the requirement that table
>    metadata.json must live in storage as a file, then we could revisit the
>    spec and add more targeted guarantees at the API layer.
>
> Yufei
>
> On Thu, Nov 6, 2025 at 4:34 AM Gidon Gershinsky <[email protected]> wrote:
>
>> Hi Yufei, thank you.
>>
>> I'll start with saying - if the main storage is tamper-proof, then there
>> is no problem and no extra requirements for REST catalogs.
>> The rest of the mail refers to the scenarios where the main storage is
>> not tamper-proof.
>>
>> > For metadata.json integrity, the REST catalog can add a checksum to the
>> > metadata.json file at the commit time and validate it while loading it back
>>
>> Yep, that'd work, as long as the checksum is kept in a trusted
>> independent storage/db.
>>
>> > There are certain use cases where engines may still load tables
>> > directly from storage even when IRC is used for committing.
>>
>> Then I guess both catalog clients and servers would need access to the
>> trusted storage of the checksums.
>>
>> > It seems like a loophole, but IRC couldn't really do anything about it.
>> > It's probably the system admin's responsibility to take care of it.
>>
>> Ok. As a baseline protection, the REST spec addition patch explicitly
>> states what is required of a catalog implementation/deployment to prevent a
>> compromise of encrypted tables. Maybe some IRC implementations will handle
>> this requirement (fully or partially) for untrusted main storage backends.
>> But I agree - eventually, it is the admin's responsibility to make sure the
>> requirement is handled; e.g. by choosing a tamper-proof main storage
>> backend, or by deploying an independent storage / db for the metadata or
>> its checksums.
>>
>> > For metadata.json confidentiality, I thought the metadata.json itself
>> > is encrypted as well, no?
>>
>> By broken confidentiality, I meant leaks in the data files and
>> manifest/list files. They are obviously confidential (values and stats). In
>> the community discussion on encryption, we've decided not to encrypt the
>> metadata.json file, for two reasons: metadata fields don't have
>> confidential info, and a loss of the metadata encryption key due to a
>> catalog glitch would mean a loss of the table.
>>
>> Cheers, Gidon
>>
>> On Wed, Nov 5, 2025 at 11:38 PM Yufei Gu <[email protected]> wrote:
>>
>>> Thanks Gidon for raising this! It's great that we start to think through
>>> REST API support for encryption. We have been asked to support Encryption
>>> in the Polaris community multiple times.
>>>
>>> For metadata.json integrity, the REST catalog can add a checksum to the
>>> metadata.json file at the commit time and validate it while loading it
>>> back. There are certain use cases where engines may still load tables
>>> directly from storage even when IRC is used for committing. It seems like a
>>> loophole, but IRC couldn't really do anything about it. It's probably the
>>> system admin's responsibility to take care of it.
>>>
>>> For metadata.json confidentiality, I thought the metadata.json itself is
>>> encrypted as well, no?
>>>
>>> Yufei
>>>
>>> On Wed, Nov 5, 2025 at 12:28 AM Gidon Gershinsky <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> The REST catalog server implementations that keep the table metadata in
>>>> a json file in an untrusted storage, are not safe for table encryption [1].
>>>> The data confidentiality and integrity can be broken by malicious
>>>> modifications of the metadata.json.
>>>>
>>>> We propose a short addition to the REST spec [2] that requires
>>>> protection of the metadata integrity in catalog implementations that will
>>>> be used for encrypted tables.
>>>>
>>>> Being a spec add-on, this is brought for a community discussion. All
>>>> comments are welcome.
>>>>
>>>> Thanks,
>>>> Gidon
>>>>
>>>> [1] thread starting at
>>>> https://github.com/apache/iceberg/pull/13225#discussion_r2465759567
>>>> [2] https://github.com/apache/iceberg/pull/14486
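
For readers following the thread, the checksum idea discussed above (record a checksum of metadata.json at commit time, keep it in a trusted independent storage/db, and validate it when loading the file back) can be sketched roughly as below. This is only an illustration under stated assumptions, not how any IRC implementation or the Iceberg library does it: `TrustedChecksumStore`, `commit_metadata`, `load_metadata`, the use of SHA-256, and the plain dict standing in for the untrusted main storage are all hypothetical.

```python
import hashlib
import hmac


class TrustedChecksumStore:
    """Hypothetical stand-in for the trusted, independent storage/db."""

    def __init__(self):
        self._checksums = {}

    def put(self, metadata_location, checksum):
        self._checksums[metadata_location] = checksum

    def get(self, metadata_location):
        return self._checksums.get(metadata_location)


def sha256_hex(content: bytes) -> str:
    # SHA-256 is an assumption; the thread does not name a checksum algorithm.
    return hashlib.sha256(content).hexdigest()


def commit_metadata(untrusted_storage, store, location, content):
    """Write metadata.json to the main storage and record its checksum."""
    untrusted_storage[location] = content
    store.put(location, sha256_hex(content))


def load_metadata(untrusted_storage, store, location):
    """Read metadata.json back and reject it if the checksum does not match."""
    content = untrusted_storage[location]
    expected = store.get(location)
    actual = sha256_hex(content)
    if expected is None or not hmac.compare_digest(expected, actual):
        raise ValueError(f"metadata integrity check failed for {location}")
    return content
```

Note that an engine loading tables directly from storage would have to call something like `load_metadata` itself, which is why both catalog clients and servers would need access to the trusted checksum store.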
