Hi all,

Thank you both for the discussion, to Gidon and others for driving this
work, and to everyone for the reviews on the REST encryption PR [1].

A few thoughts (and sorry for the late response):

1. I'm unsure whether we should currently suggest that catalogs store
metadata JSONs in, say, a database as opposed to object storage. For IRCs,
the `metadata-location` field in the spec comes to mind [2] - IRCs that
still implement such an approach may encounter client issues (e.g. [3]).
I'm also unsure whether the larger discussion of relaxing the storage
constraint needs to happen for encryption.
2. Thank you for the encryption doc [4], Gidon; I left some comments. I
think it would be nice to understand the security requirements for catalogs
that store metadata JSONs in the same object storage as data. (If that
involves IRC implementation changes, might an accompanying REST spec change
still make sense? I'm not too opinionated here.)
3. On metadata JSON encryption: it would be nice to reduce catalog
requirements to storing that key metadata, to mitigate the loophole that
Yufei highlights of clients reading metadata from storage, and to perhaps
make us less reliant on a metadata JSON not containing sensitive
information in the future (to some, it already does, with the table's
schema). The table loss counterargument does feel strong though.
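
For reference, the checksum approach discussed downthread (record a checksum
of the metadata.json at commit time in trusted, independent storage, and
validate it when loading the file back) could look roughly like the sketch
below. This is only an illustration under stated assumptions: the names
`TrustedChecksumStore`, `commit_metadata` and `load_metadata` are
hypothetical and not part of Iceberg or the REST spec, SHA-256 is an
arbitrary choice, and plain dicts stand in for the independent DB and the
object storage.

```python
import hashlib


class TrustedChecksumStore:
    """Hypothetical stand-in for a trusted DB kept separate from object storage."""

    def __init__(self):
        self._checksums = {}

    def put(self, metadata_location, checksum):
        self._checksums[metadata_location] = checksum

    def get(self, metadata_location):
        return self._checksums.get(metadata_location)


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def commit_metadata(store, object_storage, metadata_location, metadata_bytes):
    # Write the metadata file to (possibly untrusted) object storage ...
    object_storage[metadata_location] = metadata_bytes
    # ... and record its checksum in the trusted, independent store.
    store.put(metadata_location, sha256_hex(metadata_bytes))


def load_metadata(store, object_storage, metadata_location):
    data = object_storage[metadata_location]
    expected = store.get(metadata_location)
    # Reject the file if no checksum was recorded or the content has changed.
    if expected is None or expected != sha256_hex(data):
        raise ValueError("metadata checksum mismatch: " + metadata_location)
    return data
```

As Gidon notes downthread, a client that loads metadata directly from
storage would need read access to the same trusted store for this check to
help.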

Thanks,
Sreesh

[1] https://github.com/apache/iceberg/pull/13225
[2] https://github.com/apache/iceberg/blob/cd8d2a3345cb387f1d735763ff3914ac4e2617e2/open-api/rest-catalog-open-api.yaml#L3355-L3356
[3] https://github.com/apache/iceberg/blob/cd8d2a3345cb387f1d735763ff3914ac4e2617e2/core/src/main/java/org/apache/iceberg/rest/RESTTableOperations.java#L249
[4] https://github.com/apache/iceberg/pull/14621

On Mon, Nov 17, 2025 at 9:05 AM Gidon Gershinsky <[email protected]> wrote:

> Hi Yufei,
>
> Makes sense. While many other catalogs use an independent DB, not all of
> them do. We don't have to modify the REST spec, let's add these
> requirements in a different doc (probably the encryption.md I'm working on).
>
> Cheers, Gidon
>
>
> On Sun, Nov 16, 2025 at 12:30 AM Yufei Gu <[email protected]> wrote:
>
>> Hi Gidon, sorry for the late reply.
>>
>> > Yep, that'd work, as long as the checksum is kept in a trusted
>> > independent storage/db.
>> > Then I guess both catalog clients and servers would need access to the
>> > trusted storage of the checksums.
>>
>> Thanks for chiming in. These two lines actually change how I think about
>> table metadata.json protection. I’m leaning toward the conclusion that we
>> don’t need to add extra messages to the REST spec. A few reasons:
>> 1. REST catalog isn't fundamentally different from other catalogs (HMS,
>> Hadoop) in terms of table metadata.json security boundary.
>> 2. The tamper-proof requirement should be exactly the same across
>> different types of catalogs.
>> 3. Each IRC impl. can still choose to add extra protections like the
>> checksum I proposed.
>> 4. And longer-term, if we ever manage to remove the requirement that
>> table metadata.json must live in storage as a file, then we could revisit
>> the spec and add more targeted guarantees at the API layer.
>>
>>
>> Yufei
>>
>>
>> On Thu, Nov 6, 2025 at 4:34 AM Gidon Gershinsky <[email protected]> wrote:
>>
>>> Hi Yufei, thank you.
>>>
>>> I'll start with saying - if the main storage is tamper-proof, then there
>>> is no problem and no extra requirements for REST catalogs.
>>> The rest of the mail refers to the scenarios where the main storage is
>>> not tamper-proof.
>>>
>>> > For metadata.json integrity, the REST catalog can add a checksum to
>>> > the metadata.json file at the commit time and validate it while
>>> > loading it back
>>>
>>> Yep, that'd work, as long as the checksum is kept in a trusted
>>> independent storage/db.
>>>
>>> > There are certain use cases where engines may still load tables
>>> > directly from storage even when IRC is used for committing.
>>>
>>> Then I guess both catalog clients and servers would need access to the
>>> trusted storage of the checksums.
>>>
>>> > It seems like a loophole, but IRC couldn't really do anything about
>>> > it. It's probably the system admin's responsibility to take care of it.
>>>
>>> Ok. As a baseline protection, the REST spec addition patch explicitly
>>> states what is required of a catalog implementation/deployment to prevent a
>>> compromise of encrypted tables. Maybe some IRC implementations will handle
>>> this requirement (fully or partially) for untrusted main storage backends.
>>> But I agree - eventually, it is the admin's responsibility to make sure the
>>> requirement is handled; e.g. by choosing a tamper-proof main storage
>>> backend, or by deploying an independent storage / db for the metadata or
>>> its checksums.
>>>
>>> > For metadata.json confidentiality, I thought the metadata.json itself
>>> > is encrypted as well, no?
>>>
>>> By broken confidentiality, I meant leaks in the data files and
>>> manifest/list files. They are obviously confidential (values and stats). In
>>> the community discussion on encryption, we've decided not to encrypt the
>>> metadata.json file, for two reasons: metadata fields don't have
>>> confidential info, and a loss of the metadata encryption key due to a
>>> catalog glitch would mean a loss of the table.
>>>
>>> Cheers, Gidon
>>>
>>>
>>> On Wed, Nov 5, 2025 at 11:38 PM Yufei Gu <[email protected]> wrote:
>>>
>>>> Thanks Gidon for raising this! It's great that we start to think
>>>> through REST API support for encryption. We have been asked to
>>>> support Encryption in the Polaris community multiple times.
>>>>
>>>> For metadata.json integrity, the REST catalog can add a checksum to the
>>>> metadata.json file at the commit time and validate it while loading it
>>>> back. There are certain use cases where engines may still load tables
>>>> directly from storage even when IRC is used for committing. It seems like a
>>>> loophole, but IRC couldn't really do anything about it. It's probably the
>>>> system admin's responsibility to take care of it.
>>>>
>>>> For metadata.json confidentiality, I thought the metadata.json itself
>>>> is encrypted as well, no?
>>>>
>>>> Yufei
>>>>
>>>>
>>>> On Wed, Nov 5, 2025 at 12:28 AM Gidon Gershinsky <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> REST catalog server implementations that keep the table metadata in
>>>>> a json file in untrusted storage are not safe for table encryption
>>>>> [1]. The data confidentiality and integrity can be broken by malicious
>>>>> modifications of the metadata.json.
>>>>>
>>>>> We propose a short addition to the REST spec [2] that requires
>>>>> protection of the metadata integrity in catalog implementations that will
>>>>> be used for encrypted tables.
>>>>>
>>>>> Being a spec add-on, this is brought for a community discussion. All
>>>>> comments are welcome.
>>>>>
>>>>> Thanks,
>>>>> Gidon
>>>>>
>>>>>
>>>>>
>>>>> [1] thread starting at
>>>>> https://github.com/apache/iceberg/pull/13225#discussion_r2465759567
>>>>> [2] https://github.com/apache/iceberg/pull/14486
>>>>>
>>>>
