I don't feel "security" is the right justification for adding checksums
to file entries in metadata.  This may just be the wording being used, but
"integrity" is probably closer to what we're trying to communicate.

However, this distinction is part of what keeps me from thinking that
introducing checksums is helpful.  We currently track location and length,
and implementations should always use unique paths and never overwrite
existing paths.  It is highly unlikely that a bit flip would manifest in a
way that keeps the data consumable.  The existing compression, encodings,
and validations make it incredibly unlikely that random corruption would be
anything but transient.  I've seen many cases where data was corrupted at
the hardware/engine layer, but I have never needed a checksum in the read
path to identify that.  The FileIO implementations perform checks when
data is written, so the write path is reasonably covered.
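
The bit-flip point can be illustrated with a rough sketch. This is an
illustration under assumptions, not anything taken from Iceberg or a
FileIO implementation: Python's zlib stands in for whatever codec a file
format actually uses, and `flip_bit` and `silently_corrupts` are
hypothetical helper names. It flips every bit of a compressed stream in
turn and counts the flips that still decompress into different data.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def silently_corrupts(compressed: bytes, original: bytes, bit_index: int) -> bool:
    """True only if the flipped stream decompresses cleanly into wrong data."""
    try:
        result = zlib.decompress(flip_bit(compressed, bit_index))
    except zlib.error:
        # The codec rejected the stream, so a reader would see a failure.
        return False
    return result != original


# Made-up CSV-like payload, used purely as sample data.
payload = b"id,name\n" + b"".join(b"%d,row-%d\n" % (i, i) for i in range(200))
compressed = zlib.compress(payload)

total_bits = len(compressed) * 8
silent = sum(silently_corrupts(compressed, payload, i) for i in range(total_bits))
print(f"{silent} of {total_bits} single-bit flips produced silently wrong data")
```

A zlib stream carries its own Adler-32 trailer, so this says nothing
about codecs that lack a built-in check; it only shows the kind of
validation the paragraph above refers to.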

That leaves us with the "security" aspect, which implies some sort of
malicious intent.  However, if someone can craft a file that meets the
length and location requirements, they could also update the metadata
reference and checksum.  This isn't a security feature and leads to more
"security theater" than actual security.

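The argument above can be made concrete with a minimal sketch. The entry
layout, the field names (`path`, `length`, `checksum`), and the
`reader_accepts` helper are all hypothetical and are not taken from the
Iceberg spec. The sketch only shows that an unkeyed checksum stored next
to the file reference can be recomputed by anyone able to rewrite that
reference.

```python
import hashlib
import hmac


def plain_checksum(content: bytes) -> str:
    """Unkeyed SHA-256 digest; anyone holding the bytes can compute it."""
    return hashlib.sha256(content).hexdigest()


def reader_accepts(entry: dict, content: bytes) -> bool:
    """Hypothetical read-path check: length and checksum must both match."""
    return len(content) == entry["length"] and hmac.compare_digest(
        plain_checksum(content), entry["checksum"]
    )


# Hypothetical metadata entry written by an honest writer.
original = b"good"
entry = {
    "path": "s3://example-bucket/data/file-a.parquet",
    "length": len(original),
    "checksum": plain_checksum(original),
}

# Someone who swaps the file but cannot touch the metadata is caught.
tampered = b"evil"
print(reader_accepts(entry, tampered))

# Someone who can also rewrite the metadata simply recomputes the checksum,
# and the forged entry passes the same check.
forged = dict(entry, checksum=plain_checksum(tampered))
print(reader_accepts(forged, tampered))
```

The check only helps in the first case, where the metadata is out of
reach; once the reference itself can be rewritten, the checksum is
recomputed along with it.
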
I don't think it adds value beyond the existing checks and validation
performed at the FileIO layer.  So while it seems like an improvement, it
just adds unnecessary complexity.

-Dan

On Thu, May 7, 2026 at 11:50 AM Kurtis Wright <[email protected]>
wrote:

> Hi Russell,
>
> Thank you for the quick response. I think the security use case is a great
> example. I initially think of the security use case as relevant to the
> Bracketing concept in a Client to remote Server Side IRC setting.
> Essentially validating that what the Client sent didn't get intercepted and
> changed over the wire. The durability and integrity checks are awesome
> because they can give confidence that, whether the storage/network
> solution is a cloud provider or a self-hosted storage system (like Ceph or
> others), you have protections against bit rot, cosmic ray
> <https://en.wikipedia.org/wiki/Single-event_upset> caused bit flips, file
> corruption, network errors, and more.
>
> On Thu, May 7, 2026 at 11:23 AM Russell Spitzer <[email protected]>
> wrote:
>
>> The last time we discussed this was in conjunction with encryption. The
>> consideration would be to add something like that as additional security
>> against file tampering. Every entry would essentially have its key as well
>> as additional bytes to confirm that the contents were as expected.
>>
>> On Thu, May 7, 2026 at 1:08 PM Kurtis Wright <[email protected]>
>> wrote:
>>
>>> Hi Everyone,
>>>
>>>   Kurtis from S3Tables here. S3 utilizes checksums
>>> <https://en.wikipedia.org/wiki/Checksum> for durability and correctness
>>> <https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html>.
>>> I see that S3 & GCS clients utilize checksumming, but after searching
>>> through both the Java implementation and the mailing list (just going back a
>>> few months) I couldn't find any reference to something in the spec. I
>>> started writing a proposal for adding checksums for durability and
>>> correctness at a few different layers of Iceberg, but before I complete a
>>> proposal I wanted to check with the community to gauge interest in the
>>> concepts and hopefully have some initial feedback.
>>>
>>> The layers of Iceberg I am considering are:
>>>
>>>    1. At rest/storage in the file layer (metadata.json, manifest layer,
>>>    data file layer)
>>>    2. Bracketing in the Catalog
>>>    3. Maybe during compaction operations (unsure exactly how this would
>>>    work)
>>>
>>> Please let me know if there were considerations that were rejected or grew
>>> stale in the past. I would really appreciate reading more on what the
>>> community has considered already and learn from that. Otherwise if you
>>> think this is cool and want to talk or just plus one please reach out.
>>>
>>> --
>>> Thank You,
>>> Kurtis C. Wright
>>>
>>
>
> --
> Thank You and Cheers,
> Kurtis C. Wright
>
