Hi Kurtis, all,

Just a couple of cents from the Databricks / Delta side.

We run an independent end-to-end integrity check on our DML / merge compute path in production. We do see a small but non-zero rate of mismatches at fleet scale that survive all the codec, length, and FileIO validations; they only surface because we have an independent invariant to compare against. ECC, TLS, and codec checks do a lot, but they aren't end-to-end across caches, interconnects, and heterogeneous fleets.

So +1 on the proposal, especially with the opt-in framing in your last message, which is the right shape. Operators who have observed integrity issues (or who run on storage / network paths where the cloud-store guarantees don't fully apply) get to turn it on; everyone else pays nothing. Format-layer checksums and engine-layer checks are complementary, not redundant.

+ @Marco Kroll <[email protected]> here, who can go deeper on the empirical side if this interests you.

Best,
Andrei

On Thu, May 7, 2026 at 10:57 PM Kurtis Wright <[email protected]> wrote:

> Hi Daniel & Steve,
>
> I appreciate the feedback. I believe my response to Russell's initial
> comment portrayed my stance as indexing more than I intended into the
> security aspect. I am coming from the mental model of checksumming as
> a durability, integrity, and correctness tool primarily with some debatable
> potential security added benefits.
>
> I want to really +1 the CRC32C and opcode point Steve made as it is a big
> reason why checksum calculation performance hit is lessened on
> modern systems, though not eliminated.
>
> What I am taking away, hopefully not incorrectly, is that I should submit
> a formal proposal document to continue this discussion with more structure
> even if this idea has a high bar to clear in terms of proving its
> ubiquitous usefulness.
>
> I would also be remiss to not mention early and often, I am hoping for
> this to be an optional Table and/or IRC feature not something
> defaulted/imposed on all customers *IF* implemented.
>
> On Thu, May 7, 2026 at 1:15 PM Daniel Weeks <[email protected]>
> wrote:
>
>> I don't feel "security" is the right approach to justify adding checksums
>> to file entries in metadata. This may just be the wording being used, but
>> "integrity" is probably closer to what we're trying to communicate.
>>
>> However, this distinction is partially what keeps me from thinking
>> introducing checksums is helpful. We currently track location and length
>> and implementations should always use unique paths and never overwrite
>> existing paths. It is highly unlikely that a bit flip would manifest in a
>> way that keeps the data consumable. The existing compressions, encodings,
>> and validations make all of the random scenarios incredibly unlikely to be
>> anything but transitory. I've seen many cases where data was corrupted at
>> the hardware/engine layer, but never needed a checksum in the read path to
>> identify that. The FileIO implementations perform checks on production of
>> data, so the write path is reasonably covered.
>>
>> That leaves us with the "security" aspect, which implies some sort of
>> malicious intent. However, if someone can craft a file that meets the
>> length and location requirements, they could also update the metadata
>> reference and checksum. This isn't a security feature and leads to more
>> "security theater" than actual security.
>>
>> I don't think it adds value beyond the existing checks and validation
>> performed at the FileIO layer. So while it seems like an improvement, it
>> just adds unnecessary complexity.
>>
>> -Dan
>>
>> On Thu, May 7, 2026 at 11:50 AM Kurtis Wright <[email protected]>
>> wrote:
>>
>>> Hi Russell,
>>>
>>> Thank you for the quick response. I think the security use case is a
>>> great example. I initially think of the security use case as relevant to
>>> the Bracketing concept in a Client to remote Server Side IRC setting.
>>> Essentially validating that what the Client sent didn't get intercepted and
>>> changed over the wire. The durability and integrity checks are awesome
>>> because it can give confidence that no matter if the storage/network
>>> solution is a cloud provider or a self-hosted storage system (like CEPH or
>>> others) you have protections against bit rot, cosmic ray
>>> <https://en.wikipedia.org/wiki/Single-event_upset> caused bit flip,
>>> file corruption, network errors, and more.
>>>
>>> On Thu, May 7, 2026 at 11:23 AM Russell Spitzer <
>>> [email protected]> wrote:
>>>
>>>> The last time we discussed this was in conjunction with encryption. The
>>>> consideration would be to add something like that as additional security
>>>> against file tampering. Every entry would essentially have its key as well
>>>> as additional bytes to confirm that the contents were as expected.
>>>>
>>>> On Thu, May 7, 2026 at 1:08 PM Kurtis Wright <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> Kurtis from S3Tables, S3 utilizes checksums
>>>>> <https://en.wikipedia.org/wiki/Checksum> for durability and
>>>>> correctness
>>>>> <https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html>.
>>>>> I see that S3 & GCS clients utilize checksumming, but after searching
>>>>> through both the Java implementation and the mail list (just going back a
>>>>> few months) I couldn't find any reference to something in the spec. I
>>>>> started writing a proposal for adding checksums for durability and
>>>>> correctness at a few different layers of Iceberg, but before I complete a
>>>>> proposal I wanted to check with the community to gauge interest in the
>>>>> concepts and hopefully have some initial feedback.
>>>>>
>>>>> The layers of Iceberg I am considering are:
>>>>>
>>>>>    1. At rest/storage in the file layer (metadata.json, manifest
>>>>>    layer, data file layer)
>>>>>    2. Bracketing in the Catalog
>>>>>    3. Maybe during compaction operations (unsure exactly how this
>>>>>    would work)
>>>>>
>>>>> Please let me know if there were considerations that we denied or grew
>>>>> stale in the past. I would really appreciate reading more on what the
>>>>> community has considered already and learn from that. Otherwise if you
>>>>> think this is cool and want to talk or just plus one please reach out.
>>>>>
>>>>> --
>>>>> Thank You,
>>>>> Kurtis C. Wright
>>>>>
>>>>
>>>
>>> --
>>> Thank You and Cheers,
>>> Kurtis C. Wright
>>>
>>
