Hi Everyone,

  Kurtis from S3Tables here. S3 uses checksums
<https://en.wikipedia.org/wiki/Checksum> for durability and correctness
<https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html>.
I see that the S3 and GCS clients use checksumming, but after searching
through both the Java implementation and the mailing list (just going back
a few months) I couldn't find any reference to checksums in the spec. I
started writing a proposal for adding checksums for durability and
correctness at a few different layers of Iceberg, but before I complete it
I wanted to check with the community to gauge interest in the concepts and
hopefully get some initial feedback.

The layers of Iceberg I am considering are:

   1. At rest/storage in the file layer (metadata.json, manifest layer,
   data file layer)
   2. Bracketing in the Catalog
   3. Maybe during compaction operations (unsure exactly how this would
   work)
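
To make item 1 a bit more concrete, here is a rough sketch of what I mean
by a checksum at the file layer: the writer records a digest of the file
bytes, and a reader recomputes it and compares. This is only an
illustration and not anything from the Iceberg spec. The function names
and the choice of SHA-256 are my own assumptions; a real proposal would
need to decide on the algorithm and on where the value is stored.

```python
import hashlib


def compute_checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string.

    Hypothetical helper; SHA-256 is an assumption, not a spec choice.
    """
    return hashlib.sha256(data).hexdigest()


def verify_checksum(data: bytes, expected: str) -> bool:
    """Recompute the digest and compare it with the recorded value."""
    return compute_checksum(data) == expected


# Writer side: record the digest alongside the file (location is hypothetical).
payload = b"example data file contents"
recorded = compute_checksum(payload)

# Reader side: intact bytes pass, a changed payload is detected.
intact_ok = verify_checksum(payload, recorded)
corrupted_ok = verify_checksum(payload + b"x", recorded)
```

The same pattern could in principle apply to metadata.json, manifests, and
data files, with the recorded value kept wherever the layer above already
tracks the file.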

Please let me know if similar ideas were considered in the past and were
either rejected or grew stale. I would really appreciate reading more on
what the community has already considered and learning from it. Otherwise,
if you think this is cool and want to talk, or just want to +1, please
reach out.

-- 
Thank You,
Kurtis C. Wright
