Hey folks,

If there is no further discussion or alignment needed, I will start a vote thread next week. I am out on vacation this week.

Thanks,
Aravind

Aravind K. Patnam

On Wed, Apr 22, 2026 at 1:59 PM Aravind Patnam <[email protected]> wrote:

> Hi Folks,
>
> As mentioned in the thread, here is CIP22: Encryption at Rest for
> Celeborn Shuffle Data
> <https://docs.google.com/document/d/1xBrLtpb8bk8CdJENiM3aLJCbFThRKLor8uAjSsfXoxE/edit?usp=sharing>.
> Please let me know if there are any concerns, and feel free to comment on
> the doc itself or reply in this thread.
>
> Otherwise, I plan to start a vote thread early next week.
>
> Thanks,
> Aravind
>
> On Mon, Apr 20, 2026 at 6:07 PM Karthik Prabhakar <[email protected]> wrote:
>
>> Hi all,
>>
>> Great to see the convergence; client-side encryption is a clean approach
>> and aligns well with the original intent. Looking forward to the CIP
>> thread, Aravind. Happy to contribute.
>>
>> - Karthik
>>
>> On Mon, Apr 20, 2026 at 1:17 PM Aravind Patnam <[email protected]> wrote:
>>
>> > Let me start a thread for a CIP later this week.
>> >
>> > Aravind K. Patnam
>> >
>> > On Mon, Apr 20, 2026 at 8:37 AM Mridul Muralidharan <[email protected]> wrote:
>> >
>> > > Hi,
>> > >
>> > > This is pretty much what Aravind has implemented internally :-)
>> > >
>> > > Regards,
>> > > Mridul
>> > >
>> > > On Mon, Apr 20, 2026 at 8:34 AM rexxiong <[email protected]> wrote:
>> > >
>> > > > Hi Karthik,
>> > > >
>> > > > Thanks for the thorough proposal. I'd like to suggest an alternative
>> > > > approach: client-side encryption, where encryption/decryption happens
>> > > > entirely in the Spark client, similar to how compression works today.
>> > > > Celeborn workers and master would only ever see ciphertext.
>> > > >
>> > > > Rationale:
>> > > >
>> > > > 1. Simpler architecture: no worker write/read path changes. Celeborn
>> > > >    remains a stateless byte pipe, just like it is for compression.
>> > > > 2. Stronger security: plaintext and encryption keys never reach
>> > > >    workers or master. The trust boundary stays within the client,
>> > > >    eliminating the need for KMS credentials on the server side.
>> > > > 3. No sendfile regression: since workers store ciphertext natively,
>> > > >    CELEBORN-2301's zero-copy sendfile works unchanged for all
>> > > >    workloads, encrypted or not.
>> > > > 4. Aligns with Spark: this naturally respects
>> > > >    spark.io.encryption.enabled, and we can reuse Spark's existing key
>> > > >    distribution via IOEncryptionKey.
>> > > >
>> > > > Happy to discuss further.
>> > > >
>> > > > Regards,
>> > > > Jiashu Xiong
>> > > >
>> > > > On Mon, Apr 20, 2026 at 05:08, Aravind Patnam <[email protected]> wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > We already have EAR for Celeborn shuffle data internally at
>> > > > > LinkedIn, where we have added this support to respect the existing
>> > > > > spark.io.encryption.enabled config in Spark on the client side.
>> > > > >
>> > > > > I am happy to contribute this back and start a CIP for this next
>> > > > > week.
>> > > > >
>> > > > > Thanks,
>> > > > > Aravind
>> > > > >
>> > > > > On Sat, Apr 18, 2026 at 10:24 PM Karthik Prabhakar <[email protected]> wrote:
>> > > > >
>> > > > > > Hi dev@,
>> > > > > >
>> > > > > > I’d like to propose adding at-rest encryption for shuffle data in
>> > > > > > Celeborn and would appreciate the community’s input before writing
>> > > > > > a full implementation.
>> > > > > >
>> > > > > > Current Gap
>> > > > > >
>> > > > > > Celeborn encrypts data in transit (TLS, SASL) but not at rest.
>> > > > > > When a worker flushes shuffle data to local disk, HDFS, S3, or
>> > > > > > OSS, the bytes land as plaintext.
>> > > > > >
>> > > > > > The only write site for local disk is LocalFlushTask.flush() in
>> > > > > > FlushTask.scala (L66, L71 at commit a56f69a), which calls
>> > > > > > fileChannel.write(buffer) with no cipher transform. The
>> > > > > > tiered-storage paths (HdfsFlushTask, S3FlushTask, OssFlushTask)
>> > > > > > are the same: raw bytes go to the underlying store.
>> > > > > >
>> > > > > > Verified with:
>> > > > > >
>> > > > > >     grep -rnE 'cipher|\.encrypt|aes|envelope' worker/src/main/
>> > > > > >     grep -rn 'javax\.crypto' worker/src/main/
>> > > > > >     (both zero matches)
>> > > > > >
>> > > > > > This matters because spark.io.encryption.enabled does *not* cover
>> > > > > > the Celeborn path. When Celeborn’s ShuffleManager replaces Spark’s
>> > > > > > shuffle writer, Spark’s encryption key is never consulted, as
>> > > > > > confirmed by grepping client-spark/ for IOEncryptionKey (zero
>> > > > > > matches).
>> > > > > >
>> > > > > > Teams adopting Celeborn for performance silently lose the
>> > > > > > shuffle-encryption guarantees their compliance posture may assume.
>> > > > > >
>> > > > > > Who Needs This
>> > > > > >
>> > > > > >    - Regulated industries (healthcare, finance, public sector)
>> > > > > >    whose auditors require application-layer encryption independent
>> > > > > >    of disk/volume encryption.
>> > > > > >    - Multi-tenant platforms needing cryptographic isolation between
>> > > > > >    tenants on shared workers.
>> > > > > >    - Teams using object-store tiering who want encryption before
>> > > > > >    offload.
>> > > > > >
>> > > > > > Proposed Approach (High Level)
>> > > > > >
>> > > > > >    1. A *StreamCipher SPI* in common/ for wrapping
>> > > > > >    WritableByteChannel / ReadableByteChannel with encrypt/decrypt.
>> > > > > >    No KMS SDK in core.
>> > > > > >    2. A *KeyService SPI* for envelope encryption: generate/unwrap
>> > > > > >    DEKs using a KMS-held KEK. Implementations live in separate
>> > > > > >    optional modules (aws-kms, gcp-kms, azure-kv, vault, static for
>> > > > > >    dev/PoC).
>> > > > > >    3. Wire into the worker write path: LocalFlushTask wraps
>> > > > > >    fileChannel with StreamCipher.wrapForWrite(). Same for
>> > > > > >    HDFS/S3/OSS flush tasks.
>> > > > > >    4. Wire into the reader path: LocalPartitionDataReader detects a
>> > > > > >    16-byte encrypted-file header, unwraps the DEK (cached per
>> > > > > >    worker+shuffle), and wraps the channel with
>> > > > > >    StreamCipher.wrapForRead().
>> > > > > >    5. Opt-in via celeborn.shuffle.io.encryption.enabled=true.
>> > > > > >    Default off. Unencrypted deployments are byte-identical to
>> > > > > >    today, with zero overhead.
>> > > > > >    6. Per-shuffle DEKs by default (one KMS call per shuffle
>> > > > > >    reservation, amortized). Per-application DEK scope as an option.
>> > > > > >
>> > > > > > Interaction with Recent Work
>> > > > > >
>> > > > > > CELEBORN-2301 (commit 95419e1) recently landed enhanced zero-copy
>> > > > > > sendfile for FileRegion on native transports, a nice throughput
>> > > > > > win for the fetch path.
>> > > > > >
>> > > > > > Encryption and sendfile are fundamentally incompatible:
>> > > > > > sendfile(2) cannot transform bytes, so encrypted partitions must
>> > > > > > use a buffered read path. This is only relevant for encrypted
>> > > > > > workloads; unencrypted workloads on the same cluster keep the full
>> > > > > > CELEBORN-2301 benefit. Per-application encryption flags (not
>> > > > > > per-cluster) would let encrypted and unencrypted apps coexist
>> > > > > > without regressing the latter.
>> > > > > >
>> > > > > > Questions for the Community
>> > > > > >
>> > > > > > Trimming to three since these are the ones I’d need opinions on
>> > > > > > before writing code. Happy to take the rest up in follow-ups.
>> > > > > >
>> > > > > >    - Any prior design work or internal discussion on this topic I
>> > > > > >    should know about before proceeding?
>> > > > > >    - *Per-shuffle vs. per-application DEK scope* as the default?
>> > > > > >    Per-shuffle gives smaller blast radius and simpler lifecycle;
>> > > > > >    per-application amortizes KMS round-trips and is friendlier for
>> > > > > >    long-running jobs.
>> > > > > >    - *Key distribution path:* wrapped DEKs flow through Master
>> > > > > >    metadata (simpler, one KMS-aware role) vs. workers unwrap
>> > > > > >    directly from KMS (removes Master from the key path, but every
>> > > > > >    worker needs KMS credentials). Preference?
>> > > > > >
>> > > > > > Tracking
>> > > > > >
>> > > > > > JIRA: CELEBORN-2311
>> > > > > > <https://issues.apache.org/jira/browse/CELEBORN-2311>
>> > > > > >
>> > > > > > I have a detailed design document with source citations, threat
>> > > > > > model, performance analysis, and phased implementation plan. Happy
>> > > > > > to share on-list or off-list if there’s interest.
>> > > > > >
>> > > > > > - Karthik
>> > > > >
>> > > > > --
>> > > > > Aravind K. Patnam
>>
>> --
>> Thanks,
>> Karthik
>
> --
> Aravind K. Patnam
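
For readers skimming the thread, the client-side approach it converges on (encrypt on the Spark client so that workers and master only ever store and serve ciphertext) can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the CIP's design and not Aravind's internal implementation: the class and method names are hypothetical, the locally generated key merely stands in for whatever key Spark distributes to executors, and AES/CTR/NoPadding with a per-block random IV is chosen only for illustration (CTR provides no integrity protection).

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical sketch: none of these names are Celeborn or Spark APIs.
public class ClientSideShuffleCipherSketch {
    private static final String TRANSFORMATION = "AES/CTR/NoPadding"; // illustrative choice
    private static final int IV_BYTES = 16; // AES block size
    private static final SecureRandom RANDOM = new SecureRandom();

    // Assumption: stands in for the per-application key the client already holds.
    public static SecretKey newKey() throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    // Runs on the client before data is pushed; output layout is [IV][ciphertext].
    public static byte[] encrypt(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        byte[] framed = new byte[IV_BYTES + ciphertext.length];
        System.arraycopy(iv, 0, framed, 0, IV_BYTES);
        System.arraycopy(ciphertext, 0, framed, IV_BYTES, ciphertext.length);
        return framed;
    }

    // Runs on the client after data is fetched; reads the IV from the frame prefix.
    public static byte[] decrypt(SecretKey key, byte[] framed) throws Exception {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(framed, 0, IV_BYTES));
        return cipher.doFinal(framed, IV_BYTES, framed.length - IV_BYTES);
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = newKey();
        byte[] block = "shuffle block bytes".getBytes(StandardCharsets.UTF_8);
        byte[] stored = encrypt(key, block);   // what a worker would store and serve
        byte[] fetched = decrypt(key, stored); // what the reading client recovers
        System.out.println("round trip ok: " + Arrays.equals(block, fetched));
    }
}
```

Because the frame is opaque bytes to the worker, nothing in this sketch requires a change to the worker's write, read, or sendfile paths, which is the property the thread highlights.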
