Hi Folks,

As mentioned in the thread, here is the CIP22: Encryption at Rest for Celeborn Shuffle Data
<https://docs.google.com/document/d/1xBrLtpb8bk8CdJENiM3aLJCbFThRKLor8uAjSsfXoxE/edit?usp=sharing>.

Please let me know if there are any concerns, and feel free to comment on the doc itself or reply in this thread.
Otherwise, I plan to start a vote thread early next week.

Thanks,
Aravind

On Mon, Apr 20, 2026 at 6:07 PM Karthik Prabhakar <[email protected]> wrote:

> Hi all,
>
> Great to see the convergence; client-side encryption is a clean approach
> and aligns well with the original intent. Looking forward to the CIP
> thread, Aravind. Happy to contribute.
>
> - Karthik
>
> On Mon, Apr 20, 2026 at 1:17 PM Aravind Patnam <[email protected]>
> wrote:
>
> > Let me start a thread for a CIP later this week.
> >
> > Aravind K. Patnam
> >
> >
> > On Mon, Apr 20, 2026 at 8:37 AM Mridul Muralidharan <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > This is pretty much what Aravind has implemented internally :-)
> > >
> > > Regards,
> > > Mridul
> > >
> > > On Mon, Apr 20, 2026 at 8:34 AM rexxiong <[email protected]> wrote:
> > >
> > > > Hi Karthik,
> > > >
> > > > Thanks for the thorough proposal. I'd like to suggest an alternative
> > > > approach: client-side encryption, where encryption/decryption happens
> > > > entirely in the Spark client, similar to how compression works today.
> > > > Celeborn workers and master would only ever see ciphertext.
> > > >
> > > > Rationale:
> > > >
> > > > 1. Simpler architecture: no worker write/read path changes. Celeborn
> > > > remains a stateless byte pipe, just like it is for compression.
> > > > 2. Stronger security: plaintext and encryption keys never reach workers
> > > > or master. The trust boundary stays within the client, eliminating the
> > > > need for KMS credentials on the server side.
> > > > 3. No sendfile regression: since workers store ciphertext natively,
> > > > CELEBORN-2301's zero-copy sendfile works unchanged for all workloads,
> > > > encrypted or not.
> > > > 4. Aligns with Spark: this naturally respects
> > > > spark.io.encryption.enabled and we can reuse Spark's existing key
> > > > distribution via IOEncryptionKey.
> > > >
> > > > Happy to discuss further.
> > > >
> > > > Regards,
> > > > Jiashu Xiong
> > > >
> > > > On Mon, Apr 20, 2026 at 5:08 AM Aravind Patnam <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We already have EAR for Celeborn shuffle data internally at LinkedIn,
> > > > > where we have added this support to respect the existing
> > > > > spark.io.encryption.enabled config in Spark on the client side.
> > > > >
> > > > > I am happy to contribute this back and start a CIP for this next week.
> > > > >
> > > > > Thanks,
> > > > > Aravind
> > > > >
> > > > > On Sat, Apr 18, 2026 at 10:24 PM Karthik Prabhakar <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi dev@,
> > > > > >
> > > > > > I’d like to propose adding at-rest encryption for shuffle data in
> > > > > > Celeborn and would appreciate the community’s input before writing a
> > > > > > full implementation.
> > > > > >
> > > > > > Current Gap
> > > > > >
> > > > > > Celeborn encrypts data in transit (TLS, SASL) but not at rest. When a
> > > > > > worker flushes shuffle data to local disk, HDFS, S3, or OSS, the bytes
> > > > > > land as plaintext.
> > > > > >
> > > > > > The only write site for local disk is LocalFlushTask.flush() in
> > > > > > FlushTask.scala (L66, L71 at commit a56f69a), which calls
> > > > > > fileChannel.write(buffer) with no cipher transform. The tiered-storage
> > > > > > paths (HdfsFlushTask, S3FlushTask, OssFlushTask) are the same: raw
> > > > > > bytes to the underlying store.
> > > > > >
> > > > > > Verified with:
> > > > > >
> > > > > > grep -rnE 'cipher|\.encrypt|aes|envelope' worker/src/main/
> > > > > > grep -rn 'javax\.crypto' worker/src/main/
> > > > > > (both zero matches)
> > > > > >
> > > > > > This matters because spark.io.encryption.enabled does *not* cover the
> > > > > > Celeborn path.
> > > > > > When Celeborn’s ShuffleManager replaces Spark’s shuffle writer, Spark’s
> > > > > > encryption key is never consulted, confirmed by grepping
> > > > > > client-spark/ for IOEncryptionKey (zero matches).
> > > > > >
> > > > > > Teams adopting Celeborn for performance silently lose
> > > > > > shuffle-encryption guarantees their compliance posture may assume.
> > > > > >
> > > > > > Who Needs This
> > > > > >
> > > > > >    - Regulated industries (healthcare, finance, public sector) whose
> > > > > >    auditors require application-layer encryption independent of
> > > > > >    disk/volume encryption.
> > > > > >    - Multi-tenant platforms needing cryptographic isolation between
> > > > > >    tenants on shared workers.
> > > > > >    - Teams using object-store tiering who want encryption before
> > > > > >    offload.
> > > > > >
> > > > > > Proposed Approach (High Level)
> > > > > >
> > > > > >    1. A *StreamCipher SPI* in common/ for wrapping WritableByteChannel /
> > > > > >    ReadableByteChannel with encrypt/decrypt. No KMS SDK in core.
> > > > > >    2. A *KeyService SPI* for envelope encryption: generate/unwrap DEKs
> > > > > >    using a KMS-held KEK. Implementations live in separate optional
> > > > > >    modules (aws-kms, gcp-kms, azure-kv, vault, static for dev/PoC).
> > > > > >    3. Wire into the worker write path: LocalFlushTask wraps fileChannel
> > > > > >    with StreamCipher.wrapForWrite(). Same for HDFS/S3/OSS flush tasks.
> > > > > >    4. Wire into the reader path: LocalPartitionDataReader detects a
> > > > > >    16-byte encrypted-file header, unwraps the DEK (cached per
> > > > > >    worker+shuffle), wraps the channel with StreamCipher.wrapForRead().
> > > > > >    5. Opt-in via celeborn.shuffle.io.encryption.enabled=true. Default
> > > > > >    off. Unencrypted deployments are byte-identical to today, zero
> > > > > >    overhead.
> > > > > >    6. Per-shuffle DEKs by default (one KMS call per shuffle
> > > > > >    reservation, amortized). Per-application DEK scope as an option.
> > > > > >
> > > > > > Interaction with Recent Work
> > > > > >
> > > > > > CELEBORN-2301 (commit 95419e1) recently landed enhanced zero-copy
> > > > > > sendfile for FileRegion on native transports, a nice throughput win
> > > > > > for the fetch path.
> > > > > >
> > > > > > Encryption and sendfile are fundamentally incompatible: sendfile(2)
> > > > > > cannot transform bytes, so encrypted partitions must use a buffered
> > > > > > read path. This is only relevant for encrypted workloads; unencrypted
> > > > > > workloads on the same cluster keep the full CELEBORN-2301 benefit.
> > > > > > Per-application encryption flags (not per-cluster) would let encrypted
> > > > > > and unencrypted apps coexist without regressing the latter.
> > > > > >
> > > > > > Questions for the Community
> > > > > >
> > > > > > Trimming to three since these are the ones I’d need opinions on before
> > > > > > writing code. Happy to take the rest up in follow-ups.
> > > > > >
> > > > > >    - Any prior design work or internal discussion on this topic I
> > > > > >    should know about before proceeding?
> > > > > >    - *Per-shuffle vs. per-application DEK scope* as the default?
> > > > > >    Per-shuffle gives smaller blast radius and simpler lifecycle;
> > > > > >    per-application amortizes KMS round-trips and is friendlier for
> > > > > >    long-running jobs.
> > > > > >    - *Key distribution path:* wrapped DEKs flow through Master metadata
> > > > > >    (simpler, one KMS-aware role) vs. workers unwrap directly from KMS
> > > > > >    (removes Master from the key path, but every worker needs KMS
> > > > > >    credentials). Preference?
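To make the envelope-encryption idea behind the proposed KeyService SPI more concrete, here is a minimal Java sketch. Every name in it (KeyService, WrappedKey, StaticKeyService, and the method names) is hypothetical, since the proposal does not specify signatures. The in-memory KEK stands in for a real KMS and would only suit dev/PoC use, and wrapping the DEK with the JDK's AESWrap cipher (RFC 3394) is an assumption of this sketch, not something the proposal prescribes.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

/** Hypothetical SPI shape: generate a fresh DEK and hand back its wrapped form. */
interface KeyService {
    WrappedKey generateDataKey();
    SecretKey unwrap(byte[] wrappedDek);
}

/** Hypothetical holder: plaintext DEK for immediate use, wrapped DEK for storage. */
final class WrappedKey {
    final SecretKey plaintextDek;
    final byte[] wrappedDek;

    WrappedKey(SecretKey plaintextDek, byte[] wrappedDek) {
        this.plaintextDek = plaintextDek;
        this.wrappedDek = wrappedDek;
    }
}

/** Dev/PoC stand-in for a KMS: the KEK lives in process memory. */
final class StaticKeyService implements KeyService {
    private final SecretKey kek;

    StaticKeyService(SecretKey kek) {
        this.kek = kek;
    }

    @Override
    public WrappedKey generateDataKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256); // one fresh 256-bit DEK per call
            SecretKey dek = generator.generateKey();
            Cipher wrapper = Cipher.getInstance("AESWrap");
            wrapper.init(Cipher.WRAP_MODE, kek);
            return new WrappedKey(dek, wrapper.wrap(dek));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("DEK generation failed", e);
        }
    }

    @Override
    public SecretKey unwrap(byte[] wrappedDek) {
        try {
            Cipher unwrapper = Cipher.getInstance("AESWrap");
            unwrapper.init(Cipher.UNWRAP_MODE, kek);
            // Fails with an exception if the KEK is wrong or the bytes were altered.
            return (SecretKey) unwrapper.unwrap(wrappedDek, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("DEK unwrap failed", e);
        }
    }
}
```

In this shape only the wrapped DEK would need to be stored or passed through metadata; a party holding the KEK (or KMS access) recovers the DEK with unwrap().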
> > > > > >
> > > > > > Tracking
> > > > > >
> > > > > > JIRA: CELEBORN-2311 <https://issues.apache.org/jira/browse/CELEBORN-2311>
> > > > > >
> > > > > > I have a detailed design document with source citations, threat model,
> > > > > > performance analysis, and phased implementation plan. Happy to share
> > > > > > on-list or off-list if there’s interest.
> > > > > >
> > > > > > - Karthik
> > > > > >
> > > > >
> > > > > --
> > > > > Aravind K. Patnam
> > > > >
> > > >
> > >
> >
> --
> Thanks,
> Karthik
>

--
Aravind K. Patnam
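
For readers following the thread, the client-side approach it converges on can be sketched roughly as follows: the client encrypts each shuffle batch before pushing it and decrypts after fetching, so workers only ever store ciphertext. This is an illustrative Java sketch, not the implementation described in the thread or in the CIP. The class name ClientSideShuffleCipher, the IV || ciphertext || tag framing, and the choice of AES-GCM are all assumptions; the key argument stands in for whatever key Spark distributes when spark.io.encryption.enabled is set.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper; not part of Celeborn or Spark. */
final class ClientSideShuffleCipher {
    private static final int IV_BYTES = 12;  // commonly recommended GCM nonce size
    private static final int TAG_BITS = 128; // GCM authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    private ClientSideShuffleCipher() {}

    /** Encrypts one batch on the client. Output layout: IV || ciphertext || tag. */
    static byte[] encrypt(byte[] key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // fresh nonce per batch; never reuse one with the same key
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = cipher.doFinal(plaintext);
            return ByteBuffer.allocate(iv.length + body.length).put(iv).put(body).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("shuffle batch encryption failed", e);
        }
    }

    /** Decrypts a framed batch on the client after fetch; rejects tampered data. */
    static byte[] decrypt(byte[] key, byte[] framed) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            // The first IV_BYTES of the frame carry the nonce used at encryption time.
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, framed, 0, IV_BYTES));
            return cipher.doFinal(framed, IV_BYTES, framed.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("shuffle batch decryption failed", e);
        }
    }
}
```

Because the framed bytes are opaque to the server side, a worker could store and serve them exactly as it does unencrypted or compressed data, which is the property the thread highlights for keeping zero-copy sendfile intact.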
