Let me start a thread for a CIP later this week. Aravind K. Patnam
On Mon, Apr 20, 2026 at 8:37 AM Mridul Muralidharan <[email protected]> wrote: > Hi, > > This is pretty much what Aravind has implemented internally :-) > > Regards, > Mridul > > On Mon, Apr 20, 2026 at 8:34 AM rexxiong <[email protected]> wrote: > > > Hi Karthik, > > > > Thanks for the thorough proposal. I'd like to suggest an alternative > > approach: client-side encryption, where encryption/decryption happens > > entirely in the Spark > > client — similar to how compression works today. Celeborn workers and > > master would only ever see ciphertext. > > > > Rationale: > > > > 1. Simpler architecture — no worker write/read path changes. Celeborn > > remains a stateless byte pipe, just like it is for compression. > > 2. Stronger security — Plaintext and encryption keys never reach workers > or > > master. The trust boundary stays within the client, eliminating the need > > for KMS > > credentials on the server side. > > 3. No sendfile regression — Since workers store ciphertext natively, > > CELEBORN-2301's zero-copy sendfile works unchanged for all workloads, > > encrypted or not. > > 4. Aligns with Spark — This naturally respects > spark.io.encryption.enabled > > and we can reuse Spark's existing key distribution via IOEncryptionKey. > > > > Happy to discuss further. > > > > Regards, > > Jiashu Xiong > > > > Aravind Patnam <[email protected]> 于2026年4月20日周一 05:08写道: > > > > > Hi, > > > > > > We already have EAR for celeborn shuffle data internally at LinkedIn, > > where > > > we have added this support to respect the existing > > > spark.io.encryption.enabled config in Spark on the client side. > > > > > > I am happy to contribute this back and start a CIP for this next week. > > > > > > Thanks, > > > Aravind > > > > > > > > > > > > On Sat, Apr 18, 2026 at 10:24 PM Karthik Prabhakar < > > [email protected]> > > > wrote: > > > > > > > Hi dev@, > > > > > > > > I’d like to propose adding at-rest encryption for shuffle data in > > > Celeborn > > > > and would appreciate the community’s input before writing a full > > > > implementation. > > > > cURRENT gap > > > > > > > > Celeborn encrypts data in transit (TLS, SASL) but not at rest. When a > > > > worker flushes shuffle data to local disk, HDFS, S3, or OSS, the > bytes > > > land > > > > as plaintext. > > > > > > > > The only write site for local disk is LocalFlushTask.flush() in > > > > FlushTask.scala (L66, L71 at commit a56f69a), which calls > > > > fileChannel.write(buffer) with no cipher transform. The > tiered-storage > > > > paths (HdfsFlushTask, S3FlushTask, OssFlushTask) are the same — raw > > bytes > > > > to the underlying store. > > > > > > > > Verified with: > > > > > > > > grep -rnE 'cipher|\.encrypt|aes|envelope' worker/src/main/ > > > > grep -rn 'javax\.crypto' worker/src/main/ > > > > (both zero matches) > > > > > > > > This matters because spark.io.encryption.enabled does *not* cover the > > > > Celeborn path. When Celeborn’s ShuffleManager replaces Spark’s > shuffle > > > > writer, Spark’s encryption key is never consulted — confirmed by > > grepping > > > > client-spark/ for IOEncryptionKey (zero matches). > > > > > > > > Teams adopting Celeborn for performance silently lose > > shuffle-encryption > > > > guarantees their compliance posture may assume. > > > > Who Needs This > > > > > > > > - Regulated industries (healthcare, finance, public sector) whose > > > > auditors require application-layer encryption independent of > > > disk/volume > > > > encryption. > > > > - Multi-tenant platforms needing cryptographic isolation between > > > tenants > > > > on shared workers. > > > > - Teams using object-store tiering who want encryption before > > offload. > > > > > > > > Proposed Approach (High Level) > > > > > > > > 1. A *StreamCipher SPI* in common/ for wrapping > WritableByteChannel > > / > > > > ReadableByteChannel with encrypt/decrypt. No KMS SDK in core. > > > > 2. A *KeyService SPI* for envelope encryption — generate/unwrap > DEKs > > > > using a KMS-held KEK. Implementations live in separate optional > > > modules > > > > ( > > > > aws-kms, gcp-kms, azure-kv, vault, static for dev/PoC). > > > > 3. Wire into the worker write path: LocalFlushTask wraps > fileChannel > > > > with StreamCipher.wrapForWrite(). Same for HDFS/S3/OSS flush > tasks. > > > > 4. Wire into the reader path: LocalPartitionDataReader detects a > > > 16-byte > > > > encrypted-file header, unwraps the DEK (cached per > worker+shuffle), > > > > wraps > > > > the channel with StreamCipher.wrapForRead(). > > > > 5. Opt-in via celeborn.shuffle.io.encryption.enabled=true. > Default > > > off. > > > > Unencrypted deployments are byte-identical to today, zero > overhead. > > > > 6. Per-shuffle DEKs by default (one KMS call per shuffle > > reservation, > > > > amortized). Per-application DEK scope as an option. > > > > > > > > Interaction with Recent Work > > > > > > > > CELEBORN-2301 (commit 95419e1) recently landed enhanced zero-copy > > > sendfile > > > > for FileRegion on native transports — a nice throughput win for the > > fetch > > > > path. > > > > > > > > Encryption and sendfile are fundamentally incompatible: sendfile(2) > > > cannot > > > > transform bytes, so encrypted partitions must use a buffered read > path. > > > > This is only relevant for encrypted workloads; unencrypted workloads > on > > > the > > > > same cluster keep the full CELEBORN-2301 benefit. Per-application > > > > encryption flags (not per-cluster) would let encrypted and > unencrypted > > > apps > > > > coexist without regressing the latter. > > > > Questions for the Community > > > > > > > > Trimming to three since these are the ones I’d need opinions on > before > > > > writing code. Happy to take the rest up in follow-ups. > > > > > > > > - Any prior design work or internal discussion on this topic I > > should > > > > know about before proceeding? > > > > - *Per-shuffle vs. per-application DEK scope* as the default? > > > > Per-shuffle gives smaller blast radius and simpler lifecycle; > > > > per-application amortizes KMS round-trips and is friendlier for > > > > long-running jobs. > > > > - *Key distribution path:* wrapped DEKs flow through Master > metadata > > > > (simpler, one KMS-aware role) vs. workers unwrap directly from KMS > > > > (removes > > > > Master from the key path, but every worker needs KMS credentials). > > > > Preference? > > > > > > > > Tracking > > > > > > > > JIRA: CELEBORN-2311 < > > https://issues.apache.org/jira/browse/CELEBORN-2311 > > > > > > > > > > > > I have a detailed design document with source citations, threat > model, > > > > performance analysis, and phased implementation plan. Happy to share > > > > on-list or off-list if there’s interest. > > > > > > > > - Karthik > > > > > > > > > > > > > -- > > > Aravind K. Patnam > > > > > >
