Re: [DISCUSS] At-rest encryption for shuffle data on workers and tiered storage

Aravind Patnam Wed, 22 Apr 2026 14:02:13 -0700

Hi Folks,

As mentioned in the thread, here is CIP-22: Encryption at Rest for
Celeborn Shuffle Data
<https://docs.google.com/document/d/1xBrLtpb8bk8CdJENiM3aLJCbFThRKLor8uAjSsfXoxE/edit?usp=sharing>.
Please let me know if there are any concerns, and feel free to comment on the
doc itself or reply in this thread.


Otherwise, I plan to start a vote thread early next week.


Thanks,
Aravind

On Mon, Apr 20, 2026 at 6:07 PM Karthik Prabhakar <[email protected]>
wrote:

> Hi all,
>
> Great to see the convergence; client-side encryption is a clean approach
> and aligns well with the original intent. Looking forward to the CIP
> thread, Aravind. Happy to contribute.
>
> - Karthik
>
> On Mon, Apr 20, 2026 at 1:17 PM Aravind Patnam <[email protected]>
> wrote:
>
> > Let me start a thread for a CIP later this week.
> >
> > Aravind K. Patnam
> >
> >
> > On Mon, Apr 20, 2026 at 8:37 AM Mridul Muralidharan <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > >   This is pretty much what Aravind has implemented internally :-)
> > >
> > > Regards,
> > > Mridul
> > >
> > > On Mon, Apr 20, 2026 at 8:34 AM rexxiong <[email protected]> wrote:
> > >
> > > > Hi Karthik,
> > > >
> > > > > Thanks for the thorough proposal. I'd like to suggest an alternative
> > > > > approach: client-side encryption, where encryption/decryption happens
> > > > > entirely in the Spark client, similar to how compression works today.
> > > > > Celeborn workers and master would only ever see ciphertext.
> > > > >
> > > > > Rationale:
> > > > >
> > > > > 1. Simpler architecture: no worker write/read path changes. Celeborn
> > > > > remains a stateless byte pipe, just like it is for compression.
> > > > > 2. Stronger security: plaintext and encryption keys never reach
> > > > > workers or master. The trust boundary stays within the client,
> > > > > eliminating the need for KMS credentials on the server side.
> > > > > 3. No sendfile regression: since workers store ciphertext natively,
> > > > > CELEBORN-2301's zero-copy sendfile works unchanged for all workloads,
> > > > > encrypted or not.
> > > > > 4. Aligns with Spark: this naturally respects
> > > > > spark.io.encryption.enabled, and we can reuse Spark's existing key
> > > > > distribution via IOEncryptionKey.
> > > >
> > > > Happy to discuss further.
> > > >
> > > > Regards,
> > > > Jiashu Xiong
> > > >
> > > > > On Mon, Apr 20, 2026 at 5:08 AM Aravind Patnam <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > > We already have encryption at rest (EAR) for Celeborn shuffle data
> > > > > > internally at LinkedIn, where we have added this support to respect
> > > > > > the existing spark.io.encryption.enabled config in Spark on the
> > > > > > client side.
> > > > > >
> > > > > > I am happy to contribute this back and start a CIP for this next
> > > > > > week.
> > > > >
> > > > > Thanks,
> > > > > Aravind
> > > > >
> > > > >
> > > > >
> > > > > > On Sat, Apr 18, 2026 at 10:24 PM Karthik Prabhakar <[email protected]>
> > > > > > wrote:
> > > > >
> > > > > > Hi dev@,
> > > > > >
> > > > > > > I’d like to propose adding at-rest encryption for shuffle data in
> > > > > > > Celeborn and would appreciate the community’s input before writing
> > > > > > > a full implementation.
> > > > > > > Current Gap
> > > > > >
> > > > > > > Celeborn encrypts data in transit (TLS, SASL) but not at rest.
> > > > > > > When a worker flushes shuffle data to local disk, HDFS, S3, or
> > > > > > > OSS, the bytes land as plaintext.
> > > > > > >
> > > > > > > The only write site for local disk is LocalFlushTask.flush() in
> > > > > > > FlushTask.scala (L66, L71 at commit a56f69a), which calls
> > > > > > > fileChannel.write(buffer) with no cipher transform. The
> > > > > > > tiered-storage paths (HdfsFlushTask, S3FlushTask, OssFlushTask)
> > > > > > > are the same: raw bytes to the underlying store.
> > > > > >
> > > > > > Verified with:
> > > > > >
> > > > > > grep -rnE 'cipher|\.encrypt|aes|envelope' worker/src/main/
> > > > > > grep -rn  'javax\.crypto'                 worker/src/main/
> > > > > > (both zero matches)
> > > > > >
> > > > > > > This matters because spark.io.encryption.enabled does *not* cover
> > > > > > > the Celeborn path. When Celeborn’s ShuffleManager replaces Spark’s
> > > > > > > shuffle writer, Spark’s encryption key is never consulted
> > > > > > > (confirmed by grepping client-spark/ for IOEncryptionKey: zero
> > > > > > > matches).
> > > > > > >
> > > > > > > Teams adopting Celeborn for performance silently lose
> > > > > > > shuffle-encryption guarantees their compliance posture may assume.
> > > > > > Who Needs This
> > > > > >
> > > > > > >    - Regulated industries (healthcare, finance, public sector)
> > > > > > >    whose auditors require application-layer encryption independent
> > > > > > >    of disk/volume encryption.
> > > > > > >    - Multi-tenant platforms needing cryptographic isolation between
> > > > > > >    tenants on shared workers.
> > > > > > >    - Teams using object-store tiering who want encryption before
> > > > > > >    offload.
> > > > > >
> > > > > > Proposed Approach (High Level)
> > > > > >
> > > > > > >    1. A *StreamCipher SPI* in common/ for wrapping
> > > > > > >    WritableByteChannel / ReadableByteChannel with encrypt/decrypt.
> > > > > > >    No KMS SDK in core.
> > > > > > >    2. A *KeyService SPI* for envelope encryption: generate/unwrap
> > > > > > >    DEKs using a KMS-held KEK. Implementations live in separate
> > > > > > >    optional modules (aws-kms, gcp-kms, azure-kv, vault, static for
> > > > > > >    dev/PoC).
> > > > > > >    3. Wire into the worker write path: LocalFlushTask wraps
> > > > > > >    fileChannel with StreamCipher.wrapForWrite(). Same for
> > > > > > >    HDFS/S3/OSS flush tasks.
> > > > > > >    4. Wire into the reader path: LocalPartitionDataReader detects a
> > > > > > >    16-byte encrypted-file header, unwraps the DEK (cached per
> > > > > > >    worker+shuffle), wraps the channel with
> > > > > > >    StreamCipher.wrapForRead().
> > > > > > >    5. Opt-in via celeborn.shuffle.io.encryption.enabled=true.
> > > > > > >    Default off. Unencrypted deployments are byte-identical to
> > > > > > >    today, zero overhead.
> > > > > > >    6. Per-shuffle DEKs by default (one KMS call per shuffle
> > > > > > >    reservation, amortized). Per-application DEK scope as an option.
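
The envelope-encryption flow in items 2 and 6 above can be sketched roughly as follows in Java. This is an illustration only: EnvelopeSketch and its methods are hypothetical names, the KEK is a locally generated AES key standing in for the "static for dev/PoC" case (a real KMS would keep the KEK remote), and AES-GCM is an assumed cipher choice that the proposal does not specify.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical sketch of envelope encryption; not the proposed KeyService SPI.
final class EnvelopeSketch {
    private static final SecureRandom RNG = new SecureRandom();
    private static final int IV_LEN = 12;    // 96-bit nonce, the usual size for GCM
    private static final int TAG_BITS = 128; // authentication tag length

    // Generates a fresh 256-bit AES key (used here for both the KEK and the DEK).
    static SecretKey newAesKey() {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(256);
            return kg.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // AES-GCM encrypt. Output layout: IV || ciphertext || tag.
    static byte[] seal(SecretKey key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_LEN];
            RNG.nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = c.doFinal(plaintext);
            return ByteBuffer.allocate(IV_LEN + ct.length).put(iv).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // AES-GCM decrypt of a value produced by seal(); fails if the data was tampered with.
    static byte[] open(SecretKey key, byte[] sealed) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, sealed, 0, IV_LEN));
            return c.doFinal(sealed, IV_LEN, sealed.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Wrap: the DEK is encrypted under the KEK, so only the wrapped form needs to be stored.
    static byte[] wrapDek(SecretKey kek, SecretKey dek) {
        return seal(kek, dek.getEncoded());
    }

    // Unwrap: recover the DEK from its wrapped form using the KEK.
    static SecretKey unwrapDek(SecretKey kek, byte[] wrapped) {
        return new SecretKeySpec(open(kek, wrapped), "AES");
    }
}
```

In this sketch one DEK would be generated and wrapped per shuffle (or per application), the wrapped bytes would travel with the shuffle metadata, and data blocks would be sealed with the DEK itself, so the KEK is touched only once per DEK.
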
> > > > > >
> > > > > > Interaction with Recent Work
> > > > > >
> > > > > > > CELEBORN-2301 (commit 95419e1) recently landed enhanced zero-copy
> > > > > > > sendfile for FileRegion on native transports, a nice throughput
> > > > > > > win for the fetch path.
> > > > > > >
> > > > > > > Encryption and sendfile are fundamentally incompatible:
> > > > > > > sendfile(2) cannot transform bytes, so encrypted partitions must
> > > > > > > use a buffered read path. This is only relevant for encrypted
> > > > > > > workloads; unencrypted workloads on the same cluster keep the full
> > > > > > > CELEBORN-2301 benefit. Per-application encryption flags (not
> > > > > > > per-cluster) would let encrypted and unencrypted apps coexist
> > > > > > > without regressing the latter.
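
A rough Java sketch of how a worker-side reader could pick between the zero-copy and buffered paths described above, based on a file header. Everything specific here is an assumption for illustration: the proposal only says the header is 16 bytes, so the magic value "CELBENC1", the FetchPathSketch class, and the ReadPath enum are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical sketch; the header layout below is invented for illustration.
final class FetchPathSketch {
    static final int HEADER_LEN = 16;
    // Assumed layout: 8-byte magic followed by 8 reserved bytes.
    static final byte[] MAGIC = "CELBENC1".getBytes(StandardCharsets.US_ASCII);

    enum ReadPath {
        ZERO_COPY,        // bytes can be sent as-is, e.g. via FileChannel.transferTo
        BUFFERED_DECRYPT  // bytes must pass through a cipher in user space first
    }

    // Reads up to HEADER_LEN bytes and checks whether they start with the magic value.
    static ReadPath choose(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer header = ByteBuffer.allocate(HEADER_LEN);
            while (header.hasRemaining() && ch.read(header) != -1) {
                // keep reading until the header is full or the file ends
            }
            if (header.position() < HEADER_LEN) {
                return ReadPath.ZERO_COPY; // too short to carry a header
            }
            boolean encrypted = Arrays.equals(Arrays.copyOf(header.array(), MAGIC.length), MAGIC);
            return encrypted ? ReadPath.BUFFERED_DECRYPT : ReadPath.ZERO_COPY;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A magic-only check like this could misclassify an unencrypted file that happens to start with the same bytes, so a real design would more likely record the encrypted flag in shuffle metadata as well.
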
> > > > > > Questions for the Community
> > > > > >
> > > > > > > Trimming to three since these are the ones I’d need opinions on
> > > > > > > before writing code. Happy to take the rest up in follow-ups.
> > > > > > >
> > > > > > >    - Any prior design work or internal discussion on this topic I
> > > > > > >    should know about before proceeding?
> > > > > > >    - *Per-shuffle vs. per-application DEK scope* as the default?
> > > > > > >    Per-shuffle gives smaller blast radius and simpler lifecycle;
> > > > > > >    per-application amortizes KMS round-trips and is friendlier for
> > > > > > >    long-running jobs.
> > > > > > >    - *Key distribution path:* wrapped DEKs flow through Master
> > > > > > >    metadata (simpler, one KMS-aware role) vs. workers unwrap
> > > > > > >    directly from KMS (removes Master from the key path, but every
> > > > > > >    worker needs KMS credentials). Preference?
> > > > > >
> > > > > > Tracking
> > > > > >
> > > > > > > JIRA: CELEBORN-2311 <https://issues.apache.org/jira/browse/CELEBORN-2311>
> > > > > >
> > > > > > > I have a detailed design document with source citations, threat
> > > > > > > model, performance analysis, and phased implementation plan. Happy
> > > > > > > to share on-list or off-list if there’s interest.
> > > > > >
> > > > > > - Karthik
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Aravind K. Patnam
> > > > >
> > > >
> > >
> >
>
>
> --
> Thanks,
> Karthik
>


-- 
Aravind K. Patnam
