Hey folks,

If there is no further discussion or alignment needed, I will start a vote
thread next week. I am out on vacation this week.

Thanks,
Aravind

Aravind K. Patnam


On Wed, Apr 22, 2026 at 1:59 PM Aravind Patnam <[email protected]> wrote:

> Hi Folks,
>
> As mentioned in the thread, here is the CIP22: Encryption at Rest for
> Celeborn Shuffle Data
> <https://docs.google.com/document/d/1xBrLtpb8bk8CdJENiM3aLJCbFThRKLor8uAjSsfXoxE/edit?usp=sharing>
> .
> Please let me know if there are any concerns, and feel free to comment on the
> doc itself or reply in this thread.
>
> Otherwise, I will plan to start a vote thread early next week.
>
>
> Thanks,
> Aravind
>
> On Mon, Apr 20, 2026 at 6:07 PM Karthik Prabhakar <[email protected]>
> wrote:
>
>> Hi all,
>>
>> Great to see the convergence; client-side encryption is a clean approach
>> and aligns well with the original intent. Looking forward to the CIP
>> thread, Aravind. Happy to contribute.
>>
>> - Karthik
>>
>> On Mon, Apr 20, 2026 at 1:17 PM Aravind Patnam <[email protected]>
>> wrote:
>>
>> > Let me start a thread for a CIP later this week.
>> >
>> > Aravind K. Patnam
>> >
>> >
>> > On Mon, Apr 20, 2026 at 8:37 AM Mridul Muralidharan <[email protected]>
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > >   This is pretty much what Aravind has implemented internally :-)
>> > >
>> > > Regards,
>> > > Mridul
>> > >
>> > > On Mon, Apr 20, 2026 at 8:34 AM rexxiong <[email protected]> wrote:
>> > >
>> > > > Hi Karthik,
>> > > >
>> > > > Thanks for the thorough proposal. I'd like to suggest an alternative
>> > > > approach: client-side encryption, where encryption/decryption happens
>> > > > entirely in the Spark client, similar to how compression works today.
>> > > > Celeborn workers and master would only ever see ciphertext.
>> > > >
>> > > > Rationale:
>> > > >
>> > > > 1. Simpler architecture: no worker write/read path changes. Celeborn
>> > > > remains a stateless byte pipe, just like it is for compression.
>> > > > 2. Stronger security: plaintext and encryption keys never reach workers
>> > > > or master. The trust boundary stays within the client, eliminating the
>> > > > need for KMS credentials on the server side.
>> > > > 3. No sendfile regression: since workers store ciphertext natively,
>> > > > CELEBORN-2301's zero-copy sendfile works unchanged for all workloads,
>> > > > encrypted or not.
>> > > > 4. Aligns with Spark: this naturally respects spark.io.encryption.enabled
>> > > > and we can reuse Spark's existing key distribution via IOEncryptionKey.
>> > > >
>> > > > Happy to discuss further.
>> > > >
>> > > > Regards,
>> > > > Jiashu Xiong
>> > > >
>> > > > Aravind Patnam <[email protected]> wrote on Mon, Apr 20, 2026 at 05:08:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > We already have EAR for Celeborn shuffle data internally at LinkedIn,
>> > > > > where we have added this support to respect the existing
>> > > > > spark.io.encryption.enabled config in Spark on the client side.
>> > > > >
>> > > > > I am happy to contribute this back and start a CIP for this next week.
>> > > > >
>> > > > > Thanks,
>> > > > > Aravind
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Sat, Apr 18, 2026 at 10:24 PM Karthik Prabhakar <[email protected]> wrote:
>> > > > >
>> > > > > > Hi dev@,
>> > > > > >
>> > > > > > I’d like to propose adding at-rest encryption for shuffle data in
>> > > > > > Celeborn and would appreciate the community’s input before writing a
>> > > > > > full implementation.
>> > > > > >
>> > > > > > Current Gap
>> > > > > >
>> > > > > > Celeborn encrypts data in transit (TLS, SASL) but not at rest. When a
>> > > > > > worker flushes shuffle data to local disk, HDFS, S3, or OSS, the bytes
>> > > > > > land as plaintext.
>> > > > > >
>> > > > > > The only write site for local disk is LocalFlushTask.flush() in
>> > > > > > FlushTask.scala (L66, L71 at commit a56f69a), which calls
>> > > > > > fileChannel.write(buffer) with no cipher transform. The tiered-storage
>> > > > > > paths (HdfsFlushTask, S3FlushTask, OssFlushTask) are the same: raw
>> > > > > > bytes to the underlying store.
>> > > > > >
>> > > > > > Verified with:
>> > > > > >
>> > > > > > grep -rnE 'cipher|\.encrypt|aes|envelope' worker/src/main/
>> > > > > > grep -rn  'javax\.crypto'                 worker/src/main/
>> > > > > > (both zero matches)
>> > > > > >
>> > > > > > This matters because spark.io.encryption.enabled does *not* cover the
>> > > > > > Celeborn path. When Celeborn’s ShuffleManager replaces Spark’s shuffle
>> > > > > > writer, Spark’s encryption key is never consulted, as confirmed by
>> > > > > > grepping client-spark/ for IOEncryptionKey (zero matches).
>> > > > > >
>> > > > > > Teams adopting Celeborn for performance silently lose the
>> > > > > > shuffle-encryption guarantees their compliance posture may assume.
>> > > > > >
>> > > > > > Who Needs This
>> > > > > >
>> > > > > >    - Regulated industries (healthcare, finance, public sector) whose
>> > > > > >    auditors require application-layer encryption independent of
>> > > > > >    disk/volume encryption.
>> > > > > >    - Multi-tenant platforms needing cryptographic isolation between
>> > > > > >    tenants on shared workers.
>> > > > > >    - Teams using object-store tiering who want encryption before offload.
>> > > > > >
>> > > > > > Proposed Approach (High Level)
>> > > > > >
>> > > > > >    1. A *StreamCipher SPI* in common/ for wrapping WritableByteChannel /
>> > > > > >    ReadableByteChannel with encrypt/decrypt. No KMS SDK in core.
>> > > > > >    2. A *KeyService SPI* for envelope encryption: generate/unwrap DEKs
>> > > > > >    using a KMS-held KEK. Implementations live in separate optional
>> > > > > >    modules (aws-kms, gcp-kms, azure-kv, vault, static for dev/PoC).
>> > > > > >    3. Wire into the worker write path: LocalFlushTask wraps fileChannel
>> > > > > >    with StreamCipher.wrapForWrite(). Same for HDFS/S3/OSS flush tasks.
>> > > > > >    4. Wire into the reader path: LocalPartitionDataReader detects a
>> > > > > >    16-byte encrypted-file header, unwraps the DEK (cached per
>> > > > > >    worker+shuffle), and wraps the channel with
>> > > > > >    StreamCipher.wrapForRead().
>> > > > > >    5. Opt-in via celeborn.shuffle.io.encryption.enabled=true. Default
>> > > > > >    off. Unencrypted deployments are byte-identical to today, zero
>> > > > > >    overhead.
>> > > > > >    6. Per-shuffle DEKs by default (one KMS call per shuffle reservation,
>> > > > > >    amortized). Per-application DEK scope as an option.
>> > > > > >
>> > > > > > Interaction with Recent Work
>> > > > > >
>> > > > > > CELEBORN-2301 (commit 95419e1) recently landed enhanced zero-copy
>> > > > > > sendfile for FileRegion on native transports, a nice throughput win
>> > > > > > for the fetch path.
>> > > > > >
>> > > > > > Encryption and sendfile are fundamentally incompatible: sendfile(2)
>> > > > > > cannot transform bytes, so encrypted partitions must use a buffered
>> > > > > > read path. This is only relevant for encrypted workloads; unencrypted
>> > > > > > workloads on the same cluster keep the full CELEBORN-2301 benefit.
>> > > > > > Per-application encryption flags (not per-cluster) would let encrypted
>> > > > > > and unencrypted apps coexist without regressing the latter.
>> > > > > >
>> > > > > > Questions for the Community
>> > > > > >
>> > > > > > Trimming to three since these are the ones I’d need opinions on before
>> > > > > > writing code. Happy to take the rest up in follow-ups.
>> > > > > >
>> > > > > >    - Any prior design work or internal discussion on this topic I
>> > > > > >    should know about before proceeding?
>> > > > > >    - *Per-shuffle vs. per-application DEK scope* as the default?
>> > > > > >    Per-shuffle gives smaller blast radius and simpler lifecycle;
>> > > > > >    per-application amortizes KMS round-trips and is friendlier for
>> > > > > >    long-running jobs.
>> > > > > >    - *Key distribution path:* wrapped DEKs flow through Master metadata
>> > > > > >    (simpler, one KMS-aware role) vs. workers unwrap directly from KMS
>> > > > > >    (removes Master from the key path, but every worker needs KMS
>> > > > > >    credentials). Preference?
>> > > > > >
>> > > > > > Tracking
>> > > > > >
>> > > > > > JIRA: CELEBORN-2311 <https://issues.apache.org/jira/browse/CELEBORN-2311>
>> > > > > >
>> > > > > > I have a detailed design document with source citations, threat model,
>> > > > > > performance analysis, and phased implementation plan. Happy to share
>> > > > > > on-list or off-list if there’s interest.
>> > > > > >
>> > > > > > - Karthik
>> > > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Aravind K. Patnam
>> > > > >
>> > > >
>> > >
>> >
>>
>>
>> --
>> Thanks,
>> Karthik
>>
>
>
> --
> Aravind K. Patnam
>
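
For readers following the thread, the client-side idea discussed above (encrypt in the Spark client before pushing, so that workers only ever store ciphertext) can be illustrated with a minimal Java sketch. This is not the code from the CIP or from any internal implementation. The class name ClientSideShuffleCrypto, the choice of AES/CTR/NoPadding, the "IV followed by ciphertext" framing, and the randomly generated key standing in for a key distributed by Spark are all assumptions made for illustration.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical helper: not part of Celeborn or Spark.
public class ClientSideShuffleCrypto {
    // Assumed cipher mode and frame layout, chosen only for this sketch.
    private static final String TRANSFORMATION = "AES/CTR/NoPadding";
    private static final int IV_LEN = 16;

    // Encrypts one serialized batch on the client.
    // Output layout: 16-byte random IV followed by the ciphertext.
    static byte[] encrypt(byte[] key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                new IvParameterSpec(iv));
        byte[] body = cipher.doFinal(plaintext);
        byte[] framed = new byte[IV_LEN + body.length];
        System.arraycopy(iv, 0, framed, 0, IV_LEN);
        System.arraycopy(body, 0, framed, IV_LEN, body.length);
        return framed;
    }

    // Decrypts a frame produced by encrypt() on the reading client.
    static byte[] decrypt(byte[] key, byte[] framed) throws Exception {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                new IvParameterSpec(framed, 0, IV_LEN));
        return cipher.doFinal(framed, IV_LEN, framed.length - IV_LEN);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a key that the application would distribute to executors.
        byte[] key = new byte[16];
        new SecureRandom().nextBytes(key);

        byte[] batch = "k1=v1;k2=v2".getBytes(StandardCharsets.UTF_8);
        byte[] stored = encrypt(key, batch);     // what a worker would store
        byte[] restored = decrypt(key, stored);  // what a reader would recover
        System.out.println(Arrays.equals(batch, restored));
    }
}
```

In this sketch the bytes handed to the shuffle service are already ciphertext, so the server side needs neither the key nor any cipher logic. Note that CTR mode on its own provides confidentiality but no integrity protection; a real design would have to decide separately whether authentication is needed.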
