[ https://issues.apache.org/jira/browse/FLINK-13246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Knauf reopened FLINK-13246:
--------------------------------------

Re-opening in accordance with https://issues.apache.org/jira/browse/FLINK-23206.

> Implement external shuffle service for Kubernetes
> -------------------------------------------------
>
>                 Key: FLINK-13246
>                 URL: https://issues.apache.org/jira/browse/FLINK-13246
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Network
>            Reporter: MalcolmSanders
>            Assignee: MalcolmSanders
>            Priority: Minor
>              Labels: auto-closed, stale-assigned
>
> Flink batch job users could achieve better cluster utilization and job
> throughput through an external shuffle service, because the producers of
> intermediate result partitions can be released once those partitions have
> been persisted to disk. In
> [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang]
> introduced a pluggable shuffle manager architecture that abstracts the
> transfer of data between stages out of the Flink runtime as a shuffle
> service. I propose a k8s implementation of the Flink external shuffle
> service. A few points need to be discussed:
>
> (1) How should the external shuffle service be deployed in k8s?
>     DaemonSet vs. sidecar mode.
> (2) How should the PVs used to store intermediate result partition data be
>     managed?
>     Plan A: Shuffle servers (or other volume provisioners) provision PVs,
>     and producers write to a local PV.
>     Plan B: Producers write to the shuffle server over the network, and the
>     shuffle server controls the use of PVs.
> (3) Shuffle servers could temporarily request persistent storage backed by
>     cloud storage such as AWSElasticBlockStore, CephFS, etc.
>
> I'll bring a design document later.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)