MalcolmSanders created FLINK-11805:
--------------------------------------

             Summary: A Common External Shuffle Service Framework
                 Key: FLINK-11805
                 URL: https://issues.apache.org/jira/browse/FLINK-11805
             Project: Flink
          Issue Type: New Feature
          Components: Runtime / Network
            Reporter: MalcolmSanders
            Assignee: MalcolmSanders


In [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang] 
introduced a pluggable shuffle manager architecture that abstracts the 
transfer of data between stages out of the Flink runtime as a shuffle service. 
Here I'd like to propose a common external shuffle service framework, so that 
most external shuffle services can be built more easily by composing and 
wrapping this framework and implementing a few interfaces specific to the 
platform or deployment system.

As I see it, a common external shuffle service scenario works as follows:
(1) a shuffle service daemon process runs on each host machine as a server 
that provides shuffle data to remote (or possibly local) consumers.
(2) a producer gets a local persistent output directory for its shuffle data 
from the shuffle service daemon process on its own host machine, and then 
writes the shuffle data there.
(3) a consumer fetches its subpartition data from the shuffle service daemon 
on the host machine where the partition is located.
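
The three steps above can be sketched roughly as follows. This is only an 
illustration under simplifying assumptions: the class and method names are 
hypothetical and are not part of Flink or FLIP-31, the daemon is an in-process 
object rather than a separate server, and the "fetch" is a plain file read 
where a real daemon would serve the data over the network.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; none of these names come from Flink or FLIP-31.
class ShuffleDaemonSketch {
    // Root of the local persistent storage owned by this host's daemon.
    private final Path rootDir;

    ShuffleDaemonSketch(Path rootDir) {
        this.rootDir = rootDir;
    }

    // Step (2): a producer asks the daemon on its own host for an output directory.
    Path allocateOutputDir(String partitionId) throws IOException {
        return Files.createDirectories(rootDir.resolve(partitionId));
    }

    // Step (2), continued: the producer writes one file per subpartition.
    static void writeSubpartition(Path outputDir, int subpartitionIndex, byte[] data)
            throws IOException {
        Files.write(outputDir.resolve("sub-" + subpartitionIndex), data);
    }

    // Steps (1) and (3): the daemon serves a subpartition to a consumer.
    // Assumption: a local read stands in for the network transfer.
    byte[] fetchSubpartition(String partitionId, int subpartitionIndex) throws IOException {
        return Files.readAllBytes(rootDir.resolve(partitionId).resolve("sub-" + subpartitionIndex));
    }
}
```

In this sketch the producer only touches the directory handed out by the 
daemon, so the shuffle data can outlive the producer process and still be 
served afterwards.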

In my view, such a framework is applicable to external shuffle services such 
as YarnShuffleService, KubernetesShuffleService and StandaloneShuffleService. 
For KubernetesShuffleService there is also another plan, called sidecar mode, 
which implements the shuffle service on k8s by putting a TM process and a 
shuffle service process into the same pod. In that case there might be 
multiple shuffle service daemons running on one host machine, but it still 
fits into the framework, since the only difference might be whether the port 
of each shuffle service process is fixed or not. According to [~zjwang]'s 
[proposal|https://cwiki.apache.org/confluence/display/FLINK/FLIP-31%3A+Pluggable+Shuffle+Manager],
this case can be handled via UpdatePartitionInfo, so that the actual port of 
each shuffle service process can be propagated to the consumers.
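
As a rough illustration of the fixed versus dynamic port distinction, a 
consumer-side lookup could fall back to a well-known port unless an update has 
announced a different one. The class below is a hypothetical sketch and does 
not reproduce the UpdatePartitionInfo mechanism from the proposal; the names 
and the port numbers in the usage note are assumptions.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; not the UpdatePartitionInfo API from FLIP-31.
class ShuffleEndpointRegistrySketch {
    // Port used when the daemon listens on a fixed, well-known port.
    private final int defaultPort;
    // Ports announced later for partitions served by a daemon with a dynamic port.
    private final Map<String, Integer> portOverrides = new ConcurrentHashMap<>();

    ShuffleEndpointRegistrySketch(int defaultPort) {
        this.defaultPort = defaultPort;
    }

    // Called when the actual port of a shuffle service process becomes known.
    void updatePort(String partitionId, int actualPort) {
        portOverrides.put(partitionId, actualPort);
    }

    // Resolves the endpoint a consumer should fetch the partition from.
    InetSocketAddress resolve(String partitionId, String host) {
        int port = portOverrides.getOrDefault(partitionId, defaultPort);
        return InetSocketAddress.createUnresolved(host, port);
    }
}
```

For example, with a registry created for default port 7337, a partition with 
no update resolves to port 7337, while a partition updated to port 40123 
resolves to that port instead.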

This framework is not intended to handle external shuffle services that use 
global storage as the medium for shuffle data, such as DfsShuffleService, or 
other implementations that don't require an actual shuffle service role, such 
as RdmaShuffleService.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
