MalcolmSanders created FLINK-11805:
--------------------------------------
Summary: A Common External Shuffle Service Framework
Key: FLINK-11805
URL: https://issues.apache.org/jira/browse/FLINK-11805
Project: Flink
Issue Type: New Feature
Components: Runtime / Network
Reporter: MalcolmSanders
Assignee: MalcolmSanders
In [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang]
has introduced pluggable shuffle manager architecture which abstracts the
process of data transfer between stages from flink runtime as shuffle service.
Here I'd like to propose a common external shuffle service framework so that a
majority of external shuffle services could be achieved more easily by
compositing and wrapping this framework as well as implementing a few
interfaces according to the specific platform or deployment system.
As far as I'm concerned, a common external shuffle service scenario:
(1) a shuffle service daemon process runs on each host machine as a server to
provide shuffle data for remote(maybe local) consumers.
(2) a producer gets a local persistent output directory for writing shuffle
data from the shuffle service daemon process of current host machine, and
writes shuffle data afterwards.
(3) a consumer fetch its subpartition data from the shuffle service daemon on
the host machine where the partition locates.
In my point of view, such framework could be applicable to external shuffle
services such as YarnShuffleService, KubernetesShuffleService and
StandaloneShuffleService. As to KubernetesShuffleService, there is also another
plan, named as sidecar mode, to achieve shuffle service on k8s which puts a TM
process and a shuffle service process into a pod. As a result, there might be
multiple shuffle service daemons running on a host machine, it can still fit
into the framework since the only difference might be whether the port of each
shuffle service process is fixed or not. Accroding to [~zjwang]'s
[proposal|https://cwiki.apache.org/confluence/display/FLINK/FLIP-31%3A+Pluggable+Shuffle+Manager],
this case can be handled via UpdatePartitionInfo so that the actual port of
each shuffle service process can be updated to the consumers.
This framework is not intended to handle external shuffle services which use
global storages as the media for shuffle data, such as DfsShuffleService, or
other implementations which don't request an actual shuffle service role such
as RdmaShuffleService.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)