Re: [DISCUSS] Proposal of external shuffle service

2018-12-09 Thread zhijiang
 generic container of all available 
ShuffleServices. It could be part of TaskManagerServices instead of 
NetworkEnvironment, which could go into a specific ShuffleServiceImpl.

I might still be missing some details; I would appreciate any feedback.

Best,
Andrey

On 28 Nov 2018, at 08:59, zhijiang wrote:
Hi all,

I adjusted the umbrella JIRA [1] and the corresponding Google doc [2] to narrow 
down the scope to introducing a pluggable shuffle manager architecture as the 
first step. 
Further feedback and suggestions are welcome; I will then create specific 
subtasks to move it forward.

[1] https://issues.apache.org/jira/browse/FLINK-10653

[2] 
https://docs.google.com/document/d/1ssTu8QE8RnF31zal4JHM1VaVENow-PweUtXSRr68nGg/edit?usp=sharing
--
From: zhijiang
Sent: Thu, Nov 1, 2018, 17:19
To: dev; Jin Sun
Cc: Nico Kruber; Piotr Nowojski; Stephan Ewen
Subject: Re: [DISCUSS] Proposal of external shuffle service

Thanks for the quick response, Till!

Thanks Jin Sun for the good feedback; we will follow up on the comments 
then! :)
--
From: Jin Sun
Sent: Thu, Nov 1, 2018, 06:42
To: dev
Cc: Zhijiang(wangzhijiang999); Nico Kruber; Piotr Nowojski; Stephan Ewen
Subject: Re: [DISCUSS] Proposal of external shuffle service

Thanks Zhijiang for the proposal. I like the idea of an external shuffle service 
and have left some comments on the document. 

On Oct 31, 2018, at 2:26 AM, Till Rohrmann  wrote:

Thanks for the update Zhijiang! The community is currently quite busy with
the next Flink release. I hope that we can finish the release in two weeks.
After that people will become more responsive again.

Cheers,
Till

On Wed, Oct 31, 2018 at 7:49 AM zhijiang  wrote:

I have created the umbrella JIRA [1] for this improvement and attached
the design doc [2] to it.

Further discussion of the details is welcome.

[1] https://issues.apache.org/jira/browse/FLINK-10653
[2]
https://docs.google.com/document/d/1Jb0Mf46ace-6cLRQxJzo6VNQQVxn3hwf9Zqmv5pcb34/edit?usp=sharing

Best,
Zhijiang

--
From: Zhijiang(wangzhijiang999)
Sent: Tue, Sep 11, 2018, 15:21
To: dev
Cc: dev
Subject: Re: [DISCUSS] Proposal of external shuffle service

Many thanks Till!

I will create a JIRA for this feature and attach a design document to it.
I will let you know once it is ready! :)

Best,
Zhijiang


--
From: Till Rohrmann
Sent: Fri, Sep 7, 2018, 22:01
To: Zhijiang(wangzhijiang999)
Cc: dev
Subject: Re: [DISCUSS] Proposal of external shuffle service

The rough plan sounds good Zhijiang. I think we should continue with what
you've proposed: open a JIRA issue and create a design document which
outlines the required changes in a little more detail. Once this is
done, we should link the design document in the JIRA issue and post it here
for further discussion.

Cheers,
Till

On Wed, Aug 29, 2018 at 6:04 PM Zhijiang(wangzhijiang999) <
wangzhijiang...@aliyun.com> wrote:

Glad to receive your positive feedback, Till!

Our motivation is indeed to support batch jobs well, as you mentioned.

At the output level, Flink already has the Subpartition abstraction (writer),
with two current implementations: PipelinedSubpartition (in-memory output) and
SpillableSubpartition (one-subpartition-one-file output). We can extend this
abstraction to realize other persistent outputs (e.g.
sort-merge-file).
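
To picture what a merged persistent output could look like next to the one-subpartition-one-file layout, here is a minimal Java sketch. It is an illustration under assumptions, not Flink's Subpartition API: the class MergedOutput and its methods add, finish, and read are hypothetical names, and a byte array stands in for the persistent local file. Records for all subpartitions are buffered, grouped by subpartition index, and written into one contiguous region with an (offset, length) index entry per subpartition. A real sort-merge output would spill sorted runs to disk and merge them, which this sketch leaves out.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: all subpartitions share one merged "file" plus an index. */
final class MergedOutput {
    private final List<List<byte[]>> pending = new ArrayList<>();
    private byte[] merged;   // stands in for the single persistent file
    private int[] offsets;   // start offset of each subpartition's region
    private int[] lengths;   // byte length of each subpartition's region

    MergedOutput(int numSubpartitions) {
        for (int i = 0; i < numSubpartitions; i++) {
            pending.add(new ArrayList<>());
        }
    }

    /** Buffers one record for the given subpartition. */
    void add(int subpartition, byte[] record) {
        if (merged != null) {
            throw new IllegalStateException("output already finished");
        }
        pending.get(subpartition).add(record.clone());
    }

    /** Writes the records grouped by subpartition and builds the index. */
    void finish() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        offsets = new int[pending.size()];
        lengths = new int[pending.size()];
        for (int sp = 0; sp < pending.size(); sp++) {
            offsets[sp] = out.size();
            for (byte[] record : pending.get(sp)) {
                out.write(record, 0, record.length);
            }
            lengths[sp] = out.size() - offsets[sp];
        }
        merged = out.toByteArray();
        pending.clear();
    }

    /** Returns the bytes of one subpartition by slicing the merged region. */
    byte[] read(int subpartition) {
        if (merged == null) {
            throw new IllegalStateException("output not finished yet");
        }
        int start = offsets[subpartition];
        return Arrays.copyOfRange(merged, start, start + lengths[subpartition]);
    }
}
```

The point of the index is that a reader only needs the merged file and one (offset, length) pair to serve a subpartition, so the number of files no longer grows with the number of subpartitions.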

At the transport level (shuffle service), the current SubpartitionView
abstraction (reader) serves as the bridge to the output level, so the view can
understand and read the different output formats. The current
NetworkEnvironment effectively plays the role of an internal shuffle service
in the TaskManager, with the transport server implemented by Netty. This
component could also be started in other external containers, such as the
YARN NodeManager, to act as an external shuffle service. Further, we can
abstract the shuffle service so that it can be extended to transport outputs
over HTTP or RDMA instead of the current Netty. This abstraction should
provide a way to register outputs so that the results can be read correctly,
similar to the current SubpartitionView.
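
As a rough illustration of such a transport-level abstraction with output registration, consider the following Java sketch. All names here (OutputReader, ShuffleService, LocalShuffleService, registerOutput, releaseOutput, fetch) are assumptions made for this example and are not existing Flink interfaces. The reader hides the output format, in the way SubpartitionView is described above, and the service only knows how to look up a registered output and serve it. Netty, HTTP, or RDMA transports would be separate implementations of the same interface; the in-process one below skips the network entirely.

```java
import java.util.HashMap;
import java.util.Map;

/** Reader side of one registered output; a role similar to SubpartitionView. */
interface OutputReader {
    /** Returns the data of one subpartition, whatever the underlying format is. */
    byte[] read(int subpartition);
}

/** Hypothetical transport-level abstraction of a shuffle service. */
interface ShuffleService {
    /** Registers a produced output so that consumers can request it later. */
    void registerOutput(String partitionId, OutputReader reader);

    /** Removes the output once it has been consumed or is no longer needed. */
    void releaseOutput(String partitionId);

    /** Serves a consumer request for one subpartition of a registered output. */
    byte[] fetch(String partitionId, int subpartition);
}

/** In-process stand-in without any network transport; for illustration only. */
final class LocalShuffleService implements ShuffleService {
    private final Map<String, OutputReader> outputs = new HashMap<>();

    @Override
    public void registerOutput(String partitionId, OutputReader reader) {
        if (outputs.putIfAbsent(partitionId, reader) != null) {
            throw new IllegalStateException("already registered: " + partitionId);
        }
    }

    @Override
    public void releaseOutput(String partitionId) {
        outputs.remove(partitionId);
    }

    @Override
    public byte[] fetch(String partitionId, int subpartition) {
        OutputReader reader = outputs.get(partitionId);
        if (reader == null) {
            throw new IllegalArgumentException("unknown partition: " + partitionId);
        }
        return reader.read(subpartition);
    }
}
```

Because registration takes a reader rather than a file path, the same service could serve in-memory, one-file-per-subpartition, or merged outputs without knowing which one it is dealing with.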

The above is still a rough idea. Next, I plan to create a feature JIRA to
cover the related changes if possible. It would be great to get help from the
relevant committers in reviewing the detailed design together.

Best,
Zhijiang

--
From: Till Rohrmann
Sent: Wed, Aug 29, 2018, 17:36
To: dev; Zhijiang(wangzhijiang999) <wangzhijiang...@aliyun.com>
Subject: Re: [DISCUSS] Proposal of external shuffle service

Thanks for starting this design discussion Zhijiang!

I really like the idea to introduce a ShuffleService abstraction which

allows to have

Re: [DISCUSS] Proposal of external shuffle service

2018-11-28 Thread zhijiang
Hi all,

I adjusted the umbrella JIRA [1] and the corresponding Google doc [2] to narrow 
down the scope to introducing a pluggable shuffle manager architecture as the 
first step. 
Further feedback and suggestions are welcome; I will then create specific 
subtasks to move it forward.

[1] https://issues.apache.org/jira/browse/FLINK-10653

[2] 
https://docs.google.com/document/d/1ssTu8QE8RnF31zal4JHM1VaVENow-PweUtXSRr68nGg/edit?usp=sharing

Re: [DISCUSS] Proposal of external shuffle service

2018-11-01 Thread zhijiang
Thanks for the quick response, Till!

Thanks Jin Sun for the good feedback; we will follow up on the comments 
then! :)

Re: [DISCUSS] Proposal of external shuffle service

2018-10-31 Thread zhijiang
I have created the umbrella JIRA [1] for this improvement and attached the 
design doc [2] to it.

Further discussion of the details is welcome. 

[1] https://issues.apache.org/jira/browse/FLINK-10653
[2] 
https://docs.google.com/document/d/1Jb0Mf46ace-6cLRQxJzo6VNQQVxn3hwf9Zqmv5pcb34/edit?usp=sharing

Best,
Zhijiang
> --
> From: Till Rohrmann
> Sent: Wed, Aug 29, 2018, 17:36
> To: dev; Zhijiang(wangzhijiang999) <wangzhijiang...@aliyun.com>
> Subject: Re: [DISCUSS] Proposal of external shuffle service
>
> Thanks for starting this design discussion Zhijiang!
>
> I really like the idea of introducing a ShuffleService abstraction which
> allows for different implementations depending on the actual use case.
> Especially for batch jobs I can clearly see the benefits of persisting the
> results somewhere else.
>
> Do you already know which interfaces we need to extend and where to
> introduce new abstractions?
>
> Cheers,
> Till
>

Re: [DISCUSS] Proposal of external shuffle service

2018-09-11 Thread Zhijiang(wangzhijiang999)
Many thanks Till!

I will create a JIRA for this feature and attach a design document to it. 
I will let you know once it is ready! :)

Best,
Zhijiang


> On Mon, Aug 27, 2018 at 1:57 PM Zhijiang(wangzhijiang999) wrote:
> Hi all!
>
> The shuffle service is responsible for transporting data produced upstream
> to the downstream side. In Flink, the NettyServer provides the network
> transport service, and this component is started in the TaskManager process.
> That means the TaskManager only supports an internal shuffle service, which
> raises some concerns:
> 1. After a task finishes, its ResultPartition remains registered in the
> TaskManager, because the output buffers have to be transported by the
> internal shuffle service in the TaskManager. That means the TaskManager
> cannot be released by the ResourceManager until the ResultPartition is
> released. This may waste container resources and does not work well for
> dynamic resource scenarios.
> 2. If we want to add another shuffle service implementation, the current
> mechanism is not easy to extend, because the output level (result
> partition) and the transport level (shuffle service) are not clearly
> separated and lack abstractions that can be extended.
>
> Given these considerations, we propose an external shuffle service which
> can be deployed in any other external container, e.g. the NodeManager
> container in YARN. The TaskManager can then be released as soon as needed
> once all of its internal tasks have finished. The persistent output files of
> these finished tasks can be served by the external shuffle service on the
> same machine.
>
> Further, we can abstract both the output level and the transport level to
> support different implementations. For example, we implemented merging the
> data of all subpartitions into a limited number of persistent local files,
> instead of one-subpartition-one-file, to improve disk behavior in some
> scenarios.
>
> I know this may be a lot of work; I am just pointing out some ideas and
> would appreciate any feedback from you!
>
> Best,
> Zhijiang
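
The first concern in the proposal above comes down to a release condition, which the following Java sketch spells out. It is a simplified model for illustration, not Flink or ResourceManager code: ShuffleMode, TaskManagerRelease, and canRelease are hypothetical names, and the two counters are assumed to be tracked elsewhere. With an internal shuffle service, a TaskManager with no running tasks still has to stay alive while it holds unreleased ResultPartitions. With an external shuffle service, finished tasks' persistent output is served from outside the TaskManager, so only running tasks block the release.

```java
/** Where the output of finished tasks is served from (illustrative names). */
enum ShuffleMode { INTERNAL, EXTERNAL }

final class TaskManagerRelease {
    private TaskManagerRelease() {}

    /**
     * Decides whether a TaskManager container could be handed back.
     *
     * @param mode which kind of shuffle service serves the produced output
     * @param runningTasks tasks still executing in this TaskManager
     * @param unreleasedPartitions ResultPartitions produced here and not yet released
     */
    static boolean canRelease(ShuffleMode mode, int runningTasks, int unreleasedPartitions) {
        if (runningTasks > 0) {
            return false; // work is still in progress in either mode
        }
        if (mode == ShuffleMode.INTERNAL) {
            // The TaskManager itself must keep serving the output buffers.
            return unreleasedPartitions == 0;
        }
        // EXTERNAL: persistent output files are served by the external service.
        return true;
    }
}
```

For example, canRelease(ShuffleMode.INTERNAL, 0, 2) is false while canRelease(ShuffleMode.EXTERNAL, 0, 2) is true, which is the container saving the proposal is after.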



Re: [DISCUSS] Proposal of external shuffle service

2018-08-29 Thread Zhijiang(wangzhijiang999)
Glad to receive your positive feedback, Till! 

Our motivation is indeed to support batch jobs well, as you mentioned.

At the output level, Flink already has the Subpartition abstraction (writer), 
with two current implementations: PipelinedSubpartition (in-memory output) and 
SpillableSubpartition (one-subpartition-one-file output). We can extend this 
abstraction to realize other persistent outputs (e.g. sort-merge-file).

At the transport level (shuffle service), the current SubpartitionView 
abstraction (reader) serves as the bridge to the output level, so the view can 
understand and read the different output formats. The current 
NetworkEnvironment effectively plays the role of an internal shuffle service 
in the TaskManager, with the transport server implemented by Netty. This 
component could also be started in other external containers, such as the 
YARN NodeManager, to act as an external shuffle service. Further, we can 
abstract the shuffle service so that it can be extended to transport outputs 
over HTTP or RDMA instead of the current Netty. This abstraction should 
provide a way to register outputs so that the results can be read correctly, 
similar to the current SubpartitionView.

The above is still a rough idea. Next, I plan to create a feature JIRA to cover 
the related changes if possible. It would be great to get help from the 
relevant committers in reviewing the detailed design together.

Best,
Zhijiang

