Hi all!

The shuffle service is responsible for transporting upstream produced data to 
the downstream side. In flink, the NettyServer is used for network transport 
service and this component is started in the TaskManager process. That means 
the TaskManager can support internal shuffle service which exists some concerns:
1. If a task finishes, the ResultPartition of this task still retains 
registered in TaskManager, because the output buffers have to be transported by 
internal shuffle service in TaskManager. That means the TaskManager can not be 
released by ResourceManager until ResultPartition released. It may waste 
container resources and can not support well for dynamic resource scenarios.
2. If we want to expand another shuffle service implementation, the current 
mechanism is not easy to handle, because the output level (result partition) 
and transport level (shuffle service) are not divided clearly and loss of 
abstraction to be extended.

For above considerations, we propose the external shuffle service which can be 
deployed on any other external contaienrs, e.g. NodeManager container in yarn. 
Then the TaskManager can be released ASAP ifneeded when all the internal tasks 
finished. The persistent output files of these finished tasks can be served to 
transport by external shuffle service in the same machine.

Further we can abstract both of the output level and transport level to support 
different implementations. e.g. We realized merging the data of all the 
subpartitions into limited persistent local files for disk improvements in some 
scenarios instead of one-subpartiton-one-file.

I know it may be a big work for doing this, and I just point out some ideas, 
and wish getting any feedbacks from you!

Best,
Zhijiang

Reply via email to