Thanks for the update, Zhijiang! The community is currently quite busy with the next Flink release. I hope that we can finish the release within the next two weeks. After that, people will become more responsive again.
Cheers,
Till

On Wed, Oct 31, 2018 at 7:49 AM zhijiang <[email protected]> wrote:

> I already created the umbrella JIRA [1] for this improvement and attached
> the design doc [2] to it.
>
> Further discussion about the details is welcome.
>
> [1] https://issues.apache.org/jira/browse/FLINK-10653
> [2] https://docs.google.com/document/d/1Jb0Mf46ace-6cLRQxJzo6VNQQVxn3hwf9Zqmv5pcb34/edit?usp=sharing
>
> Best,
> Zhijiang
>
> ------------------------------------------------------------------
> From: Zhijiang(wangzhijiang999) <[email protected]>
> Sent: Tuesday, September 11, 2018, 15:21
> To: dev <[email protected]>
> Cc: dev <[email protected]>
> Subject: Re: [DISCUSS] Proposal of external shuffle service
>
> Many thanks Till!
>
> I will create a JIRA for this feature and attach a design document to it.
> I will let you know once it is ready! :)
>
> Best,
> Zhijiang
>
> ------------------------------------------------------------------
> From: Till Rohrmann <[email protected]>
> Sent: Friday, September 7, 2018, 22:01
> To: Zhijiang(wangzhijiang999) <[email protected]>
> Cc: dev <[email protected]>
> Subject: Re: [DISCUSS] Proposal of external shuffle service
>
> The rough plan sounds good Zhijiang. I think we should continue with what
> you've proposed: open a JIRA issue and create a design document which
> outlines the required changes in a bit more detail. Once this is done, we
> should link the design document in the JIRA issue and post it here for
> further discussion.
>
> Cheers,
> Till
>
> On Wed, Aug 29, 2018 at 6:04 PM Zhijiang(wangzhijiang999) <
> [email protected]> wrote:
>
> > Glad to receive your positive feedback, Till!
> >
> > Our motivation is indeed to support batch jobs well, as you mentioned.
> >
> > At the output level, Flink already has the Subpartition abstraction
> > (writer), and currently there are the PipelinedSubpartition (in-memory
> > output) and SpillableSubpartition (one-subpartition-one-file output)
> > implementations. We can extend this abstraction to realize other
> > persistent outputs (e.g. sort-merge files).
> >
> > At the transport level (shuffle service), the current SubpartitionView
> > abstraction (reader) acts as the bridge to the output level, so that the
> > view can understand and read the different output formats. The current
> > NetworkEnvironment takes the role of the internal shuffle service in the
> > TaskManager, with the transport server realized by Netty. This component
> > could also be started in other external containers, such as YARN's
> > NodeManager, to take the role of an external shuffle service. Further, we
> > can extend the shuffle service abstraction to transport outputs via HTTP
> > or RDMA instead of the current Netty. This abstraction should provide a
> > way to register outputs so that the results can be read correctly,
> > similar to the current SubpartitionView.
> >
> > The above is still a rough idea. Next I plan to create a feature JIRA to
> > cover the related changes if possible. It would be great to get help from
> > the related committers to review the detailed designs together.
> >
> > Best,
> > Zhijiang
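As a rough illustration of the output-level / transport-level split described above, a pluggable shuffle service interface might look something like the following minimal Java sketch. All names and signatures here are illustrative assumptions, not the API from the design doc:

import java.io.IOException;
import java.nio.ByteBuffer;

/** Output level (writer side): how a task persists or buffers one result partition. */
interface ShuffleOutputWriter extends AutoCloseable {
    /** Adds one serialized buffer to the given subpartition. */
    void write(int subpartitionIndex, ByteBuffer buffer) throws IOException;

    /** Seals the partition, e.g. flushing in-memory data or sort-merge files. */
    void finish() throws IOException;
}

/** Transport level (reader side): serves the buffers of one finished subpartition. */
interface ShuffleOutputReader extends AutoCloseable {
    /** Returns the next buffer of the given subpartition, or null when exhausted. */
    ByteBuffer nextBuffer(int subpartitionIndex) throws IOException;
}

/**
 * The pluggable shuffle service itself (hypothetical). An internal implementation could
 * be backed by the TaskManager's NetworkEnvironment/Netty stack, while an external one
 * could run inside another container (e.g. YARN's NodeManager) and serve persisted
 * files after the producing TaskManager has been released.
 */
interface ShuffleService {
    /** Registers a result partition so that its output can later be read correctly. */
    ShuffleOutputWriter registerPartition(String resultPartitionId, int numSubpartitions)
            throws IOException;

    /** Opens a reader for a previously registered (and finished) result partition. */
    ShuffleOutputReader openPartition(String resultPartitionId) throws IOException;
}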
> >
> > ------------------------------------------------------------------
> > From: Till Rohrmann <[email protected]>
> > Sent: Wednesday, August 29, 2018, 17:36
> > To: dev <[email protected]>; Zhijiang(wangzhijiang999) <[email protected]>
> > Subject: Re: [DISCUSS] Proposal of external shuffle service
> >
> > Thanks for starting this design discussion Zhijiang!
> >
> > I really like the idea to introduce a ShuffleService abstraction which
> > allows having different implementations depending on the actual use case.
> > Especially for batch jobs I can clearly see the benefits of persisting
> > the results somewhere else.
> >
> > Do you already know which interfaces we need to extend and where to
> > introduce new abstractions?
> >
> > Cheers,
> > Till
> >
> > On Mon, Aug 27, 2018 at 1:57 PM Zhijiang(wangzhijiang999)
> > <[email protected]> wrote:
> >
> > Hi all!
> >
> > The shuffle service is responsible for transporting the data produced
> > upstream to the downstream side. In Flink, the NettyServer is used for
> > the network transport service, and this component is started in the
> > TaskManager process. That means only an internal shuffle service inside
> > the TaskManager is supported, which raises some concerns:
> > 1. When a task finishes, its ResultPartition remains registered in the
> > TaskManager, because the output buffers have to be transported by the
> > internal shuffle service of the TaskManager. That means the TaskManager
> > cannot be released by the ResourceManager until the ResultPartition is
> > released. This may waste container resources and does not work well in
> > dynamic resource scenarios.
> > 2. If we want to add another shuffle service implementation, the current
> > mechanism makes that hard, because the output level (result partition)
> > and the transport level (shuffle service) are not clearly separated and
> > there is no abstraction to extend.
> >
> > For the above reasons, we propose an external shuffle service which can
> > be deployed in other external containers, e.g. a NodeManager container in
> > YARN. Then the TaskManager can be released as soon as possible, if
> > needed, once all of its internal tasks have finished. The persistent
> > output files of these finished tasks can then be served by the external
> > shuffle service on the same machine.
> >
> > Further, we can abstract both the output level and the transport level
> > to support different implementations. For example, in some scenarios we
> > have merged the data of all subpartitions into a limited number of
> > persistent local files, instead of one-subpartition-one-file, to improve
> > disk performance.
> >
> > I know this may be a big piece of work, and I am just pointing out some
> > ideas. I would appreciate any feedback from you!
> >
> > Best,
> > Zhijiang
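To make the last point about merged output files a bit more concrete, here is a minimal, hypothetical Java sketch of a writer that appends the buffers of all subpartitions to a single data file and keeps an index of (subpartition, offset, length) entries, so that a shuffle service can later serve each subpartition selectively. It illustrates the general idea only and is not the actual implementation referred to above:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative writer: all subpartitions share one data file plus an in-memory index. */
final class MergedSubpartitionFileWriter implements AutoCloseable {

    /** One index entry: which subpartition a byte range belongs to and where it lives. */
    static final class IndexEntry {
        final int subpartition;
        final long offset;
        final int length;

        IndexEntry(int subpartition, long offset, int length) {
            this.subpartition = subpartition;
            this.offset = offset;
            this.length = length;
        }
    }

    private final FileChannel dataChannel;
    private final List<IndexEntry> index = new ArrayList<>();

    MergedSubpartitionFileWriter(Path dataFile) throws IOException {
        this.dataChannel = FileChannel.open(
                dataFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    /** Appends one serialized buffer for the given subpartition to the shared file. */
    void append(int subpartition, ByteBuffer buffer) throws IOException {
        long offset = dataChannel.position();
        int length = buffer.remaining();
        while (buffer.hasRemaining()) {
            dataChannel.write(buffer);
        }
        index.add(new IndexEntry(subpartition, offset, length));
    }

    /** Forces the data to disk and returns the index a reader needs to serve requests. */
    List<IndexEntry> finish() throws IOException {
        dataChannel.force(true);
        return Collections.unmodifiableList(new ArrayList<>(index));
    }

    @Override
    public void close() throws IOException {
        dataChannel.close();
    }
}

A real implementation would of course also need things like index persistence, file rolling, and reader-side serving, but the sketch shows why a reader only needs the index to serve any single subpartition out of a merged file.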
