If this improves the performance, +1

On Sat, 24 Dec, 2022, 5:47 pm Guowei Ma, <guowei....@gmail.com> wrote:
> Hi,
>
> Thank you very much for driving this FLIP in order to improve user
> usability.
>
> I understand that a key goal of this FLIP is to adjust the memory
> requirements of shuffle to a more reasonable range. Through this adaptive
> range adjustment, the memory efficiency can be improved under the premise
> of ensuring the performance, thereby improving the user experience.
>
> I have no problem with this goal, but I have a concern about the means of
> implementation: should we introduce a _new_ non-orthogonal option
> (`taskmanager.memory.network.required-buffer-per-gate.max`)? That is
> to say, the option will affect both streaming and batch shuffle behavior at
> the same time.
>
> From the description in the FLIP, we can see that we do not want this value
> to be the same in streaming and batch scenarios. But we still let the user
> configure this parameter, and once this parameter is configured, the
> shuffle behavior of streaming and batch may be the same. In theory, there
> may be a configuration that can meet the requirements of batch shuffle, but
> it will affect the performance of streaming shuffle. (For example, we need
> to reduce the memory overhead in batch scenarios, but it will affect the
> performance of streaming shuffle.) In other words, do we really want to add
> a new option that exposes this potential risk?
>
> Personally, I think there might be two ways:
> 1. Modify the current implementation of streaming shuffle. Don't let
> the streaming shuffle performance regress. In this way, this option will
> not couple streaming shuffle and batch shuffle. This also avoids confusion
> for the user. But I am not sure how to do it. :-)
> 2. Introduce a pure batch read option, similar to the one introduced on
> the batch write side.
>
> BTW: It's better not to expose more implementation-related concepts to
> users. For example, the "gate" is related to the internal implementation.
> Relatively speaking, `shuffle.read/shuffle.client.read` may be more
> general. After all, it can also avoid coupling with the topology structure
> and scheduling units.
>
> Best,
> Guowei
>
>
> On Fri, Dec 23, 2022 at 2:57 PM Lijie Wang <wangdachui9...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Thanks for driving this FLIP, +1 for the proposed changes.
> >
> > Limiting the maximum value of shuffle read memory is very useful when
> > using the adaptive batch scheduler. Currently, the adaptive batch
> > scheduler may cause a large number of input channels in a certain TM, so we
> > generally recommend that users configure
> > "taskmanager.network.memory.buffers-per-channel: 0" to decrease the
> > possibility of the “Insufficient number of network buffers” error. After this
> > FLIP, users no longer need to configure
> > "taskmanager.network.memory.buffers-per-channel".
> >
> > So +1 from my side.
> >
> > Best,
> > Lijie
> >
> > Xintong Song <tonysong...@gmail.com> wrote on Tue, Dec 20, 2022 at 10:04:
> >
> > > Thanks for the proposal, Yuxin.
> > >
> > > +1 for the proposed changes. I think these are indeed helpful usability
> > > improvements.
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Mon, Dec 19, 2022 at 3:36 PM Yuxin Tan <tanyuxinw...@gmail.com>
> > > wrote:
> > >
> > > > Hi, devs,
> > > >
> > > > I'd like to start a discussion about FLIP-266: Simplify network memory
> > > > configurations for TaskManager[1].
> > > >
> > > > When using Flink, users may encounter the following issues that affect
> > > > usability.
> > > > 1. The job may fail with an "Insufficient number of network buffers"
> > > > exception.
> > > > 2. Flink network memory size adjustment is complex.
> > > > When encountering these issues, users can solve some problems by adding or
> > > > adjusting parameters. However, multiple memory config options should be
> > > > changed.
> > > > The config option adjustment requires understanding the detailed
> > > > internal implementation, which is impractical for most users.
> > > >
> > > > To simplify network memory configurations for TaskManager and improve Flink
> > > > usability, this FLIP proposes some optimization solutions for the issues.
> > > >
> > > > Looking forward to your feedback.
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-266%3A+Simplify+network+memory+configurations+for+TaskManager
> > > >
> > > > Best regards,
> > > > Yuxin