If this improves the performance, +1.

On Sat, 24 Dec, 2022, 5:47 pm Guowei Ma, <guowei....@gmail.com> wrote:

> Hi,
> Thank you very much for driving this FLIP in order to improve user
> usability.
>
> I understand that a key goal of this FLIP is to adjust the memory
> requirements of shuffle to a more reasonable range. Through this adaptive
> range adjustment, the memory efficiency can be improved under the premise
> of ensuring the performance, thereby improving the user experience.
>
> I have no problem with this goal, but I have a concern about the means of
> implementation: should we introduce a _new_ non-orthogonal option
> (`taskmanager.memory.network.required-buffer-per-gate.max`)? That is
> to say, the option will affect both streaming and batch shuffle behavior at
> the same time.
>
> From the description in the FLIP, we can see that we do not want this value to
> be the same in streaming and batch scenarios. But we still let the user
> configure this parameter, and once this parameter is configured, the
> shuffle behavior of streaming and batch may be the same. In theory, there
> may be a configuration that can meet the requirements of batch shuffle, but
> it will affect the performance of streaming shuffle. (For example, we need
> to reduce the memory overhead in batch scenarios, but it will affect the
> performance of streaming shuffle). In other words, do we really want to add
> a new option that exposes this potential risk?
>
>   Personally, I think there might be two ways:
>     1. Modify the current implementation of streaming shuffle. Don't let
> the streaming shuffle performance regress. In this way, this option will
> not couple streaming shuffle and batch shuffle. This also avoids confusion
> for the user.  But I am not sure how to do it. :-)
>     2. Introduce a pure batch read option, similar to the one introduced on
> the batch write side.
>
> BTW: It's better not to expose more implementation-related concepts to
> users. For example, the "gate" is related to the internal implementation.
> Relatively speaking, `shuffle.read/shuffle.client.read` may be more
> general. After all, it can also avoid coupling with the topology structure
> and scheduling units.
>
> Best,
> Guowei
>
>
> On Fri, Dec 23, 2022 at 2:57 PM Lijie Wang <wangdachui9...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Thanks for driving this FLIP, +1 for the proposed changes.
> >
> > Limiting the maximum value of shuffle read memory is very useful when
> > using the adaptive batch scheduler. Currently, the adaptive batch
> > scheduler may cause a large number of input channels in a certain TM, so we
> > generally recommend that users configure
> > "taskmanager.network.memory.buffers-per-channel: 0" to decrease the
> > possibility of an “Insufficient number of network buffers” error. After this
> > FLIP, users no longer need to configure the
> > "taskmanager.network.memory.buffers-per-channel".
> >
> > So +1 from my side.
> >
> > Best,
> > Lijie
> >
> > On Tue, Dec 20, 2022 at 10:04 AM Xintong Song <tonysong...@gmail.com> wrote:
> >
> > > Thanks for the proposal, Yuxin.
> > >
> > > +1 for the proposed changes. I think these are indeed helpful usability
> > > improvements.
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Mon, Dec 19, 2022 at 3:36 PM Yuxin Tan <tanyuxinw...@gmail.com> wrote:
> > >
> > > > Hi, devs,
> > > >
> > > > I'd like to start a discussion about FLIP-266: Simplify network memory
> > > > configurations for TaskManager[1].
> > > >
> > > > When using Flink, users may encounter the following issues that affect
> > > > usability.
> > > > 1. The job may fail with an "Insufficient number of network buffers"
> > > > exception.
> > > > 2. Flink network memory size adjustment is complex.
> > > > When encountering these issues, users can solve some problems by adding or
> > > > adjusting parameters. However, multiple memory config options should be
> > > > changed. The config option adjustment requires understanding the detailed
> > > > internal implementation, which is impractical for most users.
> > > >
> > > > To simplify network memory configurations for TaskManager and improve Flink
> > > > usability, this FLIP proposes some optimization solutions for the issues.
> > > >
> > > > Looking forward to your feedback.
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-266%3A+Simplify+network+memory+configurations+for+TaskManager
> > > >
> > > > Best regards,
> > > > Yuxin
> > > >
> > >
> >
>
