Hi Xiangyu,

The motivation behind the FLIP makes sense, but I keep wondering whether a
static minimum is the best way to think about the problem. I assume that
"interactive session cluster" users always want to keep some spare resources
around (up to a configured threshold) to reduce cold-start latency, rather
than statically configuring the minimum.

It's only a small change from the original proposal, but it could make all
the difference: it would eliminate overprovisioning and keep latencies
stable as the number of jobs grows.
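To make the idea concrete, here is a rough sketch of the policy I have in
mind. It is only an illustration under simplifying assumptions: all names
(SpareWorkerPolicy, spareWorkers, maxWorkers, ...) are hypothetical and not
Flink APIs, and it counts whole workers instead of fine-grained resource
profiles.

```java
/**
 * Hypothetical sketch (not Flink code): decide how many workers to request
 * or release so that a configured number of idle "spare" workers is kept
 * around, instead of enforcing a static minimum.
 */
public final class SpareWorkerPolicy {

    private final int spareWorkers; // desired number of idle workers (hypothetical option)
    private final int maxWorkers;   // upper bound on the total number of workers

    public SpareWorkerPolicy(int spareWorkers, int maxWorkers) {
        if (spareWorkers < 0 || maxWorkers < 0) {
            throw new IllegalArgumentException("limits must be non-negative");
        }
        this.spareWorkers = spareWorkers;
        this.maxWorkers = maxWorkers;
    }

    /**
     * Number of new workers to request: enough to restore the spare target,
     * but never more than the remaining headroom below the maximum.
     */
    public int workersToRequest(int busyWorkers, int idleWorkers) {
        int missingSpare = Math.max(0, spareWorkers - idleWorkers);
        int headroom = Math.max(0, maxWorkers - (busyWorkers + idleWorkers));
        return Math.min(missingSpare, headroom);
    }

    /** Number of idle workers that exceed the spare target and may be released. */
    public int workersToRelease(int idleWorkers) {
        return Math.max(0, idleWorkers - spareWorkers);
    }

    public static void main(String[] args) {
        SpareWorkerPolicy policy = new SpareWorkerPolicy(2, 10);
        // Empty cluster: request the 2 spare workers up front.
        System.out.println(policy.workersToRequest(0, 0)); // 2
        // 5 busy workers, none idle: request 2 more spares.
        System.out.println(policy.workersToRequest(5, 0)); // 2
        // 9 busy workers: only 1 worker of headroom is left below the max.
        System.out.println(policy.workersToRequest(9, 0)); // 1
        // 4 idle workers: 2 exceed the spare target and can be released.
        System.out.println(policy.workersToRelease(4));    // 2
    }
}
```

With a static minimum, the idle capacity drops to zero as soon as jobs occupy
the reserved workers, so the next submission pays the cold start again. A
spare target keeps the same headroom regardless of how many jobs are
running, while the maximum still bounds the total.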

WDYT?

Best,
D.

On Mon, Sep 25, 2023 at 5:11 PM Jing Ge <j...@ververica.com.invalid> wrote:

> Hi Yangze,
>
> Thanks for the clarification. The example of two batch jobs teaming up
> with one streaming job is interesting.
>
> Best regards,
> Jing
>
> On Wed, Sep 20, 2023 at 7:19 PM Yangze Guo <karma...@gmail.com> wrote:
>
> > Thanks for the comments, Jing.
> >
> > > Will the minimum resource configuration also take effect for streaming
> > > jobs in application mode? Since it is not recommended to configure
> > > slotmanager.number-of-slots.max for streaming jobs, does it make sense
> > > to disable it for common streaming jobs? At least disable the check for
> > > avoiding the oscillation?
> >
> > Yes. At the moment, the minimum resource configuration is only disabled
> > in standalone clusters. I agree it makes sense to disable it for a pure
> > streaming job, however:
> > - By default, the minimum resource is configured to 0. If users do not
> > proactively set it, both the oscillation check and the minimum
> > restriction are effectively disabled.
> > - The minimum resource is a cluster-level configuration rather than a
> > job-level configuration. If a user has an application with two batch
> > jobs preceding the streaming job, they may also need this configuration
> > to accelerate the execution of the batch jobs.
> >
> > WDYT?
> >
> > Best,
> > Yangze Guo
> >
> > On Thu, Sep 21, 2023 at 4:49 AM Jing Ge <j...@ververica.com.invalid> wrote:
> > >
> > > Hi Xiangyu,
> > >
> > > Thanks for driving it! There is one thing I'm not sure I understand
> > > correctly.
> > >
> > > According to the FLIP: "The minimum resource limitation will be
> > > implemented in the DefaultResourceAllocationStrategy of
> > > FineGrainedSlotManager.
> > >
> > > Each time when SlotManager needs to reconcile the cluster resources or
> > > fulfill job resource requirements, the DefaultResourceAllocationStrategy
> > > will check if the minimum resource requirement has been fulfilled. If it
> > > is not, DefaultResourceAllocationStrategy will request new
> > > PendingTaskManagers and FineGrainedSlotManager will allocate new worker
> > > resources accordingly."
> > >
> > > "To avoid this oscillation, we need to check the worker number derived
> > > from minimum and maximum resource configuration is consistent before
> > > starting SlotManager."
> > >
> > > Will the minimum resource configuration also take effect for streaming
> > > jobs in application mode? Since it is not recommended to configure
> > > slotmanager.number-of-slots.max for streaming jobs, does it make sense
> > > to disable it for common streaming jobs? At least disable the check for
> > > avoiding the oscillation?
> > >
> > > Best regards,
> > > Jing
> > >
> > >
> > > On Tue, Sep 19, 2023 at 4:58 PM Chen Zhanghao <zhanghao.c...@outlook.com>
> > > wrote:
> > >
> > > > Thanks for driving this, Xiangyu. We use session clusters for quick SQL
> > > > debugging internally and have found cold-start job submission slow due
> > > > to the lack of exactly the minimum resource reservation feature proposed
> > > > here. This should greatly improve the experience of running short-lived
> > > > jobs in session clusters.
> > > >
> > > > Best,
> > > > Zhanghao Chen
> > > > ________________________________
> > > > From: Yangze Guo <karma...@gmail.com>
> > > > Sent: September 19, 2023 13:10
> > > > To: xiangyu feng <xiangyu...@gmail.com>
> > > > Cc: dev@flink.apache.org <dev@flink.apache.org>
> > > > Subject: Re: [Discuss] FLIP-362: Support minimum resource limitation
> > > >
> > > > Thanks for driving this @Xiangyu. This is a feature that many users
> > > > have requested for a long time. +1 for the overall proposal.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Tue, Sep 19, 2023 at 11:48 AM xiangyu feng <xiangyu...@gmail.com>
> > > > wrote:
> > > > >
> > > > > Hi Devs,
> > > > >
> > > > > I'm opening this thread to discuss FLIP-362: Support minimum resource
> > > > > limitation. The design doc can be found at:
> > > > > FLIP-362: Support minimum resource limitation
> > > > >
> > > > > Currently, the Flink cluster only requests Task Managers (TMs) when
> > > > > there is a resource requirement, and idle TMs are released after a
> > > > > certain period of time. However, in certain scenarios, such as running
> > > > > short-lived jobs in session clusters and scheduling batch jobs stage by
> > > > > stage, we need to improve the efficiency of job execution by keeping a
> > > > > certain number of available workers in the cluster at all times.
> > > > >
> > > > > After discussing with Yangze, we came up with this new feature. The
> > > > > newly added public options and proposed changes are described in the
> > > > > FLIP.
> > > > >
> > > > > Looking forward to your feedback, thanks.
> > > > >
> > > > > Best regards,
> > > > > Xiangyu
> > > > >
> > > >
> >
>
