Hi Xiangyu,

The sentiment of the FLIP makes sense, but I keep wondering whether this is the best way to think about the problem.

I assume that "interactive session cluster" users always want to keep some spare resources around (up to a configured threshold) to reduce cold starts, instead of statically configuring the minimum.
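
To make the difference from a static minimum concrete, here is a rough sketch of one possible reading of "spare resources up to a threshold". It is only an illustration: `SpareCapacityPolicy`, `spareSlots`, and `maxSlots` are hypothetical names, not Flink classes or configuration options, and the FLIP does not define them.

```java
// Hypothetical sketch; none of these names are Flink APIs or options.
final class SpareCapacityPolicy {
    private final int spareSlots; // assumed option: free slots to keep on top of current demand
    private final int maxSlots;   // assumed option: hard upper bound for the whole cluster

    SpareCapacityPolicy(int spareSlots, int maxSlots) {
        if (spareSlots < 0 || maxSlots < 0) {
            throw new IllegalArgumentException("spareSlots and maxSlots must be non-negative");
        }
        this.spareSlots = spareSlots;
        this.maxSlots = maxSlots;
    }

    /** Slots the cluster should hold: current demand plus headroom, capped at maxSlots. */
    int targetSlots(int requiredSlots) {
        // Use long arithmetic so demand + headroom cannot overflow before the cap is applied.
        long wanted = (long) Math.max(requiredSlots, 0) + spareSlots;
        return (int) Math.min(wanted, (long) maxSlots);
    }

    public static void main(String[] args) {
        SpareCapacityPolicy policy = new SpareCapacityPolicy(4, 16);
        for (int required : new int[] {0, 6, 14}) {
            System.out.println("required=" + required + " -> target=" + policy.targetSlots(required));
        }
    }
}
```

On an idle cluster this behaves like a static minimum of `spareSlots`; as demand grows, the headroom moves with it until the cap is reached, which is the part a fixed minimum cannot express.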

It's just a tiny change from the original proposal, but it could make all the difference (eliminate overprovisioning, maintain latencies with a growing number of jobs, ...).

WDYT?

Best,
D.

On Mon, Sep 25, 2023 at 5:11 PM Jing Ge <j...@ververica.com.invalid> wrote:

> Hi Yangze,
>
> Thanks for the clarification. The example of two batch jobs teaming up with
> one streaming job is interesting.
>
> Best regards,
> Jing
>
> On Wed, Sep 20, 2023 at 7:19 PM Yangze Guo <karma...@gmail.com> wrote:
>
> > Thanks for the comments, Jing.
> >
> > > Will the minimum resource configuration also take effect for streaming
> > > jobs in application mode?
> > > Since it is not recommended to configure slotmanager.number-of-slots.max
> > > for streaming jobs, does it make sense to disable it for common streaming
> > > jobs? At least disable the check for avoiding the oscillation?
> >
> > Yes. The minimum resource configuration will only be disabled in
> > standalone clusters atm. I agree it makes sense to disable it for a pure
> > streaming job, however:
> > - By default, the minimum resource is configured to 0. If users do not
> >   proactively set it, either the oscillation check or the minimum
> >   restriction can be considered as disabled.
> > - The minimum resource is a cluster-level configuration rather than a
> >   job-level configuration. If a user has an application with two batch
> >   jobs preceding the streaming job, they may also require this
> >   configuration to accelerate the execution of batch jobs.
> >
> > WDYT?
> >
> > Best,
> > Yangze Guo
> >
> > On Thu, Sep 21, 2023 at 4:49 AM Jing Ge <j...@ververica.com.invalid>
> > wrote:
> > >
> > > Hi Xiangyu,
> > >
> > > Thanks for driving it! There is one thing I am not really sure if I
> > > understand you correctly.
> > >
> > > According to the FLIP: "The minimum resource limitation will be implemented
> > > in the DefaultResourceAllocationStrategy of FineGrainedSlotManager.
> > >
> > > Each time when SlotManager needs to reconcile the cluster resources or
> > > fulfill job resource requirements, the DefaultResourceAllocationStrategy
> > > will check if the minimum resource requirement has been fulfilled. If it is
> > > not, DefaultResourceAllocationStrategy will request new PendingTaskManagers
> > > and FineGrainedSlotManager will allocate new worker resources accordingly."
> > >
> > > "To avoid this oscillation, we need to check the worker number derived from
> > > minimum and maximum resource configuration is consistent before starting
> > > SlotManager."
> > >
> > > Will the minimum resource configuration also take effect for streaming jobs
> > > in application mode? Since it is not recommended to
> > > configure slotmanager.number-of-slots.max for streaming jobs, does it make
> > > sense to disable it for common streaming jobs? At least disable the check
> > > for avoiding the oscillation?
> > >
> > > Best regards,
> > > Jing
> > >
> > >
> > > On Tue, Sep 19, 2023 at 4:58 PM Chen Zhanghao <zhanghao.c...@outlook.com>
> > > wrote:
> > >
> > > > Thanks for driving this, Xiangyu. We use Session clusters for quick SQL
> > > > debugging internally, and found cold-start job submission slow due to the
> > > > lack of the exact minimum resource reservation feature proposed here. This
> > > > should improve the experience a lot for running short-lived jobs in session
> > > > clusters.
> > > >
> > > > Best,
> > > > Zhanghao Chen
> > > > ________________________________
> > > > From: Yangze Guo <karma...@gmail.com>
> > > > Sent: September 19, 2023 13:10
> > > > To: xiangyu feng <xiangyu...@gmail.com>
> > > > Cc: dev@flink.apache.org <dev@flink.apache.org>
> > > > Subject: Re: [Discuss] FLIP-362: Support minimum resource limitation
> > > >
> > > > Thanks for driving this @Xiangyu. This is a feature that many users
> > > > have requested for a long time. +1 for the overall proposal.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Tue, Sep 19, 2023 at 11:48 AM xiangyu feng <xiangyu...@gmail.com>
> > > > wrote:
> > > > >
> > > > > Hi Devs,
> > > > >
> > > > > I'm opening this thread to discuss FLIP-362: Support minimum resource
> > > > > limitation. The design doc can be found at:
> > > > > FLIP-362: Support minimum resource limitation
> > > > >
> > > > > Currently, the Flink cluster only requests Task Managers (TMs) when
> > > > > there is a resource requirement, and idle TMs are released after a certain
> > > > > period of time. However, in certain scenarios, such as running short-lived
> > > > > jobs in a session cluster and scheduling batch jobs stage by stage, we
> > > > > need to improve the efficiency of job execution by maintaining a certain
> > > > > number of available workers in the cluster at all times.
> > > > >
> > > > > After discussing this with Yangze, we introduced this new feature. The newly
> > > > > added public options and proposed changes are described in this FLIP.
> > > > >
> > > > > Looking forward to your feedback, thanks.
> > > > >
> > > > > Best regards,
> > > > > Xiangyu