Re: [DISCUSS] FLIP-XXX: Aligning timeout logic in the AdaptiveScheduler's WaitingForResources and Executing states

Zdenek Tison Wed, 24 Jul 2024 05:22:57 -0700

Hi Gyula,

Thank you for reviewing the document and providing feedback.


   1. I agree that we need two separate parameters for stabilization
   intervals in different states. I will update the FLIP document accordingly.
   2. That's correct. We reached the same conclusion while prototyping the
   implementation. I will add a new bullet point to the FLIP document.
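
To illustrate point 1, here is a minimal sketch of how two stabilization
timeouts with a fallback could be resolved. It is not Flink code. The two
option names are the ones mentioned in this thread (the scaling one is only a
proposal), and the fallback direction, the 10 s default, and the helper
`resolve_timeouts` are assumptions made for this example.

```python
from datetime import timedelta
from typing import Mapping, NamedTuple

# Option names as mentioned in this thread; the second one is only a proposal.
RESOURCE_KEY = "jobmanager.adaptive-scheduler.resource-stabilization-timeout"
SCALING_KEY = "jobmanager.adaptive-scheduler.scaling-stabilization-timeout"

# Assumption: 10 s, the value discussed in this thread for the existing option.
DEFAULT_TIMEOUT = timedelta(seconds=10)


class StabilizationTimeouts(NamedTuple):
    waiting_for_resources: timedelta  # used while the job is starting
    executing: timedelta              # used before rescaling a running job


def resolve_timeouts(conf: Mapping[str, timedelta]) -> StabilizationTimeouts:
    """Resolve both timeouts (hypothetical helper, not a Flink API).

    Assumed fallback direction: the rescale timeout falls back to the
    startup timeout, which in turn falls back to the default.
    """
    startup = conf.get(RESOURCE_KEY, DEFAULT_TIMEOUT)
    rescale = conf.get(SCALING_KEY, startup)
    return StabilizationTimeouts(startup, rescale)


# Short startup timeout, long rescale timeout (the serverless scenario).
timeouts = resolve_timeouts({
    RESOURCE_KEY: timedelta(seconds=1),
    SCALING_KEY: timedelta(minutes=1),
})
```

With only the first option set, both states would use the same value, which
keeps the single-parameter setup working.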

Thanks a lot.

Regards,
Zdenek


On Tue, Jul 23, 2024 at 3:02 PM Gyula Fóra <[email protected]> wrote:

> Hi All!
>
> Thank you for the proposal, I think it will be great to simplify the
> current rescaling flow to make it more digestible :)
>
> I have 2 comments:
>
> 1. Related to what Matthias already pointed out, I think in production
> scenarios it may be a typical requirement to have a fairly short
> stabilization interval for job startup (reduce downtime) but overall a
> longer stabilization period for Executing jobs before rescaling to avoid
> fluctuations and therefore reduce downtime. I think it would be very
> important to have 2 configs for that, one could fall back to the other of
> course if undefined.
>
> 2. The document mentions that the stabilization period for executing jobs
> is measured from the first resource event. I feel that if after the
> stabilization period we don't have sufficient resources we should completely
> reset this timer and start the timeout from 0 when the next event arrives.
> This will be more in line with the concept of stabilization, otherwise if
> you receive a batch of new resources you may not utilize it because as soon
> as you have sufficient we rescale immediately.
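
Inline note on point 2: the reset behavior described above could be sketched
roughly as follows. This is a simplified model, not the scheduler
implementation. The class `StabilizationWindow`, its method names, and the use
of plain seconds as timestamps are assumptions made for this example.

```python
from typing import Optional


class StabilizationWindow:
    """Hypothetical model of a stabilization timer that resets on expiry."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.started_at: Optional[float] = None  # None means no window is open

    def on_resource_event(self, now_s: float) -> None:
        # The window is measured from the first event after a reset.
        if self.started_at is None:
            self.started_at = now_s

    def on_tick(self, now_s: float, has_sufficient_resources: bool) -> bool:
        """Return True if a rescale should be triggered at now_s."""
        if self.started_at is None:
            return False
        if now_s - self.started_at < self.timeout_s:
            return False  # still stabilizing
        # The window has expired: close it either way, so that the next
        # resource event starts a fresh window from 0.
        self.started_at = None
        return has_sufficient_resources


window = StabilizationWindow(timeout_s=10)
window.on_resource_event(now_s=0)
expired_without_resources = window.on_tick(now_s=10, has_sufficient_resources=False)
window.on_resource_event(now_s=25)  # next event opens a new window
too_early = window.on_tick(now_s=30, has_sufficient_resources=True)
after_full_window = window.on_tick(now_s=35, has_sufficient_resources=True)
```

In this model, resources that arrive at second 25 are not used until second 35,
so a batch of new resources gets the full stabilization period.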
>
> Cheers,
> Gyula
>
>
>
> On Thu, Jul 18, 2024 at 9:58 AM Zdenek Tison <[email protected]>
> wrote:
>
> > Thanks, Matthias, for your opinions.
> >
> > I see two scenarios where different values for starting and rescaling
> > would be appropriate:
> >
> > 1) Flink serverless providers may prefer the fastest possible job startup
> > time, which can also be achieved by setting a smaller value for the
> > stabilization timeout, such as 1 second, in the WaitingForResources
> > state. Conversely, to ensure maximum job uptime, it would be prudent to
> > increase the stabilization period for rescaling to a higher value, such
> > as 1 minute, to handle server/node maintenance effectively.
> >
> > 2) In Reactive mode, the stabilization period is set to 0 by default.
> > Setting a different default value for the rescale state could enhance job
> > stability during node maintenance, especially since the parameter
> > min-parallelism-increase is no longer applicable.
> >
> > Regards,
> >
> > Zdenek
> >
> > On Tue, Jul 16, 2024 at 5:49 PM Matthias Pohl <[email protected]> wrote:
> >
> > > Thanks Zdenek for your proposal on aligning the resource control logic
> > > within the AdaptiveScheduler and cleaning up the rescaling code.
> > >
> > > Consolidating the parameters and the code as part of the 2.0 release
> > > makes sense in my opinion: The proposed change adds consistent behavior
> > > to the WaitingForResources and Executing states of the
> > > AdaptiveScheduler and irons out some flaws of the current
> > > implementation. This should help users get a clearer picture of the
> > > resource control logic. Removing obsolete rescale waiting time if only
> > > sufficient resources are available is also a nice improvement.
> > >
> > > The j.a.min-parallelism-increase [1] parameter became kind of obsolete
> > > with the introduction of the rescale REST endpoint in FLIP-291 [2] as
> > > you pointed out in the FLIP. So, deprecating it sounds reasonable.
> > >
> > > On the topic of replacing the j.a.scaling-interval.max parameter [3]
> > > with the j.a.resource-stabilization-timeout [4]: I'm in favor of
> > > reducing the complexity of the Flink configuration. Therefore, using
> > > one parameter for both (WaitingForResources and Executing state) to
> > > stabilize the resources sounds like a good idea.
> > >
> > > I'm wondering whether there are scenarios where we would want to have
> > > different stabilization timeouts for starting (WaitingForResources) and
> > > rescaling (Executing) a job. In that case, having two resource
> > > stabilization parameters (one for job starts and one for rescales) with
> > > one being the fallback for the other is a straightforward solution.
> > >
> > > Just as a side note because it came up: Keep in mind that FLIP-461
> > > still allows for immediate rescaling on a change event if checkpointing
> > > is disabled or j.a.max-delay-for-scale-trigger [5] is configured
> > > accordingly.
> > >
> > > Best,
> > > Matthias
> > >
> > > [1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-min-parallelism-increase
> > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management
> > > [3] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-scaling-interval-max
> > > [4] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-resource-stabilization-timeout
> > > [5] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-max-delay-for-scale-trigger
> > >
> > >
> > >
> > > On Tue, Jul 16, 2024 at 3:05 PM Zdenek Tison <[email protected]>
> > > wrote:
> > >
> > > > Hi, I'd like to move a discussion from Google Docs to the mailing
> > > > list so that it's visible to everyone.
> > > >
> > > > *Yuanfeng Hu* brought up two concerns:
> > > >
> > > > 1) Related to the resource-stabilization-timeout, he thinks 10s may
> > > > be too short. In a container environment, if the number of tm added
> > > > by rest requests is greater than 1, the tm initialization time may be
> > > > much longer than 10s.
> > > >
> > > > and
> > > >
> > > > 2) He proposed a small scenario:
> > > > There is 1 slot in the entire cluster. At this time, my task is
> > > > running at a parallelism of 1 (the required slot count is also 1).
> > > > Then I add a tm (1 slot), which will obviously trigger a change
> > > > event, and it will become stable after 10 seconds. If I change the
> > > > required resources to 3 through rest at this time, a rescale will be
> > > > triggered immediately and the job runs at a parallelism of 2. Is
> > > > this the expected result, or do we expect that the rescale will be
> > > > triggered after adding another tm, because this exactly matches the
> > > > required resources?
> > > >
> > > > Thank you, *Yuanfeng Hu*, for opening the discussion.
> > > >
> > > > ---------------------------------------------------------------------------------------
> > > >
> > > > 1) Regarding the stabilization period:
> > > >
> > > > I am unsure what you mean by the part, 'if the number of tm added by
> > > > rest requests is greater than 1.' However, I understand that it can
> > > > take some time to spawn additional containers/pods in a containerized
> > > > environment. On the other hand, if a user adds more TMs, for
> > > > instance, by increasing the number of replicas in a Kubernetes
> > > > deployment, these replicas should appear with some delay but at a
> > > > similar time, correct?
> > > >
> > > > It's worth mentioning that since FLIP-461
> > > > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler>,
> > > > the rescale operation is synchronized with checkpoint events, so the
> > > > rescale doesn't happen right after this timeout expires.
> > > >
> > > > If we believe it is necessary to have different values for the
> > > > stabilization period in the Executing and WaitingForResources states,
> > > > even though this increases configuration complexity slightly, we
> > > > could have separate parameters for these two states:
> > > > jobmanager.adaptive-scheduler.resource-stabilization-timeout
> > > > <https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-resource-stabilization-timeout>
> > > > and *jobmanager.adaptive-scheduler.scaling-stabilization-timeout*
> > > > (replacing the jobmanager.adaptive-scheduler.scaling-interval.max
> > > > <https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-scaling-interval-max>).
> > > >
> > > >
> > > > 2) Regarding the proposed scenario:
> > > >
> > > > The same behavior occurs in the current Flink version when
> > > > `min-parallelism-increase` is set to its default value of 1. In this
> > > > case, the rescale operation is triggered immediately or aligned with
> > > > the checkpoint event (as specified in FLIP-461).
> > > > So, I would say the behavior is expected.
> > > > Additionally, users can configure the rescaling behavior. For
> > > > example, if a user sets the lower bound parallelism to 2 and the
> > > > upper bound to 3, the system will rescale after 10 seconds.
> > > > Alternatively, if the user sets the same value for the lower and
> > > > upper bounds, the rescale operation will wait until all slots are
> > > > available.
> > > >
> > > > Best Regards,
> > > > Zdenek Tison
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Jul 11, 2024 at 2:38 PM Zdenek Tison <[email protected]>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > Our team has been working on several improvements for
> > > > > AdaptiveScheduler, specifically focusing on aligning logic and
> > > > > timeouts in the WaitingForResources and Executing states. We believe
> > > > > these enhancements will improve the adaptive scheduler's robustness
> > > > > and maintainability.
> > > > >
> > > > > For more detailed information, please refer to the FLIP document.
> > > > >
> > > > >
> > > > > https://docs.google.com/document/d/1YeYSs64LqgUr3xyBTCjiRE-CT5VEyHjGjqxnxKPIQhM/edit?usp=sharing
> > > > >
> > > > > Thanks,
> > > > > Zdenek Tison
> > > > >
> > > >
> > >
> >
>
