Re: About Native Deployment's Autoscaling implementation

David Morávek Mon, 23 May 2022 00:22:54 -0700

Hi Talat,

This is definitely an interesting and rather complex topic.


Few unstructured thoughts / notes / questions:

- The main struggle has always been that it's hard to come up with a
generic one-size-fits-it-all metrics for autoscaling.
  - Flink doesn't have knowledge of the external environment (eg. capacity
planning on the cluster, no notion of pre-emption), so it can not really
make a qualified decision in some cases.
  - ^ the above goes along the same reasoning as why we don't support
reactive mode with the session cluster (multi-job scheduling)
- The re-scaling decision logic most likely needs to be pluggable from the
above reasons
  - We're in general fairly concerned about running any user code in JM for
stability reasons.
  - The most flexible option would be allowing to set the desired
parallelism via rest api and leave the scaling decision to an external
process, which could be reused for both standalone and "active" deployment
modes (there is actually a prototype by Till, that allows this [1])

How do you intend to make an autoscaling decision? Also note that the
re-scaling is still a fairly expensive operation (especially with large
state), so you need to make sure autoscaler doesn't oscillate and doesn't
re-scale too often (this is also something that could vary from workload to
workload).

Note on the metrics question with an auto-scaler living in the JM:
- We shouldn't really collect the metrics into the JM, but instead JM can
pull then from TMs directly on-demand (basically the same thing and
external auto-scaler would do).

Looking forward to your thoughts

[1] https://github.com/tillrohrmann/flink/commits/autoscaling

Best,
D.

On Mon, May 23, 2022 at 8:32 AM Talat Uyarer <[email protected]>
wrote:

> Hi,
> I am working on auto scaling support for native deployments. Today Flink
> provides Reactive mode however it only runs on standalone deployments. We
> use Kubernetes native deployment. So I want to increase or decrease job
> resources for our streamin jobs. Recent Flip-138 and Flip-160 are very
> useful to achieve this goal. I started reading code of Flink JobManager,
> AdaptiveScheduler and DeclarativeSlotPool etc.
>
> My assumption is Required Resources will be calculated on AdaptiveScheduler
> whenever the scheduler receives a heartbeat from a task manager by calling
> public void updateAccumulators(AccumulatorSnapshot accumulatorSnapshot)
> method.
>
> I checked TaskExecutorToJobManagerHeartbeatPayload class however I only see
> *accumulatorReport* and *executionDeploymentReport* . Do you have any
> suggestions to collect metrics from TaskManagers ? Should I add metrics on
> TaskExecutorToJobManagerHeartbeatPayload ?
>
> I am open to another suggestion for this. Whenever I finalize my
> investigation. I will create a FLIP for more detailed implementation.
>
> Thanks for your help in advance.
> Talat
>

Re: About Native Deployment's Autoscaling implementation

Reply via email to