mxm commented on PR #799:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/799#issuecomment-2012067487

   > > Autoscaling wouldn't have a chance to realize its SLOs.
   > 
   > You are right. The autoscaler supports scaling parallelism and memory for 
now. As I understand it, downtime cannot be avoided even if users only scale 
parallelism. For example, if Flink jobs don't use the Adaptive Scheduler and 
the input rate keeps changing, the jobs will be rescaled frequently.
   
   I agree that there are edge cases where the autoscaler cannot fulfill its 
service objectives. However, that doesn't mean we have to give up on them 
entirely. If autotuning can trigger restarts at any point in time, the 
autoscaling algorithm is inherently broken, because that downtime is never 
factored into the autoscaling decision.
   
   You mentioned the adaptive scheduler. Frankly, the use of the adaptive 
scheduler with autoscaling isn't fully developed. I would discourage users from 
using it with autoscaling in its current state.
   
   > Fortunately, scaling parallelism considers the restart time, unlike 
scaling memory, and then increases the parallelism accordingly.
   
   +1
   
   > 
   > > For this feature to be mergable, it will either have to be disabled by 
default (opt-in via config)
   > 
   > IIUC, `job.autoscaler.memory.tuning.enabled` is disabled by default. That 
means memory tuning is turned off by default even if this PR is merged, right?
   
   Autoscaling is also disabled by default. I think we want to make sure 
autoscaling and autotuning work well together.
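   For reference, a minimal sketch of how the two opt-in flags would look in a 
FlinkDeployment's `flinkConfiguration` (key names taken from this thread; the 
surrounding YAML structure is assumed, not part of this PR):

```yaml
# Illustrative FlinkDeployment fragment, not part of this PR.
spec:
  flinkConfiguration:
    # Autoscaling itself is opt-in and disabled by default.
    job.autoscaler.enabled: "true"
    # Memory tuning added by this PR is likewise off unless enabled.
    job.autoscaler.memory.tuning.enabled: "true"
```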
   
   > 
   > > or be integrated with autoscaling, i.e. figure out a way to balance 
tuning / autoscaling decisions and feed tuning decisions back into the 
autoscaling algorithm, so that we scale up whenever we redeploy for memory 
changes. This avoids falling behind, and avoids forcing autoscaling to scale up 
after downtime caused by memory reconfigurations.
   > 
   > The restartTime has been considered during `computeScalingSummary`, but we 
may ignore it because the new parallelism is `WithinUtilizationTarget`. Do you 
mean we should force-adjust the parallelism to the new parallelism when memory 
scaling happens, even if the new parallelism is `WithinUtilizationTarget`?
   
   True, the rescale time is considered for the downscale / upscale processing 
capacity, but the current processing capacity doesn't factor in downtime. 
Unplanned restarts reduce the processing capacity. If we know we are going to 
restart, the autoscaling algorithm should factor this in, e.g. by reducing the 
calculated processing capacity accordingly.
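   To make the idea concrete, here is a small sketch (purely illustrative, not 
the operator's actual code; the function name and parameters are made up) of 
discounting a measured processing capacity by the fraction of the evaluation 
window expected to be lost to a planned restart:

```python
def effective_processing_capacity(raw_capacity_per_sec: float,
                                  window_sec: float,
                                  planned_downtime_sec: float) -> float:
    """Discount the measured processing capacity by the share of the
    evaluation window expected to be lost to a planned restart
    (e.g. a memory-tuning redeploy). Illustrative sketch only."""
    if planned_downtime_sec >= window_sec:
        return 0.0
    uptime_fraction = (window_sec - planned_downtime_sec) / window_sec
    return raw_capacity_per_sec * uptime_fraction

# Example: 1000 rec/s capacity, 10-minute window, 60 s expected restart downtime.
print(effective_processing_capacity(1000.0, 600.0, 60.0))  # → 900.0
```

   With the capacity discounted this way, the autoscaler would see the planned 
restart as reduced headroom and could scale up preemptively instead of only 
reacting to lag after the fact.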
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
