[jira] [Commented] (FLINK-30680) Consider using the autoscaler to detect slow taskmanagers

Matt Wang (Jira) Wed, 23 Aug 2023 20:04:05 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-30680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758320#comment-17758320
 ]


Matt Wang commented on FLINK-30680:
-----------------------------------

[~gyfora] [~mxm] Hi, can I take this Jira? I am interested in working on it. 
We run a lot of Flink jobs internally. Our *Slow-TaskManager Detection 
mechanism* has been rolled out to tens of thousands of internal jobs and takes 
effective action about 1,000 times a day, which reduces job lag. We would like 
to contribute this feature back to the community, and we will prepare a design 
document to discuss with you.

> Consider using the autoscaler to detect slow taskmanagers
> ---------------------------------------------------------
>
>                 Key: FLINK-30680
>                 URL: https://issues.apache.org/jira/browse/FLINK-30680
>             Project: Flink
>          Issue Type: New Feature
>          Components: Autoscaler, Kubernetes Operator
>            Reporter: Gyula Fora
>            Priority: Major
>
> We could leverage logic in the autoscaler to detect slow taskmanagers by 
> comparing the per-record processing times between them.
> If we notice that all subtasks on a single TM are considerably slower than 
> the rest (at similar input rates) we should try simply restarting the job 
> instead of scaling it up.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
