[
https://issues.apache.org/jira/browse/FLINK-30680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758320#comment-17758320
]
Matt Wang commented on FLINK-30680:
-----------------------------------
[~gyfora] [~mxm] Hi, can I take this Jira? I am interested in it. We run a
large number of Flink jobs internally. Our *Slow-TaskManager Detection
mechanism* is now enabled on tens of thousands of internal jobs and takes
effective action about 1,000 times a day, which reduces job lag.
We would like to contribute this feature back to the community, and we will
prepare a design document to discuss with you.
> Consider using the autoscaler to detect slow taskmanagers
> ---------------------------------------------------------
>
> Key: FLINK-30680
> URL: https://issues.apache.org/jira/browse/FLINK-30680
> Project: Flink
> Issue Type: New Feature
> Components: Autoscaler, Kubernetes Operator
> Reporter: Gyula Fora
> Priority: Major
>
> We could leverage logic in the autoscaler to detect slow taskmanagers by
> comparing the per-record processing times between them.
> If we notice that all subtasks on a single TM are considerably slower than
> the rest (at similar input rates) we should try simply restarting the job
> instead of scaling it up.
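
The check described above could be sketched roughly as follows. This is only an illustration, not the autoscaler's implementation and not the internal mechanism mentioned in the comment. The `SubtaskSample` type, the `find_slow_taskmanagers` function, and the `slowdown_factor` and `rate_tolerance` thresholds are all hypothetical. The sketch also assumes that subtasks are only compared against subtasks of the same job vertex running on other TaskManagers.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class SubtaskSample:
    """One metric sample for a subtask (hypothetical shape)."""
    tm_id: str            # TaskManager hosting the subtask
    vertex_id: str        # job vertex the subtask belongs to
    input_rate: float     # records per second
    per_record_ms: float  # average processing time per record


def find_slow_taskmanagers(samples, slowdown_factor=2.0, rate_tolerance=0.2):
    """Return the TaskManagers on which every comparable subtask is slow.

    A subtask counts as slow when its per-record time exceeds
    slowdown_factor times the median of its peers on other TaskManagers,
    while its input rate stays within rate_tolerance of the peer median.
    Both thresholds are assumed values chosen for illustration.
    """
    by_vertex = defaultdict(list)
    for sample in samples:
        by_vertex[sample.vertex_id].append(sample)

    verdicts = defaultdict(list)  # tm_id -> one flag per compared subtask
    for peers in by_vertex.values():
        for sample in peers:
            others = [p for p in peers if p.tm_id != sample.tm_id]
            if not others:
                continue  # no subtask on another TM to compare against
            ref_time = median(p.per_record_ms for p in others)
            ref_rate = median(p.input_rate for p in others)
            similar_rate = (
                abs(sample.input_rate - ref_rate) <= rate_tolerance * ref_rate
            )
            slow = sample.per_record_ms > slowdown_factor * ref_time
            verdicts[sample.tm_id].append(similar_rate and slow)

    return {tm for tm, flags in verdicts.items() if flags and all(flags)}
```

A caller would restart the job rather than scale it up only when exactly one TaskManager is returned, which matches the "single TM" condition in the description.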
--
This message was sent by Atlassian Jira
(v8.20.10#820010)