[jira] [Commented] (FLINK-23403) Decrease default values for heartbeat timeout and interval

Till Rohrmann (Jira) Fri, 16 Jul 2021 02:53:04 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-23403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381938#comment-17381938
 ]


Till Rohrmann commented on FLINK-23403:
---------------------------------------

Thanks a lot for this input [~fly_in_gis]. I actually intend to start a public 
discussion about the default values to gather more feedback because 
practitioners' experience is super important for finding good default values.

Are you adjusting the timeout and interval settings for your deployment or do 
you find that 50s/10s works well in your setup?

Which Java version and garbage collector are you using when you experience 
full GCs that take longer than 10s?

If your network is under high load, do you also see data connections between 
{{TaskExecutors}} getting disconnected and, as a result, task restarts? Or is 
it simply that messages get delivered very slowly? How does Flink behave in 
this situation, given that the default {{akka.ask.timeout}} is set to 
{{10 s}}? I would assume that all kinds of RPCs should fail and that this 
causes a job restart.

> Decrease default values for heartbeat timeout and interval
> ----------------------------------------------------------
>
>                 Key: FLINK-23403
>                 URL: https://issues.apache.org/jira/browse/FLINK-23403
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Configuration, Runtime / Coordination
>    Affects Versions: 1.14.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.14.0
>
>
> In order to speed up failure detection, I suggest decreasing the default 
> values for the heartbeat timeout and interval from 50s/10s to 15s/3s.
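
For reference, the proposed defaults might look roughly like this in a Flink 
configuration file. This is only a sketch: the key names 
{{heartbeat.interval}} and {{heartbeat.timeout}}, and the assumption that both 
take values in milliseconds, are not stated in this thread and should be 
checked against the configuration documentation for your Flink version.

```yaml
# Sketch only: key names and millisecond units are assumptions, not taken
# from this thread.

# Interval between heartbeat requests (10 s before the proposal, 3 s proposed).
heartbeat.interval: 3000

# Time after which a missing heartbeat marks the peer as failed
# (50 s before the proposal, 15 s proposed).
heartbeat.timeout: 15000

# RPC ask timeout mentioned in the comment above; 10 s is the default cited there.
akka.ask.timeout: 10 s
```

Both the old and the proposed pair keep the timeout at five times the 
interval, so the number of heartbeats that can be missed before a timeout 
stays the same; only the absolute detection time shrinks.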



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
