Yang Wang created FLINK-17127:
---------------------------------

             Summary: Make pod creating retry interval configurable
                 Key: FLINK-17127
                 URL: https://issues.apache.org/jira/browse/FLINK-17127
             Project: Flink
          Issue Type: New Feature
          Components: Deployment / Kubernetes
            Reporter: Yang Wang


This follows up on the discussion in this PR[1].

In the current implementation, {{POD_CREATION_RETRY_INTERVAL}} is fixed at 
"3s", which means that when creating a taskmanager pod fails, we schedule a 
retry after a 3s delay. This works for most cases. However, there is still a 
risk that too many retries from different Flink clusters will flood the 
Kubernetes API server. So we need to add initial and max settings for the 
retry interval, similar to 
{{NETWORK_REQUEST_BACKOFF_INITIAL/NETWORK_REQUEST_BACKOFF_MAX}}.
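
A minimal sketch of how an initial/max retry interval could behave, assuming 
an exponential backoff that doubles the delay after each failed attempt and 
caps it at the max. The class name {{PodCreationBackoff}} and its methods are 
hypothetical and only for illustration; they are not existing Flink code, and 
the actual option names and growth policy are still to be decided.

```java
import java.time.Duration;

/** Hypothetical helper: retry delay that grows from an initial value up to a max. */
final class PodCreationBackoff {
    private final Duration initial;
    private final Duration max;
    private Duration current;

    PodCreationBackoff(Duration initial, Duration max) {
        if (initial.isNegative() || initial.isZero() || max.compareTo(initial) < 0) {
            throw new IllegalArgumentException("require 0 < initial <= max");
        }
        this.initial = initial;
        this.max = max;
        this.current = initial;
    }

    /** Returns the delay to wait before the next retry, then doubles it (capped at max). */
    Duration nextDelay() {
        Duration delay = current;
        Duration doubled = current.multipliedBy(2);
        current = doubled.compareTo(max) > 0 ? max : doubled;
        return delay;
    }

    /** Call after a pod has been created successfully to start over from the initial delay. */
    void reset() {
        current = initial;
    }
}
```

With an initial interval of 3s and a max of 20s, successive failures would 
wait 3s, 6s, 12s, 20s, 20s, and so on, so repeated failures back off instead 
of hitting the API server at a constant rate.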

[1]. https://github.com/apache/flink/pull/11427#discussion_r406318451



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
