[ https://issues.apache.org/jira/browse/FLINK-31589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liuzhuo updated FLINK-31589:
----------------------------
    Description:
In our production environment, we found that when a cluster-level failure occurs, all jobs in the cluster fail over at the same time, which puts a lot of pressure on the k8s cluster.
Through k8s monitoring, we found that the list pod calls were responsible for relatively high pressure.
After looking into fabric8, we found an optimization for list pod: adding the parameter
{code:java}
ResourceVersion=0{code}
to the list pod request can greatly reduce the pressure on the k8s cluster.
In our environment, after adding this parameter, a large number of jobs failing over at once no longer puts much pressure on the cluster.
[link|https://github.com/fabric8io/kubernetes-client/issues/4670]

  was:
In our production environment, we found that when a cluster-level failure occurs, all jobs in the cluster fail over at the same time, which puts a lot of pressure on the k8s cluster.
Through k8s monitoring, we found that the list pod calls were responsible for relatively high pressure.
After looking into fabric8, we found an optimization for list pod: adding the parameter
{code:java}
ResourceVersion=0{code}
to the list pod request can greatly reduce the pressure on the k8s cluster.
[link|https://github.com/fabric8io/kubernetes-client/issues/4670]

> Reduce the pressure of the list pod method on the k8s cluster
> -------------------------------------------------------------
>
>                 Key: FLINK-31589
>                 URL: https://issues.apache.org/jira/browse/FLINK-31589
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>            Reporter: liuzhuo
>            Priority: Minor
>
> In our production environment, we found that when a cluster-level failure
> occurs, all jobs in the cluster fail over at the same time, which puts a
> lot of pressure on the k8s cluster.
> Through k8s monitoring, we found that the list pod calls were responsible
> for relatively high pressure.
> After looking into fabric8, we found an optimization for list pod: adding
> the parameter
> {code:java}
> ResourceVersion=0{code}
> to the list pod request can greatly reduce the pressure on the k8s cluster.
> In our environment, after adding this parameter, a large number of jobs
> failing over at once no longer puts much pressure on the cluster.
> [link|https://github.com/fabric8io/kubernetes-client/issues/4670]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
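
For readers unfamiliar with the parameter described in the issue: on the Kubernetes REST API, a list request that carries `resourceVersion=0` may be answered from the API server's cache instead of requiring a consistent read from etcd, at the cost of possibly returning slightly stale data. The following is only a minimal sketch of what such a list pod request looks like on the wire, using the JDK's `java.net.http` classes rather than fabric8. The class name, the API server address, the namespace and the label selector are all hypothetical and are not taken from the Flink code base or from the issue.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; this is not Flink's or fabric8's code.
class ListPodsRequest {

    // Builds the URI of a "list pods" call with resourceVersion=0 appended.
    // apiServer, namespace and labelSelector are caller-supplied placeholders.
    static URI listPodsUri(String apiServer, String namespace, String labelSelector) {
        String query = "labelSelector="
                + URLEncoder.encode(labelSelector, StandardCharsets.UTF_8)
                + "&resourceVersion=0"; // allows the API server to answer from its cache
        return URI.create(apiServer + "/api/v1/namespaces/" + namespace + "/pods?" + query);
    }

    public static void main(String[] args) {
        // Assumed values; a real client would also need authentication and TLS setup.
        URI uri = listPodsUri(
                "https://kubernetes.default.svc",
                "flink",
                "app=my-flink-job,component=taskmanager");

        // The request is only built and printed here, never sent.
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Because the response may lag behind the latest cluster state, this trade-off fits callers that can tolerate a slightly out-of-date pod list, such as a recovery path that re-lists pods after a failover.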