[jira] [Created] (FLINK-24819) Higher cpu load after using SharedIndexInformer replaced naked Kubernetes watch

Yang Wang (Jira) Mon, 08 Nov 2021 01:45:08 -0800

Yang Wang created FLINK-24819:
---------------------------------

             Summary: Higher cpu load after using SharedIndexInformer replaced 
naked Kubernetes watch
                 Key: FLINK-24819
                 URL: https://issues.apache.org/jira/browse/FLINK-24819
             Project: Flink
          Issue Type: Improvement
          Components: Deployment / Kubernetes
    Affects Versions: 1.14.0
            Reporter: Yang Wang



In FLINK-22054, Flink replaced the naked K8s watch with a shared informer for 
ConfigMaps. Since then, each Flink JVM process (JM/TM) needs only one 
connection to the APIServer for ConfigMap watching. The change aims to reduce 
the network pressure on the K8s APIServer.

 

However, in our recent tests, we found that the CPU and memory cost of the 
APIServer doubled while running the same Flink workloads. After digging into 
the K8s internals, I think the root cause might be that ETCD does not have 
indexes for labels. This means the APIServer needs to pull all the events from 
ETCD for each watch and then filter them internally by the specified labels 
(e.g. app=xxx,type=flink-native-kubernetes,configmap-type=high-availability). 
Before FLINK-22054, we started a dedicated connection for each watched 
ConfigMap, and it seems that the APIServer only needed to pull the events for 
the specified ConfigMap name.
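
The suspected cost difference can be illustrated with a toy model. This is only a sketch of the hypothesis above, not of how the APIServer or ETCD are actually implemented: the Event class, the sample data (100 ConfigMaps with 10 updates each), and the name-keyed index are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WatchFilterSketch {
    /** Hypothetical stand-in for a ConfigMap change event. */
    static final class Event {
        final String name;
        final Map<String, String> labels;

        Event(String name, Map<String, String> labels) {
            this.name = name;
            this.labels = labels;
        }
    }

    /** Label watch under the hypothesis: no label index, so every event is inspected. */
    static int[] labelWatch(List<Event> events, Map<String, String> selector) {
        int inspected = 0;
        int delivered = 0;
        for (Event e : events) {
            inspected++;
            if (e.labels.entrySet().containsAll(selector.entrySet())) {
                delivered++;
            }
        }
        return new int[] {inspected, delivered};
    }

    /** Name watch under the hypothesis: only events stored under that name are read. */
    static int[] nameWatch(Map<String, List<Event>> eventsByName, String name) {
        List<Event> own = eventsByName.getOrDefault(name, Collections.emptyList());
        return new int[] {own.size(), own.size()};
    }

    /** Assumed sample data: 100 ConfigMaps, each with its own app label and 10 updates. */
    static List<Event> sampleEvents() {
        List<Event> events = new ArrayList<>();
        for (int cm = 0; cm < 100; cm++) {
            Map<String, String> labels = new HashMap<>();
            labels.put("app", "app-" + cm);
            for (int update = 0; update < 10; update++) {
                events.add(new Event("cm-" + cm, labels));
            }
        }
        return events;
    }

    /** Groups events by ConfigMap name, modelling a store keyed by object name. */
    static Map<String, List<Event>> byName(List<Event> events) {
        Map<String, List<Event>> index = new HashMap<>();
        for (Event e : events) {
            index.computeIfAbsent(e.name, k -> new ArrayList<>()).add(e);
        }
        return index;
    }

    public static void main(String[] args) {
        List<Event> events = sampleEvents();
        int[] byLabel = labelWatch(events, Collections.singletonMap("app", "app-7"));
        int[] byKey = nameWatch(byName(events), "cm-7");
        System.out.println("label watch: inspected=" + byLabel[0] + " delivered=" + byLabel[1]);
        System.out.println("name watch:  inspected=" + byKey[0] + " delivered=" + byKey[1]);
    }
}
```

In this model both watches deliver the same 10 events, but the label watch inspects all 1000 events to find them, while the name watch inspects only 10.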

 

Watch URL example (before):

https://kubernetes.default:6443/api/v1/namespaces/vvp-workload/configmaps?metadata.name=job-009d4f51-ca02-4793-a49b-a3344538719b-resourcemanager-leader&watch=true

 

Watch URL example (after):

https://kubernetes.default:6443/api/v1/namespaces/vvp-workload/configmaps?labelSelector=app%3Dk8s-ha-app-1-1636077491-23461%2Ctype%3Dflink-native-kubernetes%2Cconfigmap-type%3Dhigh-availability&resourceVersion=1153687321&watch=true
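
For reference, the encoded labelSelector in the label-based watch URL can be reproduced with a small sketch like the one below. It uses only the JDK (Java 10+ for the Charset overload of URLEncoder.encode). The class and method names are hypothetical, the resourceVersion parameter is left out, and this is not the code the informer itself uses to build the request.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class WatchUrlSketch {
    /** Builds the query string of a label-selector watch (hypothetical helper). */
    static String labelSelectorQuery(Map<String, String> labels) {
        // Join the labels as key=value pairs separated by commas.
        StringJoiner selector = new StringJoiner(",");
        labels.forEach((key, value) -> selector.add(key + "=" + value));
        // Percent-encode the selector: '=' becomes %3D and ',' becomes %2C.
        String encoded = URLEncoder.encode(selector.toString(), StandardCharsets.UTF_8);
        return "labelSelector=" + encoded + "&watch=true";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the labels in insertion order.
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("app", "k8s-ha-app-1-1636077491-23461");
        labels.put("type", "flink-native-kubernetes");
        labels.put("configmap-type", "high-availability");
        System.out.println(labelSelectorQuery(labels));
    }
}
```

Every watch created this way selects by labels only; the ConfigMap name does not appear anywhere in the request.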



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
