mbalassi commented on code in PR #558: URL: https://github.com/apache/flink-kubernetes-operator/pull/558#discussion_r1154290969
##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java:
##########
@@ -627,14 +637,42 @@ public Map<String, String> getClusterInfo(Configuration conf) throws Exception {
                             .toSeconds(),
                     TimeUnit.SECONDS);

-            runtimeVersion.put(
+            clusterInfo.put(
                     DashboardConfiguration.FIELD_NAME_FLINK_VERSION,
                     dashboardConfiguration.getFlinkVersion());
-            runtimeVersion.put(
+            clusterInfo.put(
                     DashboardConfiguration.FIELD_NAME_FLINK_REVISION,
                     dashboardConfiguration.getFlinkRevision());
         }
-        return runtimeVersion;
+
+        // JobManager resource usage can be deduced from the CR
+        var jmParameters =
+                new KubernetesJobManagerParameters(
+                        conf, new KubernetesClusterClientFactory().getClusterSpecification(conf));
+        var jmTotalCpu =
+                jmParameters.getJobManagerCPU()
+                        * jmParameters.getJobManagerCPULimitFactor()
+                        * jmParameters.getReplicas();
+        var jmTotalMemory =
+                Math.round(
+                        jmParameters.getJobManagerMemoryMB()
+                                * Math.pow(1024, 2)
+                                * jmParameters.getJobManagerMemoryLimitFactor()
+                                * jmParameters.getReplicas());
+
+        // TaskManager resource usage is best gathered from the REST API to get current replicas

Review Comment:
There is a limit factor for TaskManager cores that Flink allows to be configured on top of the resources defined at the Kubernetes level, similar to how I calculated the JobManager resources. I set up an example to validate your suggestion, with one JM and one TM, each configured with 0.5 CPUs in its resources field. The CPU limit factors are 1.0. We end up with 1.5 CPUs (0.5 for the JM, accurately reported, and 1.0 for the TM):
```
  jobManager:
    replicas: 1
    resource:
      cpu: 0.5
      memory: 2048m
  serviceAccount: flink
  taskManager:
    resource:
      cpu: 0.5
      memory: 2048m
status:
  clusterInfo:
    flink-revision: DeadD0d0 @ 1970-01-01T01:00:00+01:00
    flink-version: 1.16.1
    tm-cpu-limit-factor: "1.0"
    jm-cpu-limit-factor: "1.0"
    total-cpu: "1.5"
    total-memory: "4294967296"
  jobManagerDeploymentStatus: READY
```

It is a bit of a tough problem, because the Flink UI also shows 1 core for the TM (using the same value that we get from the REST API).

<img width="1403" alt="Screenshot 2023-03-31 at 12 08 26" src="https://user-images.githubusercontent.com/5990983/229091963-f5e9a985-2ebe-4518-9623-6a4d4da9ad3c.png">

So ultimately we have to decide whether to follow Flink's numbers or Kubernetes'. I am leaning towards the latter (factoring in the limit factor, but avoiding the rounding).
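
For illustration, here is a minimal sketch of the Kubernetes-side calculation described above, applied to the example values (0.5 CPU and 2048m memory each for the JM and the TM, limit factors of 1.0). The class and method names (`ClusterResourceSketch`, `totalCpu`, `totalMemoryBytes`) are hypothetical and not part of the operator, and the TaskManager replica count is passed in as a plain parameter, on the assumption that it would be obtained elsewhere (for example from the REST API).

```java
class ClusterResourceSketch {

    /** CPU for one component: cores per replica x limit factor x replicas, kept fractional. */
    public static double totalCpu(double cpuPerReplica, double limitFactor, int replicas) {
        return cpuPerReplica * limitFactor * replicas;
    }

    /** Memory in bytes for one component: MB per replica x 1024^2 x limit factor x replicas. */
    public static long totalMemoryBytes(long memoryMbPerReplica, double limitFactor, int replicas) {
        return Math.round(memoryMbPerReplica * Math.pow(1024, 2) * limitFactor * replicas);
    }

    public static void main(String[] args) {
        // Values taken from the example CR; limit factors are 1.0 for both components.
        double jmCpu = totalCpu(0.5, 1.0, 1);
        // Assumption: one running TaskManager; the real replica count must come from elsewhere.
        double tmCpu = totalCpu(0.5, 1.0, 1);

        long jmMemory = totalMemoryBytes(2048, 1.0, 1);
        long tmMemory = totalMemoryBytes(2048, 1.0, 1);

        System.out.println("total-cpu: " + (jmCpu + tmCpu));
        System.out.println("total-memory: " + (jmMemory + tmMemory));
    }
}
```

With these inputs the sketch yields 1.0 total CPUs rather than the 1.5 shown in the status above, while the total memory matches the reported 4294967296 bytes. The CPU difference comes entirely from the TM, which Flink reports as 1 core.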