Hi all,
I opened the JIRA ticket FLINK-39404 [1], which is addressed by PR #27899
[2], and I would like to start a discussion thread about it.
The PR fixes incorrect CPU core reporting in containerized environments.
When a container has a fractional CPU limit (e.g. 0.5 on Kubernetes), the
REST API currently reports "cpuCores": 1 because
HardwareDescription.numberOfCPUCores is an int.
The fix detects the actual container CPU limit from the cgroup files and
changes cpuCores from int to double, so the REST API would return
"cpuCores": 0.5.
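For anyone less familiar with the cgroup files, here is a rough sketch of
how a fractional limit can be derived. It assumes the common mount points
(cgroup v2 cpu.max, cgroup v1 cpu.cfs_quota_us / cpu.cfs_period_us under
/sys/fs/cgroup); CgroupCpuLimit and its methods are hypothetical names used
only for illustration, and this is not the code in the PR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalDouble;

// Hypothetical helper for illustration only; not the implementation in PR #27899.
public final class CgroupCpuLimit {

    private CgroupCpuLimit() {}

    // cgroup v2 "cpu.max" holds "<quota> <period>", e.g. "50000 100000" (0.5 CPU),
    // or "max <period>" when no limit is set.
    static OptionalDouble parseCgroupV2(String cpuMax) {
        String[] parts = cpuMax.trim().split("\\s+");
        if (parts.length != 2 || parts[0].equals("max")) {
            return OptionalDouble.empty();
        }
        return ratio(Long.parseLong(parts[0]), Long.parseLong(parts[1]));
    }

    // cgroup v1 uses two files; a quota of -1 means "no limit".
    static OptionalDouble parseCgroupV1(String quotaUs, String periodUs) {
        return ratio(Long.parseLong(quotaUs.trim()), Long.parseLong(periodUs.trim()));
    }

    // Limit in CPUs = quota / period; non-positive values mean "no limit".
    private static OptionalDouble ratio(long quota, long period) {
        if (quota <= 0 || period <= 0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of((double) quota / period);
    }

    // Assumes the common mount points; some nodes mount cgroup v1 elsewhere
    // (e.g. /sys/fs/cgroup/cpu,cpuacct). Never reports more than the JVM sees.
    public static double detectCpuCores() {
        int jvmCpus = Runtime.getRuntime().availableProcessors();
        try {
            Path v2 = Path.of("/sys/fs/cgroup/cpu.max");
            if (Files.isReadable(v2)) {
                OptionalDouble limit = parseCgroupV2(Files.readString(v2));
                if (limit.isPresent()) {
                    return Math.min(limit.getAsDouble(), jvmCpus);
                }
            }
            Path quota = Path.of("/sys/fs/cgroup/cpu/cpu.cfs_quota_us");
            Path period = Path.of("/sys/fs/cgroup/cpu/cpu.cfs_period_us");
            if (Files.isReadable(quota) && Files.isReadable(period)) {
                OptionalDouble limit =
                        parseCgroupV1(Files.readString(quota), Files.readString(period));
                if (limit.isPresent()) {
                    return Math.min(limit.getAsDouble(), jvmCpus);
                }
            }
        } catch (IOException | NumberFormatException e) {
            // Missing or malformed files: fall back to the JVM's processor count.
        }
        return jvmCpus;
    }
}
```

With a 0.5 CPU limit, cpu.max typically contains "50000 100000", which this
sketch turns into 0.5 instead of the rounded-up 1 the JVM reports.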
The problem is that this is a breaking change to the REST API: any custom
tooling or monitoring that deserializes hardware.cpuCores as an integer will
break when the value is fractional.
If this is too disruptive, an alternative would be to keep the existing
cpuCores as int (backward compatible) and add a new cpuCoresFractional
field as double:
{
  "hardware": {
    "cpuCores": 1,
    "cpuCoresFractional": 0.5
  }
}
The Web UI would then use the new field. The old field could be deprecated
over time.
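A minimal sketch of that alternative, assuming the legacy integer is derived
by rounding the fractional value up (which reproduces today's "cpuCores": 1
for a 0.5 limit). HardwareDescriptionSketch and its methods are hypothetical;
the real HardwareDescription and REST response classes are not shown, and the
JSON is built by hand only to mirror the example above.

```java
// Hypothetical sketch of the backward-compatible option; not Flink's actual classes.
public final class HardwareDescriptionSketch {

    private final double cpuCoresFractional;

    public HardwareDescriptionSketch(double cpuCoresFractional) {
        if (!(cpuCoresFractional > 0)) {
            throw new IllegalArgumentException("CPU cores must be positive");
        }
        this.cpuCoresFractional = cpuCoresFractional;
    }

    // Legacy integer field, kept for existing clients: the fractional value rounded up
    // (assumption), so 0.5 still reports 1.
    @Deprecated
    public int getCpuCores() {
        return (int) Math.ceil(cpuCoresFractional);
    }

    // New field that the Web UI would read.
    public double getCpuCoresFractional() {
        return cpuCoresFractional;
    }

    // Renders the "hardware" object with both fields, mirroring the JSON example.
    public String toJson() {
        return "{\"hardware\": {\"cpuCores\": " + getCpuCores()
                + ", \"cpuCoresFractional\": " + getCpuCoresFractional() + "}}";
    }
}
```

Existing clients keep reading an integer, while newer consumers can switch
to the fractional field at their own pace.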
Feedback and suggestions are very welcome. Please let me know what you think
is the best way to proceed here.
Best regards,
Dennis
[1] https://issues.apache.org/jira/browse/FLINK-39404
[2] https://github.com/apache/flink/pull/27899