Yangze Guo created FLINK-22505:
----------------------------------

             Summary: Limit the precision of Resource
                 Key: FLINK-22505
                 URL: https://issues.apache.org/jira/browse/FLINK-22505
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.13.0
            Reporter: Yangze Guo

In our internal deployment, we found that a high-precision {{CPUResource}} may cause the required resource to never be fulfilled. Consider the following scenario:
- The {{SlotManager}} receives a slot request with 1.000000000000001 CPU and decides to allocate a pending task manager with that resource spec.
- The resource manager starts a task manager and sets the CPU by dynamic config. In this step, we cast the {{CPUResource}} to a double value, which is where the precision loss happens. The task manager finally registers with 1.0 CPU and thus can neither deduct any pending task manager nor fulfill the slot request.

To solve this issue, we propose limiting the precision of Resource to a safe value, e.g. 8, to prevent precision loss when casting to double.
- For {{CPUResource}}, the supported scale for the CPU is 3 in k8s, while in Yarn the CPU should be an integer.
- For {{ExternalResource}}, the value will always be treated as an integer.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
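
The proposal above can be illustrated with a minimal, standalone sketch that uses only {{java.math.BigDecimal}}. It is not Flink code: the class name {{ResourcePrecisionSketch}}, the helper names, and the {{HALF_UP}} rounding mode are assumptions made for illustration, and the cap of 8 is taken from the "e.g. 8" in the description. The report does not show the exact conversion path inside Flink, so the sketch uses an input with 17 decimal places, which a plain {{doubleValue()}} call collapses to 1.0, to make the loss visible. It then checks whether a value survives a round trip through double before and after its scale is limited.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ResourcePrecisionSketch {
    // Hypothetical cap, taken from the "e.g. 8" in the description.
    public static final int MAX_SCALE = 8;

    // Round the value to at most MAX_SCALE decimal places.
    // HALF_UP is an assumption; the report does not name a rounding mode.
    public static BigDecimal limitPrecision(BigDecimal value) {
        return value.setScale(MAX_SCALE, RoundingMode.HALF_UP);
    }

    // True if converting to double and back yields a numerically equal value.
    public static boolean survivesDoubleRoundTrip(BigDecimal value) {
        BigDecimal back = BigDecimal.valueOf(value.doubleValue());
        return back.compareTo(value) == 0;
    }

    public static void main(String[] args) {
        // Illustrative input with more digits than a double can hold.
        BigDecimal requested = new BigDecimal("1.00000000000000001");
        BigDecimal limited = limitPrecision(requested);

        System.out.println("requested survives: " + survivesDoubleRoundTrip(requested));
        System.out.println("limited survives:   " + survivesDoubleRoundTrip(limited));
    }
}
```

With the scale limited up front, the value the requester records and the value that comes back after the conversion compare equal, so the two sides would agree on the resource spec under these assumptions.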