Xintong Song created FLINK-19324:
------------------------------------
Summary: Map requested/allocated containers with priority on YARN
Key: FLINK-19324
URL: https://issues.apache.org/jira/browse/FLINK-19324
Project: Flink
Issue Type: Bug
Components: Deployment / YARN
Reporter: Xintong Song
In the design doc of FLINK-14106, there was a
[discussion|https://docs.google.com/document/d/1f8imSus3QwKEUPAldzR8CSMjZ-2a9O17-rn4oeKGtqw/edit?disco=AAAAGPX_tmg]
on how we map allocated containers with the requested ones on YARN. We
rejected the design option that uses container priorities for mapping
containers of different resources, because we do not want to priorities
different container requests (which is the original purpose for this field). As
a result, we have to interpret how the requested container request would be
normalized by Yarn, and map the allocated/requested containers accordingly,
which is complicated and fragile. See also FLINK-19151.
Recently in our POC for fine grained resource management, we surprisingly
discovered that Yarn actually doesn't work with container requests same
priority and different resources. I do not find this described as an official
protocol in any Yarn's documents. The issue has been raised in early Yarn
versions (YARN-314) and has not been fixed util Hadoop 2.9 when
{{allocationRequestId}} is introduced. In Hadoop 2.8, Yarn scheduler is still
internally using priority as the key of a container request (see
[AppSchedulingInfo#updateResourceRequests
|https://github.com/apache/hadoop/blob/eb818cdc64336ade273a960ba3b9b5a5d0c4d4ec/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java#L341]),
thus requests same priority and different resources would overwrite each other.
The new discovery suggests that, if we want to support containers with
different resources on Hadoop 2.8 and earlier versions, we have to give them
different priorities anyway. Thus, I would suggest to get rid of the container
normalization simulation and go back to the previously rejected priority based
design option.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)