Xintong Song created FLINK-17390:
------------------------------------
Summary: Container resource cannot be mapped on Hadoop 2.10+
Key: FLINK-17390
URL: https://issues.apache.org/jira/browse/FLINK-17390
Project: Flink
Issue Type: Bug
Components: Deployment / YARN
Affects Versions: 1.11.0
Reporter: Xintong Song
Fix For: 1.11.0
In FLINK-16438, we introduced {{WorkerSpecContainerResourceAdapter}} for
mapping between Yarn container {{Resource}} and Flink {{WorkerResourceSpec}}.
Inside this class, we use {{Resource}} for hash map keys and set elements,
assuming that {{Resource}} instances that describe the same set of resources
have the same hash code.
This assumption is not always true. {{Resource}} is an abstract class and may
have different implementations. Hadoop 2.10+ introduces
{{LightWeightResource}}, a new implementation of {{Resource}} that overrides
the {{hashCode}} method and is used for instances generated by
{{Resource.newInstance}} on the AM side. This means a {{Resource}} generated on
the AM may have a different hash code from an equal {{Resource}} returned from
Yarn.
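The failure mode can be reproduced without Hadoop. The sketch below is only an illustration: {{Res}}, {{DefaultRes}} and {{LightRes}} are hypothetical stand-ins, not the Yarn classes, and the hash formulas are made up for the demo. It shows that when two implementations are equal by {{equals}} but compute different hash codes, a {{HashMap}} lookup with one fails to find an entry stored under the other.

```java
import java.util.HashMap;
import java.util.Map;

public class HashMismatchDemo {

    /** Hypothetical stand-in for an abstract resource type with value-based equals. */
    abstract static class Res {
        final long memory;
        final int vcores;

        Res(long memory, int vcores) {
            this.memory = memory;
            this.vcores = vcores;
        }

        // Equality is defined once on the base class, so it holds across subclasses.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Res)) {
                return false;
            }
            Res other = (Res) o;
            return memory == other.memory && vcores == other.vcores;
        }

        // Made-up base hash formula (assumption for the demo).
        @Override
        public int hashCode() {
            return 31 * (31 + Long.hashCode(memory)) + vcores;
        }
    }

    /** Stand-in for the implementation handed back by the cluster. */
    static final class DefaultRes extends Res {
        DefaultRes(long memory, int vcores) {
            super(memory, vcores);
        }
    }

    /** Stand-in for a lightweight implementation that overrides hashCode only. */
    static final class LightRes extends Res {
        LightRes(long memory, int vcores) {
            super(memory, vcores);
        }

        // A different made-up formula: equal objects no longer share a hash code.
        @Override
        public int hashCode() {
            return 31 * Long.hashCode(memory) + vcores;
        }
    }

    /** Stores a value under a LightRes key, then looks it up with an equal DefaultRes. */
    public static boolean lookupAcrossImplementations(long memory, int vcores) {
        Map<Res, String> specs = new HashMap<>();
        specs.put(new LightRes(memory, vcores), "worker-spec");
        return specs.containsKey(new DefaultRes(memory, vcores));
    }

    public static void main(String[] args) {
        Res requested = new LightRes(1024, 4);
        Res allocated = new DefaultRes(1024, 4);
        System.out.println("equals: " + requested.equals(allocated));
        System.out.println("same hash: " + (requested.hashCode() == allocated.hashCode()));
        System.out.println("lookup succeeds: " + lookupAcrossImplementations(1024, 4));
    }
}
```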
To solve this problem, we may introduce an {{InternalResource}} as an inner
class of {{WorkerSpecContainerResourceAdapter}}, whose {{hashCode}} method
depends only on the fields needed by Flink (currently memory and vcores).
{{WorkerSpecContainerResourceAdapter}} should use only {{InternalResource}} for
internal state management, and convert any {{Resource}} passed into or
returned from it.
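A minimal sketch of the proposed key type follows. It is not the actual Flink implementation: the enclosing class {{InternalResourceSketch}}, the {{register}} and {{lookup}} methods, and the {{String}} value standing in for {{WorkerResourceSpec}} are all assumptions made for illustration. To stay self-contained, the sketch takes memory and vcores as plain numbers; a real adapter would read them from the Yarn {{Resource}} at its boundary.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class InternalResourceSketch {

    /** Key type whose equals and hashCode depend only on memory and vcores. */
    static final class InternalResource {
        private final long memory;
        private final int vcores;

        InternalResource(long memory, int vcores) {
            this.memory = memory;
            this.vcores = vcores;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof InternalResource)) {
                return false;
            }
            InternalResource other = (InternalResource) o;
            return memory == other.memory && vcores == other.vcores;
        }

        // Independent of whichever external implementation the values came from.
        @Override
        public int hashCode() {
            return Objects.hash(memory, vcores);
        }
    }

    // Internal state is keyed by InternalResource only, never by the external type.
    private final Map<InternalResource, String> specByResource = new HashMap<>();

    /** Hypothetical entry point: converts incoming values to the internal key. */
    public void register(long memory, int vcores, String workerSpecName) {
        specByResource.put(new InternalResource(memory, vcores), workerSpecName);
    }

    /** Hypothetical lookup: returns null when no spec was registered for the key. */
    public String lookup(long memory, int vcores) {
        return specByResource.get(new InternalResource(memory, vcores));
    }

    public static void main(String[] args) {
        InternalResourceSketch adapter = new InternalResourceSketch();
        adapter.register(1024, 4, "spec-a");
        System.out.println(adapter.lookup(1024, 4));
    }
}
```

Because the key is final and owned by the adapter, no external subclass can change its hashing, so the lookup no longer depends on which {{Resource}} implementation produced the values.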