Yangze Guo created FLINK-20863:
----------------------------------
Summary: Exclude network memory from ResourceProfile
Key: FLINK-20863
URL: https://issues.apache.org/jira/browse/FLINK-20863
Project: Flink
Issue Type: Task
Reporter: Yangze Guo
Fix For: 1.13.0
Network memory is included in the current ResourceProfile implementation, with
the expectation that fine-grained resource management will not deploy more
tasks onto a TM than its network memory can accommodate.
However, how much network memory each task needs highly depends on the shuffle
service implementation, and may vary when switching to another shuffle service.
Therefore, neither the user nor the Flink runtime can easily specify network
memory requirements for a task/slot at the moment.
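To illustrate why a network memory dimension in the profile gates deployment,
here is a minimal sketch of such a fit check. SimpleProfile, canFit, subtract
and countDeployable are hypothetical names made up for this example; this is
not Flink's actual ResourceProfile class, and the sizes are arbitrary.

```java
/** Hypothetical, simplified stand-in for a resource profile; not Flink's actual class. */
final class ProfileFitSketch {

    static final class SimpleProfile {
        final double cpuCores;
        final long networkMemoryBytes;

        SimpleProfile(double cpuCores, long networkMemoryBytes) {
            this.cpuCores = cpuCores;
            this.networkMemoryBytes = networkMemoryBytes;
        }

        /** True if this (available) profile covers the request in every dimension. */
        boolean canFit(SimpleProfile requested) {
            return cpuCores >= requested.cpuCores
                    && networkMemoryBytes >= requested.networkMemoryBytes;
        }

        /** Remaining resources after granting the request. */
        SimpleProfile subtract(SimpleProfile requested) {
            return new SimpleProfile(
                    cpuCores - requested.cpuCores,
                    networkMemoryBytes - requested.networkMemoryBytes);
        }
    }

    /** Counts how many identical slots fit on a TM before any dimension runs out. */
    static int countDeployable(SimpleProfile tm, SimpleProfile slot) {
        if (slot.cpuCores <= 0) {
            // Guard so the loop below always terminates.
            throw new IllegalArgumentException("slot must request a positive amount of CPU");
        }
        int deployed = 0;
        SimpleProfile remaining = tm;
        while (remaining.canFit(slot)) {
            remaining = remaining.subtract(slot);
            deployed++;
        }
        return deployed;
    }

    public static void main(String[] args) {
        SimpleProfile tm = new SimpleProfile(4.0, 128L << 20);  // 4 cores, 128 MiB network
        SimpleProfile slot = new SimpleProfile(1.0, 64L << 20); // 1 core, 64 MiB network
        // Network memory, not CPU, is the limiting dimension here.
        System.out.println(countDeployable(tm, slot));
    }
}
```

The check is only as good as the requested value. If the 64 MiB figure is a
guess that does not match what the shuffle service really needs, the check
either wastes slots or admits too many tasks.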
The concrete solution for controlling network memory is beyond the scope of
this FLIP. However, we are aware of a few potential directions for solving this
problem.
- Make shuffle services adaptively control the amount of memory assigned to
each task/slot, with respect to the given memory pool size. In this way, there
should be no need to rely on fine-grained resource management to control the
network memory consumption.
- Make shuffle services expose interfaces for calculating network memory
requirements for given SSGs. In this way, the Flink runtime can specify the
calculated network memory requirements for slots, without having to understand
the internal details of different shuffle service implementations.
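As a rough illustration of the second direction, the sketch below shows what
such an interface might look like. Everything in it is an assumption made for
this example: NetworkMemoryCalculator, TaskShape and BufferBasedCalculator are
hypothetical names that do not exist in Flink, and the buffer size and
per-channel buffer counts are illustrative values, not defaults of any shuffle
service.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: none of these types are part of Flink's API. */
final class NetworkMemorySketch {

    /** Hypothetical per-task network shape inside a slot sharing group (SSG). */
    static final class TaskShape {
        final int inputChannels;
        final int resultSubpartitions;

        TaskShape(int inputChannels, int resultSubpartitions) {
            this.inputChannels = inputChannels;
            this.resultSubpartitions = resultSubpartitions;
        }
    }

    /** Hypothetical interface a shuffle service could expose to the runtime. */
    interface NetworkMemoryCalculator {
        long requiredBytes(List<TaskShape> slotSharingGroup);
    }

    /** One possible implementation; buffer counts and size are assumptions. */
    static final class BufferBasedCalculator implements NetworkMemoryCalculator {
        private final long bufferSizeBytes;
        private final int buffersPerInputChannel;
        private final int buffersPerSubpartition;

        BufferBasedCalculator(
                long bufferSizeBytes, int buffersPerInputChannel, int buffersPerSubpartition) {
            this.bufferSizeBytes = bufferSizeBytes;
            this.buffersPerInputChannel = buffersPerInputChannel;
            this.buffersPerSubpartition = buffersPerSubpartition;
        }

        @Override
        public long requiredBytes(List<TaskShape> slotSharingGroup) {
            long buffers = 0;
            // Sum the buffers needed by every task that shares the slot.
            for (TaskShape task : slotSharingGroup) {
                buffers += (long) buffersPerInputChannel * task.inputChannels
                        + (long) buffersPerSubpartition * task.resultSubpartitions;
            }
            return buffers * bufferSizeBytes;
        }
    }

    public static void main(String[] args) {
        NetworkMemoryCalculator calc = new BufferBasedCalculator(32 * 1024, 2, 1);
        List<TaskShape> ssg = Arrays.asList(new TaskShape(4, 2), new TaskShape(2, 8));
        // (2*4 + 1*2) + (2*2 + 1*8) = 22 buffers of 32 KiB each.
        System.out.println(calc.requiredBytes(ssg)); // prints 720896
    }
}
```

With an interface of this kind, the runtime would only see a byte count per
slot, so switching to a different shuffle service would mean swapping the
calculator rather than changing how requirements are specified.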
For now, we propose to exclude network memory from ResourceProfile, to unblock
the fine-grained resource management feature from the network memory control
issue. If needed, it can be added back in the future, as long as there is a
good way to specify the requirement.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)