Yunfeng Zhou created FLINK-38291:
------------------------------------
Summary: Reduce thread lock overhead for Flink UI REST handlers
Key: FLINK-38291
URL: https://issues.apache.org/jira/browse/FLINK-38291
Project: Flink
Issue Type: Improvement
Components: Runtime / REST
Affects Versions: 2.1
Reporter: Yunfeng Zhou
In some of the Flink jobs at our company, we found that when a job has complex
logic and a parallelism (number of subtasks) of about 512 or 1024, the Flink UI
may take more than one minute to display the job's DAG.
Debugging the corresponding REST handlers, we found that the latency is
caused by repeated calls to synchronized methods such as
MetricStore#getSubtaskMetricStore. Each time such a method is invoked, the
thread may have to wait for other synchronized methods to release the lock
before it can enter, and this overhead accumulates as the invocations repeat.
We therefore propose reducing the number of calls to these synchronized methods
in order to lower the latency of displaying the DAG.
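
As a rough illustration of the idea only (not the actual Flink code or the
planned patch), the sketch below contrasts entering a lock once per subtask
with entering it once for a copied snapshot. ToyMetricStore, HandlerSketch and
all of their methods are hypothetical names made up for this example; the real
MetricStore API and the eventual fix may look quite different.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a metric store guarded by one monitor lock. */
class ToyMetricStore {
    private final Map<Integer, Map<String, Long>> subtasks = new HashMap<>();
    private int lockAcquisitions = 0; // entries into the two read methods below

    synchronized void put(int subtask, String metric, long value) {
        subtasks.computeIfAbsent(subtask, k -> new HashMap<>()).put(metric, value);
    }

    /** Returns a copy of one subtask's metrics: one lock entry per call. */
    synchronized Map<String, Long> getSubtask(int subtask) {
        lockAcquisitions++;
        Map<String, Long> m = subtasks.get(subtask);
        return m == null ? null : new HashMap<>(m);
    }

    /** Returns a copy of all subtasks' metrics: one lock entry in total. */
    synchronized Map<Integer, Map<String, Long>> snapshot() {
        lockAcquisitions++;
        Map<Integer, Map<String, Long>> copy = new HashMap<>();
        subtasks.forEach((k, v) -> copy.put(k, new HashMap<>(v)));
        return copy;
    }

    synchronized int lockAcquisitions() { return lockAcquisitions; }

    synchronized void resetCounter() { lockAcquisitions = 0; }
}

class HandlerSketch {
    /** Sums a metric by entering the lock once per subtask. */
    static long sumPerSubtask(ToyMetricStore store, int parallelism, String metric) {
        long sum = 0;
        for (int i = 0; i < parallelism; i++) {
            Map<String, Long> m = store.getSubtask(i);
            if (m != null) {
                sum += m.getOrDefault(metric, 0L);
            }
        }
        return sum;
    }

    /** Sums the same metric after entering the lock only once. */
    static long sumFromSnapshot(ToyMetricStore store, String metric) {
        long sum = 0;
        for (Map<String, Long> m : store.snapshot().values()) {
            sum += m.getOrDefault(metric, 0L);
        }
        return sum;
    }
}
```

With 512 subtasks, sumPerSubtask enters the lock 512 times while
sumFromSnapshot enters it once, so a handler that contends with writer threads
has far fewer chances to block. The trade-off in this sketch is the cost of
copying the data while holding the lock.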