Hi All,
Recently we have been experimenting with using Flink's history server as a
centralized debugging service for completed streaming jobs.
Specifically, we dynamically generate links to the log files on the YARN
host; at the same time, we use the Flink history server to show job graphs,
exceptions and other info of the completed jobs[2].
This causes some pain for our users: it is inconvenient to go to the YARN
host for the logs and then to the Flink history server for the other
information.
Thus we would like to propose an improvement to the current Flink history
server:
- To support dynamic links to residual log files on the host machine
  within the retention period[3];
- To support dynamic links to aggregated log files provided by the
  cluster, if supported, such as Hadoop HistoryServer[1] or Kubernetes
  cluster-level logging[4].
  - A similar integration with Hadoop HistoryServer was already proposed
    before[5], with a slightly different approach.
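
To make the idea a bit more concrete, here is a rough sketch of how such a link could be chosen. It is only an illustration, not a proposed implementation: `ContainerInfo`, `log_url` and the `aggregated_base` template are hypothetical names, the NodeManager path layout is an assumption about a typical YARN web UI, and `retain_seconds` stands in for the retention setting referenced in [3].

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContainerInfo:
    """Hypothetical record of where a finished container ran."""
    node_host: str
    node_http_port: int
    container_id: str
    user: str


def log_url(c: ContainerInfo,
            seconds_since_finish: int,
            retain_seconds: int,
            aggregated_base: Optional[str] = None) -> Optional[str]:
    """Pick a log link for a completed container.

    Within the retention period, link to the residual logs on the host.
    Afterwards, fall back to an aggregated-log service if one is configured.
    """
    if seconds_since_finish < retain_seconds:
        # Assumed NodeManager web UI layout; adjust to the actual cluster.
        return (f"http://{c.node_host}:{c.node_http_port}"
                f"/node/containerlogs/{c.container_id}/{c.user}")
    if aggregated_base is not None:
        # Hypothetical URL template for a cluster-level log service.
        return f"{aggregated_base.rstrip('/')}/{c.container_id}"
    # Logs are gone from the host and no aggregation service is known.
    return None
```

The history server would only need the container metadata and the two configuration values to render such a link next to each job.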
Any feedback and suggestions are highly appreciated!
--
Rong
[1] https://hadoop.apache.org/docs/r2.9.2/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/HistoryServerRest.html
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/historyserver.html
[3] https://hadoop.apache.org/docs/r2.9.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml#yarn.nodemanager.log.retain-seconds
[4] https://kubernetes.io/docs/concepts/cluster-administration/logging/#cluster-level-logging-architectures
[5] https://issues.apache.org/jira/browse/FLINK-14317