Hi All,

Recently we have been experimenting with using Flink's history server as a
centralized debugging service for completed streaming jobs.

Specifically, we dynamically generate links to the log files on the YARN
host, and separately use the Flink history server to show job graphs,
exceptions and other information about the completed jobs [2].

This causes some pain for our users: it is inconvenient to go to the YARN
host for the logs and then to the Flink history server for everything
else.

Thus we would like to propose an improvement to the current Flink history
server:

   - To support dynamic links to residual log files on the host machine
     within the retention period [3];

   - To support dynamic links to aggregated log files provided by the
     cluster, where supported, such as Hadoop HistoryServer [1] or
     Kubernetes cluster-level logging [4].

      - A similar integration with Hadoop HistoryServer was already
        proposed before [5], with a slightly different approach.
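
To make the two kinds of links a bit more concrete, here is a rough sketch
of how they could be generated. It is only an illustration, not an existing
Flink API: the class and method names are hypothetical, and the URL
patterns are assumptions based on the usual NodeManager and Hadoop
HistoryServer web UI paths, which may differ per cluster setup.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

// Hypothetical helper; none of these names exist in Flink today.
class LogLinkSketch {

    // Link to the residual logs on the NodeManager host. Returns empty once
    // the retention period (see [3]) has passed, since the files are gone.
    // Assumed URL pattern: http://<nm-http-address>/node/containerlogs/<container>/<user>
    static Optional<String> residualLogLink(String nmHttpAddress, String containerId,
                                            String user, Instant jobEnd,
                                            Duration retention, Instant now) {
        if (now.isAfter(jobEnd.plus(retention))) {
            return Optional.empty();
        }
        return Optional.of("http://" + nmHttpAddress + "/node/containerlogs/"
                + containerId + "/" + user);
    }

    // Link to the aggregated logs served by the Hadoop HistoryServer.
    // Assumed URL pattern:
    // http://<hs-address>/jobhistory/logs/<nm-id>/<container>/<container>/<user>
    static String aggregatedLogLink(String historyServerAddress, String nodeManagerId,
                                    String containerId, String user) {
        return "http://" + historyServerAddress + "/jobhistory/logs/" + nodeManagerId
                + "/" + containerId + "/" + containerId + "/" + user;
    }

    public static void main(String[] args) {
        // All hosts, ports and ids below are made-up example values.
        Instant jobEnd = Instant.parse("2019-10-01T00:00:00Z");
        Duration retention = Duration.ofHours(3);
        Instant now = jobEnd.plus(Duration.ofHours(1));

        String link = residualLogLink("nm-host:8042", "container_01", "flink",
                        jobEnd, retention, now)
                .orElseGet(() -> aggregatedLogLink("hs-host:19888", "nm-host:45454",
                        "container_01", "flink"));
        System.out.println(link);
    }
}
```

The idea is that the history server would prefer the residual link while
the files are still on the host and fall back to the aggregated link
afterwards, if the cluster provides one.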


Any feedback and suggestions are highly appreciated!

--

Rong

[1]
https://hadoop.apache.org/docs/r2.9.2/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/HistoryServerRest.html

[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/historyserver.html

[3]
https://hadoop.apache.org/docs/r2.9.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml#yarn.nodemanager.log.retain-seconds

[4]
https://kubernetes.io/docs/concepts/cluster-administration/logging/#cluster-level-logging-architectures

[5]
https://issues.apache.org/jira/browse/FLINK-14317
