Dear devs,

Currently, Flink's log output does not explicitly distinguish between framework logs and user logs. In the Task Manager, framework logs are intermixed with the user's business logs. In some deployment modes, such as Standalone or YARN session, task instances of different jobs are deployed in the same Task Manager. This makes the log event stream even more confusing unless users explicitly tag their log messages, and it makes locating problems harder and less efficient. In the YARN job cluster deployment mode the problem is less serious, but we still have to manually distinguish framework logs from business logs.

Overall, we found that Flink's existing log model has the following problems:
- Framework logs and business logs are mixed in the same log file. There is no way to clearly distinguish them, which hinders locating and analyzing problems;
- It is not conducive to collecting business logs independently.

Therefore, we propose a mechanism to separate framework logs from business logs. It can split the existing Task Manager log files. Currently, it is associated with two JIRA issues:

- FLINK-11202[1]: Split log file per job
- FLINK-11782[2]: Enhance TaskManager log visualization by listing all log files for Flink web UI

We have implemented and validated it in standalone and Flink on YARN (job cluster) mode.

sketch 1: [image: flink-web-ui-taskmanager-log-files.png]
sketch 2: [image: flink-web-ui-taskmanager-log-files-2.png]

Design documentation: https://docs.google.com/document/d/1TTYAtFoTWaGCveKDZH394FYdRyNyQFnVoW5AYFvnr5I/edit?usp=sharing

Best,
Vino

[1]: https://issues.apache.org/jira/browse/FLINK-11202
[2]: https://issues.apache.org/jira/browse/FLINK-11782