Hi all,

Yun Tang and I are opening this thread to discuss our proposal to integrate async-profiler's capabilities for profiling TaskManagers (e.g., generating flame graphs) into the Flink Web UI [1].

Currently, Flink provides ThreadDump and operator-level flame graphs by sampling task threads. Results generated this way lack the stacks of other relevant Java threads as well as system calls. async-profiler [2] is a low-overhead sampling profiler for Java, but using it in production environments is cumbersome and raises permission and security concerns.

Therefore, we propose adding REST APIs that invoke async-profiler through JNI on multiple platforms, so that profiling can be easily triggered from the Web UI. This enhancement will make it more efficient and convenient for Flink users to identify performance bottlenecks.

Please refer to the FLIP document for more details about the proposed design and implementation. We welcome any feedback and opinions on this proposal.

[1] FLIP-375: Built-in cross-platform powerful java profiler on taskmanagers, Apache Flink Wiki <https://cwiki.apache.org/confluence/display/FLINK/FLIP-375%3A+Built-in+cross-platform+powerful+java+profiler+on+taskmanagers>
[2] async-profiler: Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events, GitHub <https://github.com/async-profiler/async-profiler>

Best regards,
Yun Tang and Yu Chen