Thanks Yun and Yu for driving this proposal! It's very useful for troubleshooting why the CPU usage is high. +1
Best,
Rui

On Mon, Oct 9, 2023 at 7:21 PM Zhanghao Chen <zhanghao.c...@outlook.com> wrote:

> Hi Yun and Yu,
>
> Thanks for driving this. This would definitely help users identify
> performance bottlenecks, especially in cases where the bottleneck lies
> in the system stack (e.g., GC), and a big +1 for the downloadable flame
> graph to ease sharing. I'm wondering if we could add this for the
> JobManager as well. In the OLAP scenario, and sometimes in the streaming
> scenario (when there are heavy operations during execution plan
> generation or in operator coordinators), the JM can become a bottleneck
> as well.
>
> Best,
> Zhanghao Chen
> ________________________________
> From: Yu Chen <yuchen.e...@gmail.com>
> Sent: Monday, October 9, 2023 17:24
> To: dev@flink.apache.org <dev@flink.apache.org>
> Subject: [DISCUSS] FLIP-375: Built-in cross-platform powerful java
> profiler on taskmanagers
>
> Hi all,
>
> Yun Tang and I are opening this thread to discuss our proposal to
> integrate async-profiler's capabilities for profiling TaskManagers
> (e.g., generating flame graphs) into the Flink Web UI [1].
>
> Currently, Flink provides Thread Dump and operator-level flame graphs by
> sampling task threads. Results generated this way miss the relevant
> stacks of other Java threads as well as system calls. async-profiler [2]
> is a low-overhead sampling profiler for Java, but using it in production
> environments is cumbersome and raises permission and security concerns.
>
> Therefore, we propose adding REST APIs that invoke async-profiler on
> multiple platforms through JNI, so that it can be operated easily from
> the Web UI. This enhancement will make it faster and easier for Flink
> users to identify performance bottlenecks.
>
> Please refer to the FLIP document for more details about the proposed
> design and implementation. We welcome any feedback and opinions on this
> proposal.
>
> [1] FLIP-375: Built-in cross-platform powerful java profiler on
> taskmanagers - Apache Flink - Apache Software Foundation
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-375%3A+Built-in+cross-platform+powerful+java+profiler+on+taskmanagers>
> [2] GitHub - async-profiler/async-profiler: Sampling CPU and HEAP profiler
> for Java featuring AsyncGetCallTrace + perf_events
> <https://github.com/async-profiler/async-profiler>
>
> Best regards,
> Yun Tang and Yu Chen