Hi all,

I would like to open a discussion on a proposal to enable flamegraphs at runtime and to make their configuration (i.e., the number of samples, the delay between samples, and the stack depth) *dynamically adjustable via the Web UI*, without requiring any job or cluster restarts.
As of now, enabling flamegraphs requires setting *rest.flamegraph.enabled=true* and restarting the job. This is not ideal for debugging live issues, especially in production environments.

I discussed this idea offline with Roman Khachatryan (author of FLIP-530 <https://cwiki.apache.org/confluence/display/FLINK/FLIP-530%3A+Dynamic+job+configuration>), Rui Fan, and Arvid Heise. While Rui noted that this could potentially align with FLIP-530’s direction, Roman confirmed that it is better handled as a separate effort, since FLIP-530 is scoped to job-level configuration, whereas this proposal addresses cluster-level observability via RestOptions.

For design details, please refer to: Dynamic Flamegraph via UI <https://docs.google.com/document/d/1A9fLFgXMGxQQn6X8WCv7mLL21AnLqrDFvLSHnUg8rLA/edit?tab=t.0#heading=h.s351fc464ma6>

I’ve attached a short demo to help visualize the proposed feature and gather feedback: Demo <https://drive.google.com/file/d/1iik6aOc2uc9sFlHFlT8YDX5TKFdoD15u/view?usp=sharing>

Looking forward to your thoughts.

Regards,
Poorvank Bhatia
