Hi all,

I would like to open a discussion on enabling flamegraphs at runtime and
making their configuration (number of samples, delay between samples, and
stack depth) *dynamically adjustable via the Web UI*, without requiring any
job or cluster restarts.

Currently, enabling flamegraphs requires setting
*rest.flamegraph.enabled=true* and restarting the job. This is not ideal
for debugging live issues, especially in production environments.
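
For context, the static setup today looks roughly like the fragment below.
This is only a sketch: the keys are the rest.flamegraph.* options as I
understand them from the Flink configuration docs, and the values are
illustrative, so please verify both against your Flink version.

```yaml
# Flink configuration file (illustrative values, not recommendations)
rest.flamegraph.enabled: true                  # currently read only at startup
rest.flamegraph.num-samples: 100               # stack trace samples per flamegraph
rest.flamegraph.delay-between-samples: 50 ms   # pause between consecutive samples
rest.flamegraph.stack-depth: 100               # maximum stack depth recorded
```

The proposal is to make the enable flag and the three sampling settings
adjustable from the Web UI instead of fixing them at startup.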

I discussed this idea offline with Roman Khachatryan (author of FLIP-530
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-530%3A+Dynamic+job+configuration>),
Rui Fan, and Arvid Heise. While Rui noted that this could potentially align
with FLIP-530’s direction, Roman confirmed that it is better handled as a
separate effort, since FLIP-530 is scoped to job-level configuration,
whereas this proposal addresses cluster-level observability via RestOptions.

For design details, please refer to: Dynamic Flamegraph via UI
<https://docs.google.com/document/d/1A9fLFgXMGxQQn6X8WCv7mLL21AnLqrDFvLSHnUg8rLA/edit?tab=t.0#heading=h.s351fc464ma6>

I’ve also linked a short demo to help visualize the proposed feature and
gather feedback: Demo
<https://drive.google.com/file/d/1iik6aOc2uc9sFlHFlT8YDX5TKFdoD15u/view?usp=sharing>

Looking forward to your thoughts.

Regards,

Poorvank Bhatia
