Hi David,

Thanks for starting this discussion. I like the idea of improving insight
into Flink's execution, and I believe that a flame graph could be helpful.

I quickly glanced over your changes and I think they go in a good
direction. One idea could be to share the `StackTraceSample` produced by
the `StackTraceSampleCoordinator` between the different
`StackTraceOperatorTracker` instances so that we don't send multiple
requests for the same operators. That way we would reduce the RPC load a bit.
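
A minimal sketch of the sharing idea, assuming a small cache that hands out
one in-flight sample per vertex. `SharedSampleCache`, `getOrRequest` and the
`requestSample` callback are hypothetical names for illustration only; they
are not existing Flink classes or methods, and the real trackers may need a
different shape (for example time-based expiry).

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Flink): lets several trackers share one
 * sample request per vertex instead of each triggering its own RPCs.
 */
final class SharedSampleCache<K, S> {

    // One future per vertex id; every caller for the same id gets the same future.
    private final Map<K, CompletableFuture<S>> samples = new ConcurrentHashMap<>();

    // Assumed callback that actually triggers the sampling for a vertex.
    private final Function<K, CompletableFuture<S>> requestSample;

    SharedSampleCache(Function<K, CompletableFuture<S>> requestSample) {
        this.requestSample = requestSample;
    }

    /** Returns the cached or in-flight sample, requesting it only if absent. */
    CompletableFuture<S> getOrRequest(K vertexId) {
        // computeIfAbsent invokes the callback at most once per absent key.
        return samples.computeIfAbsent(vertexId, requestSample);
    }

    /** Drops a sample so that the next caller triggers a fresh request. */
    void invalidate(K vertexId) {
        samples.remove(vertexId);
    }
}
```

With something like this, both trackers would call `getOrRequest(vertexId)`
and only the first call would reach the coordinator until the entry is
invalidated.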

Apart from that, I think the next steps would be to find a committer who
could shepherd this effort and help you with merging it.

Cheers,
Till

On Wed, Jul 31, 2019 at 7:05 PM David Morávek <d...@apache.org> wrote:

> Hello,
>
> While looking into Flink internals, I've noticed that there is already a
> mechanism for stack-trace sampling of a particular job vertex.
>
> I think it may be really useful to allow users to easily render a CPU
> flame graph <http://www.brendangregg.com/flamegraphs.html> in the new UI
> for a selected vertex (a new tab next to back pressure) of a running job.
> The back pressure tab already gives a good idea of which vertex causes
> trouble, but it's hard to say what's actually going on.
>
> I've tried to implement a basic REST endpoint
> <https://github.com/dmvk/flink/commit/716231822d2fe99004895cdd0a365560479445b9>
> that prepares data for the flame graph rendering, and it seems to be
> providing good insight.
>
> It should be straightforward to render data from the endpoint in the new UI
> using existing <https://github.com/spiermar/d3-flame-graph> JavaScript
> libraries.
>
> WDYT? Is this worth pushing forward?
>
> D.
>
