Hi Till, thanks for the feedback! These endpoints are only called when the
vertex is selected in the UI, so there shouldn't be any heavy RPC load. For
back-pressure, we only sample the top 3 calls of the stack (depth = 3). For
the flame graph, we want to sample the whole stack trace and we need a
different sampling rate (longer period, more samples). Those are the main
reasons to split these into two "trackers", but I may be missing something.
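
To make the difference between the two sampling modes concrete, here is a
minimal, self-contained sketch. It is not the code from my branch and does not
use Flink's classes; the names `FlameGraphSketch`, `Node`, `truncate` and
`merge` are made up for illustration. It assumes each sample is a list of frame
names with the top (most recent) frame first, which is the order
`Thread#getStackTrace` returns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Flink.
final class FlameGraphSketch {

    /** One flame graph node: a frame name, how many samples hit it, and its callees. */
    static final class Node {
        final String name;
        int value;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String name) {
            this.name = name;
        }
    }

    /** Keeps only the top `depth` frames; a non-positive depth keeps the whole trace. */
    static List<String> truncate(List<String> topFirst, int depth) {
        if (depth <= 0 || depth >= topFirst.size()) {
            return topFirst;
        }
        return topFirst.subList(0, depth);
    }

    /** Merges samples (top frame first) into a tree rooted at the outermost caller. */
    static Node merge(List<List<String>> samples) {
        Node root = new Node("root");
        for (List<String> sample : samples) {
            root.value++;
            Node current = root;
            // Walk from the outermost frame (end of the list) towards the top frame.
            for (int i = sample.size() - 1; i >= 0; i--) {
                current = current.children.computeIfAbsent(sample.get(i), Node::new);
                current.value++;
            }
        }
        return root;
    }

    /** Takes one sample of the current thread as "Class.method" strings, top frame first. */
    static List<String> sampleCurrentThread() {
        List<String> frames = new ArrayList<>();
        for (StackTraceElement e : Thread.currentThread().getStackTrace()) {
            frames.add(e.getClassName() + "." + e.getMethodName());
        }
        return frames;
    }
}
```

In this sketch, back-pressure style sampling would call `truncate(sample, 3)`
on each sample, while the flame graph needs the untruncated traces passed to
`merge`, which is why a shared sample would have to be taken at full depth.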

I've prepared a little demo, so others can have a better idea of what I
have in mind.

https://youtu.be/GUNDehj9z9o

Please note that this is a proof of concept and I'm not a frontend person,
so it may look a little clumsy :)

D.

On Thu, Aug 1, 2019 at 11:40 AM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi David,
>
> thanks for starting this discussion. I like the idea of improving insights
> into Flink's execution and I believe that a flame graph could be helpful.
>
> I quickly glanced over your changes and I think they go in a good
> direction. One idea could be to share the `StackTraceSample` produced by
> the `StackTraceSampleCoordinator` between the different
> `StackTraceOperatorTracker`s so that we don't send multiple requests for
> the same operators. That way we would decrease the RPC load a bit.
>
> Apart from that, I think the next steps would be to find a committer who
> could shepherd this effort and help you with merging it.
>
> Cheers,
> Till
>
> On Wed, Jul 31, 2019 at 7:05 PM David Morávek <d...@apache.org> wrote:
>
> > Hello,
> >
> > While looking into Flink internals, I've noticed that there is already a
> > mechanism for stack-trace sampling of a particular job vertex.
> >
> > I think it may be really useful to allow users to easily render a CPU
> > flame graph <http://www.brendangregg.com/flamegraphs.html> in a new UI
> > for a selected vertex (new tab next to back pressure) of a running job.
> > The back pressure tab already provides a good idea of which vertex causes
> > trouble, but it's hard to say what's actually going on.
> >
> > I've tried to implement a basic REST endpoint
> > <https://github.com/dmvk/flink/commit/716231822d2fe99004895cdd0a365560479445b9>
> > that prepares data for the flame graph rendering, and it seems to be
> > providing good insight.
> >
> > It should be straightforward to render data from the endpoint in the new
> > UI using an existing JavaScript library such as d3-flame-graph
> > <https://github.com/spiermar/d3-flame-graph>.
> >
> > WDYT? Is this worth pushing forward?
> >
> > D.
> >
>
