First of all, thanks for your proposal. Thread monitoring is super
important for application performance.
So basically, I agree with this proposal.

But on the technical details, I think we need more discussion on the following points:
1. Do you want to add thread status to the trace? If so, why not treat
this as a UI-level join? We already know the thread ID when we create a
span, right? Then, given all the thread dumps (if any), the UI could query
the specific thread context based on the timestamp and thread ID(s).
2. For thread dumps, I don't know whether you have done a performance
evaluation of this operation. In my experience, `get all need thread monitor
segment every 100 milliseconds` is very costly for both your application
and the agent, so you need to be careful about doing this.
3. An endpoint-related thread dump with some sampling mechanism makes more
sense to me, and it should be activated from the UI. We should only provide
a conditional thread dump sampling mechanism, such as `first 5 traces of
this endpoint in the next 5 mins`.
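
To make point 3 more concrete, a conditional rule such as `first 5 traces of
this endpoint in the next 5 mins` could look roughly like the following. This
is only a minimal sketch, assuming the rule is created from the UI and pushed
to the agent; `ConditionalDumpSampler` and `shouldSample` are hypothetical
names, not existing SkyWalking APIs. The clock is injectable so the time
window can be tested.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

/**
 * Hypothetical conditional thread dump sampling rule, e.g.
 * "first 5 traces of this endpoint in the next 5 mins".
 * Not a SkyWalking API; names are illustrative only.
 */
final class ConditionalDumpSampler {
    private final String endpoint;     // endpoint the rule applies to
    private final int maxSamples;      // e.g. 5 traces
    private final long deadlineNanos;  // end of the activation window
    private final LongSupplier clock;  // monotonic clock in nanoseconds, injectable for tests
    private final AtomicInteger taken = new AtomicInteger();

    ConditionalDumpSampler(String endpoint, int maxSamples, Duration window, LongSupplier clock) {
        this.endpoint = endpoint;
        this.maxSamples = maxSamples;
        this.clock = clock;
        this.deadlineNanos = clock.getAsLong() + window.toNanos();
    }

    /** Returns true if a new trace on {@code traceEndpoint} should be thread-dumped. */
    boolean shouldSample(String traceEndpoint) {
        if (!endpoint.equals(traceEndpoint)) {
            return false;
        }
        // Compare by subtraction so the check stays correct if nanoTime wraps around.
        if (clock.getAsLong() - deadlineNanos >= 0) {
            return false; // window expired
        }
        // Atomically claim one of the remaining slots; never exceeds maxSamples.
        while (true) {
            int current = taken.get();
            if (current >= maxSamples) {
                return false;
            }
            if (taken.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    public static void main(String[] args) {
        ConditionalDumpSampler rule =
            new ConditionalDumpSampler("/orders", 5, Duration.ofMinutes(5), System::nanoTime);
        int sampled = 0;
        for (int i = 0; i < 8; i++) {
            if (rule.shouldSample("/orders")) {
                sampled++;
            }
        }
        System.out.println("sampled=" + sampled);
        System.out.println("other endpoint sampled: " + rule.shouldSample("/users"));
    }
}
```

The point of this design is that the dump cost is bounded by the rule itself
(at most N traces within a time window) instead of by how many segments
happen to be running.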

Jian Tan,
I think DaoCloud has also customized this feature in your internal
SkyWalking. Could you share what you did?

Sheng Wu 吴晟
Twitter, wusheng1108


741550557 <[email protected]> wrote on Sunday, December 8, 2019, at 12:14 AM:

> Hello everyone,
>
>
> I would like to share a new feature for SkyWalking, called “thread
> monitor”.
>
>
> Background
> When our company first started using SkyWalking for APM, we found that
> many traces did not have enough spans to cover their duration. We
> suspected either third-party frameworks that we had not instrumented, or
> API misuse by programmers, such as a Java CountDownLatch created with a
> count of 3 but counted down only twice.
> So we decided to write a new feature that monitors the thread stacks of
> executing traces, so that we can get more information about a trace and
> quickly find out what is happening in it.
>
>
>
>
> Structure
> Agent (thread monitor) -> gRPC protocol -> OAP Server (Storage) ->
> Skywalking-Rocketbot-UI
>
>
>
>
> More detail
> OAP Server:
> 1. Store which traces need to be monitored (I suggest storing this on the
> endpoint, by adding a new boolean field named needThreadMonitor).
> 2. Provide a GraphQL API to change an endpoint's monitor status.
> 3. When parsing monitored traces, store the thread stacks if the segment
> contains any thread info.
>
>
> Skywalking-Rocketbot-UI:
> 1. Add a new switch button on the dashboard that can read or modify the
> endpoint's monitor status.
> 2. Show every thread stack when the user clicks into a trace's details.
>
>
> Agent:
> 1. Set up two new BootServices:
> 1) Find all endpoints in the current service that need thread monitoring,
> using a new scheduled task that runs every minute.
> 2) Start a new scheduled task that collects all segments needing thread
> monitoring every 100 milliseconds, and submits a thread dump task to a
> global fixed-size thread pool (default size 3).
> 2. When creating an entry/local span
> (TracingContext#createEntrySpan/createLocalSpan), check whether the
> endpoint needs thread monitoring. If so, the segment is marked and put
> into the thread monitor map.
> 3. When the TracingContext finishes, it collects the monitored thread
> stacks and sends all of them to the server.
>
>
> Finally, I'm not sure whether this is a good way to get more information
> about traces. If you have any ideas or suggestions, please let me know.
>
>
> Mrpro
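
For reference, the agent-side flow described in the proposal (a per-minute
endpoint refresh, a 100 ms dump loop, and a fixed pool of 3 dump threads) can
be sketched roughly as below. This is a minimal sketch under stated
assumptions, not the proposal's actual implementation: `ThreadMonitorSketch`
and its hook methods are hypothetical, the OAP endpoint query is stubbed as a
`Supplier`, and the stack depth of 32 is an assumed value. It snapshots only
the monitored thread via the JDK's `ThreadMXBean.getThreadInfo(long, int)`
rather than taking a full thread dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical agent-side thread monitor; names are illustrative, not SkyWalking APIs. */
final class ThreadMonitorSketch {
    private static final int MAX_DEPTH = 32; // assumed max frames per snapshot

    private final ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
    // Global fixed pool for dump tasks ("count number default 3" in the proposal).
    private final ExecutorService dumpPool = Executors.newFixedThreadPool(3);
    private volatile Set<String> monitoredEndpoints = Set.of();
    // Thread monitor map: segment id -> monitored thread and its snapshots.
    private final Map<String, Monitored> monitored = new ConcurrentHashMap<>();

    private static final class Monitored {
        final long threadId;
        final List<StackTraceElement[]> snapshots = new CopyOnWriteArrayList<>();
        Monitored(long threadId) { this.threadId = threadId; }
    }

    /** Starts both scheduled tasks; the OAP query is stubbed as a Supplier. */
    void start(Supplier<Set<String>> fetchEndpoints) {
        Runnable refresh = () -> {
            try {
                monitoredEndpoints = Set.copyOf(fetchEndpoints.get());
            } catch (RuntimeException e) {
                // Keep the previous set: an uncaught exception would cancel the schedule.
            }
        };
        refresh.run(); // load once up front so monitoring can start immediately
        scheduler.scheduleAtFixedRate(refresh, 1, 1, TimeUnit.MINUTES);
        scheduler.scheduleAtFixedRate(this::dumpAll, 100, 100, TimeUnit.MILLISECONDS);
    }

    /** Every 100 ms: submit one stack snapshot task per monitored segment. */
    private void dumpAll() {
        for (Monitored m : monitored.values()) {
            dumpPool.execute(() -> {
                // Returns null if the thread has already terminated.
                ThreadInfo info = mx.getThreadInfo(m.threadId, MAX_DEPTH);
                if (info != null) {
                    m.snapshots.add(info.getStackTrace());
                }
            });
        }
    }

    /** Hook for entry/local span creation: mark the segment if its endpoint is monitored. */
    void onSpanCreated(String segmentId, String endpoint) {
        if (monitoredEndpoints.contains(endpoint)) {
            // getId() is deprecated since Java 19 in favour of threadId(), but still works.
            monitored.putIfAbsent(segmentId, new Monitored(Thread.currentThread().getId()));
        }
    }

    /** Hook for TracingContext finish: remove the segment and return its stacks to report. */
    List<StackTraceElement[]> onSegmentFinished(String segmentId) {
        Monitored m = monitored.remove(segmentId);
        return m == null ? List.of() : new ArrayList<>(m.snapshots);
    }

    void shutdown() {
        scheduler.shutdownNow();
        dumpPool.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMonitorSketch monitor = new ThreadMonitorSketch();
        monitor.start(() -> Set.of("/orders"));
        monitor.onSpanCreated("segment-1", "/orders"); // endpoint is monitored
        Thread.sleep(350);                             // simulated slow work
        System.out.println("snapshots: " + monitor.onSegmentFinished("segment-1").size());
        monitor.shutdown();
    }
}
```

Note that every 100 ms tick issues one `getThreadInfo` call per monitored
segment, so the cost grows with the number of concurrently monitored
segments. This is the overhead raised in point 2 of the reply, and a
sampling rule as in point 3 would bound it.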
