Hello everyone,

I would like to share an idea for a new SkyWalking feature called “thread monitor”.


Background
When our company started using SkyWalking for APM, we found that many traces did
not have enough spans to explain where the time went. We suspected either
third-party frameworks that the agent does not enhance, or API misuse by
programmers, for example a Java CountDownLatch created with a count of 3 that is
only counted down twice.
So we decided to write a new feature that monitors the thread stacks of
executing traces. This gives us more information about a trace and lets us
quickly find out what is happening in it.
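To make the CountDownLatch case concrete, here is a small Java sketch (the class and method names are made up for illustration). The latch expects three countdowns but receives only two, so the waiting thread blocks in CountDownLatch.await() without producing any new span, while a stack dump of that thread shows exactly where it is stuck.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class CountDownMisuse {

    /** Starts a daemon thread that waits on the latch, like a request handler would. */
    static Thread startWaiter(CountDownLatch latch) {
        Thread waiter = new Thread(() -> {
            try {
                latch.await(); // blocks until the count reaches 0
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "request-handler");
        waiter.setDaemon(true); // do not keep the JVM alive in this demo
        waiter.start();
        return waiter;
    }

    /** True if any frame of the thread's current stack is CountDownLatch.await. */
    static boolean isStuckInAwait(Thread thread) {
        for (StackTraceElement frame : thread.getStackTrace()) {
            if (frame.getClassName().equals(CountDownLatch.class.getName())
                    && frame.getMethodName().equals("await")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(3); // expects three workers
        Thread waiter = startWaiter(latch);
        latch.countDown();
        latch.countDown(); // bug: the third countDown() never happens

        TimeUnit.MILLISECONDS.sleep(200); // give the waiter time to park
        System.out.println("remaining count: " + latch.getCount());
        System.out.println("stuck in await: " + isStuckInAwait(waiter));
        for (StackTraceElement frame : waiter.getStackTrace()) {
            System.out.println("  at " + frame);
        }
    }
}
```

Span data alone only shows a gap here; the sampled stack (parked inside CountDownLatch.await) is what points to the missing countDown().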




Structure
Agent (thread monitor) -> gRPC protocol -> OAP Server (storage) -> Skywalking-Rocketbot-UI
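The proposal does not define the message the agent sends over gRPC. As a rough illustration only, the data flowing through this pipeline could look like the Java sketch below. StackSnapshot, SegmentThreadReport and capture() are hypothetical names, not part of the existing SkyWalking protocol, and a real version would be defined in the gRPC .proto files.

```java
import java.util.ArrayList;
import java.util.List;

public class ThreadMonitorPayload {

    /** One sample of the monitored thread's stack (frames via StackTraceElement.toString()). */
    record StackSnapshot(long timestampMillis, List<String> frames) {}

    /** Everything the agent would report for one trace segment (hypothetical shape). */
    record SegmentThreadReport(String traceSegmentId, String endpointName,
                               List<StackSnapshot> snapshots) {}

    /** Samples the given thread's current stack. */
    static StackSnapshot capture(Thread thread) {
        List<String> frames = new ArrayList<>();
        for (StackTraceElement element : thread.getStackTrace()) {
            frames.add(element.toString());
        }
        return new StackSnapshot(System.currentTimeMillis(), List.copyOf(frames));
    }

    public static void main(String[] args) {
        StackSnapshot snapshot = capture(Thread.currentThread());
        SegmentThreadReport report =
            new SegmentThreadReport("segment-1", "/order/create", List.of(snapshot));
        System.out.println(report.endpointName() + ": "
            + report.snapshots().size() + " snapshot(s), top frame "
            + snapshot.frames().get(0));
    }
}
```

Keying the report by trace segment id is an assumption; it is what would let the OAP server attach the stacks to the right trace for the UI.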




More detail
OAP Server:
1. Store which traces need to be monitored (I suggest storing this on the
endpoint by adding a new boolean field named needThreadMonitor).
2. Provide a GraphQL API to change an endpoint's monitoring status.
3. Parse monitored traces, and store the thread stacks whenever a segment
carries any thread information.
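A minimal in-memory sketch of the three OAP server points, assuming simple maps instead of the real storage backend. The class and method names (ThreadMonitorStore, setNeedThreadMonitor, onSegment, stacksOf) are hypothetical; in OAP the flag would live on the endpoint entity and the setter would sit behind a GraphQL mutation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.Collectors;

public class ThreadMonitorStore {

    // Point 1: endpoint name -> needThreadMonitor flag.
    private final Map<String, Boolean> needThreadMonitor = new ConcurrentHashMap<>();

    // Point 3: trace segment id -> thread stacks reported by the agent.
    private final Map<String, List<String>> threadStacks = new ConcurrentHashMap<>();

    /** Point 2: what a GraphQL mutation resolver could delegate to. */
    public void setNeedThreadMonitor(String endpoint, boolean enabled) {
        needThreadMonitor.put(endpoint, enabled);
    }

    /** Endpoints the agent should monitor, sorted for stable output. */
    public List<String> monitoredEndpoints() {
        return needThreadMonitor.entrySet().stream()
            .filter(e -> Boolean.TRUE.equals(e.getValue()))
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    /** Point 3: store stacks only when the segment carries thread info. */
    public void onSegment(String segmentId, List<String> stacks) {
        if (stacks == null || stacks.isEmpty()) {
            return; // ordinary segment, nothing to store
        }
        threadStacks.computeIfAbsent(segmentId, id -> new CopyOnWriteArrayList<>())
                    .addAll(stacks);
    }

    /** What a GraphQL query for the trace detail view could return. */
    public List<String> stacksOf(String segmentId) {
        return threadStacks.getOrDefault(segmentId, List.of());
    }

    public static void main(String[] args) {
        ThreadMonitorStore store = new ThreadMonitorStore();
        store.setNeedThreadMonitor("/order/create", true);
        store.setNeedThreadMonitor("/health", false);
        System.out.println(store.monitoredEndpoints()); // [/order/create]

        store.onSegment("segment-1", List.of());
        // hypothetical stack line, for illustration only
        store.onSegment("segment-2", List.of("at com.example.OrderService.create(OrderService.java:42)"));
        System.out.println(store.stacksOf("segment-1").size()); // 0
        System.out.println(store.stacksOf("segment-2").size()); // 1
    }
}
```

Skipping segments without thread info keeps ordinary traces on the existing storage path, so only monitored endpoints pay the extra cost.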


Skywalking-Rocketbot-UI:
1. Add a new switch on the dashboard that can read and modify an endpoint's
monitoring status.
2. Show every collected thread stack when the user opens a trace's details.


Agent:
1. Set up two new BootServices:
1) One finds the endpoints of the current service that need thread monitoring.
It runs as a new scheduled task once every minute.
2) The other starts a scheduled task that, every 100 milliseconds, collects all
segments that need thread monitoring and submits a thread dump task to a global
fixed-size thread pool (3 threads by default).
2. When an entry or local span is created
(TracingContext#createEntrySpan/createLocalSpan), check whether its endpoint
needs thread monitoring. If it does, the segment is marked and put into the
thread monitor map.
3. When the TracingContext finishes, it collects the stacks captured for the
monitored thread and sends them all to the server.
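The agent flow above can be sketched in plain Java, assuming JDK executors instead of SkyWalking's BootService and tracing-context plumbing. Everything here (ThreadMonitorAgent, boot, onSpanCreated, onContextFinished, and the Supplier standing in for the gRPC call to OAP) is a hypothetical name for illustration, and submitting one dump task per monitored segment per tick is one possible reading of the proposal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class ThreadMonitorAgent {

    /** A monitored segment: the thread running it and the stacks sampled so far. */
    private static final class Monitored {
        final Thread thread;
        final List<String> stacks = Collections.synchronizedList(new ArrayList<>());

        Monitored(Thread thread) {
            this.thread = thread;
        }
    }

    private volatile Set<String> monitoredEndpoints = Set.of();
    private final Map<String, Monitored> monitorMap = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
    private final ExecutorService dumpPool; // global fixed pool for dump tasks

    public ThreadMonitorAgent(int dumpThreads) {
        this.dumpPool = Executors.newFixedThreadPool(dumpThreads);
    }

    /** Point 1: start both scheduled tasks; endpointSource stands in for a call to OAP. */
    public void boot(Supplier<Set<String>> endpointSource) {
        monitoredEndpoints = endpointSource.get(); // initial load
        scheduler.scheduleAtFixedRate(() -> {
            try {
                monitoredEndpoints = endpointSource.get(); // 1) refresh every minute
            } catch (RuntimeException e) {
                // keep the last known set; an uncaught exception would cancel the task
            }
        }, 1, 1, TimeUnit.MINUTES);
        // 2) every 100 ms, queue one dump task per monitored segment
        scheduler.scheduleAtFixedRate(this::dispatchDumps, 100, 100, TimeUnit.MILLISECONDS);
    }

    private void dispatchDumps() {
        for (Monitored m : monitorMap.values()) {
            dumpPool.execute(() -> m.stacks.add(format(m.thread.getStackTrace())));
        }
    }

    /** Point 2: call when an entry or local span is created on the current thread. */
    public void onSpanCreated(String segmentId, String endpoint) {
        if (monitoredEndpoints.contains(endpoint)) {
            monitorMap.putIfAbsent(segmentId, new Monitored(Thread.currentThread()));
        }
    }

    /** Point 3: call when the tracing context finishes; returns the stacks to send. */
    public List<String> onContextFinished(String segmentId) {
        Monitored m = monitorMap.remove(segmentId);
        if (m == null) {
            return List.of();
        }
        synchronized (m.stacks) { // guard the copy with the list's own lock
            return new ArrayList<>(m.stacks);
        }
    }

    public void shutdown() {
        scheduler.shutdownNow();
        dumpPool.shutdownNow();
    }

    private static String format(StackTraceElement[] frames) {
        StringBuilder sb = new StringBuilder();
        for (StackTraceElement frame : frames) {
            sb.append("at ").append(frame).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMonitorAgent agent = new ThreadMonitorAgent(3); // default pool size 3
        agent.boot(() -> Set.of("/order/create"));
        agent.onSpanCreated("segment-1", "/order/create");
        Thread.sleep(350); // simulate a slow request that creates no spans
        List<String> stacks = agent.onContextFinished("segment-1");
        System.out.println("captured " + stacks.size() + " stack samples");
        agent.shutdown();
    }
}
```

Sampling on a separate fixed pool keeps the business thread untouched; the trade-off is that a dump task queued just before the context finishes may sample the thread after it has moved on, which this sketch simply discards.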


Finally, I am not sure whether this is a good way to get more information about
a trace. If you have any ideas or suggestions, please let me know.


Mrpro
