Hi,
Wu Sheng, I am a developer at Tongcheng. I found some performance issues
when using SkyWalking in our production environment.
My SkyWalking version is 6.0.0-GA. I found three issues:
1. The SkyWalking UI stops showing trace data after the OAP server has been
running for a while, and ES does not receive the data either. After
restarting the OAP server, the UI can obtain data again. I took a thread
dump while the OAP server was not working. Analyzing the dump file,
I found that the gRPC thread that puts trace data into the buffer was in the
sleeping state, and the persistence worker that consumes the buffer was
also blocked. So I think the problem is in the buffer: the producer and the
consumer run in two different threads, and when the producer saves data
into the buffer, the consumer does not see that the buffer has data. This
is a visibility issue between threads. So I changed the buffer array to an
ArrayBlockingQueue like this:
After this change everything runs well, and performance also improved
noticeably.
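
A minimal sketch of this kind of change, not the actual SkyWalking patch.
The class name, method names, capacity, and timeouts are all hypothetical.
It only illustrates how an ArrayBlockingQueue hands data from a producer
thread to a consumer thread with proper visibility and blocking:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the OAP buffer; not SkyWalking's real class.
final class BlockingBufferSketch {
    private final BlockingQueue<String> buffer;

    BlockingBufferSketch(int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    // Producer side (e.g. a gRPC handler thread). Waits up to timeoutMs for
    // free space and returns false if the buffer is still full.
    boolean produce(String segment, long timeoutMs) {
        try {
            return buffer.offer(segment, timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Consumer side (e.g. the persistence worker). Waits up to timeoutMs for
    // the first element, then drains up to maxBatch elements in total.
    List<String> consumeBatch(int maxBatch, long timeoutMs) {
        List<String> batch = new ArrayList<>();
        try {
            String first = buffer.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (first != null) {
                batch.add(first);
                buffer.drainTo(batch, maxBatch - 1);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingBufferSketch sketch = new BlockingBufferSketch(16);
        int total = 100;
        Thread consumer = new Thread(() -> {
            int seen = 0;
            while (seen < total) {
                seen += sketch.consumeBatch(10, 100).size();
            }
            System.out.println("consumed " + seen + " segments");
        });
        consumer.start();
        for (int i = 0; i < total; i++) {
            // Retry while the buffer is full instead of dropping the segment.
            while (!sketch.produce("segment-" + i, 100)) { }
        }
        consumer.join();
    }
}
```

The queue's internal lock guarantees that an element offered by one thread
is visible to the thread that polls it, so the consumer no longer depends
on seeing writes to a plain array.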
2. The gRPC server uses the JDK CachedThreadPool, whose thread count grows
with the number of requests. When a large number of requests arrive and the
OAP server or ES does not have enough capacity, the number of gRPC threads
can grow very large, and then the OAP server crashes. So I changed the code
like this:
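
A minimal sketch of a bounded replacement for the cached pool, not the
actual patch. The class name, pool size, and queue size are assumptions.
An executor like this could be handed to grpc-java through
ServerBuilder.executor(Executor); the sketch uses only the JDK so that it
runs on its own:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; names and sizes are illustrative only.
final class BoundedGrpcExecutorSketch {

    // Fixed number of threads plus a bounded queue. When both are full,
    // AbortPolicy rejects the task instead of creating more threads.
    static ThreadPoolExecutor newBoundedExecutor(int threads, int queueSize) {
        return new ThreadPoolExecutor(
            threads, threads, 60L, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(queueSize),
            new ThreadPoolExecutor.AbortPolicy());
    }

    // Submits n tasks that block until the gate opens (simulating slow
    // storage) and returns how many submissions were rejected.
    static int submitBlockingTasks(ThreadPoolExecutor executor, int n, CountDownLatch gate) {
        int rejected = 0;
        for (int i = 0; i < n; i++) {
            try {
                executor.execute(() -> {
                    try {
                        gate.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor executor = newBoundedExecutor(2, 2);
        CountDownLatch gate = new CountDownLatch(1);
        int rejected = submitBlockingTasks(executor, 6, gate);
        System.out.println("threads=" + executor.getPoolSize() + " rejected=" + rejected);
        gate.countDown();
        executor.shutdown();
    }
}
```

With a bounded pool, overload shows up as rejected requests that the caller
can retry, rather than as unbounded thread growth inside the OAP server.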
3. We use Consul as the registry center. The Consul health check over gRPC
is not stable, so I changed the check to TCP:
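
A minimal sketch of what a TCP check definition looks like, not the actual
patch. It builds the JSON body for Consul's agent service registration
endpoint (PUT /v1/agent/service/register) by hand. The service name,
address, interval, and timeout are assumptions, and the inputs are assumed
not to need JSON escaping:

```java
// Hypothetical helper that builds a Consul service registration payload
// with a TCP health check in place of a gRPC one.
final class ConsulTcpCheckSketch {

    static String registrationJson(String name, String host, int port) {
        return "{"
            + "\"Name\":\"" + name + "\","
            + "\"Address\":\"" + host + "\","
            + "\"Port\":" + port + ","
            + "\"Check\":{"
            // A TCP check only verifies that the port accepts connections.
            + "\"TCP\":\"" + host + ":" + port + "\","
            + "\"Interval\":\"10s\","
            + "\"Timeout\":\"1s\""
            + "}}";
    }

    public static void main(String[] args) {
        // Illustrative values; 11800 is the default OAP gRPC port.
        System.out.println(registrationJson("oap-server", "10.0.0.5", 11800));
    }
}
```

A TCP check is weaker than a gRPC check because it does not call the gRPC
health service, but it has fewer moving parts.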
These are the issues I encountered in our production environment.
BRs,
Wu Yantao(吴延涛)