Hi All,
Please see the graphs in the attachments.
I would like to add a supplement on the first issue. The root cause is a
visibility problem in multithreaded programming.
According to the JMM (Java Memory Model), when one thread modifies the buffer,
another thread is not guaranteed to see the modified
result without some form of synchronization. So I replaced the buffer
array with the JDK ArrayBlockingQueue,
using a blocking queue to guarantee visibility between the threads.
In the code I use the non-blocking method; if you want the blocking behavior, you
should change the offer method to the put method.
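
To make the offer/put difference concrete, here is a minimal sketch. It is not the actual patch: the class name, the capacity and the String payloads are made up for illustration, and the real buffer in the OAP server holds different types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BufferHandoffSketch {
    // Hypothetical capacity, chosen small so the queue fills up quickly.
    static final int CAPACITY = 2;

    // Non-blocking hand-off: offer() returns false at once when the queue is full,
    // so the caller has to decide what to do with the rejected element.
    static boolean tryPublish(BlockingQueue<String> buffer, String segment) {
        return buffer.offer(segment);
    }

    // Blocking hand-off: put() waits until the consumer has freed a slot.
    static void publish(BlockingQueue<String> buffer, String segment)
            throws InterruptedException {
        buffer.put(segment);
    }

    // Consumer side: take up to 'expected' elements, giving up after one
    // second without data so the sketch cannot hang forever.
    static List<String> drain(BlockingQueue<String> buffer, int expected)
            throws InterruptedException {
        List<String> received = new ArrayList<>();
        while (received.size() < expected) {
            String segment = buffer.poll(1, TimeUnit.SECONDS);
            if (segment == null) {
                break;
            }
            received.add(segment);
        }
        return received;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(CAPACITY);

        // Producer thread uses the blocking variant, so nothing is dropped
        // even though it produces more elements than the queue can hold.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    publish(buffer, "segment-" + i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<String> received = drain(buffer, 5);
        producer.join();
        System.out.println(received);
    }
}
```

With offer, a full queue means the element is rejected and the producer keeps running; with put, the producer waits for the consumer. In both cases the queue's internal lock provides the happens-before relationship between the producer's write and the consumer's read, which a plain array does not give you.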
At 2019-08-06 20:39:55, "Sheng Wu" <[email protected]> wrote:
>Hi
>
>Thanks for the feedback. We can't see the graphs you posted, could you send
>them in attachments?
>
>Sheng Wu 吴晟
>
>Apache SkyWalking, Apache ShardingSphere(Incubating), Zipkin
>Twitter, wusheng1108
>
>
>吴延涛 <[email protected]> wrote on Tue, Aug 6, 2019 at 8:31 PM:
>
>> Hi,
>> Wu Sheng, I am a developer at Tongcheng. I found some performance
>> issues when using SkyWalking in a production environment.
>> My SkyWalking version is 6.0.0-GA. I found three issues:
>> 1. The SkyWalking UI stops getting trace data after the OAP server has run
>> for a period of time, and ES doesn't have the data either. But after rebooting
>> the OAP server, the SW UI can obtain the data. I dumped the threads while the OAP
>> server was not working. After analyzing the dump file,
>> I found that the gRPC thread that puts the trace data into the buffer was in the
>> sleeping state, and the persistence worker that consumes the buffer was also
>> blocked.
>> So I think the buffer has a problem: because the buffer producer and the
>> buffer consumer are in two threads, when the producer saves data
>> into the buffer the consumer doesn't know the buffer has data, which is the
>> visibility issue in multithreading. So I changed the buffer array into an
>> ArrayBlockingQueue like this:
>> After this change everything runs well, and the performance also
>> improved noticeably.
>> 2. The gRPC server uses the JDK CachedThreadPool, whose thread
>> count grows with the number of requests, so when a large number of requests
>> arrive and the OAP server
>> or ES does not have enough capacity, the gRPC thread count can grow very
>> large, and then the OAP server crashes. So I changed the code like this:
>> 3. We use Consul as the registry center. The Consul health check over gRPC
>> is not stable, so I changed the check to TCP:
>>
>> Above are the issues I encountered in the production environment.
>>
>>
>> BRs,
>> Wu Yantao(吴延涛)
>>
>>
>>
>>
>>
>>