Inline
Zhang, James <[email protected]> 于2020年1月16日周四 上午10:34写道: > Dear Skywalking Dev team, > > I had deployed Skywaking Java agent & UI/OAP/ES service into backend > microservices K8S cluster. During our JMeter performance testing we found > many *org.apache.skywalking.apm.dependencies.io.grpc.StatusRuntimeException: > DEADLINE_EXCEEDED* logs both in agent side and OAP server side. > > Agent side: > > ERROR 2020-01-14 03:50:52:070 > SkywalkingAgent-5-ServiceAndEndpointRegisterClient-0 > ServiceAndEndpointRegisterClient : ServiceAndEndpointRegisterClient execute > fail. > > org.apache.skywalking.apm.dependencies.io.grpc.StatusRuntimeException: > DEADLINE_EXCEEDED > > at > org.apache.skywalking.apm.dependencies.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:222) > > ERROR 2020-01-14 03:46:22:069 SkywalkingAgent-4-JVMService-consume-0 > JVMService : send JVM metrics to Collector fail. > > org.apache.skywalking.apm.dependencies.io.grpc.StatusRuntimeException: > DEADLINE_EXCEEDED > > at > org.apache.skywalking.apm.dependencies.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:222) > > > > OAP server side: > > 2020-01-14 03:53:18,935 - > org.apache.skywalking.oap.server.core.remote.client.GRPCRemoteClient > -147226067 [grpc-default-executor-863] ERROR [] - DEADLINE_EXCEEDED: > deadline exceeded after 19999979082ns > > io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after > 19999979082ns > > at io.grpc.Status.asRuntimeException(Status.java:526) > ~[grpc-core-1.15.1.jar:1.15.1] > > > > and the respective Instance Throughput curve don > > none-flat(with Exception log) curve vs. flat curve(no Exception log) > > VS. > > > > I checked the *TraceSegmentServiceClient* and related source code and > found that this Exception from agent side is an Error consume behavior, but > the error data is not counted into abandoned data size account. > > > > *I’m wondering that when this gRPC exception occurs, whether the trace > data sent to OAP server is lost or not?* > Most likely, lost. > *In case that the trace data is lost, why the lost data is not counted > into the abandoned data static? And the metric calculation during the trace > data lost time range is distorted due to incomplete trace data collection?* > Because by using gRPC streaming, we don't know how many segments lost. > > > *Is there any configuration needed from agent or/and oap server side to > resolve this gPRC exception issue to avoid trace data lost?* > I think, you should increase the backend resource or resolve the network unstable issue. > > > *P.S.* > > I also met the “*trace segment has been abandoned, cause by buffer is > full*” issue before due to the default 5*300 buffer is not enough. In > this case trace data is lost at agent side directly before sending to OAP > collector. > 5 * 3000 should be enough for most users unless your system is very high load or network is unstable like I said above. When you said 10 * 3000 is better, I am guessing your network or network performance is not stable, so you need more buffers at the agent side holding the data. > However after I increased the agent side trace data buffer to 10*3000, > this abandoned issue never occurred again. > > http-nio-0.0.0.0-9090-exec-23 TraceSegmentServiceClient : One trace > segment has been abandoned, cause by buffer is full. 

> Thanks & Best Regards
>
> Xiaochao Zhang (James)
> DI SW CAS MP EMK DO-CHN
> No.7, Xixin Avenue, Chengdu High-Tech Zone
> Chengdu, China 611731
> Email: [email protected]
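
For readers less familiar with grpc-java: the 19999979082 ns in the OAP log is
a deadline of roughly 20 seconds. The sketch below is not SkyWalking's actual
client code; it only illustrates, using the stock health-check service from the
grpc-services artifact and a placeholder address, how a 20-second deadline
surfaces as StatusRuntimeException / DEADLINE_EXCEEDED, and why the failure
applies to the call as a whole rather than to a countable number of messages.

    import io.grpc.ManagedChannel;
    import io.grpc.ManagedChannelBuilder;
    import io.grpc.Status;
    import io.grpc.StatusRuntimeException;
    import io.grpc.health.v1.HealthCheckRequest;
    import io.grpc.health.v1.HealthGrpc;

    import java.util.concurrent.TimeUnit;

    public class DeadlineDemo {

        public static void main(String[] args) {
            // Placeholder address; any slow or overloaded gRPC server will do.
            ManagedChannel channel = ManagedChannelBuilder
                    .forAddress("oap.example.internal", 11800)
                    .usePlaintext()
                    .build();
            try {
                // withDeadlineAfter() bounds the whole call: if no response
                // arrives within 20 s, grpc-java cancels the call and throws
                // StatusRuntimeException with code DEADLINE_EXCEEDED (the OAP
                // log's "deadline exceeded after 19999979082ns" is just under 20 s).
                HealthGrpc.newBlockingStub(channel)
                        .withDeadlineAfter(20, TimeUnit.SECONDS)
                        .check(HealthCheckRequest.getDefaultInstance());
                System.out.println("backend answered within the deadline");
            } catch (StatusRuntimeException e) {
                if (e.getStatus().getCode() == Status.Code.DEADLINE_EXCEEDED) {
                    // The status describes the call/stream as a whole; it does
                    // not say how many already-written messages were processed,
                    // which is why a streaming sender cannot count lost segments.
                    System.err.println("deadline exceeded: " + e.getStatus());
                } else {
                    System.err.println("call failed: " + e.getStatus());
                }
            } finally {
                channel.shutdownNow();
            }
        }
    }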
