OK, just a reminder: if you are maintaining your own fork, then many
things can't be explained from our side.

Sheng Wu 吴晟
Twitter, wusheng1108

dafang <[email protected]> 于2021年9月13日周一 下午7:28写道:
>
> Yes, our version is 8.1.0, and we have made some customized development
> based on it, so upgrading may cost more. However, we will investigate the
> new version as soon as possible and strive to adopt it soon.
>
> Thank you very much!
>
> On 2021-09-13 19:23:31, "Sheng Wu" <[email protected]> wrote:
> >I think you are using an old release; after 8.7.0, many things were
> >changed to improve performance, and far fewer resources are required.
> >
> >dafang <[email protected]> wrote on Mon, Sep 13, 2021 at 7:18 PM:
> >
> >> OK, I think I have found the reason; let me share it. I found that if I
> >> set the ES bulk size to 5, the ES "request too large" error never
> >> appears. But at the same time the gRPC server reports errors such as
> >> "cancelled before receiving half close", which stops the sw-agent from
> >> sending any data (trace or JVM) to the server. It seems we need to find
> >> a balance point between the gRPC receive speed and the ES write speed.
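For anyone tuning this trade-off: the bulk flush thresholds live in the OAP `application.yml` under the ElasticSearch storage section. The sketch below follows the 8.x layout; the exact key names and defaults should be verified against the deployed release:

```yaml
storage:
  elasticsearch7:
    # Flush the bulk once this many index requests have accumulated.
    bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:1000}
    # Force a flush every N seconds even if bulkActions is not reached.
    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:10}
    # Number of concurrent bulk requests allowed in flight.
    concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2}
```

Lowering `bulkActions` keeps each bulk request under the ES request-size limit, at the cost of more frequent, smaller writes.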
> >>
> >> On 2021-09-13 17:45:40, "Sheng Wu" <[email protected]> wrote:
> >> >Unknown means unknown, I am afraid.
> >> >I can't explain it. It could be a firewall, a proxy, a security
> >> >policy, etc.; could you check any of them, or anything else?
> >> >
> >> >Sheng Wu 吴晟
> >> >Twitter, wusheng1108
> >> >
> >> >dafang <[email protected]> wrote on Mon, Sep 13, 2021 at 4:37 PM:
> >> >>
> >> >> Hello Wu. Through my check, I have found some error info in my
> >> >> skywalking-agent logs, such as "Send UpstreamSegment to collector fail
> >> >> with a grpc internal exception.
> >> >> org.apache.skywalking.apm.dependencies.io.grpc.StatusRuntimeException:
> >> >> UNAVAILABLE: Network closed for unknown reason".
> >> >> How should I interpret it?
> >> >>
> >> >> At 2021-09-13 15:05:24, "Sheng Wu" <[email protected]> wrote:
> >> >> >(1) All data in that bulk (an ElasticSearch concept; read their
> >> >> >docs) will be lost, yes.
> >> >> >(2) This only means your agent got disconnected from the server
> >> >> >unexpectedly. It doesn't tell you the reason why.
> >> >> >
> >> >> >About what you described in Chinese: first of all, it is better to
> >> >> >keep the Chinese and English consistent; don't put more information
> >> >> >on one side, it is confusing.
> >> >> >Why the agent stays disconnected forever can't be told from what you
> >> >> >have provided.
> >> >> >Auto-reconnecting works normally AFAIK.
> >> >> >
> >> >> >Sheng Wu 吴晟
> >> >> >Twitter, wusheng1108
> >> >> >
> >> >> >dafang <[email protected]> wrote on Mon, Sep 13, 2021 at 2:58 PM:
> >> >> >>
> >> >> >> And now I have two questions:
> >> >> >> 1. If this error exists, will all trace and JVM metrics be lost?
> >> >> >> 2. If there is a message in the server logs like:
> >> >> >> "org.apache.skywalking.oap.server.receiver.trace.provider.handler.v8.grpc.TraceSegmentReportServiceHandler
> >> >> >> - 86 [grpcServerPool-1-thread-7] ERROR [] - CANCELLED: cancelled
> >> >> >> before receiving half close
> >> >> >> io.grpc.StatusRuntimeException: CANCELLED: cancelled before
> >> >> >> receiving half close",
> >> >> >> will this make trace or JVM metrics be lost?
> >> >> >>
> >> >> >> To put it in more detail: we now have 100+ machines online. It
> >> >> >> often happens that an instance's machine is healthy, but once its
> >> >> >> trace metrics or JVM metrics are lost, they never appear again
> >> >> >> unless the service is restarted. Could the two situations I listed
> >> >> >> above cause what I am seeing?
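A note on the CANCELLED error in the question above: "cancelled before receiving half close" means the client side closed the stream before completing it, which is often a client-side deadline or disconnect rather than an OAP-side failure. On the agent side, the upstream gRPC timeout is configurable in `agent.config`; the key names below follow the 8.x agent docs, but treat them as assumptions and verify against the agent version in use:

```properties
# agent.config (a sketch; verify key names against your agent version)
# Backend OAP address list.
collector.backend_service=127.0.0.1:11800
# Seconds before an upstream gRPC call is given up; calls cancelled at this
# deadline may show up server-side as "cancelled before receiving half close".
collector.grpc_upstream_timeout=30
```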
> >> >> >>
> >> >> >> On 2021-09-13 14:50:14, "Sheng Wu" <[email protected]> wrote:
> >> >> >> >That error does matter. An HTTP request that is too large will
> >> >> >> >make ElasticSearch reject your bulk insert, which causes data
> >> >> >> >loss.
> >> >> >> >
> >> >> >> >Sheng Wu 吴晟
> >> >> >> >Twitter, wusheng1108
> >> >> >> >
> >> >> >> >dafang <[email protected]> wrote on Mon, Sep 13, 2021 at 2:23 PM:
> >> >> >> >>
> >> >> >> >> Hi SkyWalking dev team:
> >> >> >> >> In our prod env, I found that the trace and JVM metrics were
> >> >> >> >> lost after some services started, and the agent logs show no
> >> >> >> >> error info. Only the server log shows: "Es 413 request too
> >> >> >> >> large". Will this problem cause complete data loss?
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> To describe it again in more detail:
> >> >> >> >> We recently found that our online service cluster has 15
> >> >> >> >> machines. After integrating SkyWalking, on some of them (about
> >> >> >> >> 5-6), the trace metrics, the JVM metrics, or both disappear
> >> >> >> >> after a while. The service itself keeps serving traffic; only
> >> >> >> >> the monitoring data is gone. After investigation we found no
> >> >> >> >> error info at all in the agent log, and only some "413 request
> >> >> >> >> too large" ES errors in the server-side log. My question is:
> >> >> >> >> once this problem makes trace or JVM metrics fail to be
> >> >> >> >> written, will they never be collected and stored again?
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> Waiting for your help.
> >> >> >> >> Yours,
> >> >> >> >> 大方
> >> >> >> >> 2021.09.13
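For context on the 413 mentioned above: Elasticsearch rejects any HTTP request body larger than its `http.max_content_length` setting (100mb by default), which is what produces the "request too large" response. Besides shrinking the OAP bulk size, one option is to raise that limit in `elasticsearch.yml`; the setting name is real, but check the default and any upper bound for your ES version:

```yaml
# elasticsearch.yml
# Default is 100mb; bulk requests above this are rejected with HTTP 413.
http.max_content_length: 200mb
```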
> >>
> >--
> >Sheng Wu 吴晟
> >
> >Apache SkyWalking
> >Apache Incubator
> >Apache ShardingSphere, ECharts, DolphinScheduler podlings
> >Zipkin
> >Twitter, wusheng1108
