On 19/02/2021 11:41, Paolo Filippelli wrote:

I just asked the same question on IRC, but I don't know which is the best place to get support, so I'll also ask here :)

BTW, this is the IRC link: https://matrix.to/#/!HaYTjhTxVqshXFkNfu:matrix.org/$16137341243277ijEwp:matrix.org?via=matrix.org

*The Question*

I'm seeing a behaviour that I'd very much like to understand; maybe you can help me. We've got a K8s cluster where Prometheus Operator is installed (v0.35.1). The Prometheus version is v2.11.0.

Istio has also been installed in the cluster in the default "PERMISSIVE" mode, meaning that every Envoy sidecar accepts plain HTTP traffic. Everything is deployed in the default namespace, and every pod except prometheus/alertmanager/grafana is managed by Istio (i.e. the monitoring stack is outside the mesh).

Prometheus can successfully scrape all of its targets (defined via ServiceMonitors) except for 3 or 4, which it fails to scrape.

For example, in the Prometheus logs I can see:

level=debug ts=2021-02-19T11:15:55.595Z caller=scrape.go:927 component="scrape manager" scrape_pool=default/divolte/0 target= msg="Scrape failed" err="server returned HTTP status 503 Service Unavailable"

But if I log into the Prometheus pod, I can successfully reach the pod that it's failing to scrape:

/prometheus $ wget -SqO /dev/null
  HTTP/1.1 200 OK
  date: Fri, 19 Feb 2021 11:27:57 GMT
  content-type: text/plain; version=0.0.4; charset=utf-8
  content-length: 75758
  x-envoy-upstream-service-time: 57
  server: istio-envoy
  connection: close
  x-envoy-decorator-operation: divolte-srv.default.svc.cluster.local:7070/*

That error message doesn't indicate that there are any problems with getting to the server. It is saying that the server responded with a 503 error code.
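
To illustrate the distinction, here is a minimal Python sketch (standard library only) that separates "the server answered with an error status" from "no HTTP response arrived at all". The names `describe_failure` and `probe` are made up for this example and are not part of Prometheus or Istio.

```python
import urllib.error
import urllib.request


def describe_failure(exc: Exception) -> str:
    """Classify a failed HTTP request by where the failure happened."""
    # HTTPError must be checked first: it is a subclass of URLError.
    if isinstance(exc, urllib.error.HTTPError):
        # A response came back, so something on the path answered the request.
        return f"responded with HTTP {exc.code}"
    if isinstance(exc, OSError):
        # URLError, timeouts, refused connections: no HTTP response at all.
        return "no HTTP response (connection-level failure)"
    raise exc


def probe(url: str, timeout: float = 5.0) -> str:
    """Fetch a URL once and report either the status or the failure class."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"responded with HTTP {resp.status}"
    except OSError as exc:
        return describe_failure(exc)
```

A 503 like the one in the scrape log falls into the first branch: the request reached something that produced an HTTP response, which may be the application or a proxy in front of it.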

Are certain targets consistently failing or do they sometimes work and only sometimes fail?
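
One rough way to check this, assuming the debug log lines all follow the logfmt layout of the one quoted above, is to tally the "Scrape failed" entries per scrape pool and error. This is only a sketch; `parse_logfmt` and `tally_scrape_failures` are hypothetical helper names, and lines that cannot be tokenised are simply skipped.

```python
import shlex
from collections import Counter


def parse_logfmt(line: str) -> dict:
    """Split a logfmt line (key=value, values optionally double-quoted) into a dict."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def tally_scrape_failures(lines) -> Counter:
    """Count "Scrape failed" entries per (scrape_pool, err) pair."""
    counts = Counter()
    for line in lines:
        try:
            fields = parse_logfmt(line)
        except ValueError:
            # Unbalanced quotes or similar: skip lines we cannot tokenise.
            continue
        if fields.get("msg") == "Scrape failed":
            key = (fields.get("scrape_pool", ""), fields.get("err", ""))
            counts[key] += 1
    return counts
```

Feeding it the pod's log output (for example, the lines read from a saved log file) and comparing the counts with the number of scrapes expected over the same period gives a first impression of whether a pool fails every time or only intermittently.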

Are there any access or error logs from the Envoy sidecar or target pod that might shed some light on where that error is coming from?

Stuart Clark

