Re: [3.x]: openshift router and its own metrics

Daniel Comnea Thu, 15 Aug 2019 09:26:06 -0700

Hi Clayton,

Certainly some of the metrics should be preserved across reloads, e.g.
counters like *haproxy_server_http_responses_total* (though to an extent
Prometheus can handle resets correctly with its native support).
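
As a rough illustration of that reset handling, here is a minimal Python
sketch. It is not taken from the Prometheus or router code; the function name
and the scrape values are made up for the example. The assumption is that any
drop in a counter is treated as a reset to zero, so the value seen after the
drop counts in full as new increase.

```python
def increase_with_resets(samples):
    """Total increase of a counter across `samples`.

    A sample lower than its predecessor is assumed to be a counter reset
    (for example after a reload), so the new value is counted from zero.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev   # normal monotonic growth
        else:
            total += cur          # reset detected: count from zero
    return total


# Hypothetical scrape values; the drop from 170 to 5 stands in for a reload.
scrapes = [100, 130, 170, 5, 25]
print(increase_with_resets(scrapes))  # 30 + 40 + 5 + 20 = 95.0
```

This only works for true counters. A gauge or an average that is supposed to
go down would be misread as a reset by the same logic.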


However, the metric
*haproxy_server_http_average_response_latency_milliseconds* also appears to
be accumulating when we wouldn't expect it to. (According to the haproxy
stats, I think that's a rolling average over the last 1024 calls -- so it
goes up and down, or should.)
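
To make the expectation concrete, here is a small Python sketch of a rolling
average. It assumes a plain mean over a fixed window of the most recent
samples; haproxy's own computation may well differ, and the class name, the
window of 4 and the latency values are invented for the example. The point is
only that such a value should fall again once the slow requests leave the
window, unlike a counter.

```python
from collections import deque


class RollingAverage:
    """Mean of the most recent `window` observations (hypothetical helper)."""

    def __init__(self, window=1024):
        # deque with maxlen silently drops the oldest sample when full
        self.samples = deque(maxlen=window)

    def observe(self, latency_ms):
        self.samples.append(latency_ms)

    def value(self):
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)


avg = RollingAverage(window=4)   # tiny window so the effect is visible
for ms in [10, 10, 10, 10]:
    avg.observe(ms)
print(avg.value())               # 10.0

for ms in [50, 50]:              # two slow requests push the average up
    avg.observe(ms)
print(avg.value())               # (10 + 10 + 50 + 50) / 4 = 30.0

for ms in [10, 10, 10, 10]:      # slow requests age out of the window
    avg.observe(ms)
print(avg.value())               # back to 10.0
```

If the exported latency only ever grows, it is behaving like the counter case
rather than like this.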

Thoughts?


Cheers,
Dani


On Thu, Aug 15, 2019 at 3:59 PM Clayton Coleman <ccole...@redhat.com> wrote:

> Metrics memory use in the router should be proportional to number of
> services, endpoints, and routes.  I doubt it's leaking there and if it were
> it'd be really slow since we don't restart the router monitor process
> ever.  Stats should definitely be preserved across reloads, but will not be
> preserved across the pod being restarted.
>
> On Thu, Aug 15, 2019 at 10:30 AM Dan Mace <dm...@redhat.com> wrote:
>
>>
>>
>> On Thu, Aug 15, 2019 at 10:03 AM Daniel Comnea <comnea.d...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I would appreciate it if anyone could confirm that my understanding of
>>> the way the router haproxy image [1] is built is correct.
>>> Am I right to assume that the image [1] is built as seen there, without
>>> any other layer being added to include [2]?
>>> Also, am I right to say the haproxy metrics code [2] is part of the origin
>>> package?
>>>
>>>
>>> A bit of background/ context:
>>>
>>> A while back on OKD 3.7 we had to swap the openshift 3.7.2 router image
>>> for the 3.10 one because we were seeing problems with reloads, and we
>>> wanted to benefit from the native haproxy 1.8 reload feature so that
>>> reloads would stop affecting traffic.
>>>
>>> While everything was nice and working okay, we've noticed recently that
>>> the haproxy stats slowly increase, and we wonder whether this is an
>>> accumulation caused (maybe?) by the reloads. I'm aware of a change
>>> made in [3]; however, I suspect it is not part of the 3.10 image, hence
>>> my question to double-check whether my understanding is correct.
>>>
>>>
>>> Cheers,
>>> Dani
>>>
>>> [1]
>>> https://github.com/openshift/origin/tree/release-3.10/images/router/haproxy
>>> [2]
>>> https://github.com/openshift/origin/tree/release-3.10/pkg/router/metrics
>>> [3]
>>> https://github.com/openshift/origin/commit/8f0119bdd9c3b679cdfdf2962143435a95e08eae#diff-58216897083787e1c87c90955aabceff
>>> _______________________________________________
>>> dev mailing list
>>> dev@lists.openshift.redhat.com
>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>>>
>>
>> I think Clayton (copied) has the history here, but the nature of the
>> metrics commit you referenced is that many of the exposed metric points
>> are counters which were being reset across reloads. The patch was (I think)
>> to enable counter metrics to correctly accumulate across reloads.
>>
>> As to how the image itself is built, the pkg directory is part of the
>> router controller code included with the image. Not sure if that answers
>> your question.
>>
>> --
>>
>> Dan Mace
>>
>> Principal Software Engineer, OpenShift
>>
>> Red Hat
>>
>> dm...@redhat.com
>>
>>
>>

