Hi Clark,

With the way we are going to do this, there will be no duplicated values 
as you mentioned.
If you read through carefully, you will see that in the end we have 
metrics like the following.
The call flow is like this:
incoming request ==> Service 1 (endpoint 1)
Service 1 ==> Service 2 (endpoint 1)
Service 1 ==> Service 2 (endpoint 2)
Service 1 ==> Service 3 (endpoint 1)

For the incoming request to Service 1, Service 1 in turn generates multiple 
outbound calls, as shown above.

In Service 1
------------------
inboundcall_latency_microseconds_sum{system="booking_engine",app="driver_eta",endpoint="endpoint1"}
inboundcall_latency_microseconds_count{system="booking_engine",app="driver_eta",endpoint="endpoint1"}

outboundcall_latency_microseconds_sum{system="booking_engine",app="driver_eta",outbound_service="Service2",endpoint="endpoint1"}
outboundcall_latency_microseconds_count{system="booking_engine",app="driver_eta",outbound_service="Service2",endpoint="endpoint1"}

outboundcall_latency_microseconds_sum{system="booking_engine",app="driver_eta",outbound_service="Service2",endpoint="endpoint2"}
outboundcall_latency_microseconds_count{system="booking_engine",app="driver_eta",outbound_service="Service2",endpoint="endpoint2"}

outboundcall_latency_microseconds_sum{system="booking_engine",app="driver_eta",outbound_service="Service3",endpoint="endpoint1"}
outboundcall_latency_microseconds_count{system="booking_engine",app="driver_eta",outbound_service="Service3",endpoint="endpoint1"}

In Service 1 we expose the above sums and counts by service and endpoint.

So there will be no duplicated values.
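
For example (just a sketch; the 5m rate window is arbitrary and only for 
illustration), the average outbound latency per service and endpoint could 
then come from one query:

  sum by (app, outbound_service, endpoint) (rate(outboundcall_latency_microseconds_sum[5m]))
    /
  sum by (app, outbound_service, endpoint) (rate(outboundcall_latency_microseconds_count[5m]))

Since outbound_service and endpoint are labels rather than parts of the 
metric name, the same query (and the same Grafana panel) can be reused 
for every app that follows the convention.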

The issue here, as far as I know, is that one series will then hold a lot 
of data relating to multiple services.
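
For example (the second app and service names below are hypothetical, just 
to illustrate the point), a single metric name such as 
outboundcall_latency_microseconds_count would then carry series for many 
different apps:

outboundcall_latency_microseconds_count{system="booking_engine",app="driver_eta",outbound_service="Service2",endpoint="endpoint1"}
outboundcall_latency_microseconds_count{system="booking_engine",app="some_other_app",outbound_service="ServiceX",endpoint="endpoint1"}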

The plus point is that, since we know the keys, we can generate Grafana 
widgets (meters, graphs, etc.) automatically, and we can automate alerts too.
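
As a rough sketch of what an automated alert could look like (the 500000 
microsecond threshold and the 5m window are made up for illustration), the 
same expression can be templated per app and outbound service:

  sum by (app, outbound_service) (rate(outboundcall_latency_microseconds_sum[5m]))
    /
  sum by (app, outbound_service) (rate(outboundcall_latency_microseconds_count[5m]))
    > 500000

Because the key names are standardized, the rule only needs the label 
values filled in.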

Your feedback is very important.

Best Regards,
Manjula 

On Monday, March 1, 2021 at 3:28:22 PM UTC+5:30 Stuart Clark wrote:

> On 01/03/2021 06:44, Manjula Amunugama wrote:
> > Hi all,
> >
> > In our environment for monitoring about 200 micro-services, we use 
> > Prometheus & Grafana.
> >
> > From one application to another, developers used different 
> > strings as the namespace component.
> > i.e. we have used Prometheus keys like 
> > "booking_engine_driver_eta_location_service_outboundcall_latency_microseconds_count" 
> > to count the latency from "BookingEngine.Driver-ETA" to 
> > "Location-Service".
> > In this, "Booking Engine" is the "Service Group", "Driver-ETA" is 
> > the service, and "Location-Service" is the outbound service.
> >
> > In monitoring it's a must to monitor "Inbound Request Rates by 
> > Endpoint", "Inbound Request Error Rates by Endpoint", "Processing 
> > Latency by Endpoint", "Outbound Request Rates by Endpoint", and 
> > "Outbound Request Error Rates by Endpoint" for API based requests.
> >
> > We can monitor all the services with about 3 dashboards ("Inbound 
> > Service Monitor Rates", "Outbound Service Monitor Rates", "Processing 
> > Latencies") once we know the Prometheus keys used.
> > So we wanted to standardize the Prometheus keys as follows:
> > - We use namespace to define the "Development Team"
> > - Application Name will be a label in the key - i.e. label will be "app"
> > - Endpoint also will be a label in the key
> > - Error will be a label in the key
> >
> > So the previous key with labels will be changed to 
> > "outboundcall_latency_microseconds_count{app="booking_engine_driver_eta_location_service"}"
> >
> > Doing this we can automate most of the things related to Dashboarding 
> > and Alerting.
> >
> > By doing this, about 200 time series will be grouped into about 4 
> > groups, and hence 200 time series into 4 time series.
> >
> > Doing so, will there be a big hit to Prometheus performance?
> >
> A time series is different to a metric.
>
> A metric has a name and an optional selection of labels.
>
> A time series is one specific metric & label combination.
>
> So, for example, a metric could be called "requests_count", but two time 
> series could be "requests_count{response_code='200'}" or 
> "requests_count{system='frontend',authenticated='false'}".
>
> As a result, in terms of the number of time series there is no 
> difference between 100 metrics with no labels and a single metric with a 
> label with 100 values.
>
> How the difference affects performance will depend on how things are 
> being used. There is likely to be little difference in performance 
> during scraping, but query usage could make a bigger difference. A 
> metric with labels is expected to be aggregatable, so it would make 
> sense to arrange the data in that way if that would be true. If you were 
> to sum together all the different label combinations of a particular 
> metric, would the result make sense? As an example, a metric which counts 
> requests and has labels for error code would still make sense if you 
> summed everything together (rather than requests per code you would have 
> the total number of requests).
>
> Would it make sense in your case to use labels within a single metric? 
> If the different systems are completely unrelated that might not be the 
> case - a sum wouldn't mean anything and an average would be equally 
> useless as the different systems do a totally different selection of 
> work. However if you are looking at latencies end-to-end across multiple 
> systems in a flow, or have multiple instances of a system, then it does 
> sound like the use of labels would make more sense - sum would give you 
> the overall end-to-end latency or you could produce averages for a 
> particular system across instances.
>
> -- 
> Stuart Clark
>
>
