Hi Tom,

Just posted the final survey here:

https://groups.google.com/forum/#!topic/prometheus-users/XU7tbVn23co
https://groups.google.com/forum/#!topic/prometheus-developers/ToCQNP2mODQ

Let's see what the results look like. Hope it's helpful, although not all
questions made it in this time :)

Regards,
Julius

On Fri, May 22, 2020 at 10:49 AM Julius Volz <julius.v...@gmail.com> wrote:

> Yeah, I think as interesting as this could be, the survey is growing quite
> large already, and this would be one of the more complicated questions in
> terms of explaining it clearly enough and then getting users to compile the
> results. So I'm tending towards leaving it out this time around.
>
> But from experience you can safely assume that most large Prometheus
> deployments have a few metric names that are huge in their number of series
> (like a couple of 10k), and that would blow up any graph or other UI
> display without aggregation / filtering.
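>
> (Purely as an illustration: before graphing one of those huge metric names
> you'd usually either filter it down with a label matcher or aggregate it,
> e.g.
>
>   # "some_huge_metric" is just a placeholder for one of those big names
>   some_huge_metric{instance="host-1:9100"}
>   sum by (job) (some_huge_metric)
>
> rather than charting the bare metric name.)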
>
> On Wed, May 20, 2020 at 7:00 PM Tom Lee <t...@newrelic.com> wrote:
>
>> Yeah, agree. I really like the "largest N metric names" idea. I think
>> both total series and "top N metrics" are interesting for different
>> reasons, but also agree getting "real" numbers is a challenge whatever we
>> decide to do here. :)
>>
>> On Wed, May 20, 2020 at 6:38 AM Julius Volz <julius.v...@gmail.com>
>> wrote:
>>
>>> On Sun, May 17, 2020 at 7:57 PM Tom Lee <t...@newrelic.com> wrote:
>>>
>>>>> Yes, I'm interested in what Tom's intent is behind the question. From a
>>>>> Prometheus perspective, the total time-series load is most important. But
>>>>> it might be different for his use case.
>>>>>
>>>>
>>>> Ah yep, really great question. I'm going to absolutely butcher the
>>>> terminology here, but the idea is we're sort of trying to differentiate
>>>> between "number of unique metric names" and "label/dimensional cardinality
>>>> within those metrics". The reason for us differentiating is something of an
>>>> implementation detail with respect to our own systems, but I think it also
>>>> applies somewhat to Prometheus and/or Grafana too: when you run a
>>>> non-aggregating query for a metric *x*, you might expect to see one
>>>> timeseries charted -- or you might see hundreds or even thousands. In our
>>>> own test setup we have JMX metrics for 15 Kafka servers reporting in.
>>>> Executing a "query" like *kafka_cluster_Partition_Value* (a metric
>>>> reported by the JMX exporter on behalf of Kafka) yields something like
>>>> 20,000-30,000 distinct timeseries charted by Prometheus. It takes a
>>>> surprising amount of time to execute that simple little query as a result.
>>>> This sort of cardinality "explosion" has big implications for system
>>>> architecture and scalability in our own systems, too.
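>>>>
>>>> (To make that concrete with the same metric name, and guessing at the
>>>> labels the JMX exporter attaches: the raw query
>>>>
>>>>   kafka_cluster_Partition_Value
>>>>
>>>> charts all 20,000-30,000 series, whereas an aggregating query like
>>>>
>>>>   # one series total
>>>>   count(kafka_cluster_Partition_Value)
>>>>   # one series per topic ("topic" is my guess at a label name here)
>>>>   sum by (topic) (kafka_cluster_Partition_Value)
>>>>
>>>> collapses it down to something a chart can actually display.)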
>>>>
>>>
>>> Sorry for the delay! Yeah, makes sense, metric names that have many
>>> series can be problematic in UIs when doing queries without filters or
>>> aggregations. On the other hand, we know that having at least *some* of
>>> those is very common (almost every user has a couple huge ones), so we
>>> probably don't need a survey to tell us that :) More importantly maybe, to
>>> see how many metrics are too "overloaded", just having the total number of
>>> metric names vs. the total number of series doesn't answer the question
>>> fully: you don't know whether the series are evenly split up across your
>>> metric names, or whether they're all clustered in a few names. It's also a
>>> bit challenging to get users to compile a list of distinct metric names
>>> across Prometheus servers, without some command-line foo or similar. We
>>> could ask something along the lines of "How many series do your largest N
>>> metric names contain?", and then give them a query like 'topk(3, count
>>> by(__name__) ({__name__!=""}))' to determine that per server. It would
>>> still require some manual work to combine results between servers though,
>>> hmmm...
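>>>
>>> (Sketching what those per-server queries might look like, if we did ask:
>>>
>>>   # 3 biggest metric names by series count
>>>   topk(3, count by (__name__) ({__name__!=""}))
>>>   # total series vs. distinct metric names, for comparison
>>>   count({__name__!=""})
>>>   count(count by (__name__) ({__name__!=""}))
>>>
>>> That still leaves the cross-server combination as manual work, of course.)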
>>>
>>
