Re: [prometheus-users] Use remote-write instead of federation

2022-07-20 Thread tejaswini vadlamudi
@Stuart: I agree with most of your points :-) I see remote-write as the 
most appropriate way to forward metrics for my deployment use case.
Federation is a poor fit in terms of interface standardization, HA of the 
monitoring stack, and feature support. 
In this case, I have functions and a dedicated set of engineers who own 
each workload and query the individual instances, while the global 
instance is used for centralized monitoring.
I was looking at this 
closed bug, raised against 
Prometheus in summer 2019. To my understanding, there were performance 
issues with remote-write, but most of them have been resolved, and the 
community now sees remote-write as performing better than federation. Am I 
thinking correctly? 
Could you clarify how the performance of remote-write compares with that 
of federation?

/Teja

On Tuesday, July 19, 2022 at 5:02:11 PM UTC+2 Stuart Clark wrote:

> On 19/07/2022 13:24, tejaswini vadlamudi wrote:
> > @Ben: Makes a point, but getting Thanos or Cortex into the picture 
> > could be a way forward after some time. For now, do you think it is 
> > good enough to use remote-write instead of federation?  From a 
> > performance and resource consumption POV, do you see remote-write as 
> > the way-forward?
> >
> With remote write you could use agent mode, so you don't have to have 
> local storage other than for the destination instance.
>
> However again it depends what you are trying to achieve and why you have 
> suggested having four instances. Are you wanting to query all four 
> instances or only the "global" one? Are you wanting to copy all data to 
> the "global" instance or only some metrics? Every data point, or only at 
> a lower frequency?
>
> If you are intending to copy all data (both metrics & data points) that 
> leans towards remote write as federation works differently. But in that 
> case there doesn't seem to be any advantage in having the extra three 
> instances at all (unless you are intending on doing local querying, 
> alerting or recording rules) - so I'd just have a single instance that 
> scrapes all namespaces.
>
> Alternatively if you are needing to have separate instances with local 
> storage/querying then I'd probably not look to copy all the data to the 
> "global" instance (which just doubles storage and memory usage) and 
> either use remote write for a much smaller subset of metrics, federation 
> with a slower scrape rate/reduced set of metrics, or as Ben suggested 
> something like Thanos (other options exist as well) to do away with the 
> fourth instance entirely and distribute the queries to the individual 
> instances instead.
>
> Maybe if you could explain a bit about what the design is hoping to 
> achieve it would help us advise better?
>
> -- 
> Stuart Clark
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/e58b5287-609d-48ab-8f20-4a6744d29bedn%40googlegroups.com.


Re: [prometheus-users] Prom QL

2022-07-20 Thread Brian Candler
No idea.  You haven't said what dashboard software you're using, nor what 
queries you're using to build that dashboard.

> I want to add one filter so that I can be able to know which servers are 
> least used or most used.

Not sure what you mean by a "filter" in this context.  A PromQL query using 
min() or max() will work over all the values which are present in the 
instant vector, which as I said before, is of variable size.  It doesn't 
have to have a fixed number of inputs.

e.g. given this data

[
node_blah{instance="foo"} 123
node_blah{instance="bar"} 456
node_blah{instance="baz"} 789
]

then

min(node_blah) => 123
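
To illustrate why the aggregation does not need a fixed number of inputs, here is a minimal Python sketch that models an instant vector as a plain dict keyed by the instance label. This is only a mental model, not how Prometheus is implemented; the helper names vector_min and vector_max are made up for illustration.

```python
# Model an instant vector as {instance label: sample value}.
# Illustrative stand-in only; Prometheus does not work on Python dicts.

def vector_min(vector):
    """Mimic min(node_blah): the smallest sample present.

    PromQL returns an empty result for an empty vector; None stands in
    for that here.
    """
    return min(vector.values()) if vector else None


def vector_max(vector):
    """Mimic max(node_blah) under the same assumptions."""
    return max(vector.values()) if vector else None


# Three series present, then only two after "baz" stops being scraped.
before = {"foo": 123, "bar": 456, "baz": 789}
after = {"foo": 124, "bar": 457}

print(vector_min(before), vector_max(before))
print(vector_min(after), vector_max(after))
```

The same function handles both snapshots because it only looks at the series that exist at evaluation time; a missing instance simply contributes nothing.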

On Wednesday, 20 July 2022 at 10:28:58 UTC+1 chembakay...@gmail.com wrote:

> Thanks, Clark and Brian for your reply.
>
> I am using two data sources in my case, i.e. Prometheus and Postgres.
>
> In my dashboard, there is a table that contains both Prometheus and 
> Postgres data. In this table, there is a column name %cpu used which will 
> be obtained from Prometheus.
>
> As Brian said, if a server goes down, we will not get its node-level 
> metrics. For that particular server, we will have Postgres data but not 
> Prometheus data.
>
> for example, my dashboard table is as follows:
>  
> IP       CPU  %cpu  memory  memory_used  column1  column2  column3
> 1.1.1.1  4    0.4%  40gb    60%          a        b        c
> 1.1.1.2  8    10%   80gb    30%          d        e        f
> 1.1.1.3                                  h        i        j
>
>
> The third server is down, so we cannot see its CPU and memory values. My 
> question is: I want to add a filter so that I can tell which servers are 
> least used or most used.
>
> The CPU usage for the third server will show no data because that server 
> is down. Can we do any comparison for such servers (the ones that went 
> down) so that I can filter on servers whose value is null/no_data?
>
> Thanks & regards,
> Bharath Kumar.
>
> On Wednesday, 20 July 2022 at 14:22:17 UTC+5:30 Brian Candler wrote:
>
>> And just to clarify slightly, there aren't really "null values" in 
>> prometheus. A query like "node_blah" returns a *vector* of results, that 
>> is, a variable number of values. e.g.
>>
>> [
>> node_blah{instance="foo"} 123
>> node_blah{instance="bar"} 456
>> node_blah{instance="baz"} 789
>> ]
>>
>> If node "baz" goes down, then a query at a later point in time may return
>>
>> [
>> node_blah{instance="foo"} 124
>> node_blah{instance="bar"} 457
>> ]
>>
>> If you want to test for this specific condition, i.e. there is no 
>> "node_blah" metric present for a specific instance "baz", then you can form 
>> a rather awkward join query using absent() in conjunction with the "up" 
>> metric as Stuart described.
>>
>> But usually, you just want to query the "up" metric itself.
>>
>> On Wednesday, 20 July 2022 at 09:38:58 UTC+1 Stuart Clark wrote:
>>
>>> On 20/07/2022 08:49, BHARATH KUMAR wrote:
>>>
>>>> Hello all,
>>>>
>>>> I installed node exporters on many servers (around 300). A few of the
>>>> servers are unreachable, so we are unable to get the CPU and memory
>>>> values of those servers.
>>>>
>>>> Now I want to add a filter in the Grafana dashboard to show the servers
>>>> with the least and most CPU usage. But because some servers are
>>>> unreachable, we are not getting values for them.
>>>>
>>>> My question is:
>>>> "*How do I check whether the output of a Prometheus query is NULL?*"
>>>>
>>>> Generally, I compare the output of the PromQL query like this:
>>>> i) if the CPU usage is less than 10%, I compare with
>>>> query >=0<=10%
>>>> ii) if the CPU usage is greater than 10% and less than 30%, I compare
>>>> with
>>>> query >10<=30
>>>> *Similarly, how do I check for null values using a Prometheus query?*
>>>
>>> For servers which can't be scraped there will be no metrics, so any 
>>> queries won't have any data to query.
>>>
>>> However Prometheus itself creates certain metrics for all scrape 
>>> targets, including one called "up" which is either 0 or 1 - where 0 means 
>>> the scrape failed. You can therefore create dashboards and alerts that list 
>>> the servers which aren't accessible (up == 0).
>>>
>>> -- 
>>> Stuart Clark
>>>
>>>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/2c15e3e8-b878-45a4-9386-ef2f0c0b1ccen%40googlegroups.com.


Re: [prometheus-users] Prom QL

2022-07-20 Thread BHARATH KUMAR
Thanks, Clark and Brian for your reply.

I am using two data sources in my case, i.e. Prometheus and Postgres.

In my dashboard, there is a table that contains both Prometheus and 
Postgres data. In this table, there is a column name %cpu used which will 
be obtained from Prometheus.

As Brian said, if a server goes down, we will not get its node-level 
metrics. For that particular server, we will have Postgres data but not 
Prometheus data.

for example, my dashboard table is as follows:
 
IP       CPU  %cpu  memory  memory_used  column1  column2  column3
1.1.1.1  4    0.4%  40gb    60%          a        b        c
1.1.1.2  8    10%   80gb    30%          d        e        f
1.1.1.3                                  h        i        j


The third server is down, so we cannot see its CPU and memory values. My 
question is: I want to add a filter so that I can tell which servers are 
least used or most used.

The CPU usage for the third server will show no data because that server is 
down. Can we do any comparison for such servers (the ones that went down) so 
that I can filter on servers whose value is null/no_data?

Thanks & regards,
Bharath Kumar.

On Wednesday, 20 July 2022 at 14:22:17 UTC+5:30 Brian Candler wrote:

> And just to clarify slightly, there aren't really "null values" in 
> prometheus. A query like "node_blah" returns a *vector* of results, that 
> is, a variable number of values. e.g.
>
> [
> node_blah{instance="foo"} 123
> node_blah{instance="bar"} 456
> node_blah{instance="baz"} 789
> ]
>
> If node "baz" goes down, then a query at a later point in time may return
>
> [
> node_blah{instance="foo"} 124
> node_blah{instance="bar"} 457
> ]
>
> If you want to test for this specific condition, i.e. there is no 
> "node_blah" metric present for a specific instance "baz", then you can form 
> a rather awkward join query using absent() in conjunction with the "up" 
> metric as Stuart described.
>
> But usually, you just want to query the "up" metric itself.
>
> On Wednesday, 20 July 2022 at 09:38:58 UTC+1 Stuart Clark wrote:
>
>> On 20/07/2022 08:49, BHARATH KUMAR wrote:
>>
>>> Hello all,
>>>
>>> I installed node exporters on many servers (around 300). A few of the
>>> servers are unreachable, so we are unable to get the CPU and memory
>>> values of those servers.
>>>
>>> Now I want to add a filter in the Grafana dashboard to show the servers
>>> with the least and most CPU usage. But because some servers are
>>> unreachable, we are not getting values for them.
>>>
>>> My question is:
>>> "*How do I check whether the output of a Prometheus query is NULL?*"
>>>
>>> Generally, I compare the output of the PromQL query like this:
>>> i) if the CPU usage is less than 10%, I compare with
>>> query >=0<=10%
>>> ii) if the CPU usage is greater than 10% and less than 30%, I compare
>>> with
>>> query >10<=30
>>> *Similarly, how do I check for null values using a Prometheus query?*
>>
>> For servers which can't be scraped there will be no metrics, so any 
>> queries won't have any data to query.
>>
>> However Prometheus itself creates certain metrics for all scrape targets, 
>> including one called "up" which is either 0 or 1 - where 0 means the scrape 
>> failed. You can therefore create dashboards and alerts that list the 
>> servers which aren't accessible (up == 0).
>>
>> -- 
>> Stuart Clark
>>
>>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/acd5139b-258d-48ab-a767-990132c052e7n%40googlegroups.com.


Re: [prometheus-users] Prom QL

2022-07-20 Thread Brian Candler
And just to clarify slightly, there aren't really "null values" in 
Prometheus. A query like "node_blah" returns a *vector* of results, that 
is, a variable number of values. e.g.

[
node_blah{instance="foo"} 123
node_blah{instance="bar"} 456
node_blah{instance="baz"} 789
]

If node "baz" goes down, then a query at a later point in time may return

[
node_blah{instance="foo"} 124
node_blah{instance="bar"} 457
]

If you want to test for this specific condition, i.e. there is no 
"node_blah" metric present for a specific instance "baz", then you can form 
a rather awkward join query using absent() in conjunction with the "up" 
metric as Stuart described.

But usually, you just want to query the "up" metric itself.
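
As a rough picture of what that join amounts to, the Python sketch below treats "up" and "node_blah" as dicts keyed by instance and lists the instances that are known scrape targets but have no "node_blah" sample. The sample values and the helper names are assumptions for illustration; this is not PromQL and not how Prometheus evaluates queries.

```python
# Hypothetical snapshots of two instant vectors, keyed by instance label.
up = {"foo": 1, "bar": 1, "baz": 0}
node_blah = {"foo": 124, "bar": 457}  # nothing for "baz": it is down


def missing_instances(up_vector, metric_vector):
    """Instances that are scrape targets but have no sample for the metric."""
    return sorted(set(up_vector) - set(metric_vector))


def down_instances(up_vector):
    """Instances whose last scrape failed, i.e. the effect of `up == 0`."""
    return sorted(name for name, value in up_vector.items() if value == 0)


print(missing_instances(up, node_blah))
print(down_instances(up))
```

In this example both helpers report the same instance, which is why querying "up" directly is usually the simpler route.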

On Wednesday, 20 July 2022 at 09:38:58 UTC+1 Stuart Clark wrote:

> On 20/07/2022 08:49, BHARATH KUMAR wrote:
>
>> Hello all,
>>
>> I installed node exporters on many servers (around 300). A few of the
>> servers are unreachable, so we are unable to get the CPU and memory
>> values of those servers.
>>
>> Now I want to add a filter in the Grafana dashboard to show the servers
>> with the least and most CPU usage. But because some servers are
>> unreachable, we are not getting values for them.
>>
>> My question is:
>> "*How do I check whether the output of a Prometheus query is NULL?*"
>>
>> Generally, I compare the output of the PromQL query like this:
>> i) if the CPU usage is less than 10%, I compare with
>> query >=0<=10%
>> ii) if the CPU usage is greater than 10% and less than 30%, I compare
>> with
>> query >10<=30
>> *Similarly, how do I check for null values using a Prometheus query?*
>
> For servers which can't be scraped there will be no metrics, so any 
> queries won't have any data to query.
>
> However Prometheus itself creates certain metrics for all scrape targets, 
> including one called "up" which is either 0 or 1 - where 0 means the scrape 
> failed. You can therefore create dashboards and alerts that list the 
> servers which aren't accessible (up == 0).
>
> -- 
> Stuart Clark
>
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/645e6766-818b-474d-a5c2-ad6b61b2789an%40googlegroups.com.


Re: [prometheus-users] Prom QL

2022-07-20 Thread Stuart Clark

On 20/07/2022 08:49, BHARATH KUMAR wrote:

> Hello all,
>
> I installed node exporters on many servers (around 300). A few of the
> servers are unreachable, so we are unable to get the CPU and memory
> values of those servers.
>
> Now I want to add a filter in the Grafana dashboard to show the servers
> with the least and most CPU usage. But because some servers are
> unreachable, we are not getting values for them.
>
> My question is:
> "*How do I check whether the output of a Prometheus query is NULL?*"
>
> Generally, I compare the output of the PromQL query like this:
> i) if the CPU usage is less than 10%, I compare with
> query >=0<=10%
> ii) if the CPU usage is greater than 10% and less than 30%, I compare
> with
> query >10<=30
> *Similarly, how do I check for null values using a Prometheus query?*

For servers which can't be scraped there will be no metrics, so any 
queries won't have any data to query.


However, Prometheus itself creates certain metrics for all scrape 
targets, including one called "up", which is either 0 or 1, where 0 
means the scrape failed. You can therefore create dashboards and alerts 
that list the servers which aren't accessible (up == 0).
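
For anyone consuming this outside Grafana, here is a minimal Python sketch of pulling the unreachable targets out of an instant-query response for "up == 0". It assumes the JSON shape returned by the Prometheus HTTP API endpoint /api/v1/query; the sample body, the target address and the helper name down_targets are made up for illustration, and no network call is made.

```python
import json

# Sample body shaped like an instant-query response from
# GET /api/v1/query?query=up == 0. The target shown is hypothetical.
SAMPLE = '''{"status": "success",
 "data": {"resultType": "vector", "result": [
  {"metric": {"__name__": "up", "instance": "1.1.1.3:9100", "job": "node"},
   "value": [1658305000, "0"]}
 ]}}'''


def down_targets(body):
    """Return the instance label of every series in the response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("query did not succeed")
    return [series["metric"].get("instance", "")
            for series in payload["data"]["result"]]


print(down_targets(SAMPLE))
```

Because the query already filters on up == 0, every series in the result is a target whose last scrape failed, so the function only needs to read the labels.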


--
Stuart Clark

--
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/53716e29-ceb2-7304-b003-17eba5262684%40Jahingo.com.


[prometheus-users] Prom QL

2022-07-20 Thread BHARATH KUMAR
Hello all,

I installed node exporters on many servers (around 300). A few of the 
servers are unreachable, so we are unable to get the CPU and memory 
values of those servers.

Now I want to add a filter in the Grafana dashboard to show the servers 
with the least and most CPU usage. But because some servers are 
unreachable, we are not getting values for them.

My question is:
"*How do I check whether the output of a Prometheus query is NULL?*"

Generally, I compare the output of the PromQL query like this:
i) if the CPU usage is less than 10%, I compare with
query >=0<=10%
ii) if the CPU usage is greater than 10% and less than 30%, I compare 
with
query >10<=30
*Similarly, how do I check for null values using a Prometheus query?*

Thanks & regards,
Bharath Kumar.

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/4b5684c0-45a6-4470-a239-ddfea322e6ebn%40googlegroups.com.