[prometheus-users] JMX Exporter on Kafka not working on compute instance

2020-03-25 Thread Azher Khan
Hi Team,

I am trying to set up the JMX exporter for Kafka running on a compute instance 
(virtual machine). 

As suggested, I downloaded the JMX exporter jar and the Kafka yaml from the 
following locations.

wget 
https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.12.0/jmx_prometheus_javaagent-0.12.0.jar
wget 
https://raw.githubusercontent.com/prometheus/jmx_exporter/master/example_configs/kafka-0-8-2.yml


I set the environment variables in the "/etc/systemd/system/kafka.service" 
file as below:

Environment="KAFKA_OPTS=$KAFKA_OPTS 
-javaagent:/home/kafka_user/jmx_kafka_exporter/jmx_prometheus_javaagent-0.12.0.jar=7070:/home/kafka_user/jmx_kafka_exporter/kafka-0-8-2.yml"

After performing a reload and restart, Kafka fails to start.

I would highly appreciate any suggestions for running the JMX exporter for Kafka 
on a compute instance.

Thank you in advance,

sudo systemctl daemon-reload
sudo systemctl restart kafka

sudo systemctl status kafka
● kafka.service - Kafka Daemon
   Loaded: loaded (/etc/systemd/system/kafka.service; enabled; vendor 
preset: disabled)
   Active: failed (Result: start-limit) since Thu 2020-03-26 06:05:52 UTC; 
935ms ago
  Process: 9842 ExecStart=/opt/kafka/bin/kafka-server-start.sh 
/opt/kafka/config/server.properties (code=exited, status=1/FAILURE)
 Main PID: 9842 (code=exited, status=1/FAILURE)

Mar 26 06:05:51 kafka1.com systemd[1]: Unit kafka.service entered failed 
state.
Mar 26 06:05:51 kafka1.com systemd[1]: kafka.service failed.
Mar 26 06:05:52 kafka1.com systemd[1]: kafka.service holdoff time over, 
scheduling restart.
Mar 26 06:05:52 kafka1.com systemd[1]: Stopped Kafka Daemon.
Mar 26 06:05:52 kafka1.com systemd[1]: start request repeated too quickly 
for kafka.service
Mar 26 06:05:52 kafka1.com systemd[1]: Failed to start Kafka Daemon.
Mar 26 06:05:52 kafka1.com systemd[1]: Unit kafka.service entered failed 
state.
Mar 26 06:05:52 kafka1.com systemd[1]: kafka.service failed.
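
For what it's worth, systemd's Environment= lines do not perform shell-style variable 
expansion, so the literal text "$KAFKA_OPTS" ends up being passed through to the JVM. 
A minimal sketch of a drop-in that sets the agent option directly (paths copied from 
above; the drop-in file name is only an example):

# /etc/systemd/system/kafka.service.d/jmx-exporter.conf
[Service]
Environment="KAFKA_OPTS=-javaagent:/home/kafka_user/jmx_kafka_exporter/jmx_prometheus_javaagent-0.12.0.jar=7070:/home/kafka_user/jmx_kafka_exporter/kafka-0-8-2.yml"

After another daemon-reload and restart, the exporter should serve metrics on port 7070.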



Re: [prometheus-users] Get https://ip/metrics: x509: cannot validate certificate for ip because it doesn't contain any IP SANs

2020-03-25 Thread Jack Chew
Thanks Cameron Kerr,

I followed 
https://groups.google.com/d/msg/prometheus-users/7SzbGIWpiD4/kwVEG8blBAAJ 
and it has solved the issue.

On Wednesday, 25 March 2020 at 09:42:06 UTC+8, Cameron Kerr wrote:
>
> From the error message, it would appear that you are communicating via the 
> IP and not the DNS name. You should communicate using the DNS name. If you 
> really want to communicate by IP (why? if DNS stability is a concern, use 
> /etc/hosts or similar), then you would need to have an IP-type entry in 
> the certificate's Subject Alternative Name (probably in addition to the DNS name).
>
> Having IPs in the certificate is not recommended (even deprecated, I 
> think) in CA certificates, and I wouldn't trust browsers to honour them. 
> Cf: https://www.geocerts.com/support/ip-address-in-ssl-certificate, which 
> discusses some of the pitfalls, although you may well decide that is not 
> valid for your deployment.
>
> This is like creating a self-signed certificate with a Subject Alternate 
> Name (aka, a SAN cert). This will allow you to put other names / aliases 
> into the certificate.
>
> However, the best thing would be to communicate using the hostname; or 
> turn off validation if you are comfortable with that, and can be bothered 
> supporting that (in case other things want to communicate with Prometheus, 
> such as Grafana or any ad-hoc reporting).
>
> When creating a self-signed certificate, you can include a 
> Subject-Alternate-Name (SAN). It appears to be more of a requirement these 
> days according to the CA/Browser Forum, or so I'm led to believe by the 
> people who provide us with certificates.
>
> Here are some bash commands you can use (from my own notes).
>
> Tested for RHEL5, RHEL6, and RHEL7 (creating a self-signed certificate 
> with a SAN)
>
> First copy and edit the BASE, CN and SANs, and paste those into a 
> terminal, then paste the command.
>
> BASE=test
> CN="/CN=test.example.com"
> SANs="DNS:test.example.com,IP:192.168.12.23"
>
> openssl req -x509 -nodes -newkey rsa:2048 -days 3650 -sha256 \
>   -keyout /etc/pki/tls/private/$BASE-selfsigned.key \
>   -out /etc/pki/tls/certs/$BASE-selfsigned.cert \
>   -reqexts SAN -extensions SAN \
>   -subj "$CN" \
>   -config <(
>     cat /etc/pki/tls/openssl.cnf
>     printf "[SAN]\nsubjectAltName=$SANs"
>   )
>
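> A quick way to confirm the SAN actually made it into the certificate (path taken 
> from the example above):
>
> openssl x509 -in /etc/pki/tls/certs/$BASE-selfsigned.cert -noout -text | grep -A1 "Subject Alternative Name"
>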
> I hope you find that useful.
>
> Cheers,
> Cameron
>
> On Thursday, 19 March 2020 03:45:41 UTC+13, Jakub Jakubik wrote:
>>
>> do you have the target configured with the ip address or the domain? is 
>> the domain in the cert? with curl do you use the ip or hostname?
>>
>> On Wed, Mar 18, 2020 at 12:35 PM Jack Chew  wrote:
>>
>>> Hi team,
>>>
>>>
>>> I configured the TLS settings in the Prometheus config file and I get: Get 
>>> https://ip:9100/metrics: x509: cannot validate certificate for ip 
>>> because it doesn't contain any IP SANs. But when I try with curl, it works. 
>>>
>>
>>
>> -- 
>> Kuba Jakubik
>>
>> SRE Tech Lead
>>
>> Netguru - Building software for world changers
>> jakub@netguru.com
>> netguru.com
>>
>



Re: [prometheus-users] AlertManager firing duplicate alerts

2020-03-25 Thread Christian Hoffmann
Hi,

you seem to be using external_labels without alert_relabel_configs to
drop this label from your alerts again. Therefore, your alerts will have
different labels and will not be de-duplicated.

See this blog post:
https://www.robustperception.io/high-availability-prometheus-alerting-and-notification

It has an example for the dc label (where you would need replica).
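
A minimal sketch of what that could look like for the replica label (only the
alert_relabel_configs block is new; the rest mirrors the config below):

alerting:
  alert_relabel_configs:
  - action: labeldrop
    regex: replica
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager1:9093
      - alertmanager2:9093

With that in place, both replicas send alerts with identical label sets, so the
Alertmanagers can deduplicate them.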

Kind regards,
Christian

On 3/25/20 5:17 AM, sunil sagar wrote:
> Hi, 
> 
> I have a Prometheus environment in HA mode, and Alertmanager is also in
> HA mode. 
> I am receiving duplicate alerts. 
> When I start both Prometheus instances, because of the global label with a
> different replica name, I get duplicate alerts. Please advise. 
> 
> Prometheus Config:
> 
> Prometheus Node1:
> global:
>    external_labels:
>       replica: 1
> 
> alerting:
>   alertmanagers:
>      - static_configs:
>          - targets: 
>               - alertmanager1:9093
>               - alertmanager2:9093
> 
> --
> Prometheus Node2:
> global:
>    external_labels:
>       replica: 1
> 
> alerting:
>   alertmanagers:
>      - static_configs:
>          - targets: 
>               - alertmanager1:9093
>               - alertmanager2:9093
> -
> Sample AlertManager rule:
> expr: max(up == 0 ) by (host)
> 



[prometheus-users] k8s service discovery is failing

2020-03-25 Thread Steve


Hi

I have been struggling with an RBAC issue and I cannot figure it out.

Help please!


I have node_exporter running in my cluster.

As you know, it is a DaemonSet and there is a node_exporter pod running on 
each node.

I also have a Prometheus server running in the same namespace as the 
node_exporter DaemonSet, i.e. the default namespace.


The scrape job for node_exporter uses an SD configuration for pods as 
follows:

- job_name: prometheus_node_exporter
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: pod
...



If I set up my Prometheus Server to use a ClusterRole, the node_exporter 
targets are properly discovered. So far so good!


Now if I try to restrict the Prometheus Server to a namespaced Role instead, it 
does not work.


As far as I know, if the Role allows listing pods within the same 
namespace as the Prometheus Server service account, then the API server 
should grant access.

However, this is not the case. This is the log message I get from 
Prometheus Server:

level=error ts=2020-03-25T13:57:53.652Z caller=klog.go:94 
component=k8s_client_runtime func=ErrorDepth 
msg="/app/discovery/kubernetes/kubernetes.go:385: Failed to list *v1.Pod: 
pods is forbidden: User \"system:serviceaccount:default:prometheus-server\" 
cannot list resource \"pods\" in API group \"\" at the cluster scope"
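
For comparison, pod discovery can be restricted to specific namespaces, which avoids 
the cluster-scope list call that a namespaced Role cannot grant. A minimal sketch, 
adding only a namespaces block to the job shown above:

- job_name: prometheus_node_exporter
  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names:
      - default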


Below is the Role I used for the Prometheus Server service account:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: "2020-03-25T13:40:13Z"
  labels:
    app: prometheus
    component: server
    heritage: Helm
    release: my-server
  name: prometheus-server
  namespace: default
  resourceVersion: "1943"
  selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/default/roles/prometheus-server
  uid: 28d3c869-894d-4797-9146-6137f60c7232
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - configmaps
  verbs:
  - get
  - list
  - watch

 

 

Below is the RoleBinding I used for the Prometheus Server service account:

 

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  creationTimestamp: "2020-03-25T13:40:13Z"
  labels:
    app: prometheus
    chart: prometheus-10.5.1-steve-server-12
    component: server
    heritage: Helm
    release: my-server
  name: prometheus-server
  namespace: default
  resourceVersion: "1946"
  selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/default/rolebindings/prometheus-server
  uid: d581c497-52d6-4080-8ade-e33008c019fd
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-server
subjects:
- kind: ServiceAccount
  name: prometheus-server
  namespace: default

 

 

Thank you!

 

Regards

Steve B



Re: [prometheus-users] AlertManager firing duplicate alerts

2020-03-25 Thread sunil sagar
Hi all, 

Correction to an error below:
Node 2 is marked as replica: 2 
> 
> Prometheus Config:
> 
> Prometheus Node1:
> global:
>    external_labels:
>       replica: 1
> 
> alerting:
>   alertmanagers:
>      - static_configs:
>          - targets: 
>               - alertmanager1:9093
>               - alertmanager2:9093
> 
> --
> Prometheus Node2:
> global:
>    external_labels:
>       replica: 2
> 
> alerting:
>   alertmanagers:
>      - static_configs:
>          - targets: 
>               - alertmanager1:9093
>               - alertmanager2:9093
> -



Thanks

> On 25 Mar 2020, at 12:18 PM, sunil sagar  wrote:
> 
> 
> Hi, 
> 
> I have a Prometheus environment in HA mode, and Alertmanager is also in HA 
> mode. 
> I am receiving duplicate alerts. 
> When I start both Prometheus instances, because of the global label with a different 
> replica name, I get duplicate alerts. Please advise. 
> 
> Prometheus Config:
> 
> Prometheus Node1:
> global:
>    external_labels:
>       replica: 1
> 
> alerting:
>   alertmanagers:
>      - static_configs:
>          - targets: 
>               - alertmanager1:9093
>               - alertmanager2:9093
> 
> --
> Prometheus Node2:
> global:
>    external_labels:
>       replica: 1
> 
> alerting:
>   alertmanagers:
>      - static_configs:
>          - targets: 
>               - alertmanager1:9093
>               - alertmanager2:9093
> -
> Sample AlertManager rule:
> expr: max(up == 0 ) by (host)



Re: [prometheus-users] Re: Monitoring Network from a distance ?

2020-03-25 Thread Ilhem Hamdi
Thanks a lot for your responses.
From the initial tests, the blackbox exporter works just fine for me. Thanks
for the suggestion.

Yes, I do have access to the backend servers.
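
For anyone following along, the sort of blackbox_exporter setup being discussed looks
roughly like this (module name, exporter address, and the health-check URL are only
examples):

# blackbox.yml
modules:
  http_2xx:
    prober: http
    timeout: 5s

# prometheus.yml scrape job
- job_name: blackbox_vip
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
  - targets:
    - https://vip.example.com/healthcheck
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: blackbox-exporter:9115

For certificate expiry, the probe also exposes probe_ssl_earliest_cert_expiry, which
can be alerted on directly.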

On Wed, 25 Mar 2020 at 03:08, Cameron Kerr wrote:

> If you have a VIP, then chances are you have multiple servers behind it
> and have some type of health-check URI/mechanism in place (or using a
> default one, such as whether a TCP connection or ICMP ping succeeds).
>
> I would suggest that you use the blackbox_exporter, but you use it to
> monitor:
>
> * each backend's healthcheck (directly, in the same manner that your
> load-balancer would) --- this will help you know when your application is
> ABOUT to fail (eg. a rolling upgrade is causing things to fall out of the
> load-balancer).
> * each backend's application statistics (this could be a combination of
> things like node_exporter, plus perhaps things like node_exporter's file
> metrics, or jmx_exporter, or 
> * the application as experienced at the front-door(s). You might also
> imagine having multiple targets for this if you have multiple points of
> presence for your application (eg. CDNs) or just want to test from
> different places.
> * test SSL certificates too (I believe you can do this using
> blackbox_exporter, but I haven't looked myself). If you terminate SSL in
> multiple places (eg. your load-balancer deployment is layer-4 and not
> layer-7) then monitor all of these places.
> * if testing per-server application readiness/health which is designed to
> bust through caching layers and exercise the various backends such as
> database and storage, then you might want to change the scrape interval for
> that... or use something like node_exporter with the file metrics so you
> can have some other testing/readiness engine running tests and then
> reporting the results quickly.
>
> Note that load-balancer configuration may influence error behaviour. For
> example, if the VIP goes down, does ARP respond? If it doesn't, then that
> may cause probes to take much longer before timing out, so be mindful of
> timeout settings for blackbox_exporter and tune if required.
>
> You didn't say whether you had access to the various backends, so I've
> made some assumptions here, as if you're like where I work, you have access
> to the servers, but perhaps not access to the load-balancer.
>
> There's lots to think about with this topic; just take what's useful.
>
> Cheers,
> Cameron
>
> On Thursday, 19 March 2020 22:54:00 UTC+13, Ilhem Hamdi wrote:
>>
>> Hello,
>>
>> I would like to monitor VIPs on an F5 by:
>> - Monitoring the SSL certificates
>> - Monitoring the status of the VIP (up/down)
>>
>> The problem is I don't have access to the F5; all I have is the URLs to
>> check the connectivity of the VIP, or I use a curl command to check the
>> expiration of certificates. I need to monitor these VIPs via Prometheus and
>> send alerts to clients. I know that you can configure an exporter to do the
>> job for you, but is it possible if you don't have access to the network
>> equipment? Any recommendation?
>>
>> Thanks
>>



[prometheus-users] Re: Prometheus Performance issues

2020-03-25 Thread Brian Candler
If you run a PromQL query, it will automatically access data from wherever 
it's needed - in RAM, in stored chunks, or both, depending on what time 
range the query covers.

However if you query data outside the retention period then obviously it 
may not be there at all - depending on exactly when the old chunks are 
dropped.



[prometheus-users] Re: Unknown index type IpAddress

2020-03-25 Thread Brian Candler
How did you create the snmp.yml - using generator, or by hand?

In the sample configs I see "type: InetAddress" and "type: 
InetAddressIPv4", but not "type: IpAddress"

Notice the following in generator/README.md:

 type: DisplayString # Override the metric type, possible types are:
 #   gauge:   An integer with type gauge.
 #   counter: An integer with type counter.
 #   OctetString: A bit string, rendered as 0xff34.
 #   DateAndTime: An RFC 2579 DateAndTime byte sequence. If the device has no time zone data, UTC is used.
 #   DisplayString: An ASCII or UTF-8 string.
 #   PhysAddress48: A 48 bit MAC address, rendered as 00:01:02:03:04:ff.
 #   Float: A 32 bit floating-point value with type gauge.
 #   Double: A 64 bit floating-point value with type gauge.
 #   InetAddressIPv4: An IPv4 address, rendered as 1.2.3.4.
 #   InetAddressIPv6: An IPv6 address, rendered as 0102:0304:0506:0708:090A:0B0C:0D0E:0F10.
 #   InetAddress: An InetAddress per RFC 4001. Must be preceded by an InetAddressType.
 #   InetAddressMissingSize: An InetAddress that violates section 4.1 of RFC 4001 by not having the size in the index. Must be preceded by an InetAddressType.
 #   EnumAsInfo: An enum for which a single timeseries is created. Good for constant values.
 #   EnumAsStateSet: An enum with a time series per state. Good for variable low-cardinality enums.
 #   Bits: An RFC 2578 BITS construct, which produces a StateSet with a time series per bit.

I don't see "IpAddress" in that list.  Where did it come from?

On Wednesday, 25 March 2020 09:39:11 UTC, Chris McKean wrote:
>
> Hi All,
>   I'm trying to get the snmp exporter working so I can get BGP peer 
> state metrics off our routers.  It works locally on my machine but when I 
> run the same config on the server I need this to run on, it doesn't work.
>

Define "works locally" versus "doesn't work" - exactly what do you see on 
the local machine, as compared to what you see when deployed on the server?

I would suggest at very least you need to test both machines with curl, e.g.

curl 'localhost:9117/snmp?module=bgp4&target=x.x.x.x'

Run this both on the local machine and on the server, and use the same 
target in both cases.  What's the difference in behaviour?

Are you using the same version of snmp_exporter on both the local machine 
and the server?  That's the only thing I can think of which would have any 
effect.
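
For example, something like this on each host (assuming the binary is on the path or
in the current directory):

./snmp_exporter --version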

However, I'm not yet convinced of your claim that the snmp.yml you've shown 
actually works on the local machine.


 

>   So there's obviously some config I'm missing on the server.  I don't 
> think, it's the MIB file that's missing.  If I run snmptranslate and 
> specify the MIB is brings back a numerical value as expected
>
>
The running snmp_exporter doesn't need MIBs, nor does it need any part of 
net-snmp to be present.  It's only the generator which needs this.  
snmp_exporter just reads the YAML file, nothing else.


When I try and query the 'bgp4' module this is the output I get.  
>
>
> *level=info ts=2020-03-24T10:53:35.919Z caller=main.go:149 msg="Starting 
> snmp_exporter" version="(version=0.17.0, branch=HEAD, 
> revision=f0ad4551a5c2023e383bc8ddef47dc760b83)"*
> *level=info ts=2020-03-24T10:53:35.919Z caller=main.go:150 
> build_context="(go=go1.13.8, user=root@cb51f17d52f8, 
> date=20200217-09:26:25)"*
> *level=info ts=2020-03-24T10:53:35.932Z caller=main.go:243 msg="Listening 
> on address" address=:9117*
> *panic: Unknown index type IpAddress*
>
>
Does the panic occur as soon as snmp_exporter starts up?  Or only when you 
send the query, e.g. with curl as above?

 

> *goroutine 29 [running]:*
> *main.indexOidsAsString(0xc00032e1d0, 0x4, 0x6, 0xc0001da840, 0x9, 0x0, 
> 0x415800, 0x7fb91dc8fa00, 0x300, 0x7fb91de76fff, ...)*
> * /app/collector.go:665 +0x1870*
> *main.indexesToLabels(0xc00032e1d0, 0x4, 0x6, 0xc00021b280, 0xc00027da80, 
> 0x8001c00027d928)*
> * /app/collector.go:676 +0x169*
> *main.pduToSamples(0xc00032e1d0, 0x4, 0x6, 0xc00027dab0, 0xc00021b280, 
> 0xc00027da80, 0xabee20, 0xc000252360, 0x0, 0x0, ...)*
> * /app/collector.go:329 +0x77*
> *main.collector.Collect(0xaca540, 0xc98dc0, 0xcca651, 0xb, 
> 0xc000124d20, 0xabee20, 0xc000252360, 0xc95020)*
> * /app/collector.go:247 +0x934*
> *github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1() 
> *
> * 
> /app

[prometheus-users] Re: Prometheus Performance issues

2020-03-25 Thread adi garg
Thanks, Brian, for the suggestion. I have one more doubt: assume our 
retention period is 2 hrs and we fire a query to get data for 4 hrs. Will it 
fire two queries, one for the data in RAM and another for the external 
storage that we are using?
 

On Wednesday, March 25, 2020 at 1:21:35 PM UTC+5:30, Brian Candler wrote:
>
> You are getting 10,000 metrics from each node and there are 50 nodes, 
> that's 500,000 timeseries.
>
> Your prometheus server will need significantly more than 500MB of RAM to 
> handle that.  See:
>
> https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion
>



[prometheus-users] Re: Unknown index type IpAddress

2020-03-25 Thread Chris McKean
Sorry, this is for the snmp_exporter:

https://github.com/prometheus/snmp_exporter

On Wednesday, 25 March 2020 09:39:11 UTC, Chris McKean wrote:
>
> Hi All,
>   I'm trying to get the snmp exporter working so I can get BGP peer 
> state metrics off our routers.  It works locally on my machine, but when I 
> run the same config on the server I need this to run on, it doesn't work.  
> So there's obviously some config I'm missing on the server.  I don't think 
> it's the MIB file that's missing.  If I run snmptranslate and specify the 
> MIB, it brings back a numerical value as expected
>
> *snmptranslate -m BGP4-MIB -IR -On bgpPeerState *
>
> *.1.3.6.1.2.1.15.3.1.2*
>
> When I try and query the 'bgp4' module this is the output I get.  
>
>
> *level=info ts=2020-03-24T10:53:35.919Z caller=main.go:149 msg="Starting 
> snmp_exporter" version="(version=0.17.0, branch=HEAD, 
> revision=f0ad4551a5c2023e383bc8ddef47dc760b83)"*
> *level=info ts=2020-03-24T10:53:35.919Z caller=main.go:150 
> build_context="(go=go1.13.8, user=root@cb51f17d52f8, 
> date=20200217-09:26:25)"*
> *level=info ts=2020-03-24T10:53:35.932Z caller=main.go:243 msg="Listening 
> on address" address=:9117*
> *panic: Unknown index type IpAddress*
>
> *goroutine 29 [running]:*
> *main.indexOidsAsString(0xc00032e1d0, 0x4, 0x6, 0xc0001da840, 0x9, 0x0, 
> 0x415800, 0x7fb91dc8fa00, 0x300, 0x7fb91de76fff, ...)*
> * /app/collector.go:665 +0x1870*
> *main.indexesToLabels(0xc00032e1d0, 0x4, 0x6, 0xc00021b280, 0xc00027da80, 
> 0x8001c00027d928)*
> * /app/collector.go:676 +0x169*
> *main.pduToSamples(0xc00032e1d0, 0x4, 0x6, 0xc00027dab0, 0xc00021b280, 
> 0xc00027da80, 0xabee20, 0xc000252360, 0x0, 0x0, ...)*
> * /app/collector.go:329 +0x77*
> *main.collector.Collect(0xaca540, 0xc98dc0, 0xcca651, 0xb, 
> 0xc000124d20, 0xabee20, 0xc000252360, 0xc95020)*
> * /app/collector.go:247 +0x934*
> *github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1() 
> *
> * 
> /app/vendor/github.com/prometheus/client_golang/prometheus/registry.go:443 
>  
> +0x19d*
> *created by 
> github.com/prometheus/client_golang/prometheus.(*Registry).Gather 
> *
> * 
> /app/vendor/github.com/prometheus/client_golang/prometheus/registry.go:454 
>  
> +0x57d*
>
>
> Does anyone have any ideas on this?  The server is running CentOS 7.5 with 
> net-snmp installed.
>
> Here is the section of the bgp4 module on in the snmp.yml file
>
>
> bgp4:
> auth: 
> community: "x"
> version: 2
> walk:
> - 1.3.6.1.2.1.15.2
> - 1.3.6.1.2.1.15.3
> metrics:
> - name: bgpLocalAs
> oid: 1.3.6.1.2.1.15.2
> - name: bgpPeerState
> oid: 1.3.6.1.2.1.15.3.1.2
> indexes:
> - labelname: bgpPeerIdentifier
> type: IpAddress
> lookups:
> - labels: [bgpPeerIdentifier]
> labelname: bgpPeerRemoteAddr
> oid: 1.3.6.1.2.1.15.3.1.7
> - name: bgpPeerAdminStatus
> oid: 1.3.6.1.2.1.15.3.1.3
> indexes:
> - labelname: bgpPeerIdentifier
> type: IpAddress
> lookups:
> - labels: [bgpPeerIdentifier]
> labelname: bgpPeerRemoteAddr
> oid: 1.3.6.1.2.1.15.3.1.7
> - name: bgpPeerNegotiatedVersion
> oid: 1.3.6.1.2.1.15.3.1.4
> indexes:
> - labelname: bgpPeerIdentifier
> type: IpAddress
> lookups:
> - labels: [bgpPeerIdentifier]
> labelname: bgpPeerRemoteAddr
> oid: 1.3.6.1.2.1.15.3.1.7
> - name: bgpPeerLocalPort
> oid: 1.3.6.1.2.1.15.3.1.6
> indexes:
> - labelname: bgpPeerIdentifier
> type: IpAddress
> lookups:
> - labels: [bgpPeerIdentifier]
> labelname: bgpPeerRemoteAddr
> oid: 1.3.6.1.2.1.15.3.1.7
> - name: bgpPeerRemotePort
> oid: 1.3.6.1.2.1.15.3.1.8
> indexes:
> - labelname: bgpPeerIdentifier
> type: IpAddress
> lookups:
> - labels: [bgpPeerIdentifier]
> labelname: bgpPeerRemoteAddr
> oid: 1.3.6.1.2.1.15.3.1.7
> - name: bgpPeerRemoteAs
> oid: 1.3.6.1.2.1.15.3.1.9
> indexes:
> - labelname: bgpPeerIdentifier
> type: IpAddress
> lookups:
> - labels: [bgpPeerIdentifier]
> labelname: bgpPeerRemoteAddr
> oid: 1.3.6.1.2.1.15.3.1.7
> - name: bgpPeerInUpdates
> oid: 1.3.6.1.2.1.15.3.1.10
> indexes:
> - labelname: bgpPeerIdentifier
> type: IpAddress
> lookups:
> - labels: [bgpPeerIdentifier]
> labelname: bgpPeerRemoteAddr
> oid: 1.3.6.1.2.1.15.3.1.7
> - name: bgpPeerOutUpdates
> oid: 1.3.6.1.2.1.15.3.1.11
> indexes:
> - labelname: bgpPeerIdentifier
> type: IpAddress
> lookups:
> - labels: [bgpPeerIdentifier]
> labelname: bgpPeerRemoteAddr
> oid: 1.3.6.1.2.1.15.3.1.7
> - name: bgpPeerInTotalMessages
> oid: 1.3.6.1.2.1.15.3.1.12
> indexes:
> - labelname: bgpPeerIdentifier
> type: IpAddress
> lookups:
> - labels: [bgpPeerIdentifier]
> labelname: bgpPeerRemoteAddr
> oid: 1.3.6.1.2.1.15.3.1.7
> - name: bgpPeerOutTotalMessages
> oid: 1.3.6.1.2.1.15.3.1.13
> indexes:
> - labelname: bgpPeerIdentifier
> type: IpAddress
> looku

[prometheus-users] Unknown index type IpAddress

2020-03-25 Thread Chris McKean
Hi All,
  I'm trying to get the snmp exporter working so I can get BGP peer 
state metrics off our routers.  It works locally on my machine, but when I 
run the same config on the server I need this to run on, it doesn't work.  
So there's obviously some config I'm missing on the server.  I don't think 
it's the MIB file that's missing.  If I run snmptranslate and specify the 
MIB, it brings back a numerical value as expected

*snmptranslate -m BGP4-MIB -IR -On bgpPeerState *

*.1.3.6.1.2.1.15.3.1.2*

When I try and query the 'bgp4' module this is the output I get.  


*level=info ts=2020-03-24T10:53:35.919Z caller=main.go:149 msg="Starting 
snmp_exporter" version="(version=0.17.0, branch=HEAD, 
revision=f0ad4551a5c2023e383bc8ddef47dc760b83)"*
*level=info ts=2020-03-24T10:53:35.919Z caller=main.go:150 
build_context="(go=go1.13.8, user=root@cb51f17d52f8, 
date=20200217-09:26:25)"*
*level=info ts=2020-03-24T10:53:35.932Z caller=main.go:243 msg="Listening 
on address" address=:9117*
*panic: Unknown index type IpAddress*

*goroutine 29 [running]:*
*main.indexOidsAsString(0xc00032e1d0, 0x4, 0x6, 0xc0001da840, 0x9, 0x0, 
0x415800, 0x7fb91dc8fa00, 0x300, 0x7fb91de76fff, ...)*
* /app/collector.go:665 +0x1870*
*main.indexesToLabels(0xc00032e1d0, 0x4, 0x6, 0xc00021b280, 0xc00027da80, 
0x8001c00027d928)*
* /app/collector.go:676 +0x169*
*main.pduToSamples(0xc00032e1d0, 0x4, 0x6, 0xc00027dab0, 0xc00021b280, 
0xc00027da80, 0xabee20, 0xc000252360, 0x0, 0x0, ...)*
* /app/collector.go:329 +0x77*
*main.collector.Collect(0xaca540, 0xc98dc0, 0xcca651, 0xb, 
0xc000124d20, 0xabee20, 0xc000252360, 0xc95020)*
* /app/collector.go:247 +0x934*
*github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()*
* 
/app/vendor/github.com/prometheus/client_golang/prometheus/registry.go:443 
+0x19d*
*created by 
github.com/prometheus/client_golang/prometheus.(*Registry).Gather*
* 
/app/vendor/github.com/prometheus/client_golang/prometheus/registry.go:454 
+0x57d*


Does anyone have any ideas on this?  The server is running CentOS 7.5 with 
net-snmp installed.

Here is the section of the bgp4 module on in the snmp.yml file


bgp4:
  auth:
    community: "x"
  version: 2
  walk:
  - 1.3.6.1.2.1.15.2
  - 1.3.6.1.2.1.15.3
  metrics:
  - name: bgpLocalAs
    oid: 1.3.6.1.2.1.15.2
  - name: bgpPeerState
    oid: 1.3.6.1.2.1.15.3.1.2
    indexes:
    - labelname: bgpPeerIdentifier
      type: IpAddress
    lookups:
    - labels: [bgpPeerIdentifier]
      labelname: bgpPeerRemoteAddr
      oid: 1.3.6.1.2.1.15.3.1.7
  - name: bgpPeerAdminStatus
    oid: 1.3.6.1.2.1.15.3.1.3
    indexes:
    - labelname: bgpPeerIdentifier
      type: IpAddress
    lookups:
    - labels: [bgpPeerIdentifier]
      labelname: bgpPeerRemoteAddr
      oid: 1.3.6.1.2.1.15.3.1.7
  - name: bgpPeerNegotiatedVersion
    oid: 1.3.6.1.2.1.15.3.1.4
    indexes:
    - labelname: bgpPeerIdentifier
      type: IpAddress
    lookups:
    - labels: [bgpPeerIdentifier]
      labelname: bgpPeerRemoteAddr
      oid: 1.3.6.1.2.1.15.3.1.7
  - name: bgpPeerLocalPort
    oid: 1.3.6.1.2.1.15.3.1.6
    indexes:
    - labelname: bgpPeerIdentifier
      type: IpAddress
    lookups:
    - labels: [bgpPeerIdentifier]
      labelname: bgpPeerRemoteAddr
      oid: 1.3.6.1.2.1.15.3.1.7
  - name: bgpPeerRemotePort
    oid: 1.3.6.1.2.1.15.3.1.8
    indexes:
    - labelname: bgpPeerIdentifier
      type: IpAddress
    lookups:
    - labels: [bgpPeerIdentifier]
      labelname: bgpPeerRemoteAddr
      oid: 1.3.6.1.2.1.15.3.1.7
  - name: bgpPeerRemoteAs
    oid: 1.3.6.1.2.1.15.3.1.9
    indexes:
    - labelname: bgpPeerIdentifier
      type: IpAddress
    lookups:
    - labels: [bgpPeerIdentifier]
      labelname: bgpPeerRemoteAddr
      oid: 1.3.6.1.2.1.15.3.1.7
  - name: bgpPeerInUpdates
    oid: 1.3.6.1.2.1.15.3.1.10
    indexes:
    - labelname: bgpPeerIdentifier
      type: IpAddress
    lookups:
    - labels: [bgpPeerIdentifier]
      labelname: bgpPeerRemoteAddr
      oid: 1.3.6.1.2.1.15.3.1.7
  - name: bgpPeerOutUpdates
    oid: 1.3.6.1.2.1.15.3.1.11
    indexes:
    - labelname: bgpPeerIdentifier
      type: IpAddress
    lookups:
    - labels: [bgpPeerIdentifier]
      labelname: bgpPeerRemoteAddr
      oid: 1.3.6.1.2.1.15.3.1.7
  - name: bgpPeerInTotalMessages
    oid: 1.3.6.1.2.1.15.3.1.12
    indexes:
    - labelname: bgpPeerIdentifier
      type: IpAddress
    lookups:
    - labels: [bgpPeerIdentifier]
      labelname: bgpPeerRemoteAddr
      oid: 1.3.6.1.2.1.15.3.1.7
  - name: bgpPeerOutTotalMessages
    oid: 1.3.6.1.2.1.15.3.1.13
    indexes:
    - labelname: bgpPeerIdentifier
      type: IpAddress
    lookups:
    - labels: [bgpPeerIdentifier]
      labelname: bgpPeerRemoteAddr
      oid: 1.3.6.1.2.1.15.3.1.7
  - name: bgpPeerFsmEstablishedTransitions
    oid: 1.3.6.1.2.1.15.3.1.15
    indexes:
    - labelname: bgpPeerIdentifier
      type: IpAddress
    lookups:
    - labels: [bgpPeerIdentifier]
      labelname: bgpPeerRemoteAddr
      oid: 1.3.6.1.2.1.15.3.1.7
  - name: bgpPeerFsmEstablishedTime
    oid: 1.3.6.1.2.1.15.3.1.16
    indexes:
    - labelname: bgpPeerIdentifier
      type: IpAddress
    lookups:
    - labels: [bgpPeerIdentifier]
      labelname: bgpPeerRemoteAddr
      oid: 1.3.6.1.2.1.15.3.1.7
  - name: bgpPeerConnectRetryInterval
    oid: 1.3.6.1.2.1.15.3.1.17
    indexes:
    - labelname: bgpPeerIdentifier
      type: IpAddress
    lookups:
    - labels: [bgpPeerIdentifier]
      labelname: bgpPeerRemoteAddr
      oid: 1.3.6.1.2.1.15.3.1.7
  - name: bgpPeerHoldTime
    oid: 1.

[prometheus-users] Re: Setup telegraf + Prometheus + Grafana on CentOS Linux release 7.7.1908 (Core).

2020-03-25 Thread Daniel Horecki
Telegraf has an output plugin that acts as a Prometheus client; Prometheus can then 
pull data from it:

https://github.com/influxdata/telegraf/tree/master/plugins/outputs/prometheus_client
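
A minimal sketch of both sides (the listen port is the plugin's default; hostnames are
placeholders):

# telegraf.conf
[[outputs.prometheus_client]]
  listen = ":9273"

# prometheus.yml
scrape_configs:
  - job_name: telegraf
    static_configs:
      - targets: ['telegraf-host:9273']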

On Tuesday, 24 March 2020 19:58:03 UTC, Brian Candler wrote:
>
> On Tuesday, 24 March 2020 18:41:48 UTC, Kaushal Shriyan wrote:
>>
>> Any plugin or module to allow telegraf agent to push metrics to the 
>> Prometheus time-series database?
>>
>>
> Sorry if I wasn't clear before, but I can't think of a clearer way to say 
> it: *you cannot push data to Prometheus*.
>
> Prometheus is a pull-only system.  Prometheus connects to data sources to 
> ingest data.
>
> https://prometheus.io/blog/2016/07/23/pull-does-not-scale-or-does-it/
>
> Regards,
>
> Brian.
>



[prometheus-users] Re: Data retention policy

2020-03-25 Thread REMI DRUILHE
Can anyone help me on this subject?

Thanks.
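
(One way to double-check which retention flag the running server actually applied,
assuming the 9090 port mapping from the compose file quoted below:

curl http://localhost:9090/api/v1/status/flags

The storage.tsdb.retention.time value should appear in the returned flag list.)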

On Tuesday, 24 March 2020 at 13:57:05 UTC+1, REMI DRUILHE wrote:
>
> Hello,
>
> I am using a Docker version of Prometheus (the latest one) in which I am 
> setting the value for the retention time (option 
> *storage.tsdb.retention.time*) at launch. I tried first with small 
> values like *1m* and *60s* (for 1 minute) or *5m* (for 5 minutes) but I 
> was not able to see any deletion of the data after this period. Thus, I 
> tested again this morning with a higher value: *2h* (for 2 hours). I 
> have seen information 
> (in the retention section of the documentation) that 2h is the minimum value for Prometheus, not 
> sure if it is true or not. But even with this value, the data stored 4 
> hours ago can still be retrieved using the HTTP API. Here is my query: 
> curl 'http://172.18.0.3:9090/api/v1/query_range?query=go_memstats_alloc_bytes&start=2020-03-24T00:01:00.000Z&end=2020-03-24T17:00:00.000Z&step=15s'
>
> On the Prometheus GUI, the flag for this option is correctly set up.
>
> Here is the Docker compose file that is used to launch Prometheus:
>
>   prometheus:
>     command: '--config.file=/etc/prometheus.yml --storage.tsdb.retention.time=2h'
>     container_name: remi_prometheus
>     depends_on:
>     - cadvisor
>     expose:
>     - '9090'
>     image: prom/prometheus:latest
>     labels:
>       project.run.user: remi
>     networks:
>       project-bridge:
>         aliases:
>         - prometheus
>     ports:
>     - published: 9090
>       target: 9090
>     volumes:
>     - remi-prometheus:/prometheus:rw
>     - /home/remi/Workspace/project/runtime/configuration/prometheus.yml:/etc/prometheus.yml:rw
>
> Note that I am using cAdvisor to populate Prometheus in which I get some 
> Golang metrics.
>
> Thus here are my questions:
>
>- What am I doing wrong?
>- What is the minimum value for storage.tsdb.retention.time?
>- Is there another option that overwrites the value set in 
>storage.tsdb.retention.time that I am not aware of?
>- Is there a way to test that the option is working if the minimal 
>value is 1 day? I guess it would be to change the system date, but I am 
>not sure about it.
>
>
> Thanks for the help,
>
> Best regards and good luck if you are under confinement because of this 
> damn virus :)
>



[prometheus-users] Re: Writing an alert rule to find interfaces with traffic above weekly 95% percentile

2020-03-25 Thread Brian Candler
On Wednesday, 25 March 2020 00:15:53 UTC, Cameron Kerr wrote:
>
> What should the query be to give me a single value for each series 
> {vpn,ifName} that would give the 95th percentile based on the past N days?
>
>
Something like:

quantile_over_time(0.95, ...)

where ... is a range vector, so you'll want to put a subquery in 
there, e.g. (expr)[7d:5m] if you're thinking about the 95th percentile 
based on 5-minute samples, which is typically what people want.
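
For instance, an illustrative query against per-interface traffic rates, where the
metric and label names are assumptions taken from the question:

quantile_over_time(0.95, (sum by (vpn, ifName) (rate(ifHCInOctets[5m])))[7d:5m])

This evaluates the 5-minute rate at 5-minute resolution over the last 7 days and
takes the 95th percentile per {vpn, ifName} series.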



[prometheus-users] Re: Getting all the checks for a server in the same dashboard.

2020-03-25 Thread Brian Candler
Label all the metrics collected from a particular host with the same 
instance label - e.g. "foo.example.com" not "foo.example.com:9100".  See
https://www.robustperception.io/controlling-the-instance-label
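
For example, one way to do that in the scrape config is to strip the port when setting
the instance label (a sketch, not the only option):

relabel_configs:
- source_labels: [__address__]
  regex: '(.*):\d+'
  target_label: instance
  replacement: '$1'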

Then you can easily configure your dashboard to show all graphs for a 
particular value of the instance label.

As for "I don't want to get empty panels": this will be a feature of 
whatever dashboard software you are using.  I suggest you ask on the 
mailing list of whatever software that is. Prometheus itself doesn't have 
dashboards.



[prometheus-users] Re: Prometheus Performance issues

2020-03-25 Thread Brian Candler
You are getting 10,000 metrics from each node and there are 50 nodes, 
that's 500,000 timeseries.

Your prometheus server will need significantly more than 500MB of RAM to 
handle that.  See:
https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion



Re: [prometheus-users] Re: Prometheus Performance issues

2020-03-25 Thread Ben Kochie
Your Prometheus version is almost 2 years old. You need to upgrade to
2.17.0.

On Wed, Mar 25, 2020 at 8:08 AM adi garg  wrote:

> Please help?? Do I need to specify any other detail?
>
> On Wednesday, March 25, 2020 at 4:12:28 AM UTC+5:30, adi garg wrote:
>>
>> Hello experts, I was stress-testing Prometheus on the AWS cluster.
>> Prometheus version = 2.3
>> Number of worker nodes in a cluster = 50
>> scrape_interval = 15s
>> I am getting around 10,000 metrics from a node and I am running
>> Prometheus in a docker container with memory specified to 500MB.
>> My avg metric size comes out to be around 2.3kb, which according to me is
>> unexpectedly higher. Moreover, the Docker container keeps getting burst out
>> every (10-15)s and then restarting again.
>> Can somebody tells me what could be the reason for this?
>>
>



[prometheus-users] Re: Prometheus Performance issues

2020-03-25 Thread adi garg
Please help?? Do I need to specify any other detail?

On Wednesday, March 25, 2020 at 4:12:28 AM UTC+5:30, adi garg wrote:
>
> Hello experts, I was stress-testing Prometheus on the AWS cluster. 
> Prometheus version = 2.3
> Number of worker nodes in a cluster = 50 
> scrape_interval = 15s
> I am getting around 10,000 metrics from a node and I am running Prometheus 
> in a docker container with memory specified to 500MB.
> My avg metric size comes out to be around 2.3 kB, which according to me is 
> unexpectedly high. Moreover, the Docker container keeps dying 
> every 10-15 s and then restarting again.
> Can somebody tell me what could be the reason for this?
>
