Please see the details below, captured from the Prometheus container, regarding the OS/platform.

root@prometheus-hi-res-s101:/# /prometheus/prometheus --version
prometheus, version 2.3.1 (branch: HEAD, revision: 188ca45bd85ce843071e768d855722a9d9dabe03)
  build user:       root@82ef94f1b8f7
  build date:       20180619-15:56:22
  go version:       go1.10.3

root@prometheus-hi-res-s101:/# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.2 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.2 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial


We have integrated Tail-f ConfD (https://www.tail-f.com/confd-basic/) to provide a CLI for configuring alert rules and monitoring alert status.

As shown below, on a few occasions an alert was resolved within 5 seconds (the alert mentioned in my last email was resolved in 15 seconds):

NAME:         PROCESS_STATE
EVENT HOST:   haproxy-common-s109
STATUS:       resolved
MESSAGE:      haproxy-common instance 109 of module haproxy-common is moved from Aborted state !
CREATE TIME:  2020-05-24T07:53:54.044+00:00
RESOLVE TIME: 2020-05-24T07:53:59.057+00:00
UPDATE TIME:  2020-05-24T08:08:59.066+00:00

NAME:         PROCESS_STATE
EVENT HOST:   binding-s122
STATUS:       resolved
MESSAGE:      binding instance 122 of module binding is moved from Aborted state !
CREATE TIME:  2020-06-01T23:45:43.997+00:00
RESOLVE TIME: 2020-06-01T23:45:48.881+00:00
UPDATE TIME:  2020-06-02T00:00:48.849+00:00

The alert that resolved after 15 seconds can be accounted for, since we found supporting evidence in Grafana; but for the alerts that resolved within 5 seconds, there is no corresponding evidence in either the logs or Grafana.
I am not sure whether this is related to the duration for which an alert remains active.
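For reference, my rough understanding is that a ConfD rule like the one quoted below would translate into a native Prometheus rule along these lines. This is only a sketch of my assumption, not our actual generated config; the group name and the `for:` duration are illustrative guesses:

```yaml
# Hypothetical Prometheus-native equivalent of the ConfD alert rule
# (group name and the "for:" duration are assumptions, not from our config).
groups:
  - name: process_state
    rules:
      - alert: PROCESS_STATE
        expr: docker_service_up == 1 or docker_service_up == 3
        # Without a "for:" clause, the alert fires as soon as the expression
        # matches and resolves as soon as it stops matching, so very short
        # active periods are possible in principle.
        for: 5m
        labels:
          severity: critical
        annotations:
          message: >-
            {{ $labels.service_name }} instance {{ $labels.module_instance }}
            of module {{ $labels.module }} is in Aborted state !
```

If no `for:` (or an equivalent hold duration) is applied in our generated rules, that might explain alerts that resolve within a few seconds of being created.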

In parallel, I am continuing to investigate how our product handles alerts. If the details above give you any hints, please let me know.
Thank you.


On Monday, June 1, 2020 at 11:55:10 PM UTC-7, Brian Candler wrote:
>
> On Tuesday, 2 June 2020 00:16:17 UTC+1, kedar sirshikar wrote:
>>
>> Alert configuration is as below:
>>
>> admin@orchestrator[nd2bwa6drm01v]# show running-config alert rule 
>> PROCESS_STATE 
>> alert rule PROCESS_STATE
>>  expression         "docker_service_up==1 or docker_service_up==3"
>>  event-host-label   container_name
>>  message            "{{ $labels.service_name }} instance {{ 
>> $labels.module_instance }} of module {{ $labels.module }} is in Aborted 
>> state !"
>>  snmp-facility      application
>>  snmp-severity      critical
>>  snmp-clear-message "{{ $labels.service_name }} instance {{ 
>> $labels.module_instance }} of module {{ $labels.module }} is moved from 
>> Aborted state !"
>> !
>>
>>
> Could you explain what software and platform/OS you are running?
>
> This "show running-config" command doesn't look like any flavour of 
> prometheus I'm familiar with.  Is this some version of prometheus embedded 
> in another system?  If so, do you have any way to determine what the 
> underlying version of prometheus is?
>
> Also, regular prometheus doesn't generate events directly.  It generates 
> HTTP calls to alertmanager, which processes those events.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/b472f1b3-13ec-4532-baf6-61e5ff563157%40googlegroups.com.
