Re: [openstack-dev] [vitrage] Vitrage alarm processing behavior

Paul Vaduva Wed, 21 Feb 2018 05:24:29 -0800

Hi Ifat,

Sorry for the late reply.
To answer your questions:
I started from the doctor datasource as an example (or rather a port of it to 
the 1.3.0 version of vitrage), but I will call it something different, so there 
is no need to worry about conflicts with the present doctor datasource.
I added polling alarms to it, but I have a more particular use case:
* I get the compute host down alarm on an event
* I can't get a host up event, or it would be an intricate solution to implement


I tried to see if I can make the following scenario work.
Let's call it Scenario I:
* Get a compute host down event (raising an alarm)
* Periodically poll for the status of the compute host in the method "def 
_get_alarms(self):" of the Driver object
Both types of interaction seem to work (polling and event based).
However, now comes the tricky part. I need the alarms (with status up / 
compute host up) returned by the method "def _get_alarms(self):" of this Driver 
object to cancel/clear the compute host down alarms raised by the event. 
Unfortunately, this does not happen.
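
To make the behavior I expect in Scenario I concrete, here is a minimal, 
self-contained Python sketch. It is only a toy model and does not use any of 
Vitrage's real classes: AlarmTable, apply and the (name, resource) key are 
hypothetical names chosen for illustration. It shows the outcome I am after: 
an alarm raised by an event is cleared once polling returns the same alarm 
with status up.

```python
# Toy model of the behavior expected in Scenario I.
# Nothing here is Vitrage code; all names are hypothetical.

DOWN = "down"
UP = "up"


class AlarmTable:
    """Keeps active alarms, keyed by (alarm name, resource)."""

    def __init__(self):
        self.active = {}

    def apply(self, alarm, source):
        """Raise or clear an alarm, regardless of how it arrived."""
        key = (alarm["name"], alarm["resource"])
        if alarm["status"] == DOWN:
            # Raise or refresh the alarm and remember which path updated it.
            self.active[key] = dict(alarm, source=source)
        else:
            # An "up" status clears the alarm, whichever path raised it.
            self.active.pop(key, None)


table = AlarmTable()

# 1. Event: the compute host goes down.
table.apply({"name": "compute.host.down", "resource": "compute-1",
             "status": DOWN}, source="event")
raised = dict(table.active)

# 2. Polling (what _get_alarms would return) reports the host as up again.
table.apply({"name": "compute.host.down", "resource": "compute-1",
             "status": UP}, source="polling")
cleared = dict(table.active)

print(len(raised), len(cleared))
```

In this model the clearing step only depends on the alarm name and the 
resource, not on whether the alarm arrived by event or by polling.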

Oddly enough, there is a variant of this scenario that works, but it is not 
robust enough for our needs.
Let's call it Scenario II:
* An event with compute host down arrives (when one of our compute hosts 
actually goes down)
* A polling alarm (also compute host down) is raised and somehow overwrites the 
event based one (I can see the updated time).
* After a while the compute host reboots, and polling for the alarms returns 
an alarm with status up, which in this case clears the previous (I assume 
polling type by now) alarm.

Now I can't understand why this second scenario works and the first one does 
not.
It seems that an alarm of the same type (compute host down with status down) 
obtained by polling can overwrite an alarm of identical type and status raised 
by an event, but an alarm with an updated status (i.e. up) obtained by polling 
cannot overwrite / clear an alarm with status down obtained from an event.
I am wondering whether there is a reason for this behavior, whether there is a 
way to modify it, or whether it is a bug.

To generate the events I use a modified version of the zabbix_vitrage.py 
script that publishes to the rabbitmq vitrage_notifications.info queue. I have 
attached this python script.
The code is still experimental, but I wanted to know whether it is logically 
possible to create the scenario we need, Scenario I.

Best Regards
Paul

From: Afek, Ifat (Nokia - IL/Kfar Sava) [mailto:ifat.a...@nokia.com]
Sent: Wednesday, February 7, 2018 7:16 PM
To: OpenStack Development Mailing List (not for usage questions) 
<openstack-dev@lists.openstack.org>
Cc: Ciprian Barbu <ciprian.ba...@enea.com>
Subject: Re: [openstack-dev] [vitrage] Vitrage alarm processing behavior

Hi Paul,

I’m glad that my fix helped.

Regarding the Doctor datasource: the purpose of this datasource was to be used 
by the Doctor test scripts. Do you intend to modify it, or to create a new 
similar datasource that also supports polling? Modifying the existing 
datasource could be problematic, since we need to make sure the existing 
functionality and tests stay the same.

In general, most of our datasources support both polling and notifications. A 
simple example is the Cinder datasource [1]. For an example of an alarm 
datasource, you can look at the Zabbix datasource [2]. You can also go over the 
documentation of how to add a new datasource [3].

As for your question, it is the responsibility of the datasource to clear the 
alarms that it created. For the Doctor datasource, you can send an event with 
"status": "up" in the details and the datasource will clear the alarm.
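
For illustration, a pair of raise and clear events could be built along the 
following lines. Only the compute.host.down event type and the "status": "up" 
entry in the details come from this thread. The other fields (time, hostname) 
and the helper name build_doctor_event are assumptions made for this sketch, 
so please check them against the event format your Vitrage version expects.

```python
import json
from datetime import datetime, timezone


def build_doctor_event(hostname, status):
    """Build an event dict; fields other than type and status are assumed."""
    return {
        "time": datetime.now(timezone.utc).isoformat(),  # assumed field
        "type": "compute.host.down",
        "details": {
            "hostname": hostname,  # assumed field
            "status": status,      # "down" raises the alarm, "up" clears it
        },
    }


raise_event = build_doctor_event("compute-1", "down")
clear_event = build_doctor_event("compute-1", "up")

# The clearing event keeps the same type and host; only the status differs.
print(json.dumps(clear_event, indent=2))
```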

[1] 
https://github.com/openstack/vitrage/tree/master/vitrage/datasources/cinder/volume
[2] 
https://github.com/openstack/vitrage/tree/master/vitrage/datasources/zabbix
[3] 
https://docs.openstack.org/vitrage/latest/contributor/add-new-datasource.html


Best Regards,
Ifat.


From: Paul Vaduva <paul.vad...@enea.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" 
<openstack-dev@lists.openstack.org>
Date: Wednesday, 7 February 2018 at 15:50
To: "OpenStack Development Mailing List (not for usage questions)" 
<openstack-dev@lists.openstack.org>
Cc: Ciprian Barbu <ciprian.ba...@enea.com>
Subject: Re: [openstack-dev] [vitrage] Vitrage alarm processing behavior

Hi Ifat,

Yes, I’ve checked: 1.3.1 refers to the version of a deb package 
(python-vitrage) built by us; the git tag used to build that deb is 1.3.0.
I also backported the doctor datasource from the vitrage git master branch.

I also noticed that when I configure snapshots_interval=10 I get this 
exception in /var/log/vitrage/graph.log around the time the alarms disappear:
https://hastebin.com/ukisajojef.sql

I've cherry-picked the change you mentioned, and the alarm that came from the 
event is now persistent and the exception is gone.
So it was a bug.
I understand that for the doctor datasource I need to have events both for 
raising the alarm and for clearing it. Is that correct?


Best Regards,
Paul

From: Afek, Ifat (Nokia - IL/Kfar Sava) [mailto:ifat.a...@nokia.com]
Sent: Wednesday, February 7, 2018 1:24 PM
To: OpenStack Development Mailing List (not for usage questions) 
<openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] [vitrage] Vitrage alarm processing behavior

Hi Paul,

It sounds like a bug. Alarms created by a datasource are not supposed to be 
deleted later on. It might be a bug that was fixed in Queens [1].

I’m not sure which Vitrage version you are actually using. I failed to find a 
vitrage version 1.3.1. Could it be that you are referring to a version of 
python-vitrageclient or vitrage-dashboard?

In any case, if you are using an older version, I suggest that you try to use 
the fix that I mentioned [1] and see if it helps.


[1] 
https://review.openstack.org/#/c/524228


Best Regards,
Ifat.


From: Paul Vaduva <paul.vad...@enea.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" 
<openstack-dev@lists.openstack.org>
Date: Wednesday, 7 February 2018 at 11:58
To: "openstack-dev@lists.openstack.org" <openstack-dev@lists.openstack.org>
Subject: [openstack-dev] [vitrage] Vitrage alarm processing behavior

Hi Vitrage developers,

I have a question about vitrage's inner workings. I ported the doctor 
datasource from the master branch to an earlier version of vitrage (1.3.1).
I noticed some behavior, and I am wondering whether it is OK or a bug of some 
sort.
Here it is:
1. I send an event for raising an alarm to the doctor datasource of vitrage.
2. The event is received, hence the alarm is displayed on the vitrage 
dashboard, attached to the affected resource (as expected).
3. If I have configured snapshots_interval=10 in /etc/vitrage/vitrage.conf, 
the alarm disappears after a while.
Fragment from /etc/vitrage/vitrage.conf:
***************
[datasources]
types = nova.host,nova.instance,nova.zone,cinder.volume,neutron.network,neutron.port,doctor
snapshots_interval=10
***************
On the other hand, if I comment it out, the alarm persists:
**************
[datasources]
types = nova.host,nova.instance,nova.zone,cinder.volume,neutron.network,neutron.port,doctor
#snapshots_interval=10
**************
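
To double check which of the two variants is in effect, the option can be 
read back with Python's standard configparser module. This is only a sketch: 
it parses shortened copies of the fragments above from strings rather than 
the real /etc/vitrage/vitrage.conf, and snapshots_interval_of is a made-up 
helper name. It says nothing about how Vitrage itself loads or interprets the 
option.

```python
import configparser


def snapshots_interval_of(text):
    """Return snapshots_interval as an int, or None when it is not set."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    value = parser.get("datasources", "snapshots_interval", fallback=None)
    return int(value) if value is not None else None


# Shortened copies of the two fragments (the datasource list is abbreviated).
with_interval = """
[datasources]
types = nova.host,doctor
snapshots_interval=10
"""

without_interval = """
[datasources]
types = nova.host,doctor
#snapshots_interval=10
"""

print(snapshots_interval_of(with_interval))
print(snapshots_interval_of(without_interval))
```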

I would like to know whether this behavior is correct or a bug.
My intention is to create some sort of hybrid datasource, starting from the 
doctor one, that receives events for raising alarms like compute.host.down 
but uses polling to clear them.

Best Regards,
Paul Vaduva

doctor_vitrage.py
Description: doctor_vitrage.py

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
