Re: [openstack-dev] [Vitrage] About alarms reported by datasource and the alarms generated by vitrage evaluator

Afek, Ifat (Nokia - IL) Wed, 11 Jan 2017 00:07:46 -0800

You are right. But as I see it, the case of Vitrage suspect vs. the real Nagios 
alarm is just one example of the more general case of two monitors reporting 
the same alarm.
Don’t you think so?

From: Yujun Zhang <[email protected]>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" 
<[email protected]>
Date: Wednesday, 11 January 2017 at 09:46
To: "OpenStack Development Mailing List (not for usage questions)" 
<[email protected]>, "[email protected]" <[email protected]>
Cc: "[email protected]" <[email protected]>, "[email protected]" 
<[email protected]>, "[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>, "[email protected]" 
<[email protected]>
Subject: Re: [openstack-dev] [Vitrage] About alarms reported by datasource and 
the alarms generated by vitrage evaluator

Hi, Ifat

If I understand correctly, your concern is mainly about the same alarm coming 
from different monitors, not about the "suspect" status discussed in another thread.

On Tue, Jan 10, 2017 at 10:21 PM Afek, Ifat (Nokia - IL) 
<[email protected]> wrote:
Hi Yinliyin,

At first I thought that turning "deduced" into a property of the alarm 
might help solve your use case. But now I think most of the problems will 
remain the same:

·  It won’t solve the general problem of two different monitors that raise the 
same alarm
·  It won’t solve possible conflicts of timestamp and severity between 
different monitors
·  It will make the decision of when to delete the alarm more complex (delete 
it when the deduced alarm is deleted? When the Nagios alarm is deleted? Both? And 
how should the timestamp and severity change in these cases?)

So I don’t think that making this change is beneficial.
What do you think?

Best Regards,
Ifat.

From: "[email protected]" <[email protected]>
Date: Monday, 9 January 2017 at 05:29
To: "Afek, Ifat (Nokia - IL)" <[email protected]>
Cc: "[email protected]" <[email protected]>, "[email protected]" 
<[email protected]>, "[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>, "[email protected]" 
<[email protected]>, "[email protected]" <[email protected]>
Subject: Re: [openstack-dev] [Vitrage] About alarms reported by datasource and 
the alarms generated by vitrage evaluator

Hi Ifat,

         I think there is a situation in which all the alarms are reported by 
the monitored system. We use Vitrage to:

            1.  Find the relationships between the alarms, and find the root cause.

            2.  Deduce an alarm before it actually occurs. This comprises two 
aspects:

                 1) A causes B:  when A occurs, we deduce that B will occur

                 2) B is caused by A:  when B occurs, we deduce that A must 
have occurred

            In "2", we do expect Vitrage to raise the alarm before the alarm 
is reported, because the reported alarm may be lost or delayed for some reason.  
So we would write "raise alarm" actions in the scenarios of the template.  I 
think that whether the alarm is reported or deduced should be a state property 
of the alarm. The reported vertex and the deduced vertex of the same alarm 
should be merged into one vertex.
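
As a rough illustration of the merged-vertex idea above, here is a minimal Python sketch. It is not Vitrage code: the names `AlarmVertex` and `MergedGraph`, the state values "deduced" and "reported", and the choice to key a vertex by alarm name and resource are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class AlarmVertex:
    """Hypothetical alarm vertex that remembers every source that raised it."""
    name: str
    resource: str
    sources: Set[str] = field(default_factory=set)

    @property
    def state(self) -> str:
        # Assumption: the alarm counts as "reported" as soon as any source
        # other than the Vitrage evaluator has raised it.
        return "reported" if self.sources - {"vitrage"} else "deduced"


class MergedGraph:
    """Hypothetical graph holding one vertex per (alarm name, resource)."""

    def __init__(self) -> None:
        self.vertices: Dict[Tuple[str, str], AlarmVertex] = {}

    def raise_alarm(self, source: str, name: str, resource: str) -> AlarmVertex:
        # Reuse the existing vertex if the same alarm was already raised.
        key = (name, resource)
        vertex = self.vertices.setdefault(key, AlarmVertex(name, resource))
        vertex.sources.add(source)
        return vertex


merged_graph = MergedGraph()
first = merged_graph.raise_alarm("vitrage", "B", "host-1")  # deduced first
state_before = first.state
second = merged_graph.raise_alarm("nagios", "B", "host-1")  # reported later
state_after = second.state
```

In this sketch the second call returns the same vertex as the first, and only its state changes. Deletion, timestamp and severity handling are deliberately left out.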

     Best Regards,

     Yinliyin.

Original Mail
From: <[email protected]>;
To: <[email protected]>;
Cc: 韩静00006838; 王维雅00042110; 章宇军10200531; 贾培源10101785; 龚亚辉6092001895;
Date: 7 January 2017, 02:18
Subject: Re: [openstack-dev] [Vitrage] About alarms reported by datasource and the 
alarms generated by vitrage evaluator

Hi YinLiYin,

This is an interesting question. Let me divide my answer into two parts.

First, the case that you described with Nagios and Vitrage. This problem 
depends on the specific Nagios tests that you configure in your system, as well 
as on the Vitrage templates that you use. For example, you can use 
Nagios/Zabbix to monitor the physical layer, and Vitrage to raise deduced 
alarms on the virtual and application layers. This way you will never have 
duplicated alarms. If you want to use Nagios to monitor the other layers as 
well, you can simply modify the Vitrage templates so they don’t raise the deduced 
alarms that Nagios may generate, and use the templates to show RCA between 
different Nagios alarms.

Now let’s talk about the more general case. Vitrage can receive alarms from 
different monitors, including Nagios, Zabbix, collectd and Aodh. If you are 
using more than one monitor, it is possible that the same alarm (maybe with a 
different name) will be raised twice. We need to create a mechanism to identify 
such cases and create a single alarm with the properties of both monitors. This 
has not been designed in detail yet, so if you have any suggestions we will be 
happy to hear them.
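
One possible shape for such a mechanism, sketched in Python purely for discussion. Nothing here is an existing Vitrage API: the `ALIASES` table, the monitor-specific alarm names in it, the event dictionary fields, and `merge_alarms` are all hypothetical. The sketch keeps the properties of each monitor side by side and does not try to resolve timestamp or severity conflicts.

```python
from typing import Dict, List, Tuple

# Hypothetical alias table: (monitor, monitor-specific name) -> canonical name.
ALIASES: Dict[Tuple[str, str], str] = {
    ("nagios", "CPU load"): "HIGH_CPU_LOAD",
    ("zabbix", "Processor load is too high"): "HIGH_CPU_LOAD",
}


def merge_alarms(events: List[dict]) -> Dict[Tuple[str, str], dict]:
    """Group events that describe the same alarm on the same resource."""
    merged: Dict[Tuple[str, str], dict] = {}
    for event in events:
        # Fall back to the monitor's own name when no alias is configured.
        canonical = ALIASES.get((event["monitor"], event["name"]), event["name"])
        key = (canonical, event["resource"])
        alarm = merged.setdefault(
            key,
            {"name": canonical, "resource": event["resource"], "by_monitor": {}},
        )
        # Keep each monitor's properties separately instead of picking a winner.
        alarm["by_monitor"][event["monitor"]] = {
            "severity": event["severity"],
            "timestamp": event["timestamp"],
        }
    return merged


events = [
    {"monitor": "nagios", "name": "CPU load", "resource": "host-1",
     "severity": "WARNING", "timestamp": "2017-01-06T10:00:00Z"},
    {"monitor": "zabbix", "name": "Processor load is too high",
     "resource": "host-1", "severity": "HIGH",
     "timestamp": "2017-01-06T10:00:05Z"},
]
merged = merge_alarms(events)
single_alarm = merged[("HIGH_CPU_LOAD", "host-1")]
```

With the two sample events, the result contains one alarm that carries both the Nagios and the Zabbix properties.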

Best Regards,
Ifat.

From: "[email protected]" <[email protected]>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" 
<[email protected]>
Date: Friday, 6 January 2017 at 03:27
To: "[email protected]" <[email protected]>
Cc: "[email protected]" <[email protected]>, "[email protected]" 
<[email protected]>, "[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>, "[email protected]" 
<[email protected]>
Subject: [openstack-dev] [Vitrage] About alarms reported by datasource and the 
alarms generated by vitrage evaluator

Hi all,

   Vitrage generates alarms according to the templates. All the alarms raised by 
Vitrage have the type "vitrage". Suppose Nagios has an alarm A. If alarm A is 
raised by the Vitrage evaluator according to the action part of a scenario, the 
type of alarm A is "vitrage". If Nagios reports alarm A later, a new alarm A with 
type "Nagios" is generated in the entity graph. There are then two 
vertices for the same alarm in the graph, and we have to define two alarm 
entities, two relationships, and two scenarios in the template file to make the 
alarm propagation procedure work.
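
The duplication described above can be pictured with a small Python model. This is only an illustration, not how Vitrage stores its entity graph: the `add_alarm` helper and the assumption that a vertex is keyed by (type, name, resource) are made up for this example.

```python
from typing import Dict, Tuple

VertexKey = Tuple[str, str, str]  # (alarm type, alarm name, resource)


def add_alarm(graph: Dict[VertexKey, dict], alarm_type: str,
              name: str, resource: str) -> None:
    """Insert an alarm vertex; the alarm type is part of the vertex identity."""
    graph[(alarm_type, name, resource)] = {
        "type": alarm_type,
        "name": name,
        "resource": resource,
    }


entity_graph: Dict[VertexKey, dict] = {}
add_alarm(entity_graph, "vitrage", "A", "host-1")  # raised by the evaluator
add_alarm(entity_graph, "nagios", "A", "host-1")   # reported later by Nagios
vertex_count = len(entity_graph)
```

Because the type is part of the key, the same alarm A ends up as two separate vertices, and each would need its own entity, relationship and scenario in a template.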

   It is inconvenient to describe the fault model of a system with a lot of 
alarms this way. How can this problem be solved?

殷力殷 YinLiYin

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
