Re: [openstack-dev] [Vitrage] About alarms reported by datasource and the alarms generated by vitrage evaluator

2017-01-07 Thread Afek, Ifat (Nokia - IL)
Hi Yujun,

Thanks for the explanation, but I still don’t fully understand.

Let me start with the current state:
1. Introduce a flexible `metadata` dict into the ALARM entity
[Ifat] Already exists. An alarm is represented as a vertex in the entity graph, 
with a dictionary of properties.
2. Allow generating update event[1] on metadata change
3. Allow using ALARM metadata in scenario condition
[Ifat] Already exists. You can define properties in the ‘entities’ section of 
Vitrage templates.
4. Allow setting ALARM metadata in scenario action

If I understand correctly, you are suggesting that one scenario will add 
metadata to an existing alarm, which will trigger an event, and as a result 
another scenario might be executed?
Can you describe a use case where this behavior will help calculate the root 
cause?

Thanks,
Ifat.


From: Yujun Zhang
Reply-To: "OpenStack Development Mailing List (not for usage questions)"
Date: Saturday, 7 January 2017 at 09:27
To: "OpenStack Development Mailing List (not for usage questions)"
Cc: "han.jin...@zte.com.cn", "wang.we...@zte.com.cn", "gong.yah...@zte.com.cn", 
"jia.peiy...@zte.com.cn", "zhang.yuj...@zte.com.cn"
Subject: Re: [openstack-dev] [Vitrage] About alarms reported by datasource and 
the alarms generated by vitrage evaluator

The two questions raised by YinLiYin are actually one, i.e. how to enrich the 
alarm properties that can be used as a condition in root cause deduction.

Both 'suspect' and 'datasource' are additional information that may be referred 
to as a condition in a general fault model, a.k.a. a scenario in Vitrage.

It seems it could be done as follows:

1. Introduce a flexible `metadata` dict into the ALARM entity
2. Allow generating update event[1] on metadata change
3. Allow using ALARM metadata in scenario condition
4. Allow setting ALARM metadata in scenario action

This leaves room for continued development through complex scenario 
templates, while keeping the Vitrage evaluator simple and generic.
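
To make the four steps concrete, here is a minimal sketch of the idea. It is not Vitrage code: the `Alarm` class, `set_metadata`, `is_suspect` and the event layout are all hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


class Alarm:
    """Hypothetical alarm vertex carrying a free-form metadata dict (step 1)."""

    def __init__(self, name: str, on_update: Callable[[Event], None]) -> None:
        self.name = name
        self.metadata: Dict[str, Any] = {}
        self._on_update = on_update

    def set_metadata(self, key: str, value: Any) -> None:
        """Step 4: an action sets metadata. Step 2: a real change emits an event."""
        if key in self.metadata and self.metadata[key] == value:
            return  # value unchanged, so no update event
        self.metadata[key] = value
        self._on_update({"alarm": self.name, "key": key, "value": value})


def is_suspect(alarm: Alarm) -> bool:
    """Step 3: a scenario condition that reads alarm metadata."""
    return alarm.metadata.get("suspect") is True


events: List[Event] = []
alarm = Alarm("host_down", events.append)
alarm.set_metadata("suspect", True)   # emits one update event
alarm.set_metadata("suspect", True)   # no change, no event
```

In this sketch an update event is emitted only when a value actually changes, which is one way to keep one scenario's action from re-triggering other scenarios indefinitely.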

My two cents.

[1]: 
http://docs.openstack.org/developer/vitrage/scenario-evaluator.html#concepts-and-guidelines

On Sat, Jan 7, 2017 at 2:23 AM Afek, Ifat (Nokia - IL) 
<ifat.a...@nokia.com> wrote:
Hi YinLiYin,

This is an interesting question. Let me divide my answer into two parts.

First, the case that you described with Nagios and Vitrage. This problem 
depends on the specific Nagios tests that you configure in your system, as well 
as on the Vitrage templates that you use. For example, you can use 
Nagios/Zabbix to monitor the physical layer, and Vitrage to raise deduced 
alarms on the virtual and application layers. This way you will never have 
duplicated alarms. If you want to use Nagios to monitor the other layers as 
well, you can simply modify Vitrage templates so they don’t raise the deduced 
alarms that Nagios may generate, and use the templates to show RCA between 
different Nagios alarms.

Now let’s talk about the more general case. Vitrage can receive alarms from 
different monitors, including Nagios, Zabbix, collectd and Aodh. If you are 
using more than one monitor, it is possible that the same alarm (maybe with a 
different name) will be raised twice. We need to create a mechanism to identify 
such cases and create a single alarm with the properties of both monitors. This 
has not been designed in detail yet, so if you have any suggestions we will be 
happy to hear them.

Best Regards,
Ifat.


From: "yinli...@zte.com.cn" <yinli...@zte.com.cn>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" 
<openstack-dev@lists.openstack.org>
Date: Friday, 6 January 2017 at 03:27
To: "openstack-dev@lists.openstack.org" <openstack-dev@lists.openstack.org>
Cc: "gong.yah...@zte.com.cn" <gong.yah...@zte.com.cn>, 
"han.jin...@zte.com.cn" <han.jin...@zte.com.cn>, 
"wang.we...@zte.com.cn" <wang.we...@zte.com.cn>, 
"jia.peiy...@zte.com.cn" <jia.peiy...@zte.com.cn>, 
"zhang.yuj...@zte.com.cn" <zhang.yuj...@zte.com.cn>
Subject: [openstack-dev] [Vitrage] About alarms reported by datasource and the 
alarms generated by vitrage evaluator


Hi all,

   Vitrage generates alarms according to the templates. All the alarms raised by 
Vitrage have the type "vitrage". Suppose Nagios has an alarm A. If alarm A is 
raised by the Vitrage evaluator according to the action part of a scenario, its 
type is "vitrage". If Nagios reports alarm A later, a new alarm A with 
type "Nagios" will be generated in the entity graph. There will then be two 
vertices for the same alarm in the graph, and we have to define two alarm 
entities, two relationships, and two scenarios in the template file to make the 
alarm propagation proce

Re: [openstack-dev] [ALU] Re: [ALU] Re: [ALU] Re: [ALU] [vitrage] how to use placeholder vertex

2017-01-07 Thread Weyl, Alexey (Nokia - IL)
Hi Yujun,

A relationship is defined by the following trio: source id, target id, and a 
label on the edge.
In the processor, I add only edges that don't already exist.
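
A minimal sketch of that rule, assuming a plain in-memory store rather than Vitrage's actual graph driver. `Edge`, `EdgeStore` and `add_edge` are hypothetical names used only for illustration. An edge is keyed by the (source id, target id, label) trio, so a repeated report of the same link is skipped.

```python
from typing import NamedTuple, Set


class Edge(NamedTuple):
    source_id: str
    target_id: str
    label: str


class EdgeStore:
    """Hypothetical in-memory edge store, not Vitrage's graph driver."""

    def __init__(self) -> None:
        self._edges: Set[Edge] = set()

    def add_edge(self, source_id: str, target_id: str, label: str) -> bool:
        """Add the edge unless the same trio is already stored.

        Returns True if the edge was added, False if it was a duplicate.
        """
        edge = Edge(source_id, target_id, label)
        if edge in self._edges:
            return False
        self._edges.add(edge)
        return True

    def __len__(self) -> int:
        return len(self._edges)


store = EdgeStore()
first = store.add_edge("s1", "s2", "linked")    # new edge, stored
second = store.add_edge("s1", "s2", "linked")   # same trio, skipped
third = store.add_edge("s1", "s2", "backup")    # different label, stored
```

Under this model, two links between the same pair of entities can coexist only if their labels differ.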

Alexey

From: Yujun Zhang [mailto:zhangyujun+...@gmail.com] 
Sent: Thursday, January 05, 2017 4:32 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: [ALU] Re: [openstack-dev] [ALU] Re: [ALU] Re: [ALU] [vitrage] how 
to use placeholder vertex

A follow up question on relationships.
On Thu, Jan 5, 2017 at 9:59 PM Weyl, Alexey (Nokia - IL) 
 wrote:
Hi Yujun,

Let's see.

1. There is no need for the transformer to handle this duplication. What will 
happen at the moment is that we will receive twice every neighbor, and it is 
fine by us, because it is a quite small datasource, and 99.999% of the time it 
won't be changed.

It's fine for neighbor because vertex can be identified by id and there won't 
be duplication. 

But what about relationship, how do we model redundant links between two 
entities? There seems to be no id for relationships.

2. It should be 2 events. We want to make it as simple as possible, and at the 
same time as flexible as possible. So you should create 2 events, and each one 
will have the neighbor connection.

Hope it answers everything.

BR,
Alexey



From: Yujun Zhang [mailto:zhangyujun+...@gmail.com]
Sent: Thursday, January 05, 2017 2:32 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: [ALU] Re: [openstack-dev] [ALU] Re: [ALU] [vitrage] how to 
use placeholder vertex

Alexey,

I have to dig up this old thread to clarify some issues I met during the static 
datasource implementation. I hope you can still recall the context :-)

I'll try to simplify the question with an example. The following configurations 
are snippets from the static datasource.

1. Suppose we have three switches linked in a ring. What would be the expected 
entity events emitted by the driver?

In my proposed driver, there will be three entities, and each relationship will 
appear in both the source entity and the target entity, e.g. s1->s2 will be 
included in both s1 and s2. Should the transformer handle this duplication, or 
will the graph utils?
entities:
  - config_id: s1
    type: switch
    name: switch-1
    id: 12345
    state: available
  - config_id: s2
    type: switch
    name: switch-2
    id: 23456
    state: available
  - config_id: s3
    type: switch
    name: switch-3
    id: 34567
    state: available
relationships:
  - source: s1
    target: s2
    relation_type: linked
  - source: s2
    target: s3
    relation_type: linked
  - source: s3
    target: s1
    relation_type: linked
2. suppose we created a link between switch and nova.host. What will be the 
expected entity events? Should it be one entity event of s1 with h1 embedded as 
neighbor? Or two entity events, s1 and h1?
entities:
  - config_id: s1
    type: switch
    name: switch-1
    id: 12345
    state: available
  - config_id: h1
    type: nova.host
    id: 1
relationships:
  - source: s1
    target: h1
    relation_type: attached
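
To make the first question concrete, here is a rough sketch of the proposed behaviour for the ring of three switches, where every relationship is embedded in both its source and its target entity event. The `build_events` function and the event layout are assumptions for illustration, not the static datasource's actual code or format.

```python
from typing import Any, Dict, List

# The ring of three switches from the first snippet, as parsed YAML.
CONFIG: Dict[str, List[Dict[str, Any]]] = {
    "entities": [
        {"config_id": "s1", "type": "switch", "name": "switch-1", "id": 12345},
        {"config_id": "s2", "type": "switch", "name": "switch-2", "id": 23456},
        {"config_id": "s3", "type": "switch", "name": "switch-3", "id": 34567},
    ],
    "relationships": [
        {"source": "s1", "target": "s2", "relation_type": "linked"},
        {"source": "s2", "target": "s3", "relation_type": "linked"},
        {"source": "s3", "target": "s1", "relation_type": "linked"},
    ],
}


def build_events(config: Dict[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Emit one event per entity, embedding every relationship it takes part in."""
    events = []
    for entity in config["entities"]:
        cid = entity["config_id"]
        related = [
            rel for rel in config["relationships"]
            if cid in (rel["source"], rel["target"])
        ]
        events.append({**entity, "relationships": related})
    return events


events = build_events(CONFIG)
# Every link is embedded twice overall: once in its source event
# and once in its target event.
total_embedded = sum(len(event["relationships"]) for event in events)
```

With three links in the ring, the three events carry six embedded relationships in total, which is the duplication the question is about.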

On Wed, Dec 14, 2016 at 11:54 PM Weyl, Alexey (Nokia - IL) 
 wrote:
1. That is correct.

2. That is not quite correct.
In static we only define the main properties of each entity, meaning type, id, 
and category, so it is fine that for each main entity we create its 
neighbors and connect them. Because of that, there is no need to distinguish 
between them.


From: Yujun Zhang [mailto:zhangyujun+...@gmail.com]
Sent: Wednesday, December 14, 2016 5:00 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: [ALU] Re: [openstack-dev] [ALU] [vitrage] how to use placeholder vertex

Hi, Alexey,

Thanks for the detailed example. It explains the existing mechanism of vertex 
creation well.

So it looks like each resource type will have a primary datasource, e.g. 
nova.host for nova.host, nova.instance for nova.instance, that holds full 
details. Is that correct?

I'm not sure whether you remember the long discussion in the static driver 
review[1]. In the end, we agreed on a unified entity definition for both 
`nova.host` and `switch`, with no extra key to indicate that it is "external" 
(and should create a placeholder).

If I understand it correctly, no placeholder will be created in this case, 
because we cannot distinguish them from the static configuration, and the 
properties of the `nova.host` resource shall be merged from the `static` and 
`nova.host` datasources. Is that so?

[1]: https://review.openstack.org/#/c/405354/  

On Wed, Dec 14, 2016 at 5:40 PM Weyl, Alexey (Nokia - IL) 
 wrote:
Hi Yujun,
 
This is a good question, so let me explain how it works.
Let's say we are supposed to get 2 entities from nova, a nova.host called host1 
and a nova.instance called vm1, and vm1 is supposed to be connected to host1.
The nova.host driver and nova.instance driver are working simultaneously and 
thus we don’t know the order in which those events will arrive.
We have 2 use cases:
1.   Host1 event arrives before vm1.
In this

Re: [openstack-dev] [nova] Feature patches that need final +2

2017-01-07 Thread Matt Riedemann

On 1/7/2017 12:30 PM, Matt Riedemann wrote:

I've gone through the feature tracking etherpad again [1] and here are
some patches that just need a final +2 to either close out the blueprint
or get it close to being closed:

1. https://blueprints.launchpad.net/nova/+spec/add-os-xenapi-library

Starts here: https://review.openstack.org/#/c/406059/

2. https://blueprints.launchpad.net/nova/+spec/flavor-notifications

Starts here: https://review.openstack.org/#/c/415776/

3. https://blueprints.launchpad.net/nova/+spec/hyper-v-vnuma-enable

Single patch: https://review.openstack.org/#/c/282407/

4. https://blueprints.launchpad.net/nova/+spec/hyper-v-ovs-vif

Single patch: https://review.openstack.org/#/c/140045/

--

It would be nice to get those reviewed and merged before the feature
review sprint on Wednesday 1/11 so we can focus on other things that
haven't had as much review time yet.

[1] https://etherpad.openstack.org/p/nova-ocata-feature-freeze



Oops, forgot to tag this with nova.

--

Thanks,

Matt Riedemann


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Performance] PTG?

2017-01-07 Thread Joe Talerico
Hey Andrey - Is there a shared etherpad for the Rally/Performance days?

Thanks,
Joe

On Wed, Jan 4, 2017 at 11:01 AM, Andrey Kurilin  wrote:
> Hi, Joe!
>
> It is not a mistake. After a talk with Dina B., we decided to extend the Rally
> session to a wider audience, and I requested a "Rally & Performance team"
> session.
>
> On Wed, Jan 4, 2017 at 5:29 PM, Joe Talerico  wrote:
>>
>> When I signed up to attend the PTG, Performance was not listed as an
>> option, however on the website it clearly shows Performance is
>> Monday-Tuesday.
>>
>> Is this just a mistake in the event website?
>>
>> Joe
>>
>
>
>
>
> --
> Best regards,
> Andrey Kurilin.
>
>






[openstack-dev] [nova] Automatically disabling compute service on RBD EMFILE failures

2017-01-07 Thread Matt Riedemann
A few weeks ago someone in the operators channel was talking about 
issues with ceph-backed nova-compute and OSErrors for too many open 
files causing issues.


We have a bug reported that's very similar sounding:

https://bugs.launchpad.net/nova/+bug/1651526

During the periodic update_available_resource audit, the call to RBD to 
get disk usage fails with the EMFILE OSError. Since this happens in a 
periodic task, it doesn't cause any direct operations to fail, but it will 
cause issues with scheduling because that host is effectively down, yet 
nothing sets the service to down (disabled).


I had proposed a solution in the bug report that we could automatically 
disable the service for that host when this happens, and then 
automatically enable the service again if/when the next periodic task 
run is successful. Disabling the service would take that host out of 
contention for scheduling and may also trigger an alarm for the operator 
to investigate the failure (although if there are EMFILE errors from the 
ceph cluster I'm guessing alarms should already be going off).


Anyway, I wanted to see how hacky of an idea this is. We already 
automatically enable/disable the service from the libvirt driver when 
the connection to libvirt itself drops via an event callback. This would 
be similar albeit less sophisticated as it's not using an event 
listening mechanism, we'd have to maintain some local state in memory to 
know if we need to enable/disable the service again. And it seems pretty 
hacky/one-offish to handle this just for the RBD failure, but maybe we 
just generically handle it for any EMFILE error when collecting disk 
usage in the resource audit?
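
To make the proposal easier to discuss, here is a rough sketch of the state handling. It is not nova code: `ServiceToggle`, `run_audit`, `get_disk_usage`, `disable_service` and `enable_service` are hypothetical stand-ins supplied by the caller, and only `OSError` and `errno.EMFILE` are real Python names.

```python
import errno
from typing import Callable, List


class ServiceToggle:
    """Hypothetical helper: disable on EMFILE, re-enable on the next good run."""

    def __init__(self, disable_service: Callable[[], None],
                 enable_service: Callable[[], None]) -> None:
        self._disable = disable_service
        self._enable = enable_service
        # Local in-memory state: True only if this helper disabled the service.
        self._auto_disabled = False

    def run_audit(self, get_disk_usage: Callable[[], int]) -> None:
        try:
            get_disk_usage()
        except OSError as exc:
            if exc.errno != errno.EMFILE:
                raise  # only "too many open files" is handled here
            if not self._auto_disabled:
                self._disable()
                self._auto_disabled = True
            return
        # Audit succeeded: undo our own disable, never an operator's.
        if self._auto_disabled:
            self._enable()
            self._auto_disabled = False


calls: List[str] = []
toggle = ServiceToggle(lambda: calls.append("disable"),
                       lambda: calls.append("enable"))


def failing_usage() -> int:
    raise OSError(errno.EMFILE, "Too many open files")


toggle.run_audit(failing_usage)   # first failure disables the service
toggle.run_audit(failing_usage)   # repeated failure changes nothing
toggle.run_audit(lambda: 42)      # success re-enables the service
toggle.run_audit(lambda: 42)      # already enabled, nothing to do
```

Tracking whether the service was auto-disabled is what would keep the periodic task from re-enabling a host that an operator disabled on purpose.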


--

Thanks,

Matt Riedemann




Re: [openstack-dev] [release] subscribe to the OpenStack release calendar

2017-01-07 Thread ChangBo Guo
Awesome!

2017-01-07 4:19 GMT+08:00 Doug Hellmann :

>
> > On Jan 6, 2017, at 1:14 PM, Julien Danjou  wrote:
> >
> > On Fri, Jan 06 2017, Doug Hellmann wrote:
> >
> > Hi Doug,
> >
> >> The link for the Ocata schedule is
> >> https://releases.openstack.org/ocata/schedule.ics
> >>
> >> We will have a similar Pike calendar available as soon as the
> >> schedule is finalized.
> >
> > Thank you, this is great. One question: could it be possible to have
> > only one ICS for all releases? Maybe having one per release plus a
> > "all.ics"?
> >
> > I'm lazy I don't want to track and add each calendar every 6 months. :-)
> >
> > --
> > Julien Danjou
> > ;; Free Software hacker
> > ;; https://julien.danjou.info
>
> See https://review.openstack.org/417495
>
>
>
>


-- 
ChangBo Guo(gcb)


Re: [openstack-dev] [heat][tripleo] Heat memory usage in the TripleO gate during Ocata

2017-01-07 Thread Emilien Macchi
On Fri, Jan 6, 2017 at 5:41 PM, Zane Bitter  wrote:
> On 06/01/17 16:58, Emilien Macchi wrote:
>>
>> On Fri, Jan 6, 2017 at 4:35 PM, Thomas Herve  wrote:
>>>
>>> On Fri, Jan 6, 2017 at 6:12 PM, Zane Bitter  wrote:

>>>> It's worth reiterating that TripleO still disables convergence in the
>>>> undercloud, so these are all tests of the legacy code path. It would be
>>>> great if we could set up a non-voting job on t-h-t with convergence
>>>> enabled and start tracking memory use over time there too. As a first
>>>> step, maybe we could at least add an experimental job on Heat to give us
>>>> a baseline?
>>>
>>>
>>> +1. We haven't made any huge changes in that direction, but having
>>> some info would be great.
>>
>>
>> +1 too. I volunteer to do it.
>>
>> Quick question: to enable it, is it just a matter of setting
>> convergence_engine to true in heat.conf (on the undercloud)?
>
>
> Yep! Actually, it's even simpler than that: now that true is the default
> (Newton onwards), it's just a matter of _not_ setting it to false :)

done: https://review.openstack.org/#/q/topic:tripleo/heat/convergence
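
For anyone who wants to check their own undercloud, a small sketch of the rule described above: the option counts as enabled unless it is explicitly set to false. This uses Python's standard configparser rather than oslo.config, which Heat itself uses. The `convergence_enabled` helper is hypothetical, and the sketch assumes the option lives in the [DEFAULT] section of heat.conf.

```python
import configparser


def convergence_enabled(conf_text: str) -> bool:
    """Return the effective convergence_engine value for heat.conf contents.

    Falls back to True when the option is absent, matching the default
    from Newton onwards mentioned in this thread.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getboolean("DEFAULT", "convergence_engine", fallback=True)


unset = convergence_enabled("[DEFAULT]\n")
disabled = convergence_enabled("[DEFAULT]\nconvergence_engine = false\n")
```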

> - ZB
>
>> If not, what else is needed?
>
>
>



-- 
Emilien Macchi
