[ovirt-users] New Feature: engine NIC health check

2014-10-08 Thread Martin Mucha
Hi,

here's a link to a new feature that monitors the engine's NICs, trying to 
detect a failure on the engine itself and, in that case, block fencing.
http://www.ovirt.org/Features/engine_NIC_health_check
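
For readers who have not opened the feature page, a single health probe could look roughly like the sketch below. It is only an illustration built on the standard java.net.NetworkInterface API; the class name EngineNicProbe, the method isNicUp and the example NIC name "eth0" are assumptions, not code taken from the feature page.

```java
import java.net.NetworkInterface;
import java.net.SocketException;

/** Hypothetical helper for illustration; not the code from the feature page. */
final class EngineNicProbe {

    /** Returns true only if the named interface exists and reports itself as up. */
    static boolean isNicUp(String nicName) {
        try {
            NetworkInterface nic = NetworkInterface.getByName(nicName);
            return nic != null && nic.isUp();
        } catch (SocketException e) {
            // An I/O error while querying the NIC is treated as "not healthy".
            return false;
        }
    }

    public static void main(String[] args) {
        // "eth0" is only an example name; pass the real NIC name as an argument.
        String nicName = args.length > 0 ? args[0] : "eth0";
        System.out.println(nicName + " up: " + isNicUp(nicName));
    }
}
```

A periodic scan would call such a probe once per monitored NIC at every scanning interval.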

Thanks for any input, especially input addressing some of the open issues.

M.
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] New Feature: engine NIC health check

2014-10-10 Thread Yair Zaslavsky



I was curious about how you perform the health check, so I read the feature 
page - good to learn more Java :)
Regarding open issues -
a. Yes, IMHO the scanning interval should be configured via engine-config - do 
you see a reason not to do that? Maybe we should set a minimal interval 
value and enforce it?
b. Same for the "no failures since.." interval.
c. I don't like the name of the table you're suggesting. Please consider an 
alternative. Also, you may want to consider having a view that returns the 
"static information" of the NIC plus the "stats" part (dynamic part? maybe just 
nic_state?). Why would you like to purge old data rather than just hold one 
record per NIC and update it at each interval? In that case, no purging is 
required.
Maybe for DWH you will want some info on the history of the status of the 
NICs... but I'm not sure if this is relevant for now.
d. If you go with my view suggestion, you might consider displaying the 
"state" in the REST API.
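
One possible reading of "set a minimal interval value and enforce it" in point a is to raise any configured value that is too small up to a floor. The sketch below shows that reading only; the class name, the method name and the 5-second floor are assumptions, and rejecting an invalid value outright would be an equally valid way to enforce it.

```java
/** Hypothetical sketch; the names and the 5-second floor are assumptions. */
final class NicCheckConfig {

    /** Assumed lower bound for the scanning interval, in seconds. */
    static final int MIN_SCAN_INTERVAL_SECONDS = 5;

    /** Returns the configured interval, raised to the minimum if it is too small. */
    static int effectiveScanInterval(int configuredSeconds) {
        return Math.max(configuredSeconds, MIN_SCAN_INTERVAL_SECONDS);
    }

    public static void main(String[] args) {
        System.out.println(effectiveScanInterval(1));  // below the floor, so the floor is used
        System.out.println(effectiveScanInterval(30)); // above the floor, kept unchanged
    }
}
```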

Yair



Re: [ovirt-users] New Feature: engine NIC health check

2014-10-13 Thread Martin Mucha
ad a), ad b) Good, I will make both configurable via engine-config.

ad c) OK, we should probably think about what should be in that table. I 
proposed that only failures be recorded, since there's currently no use 
for other values/NIC states and the NICs will probably be functional most of 
the time. This minimizes the need to write anything to the DB, hence the name 
"EngineNicFailures". I also wanted to store information in the format "This NIC 
was not up during the check at 'yyyy.MM.dd HH:mm:ss'", which gives more 
information than what you're talking about. But there's no need for this data, 
so yes, currently it would be possible to store the NIC status along with the 
last failure timestamp. So the table could look like:

CREATE TABLE nics_health (
    id UUID PRIMARY KEY, name VARCHAR(255),
    is_healthy BOOL, last_failure TIMESTAMP
);

This way, we have to update the record during each health check, while in 
the approach I previously proposed we write only when there's a failure, and 
the 'purge' of obsolete data can occur only when there's a fence request and/or 
periodically. What you're proposing provides less information (which is fine, 
since we don't need it) and it's simpler, but it generates some unnecessary 
load on the DB. It's not a big deal, but it's not optimal or necessary, since 
a NIC being down is unlikely (if I understand it correctly).
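
To make the write-load trade-off concrete, here is a small in-memory sketch. It does not talk to a database; the class NicHealthRow and its fields merely mirror the nics_health columns above, and the writes counter stands in for UPDATE statements. All names are assumptions made for illustration.

```java
import java.time.Instant;

/** Hypothetical in-memory stand-in for one nics_health row. */
final class NicHealthRow {
    final String name;
    boolean healthy = true;
    Instant lastFailure; // stays null until the first failed check
    int writes;          // counts simulated UPDATE statements

    NicHealthRow(String name) {
        this.name = name;
    }

    /** Status-row approach: every health check rewrites the row. */
    void recordCheck(boolean up, Instant checkedAt) {
        healthy = up;
        if (!up) {
            lastFailure = checkedAt;
        }
        writes++;
    }

    /** Failure-log approach: only failed checks would produce an INSERT. */
    static int failureLogWrites(boolean[] checkResults) {
        int inserts = 0;
        for (boolean up : checkResults) {
            if (!up) {
                inserts++;
            }
        }
        return inserts;
    }
}
```

With four checks of which one fails, the status row is written four times while the failure log is written once. The gap grows with the number of healthy checks.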

Sorry, I have no idea what DWH stands for.

ad d) I have no problem extending this feature to provide the data over REST, 
but others should probably agree with that. I don't think any client (external 
system) needs this information, and I also don't think engine NIC health is any 
client's business in the first place.

——
I will update the feature page ASAP.
M.




Re: [ovirt-users] New Feature: engine NIC health check

2014-10-13 Thread Oved Ourfali
Some comments:
1. IMO all timeouts should be configurable.
2. The effect it has on fencing will be part of the Cluster's fencing policy, 
so a new configuration item should be added there as well (this can be scoped 
as part of a different feature, but in that case you don't need to use the 
information at all within the scope of your work, just make the information 
available).
3. I understand that the scope is EngineNics at the system level rather than 
configuring them at the DC/Cluster level. I guess there are use cases in which 
you use different "legs" for different DCs, as some might be local, some 
remote, etc. So consider doing the configuration at the DC level.
4. You wrote "engine-setup" instead of "engine-config" a few times throughout 
the wiki page.
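
Points 2 and 3 could be combined roughly as in the sketch below: a fencing policy asks whether an engine NIC serving a given DC has failed within the "no failures since" interval. This is only a sketch under assumptions. The class FencingGuard, its methods and the per-DC map are hypothetical and do not reflect existing engine code.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a per-DC fencing check; not existing engine code. */
final class FencingGuard {

    /** The "no failures since" interval. */
    private final Duration quietPeriod;

    /** Data center name mapped to the most recent engine NIC failure seen for it. */
    private final Map<String, Instant> lastFailureByDc = new HashMap<>();

    FencingGuard(Duration quietPeriod) {
        this.quietPeriod = quietPeriod;
    }

    /** Remembers the latest failure time for the given data center. */
    void recordNicFailure(String dataCenter, Instant when) {
        lastFailureByDc.merge(dataCenter, when, (a, b) -> a.isAfter(b) ? a : b);
    }

    /** Fencing is allowed when no failure for this DC falls inside the quiet period. */
    boolean isFencingAllowed(String dataCenter, Instant now) {
        Instant lastFailure = lastFailureByDc.get(dataCenter);
        return lastFailure == null || !lastFailure.isAfter(now.minus(quietPeriod));
    }
}
```

Keeping the decision in a separate class like this would let the health check only publish the information, while the cluster's fencing policy decides whether to use it.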

Thanks,
Oved
