CS Manager down: all hypervisors reboot

2015-08-19 Thread Frank Louwers
Hi all,

We had an interesting outage this morning. We took the Cloudstack Manager node 
down for hardware upgrades and kernel updates, and it seems all “non-dedicated” 
hosts rebooted.

We run KVM on CS 4.4.latest.

Is this “normal behaviour”, why does it do that, and how do I disable that?

The Manager is also a primary storage provider (NFS export), but all VMs use 
local storage (except 1).


Regards,

Frank

RE: CS Manager down: all hypervisors reboot

2015-08-19 Thread Somesh Naidu
The management server going down would not result in hosts being rebooted.
Primary storage going down will.

As you mentioned, you have hosted your primary storage (NFS) on the management 
server node. So yes, taking it down will cause all hosts connected to it to 
reboot. It doesn’t matter how many VMs use that particular storage.

I am not sure if there is a better way of doing this, but you could modify 
kvmheartbeat.sh to disable the reboot on losing the primary storage connection.

Regards,
Somesh
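
Before taking an NFS server down, it can help to see which hosts actually mount it. The following is a minimal sketch, not something from the thread: the function name nfs_mounts_from and the hostname in the usage line are made up for illustration, and it assumes a Linux mount table in /proc/mounts format with server names that contain no colons (so no IPv6 literals).

```shell
#!/bin/sh
# nfs_mounts_from SERVER [MOUNT_TABLE]
# Prints the mount point of every NFS filesystem served by SERVER.
# MOUNT_TABLE defaults to /proc/mounts (fields: device mountpoint fstype ...).
nfs_mounts_from() {
    server=$1
    table=${2:-/proc/mounts}
    awk -v srv="$server" '
        # NFS devices look like "server:/export"
        $3 == "nfs" || $3 == "nfs4" {
            split($1, parts, ":")
            if (parts[1] == srv) print $2
        }' "$table"
}
```

Run on each KVM host, for example as `nfs_mounts_from mgmt01.example.com`. Any output means that host depends on the server for at least one mount.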


RE: CS Manager down: all hypervisors reboot

2015-08-19 Thread Frank Louwers
So migrating my primary storage to iSCSI would (as a side effect) disable the 
fencing/rebooting? 


On 19 Aug 2015 at 21:46:35, Somesh Naidu (somesh.na...@citrix.com) wrote:

> > how would this work if primary storage were eg iSCSI?
> I believe we perform the heartbeat check and host fencing for NFS storage only.
>
> > Is there no way to disable that, except for modifying kvmheartbeat.sh?
> AFAIK, there isn't.

RE: CS Manager down: all hypervisors reboot

2015-08-19 Thread Frank Louwers
Hi Somesh,

Thanks for the explanation!

Am I right (after reading your mail and 
http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201508.mbox/%3ccalfpzo5cotx0qz+d_oxezjgytau+fa+mzxg_yqeuzswi_9g...@mail.gmail.com%3e
 by Marcus) that this will happen even if no VMs on the host are set to “HA”?

What would then be the procedure to perform maintenance on the (first?) primary 
NFS storage server and how would this work if primary storage were eg iSCSI?

That still would not explain why the “dedicated” hosts didn’t reboot, but I 
assume I should take a look at kvmheartbeat.sh then. Is there no way to disable 
that, except for modifying kvmheartbeat.sh?



Regards,



Frank





RE: CS Manager down: all hypervisors reboot

2015-08-19 Thread Somesh Naidu
> So migrating my primary storage to iSCSI would (as a side effect) disable the 
> fencing/rebooting?

I can't be 100% sure about that but that's what I got from running through the 
code. I don't have a setup ATM where I could test this to confirm.

Regards,
Somesh




Re: CS Manager down: all hypervisors reboot

2015-08-19 Thread Prashant s
Yes!
I have seen this before. As Somesh pointed out, the easy fix is to
comment out the last line of the heartbeat script.

The CloudStack heartbeat mechanism is designed to reboot hypervisors if the
storage repositories are not available for 120 seconds.
This is a CloudStack design feature. On XenServer the script is
/opt/xensource/bin/xenheartbeat.sh.

Check out the last line of /opt/xensource/bin/xenheartbeat.sh:

# for nfs
dirs=$(cat $file | grep sr-mount)
for dir in $dirs
do
  mp=`mount | grep $dir`
  if [ -n "$mp" ]; then
    # mount is present: write the current timestamp to the heartbeat file
    hb=$dir/hb-$1
    date +%s | dd of=$hb count=100 bs=1 2>/dev/null
    if [ $? -ne 0 ]; then
      /usr/bin/logger -t heartbeat "Potential problem with $hb: not reachable since $(($(date +%s) - $lastdate)) seconds"
    else
      lastdate=$(date +%s)
    fi
  else
    # mount is gone: stop monitoring this directory
    /usr/bin/logger -t heartbeat "Potential problem with heartbeat, mount not found for $dir"
    lastdate=$(date +%s)
    sed -i /${dir##/*/}/d $file
  fi
done
done  # closes an enclosing loop that is not part of this excerpt

/usr/bin/logger -t heartbeat "Problem with $hb: not reachable for $(($(date +%s) - $lastdate)) seconds, rebooting system!"
reboot -f
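
The decision the excerpt makes can be reduced to two small pieces: try to write a timestamp to the storage, and compare the age of the last successful write against a timeout. The sketch below illustrates only that logic and is not the shipped script. The function names are invented, the 120 second value is the figure quoted above, and it assumes a `date` that supports `+%s`.

```shell
#!/bin/sh
# write_heartbeat FILE
# Writes the current epoch time to FILE. Returns non-zero if the write fails,
# for example because the directory is not reachable.
write_heartbeat() {
    { date +%s > "$1"; } 2>/dev/null
}

# heartbeat_expired LAST_OK NOW TIMEOUT
# Returns 0 (true) when the last successful write is at least TIMEOUT
# seconds older than NOW.
heartbeat_expired() {
    last_ok=$1
    now=$2
    timeout=$3
    [ $((now - last_ok)) -ge "$timeout" ]
}
```

A monitoring loop would call `write_heartbeat`, update its last-success timestamp when the call returns 0, and fence the host once `heartbeat_expired "$last_ok" "$(date +%s)" 120` becomes true.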


I commented out the reboot -f to avoid the reboots.


thanks
p
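
For reference, the edit described above could be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure: the function name disable_reboot and the .orig/.tmp suffixes are invented, and it only touches lines that consist of `reboot -f` (optionally indented). Keep in mind that with the line commented out the host no longer fences itself when storage disappears.

```shell
#!/bin/sh
# disable_reboot FILE
# Comments out every line of FILE that consists only of "reboot -f".
# The first run keeps an untouched copy at FILE.orig.
disable_reboot() {
    script=$1
    [ -f "$script" ] || { echo "no such file: $script" >&2; return 1; }
    # Keep the first backup; do not overwrite it on repeated runs.
    [ -e "$script.orig" ] || cp -p "$script" "$script.orig" || return 1
    # Write to a temp file first so a failed sed never truncates the script.
    sed 's/^\([[:space:]]*\)\(reboot -f[[:space:]]*\)$/\1# \2/' \
        "$script" > "$script.tmp" || return 1
    # cat into the original so its permissions and ownership are preserved.
    cat "$script.tmp" > "$script" && rm -f "$script.tmp"
}
```

Usage on a XenServer host would be `disable_reboot /opt/xensource/bin/xenheartbeat.sh`. Running it a second time changes nothing, because an already commented line no longer matches.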

