CS Manager down: all hypervisors reboot
Hi all,

We had an interesting outage this morning. We took the Cloudstack Manager node down for hardware upgrades and kernel updates, and it seems all “non-dedicated” hosts rebooted. We run KVM on CS 4.4.latest.

Is this “normal behaviour”, why does it do that, and how do I disable that?

The Manager is also a primary storage provider (NFS export), but all VMs use local storage (except 1).

Regards,
Frank
RE: CS Manager down: all hypervisors reboot
Taking the management server down would not cause hosts to reboot. Taking primary storage down will. As you mentioned, your primary storage (NFS) is hosted on the management server node, so yes, taking that node down will cause all hosts connected to that storage to reboot. It doesn't matter how many VMs use that particular storage.

I am not sure if there is a better way of doing this, but you could modify kvmheartbeat.sh to disable the reboot on losing the primary storage connection.

Regards,
Somesh

-----Original Message-----
From: Frank Louwers [mailto:fr...@openminds.be]
Sent: Wednesday, August 19, 2015 12:19 PM
To: users@cloudstack.apache.org
Subject: CS Manager down: all hypervisors reboot
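To illustrate why the number of VMs on the storage doesn't matter, here is a minimal bash sketch of the general heartbeat-and-fence pattern: the host periodically writes a timestamp to a file on the shared storage and fences itself once writes have failed for longer than a timeout. This is not the code of kvmheartbeat.sh; the function name heartbeat_check, its arguments, and printing FENCE instead of rebooting are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a storage heartbeat with fencing (dry run).
# heartbeat_check HB_FILE LAST_OK TIMEOUT
#   Tries to write the current epoch time to HB_FILE (a file on the storage).
#   On success: prints the current time, which becomes the new LAST_OK.
#   On failure: prints FENCE once TIMEOUT seconds have passed since LAST_OK,
#   otherwise prints LAST_OK unchanged.
# No VM information is consulted: only the result of the write matters.
heartbeat_check() {
  local hb=$1 last_ok=$2 timeout=$3 now
  now=$(date +%s)
  # The brace group lets 2>/dev/null also hide the shell's own error
  # message if the redirection to an unreachable path fails.
  if { date +%s > "$hb"; } 2>/dev/null; then
    echo "$now"
  elif (( now - last_ok >= timeout )); then
    echo "FENCE"   # a real fencing script would reboot the host here
  else
    echo "$last_ok"
  fi
}
```

A caller would run this in a loop, feeding each output back in as LAST_OK and acting when it sees FENCE. Later in this thread, a 120-second threshold is given for the XenServer heartbeat script; here the timeout is simply a parameter.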
RE: CS Manager down: all hypervisors reboot
So migrating my primary storage to iSCSI would (as a side effect) disable the fencing/rebooting?

On 19 Aug 2015 at 21:46:35, Somesh Naidu (somesh.na...@citrix.com) wrote:
> > How would this work if primary storage were e.g. iSCSI?
> I believe we perform the heartbeat check and host fencing for NFS storage only.
> > Is there no way to disable that, except for modifying kvmheartbeat.sh?
> AFAIK, there isn't.
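If Somesh's reading of the code holds (heartbeat and fencing for NFS primary storage only), a quick way to see which mounts on a host would be covered is to list its NFS mounts. The bash sketch below only reads a mounts table, by default /proc/mounts, whose whitespace-separated fields are device, mount point, file system type, options, dump and pass. The function name list_nfs_mounts is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the mount point of every NFS (v3 or v4) file
# system in a mounts table. Defaults to /proc/mounts; field 2 is the mount
# point and field 3 is the file system type.
list_nfs_mounts() {
  local table=${1:-/proc/mounts}
  awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "$table"
}
```

Run without arguments on a hypervisor, it prints the NFS mount points that host currently has; an empty result would mean, under the assumption above, that there is no NFS storage for the heartbeat to watch.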
RE: CS Manager down: all hypervisors reboot
On 19 Aug 2015 at 20:44:47, Somesh Naidu (somesh.na...@citrix.com) wrote:
> Management server down would not result in host being rebooted. Primary storage down will. [...]

Hi Somesh,

Thanks for the explanation!

Am I right (after reading your mail and http://mail-archives.apache.org/mod_mbox/cloudstack-dev/201508.mbox/%3ccalfpzo5cotx0qz+d_oxezjgytau+fa+mzxg_yqeuzswi_9g...@mail.gmail.com%3e by Marcus) that this will happen even if no VMs on the host are set to “HA”?

What would then be the procedure to perform maintenance on the (first?) primary NFS storage server, and how would this work if primary storage were e.g. iSCSI?

That still would not explain why the “dedicated” hosts didn't reboot, but I assume I should take a look at kvmheartbeat.sh then. Is there no way to disable that, except for modifying kvmheartbeat.sh?

Regards,
Frank
RE: CS Manager down: all hypervisors reboot
> So migrating my primary storage to iSCSI would (as a side effect) disable the fencing/rebooting?

I can't be 100% sure about that, but that's what I got from running through the code. I don't have a setup at the moment where I could test this to confirm.

Regards,
Somesh

-----Original Message-----
From: Frank Louwers [mailto:fr...@openminds.be]
Sent: Wednesday, August 19, 2015 3:55 PM
To: users@cloudstack.apache.org
Subject: RE: CS Manager down: all hypervisors reboot
Re: CS Manager down: all hypervisors reboot
Yes!! I have seen this before. As Somesh pointed out, the easy fix is to comment out the last line of the heartbeat script. The CloudStack heartbeat mechanism is designed to reboot hypervisors if the storage repositories are not available for 120 seconds; this is a CloudStack design feature. On XenServer the script is /opt/xensource/bin/xenheartbeat.sh.

Check out the last line of /opt/xensource/bin/xenheartbeat.sh:

    …
      # for nfs
      dirs=$(cat $file | grep sr-mount)
      for dir in $dirs
      do
        mp=`mount | grep $dir`
        if [ -n "$mp" ]; then
          hb=$dir/hb-$1
          date +%s | dd of=$hb count=100 bs=1 2>/dev/null
          if [ $? -ne 0 ]; then
            /usr/bin/logger -t heartbeat "Potential problem with $hb: not reachable since $(($(date +%s) - $lastdate)) seconds"
          else
            lastdate=$(date +%s)
          fi
        else
          /usr/bin/logger -t heartbeat "Potential problem with heartbeat, mount not found for $dir"
          lastdate=$(date +%s)
          sed -i /${dir##/*/}/d $file
        fi
      done
    done

    /usr/bin/logger -t heartbeat "Problem with $hb: not reachable for $(($(date +%s) - $lastdate)) seconds, rebooting system!"
    reboot -f

I commented out the reboot -f to avoid the reboots.

Thanks,
p

On Wed, Aug 19, 2015 at 3:58 PM, Somesh Naidu <somesh.na...@citrix.com> wrote:
> > So migrating my primary storage to iSCSI would (as a side effect) disable the fencing/rebooting?
> I can't be 100% sure about that but that's what I got from running through the code. I don't have a setup ATM where I could test this to confirm.
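p's workaround was to comment out the reboot -f line by hand. A minimal, hypothetical bash helper for making the same kind of change repeatably, keeping a backup, is sketched below. The name comment_out_lines is an assumption and not part of CloudStack. The quoted script is the XenServer one, while Frank runs KVM, where Somesh points to kvmheartbeat.sh, and the line that triggers the reboot there may look different. For that reason the file and the text to match are parameters rather than hard-coded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out every line of FILE containing the literal
# text PATTERN, after saving the original as FILE.bak.
comment_out_lines() {
  local file=$1 pattern=$2
  cp -p -- "$file" "$file.bak" || return 1
  # index() is a literal substring match, so "reboot -f" needs no regex
  # escaping (awk -v does process backslash escapes, so double any "\").
  # Lines that are already comments are left untouched. Writing back with
  # ">" truncates the existing file, keeping its permissions.
  awk -v pat="$pattern" '
    index($0, pat) && $0 !~ /^[ \t]*#/ { print "# " $0; next }
    { print }
  ' "$file.bak" > "$file"
}
```

For example, comment_out_lines /opt/xensource/bin/xenheartbeat.sh 'reboot -f' would comment out every line containing reboot -f in that script, and copying the .bak file back reverts the change.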