[jira] [Commented] (CLOUDSTACK-5429) KVM - Primary store down/Network Failure - Hosts attempt to reboot because of primary store being down hangs.
[ https://issues.apache.org/jira/browse/CLOUDSTACK-5429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215462#comment-14215462 ]

Marcus Sorensen commented on CLOUDSTACK-5429:
---------------------------------------------

No, that does not work. VMs cannot be cleanly shut down (or even forced off) while their storage is hanging: the qemu processes will be in D state (uninterruptible sleep) and unresponsive. A forced reboot of the host via IPMI, a sysrq trigger, or something similar would be necessary, and the mgmt server would need to recognize that this has happened so that the VMs can be started elsewhere safely.

KVM - Primary store down/Network Failure - Hosts attempt to reboot because of primary store being down hangs.
--------------------------------------------------------------------------------------------------------------

Key: CLOUDSTACK-5429
URL: https://issues.apache.org/jira/browse/CLOUDSTACK-5429
Project: CloudStack
Issue Type: Bug
Security Level: Public (Anyone can view this level - this is the default.)
Components: Management Server
Affects Versions: 4.3.0
Environment: Build from 4.3
Reporter: Sangeetha Hariharan
Assignee: edison su
Priority: Critical
Fix For: 4.4.0
Attachments: kvm-networkshutdown.png, kvmhostreboot.png, psdown.rar

KVM - Primary store down - Hosts attempt to reboot because of primary store being down hangs.

Set up:
Advanced zone with KVM (RHEL 6.3) hosts.

Steps to reproduce the problem:
1. Deploy a few VMs on each of the hosts with a 10 GB ROOT volume size, so that we start with 10 VMs.
2. Create snapshots of the ROOT volumes.
3. While the snapshots are still in progress, make the primary storage unavailable for 10 minutes.

This causes the KVM hosts to reboot, but the reboot is not successful: it is stuck trying to unmount the NFS mount points.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
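As an illustration of the D state mentioned in the 14215462 comment, here is a minimal Python sketch of how one might list qemu processes stuck in uninterruptible sleep by reading the state field of /proc/<pid>/stat. The function names, the "qemu" name prefix, and the proc_root parameter are assumptions made for this example; none of this is code from the CloudStack agent.

```python
import os
from typing import List, Tuple


def parse_stat_state(stat_line: str) -> Tuple[str, str]:
    """Return (command name, state letter) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the last ')'.
    left, right = stat_line.rsplit(")", 1)
    comm = left.split("(", 1)[1]
    state = right.split()[0]
    return comm, state


def find_d_state(name_prefix: str = "qemu", proc_root: str = "/proc") -> List[int]:
    """List PIDs whose command starts with name_prefix and whose state is D.

    name_prefix and proc_root are hypothetical parameters for this sketch.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the line was not parseable
        if state == "D" and comm.startswith(name_prefix):
            pids.append(int(entry))
    return sorted(pids)
```

A process in this state does not act on signals, including SIGKILL, until the blocked I/O completes. That is consistent with the comment's point that the VMs cannot even be forced off.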
[jira] [Commented] (CLOUDSTACK-5429) KVM - Primary store down/Network Failure - Hosts attempt to reboot because of primary store being down hangs.
[ https://issues.apache.org/jira/browse/CLOUDSTACK-5429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025813#comment-14025813 ]

edison su commented on CLOUDSTACK-5429:
---------------------------------------

This is the default behavior on RHEL:
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Administration_Guide/sect-gracefully-shut-down-guests-libvirt.html

The behavior during host shutdown can be changed from suspending VMs to shutting them down by modifying the libvirt-guests configuration, but I think that should be configured by the admin, not by CloudStack.
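For admins who want to make the libvirt-guests change described in the 14025813 comment, a rough Python sketch of the edit follows. It assumes the RHEL layout, where the settings live in /etc/sysconfig/libvirt-guests and the ON_SHUTDOWN option accepts "suspend" or "shutdown". The set_option helper is hypothetical and written only for this example.

```python
def set_option(config_text: str, key: str, value: str) -> str:
    """Return shell-style config text with KEY=value set exactly once.

    Active assignments of the key are replaced (duplicates are dropped);
    commented-out lines are left untouched. If no active assignment
    exists, the new line is appended at the end.
    """
    new_line = f"{key}={value}"
    out = []
    replaced = False
    for line in config_text.splitlines():
        if line.strip().startswith(key + "="):
            if not replaced:
                out.append(new_line)
                replaced = True
            # any further active assignment of the same key is dropped
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"
```

An admin-side script could read the file, pass its text through set_option(text, "ON_SHUTDOWN", "shutdown"), and write the result back. A SHUTDOWN_TIMEOUT value would usually be reviewed at the same time.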
[jira] [Commented] (CLOUDSTACK-5429) KVM - Primary store down/Network Failure - Hosts attempt to reboot because of primary store being down hangs.
[ https://issues.apache.org/jira/browse/CLOUDSTACK-5429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919520#comment-13919520 ]

Marcus Sorensen commented on CLOUDSTACK-5429:
---------------------------------------------

You could change from a clean reboot to sysrq triggers, which would be closer to the IPMI/power fencing that would normally occur in a situation like this. Most distributions enable sysrq by default:
http://fedoraproject.org/wiki/QA/Sysrq

It's really bad to have VM processes still running like this if we try to start them elsewhere.

It would be nice if the agent could also somehow tell the mgmt server that it is relinquishing those VMs prior to force-rebooting itself, since I know that under normal circumstances the HA VMs won't run anywhere else until the agent on this host is reachable again, which could be a long time if there is actually a host-specific issue.
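The sysrq approach suggested in the 13919520 comment could look roughly like the Python sketch below. On Linux, writing the character "b" to /proc/sysrq-trigger as root asks the kernel to reboot immediately, without syncing or unmounting filesystems, so it should not get stuck on a hung NFS unmount the way a clean reboot does. The function name and the trigger_path parameter are assumptions for this example, and this is not the CloudStack agent's actual code.

```python
def force_reboot(trigger_path: str = "/proc/sysrq-trigger") -> None:
    """Request an immediate reboot through the kernel sysrq trigger file.

    Requires root. The 'b' command reboots without syncing or unmounting,
    so any data not yet written to disk is lost. trigger_path is a
    hypothetical parameter, exposed so the function can be exercised
    against an ordinary file.
    """
    with open(trigger_path, "w") as trigger:
        trigger.write("b")
```

Because the host goes down without notifying anyone, this only covers the fencing half of the comment. The relinquishing of VMs to the mgmt server would have to happen before force_reboot() is called.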
[jira] [Commented] (CLOUDSTACK-5429) KVM - Primary store down/Network Failure - Hosts attempt to reboot because of primary store being down hangs.
[ https://issues.apache.org/jira/browse/CLOUDSTACK-5429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917645#comment-13917645 ]

Marcus Sorensen commented on CLOUDSTACK-5429:
---------------------------------------------

Also, if we have IPMI fencing, that might work. It seems that capability has been discussed a bit in the past.