rhtyd commented on issue #2890: KVMHAMonitor thread blocks indefinitely while 
NFS not available
URL: https://github.com/apache/cloudstack/issues/2890#issuecomment-434207830
 
 
   @csquire I could not reproduce the issue. I guess it depends on how you're 
simulating NFS/shared-storage failure and what your iptables rules are. In my 
case I simply shutdown the NFS server instead of iptables rules and observed 
following:
   
   ```
   .
   2018-10-30 08:04:04,842 WARN  [kvm.resource.KVMHAMonitor] (Thread-34:null) 
(logid:) Timed out: 
/usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i 
172.20.0.1 -p /export/testing/primary -m 
/mnt/05c30f6d-5725-3369-a8d6-b5fb37c7ba8f -h 172.20.1.10 .  Output is: 
   2018-10-30 08:04:04,843 WARN  [kvm.resource.KVMHAMonitor] (Thread-34:null) 
(logid:) write heartbeat failed: timeout, try: 5 of 5
   2018-10-30 08:04:04,843 DEBUG [kvm.resource.KVMHAMonitor] (Thread-34:null) 
(logid:) [ignored] interupted between heartbeat retries.
   2018-10-30 08:04:04,843 WARN  [kvm.resource.KVMHAMonitor] (Thread-34:null) 
(logid:) write heartbeat failed: timeout; stopping cloudstack-agent
   2018-10-30 08:04:04,843 DEBUG [kvm.resource.KVMHAMonitor] (Thread-34:null) 
(logid:) Executing: 
/usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i 
172.20.0.1 -p /export/testing/primary -m 
/mnt/05c30f6d-5725-3369-a8d6-b5fb37c7ba8f -c 
   2018-10-30 08:04:04,845 DEBUG [kvm.resource.KVMHAMonitor] (Thread-34:null) 
(logid:) Executing while with timeout : 60000
   2018-10-30 08:04:05,007 DEBUG [kvm.resource.LibvirtComputingResource] 
(UgentTask-5:null) (logid:) Executing: 
/usr/share/cloudstack-common/scripts/vm/network/security_group.py 
get_rule_logs_for_vms 
   2018-10-30 08:04:05,009 DEBUG [kvm.resource.LibvirtComputingResource] 
(UgentTask-5:null) (logid:) Executing while with timeout : 1800000
   2018-10-30 08:04:05,109 DEBUG [kvm.resource.LibvirtComputingResource] 
(UgentTask-5:null) (logid:) Execution is successful.
   2018-10-30 08:04:05,109 DEBUG [kvm.resource.LibvirtConnection] 
(UgentTask-5:null) (logid:) Looking for libvirtd connection at: qemu:///system
   2018-10-30 08:04:05,114 DEBUG [cloud.agent.Agent] (UgentTask-5:null) 
(logid:) Sending ping: Seq 1-157:  { Cmd , MgmtId: -1, via: 1, Ver: v1, Flags: 
11, 
[{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"_hostVmStateReport":{"r-5-VM":{"state":"PowerOn","host":"centos7-kvm1"},"s-1-VM":{"state":"PowerOn","host":"centos7-kvm1"},"v-2-VM":{"state":"PowerOn","host":"centos7-kvm1"},"i-2-4-VM":{"state":"PowerOn","host":"centos7-kvm1"}},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":1,"wait":0}}]
 }
   2018-10-30 08:04:05,166 DEBUG [cloud.agent.Agent] (Agent-Handler-3:null) 
(logid:) Received response: Seq 1-157:  { Ans: , MgmtId: 2485476061376, via: 1, 
Ver: v1, Flags: 100010, 
[{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":1,"wait":0},"result":true,"wait":0}}]
 }
   2018-10-30 08:04:06,762 DEBUG [cloud.agent.Agent] 
(agentRequest-Handler-5:null) (logid:0aa26d2d) Request:Seq 
1-3303108851699548248:  { Cmd , MgmtId: 2485476061376, via: 1, Ver: v1, Flags: 
100011, 
[{"com.cloud.agent.api.GetVncPortCommand":{"id":4,"name":"i-2-4-VM","wait":0}}] 
}
   2018-10-30 08:04:06,762 DEBUG [cloud.agent.Agent] 
(agentRequest-Handler-5:null) (logid:0aa26d2d) Processing command: 
com.cloud.agent.api.GetVncPortCommand
   2018-10-30 08:04:06,762 DEBUG [kvm.resource.LibvirtConnection] 
(agentRequest-Handler-5:null) (logid:0aa26d2d) Looking for libvirtd connection 
at: qemu:///system
   2018-10-30 08:04:06,765 DEBUG [cloud.agent.Agent] 
(agentRequest-Handler-5:null) (logid:0aa26d2d) Seq 1-3303108851699548248:  { 
Ans: , MgmtId: 2485476061376, via: 1, Ver: v1, Flags: 10, 
[{"com.cloud.agent.api.GetVncPortAnswer":{"address":"172.20.1.10","port":5903,"result":true,"wait":0}}]
 }
   2018-10-30 08:04:09,871 DEBUG [kvm.resource.KVMHAMonitor] (Thread-34:null) 
(logid:) Exit value is 143
   2018-10-30 08:04:09,872 DEBUG [kvm.resource.KVMHAMonitor] (Thread-34:null) 
(logid:) Redirecting to /bin/systemctl stop cloudstack-agent.service
   2018-10-30 08:04:09,873 INFO  [cloud.agent.Agent] (AgentShutdownThread:null) 
(logid:) Stopping the agent: Reason = sig.kill
   2018-10-30 08:04:09,874 DEBUG [cloud.agent.Agent] (AgentShutdownThread:null) 
(logid:) Sending shutdown to management server
   2018-10-30 08:04:09,876 DEBUG [utils.nio.NioConnection] 
(Agent-NioConnectionHandler-1:null) (logid:) Location 1: Socket 
Socket[addr=/172.20.0.1,port=8250,localport=43646] closed on read.  Probably -1 
returned: Connection closed with -1 on reading size.
   2018-10-30 08:04:09,876 DEBUG [utils.nio.NioConnection] 
(Agent-NioConnectionHandler-1:null) (logid:) Closing socket 
Socket[addr=/172.20.0.1,port=8250,localport=43646]
   2018-10-30 08:04:10,875 DEBUG [kvm.resource.LibvirtConnection] 
(AgentShutdownThread:null) (logid:) Looking for libvirtd connection at: 
qemu:///system
   ```
   
   I could see the `kvmheartbeat.sh` try 5 times every 60 seconds and 
eventually failing to shutdown the agent by this: `/bin/systemctl stop 
cloudstack-agent.service`. On the management server side, I saw the host state 
go from Up to Disconnected:
   
   ![screenshot from 2018-10-30 
13-38-20](https://user-images.githubusercontent.com/95203/47704265-223dd100-dc49-11e8-9f71-1aa36e66e47d.png)
   
   The only change in behaviour I see is that the KVM host is not rebooted but 
only the agent gets shutdown.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to