somejfn commented on issue #2890: KVMHAMonitor thread blocks indefinitely while 
NFS not available
URL: https://github.com/apache/cloudstack/issues/2890#issuecomment-432374446
 
 
   Confirmed: we see similar behavior on 4.11.2rc3, and the agent went into the 
Down state. Agent logs:
   
   810986-e702-36ea-a87b-fd48064ecb12
   2018-10-23 13:14:40,391 INFO  [kvm.resource.LibvirtConnection] 
(agentRequest-Handler-4:null) (logid:f8cd7cf7) No existing libvirtd connection 
found. Opening a new one
   2018-10-23 13:14:40,392 WARN  [kvm.resource.LibvirtConnection] 
(agentRequest-Handler-4:null) (logid:f8cd7cf7) Can not find a connection for 
Instance i-4-24-VM. Assuming the default connection.
   2018-10-23 13:14:40,399 INFO  [kvm.storage.LibvirtStorageAdaptor] 
(agentRequest-Handler-4:null) (logid:f8cd7cf7) Trying to fetch storage pool 
4e49054a-463f-306f-9678-b0d9b02af9a1 from libvirt
   2018-10-23 13:14:51,496 INFO  [kvm.storage.LibvirtStorageAdaptor] 
(agentRequest-Handler-2:null) (logid:3a0df8e5) Trying to fetch storage pool 
0e233ec5-ea14-439e-bfde-a8c7566d254c from libvirt
   2018-10-23 13:14:51,498 INFO  [kvm.storage.LibvirtStorageAdaptor] 
(agentRequest-Handler-2:null) (logid:3a0df8e5) Asking libvirt to refresh 
storage pool 0e233ec5-ea14-439e-bfde-a8c7566d254c
   2018-10-23 13:15:25,027 INFO  [kvm.storage.LibvirtStorageAdaptor] 
(agentRequest-Handler-1:null) (logid:581a1d95) Trying to fetch storage pool 
0e233ec5-ea14-439e-bfde-a8c7566d254c from libvirt
   2018-10-23 13:15:25,029 INFO  [kvm.storage.LibvirtStorageAdaptor] 
(agentRequest-Handler-1:null) (logid:581a1d95) Asking libvirt to refresh 
storage pool 0e233ec5-ea14-439e-bfde-a8c7566d254c
   2018-10-23 13:15:25,590 INFO  [kvm.storage.LibvirtStorageAdaptor] 
(agentRequest-Handler-5:null) (logid:581a1d95) Trying to fetch storage pool 
3e810986-e702-36ea-a87b-fd48064ecb12 from libvirt
   2018-10-23 13:15:25,592 INFO  [kvm.storage.LibvirtStorageAdaptor] 
(agentRequest-Handler-5:null) (logid:581a1d95) Asking libvirt to refresh 
storage pool 3e810986-e702-36ea-a87b-fd48064ecb12
   
   2018-10-23 13:21:28,804 WARN  [kvm.resource.KVMHAChecker] (Script-3:null) 
(logid:) Interrupting script.
   2018-10-23 13:21:28,806 WARN  [kvm.resource.KVMHAChecker] 
(pool-15160-thread-1:null) (logid:c3d5dcaf) Timed out: 
/usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i 
10.73.96.232 -p /vol/t500_0_fls3_pool36_root -m 
/mnt/d05f1c9d-9454-3707-a6c4-781398af198d -h 10.73.96.212 -r -t 60 .  Output is:
   2018-10-23 13:21:32,826 WARN  [kvm.resource.KVMHAChecker] (Script-7:null) 
(logid:) Interrupting script.
   2018-10-23 13:21:32,827 WARN  [kvm.resource.KVMHAChecker] 
(pool-15161-thread-1:null) (logid:c3d5dcaf) Timed out: 
/usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i 
10.73.96.232 -p /vol/t500_0_fls3_pool36_root -m 
/mnt/d05f1c9d-9454-3707-a6c4-781398af198d -h 10.73.96.212 -r -t 60 .  Output is:
   2018-10-23 13:21:36,846 WARN  [kvm.resource.KVMHAChecker] (Script-4:null) 
(logid:) Interrupting script.
   2018-10-23 13:21:36,847 WARN  [kvm.resource.KVMHAChecker] 
(pool-15162-thread-1:null) (logid:4a3cb34f) Timed out: 
/usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i 
10.73.96.232 -p /vol/t500_0_fls3_pool36_root -m 
/mnt/d05f1c9d-9454-3707-a6c4-781398af198d -h 10.73.96.212 -r -t 60 .  Output is:
   2018-10-23 13:24:44,205 INFO  [cloud.agent.Agent] (Agent-Handler-1:null) 
(logid:5a5a7500) Lost connection to host: 10.73.96.19. Attempting reconnection 
while we still have 5 commands in progress.
   2018-10-23 13:24:44,206 INFO  [utils.nio.NioClient] (Agent-Handler-1:null) 
(logid:5a5a7500) NioClient connection closed
   2018-10-23 13:24:44,206 INFO  [cloud.agent.Agent] (Agent-Handler-1:null) 
(logid:5a5a7500) Reconnecting to host:10.73.96.19
   2018-10-23 13:24:44,207 INFO  [utils.nio.NioClient] (Agent-Handler-1:null) 
(logid:5a5a7500) Connecting to 10.73.96.19:8250
   2018-10-23 13:24:44,207 INFO  [utils.nio.Link] (Agent-Handler-1:null) 
(logid:5a5a7500) Conf file found: /etc/cloudstack/agent/agent.properties
   
   Note that sometimes the agent successfully goes into the Disconnect state, 
but the host HA framework might still fire after the kvm.ha.degraded.max.period 
timer expires, which is not expected. In any case, we want to avoid massive KVM 
host resets via IPMI for storage-related problems, because that is more damaging 
than waiting for primary storage to come back.
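
   To illustrate the distinction drawn above between "the check timed out" and 
"the check reported a failure", here is a minimal Python sketch of running an 
external check with a bounded wait. It is not CloudStack's implementation; the 
function name run_check and the three status strings are hypothetical and 
chosen only for this example.

```python
import subprocess
import sys


def run_check(cmd, timeout_s):
    """Run an external check command with a bounded wait.

    Returns (status, stdout), where status is one of:
      "ok"      - the command exited with code 0
      "failed"  - the command exited with a non-zero code
      "timeout" - the command did not finish within timeout_s seconds
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is
        # left running in the normal case. A timeout says nothing about
        # host health; the caller decides what to do with it.
        return "timeout", ""
    status = "ok" if result.returncode == 0 else "failed"
    return status, result.stdout


if __name__ == "__main__":
    # Stand-in command for illustration only; a real caller would pass
    # its own heartbeat script and arguments.
    print(run_check([sys.executable, "-c", "print('alive')"], timeout_s=10))
```

   Keeping "timeout" separate from "failed" lets a caller treat unreachable 
storage as a degraded condition instead of grounds for resetting the host. One 
caveat: a child stuck in uninterruptible I/O on a hard NFS mount may not exit 
when killed, so a wrapper like this can itself stall while waiting for it.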
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
