digitalgust commented on issue #3509: 4.12 add host fail ,using tls connection URL: https://github.com/apache/cloudstack/issues/3509#issuecomment-514946131 test 4.11.3.0, the problem same as 4.12 hypervisor add unreliable , i have 2 host for hypervisor in my test environment. add fisrst host , it always shown "alert“, then remove the first host, mgmt note success message, then add the second host, second show "alert" ,!!! the first host turn "UP", but the first host is success deleted a momont ago. or keep it for serial hours in "alert" state, the hosts convert "UP". when host is in "alert" state, it cant't connect to mgmt, but in the host can ping mgmt, and can mgmt port 8250 is normal, the same time , there is another host in "UP" state . i can abstractly confirm the host's network is connectable . Puzzling real . this is agent log ,the host in "alert" state ``` 2019-07-25 16:03:40,152 INFO [cloud.agent.Agent] (Agent Timer:null) (logid:) Reconnecting to host:192.168.100.1 2019-07-25 16:03:40,153 INFO [utils.nio.NioClient] (Agent Timer:null) (logid:) Connecting to 192.168.100.1:8250 2019-07-25 16:03:40,154 INFO [utils.nio.Link] (Agent Timer:null) (logid:) Conf file found: /etc/cloudstack/agent/agent.properties 2019-07-25 16:03:40,289 INFO [utils.nio.NioClient] (Agent Timer:null) (logid:) SSL: Handshake done 2019-07-25 16:03:40,290 INFO [utils.nio.NioClient] (Agent Timer:null) (logid:) Connected to 192.168.100.1:8250 2019-07-25 16:03:40,290 DEBUG [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Looking for libvirtd connection at: qemu:///system 2019-07-25 16:03:40,296 DEBUG [kvm.resource.LibvirtCapXMLParser] (Agent-Handler-1:null) (logid:) Found /usr/libexec/qemu-kvm as a suiteable emulator 2019-07-25 16:03:40,296 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Executing: /bin/bash -c qemu-img --help|grep convert 2019-07-25 16:03:40,297 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Executing while with timeout : 1800000 2019-07-25 16:03:40,314 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Execution is successful. 2019-07-25 16:03:40,315 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) convert [--object objectdef] [--image-opts] [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-T src_cache] [-O output_fmt] [-o options] [-s snapshot_id_or_name] [-l snapshot_param] [-S sparse_size] [-m num_coroutines] [-W] filename [filename2 [...]] output_filename options are: 'none', 'writeback' (default, except for convert), 'writethrough', 'directsync' and 'unsafe' (default for convert) Parameters to convert subcommand: '-m' specifies how many coroutines work in parallel during the convert 2019-07-25 16:03:40,315 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) cpus=8, speed=4500, ram=7378771968, _dom0MinMem=1073741824, _dom0OvercommitMem=0, cpu sockets=1 2019-07-25 16:03:40,315 DEBUG [cloud.resource.ServerResourceBase] (Agent-Handler-1:null) (logid:) Parameters for private nic: 192.168.100.50 - 1c:69:7a:00:e2:76-255.255.255.0 2019-07-25 16:03:40,315 DEBUG [cloud.resource.ServerResourceBase] (Agent-Handler-1:null) (logid:) Parameters for storage nic: 192.168.100.50 - 1c:69:7a:00:e2:76-255.255.255.0 2019-07-25 16:03:40,315 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/versions.sh 2019-07-25 16:03:40,316 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Executing while with timeout : 1800000 2019-07-25 16:03:40,324 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Execution is successful. 2019-07-25 16:03:40,330 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Executing: sudo grep InitiatorName= /etc/iscsi/initiatorname.iscsi 2019-07-25 16:03:40,331 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Executing while with timeout : 3600000 2019-07-25 16:03:40,346 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Execution is successful. 2019-07-25 16:03:40,346 INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Attempting to create storage pool 43a0fdb9-9faf-406b-ba0b-388aa1bbc7d4 (Filesystem) in libvirt 2019-07-25 16:03:40,346 DEBUG [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Looking for libvirtd connection at: qemu:///system 2019-07-25 16:03:40,347 INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Found existing defined storage pool 43a0fdb9-9faf-406b-ba0b-388aa1bbc7d4, using it. 2019-07-25 16:03:40,347 INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Trying to fetch storage pool 43a0fdb9-9faf-406b-ba0b-388aa1bbc7d4 from libvirt 2019-07-25 16:03:40,347 DEBUG [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Looking for libvirtd connection at: qemu:///system 2019-07-25 16:03:40,350 DEBUG [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Succesfully refreshed pool 43a0fdb9-9faf-406b-ba0b-388aa1bbc7d4 Capacity: 53660876800 Used: 1794367488 Available: 51866509312 2019-07-25 16:03:40,351 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Executing: hostname 2019-07-25 16:03:40,352 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Executing while with timeout : 500 2019-07-25 16:03:40,352 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Execution is successful. 2019-07-25 16:03:40,352 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Executing: hostname 2019-07-25 16:03:40,353 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Executing while with timeout : 500 2019-07-25 16:03:40,353 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Execution is successful. 2019-07-25 16:03:40,354 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Sending Startup: Seq 4-15: { Cmd , MgmtId: -1, via: 4, Ver: v1, Flags: 1, [{"com.cloud.agent.api.StartupRoutingCommand":{"cpuSockets":1,"cpus":8,"speed":4500,"memory":7378771968,"dom0MinMemory":1073741824,"poolSync":false,"supportsClonedVolumes":false,"caps":"hvm,snapshot","pool":"/root","hypervisorType":"KVM","hostDetails":{"Host.OS.Kernel.Version":"3.10.0-957.el7.x86_64","com.cloud.network.Networks.RouterPrivateIpStrategy":"HostLocal","Host.OS.Version":"7.6.1810","secured":"true","Host.OS":"CentOS"},"hostTags":[],"groupDetails":{},"type":"Routing","dataCenter":"1","pod":"1","cluster":"1","guid":"3960b86e-3c62-3d60-a52f-dc64a9e09619-LibvirtComputingResource","name":"host01","id":4,"version":"4.11.3.0","iqn":"iqn.1994-05.com.redhat:3ae550194c3","privateIpAddress":"192.168.100.50","privateMacAddress":"1c:69:7a:00:e2:76","privateNetmask":"255.255.255.0","storageIpAddress":"192.168.100.50","storageNetmask":"255.255.255.0","storageMacAddress":"1c:69:7a:00:e2:76","resourceName":"LibvirtComputingResource","gatewayIpAddress":"192.168.100.254","msHostList":"192.168.100.1","wait":0}},{"com.cloud.agent.api.StartupStorageCommand":{"totalSize":0,"poolInfo":{"uuid":"43a0fdb9-9faf-406b-ba0b-388aa1bbc7d4","host":"192.168.100.50","localPath":"/var/lib/libvirt/images","hostPath":"/var/lib/libvirt/images","poolType":"Filesystem","capacityBytes":53660876800,"availableBytes":51866509312},"resourceType":"STORAGE_POOL","hostDetails":{},"type":"Storage","dataCenter":"1","pod":"1","guid":"3960b86e-3c62-3d60-a52f-dc64a9e09619-LibvirtComputingResource","name":"host01","id":4,"version":"4.11.3.0","resourceName":"LibvirtComputingResource","msHostList":"192.168.100.1","wait":0}}] } 2019-07-25 16:03:40,354 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Startup task created ==============here can't connect to mgmt======= 2019-07-25 16:03:45,291 INFO [cloud.agent.Agent] (Agent Timer:null) (logid:) Connected to the host: 192.168.100.1 2019-07-25 16:09:40,354 INFO [cloud.agent.Agent] (Agent Timer:null) (logid:) The startup command is now cancelled ================ 2019-07-25 16:09:40,355 DEBUG [cloud.agent.Agent] (Agent Timer:null) (logid:) Clearing watch list: 0 2019-07-25 16:09:40,355 INFO [cloud.agent.Agent] (Agent Timer:null) (logid:) Lost connection to host: 192.168.100.1. Attempting reconnection while we still have 0 commands in progress. 2019-07-25 16:09:40,355 DEBUG [utils.nio.NioConnection] (Agent-NioConnectionHandler-1:null) (logid:) Closing socket Socket[addr=/192.168.100.1,port=8250,localport=54460] 2019-07-25 16:09:40,357 INFO [utils.nio.NioClient] (Agent Timer:null) (logid:) NioClient connection closed ```
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services