digitalgust commented on issue #3509: 4.12 add host fail ,using tls connection
URL: https://github.com/apache/cloudstack/issues/3509#issuecomment-514946131
 
 
   test 4.11.3.0,   the problem same as 4.12
   hypervisor add unreliable ,
   i have 2 host for hypervisor in my test environment.
   add fisrst host , it always shown "alert“, then remove the first host,  mgmt 
note success message,  then add the second host, second show "alert" ,!!! the 
first host turn "UP", but the first host is success deleted a momont ago.
   or keep it for serial hours in "alert" state,  the hosts convert "UP".
   when host is in "alert" state, it cant't connect to mgmt, but in the host 
can ping mgmt, and can mgmt 
    port 8250 is normal,  the same time , there is another host in "UP" state .
   i  can abstractly confirm the host's network is connectable . 
   Puzzling real .
   
   this is agent log  ,the host in "alert" state 
   ```
   2019-07-25 16:03:40,152 INFO  [cloud.agent.Agent] (Agent Timer:null) 
(logid:) Reconnecting to host:192.168.100.1
   2019-07-25 16:03:40,153 INFO  [utils.nio.NioClient] (Agent Timer:null) 
(logid:) Connecting to 192.168.100.1:8250
   2019-07-25 16:03:40,154 INFO  [utils.nio.Link] (Agent Timer:null) (logid:) 
Conf file found: /etc/cloudstack/agent/agent.properties
   2019-07-25 16:03:40,289 INFO  [utils.nio.NioClient] (Agent Timer:null) 
(logid:) SSL: Handshake done
   2019-07-25 16:03:40,290 INFO  [utils.nio.NioClient] (Agent Timer:null) 
(logid:) Connected to 192.168.100.1:8250
   2019-07-25 16:03:40,290 DEBUG [kvm.resource.LibvirtConnection] 
(Agent-Handler-1:null) (logid:) Looking for libvirtd connection at: 
qemu:///system
   2019-07-25 16:03:40,296 DEBUG [kvm.resource.LibvirtCapXMLParser] 
(Agent-Handler-1:null) (logid:) Found /usr/libexec/qemu-kvm as a suiteable 
emulator
   2019-07-25 16:03:40,296 DEBUG [kvm.resource.LibvirtComputingResource] 
(Agent-Handler-1:null) (logid:) Executing: /bin/bash -c qemu-img --help|grep 
convert 
   2019-07-25 16:03:40,297 DEBUG [kvm.resource.LibvirtComputingResource] 
(Agent-Handler-1:null) (logid:) Executing while with timeout : 1800000
   2019-07-25 16:03:40,314 DEBUG [kvm.resource.LibvirtComputingResource] 
(Agent-Handler-1:null) (logid:) Execution is successful.
   2019-07-25 16:03:40,315 DEBUG [kvm.resource.LibvirtComputingResource] 
(Agent-Handler-1:null) (logid:)   convert [--object objectdef] [--image-opts] 
[-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-T src_cache] [-O output_fmt] [-o 
options] [-s snapshot_id_or_name] [-l snapshot_param] [-S sparse_size] [-m 
num_coroutines] [-W] filename [filename2 [...]] output_filename
       options are: 'none', 'writeback' (default, except for convert), 
'writethrough',
       'directsync' and 'unsafe' (default for convert)
   Parameters to convert subcommand:
     '-m' specifies how many coroutines work in parallel during the convert
   
   2019-07-25 16:03:40,315 DEBUG [kvm.resource.LibvirtComputingResource] 
(Agent-Handler-1:null) (logid:) cpus=8, speed=4500, ram=7378771968, 
_dom0MinMem=1073741824, _dom0OvercommitMem=0, cpu sockets=1
   2019-07-25 16:03:40,315 DEBUG [cloud.resource.ServerResourceBase] 
(Agent-Handler-1:null) (logid:) Parameters for private nic: 192.168.100.50 - 
1c:69:7a:00:e2:76-255.255.255.0
   2019-07-25 16:03:40,315 DEBUG [cloud.resource.ServerResourceBase] 
(Agent-Handler-1:null) (logid:) Parameters for storage nic: 192.168.100.50 - 
1c:69:7a:00:e2:76-255.255.255.0
   2019-07-25 16:03:40,315 DEBUG [kvm.resource.LibvirtComputingResource] 
(Agent-Handler-1:null) (logid:) Executing: 
/usr/share/cloudstack-common/scripts/vm/hypervisor/versions.sh 
   2019-07-25 16:03:40,316 DEBUG [kvm.resource.LibvirtComputingResource] 
(Agent-Handler-1:null) (logid:) Executing while with timeout : 1800000
   2019-07-25 16:03:40,324 DEBUG [kvm.resource.LibvirtComputingResource] 
(Agent-Handler-1:null) (logid:) Execution is successful.
   2019-07-25 16:03:40,330 DEBUG [kvm.resource.LibvirtComputingResource] 
(Agent-Handler-1:null) (logid:) Executing: sudo grep InitiatorName= 
/etc/iscsi/initiatorname.iscsi 
   2019-07-25 16:03:40,331 DEBUG [kvm.resource.LibvirtComputingResource] 
(Agent-Handler-1:null) (logid:) Executing while with timeout : 3600000
   2019-07-25 16:03:40,346 DEBUG [kvm.resource.LibvirtComputingResource] 
(Agent-Handler-1:null) (logid:) Execution is successful.
   2019-07-25 16:03:40,346 INFO  [kvm.storage.LibvirtStorageAdaptor] 
(Agent-Handler-1:null) (logid:) Attempting to create storage pool 
43a0fdb9-9faf-406b-ba0b-388aa1bbc7d4 (Filesystem) in libvirt
   2019-07-25 16:03:40,346 DEBUG [kvm.resource.LibvirtConnection] 
(Agent-Handler-1:null) (logid:) Looking for libvirtd connection at: 
qemu:///system
   2019-07-25 16:03:40,347 INFO  [kvm.storage.LibvirtStorageAdaptor] 
(Agent-Handler-1:null) (logid:) Found existing defined storage pool 
43a0fdb9-9faf-406b-ba0b-388aa1bbc7d4, using it.
   2019-07-25 16:03:40,347 INFO  [kvm.storage.LibvirtStorageAdaptor] 
(Agent-Handler-1:null) (logid:) Trying to fetch storage pool 
43a0fdb9-9faf-406b-ba0b-388aa1bbc7d4 from libvirt
   2019-07-25 16:03:40,347 DEBUG [kvm.resource.LibvirtConnection] 
(Agent-Handler-1:null) (logid:) Looking for libvirtd connection at: 
qemu:///system
   2019-07-25 16:03:40,350 DEBUG [kvm.storage.LibvirtStorageAdaptor] 
(Agent-Handler-1:null) (logid:) Succesfully refreshed pool 
43a0fdb9-9faf-406b-ba0b-388aa1bbc7d4 Capacity: 53660876800 Used: 1794367488 
Available: 51866509312
   2019-07-25 16:03:40,351 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) 
(logid:) Executing: hostname 
   2019-07-25 16:03:40,352 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) 
(logid:) Executing while with timeout : 500
   2019-07-25 16:03:40,352 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) 
(logid:) Execution is successful.
   2019-07-25 16:03:40,352 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) 
(logid:) Executing: hostname 
   2019-07-25 16:03:40,353 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) 
(logid:) Executing while with timeout : 500
   2019-07-25 16:03:40,353 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) 
(logid:) Execution is successful.
   2019-07-25 16:03:40,354 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) 
(logid:) Sending Startup: Seq 4-15:  { Cmd , MgmtId: -1, via: 4, Ver: v1, 
Flags: 1, 
[{"com.cloud.agent.api.StartupRoutingCommand":{"cpuSockets":1,"cpus":8,"speed":4500,"memory":7378771968,"dom0MinMemory":1073741824,"poolSync":false,"supportsClonedVolumes":false,"caps":"hvm,snapshot","pool":"/root","hypervisorType":"KVM","hostDetails":{"Host.OS.Kernel.Version":"3.10.0-957.el7.x86_64","com.cloud.network.Networks.RouterPrivateIpStrategy":"HostLocal","Host.OS.Version":"7.6.1810","secured":"true","Host.OS":"CentOS"},"hostTags":[],"groupDetails":{},"type":"Routing","dataCenter":"1","pod":"1","cluster":"1","guid":"3960b86e-3c62-3d60-a52f-dc64a9e09619-LibvirtComputingResource","name":"host01","id":4,"version":"4.11.3.0","iqn":"iqn.1994-05.com.redhat:3ae550194c3","privateIpAddress":"192.168.100.50","privateMacAddress":"1c:69:7a:00:e2:76","privateNetmask":"255.255.255.0","storageIpAddress":"192.168.100.50","storageNetmask":"255.255.255.0","storageMacAddress":"1c:69:7a:00:e2:76","resourceName":"LibvirtComputingResource","gatewayIpAddress":"192.168.100.254","msHostList":"192.168.100.1","wait":0}},{"com.cloud.agent.api.StartupStorageCommand":{"totalSize":0,"poolInfo":{"uuid":"43a0fdb9-9faf-406b-ba0b-388aa1bbc7d4","host":"192.168.100.50","localPath":"/var/lib/libvirt/images","hostPath":"/var/lib/libvirt/images","poolType":"Filesystem","capacityBytes":53660876800,"availableBytes":51866509312},"resourceType":"STORAGE_POOL","hostDetails":{},"type":"Storage","dataCenter":"1","pod":"1","guid":"3960b86e-3c62-3d60-a52f-dc64a9e09619-LibvirtComputingResource","name":"host01","id":4,"version":"4.11.3.0","resourceName":"LibvirtComputingResource","msHostList":"192.168.100.1","wait":0}}]
 }
   2019-07-25 16:03:40,354 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) 
(logid:) Startup task created
   ==============here can't connect to mgmt=======
   2019-07-25 16:03:45,291 INFO  [cloud.agent.Agent] (Agent Timer:null) 
(logid:) Connected to the host: 192.168.100.1
   
   2019-07-25 16:09:40,354 INFO  [cloud.agent.Agent] (Agent Timer:null) 
(logid:) The startup command is now cancelled
   ================
   2019-07-25 16:09:40,355 DEBUG [cloud.agent.Agent] (Agent Timer:null) 
(logid:) Clearing watch list: 0
   2019-07-25 16:09:40,355 INFO  [cloud.agent.Agent] (Agent Timer:null) 
(logid:) Lost connection to host: 192.168.100.1. Attempting reconnection while 
we still have 0 commands in progress.
   2019-07-25 16:09:40,355 DEBUG [utils.nio.NioConnection] 
(Agent-NioConnectionHandler-1:null) (logid:) Closing socket 
Socket[addr=/192.168.100.1,port=8250,localport=54460]
   2019-07-25 16:09:40,357 INFO  [utils.nio.NioClient] (Agent Timer:null) 
(logid:) NioClient connection closed
   
   
   ```
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to