Re: Agent LB for CloudStack failed
To make tracking easier, I created a new issue: https://github.com/apache/cloudstack/issues/3505
Updates there are welcome. Thank you.

From: li jerry
Sent: Friday, July 19, 2019 8:25:27 PM
To: dev@cloudstack.apache.org; us...@cloudstack.apache.org
Subject: Re: Agent LB for CloudStack failed

Thank you, I look forward to your reply.
Re: Agent LB for CloudStack failed
Thank you, I look forward to your reply.

From: Nicolas Vazquez
Sent: Friday, July 19, 2019 8:16:40 PM
To: us...@cloudstack.apache.org; dev@cloudstack.apache.org
Subject: Re: Agent LB for CloudStack failed

Ok, I'll try replicating and get back to you.

Regards,
Nicolas Vazquez
Re: Agent LB for CloudStack failed
Ok, I'll try replicating and get back to you.

Regards,
Nicolas Vazquez

From: li jerry
Sent: Thursday, July 18, 2019 4:41 AM
To: us...@cloudstack.apache.org; dev@cloudstack.apache.org
Subject: Re: Agent LB for CloudStack failed

I added host.lb.check.interval=0 to agent.properties on all agents and restarted cloudstack-agent.

The following is the connection status of the agents after the restart:

mysql> select host.id, host.name, host.mgmt_server_id, host.status, mshost.name from host, mshost where host.mgmt_server_id=mshost.msid;
+----+------------------------------------+----------------+--------+----------+
| id | name                               | mgmt_server_id | status | name     |
+----+------------------------------------+----------------+--------+----------+
|  1 | test-ceph-node01.cs2cloud.internal |  2200502468634 | Up     | acs-mn01 |
|  3 | s-8-VM                             |  2200502468634 | Up     | acs-mn01 |
|  5 | test-ceph-node03.cs2cloud.internal |  2200502468634 | Up     | acs-mn01 |
|  2 | v-9-VM                             |  2199950196764 | Up     | acs-mn02 |
|  4 | test-ceph-node02.cs2cloud.internal |  2199950196764 | Up     | acs-mn02 |
|  6 | test-ceph-node04.cs2cloud.internal |  2199950196764 | Up     | acs-mn02 |
+----+------------------------------------+----------------+--------+----------+
6 rows in set (0.00 sec)

At 2019-07-18 15:10 I forced acs-mn02 to power off and waited. Only after about 15 minutes (2019-07-18 15:26:23) did the agent detect that the management node had failed and begin to switch over.

So adding host.lb.check.interval=0 to agent.properties doesn't solve the problem.

Below is the log:

2019-07-18 15:26:23,414 DEBUG [utils.nio.NioConnection] (Agent-NioConnectionHandler-1:null) (logid:) Location 1: Socket Socket[addr=/172.17.1.142,port=8250,localport=33190] closed on read. Probably -1 returned: No route to host
2019-07-18 15:26:23,416 DEBUG [utils.nio.NioConnection] (Agent-NioConnectionHandler-1:null) (logid:) Closing socket Socket[addr=/172.17.1.142,port=8250,localport=33190]
2019-07-18 15:26:23,417 DEBUG [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Clearing watch list: 2
2019-07-18 15:26:23,417 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Lost connection to host: 172.17.1.142. Attempting reconnection while we still have 0 commands in progress.
2019-07-18 15:26:23,420 INFO [utils.nio.NioClient] (Agent-Handler-2:null) (logid:) NioClient connection closed
2019-07-18 15:26:23,420 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Reconnecting to host:172.17.1.142
2019-07-18 15:26:23,420 INFO [utils.nio.NioClient] (Agent-Handler-2:null) (logid:) Connecting to 172.17.1.142:8250
2019-07-18 15:26:26,427 ERROR [utils.nio.NioConnection] (Agent-Handler-2:null) (logid:) Unable to initialize the threads.
java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.Net.connect0(Native Method)
    at sun.nio.ch.Net.connect(Net.java:454)
    at sun.nio.ch.Net.connect(Net.java:446)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
    at com.cloud.utils.nio.NioClient.init(NioClient.java:56)
    at com.cloud.utils.nio.NioConnection.start(NioConnection.java:95)
    at com.cloud.agent.Agent.reconnect(Agent.java:517)
    at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1091)
    at com.cloud.utils.nio.Task.call(Task.java:83)
    at com.cloud.utils.nio.Task.call(Task.java:29)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2019-07-18 15:26:26,432 INFO [utils.exception.CSExceptionErrorCode] (Agent-Handler-2:null) (logid:) Could not find exception: com.cloud.utils.exception.NioConnectionException in error code list for exceptions
2019-07-18 15:26:26,432 WARN [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) NIO Connection Exception com.cloud.utils.exception.NioConnectionException: No route to host
2019-07-18 15:26:26,432 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Attempted to connect to the server, but received an unexpected exception, trying again...
2019-07-18 15:26:26,432 INFO [utils.nio.NioClient] (Agent-Handler-2:null) (logid:) NioClient connection closed
2019-07-18 15:26:31,433 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Reconnecting to host:172.17.1.141
2019-07-18 15:26:31,434 INFO [utils.nio.NioClient] (Agent-Handler-2:null) (logid:) Connecting to 172.17.1.141:8250
2019-07-18 15:26:31,435 INFO [utils.nio.Link] (Agent-Handler-2:null) (logid:) Conf file found: /etc/cloudstack/agent/agent.properties
2019-07-18 15:26:31,545 INFO [utils.nio.NioClient] (Agent-Handler-2:null) (logid:) SSL: Handshake done
2019-07-18 15:2
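The delay reported in the quoted message can be checked with simple arithmetic on the timestamps. The following is a minimal sketch, assuming the power-off happened at exactly 15:10:00 (the message gives only the minute) and that agent log lines start with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp as in the excerpt; `parse_log_ts` is a hypothetical helper, not part of CloudStack.

```python
from datetime import datetime

# Timestamp layout at the start of each agent log line in the excerpt.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_ts(line: str) -> datetime:
    """Parse the leading 23-character timestamp of an agent log line."""
    return datetime.strptime(line[:23], LOG_TS_FORMAT)


# Assumption: the message only says "15:10", so the seconds are taken as 00.
power_off = datetime(2019, 7, 18, 15, 10, 0)

# Two lines taken from the log excerpt (message text shortened with "...").
lost = parse_log_ts(
    "2019-07-18 15:26:23,417 INFO [cloud.agent.Agent] ... Lost connection to host: 172.17.1.142."
)
reconnect = parse_log_ts(
    "2019-07-18 15:26:31,433 INFO [cloud.agent.Agent] ... Reconnecting to host:172.17.1.141"
)

detection_delay = lost - power_off  # time until the agent noticed the failure
switch_delay = reconnect - lost     # time from noticing to trying the other node

print("detection delay:", detection_delay)
print("switch delay:", switch_delay)
```

Under these assumptions almost all of the delay is spent before the agent notices the lost connection; once it does, it moves to the other management server within seconds.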
Re: Agent LB for CloudStack failed
From: Nicolas Vazquez <nicolas.vazq...@shapeblue.com>
Sent: July 18, 2019 12:48
To: dev@cloudstack.apache.org; us...@cloudstack.apache.org
Subject: Re: Agent LB for CloudStack failed

Thanks, I suspect the culprit is the background task that tries to reconnect to the preferred host (it runs every 60 seconds). I would suggest disabling the background task by setting the interval to 0. As you do not want to change your 'host' global configuration to propagate a new list to the agents, you should do it this way:

- Add this line to agent.properties: host.lb.check.interval=0
- Restart the agent

Please let me know if this fixes your issue.

Regards,
Nicolas Vazquez
Re: Agent LB for CloudStack failed
Thanks, I suspect the culprit is the background task that tries to reconnect to the preferred host (it runs every 60 seconds). I would suggest disabling the background task by setting the interval to 0. As you do not want to change your 'host' global configuration to propagate a new list to the agents, you should do it this way:

- Add this line to agent.properties: host.lb.check.interval=0
- Restart the agent

Please let me know if this fixes your issue.

Regards,
Nicolas Vazquez

From: li jerry
Sent: Thursday, July 18, 2019 12:00 AM
To: dev@cloudstack.apache.org; us...@cloudstack.apache.org
Subject: Re: Agent LB for CloudStack failed

Hi Nicolas,

test-ceph-node01

[root@test-ceph-node01 ~]# cat /etc/cloudstack/agent/agent.properties
#Storage
#Wed Jul 17 10:39:18 CST 2019
workers=5
guest.network.device=br0
private.network.device=br0
port=8250
resource=com.cloud.hypervisor.kvm.resource.LibvirtComputingResource
pod=1
zone=1
hypervisor.type=kvm
guid=88ca642a-e319-3369-b2c9-39c2b2bddc7c
public.network.device=br0
cluster=1
local.storage.uuid=ec28176f-a3db-4383-90c8-6dcdbc45c3e0
keystore.passphrase=O8VdcZqBwWMMxwk2
domr.scripts.dir=scripts/network/domr/kvm
LibvirtComputingResource.id=1
host=172.17.1.141,172.17.1.142@roundrobin

test-ceph-node02

[root@test-ceph-node02 ~]# cat /etc/cloudstack/agent/agent.properties
#Storage
#Wed Jul 17 10:58:23 CST 2019
guest.network.device=br0
workers=5
private.network.device=br0
port=8250
resource=com.cloud.hypervisor.kvm.resource.LibvirtComputingResource
pod=1
zone=1
guid=649cbe62-dcac-36ae-a62c-699f0e0b8af1
hypervisor.type=kvm
cluster=1
public.network.device=br0
local.storage.uuid=2fc2f796-0614-40cf-bfdf-37a9429520fb
domr.scripts.dir=scripts/network/domr/kvm
keystore.passphrase=vB48rgCk58vNJC6N
host=172.17.1.142,172.17.1.141@roundrobin
LibvirtComputingResource.id=4

test-ceph-node03

[root@test-ceph-node03 ~]# cat /etc/cloudstack/agent/agent.properties
#Storage
#Wed Jul 17 10:39:18 CST 2019
guest.network.device=br0
workers=5
private.network.device=br0
port=8250
resource=com.cloud.hypervisor.kvm.resource.LibvirtComputingResource
pod=1
zone=1
hypervisor.type=kvm
guid=4d3742c4-8678-3f21-a841-c1ffa32d0a8d
public.network.device=br0
cluster=1
local.storage.uuid=31ee15cf-b3b2-4387-b081-7c47971b9e68
keystore.passphrase=ACgs24DnBgYkORvh
domr.scripts.dir=scripts/network/domr/kvm
LibvirtComputingResource.id=5
host=172.17.1.141,172.17.1.142@roundrobin

test-ceph-node04

[root@test-ceph-node04 ~]# cat /etc/cloudstack/agent/agent.properties
#Storage
#Wed Jul 17 10:58:22 CST 2019
guest.network.device=br0
workers=5
private.network.device=br0
port=8250
resource=com.cloud.hypervisor.kvm.resource.LibvirtComputingResource
pod=1
zone=1
hypervisor.type=kvm
guid=bfd4b7ba-fd5f-365d-b4d8-a6e8e7c78c0c
public.network.device=br0
cluster=1
local.storage.uuid=2d5004ff-37b1-4f66-bff0-e71ac211f1da
keystore.passphrase=r3D4upcAOdWbwE9p
domr.scripts.dir=scripts/network/domr/kvm
LibvirtComputingResource.id=6
host=172.17.1.142,172.17.1.141@roundrobin
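The change suggested in this message (adding host.lb.check.interval=0 to agent.properties) can be scripted when several KVM hosts are involved. The following is a minimal sketch, assuming agent.properties is a plain `key=value` file like the ones quoted in this thread; `set_property` is a hypothetical helper, not part of CloudStack, and the agent still has to be restarted afterwards as the message says.

```python
def set_property(text: str, key: str, value: str) -> str:
    """Return properties text with key set to value, replacing or appending it."""
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Leave comments and lines without '=' untouched.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Shortened sample of an agent.properties file from the thread.
sample = "workers=5\nport=8250\nhost=172.17.1.141,172.17.1.142@roundrobin\n"
updated = set_property(sample, "host.lb.check.interval", "0")
print(updated)
```

The helper is idempotent, so running it twice leaves the file content unchanged. A later message in this thread reports that this setting alone did not remove the delay.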
Re: Agent LB for CloudStack failed
Hi Nicolas,

test-ceph-node01

[root@test-ceph-node01 ~]# cat /etc/cloudstack/agent/agent.properties
#Storage
#Wed Jul 17 10:39:18 CST 2019
workers=5
guest.network.device=br0
private.network.device=br0
port=8250
resource=com.cloud.hypervisor.kvm.resource.LibvirtComputingResource
pod=1
zone=1
hypervisor.type=kvm
guid=88ca642a-e319-3369-b2c9-39c2b2bddc7c
public.network.device=br0
cluster=1
local.storage.uuid=ec28176f-a3db-4383-90c8-6dcdbc45c3e0
keystore.passphrase=O8VdcZqBwWMMxwk2
domr.scripts.dir=scripts/network/domr/kvm
LibvirtComputingResource.id=1
host=172.17.1.141,172.17.1.142@roundrobin

test-ceph-node02

[root@test-ceph-node02 ~]# cat /etc/cloudstack/agent/agent.properties
#Storage
#Wed Jul 17 10:58:23 CST 2019
guest.network.device=br0
workers=5
private.network.device=br0
port=8250
resource=com.cloud.hypervisor.kvm.resource.LibvirtComputingResource
pod=1
zone=1
guid=649cbe62-dcac-36ae-a62c-699f0e0b8af1
hypervisor.type=kvm
cluster=1
public.network.device=br0
local.storage.uuid=2fc2f796-0614-40cf-bfdf-37a9429520fb
domr.scripts.dir=scripts/network/domr/kvm
keystore.passphrase=vB48rgCk58vNJC6N
host=172.17.1.142,172.17.1.141@roundrobin
LibvirtComputingResource.id=4

test-ceph-node03

[root@test-ceph-node03 ~]# cat /etc/cloudstack/agent/agent.properties
#Storage
#Wed Jul 17 10:39:18 CST 2019
guest.network.device=br0
workers=5
private.network.device=br0
port=8250
resource=com.cloud.hypervisor.kvm.resource.LibvirtComputingResource
pod=1
zone=1
hypervisor.type=kvm
guid=4d3742c4-8678-3f21-a841-c1ffa32d0a8d
public.network.device=br0
cluster=1
local.storage.uuid=31ee15cf-b3b2-4387-b081-7c47971b9e68
keystore.passphrase=ACgs24DnBgYkORvh
domr.scripts.dir=scripts/network/domr/kvm
LibvirtComputingResource.id=5
host=172.17.1.141,172.17.1.142@roundrobin

test-ceph-node04

[root@test-ceph-node04 ~]# cat /etc/cloudstack/agent/agent.properties
#Storage
#Wed Jul 17 10:58:22 CST 2019
guest.network.device=br0
workers=5
private.network.device=br0
port=8250
resource=com.cloud.hypervisor.kvm.resource.LibvirtComputingResource
pod=1
zone=1
hypervisor.type=kvm
guid=bfd4b7ba-fd5f-365d-b4d8-a6e8e7c78c0c
public.network.device=br0
cluster=1
local.storage.uuid=2d5004ff-37b1-4f66-bff0-e71ac211f1da
keystore.passphrase=r3D4upcAOdWbwE9p
domr.scripts.dir=scripts/network/domr/kvm
LibvirtComputingResource.id=6
host=172.17.1.142,172.17.1.141@roundrobin

From: Nicolas Vazquez <nicolas.vazq...@shapeblue.com>
Sent: July 18, 2019 10:56
To: us...@cloudstack.apache.org; dev@cloudstack.apache.org
Subject: Re: Agent LB for CloudStack failed

Hi Jerry,

I'd like to request some additional information. Can you provide the value of the 'host' property in agent.properties on each KVM host? I suspect that the global setting has not been propagated to the agents, as the agent is trying to reconnect instead of connecting to the next management server once the current one is down.

Regards,
Nicolas Vazquez
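The `host` values in the agent.properties files above appear to combine a comma-separated management server list with an algorithm name after `@`. The following is a minimal sketch of reading such a value, assuming that format (it is inferred only from the values shown in this thread); `parse_host_property` and `next_host` are hypothetical helpers and do not reproduce the agent's actual failover logic.

```python
from typing import List, Optional, Tuple


def parse_host_property(value: str) -> Tuple[List[str], Optional[str]]:
    """Split 'ip1,ip2@algorithm' into (['ip1', 'ip2'], 'algorithm')."""
    hosts_part, sep, algorithm = value.partition("@")
    hosts = [h.strip() for h in hosts_part.split(",") if h.strip()]
    # No '@' in the value means no algorithm was given.
    return hosts, (algorithm.strip() if sep else None)


def next_host(hosts: List[str], failed: str) -> str:
    """Pick the entry after the failed one, wrapping around the list."""
    idx = hosts.index(failed)
    return hosts[(idx + 1) % len(hosts)]


# Value taken from test-ceph-node02 and test-ceph-node04 above.
hosts, algorithm = parse_host_property("172.17.1.142,172.17.1.141@roundrobin")
fallback = next_host(hosts, "172.17.1.142")
print(hosts, algorithm, fallback)
```

With the list order used on node02 and node04, the first entry is 172.17.1.142, which is the address those agents were connected to when acs-mn02 was powered off.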
Re: Agent LB for CloudStack failed
Hi Jerry,

I'd like to request some additional information. Can you provide the value of the 'host' property in agent.properties on each KVM host? I suspect that the global setting has not been propagated to the agents, as the agent is trying to reconnect instead of connecting to the next management server once the current one is down.

Regards,
Nicolas Vazquez

From: li jerry
Sent: Monday, July 15, 2019 10:20 PM
To: us...@cloudstack.apache.org; dev@cloudstack.apache.org
Subject: Agent LB for CloudStack failed

Hello everyone,

My KVM Agent LB on 4.11.2/4.11.3 failed. When the preferred management node is forcibly powered off, the agent does not immediately connect to the second management node. After 15 minutes, the agent reports a "No route to host" error and connects to the second management node.

Management nodes:
acs-mn01, 172.17.1.141
acs-mn02, 172.17.1.142

MySQL DB node:
acs-db01

KVM agent nodes:
test-ceph-node01
test-ceph-node02
test-ceph-node03
test-ceph-node04

Global settings:
host=172.17.1.142,172.17.1.141
indirect.agent.lb.algorithm=roundrobin
indirect.agent.lb.check.interval=60
Re: Agent LB for CloudStack failed
Nicolas, can you have a look and comment? Thanks.

Regards,
Rohit Yadav

From: li jerry
Sent: Tuesday, July 16, 2019 6:50:40 AM
To: us...@cloudstack.apache.org; dev@cloudstack.apache.org
Subject: Agent LB for CloudStack failed

Hello everyone,

My KVM Agent LB on 4.11.2/4.11.3 failed. When the preferred management node is forcibly powered off, the agent does not immediately connect to the second management node. After 15 minutes, the agent reports a "No route to host" error and connects to the second management node.

Management nodes:
acs-mn01, 172.17.1.141
acs-mn02, 172.17.1.142

MySQL DB node:
acs-db01

KVM agent nodes:
test-ceph-node01
test-ceph-node02
test-ceph-node03
test-ceph-node04

Global settings:
host=172.17.1.142,172.17.1.141
indirect.agent.lb.algorithm=roundrobin
indirect.agent.lb.check.interval=60

Partial agent logs:

2019-07-15 23:22:39,340 DEBUG [cloud.agent.Agent] (UgentTask-5:null) (logid:) Sending ping: Seq 1-19: { Cmd , MgmtId: -1, via: 1, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"_hostVmStateReport":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":1,"wait":0}}] }
2019-07-15 23:23:09,960 DEBUG [utils.nio.NioConnection] (Agent-NioConnectionHandler-1:null) (logid:) Location 1: Socket Socket[addr=/172.17.1.142,port=8250,localport=34854] closed on read. Probably -1 returned: No route to host
2019-07-15 23:23:09,960 DEBUG [utils.nio.NioConnection] (Agent-NioConnectionHandler-1:null) (logid:) Closing socket Socket[addr=/172.17.1.142,port=8250,localport=34854]
2019-07-15 23:23:09,961 DEBUG [cloud.agent.Agent] (Agent-Handler-4:null) (logid:a4e4de49) Clearing watch list: 2
2019-07-15 23:23:09,962 INFO [cloud.agent.Agent] (Agent-Handler-4:null) (logid:a4e4de49) Lost connection to host: 172.17.1.142. Attempting reconnection while we still have 0 commands in progress.
2019-07-15 23:23:09,963 INFO [utils.nio.NioClient] (Agent-Handler-4:null) (logid:a4e4de49) NioClient connection closed
2019-07-15 23:23:09,964 INFO [cloud.agent.Agent] (Agent-Handler-4:null) (logid:a4e4de49) Reconnecting to host:172.17.1.142
2019-07-15 23:23:09,964 INFO [utils.nio.NioClient] (Agent-Handler-4:null) (logid:a4e4de49) Connecting to 172.17.1.142:8250
2019-07-15 23:23:12,972 ERROR [utils.nio.NioConnection] (Agent-Handler-4:null) (logid:a4e4de49) Unable to initialize the threads.
java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.Net.connect0(Native Method)
    at sun.nio.ch.Net.connect(Net.java:454)
    at sun.nio.ch.Net.connect(Net.java:446)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
    at com.cloud.utils.nio.NioClient.init(NioClient.java:56)
    at com.cloud.utils.nio.NioConnection.start(NioConnection.java:95)
    at com.cloud.agent.Agent.reconnect(Agent.java:517)
    at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1091)
    at com.clo
Agent LB for CloudStack failed
Hello everyone,

Agent LB for my KVM hosts fails on 4.11.2/4.11.3. When the preferred management node is forcibly powered off, the agent does not immediately connect to the second management node. After 15 minutes, the agent logs a "No route to host" error and only then connects to the second management node.

Management nodes:
acs-mn01, 172.17.1.141
acs-mn02, 172.17.1.142

MySQL DB node:
acs-db01

KVM agent nodes:
test-ceph-node01
test-ceph-node02
test-ceph-node03
test-ceph-node04

Global settings:
host=172.17.1.142,172.17.1.141
indirect.agent.lb.algorithm=roundrobin
indirect.agent.lb.check.interval=60

Partial agent logs:
2019-07-15 23:22:39,340 DEBUG [cloud.agent.Agent] (UgentTask-5:null) (logid:) Sending ping: Seq 1-19: { Cmd , MgmtId: -1, via: 1, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"_hostVmStateReport":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":1,"wait":0}}] }
2019-07-15 23:23:09,960 DEBUG [utils.nio.NioConnection] (Agent-NioConnectionHandler-1:null) (logid:) Location 1: Socket Socket[addr=/172.17.1.142,port=8250,localport=34854] closed on read. Probably -1 returned: No route to host
2019-07-15 23:23:09,960 DEBUG [utils.nio.NioConnection] (Agent-NioConnectionHandler-1:null) (logid:) Closing socket Socket[addr=/172.17.1.142,port=8250,localport=34854]
2019-07-15 23:23:09,961 DEBUG [cloud.agent.Agent] (Agent-Handler-4:null) (logid:a4e4de49) Clearing watch list: 2
2019-07-15 23:23:09,962 INFO [cloud.agent.Agent] (Agent-Handler-4:null) (logid:a4e4de49) Lost connection to host: 172.17.1.142. Attempting reconnection while we still have 0 commands in progress.
2019-07-15 23:23:09,963 INFO [utils.nio.NioClient] (Agent-Handler-4:null) (logid:a4e4de49) NioClient connection closed
2019-07-15 23:23:09,964 INFO [cloud.agent.Agent] (Agent-Handler-4:null) (logid:a4e4de49) Reconnecting to host:172.17.1.142
2019-07-15 23:23:09,964 INFO [utils.nio.NioClient] (Agent-Handler-4:null) (logid:a4e4de49) Connecting to 172.17.1.142:8250
2019-07-15 23:23:12,972 ERROR [utils.nio.NioConnection] (Agent-Handler-4:null) (logid:a4e4de49) Unable to initialize the threads.
java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.Net.connect0(Native Method)
    at sun.nio.ch.Net.connect(Net.java:454)
    at sun.nio.ch.Net.connect(Net.java:446)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
    at com.cloud.utils.nio.NioClient.init(NioClient.java:56)
    at com.cloud.utils.nio.NioConnection.start(NioConnection.java:95)
    at com.cloud.agent.Agent.reconnect(Agent.java:517)
    at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1091)
    at com.clo
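
For anyone trying to reproduce this, one way to see how a plain TCP connect to each management server behaves from a KVM host is sketched below in Python. It is only a diagnostic sketch, not the agent's reconnect logic: the function name first_reachable, the 5-second timeout, and the injectable connect parameter are assumptions made for illustration. The addresses and port 8250 come from the settings and logs above.

```python
import socket


def first_reachable(hosts, port=8250, timeout=5.0, connect=socket.create_connection):
    """Return the first host that accepts a TCP connection, or None.

    Hosts are tried in the order given, which here follows the order of
    the `host` global setting. `connect` is injectable (an assumption of
    this sketch) so the function can be exercised without a network.
    """
    for host in hosts:
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError as exc:
            # OSError covers timeouts as well as "No route to host".
            print(f"{host}:{port} unreachable: {exc}")
            continue
        conn.close()
        return host
    return None
```

Running first_reachable(["172.17.1.142", "172.17.1.141"]) on an agent node while acs-mn02 is powered off would show how quickly a fresh connect to 172.17.1.142 fails before 172.17.1.141 is tried. It says nothing about how long an already established connection takes to notice the outage.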