[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhinandan Prateek updated CLOUDSTACK-4371:
-------------------------------------------

    Assignee: Koushik Das

> [Performance Testing] Basic zone with 20K Hosts, management server restart 
> leaves the hosts in disconnected state for very long time
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-4371
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4371
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>          Components: Management Server
>    Affects Versions: 4.2.0
>         Environment: Basic zone, with over 20K simulator hosts
>            Reporter: Sowmya Krishnan
>            Assignee: Koushik Das
>              Labels: performance
>             Fix For: 4.3.0
>
>         Attachments: agenttaskpool_334.log, ms1_restartfail.log.gz, 
> ms2_restartfail.log.gz, ms3_restartfail.log.gz
>
>
> Basic zone performance test bed:
> 20K simulator hosts,
> 3 Management servers
> 1 host/cluster
> Local storage
> Java heap size: 12GB
> db.cloud.maxActive=2000
> direct.agent.load.size=1000
> agent.lb.enabled=true
> Deploy around 20K simulator hosts with 3 clustered Management servers
> (no VMs deployed yet).
> After all hosts are deployed, stop all 3 Management servers and then start 
> all 3 one after another.
> Result
> =====
> Hosts do not reach the Connected state even after 10 minutes. Around 
> 2K of them go into the Alert state, while the rest remain Disconnected.
> mysql> select count(*), status, resource_state, type, mgmt_server_id from host group by mgmt_server_id, status, type, resource_state;
> +----------+--------------+----------------+--------------------+----------------+
> | count(*) | status       | resource_state | type               | mgmt_server_id |
> +----------+--------------+----------------+--------------------+----------------+
> |     1946 | Alert        | Enabled        | Routing            |           NULL |
> |    18054 | Disconnected | Enabled        | Routing            |           NULL |
> |        1 | Disconnected | Enabled        | SecondaryStorageVM |           NULL |
> +----------+--------------+----------------+--------------------+----------------+
> 3 rows in set (0.11 sec)
> MS logs show a lot of storage pool exceptions while hosts try to connect:
> 2013-08-16 05:49:25,592 DEBUG [agent.transport.Request] (AgentTaskPool-12:null) Seq 13-32440322: Sending  { Cmd , MgmtId: 206915885094132, via: 13, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CleanupNetworkRulesCmd":{"interval":2028,"wait":0}}] }
> 2013-08-16 05:49:25,592 DEBUG [agent.transport.Request] (AgentTaskPool-12:null) Seq 13-32440322: Executing:  { Cmd , MgmtId: 206915885094132, via: 13, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CleanupNetworkRulesCmd":{"interval":2028,"wait":0}}] }
> 2013-08-16 05:49:25,592 DEBUG [xen.discoverer.XcpServerDiscoverer] 
> (AgentTaskPool-14:null) Not XenServer so moving on.
> 2013-08-16 05:49:25,592 DEBUG [agent.manager.AgentManagerImpl] 
> (AgentTaskPool-14:null) Sending Connect to listener: 
> DeploymentPlanningManagerImpl_EnhancerByCloudStack_76f3d8e4
> 2013-08-16 05:49:25,591 DEBUG [cloud.resource.AgentResourceBase] 
> (ClusteredAgentManager Timer:null) Deserializing simulated agent on reconnect
> 2013-08-16 05:49:25,594 INFO  [network.security.SecurityGroupListener] 
> (AgentTaskPool-12:null) Scheduled network rules cleanup, interval=2028
> 2013-08-16 05:49:25,594 INFO  [network.security.SecurityGroupListener] 
> (AgentTaskPool-12:null) Received a host startup notification
> 2013-08-16 05:49:25,595 DEBUG [agent.manager.AgentManagerImpl] 
> (AgentTaskPool-12:null) Sending Connect to listener: StoragePoolMonitor
> ...
> ...
> 2013-08-16 05:49:25,761 DEBUG [agent.manager.AgentManagerImpl] 
> (AgentTaskPool-12:null) Sending Connect to listener: 
> ClusteredVirtualMachineManagerImpl_EnhancerByCloudStack_b5459b7b
> 2013-08-16 05:49:25,764 DEBUG [cloud.vm.VirtualMachineManagerImpl] 
> (AgentTaskPool-12:null) Found 0 VMs for host 13
> 2013-08-16 05:49:25,765 DEBUG [agent.manager.AgentManagerImpl] 
> (AgentTaskPool-12:null) Sending Connect to listener: LocalStoragePoolListener
> 2013-08-16 05:49:25,768 DEBUG 
> [datastore.lifecycle.CloudStackPrimaryDataStoreLifeCycleImpl] 
> (AgentTaskPool-12:null) createPool Params @ scheme - Filesystem storageHost - 
> 172.1.3.131 hostPath - /mnt/2a2463b4-4fd2-4ac7-ad3f-040a3046e478 port - -1
> 2013-08-16 05:49:25,771 DEBUG 
> [datastore.lifecycle.CloudStackPrimaryDataStoreLifeCycleImpl] 
> (AgentTaskPool-12:null) Another active pool with the same uuid already exists
> 2013-08-16 05:49:25,772 WARN  [cloud.storage.StorageManagerImpl] 
> (AgentTaskPool-12:null) Unable to setup the local storage pool for 
> Host[-13-Routing]
> com.cloud.utils.exception.CloudRuntimeException: Another active pool with the same uuid already exists
>         at org.apache.cloudstack.storage.datastore.lifecycle.CloudStackPrimaryDataStoreLifeCycleImpl.initialize(CloudStackPrimaryDataStoreLifeCycleImpl.java:319)
>         at com.cloud.storage.StorageManagerImpl.createLocalStorage(StorageManagerImpl.java:647)
>         at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>         at com.cloud.storage.LocalStoragePoolListener.processConnect(LocalStoragePoolListener.java:86)
>         at com.cloud.agent.manager.AgentManagerImpl.notifyMonitorsOfConnection(AgentManagerImpl.java:587)
>         at com.cloud.agent.manager.AgentManagerImpl.handleDirectConnectAgent(AgentManagerImpl.java:1479)
>         at com.cloud.resource.ResourceManagerImpl.createHostAndAgent(ResourceManagerImpl.java:1739)
>         at com.cloud.resource.ResourceManagerImpl.createHostAndAgent(ResourceManagerImpl.java:1901)
>         at com.cloud.agent.manager.AgentManagerImpl$SimulateStartTask.run(AgentManagerImpl.java:1130)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:679)
> 2013-08-16 05:49:25,773 INFO  [utils.exception.CSExceptionErrorCode] 
> (AgentTaskPool-12:null) Could not find exception: 
> com.cloud.exception.ConnectionException in error code list for exceptions
> 2013-08-16 05:49:25,773 WARN  [agent.manager.AgentManagerImpl] 
> (AgentTaskPool-12:null) Monitor LocalStoragePoolListener says there is an 
> error in the connect process for 13 due to Unable to setup the local storage 
> pool for Host[-13-Routing]
> 2013-08-16 05:49:25,773 INFO  [agent.manager.AgentManagerImpl] 
> (AgentTaskPool-12:null) Host 13 is disconnecting with event AgentDisconnected
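
The log sequence above reads as: on reconnect, the local storage listener tries to create the host's pool again, finds an active pool with the same uuid, throws, and the connect is aborted. The following is only a minimal sketch of that pattern, not CloudStack code. The class LocalPoolRegistry and its methods are hypothetical names, and it assumes the existing pool belongs to the same host that is reconnecting, which the logs do not confirm. It contrasts a strict registration that fails on any duplicate uuid with an idempotent one that accepts a re-registration by the owning host.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names are CloudStack APIs.
class LocalPoolRegistry {
    // pool uuid -> id of the host that registered the pool
    private final Map<String, Long> activePools = new HashMap<>();

    /** Strict variant: any existing pool with the same uuid is treated as an error. */
    void registerStrict(String uuid, long hostId) {
        if (activePools.containsKey(uuid)) {
            throw new IllegalStateException(
                "Another active pool with the same uuid already exists");
        }
        activePools.put(uuid, hostId);
    }

    /** Idempotent variant: re-registration by the owning host is a no-op. */
    void registerIdempotent(String uuid, long hostId) {
        Long owner = activePools.putIfAbsent(uuid, hostId);
        if (owner != null && owner != hostId) {
            // A different host already owns this uuid: still a real conflict.
            throw new IllegalStateException(
                "Pool " + uuid + " is already owned by host " + owner);
        }
    }

    /**
     * Stand-in for the connect step: returns true if the host stays
     * connected, false if the storage setup error aborts the connect.
     */
    static boolean connect(Runnable setupLocalStorage) {
        try {
            setupLocalStorage.run();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

With the strict variant, a host that registered its pool before the restart fails to connect afterwards. With the idempotent variant the same host reconnects, and only a different host claiming the same uuid is rejected.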



--
This message was sent by Atlassian JIRA
(v6.1#6144)
