[ https://issues.apache.org/jira/browse/CLOUDSTACK-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Animesh Chaturvedi updated CLOUDSTACK-5688: ------------------------------------------- Assignee: edison su Edison can you review these KVM issues > NPE when the KVM host is rebooted on the upgraded environment > -------------------------------------------------------------- > > Key: CLOUDSTACK-5688 > URL: https://issues.apache.org/jira/browse/CLOUDSTACK-5688 > Project: CloudStack > Issue Type: Bug > Security Level: Public(Anyone can view this level - this is the > default.) > Components: KVM, Upgrade > Affects Versions: 4.3.0 > Environment: upgraded from 4.2.1 to 4.3 > Reporter: manasaveloori > Assignee: edison su > Priority: Critical > Fix For: 4.3.0 > > Attachments: agent.log, management-server.rar, mysqldump421.dmp, > mysqldump43.dmp > > > Steps: > 1. Have CS 4.2.1 with KVM host.---advanced zone. > 2. Upgrade the CS to 4.3. > 3. Reboot the KVM host(did not enable maintenance) > 4. The KVM host is connected and then disconnecting from the CS. > Observing the following NPE in CS logs: > 2013-12-31 18:03:38,127 DEBUG [c.c.r.ResourceManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event > CREATE_HOST_VO_FOR_CONNECTED to BaremetalDhcpManagerImpl > 2013-12-31 18:03:38,127 DEBUG [c.c.r.ResourceManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event > CREATE_HOST_VO_FOR_CONNECTED to NetworkUsageManagerImpl > 2013-12-31 18:03:38,127 DEBUG [c.c.r.ResourceManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event > CREATE_HOST_VO_FOR_CONNECTED to BaremetalPxeManagerImpl > 2013-12-31 18:03:38,128 DEBUG [c.c.r.ResourceManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event > CREATE_HOST_VO_FOR_CONNECTED to PaloAltoExternalFirewallElement > 2013-12-31 18:03:38,128 DEBUG [c.c.r.ResourceManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Dispatching resource state event > CREATE_HOST_VO_FOR_CONNECTED to KvmServerDiscoverer > 2013-12-31 18:03:38,172 DEBUG [c.c.r.ResourceState] > (AgentConnectTaskPool-194:ctx-00137ad5) Resource state update: [id = 1; name > = RHLE64; old state = Enabled; event = InternalCreated; new state = Enabled] > 2013-12-31 18:03:38,173 DEBUG [c.c.h.Status] > (AgentConnectTaskPool-194:ctx-00137ad5) Transition:[Resource state = Enabled, > Agent event = AgentConnected, Host id = 1, name = RHLE64] > 2013-12-31 18:03:38,185 DEBUG [c.c.h.Status] > (AgentConnectTaskPool-194:ctx-00137ad5) Agent status update: [id = 1; name = > RHLE64; old status = Alert; event = AgentConnected; new status = Connecting; > old update count = 390; new update count = 391] > 2013-12-31 18:03:38,185 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) create ClusteredAgentAttache for 1 > 2013-12-31 18:03:38,188 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: > XcpServerDiscoverer > 2013-12-31 18:03:38,189 DEBUG [c.c.h.x.d.XcpServerDiscoverer] > (AgentConnectTaskPool-194:ctx-00137ad5) Not XenServer so moving on. > 2013-12-31 18:03:38,189 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: > HypervServerDiscoverer > 2013-12-31 18:03:38,189 DEBUG [c.c.h.h.d.HypervServerDiscoverer] > (AgentConnectTaskPool-194:ctx-00137ad5) Not Hyper-V hypervisor, so moving on. > 2013-12-31 18:03:38,189 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: > StoragePoolMonitor > 2013-12-31 18:03:38,202 DEBUG [c.c.s.l.StoragePoolMonitor] > (AgentConnectTaskPool-194:ctx-00137ad5) Host 1 connected, sending down > storage pool information ... > 2013-12-31 18:03:38,205 DEBUG [c.c.s.StorageManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Adding pool null to host 1 > 2013-12-31 18:03:38,215 DEBUG [c.c.a.t.Request] > (AgentConnectTaskPool-194:ctx-00137ad5) Seq 1-920387585: Sending { Cmd , > MgmtId: 6758231703598, via: 1(RHLE64), Ver: v1, Flags: 100011, > [{"com.cloud.agent.api.ModifyStoragePoolCommand":{"add":true,"pool":{"id":1,"uuid":"efbe1d5a-12c4-3a30-bef6-a219dfe81249","host":"10.147.40.27","path":"/export/home/manasa/primaryKVM","port":2049,"type":"NetworkFilesystem"},"localPath":"/mnt//efbe1d5a-12c4-3a30-bef6-a219dfe81249","wait":0}}] > } > 2013-12-31 18:03:38,220 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentManager-Handler-6:null) Ping from 1 > 2013-12-31 18:03:38,228 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentManager-Handler-6:null) Not processing PingRoutingCommand for agent > id=0; can't find the host in the DB > 2013-12-31 18:03:38,233 DEBUG [c.c.a.t.Request] (AgentManager-Handler-4:null) > Seq 1-920387585: Processing: { Ans: , MgmtId: 6758231703598, via: 1, Ver: > v1, Flags: 10, > [{"com.cloud.agent.api.ModifyStoragePoolAnswer":{"poolInfo":{"host":"10.147.40.27","localPath":"/mnt//efbe1d5a-12c4-3a30-bef6-a219dfe81249","hostPath":"/export/home/manasa/primaryKVM","poolType":"NetworkFilesystem","capacityBytes":52844687360,"availableBytes":40225341440},"templateInfo":{},"result":true,"wait":0}}] > } > 2013-12-31 18:03:38,234 DEBUG [c.c.a.t.Request] > (AgentConnectTaskPool-194:ctx-00137ad5) Seq 1-920387585: Received: { Ans: , > MgmtId: 6758231703598, via: 1, Ver: v1, Flags: 10, { ModifyStoragePoolAnswer > } } > 2013-12-31 18:03:38,273 INFO [o.a.c.s.d.p.DefaultHostListener] > (AgentConnectTaskPool-194:ctx-00137ad5) Connection established between > org.apache.cloudstack.storage.datastore.PrimaryDataStoreImpl@22daf59c host + 1 > 2013-12-31 18:03:38,283 DEBUG [c.c.s.StorageManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Successfully set Capacity - > 105689374720 for capacity type - 3 , DataCenterId - 1, HostOrPoolId - 1, > PodId null > 2013-12-31 18:03:38,283 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: > DeploymentPlanningManagerImpl > 2013-12-31 18:03:38,284 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: > NetworkOrchestrator > 2013-12-31 18:03:38,286 DEBUG [o.a.c.e.o.NetworkOrchestrator] > (AgentConnectTaskPool-194:ctx-00137ad5) Host's hypervisorType is: KVM > 2013-12-31 18:03:38,293 DEBUG [o.a.c.e.o.NetworkOrchestrator] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending CheckNetworkCommand to check > the Network is setup correctly on Agent > 2013-12-31 18:03:38,306 DEBUG [c.c.a.t.Request] > (AgentConnectTaskPool-194:ctx-00137ad5) Seq 1-920387586: Sending { Cmd , > MgmtId: 6758231703598, via: 1(RHLE64), Ver: v1, Flags: 100111, > [{"com.cloud.agent.api.CheckNetworkCommand":{"networkInfoList":[{"physicalNetworkId":200}],"wait":0}}] > } > 2013-12-31 18:03:38,360 DEBUG [c.c.a.t.Request] (AgentManager-Handler-9:null) > Seq 1-920387586: Processing: { Ans: , MgmtId: 6758231703598, via: 1, Ver: > v1, Flags: 110, > [{"com.cloud.agent.api.CheckNetworkAnswer":{"_reconnect":false,"result":true,"wait":0}}] > } > 2013-12-31 18:03:38,360 DEBUG [c.c.a.t.Request] > (AgentConnectTaskPool-194:ctx-00137ad5) Seq 1-920387586: Received: { Ans: , > MgmtId: 6758231703598, via: 1, Ver: v1, Flags: 110, { CheckNetworkAnswer } } > 2013-12-31 18:03:38,360 DEBUG [o.a.c.e.o.NetworkOrchestrator] > (AgentConnectTaskPool-194:ctx-00137ad5) Network setup is correct on Agent > 2013-12-31 18:03:38,361 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: > VmwareManagerImpl > 2013-12-31 18:03:38,362 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: > SecondaryStorageListener > 2013-12-31 18:03:38,362 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Connect to listener: > ClusteredVirtualMachineManagerImpl > 2013-12-31 18:03:38,362 DEBUG [c.c.v.VirtualMachineManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Received startup command from > hypervisor host. host id: 1 > 2013-12-31 18:03:38,362 INFO [c.c.v.VirtualMachinePowerStateSyncImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Reset VM power state sync for host: 1 > 2013-12-31 18:03:38,361 DEBUG [c.c.a.m.AgentAttache] > (AgentManager-Handler-9:null) Seq 1-920387586: No more commands found > 2013-12-31 18:03:38,383 DEBUG [c.c.v.VirtualMachineManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Found 8 VMs for host 1 > 2013-12-31 18:03:38,385 DEBUG [c.c.v.VirtualMachineManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) VM s-1-VM: cs state = Running and > realState = Stopped > 2013-12-31 18:03:38,385 DEBUG [c.c.v.VirtualMachineManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) VM s-1-VM: cs state = Running and > realState = Stopped > 2013-12-31 18:03:38,386 DEBUG [c.c.h.HighAvailabilityManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) VM does not require investigation so > I'm marking it as Stopped: VM[SecondaryStorageVm|s-1-VM] > 2013-12-31 18:03:38,386 WARN [o.a.c.alerts] > (AgentConnectTaskPool-194:ctx-00137ad5) alertType:: 19 // dataCenterId:: 1 > // podId:: 1 // clusterId:: null // message:: VM (name: s-1-VM, id: 1) > stopped unexpectedly on host id:1, availability zone id:1, pod id:1 > 2013-12-31 18:03:38,396 DEBUG [c.c.h.HighAvailabilityManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) VM is not HA enabled so we're done. > 2013-12-31 18:03:38,396 ERROR [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Monitor > ClusteredVirtualMachineManagerImpl says there is an error in the connect > process for 1 due to null > java.lang.NullPointerException > at > com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1264) > at > com.cloud.ha.HighAvailabilityManagerImpl.scheduleRestart(HighAvailabilityManagerImpl.java:346) > at > com.cloud.vm.VirtualMachineManagerImpl.compareState(VirtualMachineManagerImpl.java:2715) > at > com.cloud.vm.VirtualMachineManagerImpl.fullHostSync(VirtualMachineManagerImpl.java:2278) > at > com.cloud.vm.VirtualMachineManagerImpl.processConnect(VirtualMachineManagerImpl.java:2929) > at > com.cloud.agent.manager.AgentManagerImpl.notifyMonitorsOfConnection(AgentManagerImpl.java:495) > at > com.cloud.agent.manager.AgentManagerImpl.handleConnectedAgent(AgentManagerImpl.java:999) > at > com.cloud.agent.manager.AgentManagerImpl.access$000(AgentManagerImpl.java:117) > at > com.cloud.agent.manager.AgentManagerImpl$HandleAgentConnectTask.runInContext(AgentManagerImpl.java:1082) > at > org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49) > at > org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56) > at > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103) > at > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53) > at > org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:722) > 2013-12-31 18:03:38,398 INFO [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Host 1 is disconnecting with event > AgentDisconnected > 2013-12-31 18:03:38,401 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) The next status of agent 1is Alert, > current status is Connecting > 2013-12-31 18:03:38,402 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Deregistering link for 1 with state > Alert > 2013-12-31 18:03:38,402 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Remove Agent : 1 > 2013-12-31 18:03:38,402 DEBUG [c.c.a.m.ConnectedAgentAttache] > (AgentConnectTaskPool-194:ctx-00137ad5) Processing Disconnect. > 2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: > com.cloud.hypervisor.xen.discoverer.XcpServerDiscoverer > 2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: > com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer > 2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: > com.cloud.storage.listener.StoragePoolMonitor > 2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: > com.cloud.deploy.DeploymentPlanningManagerImpl > 2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: > org.apache.cloudstack.engine.orchestration.NetworkOrchestrator > 2013-12-31 18:03:38,403 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: > com.cloud.hypervisor.vmware.manager.VmwareManagerImpl > 2013-12-31 18:03:38,404 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: > com.cloud.storage.secondary.SecondaryStorageListener > 2013-12-31 18:03:38,404 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: > com.cloud.vm.ClusteredVirtualMachineManagerImpl > 2013-12-31 18:03:38,404 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: > com.cloud.network.security.SecurityGroupListener > 2013-12-31 18:03:38,404 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: > com.cloud.consoleproxy.ConsoleProxyListener > 2013-12-31 18:03:38,407 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: > com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener > 2013-12-31 18:03:38,407 DEBUG [c.c.n.NetworkUsageManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Disconnected called on 1 with status > Alert > 2013-12-31 18:03:38,407 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: > com.cloud.storage.LocalStoragePoolListener > 2013-12-31 18:03:38,408 DEBUG [c.c.a.m.AgentManagerImpl] > (AgentConnectTaskPool-194:ctx-00137ad5) Sending Disconnect to listener: > com.cloud.agent.manager.AgentManager > The KVM host will be in Alert state: > The log snippet from agent log: > 2013-12-31 13:06:17,978 INFO [cloud.agent.Agent] (Agent-Handler-1:null) > Reconnecting... > 2013-12-31 13:06:17,979 INFO [utils.nio.NioClient] (Agent-Selector:null) > Connecting to 10.147.38.164:8250 > 2013-12-31 13:06:18,070 INFO [utils.nio.NioClient] (Agent-Selector:null) > SSL: Handshake done > 2013-12-31 13:06:18,070 INFO [utils.nio.NioClient] (Agent-Selector:null) > Connected to 10.147.38.164:8250 > 2013-12-31 13:06:18,124 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > Proccess agent startup answer, agent id = 0 > 2013-12-31 13:06:18,125 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Set > agent id 0 > 2013-12-31 13:06:18,125 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > Startup Response Received: agent id = 0 > 2013-12-31 13:06:23,070 INFO [cloud.agent.Agent] (Agent-Handler-1:null) > Connected to the server > 2013-12-31 13:06:23,378 INFO [cloud.agent.Agent] (Agent-Handler-1:null) Lost > connection to the server. Dealing with the remaining commands... > 2013-12-31 13:06:28,379 INFO [cloud.agent.Agent] (Agent-Handler-1:null) > Reconnecting... > 2013-12-31 13:06:28,380 INFO [utils.nio.NioClient] (Agent-Selector:null) > Connecting to 10.147.38.164:8250 > 2013-12-31 13:06:28,473 INFO [utils.nio.NioClient] (Agent-Selector:null) > SSL: Handshake done > 2013-12-31 13:06:28,473 INFO [utils.nio.NioClient] (Agent-Selector:null) > Connected to 10.147.38.164:8250 > 2013-12-31 13:06:28,528 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > Proccess agent startup answer, agent id = 0 > 2013-12-31 13:06:28,528 INFO [cloud.agent.Agent] (Agent-Handler-2:null) Set > agent id 0 > 2013-12-31 13:06:28,529 INFO [cloud.agent.Agent] (Agent-Handler-2:null) > Startup Response Received: agent id = 0 > Attaching the MS and agent logs. > Dumps before and after upgrade. -- This message was sent by Atlassian JIRA (v6.1.5#6160)