[ https://issues.apache.org/jira/browse/CLOUDSTACK-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692981#comment-13692981 ]
Prasanna Santhanam commented on CLOUDSTACK-3137: ------------------------------------------------ The deployment uses Xenserver so there are no logs like KVM on the agent side > systemvms HA fails on latest master (4591f94) > --------------------------------------------- > > Key: CLOUDSTACK-3137 > URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3137 > Project: CloudStack > Issue Type: Bug > Security Level: Public(Anyone can view this level - this is the > default.) > Reporter: Prasanna Santhanam > Priority: Blocker > Attachments: logs.tar.bz2 > > > SystemVMs fail to start during HA on latest master > (4591f94a0bb69875dca769c9738998d3b5b96106) > It looks like the HA of system VMs is failing. On our automated test > environment we reboot the management server before the systemVMs spin up to > ensure that the global settings are affected and the configuration for the > secstorage.internal.sites is rightly set in the secondary storage VM. The > management server upon reboot discovers that SSVM was put to perform HA in > the op_ha_work table and starts to re-spin the SSVM. At this time the CPVM is > also booting up. No further activity is observed after two HA attempts. > SystemVM start before Management server start: > Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,738 DEBUG > [cloud.vm.VirtualMachineManagerImpl] (secstorage-1:null) Allocating entries > for VM: VM[SecondaryStorageVm|s-1-VM] > Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,771 DEBUG > [cloud.vm.VirtualMachineManagerImpl] (secstorage-1:null) Allocating nics for > VM[SecondaryStorageVm|s-1-VM] > Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,774 DEBUG > [cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm > VM[SecondaryStorageVm|s-1-VM] in network Ntwk[200|Public|1] with requested > profile NicProfile[0-0-null-null-null > Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,843 DEBUG > [cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm > VM[SecondaryStorageVm|s-1-VM] in network Ntwk[202|Control|3] with requested > profile null > Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,870 DEBUG > [cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm > VM[SecondaryStorageVm|s-1-VM] in network Ntwk[201|Management|2] with > requested profile null > Jun 24 22:30:49 cloudstack-centos63 local0: 2013-06-25 05:30:49,916 DEBUG > [cloud.network.NetworkManagerImpl] (secstorage-1:null) Allocating nic for vm > VM[SecondaryStorageVm|s-1-VM] in network Ntwk[203|Storage|4] with requested > profile null > ... > ... > Jun 24 22:30:52 cloudstack-centos63 local0: 2013-06-25 05:30:52,552 DEBUG > [agent.transport.Request] (consoleproxy-1:null) Seq 3-2067660811: Sending { > Cmd , MgmtId: 200973787296321, via: 3, Ver: v1, Flags: 100111, > [{"org.apache.cloudstack.storage.command.CopyCommand":{"srcTO":{"org.apache.cloudstack.storage.to.TemplateObjectTO":{"path":"template/tmpl/1/1/","origUrl":"http://nfs/templates/acton/acton-systemvm-02062012.vhd.bz2","uuid":"1","id":1,"format":"VHD","accountId":1,"checksum":"f613f38c96bf039f2e5cbf92fa8ad4f8","hvm":false,"displayText":"SystemVM > Template > (XenServer)","imageDataStore":{"com.cloud.agent.api.to.NfsTO":{"_url":"nfs://nfs.fmt.vmops.com:/expo... > Jun 24 22:31:01 cloudstack-centos63 local0: 2013-06-25 05:31:01,824 INFO > [context.support.XmlWebApplicationContext] (Thread-26:null) Closing Root > WebApplicationContext: startup date [Mon Jun 24 23:25:50 UTC 2013]; root of > context hierarchy > ^^^^^^^^^^ Management Server Restarts ^^^^^^^^^^^^^^ > After restart SSVM scheduled for HA > Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,394 INFO > [cloud.ha.HighAvailabilityManagerImpl] (Timer-1:null) Schedule vm for HA: > VM[SecondaryStorageVm|s-1-VM] > Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,455 INFO > [cloud.vm.VirtualMachineManagerImpl] (Timer-1:null) Handling unfinished work > item: ItWork[1805a89b-d03a-4f1f-b717-ac9f5da6edea-Starting-2-Prepare] > Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,504 INFO > [cloud.ha.HighAvailabilityManagerImpl] (Timer-1:null) Schedule vm for HA: > VM[ConsoleProxy|v-2-VM] > ... > Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,540 INFO > [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:null) Starting work > Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,547 INFO > [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:null) Starting work > ... > Jun 24 22:32:52 cloudstack-centos63 local0: 2013-06-25 05:32:52,815 INFO > [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-1) Processing > HAWork[1-HA-1-Starting-Investigating] > ... > Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,289 DEBUG > [cloud.ha.CheckOnAgentInvestigator] (HA-Worker-0:work-2) Unable to reach the > agent for VM[ConsoleProxy|v-2-VM]: Resource [Host:3] is unreachable: Host 3: > Host with specified id is not in the right state: Disconnected > Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,290 INFO > [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) > SimpleInvestigator found VM[ConsoleProxy|v-2-VM]to be alive? null > Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,293 DEBUG > [cloud.ha.CheckOnAgentInvestigator] (HA-Worker-2:work-1) Unable to reach the > agent for VM[SecondaryStorageVm|s-1-VM]: Resource [Host:2] is unreachable: > Host 2: Host with specified id is not in the right state: Disconnected > Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,298 INFO > [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-1) > SimpleInvestigator found VM[SecondaryStorageVm|s-1-VM]to be alive? null > Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,336 INFO > [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) > XenServerInvestigator found VM[ConsoleProxy|v-2-VM]to be alive? null > Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,338 DEBUG > [cloud.ha.UserVmDomRInvestigator] (HA-Worker-0:work-2) Not a User Vm, unable > to determine state of VM[ConsoleProxy|v-2-VM] returning null > Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,338 INFO > [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) null found > VM[ConsoleProxy|v-2-VM]to be alive? null > Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,338 DEBUG > [cloud.ha.ManagementIPSystemVMInvestigator] (HA-Worker-0:work-2) Testing if > VM[ConsoleProxy|v-2-VM] is alive > .... > Rescheduled HA > Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,623 INFO > [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-2) Rescheduling > HAWork[2-HA-2-Starting-Investigating] to try again at Tue Jun 25 05:43:07 UTC > 2013 > Jun 24 22:32:53 cloudstack-centos63 local0: 2013-06-25 05:32:53,626 INFO > [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-1) Rescheduling > HAWork[1-HA-1-Starting-Investigating] to try again at Tue Jun 25 05:43:07 UTC > 2013 > ... > Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,586 INFO > [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) Processing > HAWork[1-HA-1-Starting-Investigating] > Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,592 INFO > [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) HA on > VM[SecondaryStorageVm|s-1-VM] > Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,593 INFO > [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) VM > VM[SecondaryStorageVm|s-1-VM] has been changed. Current State = Starting > Previous State = Starting last updated = 6 previous updated = 2 > Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,593 INFO > [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-3:work-1) Completed > HAWork[1-HA-1-Starting-Investigating] > Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,611 INFO > [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) Processing > HAWork[2-HA-2-Starting-Investigating] > Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,618 INFO > [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) HA on > VM[ConsoleProxy|v-2-VM] > Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,618 INFO > [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) VM > VM[ConsoleProxy|v-2-VM] has been changed. Current State = Starting Previous > State = Starting last updated = 6 previous updated = 2 > Jun 24 22:43:11 cloudstack-centos63 local0: 2013-06-25 05:43:11,618 INFO > [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-2:work-2) Completed > HAWork[2-HA-2-Starting-Investigating] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira