[URGENT] cloudstack SSVM and router failed to start after power blackout

2014-06-19 Thread dimas yoga pratama
OK, this is my problem: after the blackout I can't start the virtual router,
and the SSVM is not detected in my CloudStack system. The SSVM recreated
itself but is stuck in the Starting state.

What should I do? Please help me.


Re: [URGENT] cloudstack SSVM and router failed to start after power blackout

2014-06-19 Thread Rohit Yadav
Hi,

SSVMs and VRs are stateless, so if restarts are not working for you, you may
(force) stop and remove them. The CloudStack HA thread(s) will kickstart new
ones after a certain timeout; to speed this up, you may restart CloudStack
as well.
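
For anyone who prefers the API over the UI for the stop-and-remove step, here is a minimal sketch of how the signed requests could be built. It assumes the usual CloudStack request signing scheme (URL-encode the values, sort the pairs by field name, lower-case the string, HMAC-SHA1 with the secret key, Base64) and the `stopSystemVm` / `destroySystemVm` commands; the function name `signed_query`, the keys and the VM id are hypothetical placeholders, and nothing is sent over the network. Check the API reference for your CloudStack version before relying on it.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def signed_query(params, api_key, secret_key):
    """Build a signed query string for a CloudStack API call (no network I/O).

    Assumption: the standard signing scheme applies to your version.
    """
    all_params = dict(params, apikey=api_key, response="json")
    pairs = sorted(all_params.items())

    # Query string that would be sent: URL-encoded values, original case.
    query = "&".join(f"{k}={quote(str(v), safe='')}" for k, v in pairs)

    # String to sign: the same sorted pairs, with the whole string lower-cased.
    to_sign = query.lower()

    digest = hmac.new(secret_key.encode(), to_sign.encode(), hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return f"{query}&signature={signature}"


if __name__ == "__main__":
    # Hypothetical credentials and system VM id, for illustration only.
    api_key, secret_key = "YOUR-API-KEY", "YOUR-SECRET-KEY"
    vm_id = "SSVM-UUID-PLACEHOLDER"

    # Force-stop first, then destroy, mirroring the advice above.
    for command, extra in (("stopSystemVm", {"forced": "true"}),
                           ("destroySystemVm", {})):
        params = {"command": command, "id": vm_id, **extra}
        print("/client/api?" + signed_query(params, api_key, secret_key))
```

The printed paths would be appended to the management server's base URL. For a virtual router the corresponding commands would be `stopRouter` and `destroyRouter`, again assuming they behave the same way in your version.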

If the problem still persists after trying the above, you may try debugging
the issue:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/SSVM,+templates,+Secondary+storage+troubleshooting

Regards.


Re: [URGENT] cloudstack SSVM and router failed to start after power blackout

2014-06-19 Thread dimas yoga pratama
Hi, from the Infrastructure tab I can see the hosts, but both of them show
the Alert state. I already tried to force reconnect, but it fails.
What should I do? Now the CPVM fails to start too.


Re: [URGENT] cloudstack SSVM and router failed to start after power blackout

2014-06-19 Thread Rohit Yadav
I'm not sure what the specific issue could be. You can tail the management
server logs to see what is failing. Once you have figured out the specific
issue, you may share it with us along with your host OS, CloudStack version
and details of the connected hosts.
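
As a rough aid for reading the log, the Python sketch below keeps only WARN and ERROR entries and attaches stack-trace lines to the entry they belong to. It assumes the line layout seen in management server logs (timestamp, level, logger in brackets, thread in parentheses, message); the function name `filter_entries` is made up for the example, and the sample lines are taken from the log posted later in this thread.

```python
import re

# Assumed layout: "<date> <time>,<ms> <LEVEL> [<logger>] (<thread>) <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]*)\)\s*"
    r"(?P<message>.*)$"
)


def filter_entries(lines, levels=("WARN", "ERROR")):
    """Return parsed log entries whose level is in `levels`.

    Lines that do not start a new entry (exception text, stack frames)
    are appended to the message of the entry before them.
    """
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINE_RE.match(line)
        if match:
            entries.append(match.groupdict())
        elif entries and line.strip():
            entries[-1]["message"] += " " + line.strip()
    return [entry for entry in entries if entry["level"] in levels]


SAMPLE = [
    "2014-06-19 21:49:21,312 WARN  [c.c.u.n.Link] (AgentManager-Selector:null) "
    "SSL: Fail to find the generated keystore. Loading fail-safe one to continue.",
    "2014-06-19 21:49:11,585 DEBUG [c.c.v.VirtualMachineManagerImpl] "
    "(AgentConnectTaskPool-344:ctx-07431045) Ignoring VM in starting mode: r-71-VM",
    "2014-06-19 21:49:11,651 ERROR [c.c.a.m.AgentManagerImpl] "
    "(AgentConnectTaskPool-344:ctx-07431045) Monitor ClusteredVirtualMachineManagerImpl "
    "says there is an error in the connect process for 2 due to Work item not found, "
    "We cannot stop VM[DomainRouter|r-71-VM] when it is in state Starting",
    "com.cloud.utils.exception.CloudRuntimeException: Work item not found, We "
    "cannot stop VM[DomainRouter|r-71-VM] when it is in state Starting",
    "\tat com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1415)",
]

if __name__ == "__main__":
    for entry in filter_entries(SAMPLE):
        print(entry["ts"], entry["level"], entry["logger"], entry["message"][:80])
```

To use it on a real log, pass an open file object instead of `SAMPLE`. On packaged installs the file is typically `/var/log/cloudstack/management/management-server.log`, but treat that path as an assumption and check your own setup.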

Regards.


Re: [URGENT] cloudstack SSVM and router failed to start after power blackout

2014-06-19 Thread dimas yoga pratama
Management log:

2014-06-19 21:49:21,312 WARN  [c.c.u.n.Link] (AgentManager-Selector:null) SSL: Fail to find the generated keystore. Loading fail-safe one to continue.

2014-06-19 21:49:11,585 DEBUG [c.c.v.VirtualMachineManagerImpl] (AgentConnectTaskPool-344:ctx-07431045) Ignoring VM in starting mode: r-71-VM
2014-06-19 21:49:11,585 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentConnectTaskPool-344:ctx-07431045) VM does not require investigation so I'm marking it as Stopped: VM[DomainRouter|r-71-VM]
2014-06-19 21:49:11,585 WARN  [o.a.c.f.j.AsyncJobExecutionContext] (AgentConnectTaskPool-344:ctx-07431045) Job is executed without a context, setup psudo job for the executing thread
2014-06-19 21:49:11,651 DEBUG [c.c.v.VirtualMachineManagerImpl] (AgentConnectTaskPool-344:ctx-07431045) Unable to transition the state but we're moving on because it's forced stop
2014-06-19 21:49:11,651 DEBUG [c.c.v.VirtualMachineManagerImpl] (AgentConnectTaskPool-344:ctx-07431045) Unable to cleanup VM: VM[DomainRouter|r-71-VM] ,since outstanding work item is not found
2014-06-19 21:49:11,651 ERROR [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-344:ctx-07431045) Monitor ClusteredVirtualMachineManagerImpl says there is an error in the connect process for 2 due to Work item not found, We cannot stop VM[DomainRouter|r-71-VM] when it is in state Starting
com.cloud.utils.exception.CloudRuntimeException: Work item not found, We cannot stop VM[DomainRouter|r-71-VM] when it is in state Starting
    at com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1415)
    at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStop(VirtualMachineManagerImpl.java:1344)
    at com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1312)
    at com.cloud.ha.HighAvailabilityManagerImpl.scheduleRestart(HighAvailabilityManagerImpl.java:346)
    at com.cloud.vm.VirtualMachineManagerImpl.compareState(VirtualMachineManagerImpl.java:2827)
    at com.cloud.vm.VirtualMachineManagerImpl.fullHostSync(VirtualMachineManagerImpl.java:2384)
    at com.cloud.vm.VirtualMachineManagerImpl.processConnect(VirtualMachineManagerImpl.java:3035)
    at com.cloud.agent.manager.AgentManagerImpl.notifyMonitorsOfConnection(AgentManagerImpl.java:495)
    at com.cloud.agent.manager.AgentManagerImpl.handleConnectedAgent(AgentManagerImpl.java:999)
    at com.cloud.agent.manager.AgentManagerImpl.access$000(AgentManagerImpl.java:117)
    at com.cloud.agent.manager.AgentManagerImpl$HandleAgentConnectTask.runInContext(AgentManagerImpl.java:1082)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
2014-06-19 21:49:08,251 DEBUG [c.c.s.s.SecondaryStorageManagerImpl] (secstorage-1:ctx-c407a559) Zone 1 is not ready to launch secondary storage VM yet
2014-06-19 21:49:08,282 DEBUG [c.c.c.ConsoleProxyManagerImpl] (consoleproxy-1:ctx-60f4b3c9) Zone 1 is not ready to launch console proxy yet
2014-06-19 21:49:01,197 DEBUG [c.c.n.NetworkUsageManagerImpl] (AgentConnectTaskPool-342:ctx-b66d3294) Disconnected called on 2 with status Alert
2014-06-19 21:49:01,197 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-342:ctx-b66d3294) Sending Disconnect to listener: com.cloud.consoleproxy.ConsoleProxyListener
2014-06-19 21:49:01,198 DEBUG [c.c.h.Status] (AgentConnectTaskPool-342:ctx-b66d3294) Transition:[Resource state = Enabled, Agent event = AgentDisconnected, Host id = 2, name = host1.cloud.priv]
2014-06-19 21:49:01,259 DEBUG [c.c.h.Status] (AgentConnectTaskPool-342:ctx-b66d3294) Agent status update: [id = 2; name = host1.cloud.priv; old status = Connecting; event = AgentDisconnected; new status = Alert; old update count = 404; new update count = 405]
2014-06-19 21:49:01,259 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentConnectTaskPool-342:ctx-b66d3294) Notifying other nodes of to disconnect
2014-06-19 21:49:01,260 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-342:ctx-b66d3294) Failed to handle host connection: com.cloud.utils.exception.CloudRuntimeException: Unable to connect 2
2014-06-19 21:49:01,261 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-342:ctx-b66d3294) Can not send command com.cloud.agent.api.ReadyCommand due to Host 2 is 

Re: [URGENT] cloudstack SSVM and router failed to start after power blackout

2014-06-19 Thread Rohit Yadav
Hi Dimas,

It looks like the VM is in the Starting state and CloudStack is unable to
contact the agent. I hope you've removed the VR from CloudStack using the UI.
You can try restarting the management server. This is a sync issue, where one
party (the mgmt server) has a different view of the world than the other (the
host/agent). In such cases, do not remove the host; otherwise, when you re-add
it, it may destroy all the (user) VMs on it or simply fail.

If restarting doesn't fix the problem, reduce the expunge timeout in global
settings (that's when CloudStack marks a VM as removed; since you've just
destroyed it, it can take some time to get expunged) and try again.

As a final course of action, I would stop the management server, then SSH to
the host and destroy the SSVMs. Using the mysql client, I would change the DB
entries for the SSVM to removed/expunged (simply mark them by updating the
row; do not delete the row), then start the mgmt server again and hope it
works this time.
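
To make the "mark the row, do not delete it" idea concrete, here is a minimal sketch. It runs against an in-memory SQLite table that only imitates a few columns assumed to exist in CloudStack's `vm_instance` table (`name`, `type`, `state`, `removed`), and it assumes `Expunging` and `SecondaryStorageVm` are the relevant state and type values. The real schema differs between versions, so verify it and take a database backup before changing anything. The function name `mark_system_vm_removed` and the VM names are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def mark_system_vm_removed(conn, vm_name):
    """Soft-delete one SSVM row: update state and removed, never DELETE.

    Returns the number of rows changed (0 if nothing matched).
    """
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    cur = conn.execute(
        # Column names and values below are assumptions about the schema.
        "UPDATE vm_instance SET state = 'Expunging', removed = ? "
        "WHERE name = ? AND type = 'SecondaryStorageVm' AND removed IS NULL",
        (now, vm_name),
    )
    conn.commit()
    return cur.rowcount


def demo():
    """Build a stand-in table, mark one SSVM as removed, return the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vm_instance ("
        "id INTEGER PRIMARY KEY, name TEXT, type TEXT, state TEXT, removed TEXT)"
    )
    conn.executemany(
        "INSERT INTO vm_instance (name, type, state, removed) VALUES (?, ?, ?, NULL)",
        [("s-1-VM", "SecondaryStorageVm", "Starting"), ("i-2-3-VM", "User", "Running")],
    )
    changed = mark_system_vm_removed(conn, "s-1-VM")
    rows = conn.execute(
        "SELECT name, state, removed IS NOT NULL FROM vm_instance ORDER BY id"
    ).fetchall()
    conn.close()
    return changed, rows


if __name__ == "__main__":
    print(demo())
```

The `WHERE` clause is deliberately narrow (one name, one type, not already removed) so that user VM rows are left alone. Against a real deployment the equivalent statement would be run with the mysql client while the management server is stopped, as described above.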

Does anyone have other suggestions for such a case?

Regards.


Re: [URGENT] cloudstack SSVM and router failed to start after power blackout

2014-06-19 Thread dimas yoga pratama
Thanks Rohit, I will give feedback as soon as I have tried it.

