Hello People

I have a strange issue: every hour the management server times out while
sending a command to the secondary storage VM (SSVM), and because of this
the UI is inaccessible for a short time. Sometimes I have to restart the
management server to get it back to a working state, and sometimes it
recovers without a restart. I also see some exceptions while the storage
stats are being fetched.


The log says the SSVM is lagging behind the management server on ping, and
a disconnect message is then sent to the other components. Can you let me
know how to troubleshoot this issue? I destroyed the SSVM, but the issue
still persists. I checked the date/time on the management server and the
SSVM, and they are the same. This has been happening for quite a few days
now. The logs are below.



2019-07-25 04:01:22,769 INFO  [c.c.a.m.AgentManagerImpl]
(AgentMonitor-1:ctx-c33dbe74) (logid:5442158c) Found the following agents
behind on ping: [183]
2019-07-25 04:01:22,775 WARN  [c.c.a.m.AgentManagerImpl]
(AgentMonitor-1:ctx-c33dbe74) (logid:5442158c) Disconnect agent for
CPVM/SSVM due to physical connection close. host: 183
2019-07-25 04:01:22,778 INFO  [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Host 183 is disconnecting
with event ShutdownRequested
2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) The next status of agent
183is Disconnected, current status is Up
2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Deregistering link for 183
with state Disconnected
2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Remove Agent : 183
2019-07-25 04:01:22,781 DEBUG [c.c.a.m.ConnectedAgentAttache]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Processing Disconnect.
2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentAttache]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Seq
183-7541559051008607242: Sending disconnect to class
com.cloud.agent.manager.SynchronousListener
2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer
2019-07-25 04:01:22,782 DEBUG [c.c.u.n.NioConnection]
(pool-2-thread-1:null) (logid:) Closing socket Socket[addr=/172.30.32.16
,port=38250,localport=8250]
2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentAttache]
(StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq
183-7541559051008607242: Waiting some more time because this is the current
command
2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer
2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentAttache]
(StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq
183-7541559051008607242: Waiting some more time because this is the current
command
2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: com.cloud.deploy.DeploymentPlanningManagerImpl
2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: com.cloud.network.security.SecurityGroupListener
2019-07-25 04:01:22,783 INFO  [c.c.u.e.CSExceptionErrorCode]
(StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Could not find exception:
com.cloud.exception.OperationTimedoutException in error code list for
exceptions
2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: org.apache.cloudstack.engine.orchestration.NetworkOrchestrator
2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl
2019-07-25 04:01:22,783 WARN  [c.c.a.m.AgentAttache]
(StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq
183-7541559051008607242: Timed out on null
2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: com.cloud.storage.listener.StoragePoolMonitor
2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentAttache]
(StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq
183-7541559051008607242: Cancelling.
2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: com.cloud.storage.secondary.SecondaryStorageListener
2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: com.cloud.network.SshKeysDistriMonitor
2019-07-25 04:01:22,785 DEBUG [o.a.c.s.RemoteHostEndPoint]
(StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Failed to send command,
due to Agent:183, com.cloud.exception.OperationTimedoutException: Commands
7541559051008607242 to Host 183 timed out after 3600
2019-07-25 04:01:22,785 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl
2019-07-25 04:01:22,785 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: com.cloud.storage.download.DownloadListener




2019-07-25 04:01:22,785 ERROR [c.c.s.StatsCollector]
(StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Error trying to retrieve
storage stats
com.cloud.utils.exception.CloudRuntimeException: Failed to send command,
due to Agent:183, com.cloud.exception.OperationTimedoutException: Commands
7541559051008607242 to Host 183 timed out after 3600
        at
org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:133)
        at
com.cloud.server.StatsCollector$StorageCollector.runInContext(StatsCollector.java:1139)
        at
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
        at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
        at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
        at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
        at
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
        at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2019-07-25 04:01:22,786 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: com.cloud.consoleproxy.ConsoleProxyListener
2019-07-25 04:01:22,789 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: com.cloud.storage.LocalStoragePoolListener
2019-07-25 04:01:22,789 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: com.cloud.storage.upload.UploadListener
2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: com.cloud.capacity.StorageCapacityListener
2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: com.cloud.capacity.ComputeCapacityListener
2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: com.cloud.network.SshKeysDistriMonitor
2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: com.cloud.network.router.VirtualNetworkApplianceManagerImpl
2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener:
com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
2019-07-25 04:01:22,791 DEBUG [c.c.n.NetworkUsageManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Disconnected called on 183
with status Disconnected




2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: com.cloud.agent.manager.AgentManagerImpl$BehindOnPingListener
2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
listener: com.cloud.agent.manager.AgentManagerImpl$SetHostParamsListener
2019-07-25 04:01:22,791 DEBUG [c.c.h.Status] (AgentTaskPool-1:ctx-66de2057)
(logid:841d2a63) Transition:[Resource state = Enabled, Agent event =
ShutdownRequested, Host id = 183, name = s-2775-VM]
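
To check whether the disconnects really recur once an hour (the timeout in the log above is 3600), a minimal sketch like the following could pull every "behind on ping" event out of the management server log and print the gap between consecutive events for one host. It assumes the timestamp format shown in the excerpt above and joins wrapped continuation lines; the function names are hypothetical and not part of CloudStack.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the excerpt, e.g. "2019-07-25 04:01:22,769 "
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) ")
# Host list reported by AgentManagerImpl, e.g. "behind on ping: [183]"
BEHIND = re.compile(r"behind on ping: \[([\d, ]+)\]")


def iter_entries(lines):
    """Yield (timestamp, message) pairs, joining wrapped continuation lines."""
    stamp, parts = None, []
    for line in lines:
        line = line.rstrip("\n")
        match = TIMESTAMP.match(line)
        if match:
            # A new timestamp starts a new entry; flush the previous one.
            if stamp is not None:
                yield stamp, " ".join(parts)
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            parts = [line[match.end():].strip()]
        elif stamp is not None and line.strip():
            parts.append(line.strip())
    if stamp is not None:
        yield stamp, " ".join(parts)


def behind_on_ping_events(lines):
    """Return a list of (timestamp, [host ids]) for each 'behind on ping' entry."""
    events = []
    for stamp, message in iter_entries(lines):
        match = BEHIND.search(message)
        if match:
            hosts = [int(h) for h in match.group(1).split(",")]
            events.append((stamp, hosts))
    return events


def gaps_in_seconds(events, host_id):
    """Return the seconds between consecutive events that mention host_id."""
    times = [t for t, hosts in events if host_id in hosts]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Opening the management server log, passing the file object to behind_on_ping_events(), and then calling gaps_in_seconds(events, 183) would show whether the gaps for host 183 sit close to 3600 seconds.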



-- 
Thanks and regards
Rakesh venkatesh
