Yes I was monitoring it continuously. Below are the steps which I was doing 
when issue happened


1. Ping from MGT server to ssvm
2. Ping from ssvm to secondary storage ip
3. Ping from ssvm to public IP like 8.8.8.8
4. Ping from MGT server to node in which ssvm was running


Out of all these, the ping drops were observed from MGT server to ssvm and mgt 
server to nodes. Basically all nodes lost connection. Then it recovered itself 
after 1 minute.


Sent from my iPhone

> On 25-Jul-2019, at 3:48 PM, Andrija Panic <andrija.pa...@gmail.com> wrote:
> 
> Can you observe the status of SSVM (is it UP/Connecting/Disconnected/Down)
> while you have issues?
> 
> I would advise checking your Secondary Storage itself - and also running
> the SSVM diagnose script  /usr/local/cloud/systemvm/ssvm-check.sh - observe
> if any errors with NFS or others.
> 
> Lastly - and don't laugh - check that you don't have issues with networking
> equipment (some of us had VEEEERY strange issues in connectivity some years
> ago with crappy QCT/Quanta Switches in MLAG setup)
> 
> Andrija
> 
>> On Thu, 25 Jul 2019 at 15:42, Rakesh v <www.rakeshv....@gmail.com> wrote:
>> 
>> Yes I have set the ip's of the three MGT servers in the "host" field
>> 
>> Sent from my iPhone
>> 
>>> On 25-Jul-2019, at 2:14 PM, Pierre-Luc Dion <pdion...@apache.org> wrote:
>>> 
>>> Do you have a load balancer in front of cloudstack? Did you set the
>> global
>>> settings "host" to the ip of the mgmt server?
>>> 
>>> 
>>> Le jeu. 25 juill. 2019 03 h 24, Rakesh Venkatesh <
>> www.rakeshv....@gmail.com>
>>> a écrit :
>>> 
>>>> Hello People
>>>> 
>>>> 
>>>> I have a strange issue where mgt server times out to send a command to
>>>> secondary storage VM every hour and because of this UI won't be
>> accessible
>>>> for a short duration of time. Sometimes I have to restart mgt server to
>> get
>>>> it back to working state and sometimes I don't need to restart it. I
>> also
>>>> see some exceptions while fetching the storage stats.
>>>> 
>>>> 
>>>> The log says secondary storage VM is lagging behind mgt server in ping
>> and
>>>> it sends a disconnect message to other components. Can you let me know
>> how
>>>> to troubleshoot this issue? I destroyed the secondary storage VM but the
>>>> issue still persists. I checked the date/time on the mgt server and SSVM
>>>> and they are same. This is happening for quite a few days now. Below are
>>>> the logs
>>>> 
>>>> 
>>>> 
>>>> 2019-07-25 04:01:22,769 INFO  [c.c.a.m.AgentManagerImpl]
>>>> (AgentMonitor-1:ctx-c33dbe74) (logid:5442158c) Found the following
>> agents
>>>> behind on ping: [183]
>>>> 2019-07-25 04:01:22,775 WARN  [c.c.a.m.AgentManagerImpl]
>>>> (AgentMonitor-1:ctx-c33dbe74) (logid:5442158c) Disconnect agent for
>>>> CPVM/SSVM due to physical connection close. host: 183
>>>> 2019-07-25 04:01:22,778 INFO  [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Host 183 is
>> disconnecting
>>>> with event ShutdownRequested
>>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) The next status of agent
>>>> 183is Disconnected, current status is Up
>>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Deregistering link for
>> 183
>>>> with state Disconnected
>>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Remove Agent : 183
>>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.ConnectedAgentAttache]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Processing Disconnect.
>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentAttache]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Seq
>>>> 183-7541559051008607242: Sending disconnect to class
>>>> com.cloud.agent.manager.SynchronousListener
>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer
>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.u.n.NioConnection]
>>>> (pool-2-thread-1:null) (logid:) Closing socket Socket[addr=/
>> 172.30.32.16
>>>> ,port=38250,localport=8250]
>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentAttache]
>>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq
>>>> 183-7541559051008607242: Waiting some more time because this is the
>> current
>>>> command
>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer
>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentAttache]
>>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq
>>>> 183-7541559051008607242: Waiting some more time because this is the
>> current
>>>> command
>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: com.cloud.deploy.DeploymentPlanningManagerImpl
>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: com.cloud.network.security.SecurityGroupListener
>>>> 2019-07-25 04:01:22,783 INFO  [c.c.u.e.CSExceptionErrorCode]
>>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Could not find
>> exception:
>>>> com.cloud.exception.OperationTimedoutException in error code list for
>>>> exceptions
>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: org.apache.cloudstack.engine.orchestration.NetworkOrchestrator
>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl
>>>> 2019-07-25 04:01:22,783 WARN  [c.c.a.m.AgentAttache]
>>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq
>>>> 183-7541559051008607242: Timed out on null
>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: com.cloud.storage.listener.StoragePoolMonitor
>>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentAttache]
>>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq
>>>> 183-7541559051008607242: Cancelling.
>>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: com.cloud.storage.secondary.SecondaryStorageListener
>>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: com.cloud.network.SshKeysDistriMonitor
>>>> 2019-07-25 04:01:22,785 DEBUG [o.a.c.s.RemoteHostEndPoint]
>>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Failed to send command,
>>>> due to Agent:183, com.cloud.exception.OperationTimedoutException:
>> Commands
>>>> 7541559051008607242 to Host 183 timed out after 3600
>>>> 2019-07-25 04:01:22,785 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl
>>>> 2019-07-25 04:01:22,785 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: com.cloud.storage.download.DownloadListener
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 2019-07-25 04:01:22,785 ERROR [c.c.s.StatsCollector]
>>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Error trying to
>> retrieve
>>>> storage stats
>>>> com.cloud.utils.exception.CloudRuntimeException: Failed to send command,
>>>> due to Agent:183, com.cloud.exception.OperationTimedoutException:
>> Commands
>>>> 7541559051008607242 to Host 183 timed out after 3600
>>>>       at
>>>> 
>>>> 
>> org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:133)
>>>>       at
>>>> 
>>>> 
>> com.cloud.server.StatsCollector$StorageCollector.runInContext(StatsCollector.java:1139)
>>>>       at
>>>> 
>>>> 
>> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>>>>       at
>>>> 
>>>> 
>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>>>>       at
>>>> 
>>>> 
>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>>>>       at
>>>> 
>>>> 
>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>>>>       at
>>>> 
>>>> 
>> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>>>>       at
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>       at
>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>>>>       at
>>>> 
>>>> 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>>>>       at
>>>> 
>>>> 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>>>>       at
>>>> 
>>>> 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>       at
>>>> 
>>>> 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>       at java.lang.Thread.run(Thread.java:748)
>>>> 2019-07-25 04:01:22,786 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: com.cloud.consoleproxy.ConsoleProxyListener
>>>> 2019-07-25 04:01:22,789 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: com.cloud.storage.LocalStoragePoolListener
>>>> 2019-07-25 04:01:22,789 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: com.cloud.storage.upload.UploadListener
>>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: com.cloud.capacity.StorageCapacityListener
>>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: com.cloud.capacity.ComputeCapacityListener
>>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: com.cloud.network.SshKeysDistriMonitor
>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: com.cloud.network.router.VirtualNetworkApplianceManagerImpl
>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener:
>>>> com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.n.NetworkUsageManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Disconnected called on
>> 183
>>>> with status Disconnected
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: com.cloud.agent.manager.AgentManagerImpl$BehindOnPingListener
>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to
>>>> listener: com.cloud.agent.manager.AgentManagerImpl$SetHostParamsListener
>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.h.Status]
>> (AgentTaskPool-1:ctx-66de2057)
>>>> (logid:841d2a63) Transition:[Resource state = Enabled, Agent event =
>>>> ShutdownRequested, Host id = 183, name = s-2775-VM]
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Thanks and regards
>>>> Rakesh venkatesh
>>>> 
>> 
> 
> 
> -- 
> 
> Andrija Panić

Reply via email to