True, but I thought that since the mgt server sends a disconnect to the SSVM,
it would reset the interface on which it is connecting.
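
To check whether the disconnects really line up with the start of every hour, a small script along these lines could pull the "behind on ping" events out of the management server log. This is only a minimal sketch: the function names are made up for illustration, the pattern is based solely on the AgentManagerImpl line quoted in my first mail, and the comma-separated format for several host ids is an assumption.

```python
import re
from datetime import datetime

# Pattern modelled on the log line quoted in the first mail, e.g.
# "2019-07-25 04:01:22,769 INFO  [c.c.a.m.AgentManagerImpl] (...) (...)
#  Found the following agents behind on ping: [183]"
# Assumption: several host ids would appear comma-separated inside the brackets.
BEHIND_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s.*"
    r"Found the following agents behind on ping: \[([^\]]*)\]"
)


def behind_on_ping_events(lines):
    """Return a list of (timestamp, [host ids]) for each 'behind on ping' line."""
    events = []
    for line in lines:
        match = BEHIND_RE.search(line)
        if match is None:
            continue
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        ids = [int(part) for part in match.group(2).split(",") if part.strip()]
        events.append((when, ids))
    return events


def minutes_past_hour(events):
    """Minute-of-hour of each event; values clustered near 0 hint at an hourly trigger."""
    return [when.minute for when, _ in events]
```

Feeding it the lines of the management server log (one log entry per line) and looking at minutes_past_hour() should show whether the events cluster just after the full hour, as in the 04:01:22 entry above.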

Sent from my iPhone

> On 25-Jul-2019, at 4:57 PM, Andrija Panic <[email protected]> wrote:
> 
> In your previous mail, I understood that you used the OS tool "ping" and
> were NOT referring to internal ACS pings?
> "Out of all these, the ping drops were observed from MGT server to ssvm and
> mgt server to nodes. Basically all nodes lost connection. Then it recovered
> itself after 1 minute."
> 
> 
>> On Thu, 25 Jul 2019 at 16:55, Rakesh v <[email protected]> wrote:
>> 
>> The ping between the mgt server and the SSVM fails because the mgt server
>> sends a disconnect message to all nodes. If you look at the logs I pasted in
>> the first email, the mgt server thinks the SSVM is lagging behind on ping and
>> sends a disconnect message for all nodes without investigating. It also
>> happens at the beginning of every hour.
>> 
>> 
>> So I'm sure network is not the issue here.
>> 
>> Sent from my iPhone
>> 
>>> On 25-Jul-2019, at 4:46 PM, Andrija Panic <[email protected]> wrote:
>>> 
>>> Since basic network connectivity was down (ping failures) between the mgmt
>>> servers and the nodes (and the SSVM on them), I would point my finger at
>>> your networking equipment - i.e. I expect zero problems with ACS (since
>>> pings fail).
>>> 
>>> Let us know how it goes.
>>> 
>>> Andrija
>>> 
>>>> On Thu, 25 Jul 2019 at 16:04, Rakesh v <[email protected]> wrote:
>>>> 
>>>> Yes, I was monitoring it continuously. Below are the steps I was
>>>> performing when the issue happened:
>>>> 
>>>> 
>>>> 1. Ping from MGT server to ssvm
>>>> 2. Ping from ssvm to secondary storage ip
>>>> 3. Ping from ssvm to public IP like 8.8.8.8
>>>> 4. Ping from MGT server to node in which ssvm was running
>>>> 
>>>> 
>>>> Out of all these, ping drops were observed from the MGT server to the SSVM
>>>> and from the MGT server to the nodes. Basically, all nodes lost connection.
>>>> It then recovered by itself after 1 minute.
>>>> 
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>>> On 25-Jul-2019, at 3:48 PM, Andrija Panic <[email protected]> wrote:
>>>>> 
>>>>> Can you observe the status of the SSVM (is it
>>>>> UP/Connecting/Disconnected/Down) while you have issues?
>>>>> 
>>>>> I would advise checking your Secondary Storage itself, and also running
>>>>> the SSVM diagnostic script /usr/local/cloud/systemvm/ssvm-check.sh to see
>>>>> whether it reports any errors with NFS or anything else.
>>>>> 
>>>>> Lastly - and don't laugh - check that you don't have issues with your
>>>>> networking equipment (some of us had VEEEERY strange connectivity issues
>>>>> some years ago with crappy QCT/Quanta switches in an MLAG setup).
>>>>> 
>>>>> Andrija
>>>>> 
>>>>>> On Thu, 25 Jul 2019 at 15:42, Rakesh v <[email protected]> wrote:
>>>>>> 
>>>>>> Yes, I have set the IPs of the three MGT servers in the "host" field.
>>>>>> 
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>>> On 25-Jul-2019, at 2:14 PM, Pierre-Luc Dion <[email protected]> wrote:
>>>>>>> 
>>>>>>> Do you have a load balancer in front of CloudStack? Did you set the
>>>>>>> global setting "host" to the IP of the mgmt server?
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, 25 Jul 2019 at 03:24, Rakesh Venkatesh <[email protected]> wrote:
>>>>>>> 
>>>>>>>> Hello People
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I have a strange issue where the mgt server times out sending a command
>>>>>>>> to the secondary storage VM every hour, and because of this the UI is
>>>>>>>> not accessible for a short time. Sometimes I have to restart the mgt
>>>>>>>> server to get it back to a working state, and sometimes I don't. I also
>>>>>>>> see some exceptions while fetching the storage stats.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> The log says the secondary storage VM is lagging behind the mgt server
>>>>>>>> on ping, and the mgt server sends a disconnect message to the other
>>>>>>>> components. Can you let me know how to troubleshoot this issue? I
>>>>>>>> destroyed the secondary storage VM but the issue still persists. I
>>>>>>>> checked the date/time on the mgt server and the SSVM, and they are the
>>>>>>>> same. This has been happening for quite a few days now. Below are the
>>>>>>>> logs:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 2019-07-25 04:01:22,769 INFO  [c.c.a.m.AgentManagerImpl] (AgentMonitor-1:ctx-c33dbe74) (logid:5442158c) Found the following agents behind on ping: [183]
>>>>>>>> 2019-07-25 04:01:22,775 WARN  [c.c.a.m.AgentManagerImpl] (AgentMonitor-1:ctx-c33dbe74) (logid:5442158c) Disconnect agent for CPVM/SSVM due to physical connection close. host: 183
>>>>>>>> 2019-07-25 04:01:22,778 INFO  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Host 183 is disconnecting with event ShutdownRequested
>>>>>>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) The next status of agent 183is Disconnected, current status is Up
>>>>>>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Deregistering link for 183 with state Disconnected
>>>>>>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Remove Agent : 183
>>>>>>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.ConnectedAgentAttache] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Processing Disconnect.
>>>>>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentAttache] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Seq 183-7541559051008607242: Sending disconnect to class com.cloud.agent.manager.SynchronousListener
>>>>>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer
>>>>>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.u.n.NioConnection] (pool-2-thread-1:null) (logid:) Closing socket Socket[addr=/172.30.32.16,port=38250,localport=8250]
>>>>>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentAttache] (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq 183-7541559051008607242: Waiting some more time because this is the current command
>>>>>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer
>>>>>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentAttache] (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq 183-7541559051008607242: Waiting some more time because this is the current command
>>>>>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.deploy.DeploymentPlanningManagerImpl
>>>>>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.network.security.SecurityGroupListener
>>>>>>>> 2019-07-25 04:01:22,783 INFO  [c.c.u.e.CSExceptionErrorCode] (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Could not find exception: com.cloud.exception.OperationTimedoutException in error code list for exceptions
>>>>>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: org.apache.cloudstack.engine.orchestration.NetworkOrchestrator
>>>>>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl
>>>>>>>> 2019-07-25 04:01:22,783 WARN  [c.c.a.m.AgentAttache] (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq 183-7541559051008607242: Timed out on null
>>>>>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.storage.listener.StoragePoolMonitor
>>>>>>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentAttache] (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq 183-7541559051008607242: Cancelling.
>>>>>>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.storage.secondary.SecondaryStorageListener
>>>>>>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.network.SshKeysDistriMonitor
>>>>>>>> 2019-07-25 04:01:22,785 DEBUG [o.a.c.s.RemoteHostEndPoint] (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Failed to send command, due to Agent:183, com.cloud.exception.OperationTimedoutException: Commands 7541559051008607242 to Host 183 timed out after 3600
>>>>>>>> 2019-07-25 04:01:22,785 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl
>>>>>>>> 2019-07-25 04:01:22,785 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.storage.download.DownloadListener
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 2019-07-25 04:01:22,785 ERROR [c.c.s.StatsCollector] (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Error trying to retrieve storage stats
>>>>>>>> com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:183, com.cloud.exception.OperationTimedoutException: Commands 7541559051008607242 to Host 183 timed out after 3600
>>>>>>>>     at org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:133)
>>>>>>>>     at com.cloud.server.StatsCollector$StorageCollector.runInContext(StatsCollector.java:1139)
>>>>>>>>     at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>>>>>>>>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>>>>>>>>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>>>>>>>>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>>>>>>>>     at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>>>>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>>>>>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>>>>>>>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>>>>>>>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>     at java.lang.Thread.run(Thread.java:748)
>>>>>>>> 2019-07-25 04:01:22,786 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.consoleproxy.ConsoleProxyListener
>>>>>>>> 2019-07-25 04:01:22,789 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.storage.LocalStoragePoolListener
>>>>>>>> 2019-07-25 04:01:22,789 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.storage.upload.UploadListener
>>>>>>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.capacity.StorageCapacityListener
>>>>>>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.capacity.ComputeCapacityListener
>>>>>>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.network.SshKeysDistriMonitor
>>>>>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.network.router.VirtualNetworkApplianceManagerImpl
>>>>>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
>>>>>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.n.NetworkUsageManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Disconnected called on 183 with status Disconnected
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.agent.manager.AgentManagerImpl$BehindOnPingListener
>>>>>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.agent.manager.AgentManagerImpl$SetHostParamsListener
>>>>>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.h.Status] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Transition:[Resource state = Enabled, Agent event = ShutdownRequested, Host id = 183, name = s-2775-VM]
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Thanks and regards
>>>>>>>> Rakesh venkatesh
>>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> 
>>>>> Andrija Panić
>>>> 
>>> 
>>> 
>>> --
>>> 
>>> Andrija Panić
>> 
> 
> 
> -- 
> 
> Andrija Panić
