In your previous mail, I understond that you used the OS tool "ping" and
NOT refering to internal ACS pings?
"Out of all these, the ping drops were observed from MGT server to ssvm and
mgt server to nodes. Basically all nodes lost connection. Then it recovered
itself after 1 minute."


On Thu, 25 Jul 2019 at 16:55, Rakesh v <www.rakeshv....@gmail.com> wrote:

> The ping between mgt server and ssvm fails because mgt sends disconnect
> message to all nodes. If you look at the logs I pasted in first email, the
> mgt server thinks ssvm is lagging behind on ping and sends a disconnect
> message without investigation for all nodes. Also it happens at the
> beginning of every hour.
>
>
> So I'm sure network is not the issue here.
>
> Sent from my iPhone
>
> > On 25-Jul-2019, at 4:46 PM, Andrija Panic <andrija.pa...@gmail.com>
> wrote:
> >
> > since basic network connectivity (ping failures) was down between mgmts
> and
> > nodes (and SSVM on it)  - I would point my finger to your networking
> > equipment - i.e. I expect zero problems with ACS (since pings fail).
> >
> > Let us know how it goes.
> >
> > Andrija
> >
> >> On Thu, 25 Jul 2019 at 16:04, Rakesh v <www.rakeshv....@gmail.com>
> wrote:
> >>
> >> Yes I was monitoring it continuously. Below are the steps which I was
> >> doing when issue happened
> >>
> >>
> >> 1. Ping from MGT server to ssvm
> >> 2. Ping from ssvm to secondary storage ip
> >> 3. Ping from ssvm to public IP like 8.8.8.8
> >> 4. Ping from MGT server to node in which ssvm was running
> >>
> >>
> >> Out of all these, the ping drops were observed from MGT server to ssvm
> and
> >> mgt server to nodes. Basically all nodes lost connection. Then it
> recovered
> >> itself after 1 minute.
> >>
> >>
> >> Sent from my iPhone
> >>
> >>> On 25-Jul-2019, at 3:48 PM, Andrija Panic <andrija.pa...@gmail.com>
> >> wrote:
> >>>
> >>> Can you observe the status of SSVM (is it
> >> UP/Connecting/Disconnected/Down)
> >>> while you have issues?
> >>>
> >>> I would advise checking your Secondary Storage itself - and also
> running
> >>> the SSVM diagnose script  /usr/local/cloud/systemvm/ssvm-check.sh -
> >> observe
> >>> if any errors with NFS or others.
> >>>
> >>> Lastly - and don't laugh - check that you don't have issues with
> >> networking
> >>> equipment (some of us had VEEEERY strange issues in connectivity some
> >> years
> >>> ago with crappy QCT/Quanta Switches in MLAG setup)
> >>>
> >>> Andrija
> >>>
> >>>> On Thu, 25 Jul 2019 at 15:42, Rakesh v <www.rakeshv....@gmail.com>
> >> wrote:
> >>>>
> >>>> Yes I have set the ip's of the three MGT servers in the "host" field
> >>>>
> >>>> Sent from my iPhone
> >>>>
> >>>>> On 25-Jul-2019, at 2:14 PM, Pierre-Luc Dion <pdion...@apache.org>
> >> wrote:
> >>>>>
> >>>>> Do you have a load balancer in front of cloudstack? Did you set the
> >>>> global
> >>>>> settings "host" to the ip of the mgmt server?
> >>>>>
> >>>>>
> >>>>> Le jeu. 25 juill. 2019 03 h 24, Rakesh Venkatesh <
> >>>> www.rakeshv....@gmail.com>
> >>>>> a écrit :
> >>>>>
> >>>>>> Hello People
> >>>>>>
> >>>>>>
> >>>>>> I have a strange issue where mgt server times out to send a command
> to
> >>>>>> secondary storage VM every hour and because of this UI won't be
> >>>> accessible
> >>>>>> for a short duration of time. Sometimes I have to restart mgt server
> >> to
> >>>> get
> >>>>>> it back to working state and sometimes I don't need to restart it. I
> >>>> also
> >>>>>> see some exceptions while fetching the storage stats.
> >>>>>>
> >>>>>>
> >>>>>> The log says secondary storage VM is lagging behind mgt server in
> ping
> >>>> and
> >>>>>> it sends a disconnect message to other components. Can you let me
> know
> >>>> how
> >>>>>> to troubleshoot this issue? I destroyed the secondary storage VM but
> >> the
> >>>>>> issue still persists. I checked the date/time on the mgt server and
> >> SSVM
> >>>>>> and they are same. This is happening for quite a few days now. Below
> >> are
> >>>>>> the logs
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> 2019-07-25 04:01:22,769 INFO  [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentMonitor-1:ctx-c33dbe74) (logid:5442158c) Found the following
> >>>> agents
> >>>>>> behind on ping: [183]
> >>>>>> 2019-07-25 04:01:22,775 WARN  [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentMonitor-1:ctx-c33dbe74) (logid:5442158c) Disconnect agent for
> >>>>>> CPVM/SSVM due to physical connection close. host: 183
> >>>>>> 2019-07-25 04:01:22,778 INFO  [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Host 183 is
> >>>> disconnecting
> >>>>>> with event ShutdownRequested
> >>>>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) The next status of
> >> agent
> >>>>>> 183is Disconnected, current status is Up
> >>>>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Deregistering link
> for
> >>>> 183
> >>>>>> with state Disconnected
> >>>>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Remove Agent : 183
> >>>>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.ConnectedAgentAttache]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Processing
> Disconnect.
> >>>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentAttache]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Seq
> >>>>>> 183-7541559051008607242: Sending disconnect to class
> >>>>>> com.cloud.agent.manager.SynchronousListener
> >>>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener:
> >> com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer
> >>>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.u.n.NioConnection]
> >>>>>> (pool-2-thread-1:null) (logid:) Closing socket Socket[addr=/
> >>>> 172.30.32.16
> >>>>>> ,port=38250,localport=8250]
> >>>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentAttache]
> >>>>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq
> >>>>>> 183-7541559051008607242: Waiting some more time because this is the
> >>>> current
> >>>>>> command
> >>>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener:
> >> com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer
> >>>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentAttache]
> >>>>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq
> >>>>>> 183-7541559051008607242: Waiting some more time because this is the
> >>>> current
> >>>>>> command
> >>>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener: com.cloud.deploy.DeploymentPlanningManagerImpl
> >>>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener: com.cloud.network.security.SecurityGroupListener
> >>>>>> 2019-07-25 04:01:22,783 INFO  [c.c.u.e.CSExceptionErrorCode]
> >>>>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Could not find
> >>>> exception:
> >>>>>> com.cloud.exception.OperationTimedoutException in error code list
> for
> >>>>>> exceptions
> >>>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener:
> >> org.apache.cloudstack.engine.orchestration.NetworkOrchestrator
> >>>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl
> >>>>>> 2019-07-25 04:01:22,783 WARN  [c.c.a.m.AgentAttache]
> >>>>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq
> >>>>>> 183-7541559051008607242: Timed out on null
> >>>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener: com.cloud.storage.listener.StoragePoolMonitor
> >>>>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentAttache]
> >>>>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq
> >>>>>> 183-7541559051008607242: Cancelling.
> >>>>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener: com.cloud.storage.secondary.SecondaryStorageListener
> >>>>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener: com.cloud.network.SshKeysDistriMonitor
> >>>>>> 2019-07-25 04:01:22,785 DEBUG [o.a.c.s.RemoteHostEndPoint]
> >>>>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Failed to send
> >> command,
> >>>>>> due to Agent:183, com.cloud.exception.OperationTimedoutException:
> >>>> Commands
> >>>>>> 7541559051008607242 to Host 183 timed out after 3600
> >>>>>> 2019-07-25 04:01:22,785 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener:
> >> com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl
> >>>>>> 2019-07-25 04:01:22,785 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener: com.cloud.storage.download.DownloadListener
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> 2019-07-25 04:01:22,785 ERROR [c.c.s.StatsCollector]
> >>>>>> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Error trying to
> >>>> retrieve
> >>>>>> storage stats
> >>>>>> com.cloud.utils.exception.CloudRuntimeException: Failed to send
> >> command,
> >>>>>> due to Agent:183, com.cloud.exception.OperationTimedoutException:
> >>>> Commands
> >>>>>> 7541559051008607242 to Host 183 timed out after 3600
> >>>>>>      at
> >>>>>>
> >>>>>>
> >>>>
> >>
> org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:133)
> >>>>>>      at
> >>>>>>
> >>>>>>
> >>>>
> >>
> com.cloud.server.StatsCollector$StorageCollector.runInContext(StatsCollector.java:1139)
> >>>>>>      at
> >>>>>>
> >>>>>>
> >>>>
> >>
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
> >>>>>>      at
> >>>>>>
> >>>>>>
> >>>>
> >>
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
> >>>>>>      at
> >>>>>>
> >>>>>>
> >>>>
> >>
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> >>>>>>      at
> >>>>>>
> >>>>>>
> >>>>
> >>
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> >>>>>>      at
> >>>>>>
> >>>>>>
> >>>>
> >>
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
> >>>>>>      at
> >>>>>>
> >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >>>>>>      at
> >>>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> >>>>>>      at
> >>>>>>
> >>>>>>
> >>>>
> >>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> >>>>>>      at
> >>>>>>
> >>>>>>
> >>>>
> >>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> >>>>>>      at
> >>>>>>
> >>>>>>
> >>>>
> >>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >>>>>>      at
> >>>>>>
> >>>>>>
> >>>>
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >>>>>>      at java.lang.Thread.run(Thread.java:748)
> >>>>>> 2019-07-25 04:01:22,786 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener: com.cloud.consoleproxy.ConsoleProxyListener
> >>>>>> 2019-07-25 04:01:22,789 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener: com.cloud.storage.LocalStoragePoolListener
> >>>>>> 2019-07-25 04:01:22,789 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener: com.cloud.storage.upload.UploadListener
> >>>>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener: com.cloud.capacity.StorageCapacityListener
> >>>>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener: com.cloud.capacity.ComputeCapacityListener
> >>>>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener: com.cloud.network.SshKeysDistriMonitor
> >>>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener:
> com.cloud.network.router.VirtualNetworkApplianceManagerImpl
> >>>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener:
> >>>>>> com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
> >>>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.n.NetworkUsageManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Disconnected called
> on
> >>>> 183
> >>>>>> with status Disconnected
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener:
> >> com.cloud.agent.manager.AgentManagerImpl$BehindOnPingListener
> >>>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl]
> >>>>>> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect
> to
> >>>>>> listener:
> >> com.cloud.agent.manager.AgentManagerImpl$SetHostParamsListener
> >>>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.h.Status]
> >>>> (AgentTaskPool-1:ctx-66de2057)
> >>>>>> (logid:841d2a63) Transition:[Resource state = Enabled, Agent event =
> >>>>>> ShutdownRequested, Host id = 183, name = s-2775-VM]
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Thanks and regards
> >>>>>> Rakesh venkatesh
> >>>>>>
> >>>>
> >>>
> >>>
> >>> --
> >>>
> >>> Andrija Panić
> >>
> >
> >
> > --
> >
> > Andrija Panić
>


-- 

Andrija Panić

Reply via email to