Re: [ClusterLabs] Resource switchover taking more time upon shutting off one of the node in a 2 node cluster
On Sat, 2018-02-24 at 15:02 +0530, avinash sharma wrote:
> Hi Ken,
>
> Thanks for the reply.
> Here the resources in question are RoutingManager and floatingips,
> which have no dependency on the stateful_consul resource, so I think
> we can ignore the stateful_consul_promote failures.
> RoutingManager (MS) and aaaip, nataccessgwip, accessip,
> natcpcoregwip, and cpcoreip from the floatingips resource group are
> the resources for which the switchover action by crmd got delayed.

The delay happens around this:

Feb 21 21:42:26 [24021] IVM-1 lrmd: warning: operation_finished: stateful_wildfly_promote_0:869 - timed out after 30ms

So it's waiting on that, whether due to a constraint or for some other
reason. For example, if the transition that it was started in got
aborted, the cluster has to wait for that result before starting a new
transition.

> Thanks,
> Avinash Sharma
>
> On Fri, Feb 23, 2018 at 8:57 PM, Ken Gaillot wrote:
> > On Fri, 2018-02-23 at 16:15 +0530, avinash sharma wrote:
> > > Subject: Switchover of resource(MS) 'RoutingManager' and resource
> > > group 'floatingips', which have 'colocation' and 'after'
> > > constraints on each other, are taking around 5 minutes to get
> > > promoted when node running master instance goes down.
> >
> > When Pacemaker runs the resource agent, it will log any error
> > messages that the agent prints. I didn't look at the entire log,
> > but I suspect this is the cause, the promote action didn't succeed
> > during that time:
> >
> > > Feb 21 21:37:40 [24021] IVM-1 lrmd: notice: operation_finished: stateful_consul_promote_0:864:stderr [ ssh_exchange_identification: Connection closed by remote host ]
> > > Feb 21 21:37:40 [24021] IVM-1 lrmd: notice: operation_finished: stateful_consul_promote_0:864:stderr [ rsync: connection unexpectedly closed (0 bytes received so far) [sender] ]
> > > Feb 21 21:37:40 [24021] IVM-1 lrmd: notice: operation_finished: stateful_consul_promote_0:864:stderr [ rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2] ]

--
Ken Gaillot
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
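
To locate the operation the cluster is waiting on, it can help to pull every "timed out" warning out of the log. What follows is only a minimal sketch, assuming the lrmd log format quoted in this thread and one log entry per line; the names `TIMEOUT_RE` and `find_timeouts` are hypothetical helpers, not part of Pacemaker.

```python
import re

# Assumed format, taken from the lrmd warning quoted in this thread:
#   Feb 21 21:42:26 [24021] IVM-1 lrmd: warning: operation_finished: <op>:<pid> - timed out after <N>ms
TIMEOUT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\[\d+\]\s+(?P<node>\S+)\s+"
    r"lrmd:\s+warning:\s+operation_finished:\s+"
    r"(?P<op>\S+?):(?P<pid>\d+)\s+-\s+timed out after\s+(?P<ms>\d+)ms"
)


def find_timeouts(lines):
    """Return (timestamp, node, operation, timeout_ms) for each timed-out operation."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append(
                (match["stamp"], match["node"], match["op"], int(match["ms"]))
            )
    return hits


if __name__ == "__main__":
    sample = [
        "Feb 21 21:42:26 [24021] IVM-1 lrmd: warning: operation_finished: "
        "stateful_wildfly_promote_0:869 - timed out after 30ms",
    ]
    for hit in find_timeouts(sample):
        print(hit)
```

Comparing the timestamps of these warnings with the time the node was shut off shows which operation accounts for most of the delay.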
Re: [ClusterLabs] Resource switchover taking more time upon shutting off one of the node in a 2 node cluster
Hi Ken,

Thanks for the reply.
Here the resources in question are RoutingManager and floatingips, which
have no dependency on the stateful_consul resource, so I think we can
ignore the stateful_consul_promote failures.
RoutingManager (MS) and aaaip, nataccessgwip, accessip, natcpcoregwip,
and cpcoreip from the floatingips resource group are the resources for
which the switchover action by crmd got delayed.

Thanks,
Avinash Sharma

On Fri, Feb 23, 2018 at 8:57 PM, Ken Gaillot wrote:
> On Fri, 2018-02-23 at 16:15 +0530, avinash sharma wrote:
> > Subject: Switchover of resource(MS) 'RoutingManager' and resource
> > group 'floatingips', which have 'colocation' and 'after' constraints
> > on each other, are taking around 5 minutes to get promoted when node
> > running master instance goes down.
>
> When Pacemaker runs the resource agent, it will log any error messages
> that the agent prints. I didn't look at the entire log, but I suspect
> this is the cause, the promote action didn't succeed during that time:
>
> > Feb 21 21:37:40 [24021] IVM-1 lrmd: notice: operation_finished: stateful_consul_promote_0:864:stderr [ ssh_exchange_identification: Connection closed by remote host ]
> > Feb 21 21:37:40 [24021] IVM-1 lrmd: notice: operation_finished: stateful_consul_promote_0:864:stderr [ rsync: connection unexpectedly closed (0 bytes received so far) [sender] ]
> > Feb 21 21:37:40 [24021] IVM-1 lrmd: notice: operation_finished: stateful_consul_promote_0:864:stderr [ rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2] ]
> --
> Ken Gaillot
Re: [ClusterLabs] Resource switchover taking more time upon shutting off one of the node in a 2 node cluster
On Fri, 2018-02-23 at 16:15 +0530, avinash sharma wrote:
> Subject: Switchover of resource(MS) 'RoutingManager' and resource
> group 'floatingips', which have 'colocation' and 'after' constraints
> on each other, are taking around 5 minutes to get promoted when node
> running master instance goes down.

When Pacemaker runs the resource agent, it will log any error messages
that the agent prints. I didn't look at the entire log, but I suspect
this is the cause, the promote action didn't succeed during that time:

> Feb 21 21:37:40 [24021] IVM-1 lrmd: notice: operation_finished: stateful_consul_promote_0:864:stderr [ ssh_exchange_identification: Connection closed by remote host ]
> Feb 21 21:37:40 [24021] IVM-1 lrmd: notice: operation_finished: stateful_consul_promote_0:864:stderr [ rsync: connection unexpectedly closed (0 bytes received so far) [sender] ]
> Feb 21 21:37:40 [24021] IVM-1 lrmd: notice: operation_finished: stateful_consul_promote_0:864:stderr [ rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2] ]

--
Ken Gaillot
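
Since Pacemaker logs whatever the resource agent prints to stderr, grouping those messages by operation makes it easier to see what a failing action reported. This is a rough sketch only, assuming each log entry sits on a single line in the log file (not wrapped as in the quoted mail) and follows the `operation_finished: <op>:<pid>:stderr [ ... ]` layout shown above; `STDERR_RE` and `collect_stderr` are hypothetical names, not Pacemaker tools.

```python
import re
from collections import defaultdict

# Assumed layout, based on the lrmd notices quoted in this thread:
#   ... lrmd: notice: operation_finished: <op>:<pid>:stderr [ <message> ]
STDERR_RE = re.compile(
    r"lrmd:\s+notice:\s+operation_finished:\s+"
    r"(?P<op>\S+?):(?P<pid>\d+):stderr\s+\[\s*(?P<msg>.*?)\s*\]\s*$"
)


def collect_stderr(lines):
    """Group agent stderr messages by (operation, pid), keeping log order."""
    grouped = defaultdict(list)
    for line in lines:
        match = STDERR_RE.search(line)
        if match:
            key = (match["op"], int(match["pid"]))
            grouped[key].append(match["msg"])
    return dict(grouped)


if __name__ == "__main__":
    prefix = ("Feb 21 21:37:40 [24021] IVM-1 lrmd: notice: operation_finished: "
              "stateful_consul_promote_0:864:stderr ")
    sample = [
        prefix + "[ ssh_exchange_identification: Connection closed by remote host ]",
        prefix + "[ rsync: connection unexpectedly closed (0 bytes received so far) [sender] ]",
    ]
    for (op, pid), messages in collect_stderr(sample).items():
        print(op, pid)
        for message in messages:
            print("   ", message)
```

The pattern is anchored at the end of the line so that messages which themselves contain brackets, such as `[sender]`, are captured whole.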