Re: [ClusterLabs] Resource switchover taking more time upon shutting off one of the node in a 2 node cluster

2018-03-26 Thread Ken Gaillot
On Sat, 2018-02-24 at 15:02 +0530, avinash sharma wrote:
> Hi Ken,
> 
> Thanks for the reply. 
> Here the resources in question are RoutingManager and floatingips,
> which have no dependency on the stateful_consul resource, so I think
> we can ignore the stateful_consul_promote failures.
> RoutingManager (MS) and aaaip, nataccessgwip, accessip,
> natcpcoregwip, cpcoreip from the floatingips resource group are the
> resources for which the switchover action by crmd got delayed.

The delay happens around this:

Feb 21 21:42:26 [24021] IVM-1   lrmd:  warning: operation_finished:
stateful_wildfly_promote_0:869 - timed out after 30ms

So it's waiting on that, whether due to a constraint or for some other
reason. For example, if the transition in which that action was started
got aborted, the cluster has to wait for the action's result before
starting a new transition.
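
To find other actions that hit their timeout, the log can be scanned
for the same lrmd message. This is only a minimal sketch: the pattern
is inferred from the single line quoted above, it assumes each entry
sits on one line in the actual log file (it is wrapped here by the mail
client), and find_timeouts is a made-up helper name, not part of
Pacemaker.

```python
import re

# Pattern for the lrmd "timed out" message; the exact log layout is an
# assumption based on the one line quoted in this thread.
TIMEOUT_RE = re.compile(
    r"lrmd:\s+warning:\s+operation_finished:\s+"
    r"(?P<op>\S+):(?P<pid>\d+) - timed out after (?P<ms>\d+)ms"
)


def find_timeouts(lines):
    """Return (operation, pid, timeout_ms) for every timed-out action."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append(
                (match.group("op"), int(match.group("pid")), int(match.group("ms")))
            )
    return hits


# The quoted log line, joined back into a single line.
sample = [
    "Feb 21 21:42:26 [24021] IVM-1   lrmd:  warning: operation_finished: "
    "stateful_wildfly_promote_0:869 - timed out after 30ms",
]
print(find_timeouts(sample))
```

Comparing the timestamps of such lines with the time the delayed
promote actually ran should show whether the cluster was waiting on
one of them.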

> Thanks,
> Avinash Sharma
> 
> On Fri, Feb 23, 2018 at 8:57 PM, Ken Gaillot 
> wrote:
> > On Fri, 2018-02-23 at 16:15 +0530, avinash sharma wrote:
> > > Subject: Switchover of resource (MS) 'RoutingManager' and
> > > resource group 'floatingips', which have 'colocation' and 'after'
> > > constraints on each other, is taking around 5 minutes to get
> > > promoted when the node running the master instance goes down.
> > 
> > 
> > 
> > When Pacemaker runs the resource agent, it will log any error
> > messages that the agent prints. I didn't look at the entire log,
> > but I suspect this is the cause, since the promote action didn't
> > succeed during that time:
> > 
> > > Feb 21 21:37:40 [24021] IVM-1       lrmd:   notice:
> > > operation_finished:   stateful_consul_promote_0:864:stderr [
> > > ssh_exchange_identification: Connection closed by remote host
> > >  ]
> > > Feb 21 21:37:40 [24021] IVM-1       lrmd:   notice:
> > > operation_finished:   stateful_consul_promote_0:864:stderr [
> > > rsync: connection unexpectedly closed (0 bytes received so far)
> > > [sender] ]
> > > Feb 21 21:37:40 [24021] IVM-1       lrmd:   notice:
> > > operation_finished:   stateful_consul_promote_0:864:stderr [
> > > rsync error: unexplained error (code 255) at io.c(226)
> > > [sender=3.1.2] ]
-- 
Ken Gaillot 
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Resource switchover taking more time upon shutting off one of the node in a 2 node cluster

2018-02-24 Thread avinash sharma
Hi Ken,

Thanks for the reply.
Here the resources in question are RoutingManager and floatingips, which
have no dependency on the stateful_consul resource, so I think we can
ignore the stateful_consul_promote failures.
RoutingManager (MS) and aaaip, nataccessgwip, accessip, natcpcoregwip,
cpcoreip from the floatingips resource group are the resources for which
the switchover action by crmd got delayed.

Thanks,
Avinash Sharma

On Fri, Feb 23, 2018 at 8:57 PM, Ken Gaillot  wrote:

> On Fri, 2018-02-23 at 16:15 +0530, avinash sharma wrote:
> > Subject: Switchover of resource (MS) 'RoutingManager' and resource
> > group 'floatingips', which have 'colocation' and 'after' constraints
> > on each other, is taking around 5 minutes to get promoted when the
> > node running the master instance goes down.
>
> 
>
> When Pacemaker runs the resource agent, it will log any error messages
> that the agent prints. I didn't look at the entire log, but I suspect
> this is the cause, since the promote action didn't succeed during that
> time:
>
> > Feb 21 21:37:40 [24021] IVM-1   lrmd:   notice:
> > operation_finished:   stateful_consul_promote_0:864:stderr [
> > ssh_exchange_identification: Connection closed by remote host
> >  ]
> > Feb 21 21:37:40 [24021] IVM-1   lrmd:   notice:
> > operation_finished:   stateful_consul_promote_0:864:stderr [
> > rsync: connection unexpectedly closed (0 bytes received so far)
> > [sender] ]
> > Feb 21 21:37:40 [24021] IVM-1   lrmd:   notice:
> > operation_finished:   stateful_consul_promote_0:864:stderr [
> > rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2]
> > ]



-- 
Thanks,
Avinash Sharma


Re: [ClusterLabs] Resource switchover taking more time upon shutting off one of the node in a 2 node cluster

2018-02-23 Thread Ken Gaillot
On Fri, 2018-02-23 at 16:15 +0530, avinash sharma wrote:
> Subject: Switchover of resource (MS) 'RoutingManager' and resource
> group 'floatingips', which have 'colocation' and 'after' constraints
> on each other, is taking around 5 minutes to get promoted when the
> node running the master instance goes down.



When Pacemaker runs the resource agent, it will log any error messages
that the agent prints. I didn't look at the entire log, but I suspect
this is the cause, since the promote action didn't succeed during that
time:

> Feb 21 21:37:40 [24021] IVM-1       lrmd:   notice:
> operation_finished:   stateful_consul_promote_0:864:stderr [
> ssh_exchange_identification: Connection closed by remote host
>  ]
> Feb 21 21:37:40 [24021] IVM-1       lrmd:   notice:
> operation_finished:   stateful_consul_promote_0:864:stderr [
> rsync: connection unexpectedly closed (0 bytes received so far)
> [sender] ]
> Feb 21 21:37:40 [24021] IVM-1       lrmd:   notice:
> operation_finished:   stateful_consul_promote_0:864:stderr [
> rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2]
> ] 
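
To see everything an agent printed for a given action, the stderr
entries can be grouped per operation. This is a rough sketch only: the
pattern is inferred from the three lines quoted above, it assumes each
entry is a single line in the real log file, and collect_agent_stderr
is a hypothetical helper name, not a Pacemaker tool.

```python
import re
from collections import defaultdict

# Matches "operation_finished: <op>:<pid>:stderr [ <message> ]"; the
# layout is an assumption based on the log lines quoted in this thread.
STDERR_RE = re.compile(
    r"operation_finished:\s+(?P<op>\S+?):(?P<pid>\d+):stderr "
    r"\[\s*(?P<msg>.*?)\s*\]\s*$"
)


def collect_agent_stderr(lines):
    """Group agent stderr messages by (operation, pid), in log order."""
    grouped = defaultdict(list)
    for line in lines:
        match = STDERR_RE.search(line)
        if match:
            key = (match.group("op"), int(match.group("pid")))
            grouped[key].append(match.group("msg"))
    return dict(grouped)


# The quoted entries, each joined back into a single line.
prefix = "Feb 21 21:37:40 [24021] IVM-1       lrmd:   notice: operation_finished:   "
sample = [
    prefix + "stateful_consul_promote_0:864:stderr [ "
    "ssh_exchange_identification: Connection closed by remote host ]",
    prefix + "stateful_consul_promote_0:864:stderr [ "
    "rsync: connection unexpectedly closed (0 bytes received so far) [sender] ]",
    prefix + "stateful_consul_promote_0:864:stderr [ "
    "rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2] ]",
]
for (op, pid), messages in collect_agent_stderr(sample).items():
    print(op, pid)
    for message in messages:
        print("   ", message)
```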
-- 
Ken Gaillot 