Re: [ClusterLabs] Problem with stonith and starting services

2017-07-14 Thread Cesar Hernandez
> > > So if this is really the reason it would probably be worth > finding out what is really happening. > Thanks. Yes, I think this is really the reason. I fixed it one week ago and it hasn't happened again ___ Users mailing list:

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-14 Thread Klaus Wenninger
On 07/12/2017 05:16 PM, Cesar Hernandez wrote: > >> On 6 Jul 2017, at 17:34, Ken Gaillot wrote: >> >> On 07/06/2017 10:27 AM, Cesar Hernandez wrote: It looks like a bug when the fenced node rejoins quickly enough that it is a member again before its fencing

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-12 Thread Cesar Hernandez
> On 6 Jul 2017, at 17:34, Ken Gaillot wrote: > > On 07/06/2017 10:27 AM, Cesar Hernandez wrote: >> >>> >>> It looks like a bug when the fenced node rejoins quickly enough that it >>> is a member again before its fencing confirmation has been sent. I know >>> there

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-07 Thread Cesar Hernandez
>>> >> >> Could it be caused by node 2 being rebooted and coming back up before the stonith >> script has finished? > > That *shouldn't* cause any problems, but I'm not sure what's happening > in this case. Maybe that is the cause... My other server installations had a slow stonith device and

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-06 Thread Klaus Wenninger
On 07/06/2017 04:48 PM, Ken Gaillot wrote: > On 07/06/2017 09:26 AM, Klaus Wenninger wrote: >> On 07/06/2017 04:20 PM, Cesar Hernandez wrote: If node2 is getting the notification of its own fencing, it wasn't successfully fenced. Successful fencing would render it incapacitated

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-06 Thread Ken Gaillot
On 07/06/2017 09:26 AM, Klaus Wenninger wrote: > On 07/06/2017 04:20 PM, Cesar Hernandez wrote: >>> If node2 is getting the notification of its own fencing, it wasn't >>> successfully fenced. Successful fencing would render it incapacitated >>> (powered down, or at least cut off from the network

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-06 Thread Klaus Wenninger
On 07/06/2017 04:20 PM, Cesar Hernandez wrote: >> If node2 is getting the notification of its own fencing, it wasn't >> successfully fenced. Successful fencing would render it incapacitated >> (powered down, or at least cut off from the network and any shared >> resources). > > Maybe I don't
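
The point made in this message (a fence is only successful if the target really ends up incapacitated) can be sketched as a confirmation loop that a fence script runs before it reports success. This is a hedged illustration, not part of Pacemaker or of the script discussed in the thread: wait_until_down and the node_is_down probe are hypothetical names, and the probe stands in for whatever out-of-band status check the fencing device offers. For a reboot-style fence the probe would need to detect that the node actually went down, which a simple poll can miss if the node comes back quickly.

```python
import time
from typing import Callable


def wait_until_down(
    node_is_down: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll a (hypothetical) status probe until the node is confirmed down.

    Returns True only when the probe confirms the node is down.
    Returns False when the timeout expires first, so the caller can
    report the fencing attempt as failed instead of guessing.
    """
    deadline = clock() + timeout_s
    while True:
        if node_is_down():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would exit with a non-zero status when this returns False, so the cluster does not receive a success result for a node that may still be running.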

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-06 Thread Cesar Hernandez
> > If node2 is getting the notification of its own fencing, it wasn't > successfully fenced. Successful fencing would render it incapacitated > (powered down, or at least cut off from the network and any shared > resources). Maybe I don't understand you, or maybe you don't understand me... ;)

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-06 Thread Ken Gaillot
On 07/06/2017 08:54 AM, Cesar Hernandez wrote: > >> >> So, the above log means that node1 decided that node2 needed to be >> fenced, requested fencing of node2, and received a successful result for >> the fencing, and yet node2 was not killed. >> >> Your fence agent should not return success

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-06 Thread Ken Gaillot
On 07/04/2017 08:28 AM, Cesar Hernandez wrote: > >> >> Agreed, I don't think it's multicast vs unicast. >> >> I can't see from this what's going wrong. Possibly node1 is trying to >> re-fence node2 when it comes back. Check that the fencing resources are >> configured correctly, and check whether

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-05 Thread Cesar Hernandez
> > >>> >>> But you definitely shouldn't have a fencing-agent that claims to have fenced >>> a node if it is not sure - rather the other way round if in doubt. >> >> > > True! Which is why I mentioned it to be dangerous. > But your fencing-agent is even more dangerous ;-) > > Well.. my
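
The advice quoted in this message (a fencing agent should not claim success unless it is sure, and should report failure when in doubt) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: fence_exit_code is a hypothetical helper, and each fencing method (for example the ssh reboot and the AWS API reboot mentioned elsewhere in the thread) is assumed to be wrapped in a callable that returns True only when it has confirmed the fence. It is not the poster's script and not a Pacemaker API.

```python
from typing import Callable, Iterable


def fence_exit_code(attempts: Iterable[Callable[[], bool]]) -> int:
    """Return 0 only if at least one attempt confirms the fence.

    Each attempt is a hypothetical callable returning True on a
    confirmed fence. Errors and unconfirmed attempts count as
    failures, so the result is 1 whenever there is any doubt.
    """
    for attempt in attempts:
        try:
            if attempt():
                return 0  # confirmed: safe to report success
        except Exception:
            # An error means we cannot be sure; try the next method.
            continue
    return 1  # nothing confirmed: report failure


if __name__ == "__main__":
    # Stand-in attempts for demonstration only; a real script would
    # exit with this code so the cluster sees the true outcome.
    print(fence_exit_code([lambda: False, lambda: True]))
```

The design choice is that an empty or fully failed list of attempts yields a non-zero code, which is the opposite of a script that always returns OK.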

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-05 Thread Klaus Wenninger
On 07/05/2017 04:50 PM, Cesar Hernandez wrote: >> Not a good idea probably - and the reason for what you are experiencing ;-) >> If you have problems starting the nodes within a certain time-window >> disabling startup-fencing might be an option to consider although dangerous. >> But you

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-05 Thread Cesar Hernandez
> Not a good idea probably - and the reason for what you are experiencing ;-) > If you have problems starting the nodes within a certain time-window > disabling startup-fencing might be an option to consider although dangerous. > But you definitely shouldn't have a fencing-agent that claims to

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-05 Thread Klaus Wenninger
On 07/05/2017 04:22 PM, Cesar Hernandez wrote: > >> Are you logging which ones went OK and which failed? >> Does the script return a failure if both go wrong? > The script always returns OK. Not a good idea probably - and the reason for what you are experiencing ;-) If you have problems starting the

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-05 Thread Cesar Hernandez
> Are you logging which ones went OK and which failed? > Does the script return a failure if both go wrong? The script always returns OK. ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home:

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-05 Thread Klaus Wenninger
On 07/05/2017 08:50 AM, Cesar Hernandez wrote: >> Might be kind of a strange race as well ... but without knowing what the >> script actually does ... >> > The script first tries to reboot the node using ssh, something like ssh $NODE > reboot -f, then runs a remote reboot using the AWS API. Are you

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-05 Thread Cesar Hernandez
> Might be kind of a strange race as well ... but without knowing what the > script actually does ... > The script first tries to reboot the node using ssh, something like ssh $NODE reboot -f, then runs a remote reboot using the AWS API. Thanks

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-04 Thread Klaus Wenninger
On 07/04/2017 04:52 PM, Cesar Hernandez wrote: >> The first line is the consequence of the 2nd. >> And the 1st says that node2 just has seen some fencing-resource >> positively reporting to have fenced himself - which >> is why crmd is exiting in a way that it is not respawned >> by pacemakerd. >

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-04 Thread Cesar Hernandez
> The first line is the consequence of the 2nd. > And the 1st says that node2 just has seen some fencing-resource > positively reporting to have fenced himself - which > is why crmd is exiting in a way that it is not respawned > by pacemakerd. Thanks. But my script has a logfile; I've checked

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-04 Thread Klaus Wenninger
On 07/04/2017 03:28 PM, Cesar Hernandez wrote: >> Agreed, I don't think it's multicast vs unicast. >> >> I can't see from this what's going wrong. Possibly node1 is trying to >> re-fence node2 when it comes back. Check that the fencing resources are >> configured correctly, and check whether node1

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-04 Thread Cesar Hernandez
> > Agreed, I don't think it's multicast vs unicast. > > I can't see from this what's going wrong. Possibly node1 is trying to > re-fence node2 when it comes back. Check that the fencing resources are > configured correctly, and check whether node1 sees the first fencing > succeed. Thanks.

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-03 Thread Ken Gaillot
On 07/03/2017 02:34 AM, Cesar Hernandez wrote: > Hi > > I have installed a pacemaker cluster with two nodes. The same type of > installation has been done many times before and the following error never > appeared before. The situation is the following: > > both nodes running cluster services >

[ClusterLabs] Problem with stonith and starting services

2017-07-03 Thread Cesar Hernandez
Hi, I have installed a pacemaker cluster with two nodes. The same type of installation has been done many times before and the following error never appeared before. The situation is the following: both nodes running cluster services; stop pacemaker on node 1; stop pacemaker on node 2; start corosync