Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Valentin Vidic
On Mon, Mar 12, 2018 at 04:31:46PM +0100, Klaus Wenninger wrote:
> Nope. Whenever the cluster is completely down...
> Otherwise nodes would come up - if not seeing each other -
> happily with both starting all services because they don't
> know what already had been running on the other node.
>

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Klaus Wenninger
On 03/12/2018 04:17 PM, Valentin Vidic wrote:
> On Mon, Mar 12, 2018 at 01:58:21PM +0100, Klaus Wenninger wrote:
>> But isn't dlm directly interfacing with corosync so
>> that it would get the quorum state from there?
>> As you have 2-node set probably on a 2-node-cluster
>> this would - after

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Valentin Vidic
On Mon, Mar 12, 2018 at 01:58:21PM +0100, Klaus Wenninger wrote:
> But isn't dlm directly interfacing with corosync so
> that it would get the quorum state from there?
> As you have 2-node set probably on a 2-node-cluster
> this would - after both nodes down - wait for all
> nodes up first. Isn't

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Klaus Wenninger
On 03/12/2018 01:44 PM, Muhammad Sharfuddin wrote:
> Hi Klaus,
>
> primitive sbd-stonith stonith:external/sbd \
>     op monitor interval=3000 timeout=20 \
>     op start interval=0 timeout=240 \
>     op stop interval=0 timeout=100 \
>     params sbd_device="/dev/mapper/sbd" \
>

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Muhammad Sharfuddin
Hi Klaus,

primitive sbd-stonith stonith:external/sbd \
    op monitor interval=3000 timeout=20 \
    op start interval=0 timeout=240 \
    op stop interval=0 timeout=100 \
    params sbd_device="/dev/mapper/sbd" \
    meta target-role=Started

property cib-bootstrap-options:

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Klaus Wenninger
Hi Muhammad! Could you elaborate a little more on your fencing setup? I read that you are using SBD, but I don't see any sbd fencing resource. In case you want to use watchdog fencing with SBD, this would require the stonith-watchdog-timeout property to be set. But watchdog-fencing relies on

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Muhammad Sharfuddin
@Ulrich, the issue I am facing is that when both nodes crash and I then keep one node offline, the online node doesn't start the ocfs2 resources.

--
Regards,
Muhammad Sharfuddin

On 3/12/2018 4:51 PM, Muhammad Sharfuddin wrote:
Hello Gang, as informed, previously cluster was fixed to

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Muhammad Sharfuddin
Hello Gang,

as reported, the cluster was previously fixed to start the ocfs2 resources by

a) crm resource start dlm
b) mount/umount the ocfs2 file system manually (this step was the fix)

and then starting the clone group (which includes dlm and the ocfs2 file systems) worked fine:

c) crm resource

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Gang He
>>>
> Hello Gang,
>
> to follow your instructions, I started the dlm resource via:
>
>     crm resource start dlm
>
> then mount/unmount the ocfs2 file system manually..(which seems to be
> the fix of the situation).
>
> Now resources are getting started properly on a single node.. I am

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Muhammad Sharfuddin
Hello Gang,

to follow your instructions, I started the dlm resource via:

    crm resource start dlm

then mount/unmount the ocfs2 file system manually..(which seems to be the fix of the situation).

Now resources are getting started properly on a single node.. I am happy as the issue is

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Gang He
Hello Muhammad, Usually, an ocfs2 resource startup failure is caused by the mount command timing out (or hanging). A simple debugging method is to remove the ocfs2 resource from crm first, then mount the file system manually and see whether the mount command times out or hangs. If this command hangs,
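
The manual mount check described above can be sketched roughly as follows. This assumes GNU coreutils `timeout` is installed; the helper name, device path, and mount point are hypothetical placeholders, not values from this thread. Note that a mount stuck in uninterruptible sleep may not react to the signal that `timeout` sends, so treat this as a convenience wrapper rather than a guarantee.

```shell
# Hypothetical placeholders; substitute the real ocfs2 device and mount point.
DEVICE="${DEVICE:-/dev/mapper/ocfs2-vol}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/ocfs2-test}"

# Illustrative helper: run a command, giving up after $1 seconds.
# coreutils `timeout` exits with status 124 when the time limit is hit.
run_with_timeout() {
    secs="$1"; shift
    rc=0
    timeout "$secs" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "TIMED OUT after ${secs}s: $*"
    else
        echo "exit status $rc: $*"
    fi
    return "$rc"
}

# Example (as root on the cluster node, with the dlm resource already started):
# run_with_timeout 60 mount -t ocfs2 "$DEVICE" "$MOUNTPOINT"
```

A "TIMED OUT" line would suggest the mount is blocking rather than failing outright, which is the hang case described above.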

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-11 Thread Muhammad Sharfuddin
On 3/12/2018 7:32 AM, Gang He wrote: Hello Muhammad, I think this problem is not in ocfs2; the cause looks to be that cluster quorum is lost. In a two-node cluster (unlike a three-node cluster), if one node is offline, quorum is lost by default. So, you should configure two-node

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-11 Thread Gang He
Hello Muhammad, I think this problem is not in ocfs2; the cause looks to be that cluster quorum is lost. In a two-node cluster (unlike a three-node cluster), if one node is offline, quorum is lost by default. So, you should configure the two-node related quorum setting according to the
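
For reference, the two-node quorum setting mentioned above normally lives in the quorum section of corosync.conf. The sketch below only prints a candidate section using the votequorum options `two_node` and `wait_for_all`; the helper name is purely illustrative, and the values should be checked against votequorum(5) for the installed corosync version. Setting `two_node: 1` enables `wait_for_all` by default, meaning that after a full cluster shutdown a node waits until it has seen its peer once before becoming quorate.

```shell
# Illustrative helper (not part of corosync): print a candidate quorum
# section to merge by hand into /etc/corosync/corosync.conf.
# With two_node set to 1, wait_for_all defaults to 1 unless it is
# explicitly overridden in the same section.
print_quorum_section() {
    cat <<'EOF'
quorum {
    provider: corosync_votequorum
    two_node: 1
}
EOF
}

print_quorum_section
```

After changing the configuration and restarting corosync, `corosync-quorumtool -s` can be used to inspect the resulting quorum state.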

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-09 Thread Muhammad Sharfuddin
On 3/10/2018 10:00 AM, Andrei Borzenkov wrote: 09.03.2018 19:55, Muhammad Sharfuddin wrote: Hi, This two node cluster starts resources when both nodes are online but does not start the ocfs2 resources when one node is offline. e.g. if I gracefully stop the cluster resources then stop the

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-09 Thread Andrei Borzenkov
09.03.2018 19:55, Muhammad Sharfuddin wrote:
> Hi,
>
> This two node cluster starts resources when both nodes are online but
> does not start the ocfs2 resources when one node is offline. e.g. if I
> gracefully stop the cluster resources then stop the pacemaker service on
> either node,