[ClusterLabs] Antw: Re: single node fails to start the ocfs2 resource

2018-03-12 Thread Ulrich Windl
Hi! I didn't read the logs carefully, but I remember one pitfall (SLES 11): if I formatted the filesystem while the OCFS services were not running, I was unable to mount it; I had to reformat the filesystem while the OCFS services were running. Maybe that helps. Regards, Ulrich >>> "Gang He"

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Gang He
Hello Muhammad, Usually, an ocfs2 resource startup failure is caused by the mount command timing out (or hanging). A simple debugging method is: remove the ocfs2 resource from crm first, then mount this file system manually and see whether the mount command times out or hangs. If this command hangs,
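Gang's manual check (mount by hand and watch whether the command finishes) can be wrapped in a small timeout probe. This is a minimal Python sketch under stated assumptions: `probe_command` is a hypothetical helper, not part of crmsh or ocfs2-tools, and the device path and mount point in the usage comment are placeholders. It deliberately does not `wait()` after killing the child, because a mount stuck inside the kernel may not exit even on SIGKILL.

```python
import subprocess


def probe_command(cmd, timeout_s):
    """Run cmd and classify the outcome as "ok", "failed" or "hung".

    Returns (status, returncode); returncode is None when the command hung.
    """
    # Discard output so a chatty command cannot block on a full pipe.
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Send SIGKILL but do not wait() afterwards: a mount blocked inside
        # the kernel may not exit, and waiting would hang this script too.
        proc.kill()
        return "hung", None
    return ("ok" if rc == 0 else "failed"), rc


# Hypothetical usage (as root, after removing the ocfs2 resource from crm);
# the device path and mount point below are placeholders:
# probe_command(["mount", "-t", "ocfs2", "/dev/mapper/EXAMPLE", "/mnt/ocfs2-test"], 120)
```

A "hung" result points at the mount itself (for example, a DLM lockspace that never forms) rather than at the resource agent's timeout settings.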

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Gang He
>>> > Hello Gang, > > to follow your instructions, I started the dlm resource via: > > crm resource start dlm > > then mount/unmount the ocfs2 file system manually (which seems to be > the fix for the situation). > > Now resources are getting started properly on a single node. I am

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Muhammad Sharfuddin
Hello Gang, to follow your instructions, I started the dlm resource via: crm resource start dlm, then mounted/unmounted the ocfs2 file system manually (which seems to have fixed the situation). Now resources are starting properly on a single node. I am happy as the issue is

Re: [ClusterLabs] corosync 2.4 CPG config change callback

2018-03-12 Thread Thomas Lamprecht
Hi, On 3/9/18 5:26 PM, Jan Friesse wrote: > ... > >> TotemConfchgCallback: ringid (1.1436) >> active processors 3: 1 2 3 >> EXIT >> Finalize  result is 1 (should be 1) >> >> >> Hope I did both tests right, but as it reproduces multiple times >> with testcpg, our cpg usage in our filesystem, this

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Klaus Wenninger
On 03/12/2018 01:44 PM, Muhammad Sharfuddin wrote: > Hi Klaus, > > primitive sbd-stonith stonith:external/sbd \ >     op monitor interval=3000 timeout=20 \ >     op start interval=0 timeout=240 \ >     op stop interval=0 timeout=100 \ >     params sbd_device="/dev/mapper/sbd" \ >   

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Klaus Wenninger
Hi Muhammad! Could you elaborate a little more on your fencing setup? I read that you are using SBD, but I don't see any sbd-fencing-resource. If you wanted to use watchdog fencing with SBD, this would require the stonith-watchdog-timeout property to be set. But watchdog-fencing relies on
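Klaus's point about watchdog-only fencing can be expressed as a simple consistency check. The sketch below uses a hypothetical helper (`check_watchdog_fencing` is not a Pacemaker or SBD tool); it encodes the rule from the Pacemaker documentation that a positive stonith-watchdog-timeout must be larger than SBD_WATCHDOG_TIMEOUT on all nodes, and uses twice that value as a commonly suggested margin. Reading the real values from the CIB and the SBD configuration is left out, and both values are assumed to be in seconds.

```python
def check_watchdog_fencing(stonith_watchdog_timeout_s, sbd_watchdog_timeout_s):
    """Return a list of problems; an empty list means the two values look consistent.

    Only explicit positive values of stonith-watchdog-timeout are modelled here.
    """
    problems = []
    if stonith_watchdog_timeout_s <= 0:
        problems.append("stonith-watchdog-timeout is not an explicit positive value")
    elif stonith_watchdog_timeout_s <= sbd_watchdog_timeout_s:
        # Pacemaker assumes an unseen node has self-fenced once this timeout
        # has passed, so it must exceed the SBD watchdog timeout.
        problems.append("stonith-watchdog-timeout must be larger than SBD_WATCHDOG_TIMEOUT")
    return problems


def suggested_stonith_watchdog_timeout(sbd_watchdog_timeout_s):
    # Twice the SBD watchdog timeout is a commonly suggested safety margin.
    return 2 * sbd_watchdog_timeout_s
```

With an SBD watchdog timeout of 5 seconds, the helper suggests 10 seconds, and `check_watchdog_fencing(10, 5)` reports no problems.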

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Muhammad Sharfuddin
Hi Klaus, primitive sbd-stonith stonith:external/sbd \     op monitor interval=3000 timeout=20 \     op start interval=0 timeout=240 \     op stop interval=0 timeout=100 \     params sbd_device="/dev/mapper/sbd" \     meta target-role=Started property cib-bootstrap-options:

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Muhammad Sharfuddin
Hello Gang, as informed, the cluster was previously fixed to start the ocfs2 resources by a) crm resource start dlm, b) mount/umount the ocfs2 file system manually (this step was the fix), and then starting the clone group (which includes dlm and the ocfs2 file systems) worked fine: c) crm resource

[ClusterLabs] Resources stopped due to unmanage

2018-03-12 Thread Pavel Levshin
Hello. I've just experienced a fault in my pacemaker-based cluster. Seriously, I'm completely disoriented after this. Hopefully someone can give me a hint... A two-node cluster runs a few VirtualDomains along with their common infrastructure (libvirtd, NFS and so on). It is Pacemaker 1.1.16

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Valentin Vidic
On Mon, Mar 12, 2018 at 04:31:46PM +0100, Klaus Wenninger wrote: > Nope. Whenever the cluster is completely down... > Otherwise nodes would come up - if not seeing each other - > happily with both starting all services because they don't > know what already had been running on the other node. >

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Klaus Wenninger
On 03/12/2018 04:17 PM, Valentin Vidic wrote: > On Mon, Mar 12, 2018 at 01:58:21PM +0100, Klaus Wenninger wrote: >> But isn't dlm directly interfering with corosync so >> that it would get the quorum state from there? >> As you have 2-node set probably on a 2-node-cluster >> this would - after

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Valentin Vidic
On Mon, Mar 12, 2018 at 01:58:21PM +0100, Klaus Wenninger wrote: > But isn't dlm directly interfering with corosync so > that it would get the quorum state from there? > As you have 2-node set probably on a 2-node-cluster > this would - after both nodes down - wait for all > nodes up first. Isn't
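The two_node / wait-for-all behaviour discussed here can be sketched as a small state model. This is a toy Python model of the rules in the corosync votequorum(5) man page, not corosync code, and the class and node names are hypothetical: `two_node: 1` lowers the quorum threshold to one vote and, by default, also enables `wait_for_all`, so after a cold start the cluster only becomes quorate once both nodes have been visible at the same time.

```python
class TwoNodeQuorumModel:
    """Toy model of votequorum with two_node: 1 (and the implied wait_for_all: 1).

    A new instance stands for a cold start, i.e. corosync starting after
    the whole cluster was down.
    """

    def __init__(self, nodes=("node1", "node2")):
        self.nodes = frozenset(nodes)
        # wait_for_all is satisfied once every node was visible at the same time.
        self.seen_all_together = False

    def is_quorate(self, online):
        online = set(online) & self.nodes
        if online == self.nodes:
            self.seen_all_together = True
        if not self.seen_all_together:
            # Still waiting for all nodes after a cold start.
            return False
        # two_node: 1 means a single vote is enough for quorum.
        return len(online) >= 1
```

In this model a fresh `TwoNodeQuorumModel()` returns False for `is_quorate({"node1"})` until both nodes have been online together, while a node that loses its peer after that point stays quorate. That would be consistent with Klaus's suggestion that dlm, taking its quorum state from corosync, keeps waiting after both nodes crashed and only one comes back. Setting `wait_for_all: 0` explicitly would remove the wait, at the cost Klaus describes: two nodes that cannot see each other could both start all services.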

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Muhammad Sharfuddin
@Ulrich, the issue I am facing is that when both nodes crash and I then keep one node offline, the online node doesn't start the ocfs2 resources. -- Regards, Muhammad Sharfuddin On 3/12/2018 4:51 PM, Muhammad Sharfuddin wrote: Hello Gang, as informed, previously cluster was fixed to

Re: [ClusterLabs] Resources stopped due to unmanage

2018-03-12 Thread Ken Gaillot
On Mon, 2018-03-12 at 22:36 +0300, Pavel Levshin wrote: > Hello. > > > I've just experienced a fault in my pacemaker-based cluster. > Seriously,  > I'm completely disoriented after this. Hopefully someone can give me > a  > hint... > > > A two-node cluster runs a few VirtualDomains along with

Re: [ClusterLabs] Antw: Re: [Cluster-devel] DLM connection channel switch take too long time (> 5mins)

2018-03-12 Thread Guoqing Jiang
On 03/08/2018 07:24 PM, Ulrich Windl wrote: Hi! What surprises me most is that a connect(...O_NONBLOCK) actually blocks: EINPROGRESS: The socket is non-blocking and the connection cannot be completed immediately. Maybe it is because the socket is
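The man-page text Ulrich quotes is the normal POSIX contract for connect() on a non-blocking socket: instead of blocking, the call fails with EINPROGRESS, and the connect(2) man page says completion can be detected by waiting for the socket to become writable and then reading SO_ERROR. A minimal userspace sketch in Python, assuming Linux/POSIX errno values; `nonblocking_connect` is a hypothetical helper, and since DLM runs in the kernel this only illustrates the syscall contract, not DLM's own code.

```python
import errno
import select
import socket


def nonblocking_connect(host, port, timeout_s):
    """Start a non-blocking TCP connect and wait up to timeout_s for it to finish.

    Returns 0 on success, otherwise an errno value (errno.ETIMEDOUT on timeout).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        rc = sock.connect_ex((host, port))
        if rc == 0:
            return 0  # connected immediately (possible on loopback)
        if rc != errno.EINPROGRESS:
            return rc  # immediate failure, e.g. ECONNREFUSED
        # connect() returned EINPROGRESS instead of blocking; the handshake
        # continues in the kernel, and writability signals that it finished.
        _, writable, _ = select.select([], [sock], [], timeout_s)
        if not writable:
            return errno.ETIMEDOUT
        # SO_ERROR holds the final result of the asynchronous connect.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    finally:
        sock.close()
```

The caller never blocks inside connect() itself; any waiting happens in select(), bounded by `timeout_s`.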