Hello Muhammad,

I think this problem is not in ocfs2; the cause looks like lost cluster quorum. In a two-node cluster (unlike a three-node cluster), quorum is lost by default as soon as one node goes offline. So you should configure the two-node quorum settings as described in the Pacemaker manual. Then DLM can work normally, and the ocfs2 resources can start.
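To illustrate the quorum arithmetic, here is a minimal sketch in Python. It is only a simplified model of a majority rule, not corosync's actual code: the function names `votes_needed` and `is_quorate` are hypothetical, and it assumes one vote per node. The `two_node` flag mirrors, as far as I know, the `two_node: 1` option in the `quorum` section of corosync.conf; please check the votequorum(5) man page for your version.

```python
def votes_needed(expected_votes: int, two_node: bool = False) -> int:
    """Votes required for quorum under a simple majority rule.

    Hypothetical helper for illustration; assumes one vote per node.
    With the two-node special case enabled, a single vote is enough.
    """
    if two_node and expected_votes == 2:
        return 1
    return expected_votes // 2 + 1


def is_quorate(online_votes: int, expected_votes: int, two_node: bool = False) -> bool:
    """Return True if the online nodes hold enough votes for quorum."""
    return online_votes >= votes_needed(expected_votes, two_node)


if __name__ == "__main__":
    cases = [
        ("2 nodes, 1 online, default", 1, 2, False),
        ("2 nodes, 1 online, two_node", 1, 2, True),
        ("3 nodes, 2 online, default", 2, 3, False),
    ]
    for label, online, expected, two_node in cases:
        print(label, "->", "quorate" if is_quorate(online, expected, two_node) else "no quorum")
```

With the default majority rule a two-node cluster needs both votes, so the surviving node is not quorate and DLM keeps waiting, which matches the "wait for quorum" messages in your log.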
Thanks
Gang

>>>
> Hi,
>
> This two node cluster starts resources when both nodes are online but
> does not start the ocfs2 resources when one node is offline. E.g. if I
> gracefully stop the cluster resources, then stop the pacemaker service
> on either node, and try to start the ocfs2 resource on the online node,
> it fails.
>
> logs:
>
> pipci001 pengine[17732]: notice: Start dlm:0#011(pipci001)
> pengine[17732]: notice: Start p-fssapmnt:0#011(pipci001)
> pengine[17732]: notice: Start p-fsusrsap:0#011(pipci001)
> pipci001 pengine[17732]: notice: Calculated transition 2, saving inputs in /var/lib/pacemaker/pengine/pe-input-339.bz2
> pipci001 crmd[17733]: notice: Processing graph 2 (ref=pe_calc-dc-1520613202-31) derived from /var/lib/pacemaker/pengine/pe-input-339.bz2
> crmd[17733]: notice: Initiating start operation dlm_start_0 locally on pipci001
> lrmd[17730]: notice: executing - rsc:dlm action:start call_id:69
> dlm_controld[19019]: 4575 dlm_controld 4.0.7 started
> lrmd[17730]: notice: finished - rsc:dlm action:start call_id:69 pid:18999 exit-code:0 exec-time:1082ms queue-time:1ms
> crmd[17733]: notice: Result of start operation for dlm on pipci001: 0 (ok)
> crmd[17733]: notice: Initiating monitor operation dlm_monitor_60000 locally on pipci001
> crmd[17733]: notice: Initiating start operation p-fssapmnt_start_0 locally on pipci001
> lrmd[17730]: notice: executing - rsc:p-fssapmnt action:start call_id:71
> Filesystem(p-fssapmnt)[19052]: INFO: Running start for /dev/mapper/sapmnt on /sapmnt
> kernel: [ 4576.529938] dlm: Using TCP for communications
> kernel: [ 4576.530233] dlm: BFA9FF042AA045F4822C2A6A06020EE9: joining the lockspace group.
> dlm_controld[19019]: 4629 fence work wait for quorum
> dlm_controld[19019]: 4634 BFA9FF042AA045F4822C2A6A06020EE9 wait for quorum
> lrmd[17730]: warning: p-fssapmnt_start_0 process (PID 19052) timed out
> kernel: [ 4636.418223] dlm: BFA9FF042AA045F4822C2A6A06020EE9: group event done -512 0
> kernel: [ 4636.418227] dlm: BFA9FF042AA045F4822C2A6A06020EE9: group join failed -512 0
> lrmd[17730]: warning: p-fssapmnt_start_0:19052 - timed out after 60000ms
> lrmd[17730]: notice: finished - rsc:p-fssapmnt action:start call_id:71 pid:19052 exit-code:1 exec-time:60002ms queue-time:0ms
> kernel: [ 4636.420628] ocfs2: Unmounting device (254,1) on (node 0)
> crmd[17733]: error: Result of start operation for p-fssapmnt on pipci001: Timed Out
> crmd[17733]: warning: Action 11 (p-fssapmnt_start_0) on pipci001 failed (target: 0 vs. rc: 1): Error
> crmd[17733]: notice: Transition aborted by operation p-fssapmnt_start_0 'modify' on pipci001: Event failed
> crmd[17733]: warning: Action 11 (p-fssapmnt_start_0) on pipci001 failed (target: 0 vs. rc: 1): Error
> crmd[17733]: notice: Transition 2 (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=6, Source=/var/lib/pacemaker/pengine/pe-input-339.bz2): Complete
> pengine[17732]: notice: Watchdog will be used via SBD if fencing is required
> pengine[17732]: notice: On loss of CCM Quorum: Ignore
> pengine[17732]: warning: Processing failed op start for p-fssapmnt:0 on pipci001: unknown error (1)
> pengine[17732]: warning: Processing failed op start for p-fssapmnt:0 on pipci001: unknown error (1)
> pengine[17732]: warning: Forcing base-clone away from pipci001 after 1000000 failures (max=2)
> pengine[17732]: warning: Forcing base-clone away from pipci001 after 1000000 failures (max=2)
> pengine[17732]: notice: Stop dlm:0#011(pipci001)
> pengine[17732]: notice: Stop p-fssapmnt:0#011(pipci001)
> pengine[17732]: notice: Calculated transition 3, saving inputs in /var/lib/pacemaker/pengine/pe-input-340.bz2
> pengine[17732]: notice: Watchdog will be used via SBD if fencing is required
> pengine[17732]: notice: On loss of CCM Quorum: Ignore
> pengine[17732]: warning: Processing failed op start for p-fssapmnt:0 on pipci001: unknown error (1)
> pengine[17732]: warning: Processing failed op start for p-fssapmnt:0 on pipci001: unknown error (1)
> pengine[17732]: warning: Forcing base-clone away from pipci001 after 1000000 failures (max=2)
> pipci001 pengine[17732]: warning: Forcing base-clone away from pipci001 after 1000000 failures (max=2)
> pengine[17732]: notice: Stop dlm:0#011(pipci001)
> pengine[17732]: notice: Stop p-fssapmnt:0#011(pipci001)
> pengine[17732]: notice: Calculated transition 4, saving inputs in /var/lib/pacemaker/pengine/pe-input-341.bz2
> crmd[17733]: notice: Processing graph 4 (ref=pe_calc-dc-1520613263-36) derived from /var/lib/pacemaker/pengine/pe-input-341.bz2
> crmd[17733]: notice: Initiating stop operation p-fssapmnt_stop_0 locally on pipci001
> lrmd[17730]: notice: executing - rsc:p-fssapmnt action:stop call_id:72
> Filesystem(p-fssapmnt)[19189]: INFO: Running stop for /dev/mapper/sapmnt on /sapmnt
> pipci001 lrmd[17730]: notice: finished - rsc:p-fssapmnt action:stop call_id:72 pid:19189 exit-code:0 exec-time:83ms queue-time:0ms
> pipci001 crmd[17733]: notice: Result of stop operation for p-fssapmnt on pipci001: 0 (ok)
> crmd[17733]: notice: Initiating stop operation dlm_stop_0 locally on pipci001
> pipci001 lrmd[17730]: notice: executing - rsc:dlm action:stop call_id:74
> pipci001 dlm_controld[19019]: 4636 shutdown ignored, active lockspaces
>
>
> resource configuration:
>
> primitive p-fssapmnt Filesystem \
>     params device="/dev/mapper/sapmnt" directory="/sapmnt" fstype=ocfs2 \
>     op monitor interval=20 timeout=40 \
>     op start timeout=60 interval=0 \
>     op stop timeout=60 interval=0
> primitive dlm ocf:pacemaker:controld \
>     op monitor interval=60 timeout=60 \
>     op start interval=0 timeout=90 \
>     op stop interval=0 timeout=100
> clone base-clone base-group \
>     meta interleave=true target-role=Started
>
> cluster properties:
> property cib-bootstrap-options: \
>     have-watchdog=true \
>     stonith-enabled=true \
>     stonith-timeout=80 \
>     startup-fencing=true \
>
>
> Software versions:
>
> kernel version: 4.4.114-94.11-default
> pacemaker-1.1.16-4.8.x86_64
> corosync-2.3.6-9.5.1.x86_64
> ocfs2-kmp-default-4.4.114-94.11.3.x86_64
> ocfs2-tools-1.8.5-1.35.x86_64
> dlm-kmp-default-4.4.114-94.11.3.x86_64
> libdlm3-4.0.7-1.28.x86_64
> libdlm-4.0.7-1.28.x86_64
>
>
> --
> Regards,
> Muhammad Sharfuddin
>
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org