Hi,

1 - If I take one node (node2) offline, the ocfs2 resources keep running on the online node (node1).

2 - While node2 was offline, I stopped and started the ocfs2 resource group via the cluster successfully many times in a row.

3 - While node2 was offline, I restarted the pacemaker service on node1 and then tried to start the ocfs2 resource group; dlm started, but the ocfs2 file system resource did not start.

In a nutshell:

a - Both nodes must be online to start the ocfs2 resource.

b - If one node crashes or goes offline gracefully, the ocfs2 resource keeps running on the other (surviving) node.

c - While one node is offline, we can stop/start the ocfs2 resource group on the surviving node, but if we stop the pacemaker service, the ocfs2 file system resource no longer starts, with the following in the logs:

lrmd[4317]:   notice: executing - rsc:p-fssapmnt action:start call_id:53
Filesystem(p-fssapmnt)[5139]: INFO: Running start for /dev/mapper/sapmnt on /sapmnt
kernel: [  706.162676] dlm: Using TCP for communications
kernel: [  706.162916] dlm: BFA9FF042AA045F4822C2A6A06020EE9: joining the lockspace group...
dlm_controld[5105]: 759 fence work wait for quorum
dlm_controld[5105]: 764 BFA9FF042AA045F4822C2A6A06020EE9 wait for quorum
lrmd[4317]:  warning: p-fssapmnt_start_0 process (PID 5139) timed out
lrmd[4317]:  warning: p-fssapmnt_start_0:5139 - timed out after 60000ms
lrmd[4317]:   notice: finished - rsc:p-fssapmnt action:start call_id:53 pid:5139 exit-code:1 exec-time:60002ms queue-time:0ms
kernel: [  766.056514] dlm: BFA9FF042AA045F4822C2A6A06020EE9: group event done -512 0
kernel: [  766.056528] dlm: BFA9FF042AA045F4822C2A6A06020EE9: group join failed -512 0
crmd[4320]:   notice: Result of stop operation for p-fssapmnt on pipci001: 0 (ok)
crmd[4320]:   notice: Initiating stop operation dlm_stop_0 locally on pipci001
lrmd[4317]:   notice: executing - rsc:dlm action:stop call_id:56
dlm_controld[5105]: 766 shutdown ignored, active lockspaces
lrmd[4317]:  warning: dlm_stop_0 process (PID 5326) timed out
lrmd[4317]:  warning: dlm_stop_0:5326 - timed out after 100000ms
lrmd[4317]:   notice: finished - rsc:dlm action:stop call_id:56 pid:5326 exit-code:1 exec-time:100003ms queue-time:0ms
crmd[4320]:    error: Result of stop operation for dlm on pipci001: Timed Out
crmd[4320]:  warning: Action 15 (dlm_stop_0) on pipci001 failed (target: 0 vs. rc: 1): Error
crmd[4320]:   notice: Transition aborted by operation dlm_stop_0 'modify' on pipci001: Event failed
crmd[4320]:  warning: Action 15 (dlm_stop_0) on pipci001 failed (target: 0 vs. rc: 1): Error
pengine[4319]:   notice: Watchdog will be used via SBD if fencing is required
pengine[4319]:   notice: On loss of CCM Quorum: Ignore
pengine[4319]:  warning: Processing failed op stop for dlm:0 on pipci001: unknown error (1)
pengine[4319]:  warning: Processing failed op stop for dlm:0 on pipci001: unknown error (1)
pengine[4319]:  warning: Cluster node pipci001 will be fenced: dlm:0 failed there
pengine[4319]:  warning: Processing failed op start for p-fssapmnt:0 on pipci001: unknown error (1)
pengine[4319]:   notice: Stop of failed resource dlm:0 is implicit after pipci001 is fenced
pengine[4319]:   notice:  * Fence pipci001
pengine[4319]:   notice: Stop    sbd-stonith#011(pipci001)
pengine[4319]:   notice: Stop    dlm:0#011(pipci001)
crmd[4320]:   notice: Requesting fencing (reboot) of node pipci001
stonith-ng[4316]:   notice: Client crmd.4320.4c2f757b wants to fence (reboot) 'pipci001' with device '(any)'
stonith-ng[4316]:   notice: Requesting peer fencing (reboot) of pipci001
stonith-ng[4316]:   notice: sbd-stonith can fence (reboot) pipci001: dynamic-list
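
A small sketch for spotting this pattern in a saved log: the helper below counts the dlm_controld "wait for quorum" messages of the kind shown in the log above. The function name count_quorum_waits is hypothetical and not part of any cluster tool.

```shell
# Hypothetical helper: count dlm_controld "wait for quorum" messages
# in the log file given as the first argument.
count_quorum_waits() {
    grep -c 'dlm_controld\[[0-9]*\]: .* wait for quorum' "$1"
}
```

It could be called as "count_quorum_waits /var/log/messages"; the log path is an assumption and depends on the syslog setup. A non-zero count while a Filesystem start is timing out suggests that dlm is blocked on quorum rather than on ocfs2 itself.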


--
Regards,
Muhammad Sharfuddin | +923332144823 | nds.com.pk

On 3/13/2018 1:04 PM, Ulrich Windl wrote:
Hi!

I'd recommend this:
Cleanly boot your nodes, avoiding any manual operation with cluster resources. 
Keep the logs.
Then start your tests, keeping the logs for each.
Try to fix issues by reading the logs and adjusting the cluster configuration, 
and not by starting commands that the cluster should start.

We had a 2-node OCFS2 cluster running for quite some time with SLES11, but now the 
cluster has three nodes. To me, the output of "crm_mon -1Arfj" combined with 
having set record-pending=true was very valuable for finding problems.
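
For reference, a minimal sketch of one way to set this in crm shell syntax (inside "crm configure"). Setting it as an operation default is an assumption; the option can also be set on individual operations, and this is not taken from any configuration shown in this thread:

```
op_defaults record-pending=true
```

With that in place, "crm_mon -1Arfj" prints a one-shot status (-1) including node attributes (-A), inactive resources (-r), fail counts (-f) and pending operations (-j).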

Regards,
Ulrich


Muhammad Sharfuddin <m.sharfud...@nds.com.pk> wrote on 13.03.2018 at 08:43 in
message <7b773ae9-4209-d246-b5c0-2c8b67e62...@nds.com.pk>:
Dear Klaus,

If I understand you properly, then it's a fencing issue, and whatever I
am facing is "natural" or "by design" in a two-node cluster where quorum
is incomplete.

I am quite convinced that you have pointed this out correctly because, when I
start the dlm resource via the cluster and then try to start the ocfs2
file system manually from the command line, the mount command hangs and
the following events are reported in the logs:

      kernel: [62622.864828] ocfs2: Registered cluster interface user
      kernel: [62622.884427] dlm: Using TCP for communications
      kernel: [62622.884750] dlm: BFA9FF042AA045F4822C2A6A06020EE9: joining the lockspace group...
      dlm_controld[17655]: 62627 fence work wait for quorum
      dlm_controld[17655]: 62680 BFA9FF042AA045F4822C2A6A06020EE9 wait for quorum

and then the following messages keep being reported every 5-10 minutes, until I
kill the mount.ocfs2 process:

      dlm_controld[17655]: 62627 fence work wait for quorum
      dlm_controld[17655]: 62680 BFA9FF042AA045F4822C2A6A06020EE9 wait for quorum

I am also quite confused, because yesterday I did the same and was
able to mount the ocfs2 file system manually from the command line (at least
once), then unmount it manually and stop the dlm resource
from the cluster; after that, the complete ocfs2 resource stack (dlm, file systems)
started and stopped successfully via the cluster even when only one machine was online.

In a two-node cluster that has ocfs2 resources, can't we run the
ocfs2 resources when quorum is incomplete (one node is offline)?

--
Regards,
Muhammad Sharfuddin

On 3/12/2018 5:58 PM, Klaus Wenninger wrote:
On 03/12/2018 01:44 PM, Muhammad Sharfuddin wrote:
Hi Klaus,

primitive sbd-stonith stonith:external/sbd \
          op monitor interval=3000 timeout=20 \
          op start interval=0 timeout=240 \
          op stop interval=0 timeout=100 \
          params sbd_device="/dev/mapper/sbd" \
          meta target-role=Started
Makes more sense now.
Using pcmk_delay_max would probably be useful here
to prevent a fence-race.
That stonith-resource was not in your resource-list below ...
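
A sketch of how that could look on the stonith primitive quoted above. pcmk_delay_max is a standard Pacemaker fencing parameter; the value of 30 seconds is only an assumed example and would need tuning for the actual cluster:

```
primitive sbd-stonith stonith:external/sbd \
          params sbd_device="/dev/mapper/sbd" pcmk_delay_max=30 \
          op monitor interval=3000 timeout=20 \
          op start interval=0 timeout=240 \
          op stop interval=0 timeout=100 \
          meta target-role=Started
```

The random delay (up to the given maximum) makes it less likely that both nodes fence each other at the same moment.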

property cib-bootstrap-options: \
          have-watchdog=true \
          stonith-enabled=true \
          no-quorum-policy=ignore \
          stonith-timeout=90 \
          startup-fencing=true
You've set no-quorum-policy=ignore for pacemaker.
Whether this is a good idea or not in your setup is
a different matter.
But isn't dlm interfacing directly with corosync, so
that it would get the quorum state from there?
As you probably have two_node set on a 2-node cluster,
this would (after both nodes were down) wait for all
nodes to be up first.
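
For illustration, the corosync side of this lives in the quorum section of corosync.conf. The fragment below is a sketch under the assumption that votequorum is in use; it is not taken from the cluster discussed here. With two_node: 1, wait_for_all is enabled implicitly, which matches the behaviour described: after a full cluster restart, quorum is only granted once both nodes have been seen.

```
quorum {
        provider: corosync_votequorum
        two_node: 1
        # Implicitly 1 when two_node is set. Setting it to 0 lets a single
        # node become quorate after a cold start; only consider this with
        # reliable fencing, since it removes a split-brain safeguard.
        wait_for_all: 0
}
```

The effective flags can be checked with "corosync-quorumtool -s".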

Regards,
Klaus

# ps -eaf |grep sbd
root      6129     1  0 17:35 ?        00:00:00 sbd: inquisitor
root      6133  6129  0 17:35 ?        00:00:00 sbd: watcher: /dev/mapper/sbd - slot: 1 - uuid: 6e80a337-95db-4608-bd62-d59517f39103
root      6134  6129  0 17:35 ?        00:00:00 sbd: watcher: Pacemaker
root      6135  6129  0 17:35 ?        00:00:00 sbd: watcher: Cluster

This cluster does not start the ocfs2 resources when I first intentionally
crash (reboot) both nodes and then try to start the ocfs2 resources while
one node is offline.

The one reliable fix I have is to bring the other (offline) node
online; things then get fixed automatically, i.e. the ocfs2
resources mount.

--
Regards,
Muhammad Sharfuddin

On 3/12/2018 5:25 PM, Klaus Wenninger wrote:
Hi Muhammad!

Could you elaborate a little more on your fencing setup?
I read that you are using SBD, but I don't see any sbd fencing resource.
In case you wanted to use watchdog fencing with SBD, this
would require the stonith-watchdog-timeout property to be set.
But watchdog fencing relies on quorum (without 2-node trickery)
and thus wouldn't work on a 2-node cluster anyway.

Didn't read through the whole thread - so I might be missing
something ...

Regards,
Klaus

On 03/12/2018 12:51 PM, Muhammad Sharfuddin wrote:
Hello Gang,

As reported previously, the cluster was fixed to start the ocfs2
resources by

a) crm resource start dlm

b) mounting/unmounting the ocfs2 file system manually (this step was the
fix)

and then starting the clone group (which includes dlm and the ocfs2 file
systems) worked fine:

c) crm resource start base-clone

Now I crashed the nodes intentionally and then kept only one node
online; again the cluster stopped starting the ocfs2 resources. I again
tried to follow your instructions, i.e.

i) crm resource start dlm

then tried to mount the ocfs2 file system manually, which hung this
time (previously, mounting manually had helped):

# cat /proc/3966/stack
[<ffffffffa039f18e>] do_uevent+0x7e/0x200 [dlm]
[<ffffffffa039fe0a>] new_lockspace+0x80a/0xa70 [dlm]
[<ffffffffa03a02d9>] dlm_new_lockspace+0x69/0x160 [dlm]
[<ffffffffa038e758>] user_cluster_connect+0xc8/0x350 [ocfs2_stack_user]
[<ffffffffa03c2872>] ocfs2_cluster_connect+0x192/0x240 [ocfs2_stackglue]
[<ffffffffa045eefc>] ocfs2_dlm_init+0x31c/0x570 [ocfs2]
[<ffffffffa04a9983>] ocfs2_fill_super+0xb33/0x1200 [ocfs2]
[<ffffffff8120e130>] mount_bdev+0x1a0/0x1e0
[<ffffffff8120ea1a>] mount_fs+0x3a/0x170
[<ffffffff81228bf2>] vfs_kern_mount+0x62/0x110
[<ffffffff8122b123>] do_mount+0x213/0xcd0
[<ffffffff8122bed5>] SyS_mount+0x85/0xd0
[<ffffffff81614b0a>] entry_SYSCALL_64_fastpath+0x1e/0xb6
[<ffffffffffffffff>] 0xffffffffffffffff

I killed the mount.ocfs2 process, stopped dlm (crm resource stop dlm),
and then tried to start dlm again (crm resource start dlm). Previously
dlm always started successfully, but this time it did not start, so
I checked the stack of the dlm_controld process:

cat /proc/3754/stack
[<ffffffff8121dc55>] poll_schedule_timeout+0x45/0x60
[<ffffffff8121f0bc>] do_sys_poll+0x38c/0x4f0
[<ffffffff8121f2dd>] SyS_poll+0x5d/0xe0
[<ffffffff81614b0a>] entry_SYSCALL_64_fastpath+0x1e/0xb6
[<ffffffffffffffff>] 0xffffffffffffffff

In a nutshell:

1 - This cluster is configured to run when a single node is online.

2 - This cluster does not start the ocfs2 resources after a crash when
only one node is online.

--
Regards,
Muhammad Sharfuddin | +923332144823 | nds.com.pk

On 3/12/2018 12:41 PM, Gang He wrote:
Hello Gang,

to follow your instructions, I started the dlm resource via:

         crm resource start dlm

then mounted/unmounted the ocfs2 file system manually (which seems to be
the fix for the situation).

Now the resources start properly on a single node. I am happy that the
issue is fixed, but at the same time I am lost, because I have no idea
how things got fixed here (merely by mounting/unmounting the ocfs2 file
systems).
From your description,
I just wonder whether the DLM resource is not working normally in that
situation.
Yan/Bin, do you have any comments about two-node clusters? Which
configuration settings affect corosync quorum/DLM?


Thanks
Gang


--
Regards,
Muhammad Sharfuddin

On 3/12/2018 10:59 AM, Gang He wrote:
Hello Muhammad,

Usually, an ocfs2 resource startup failure is caused by the mount command
timing out (or hanging).
A simple debugging method is:
remove the ocfs2 resource from crm first,
then mount the file system manually and see whether the mount command
times out or hangs.
If the command hangs, check where the mount.ocfs2 process is stuck
via the "cat /proc/xxx/stack" command.
If the back trace stops in the DLM kernel module, the root cause is
usually a cluster configuration problem.
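
The last check can be sketched as a small shell helper. The function name stack_in_dlm is hypothetical; it only looks at the top frame of a stack dump, such as the output of "cat /proc/<pid>/stack" (reading that file normally requires root), and reports whether that frame belongs to the dlm kernel module.

```shell
# Hypothetical helper: given a file containing a kernel stack dump,
# report whether the top frame (first line) is in the dlm module.
stack_in_dlm() {
    if head -n 1 "$1" | grep -q '\[dlm\]'; then
        echo "blocked in dlm"
    else
        echo "not in dlm"
    fi
}
```

For a hung mount, this could be used as "stack_in_dlm /proc/$(pgrep -f mount.ocfs2 | head -n 1)/stack"; how the PID is found here is an assumption and may need adjusting.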
Thanks
Gang


On 3/12/2018 7:32 AM, Gang He wrote:
Hello Muhammad,

I think this problem is not in ocfs2; the cause looks like missing
cluster quorum.
For a two-node cluster (unlike a three-node cluster), if one node is
offline, quorum is lost by default.
So you should configure the two-node related quorum settings according
to the pacemaker manual.
Then DLM can work normally and the ocfs2 resource can start up.
Yes, it's configured accordingly; no-quorum-policy is set to "ignore".

property cib-bootstrap-options: \
               have-watchdog=true \
               stonith-enabled=true \
               stonith-timeout=80 \
               startup-fencing=true \
               no-quorum-policy=ignore

Thanks
Gang


Hi,

This two-node cluster starts the resources when both nodes are online, but
does not start the ocfs2 resources when one node is offline. E.g. if I
gracefully stop the cluster resources, then stop the pacemaker service on
either node, and try to start the ocfs2 resource on the online node, it
fails.

logs:

pipci001 pengine[17732]:   notice: Start   dlm:0#011(pipci001)
pengine[17732]:   notice: Start   p-fssapmnt:0#011(pipci001)
pengine[17732]:   notice: Start   p-fsusrsap:0#011(pipci001)
pipci001 pengine[17732]:   notice: Calculated transition 2, saving inputs in /var/lib/pacemaker/pengine/pe-input-339.bz2
pipci001 crmd[17733]:   notice: Processing graph 2 (ref=pe_calc-dc-1520613202-31) derived from /var/lib/pacemaker/pengine/pe-input-339.bz2
crmd[17733]:   notice: Initiating start operation dlm_start_0 locally on pipci001
lrmd[17730]:   notice: executing - rsc:dlm action:start call_id:69
dlm_controld[19019]: 4575 dlm_controld 4.0.7 started
lrmd[17730]:   notice: finished - rsc:dlm action:start call_id:69 pid:18999 exit-code:0 exec-time:1082ms queue-time:1ms
crmd[17733]:   notice: Result of start operation for dlm on pipci001: 0 (ok)
crmd[17733]:   notice: Initiating monitor operation dlm_monitor_60000 locally on pipci001
crmd[17733]:   notice: Initiating start operation p-fssapmnt_start_0 locally on pipci001
lrmd[17730]:   notice: executing - rsc:p-fssapmnt action:start call_id:71
Filesystem(p-fssapmnt)[19052]: INFO: Running start for /dev/mapper/sapmnt on /sapmnt
kernel: [ 4576.529938] dlm: Using TCP for communications
kernel: [ 4576.530233] dlm: BFA9FF042AA045F4822C2A6A06020EE9: joining the lockspace group.
dlm_controld[19019]: 4629 fence work wait for quorum
dlm_controld[19019]: 4634 BFA9FF042AA045F4822C2A6A06020EE9 wait for quorum
lrmd[17730]:  warning: p-fssapmnt_start_0 process (PID 19052) timed out
kernel: [ 4636.418223] dlm: BFA9FF042AA045F4822C2A6A06020EE9: group event done -512 0
kernel: [ 4636.418227] dlm: BFA9FF042AA045F4822C2A6A06020EE9: group join failed -512 0
lrmd[17730]:  warning: p-fssapmnt_start_0:19052 - timed out after 60000ms
lrmd[17730]:   notice: finished - rsc:p-fssapmnt action:start call_id:71 pid:19052 exit-code:1 exec-time:60002ms queue-time:0ms
kernel: [ 4636.420628] ocfs2: Unmounting device (254,1) on (node 0)
crmd[17733]:    error: Result of start operation for p-fssapmnt on pipci001: Timed Out
crmd[17733]:  warning: Action 11 (p-fssapmnt_start_0) on pipci001 failed (target: 0 vs. rc: 1): Error
crmd[17733]:   notice: Transition aborted by operation p-fssapmnt_start_0 'modify' on pipci001: Event failed
crmd[17733]:  warning: Action 11 (p-fssapmnt_start_0) on pipci001 failed (target: 0 vs. rc: 1): Error
crmd[17733]:   notice: Transition 2 (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=6, Source=/var/lib/pacemaker/pengine/pe-input-339.bz2): Complete
pengine[17732]:   notice: Watchdog will be used via SBD if fencing is required
pengine[17732]:   notice: On loss of CCM Quorum: Ignore
pengine[17732]:  warning: Processing failed op start for p-fssapmnt:0 on pipci001: unknown error (1)
pengine[17732]:  warning: Processing failed op start for p-fssapmnt:0 on pipci001: unknown error (1)
pengine[17732]:  warning: Forcing base-clone away from pipci001 after 1000000 failures (max=2)
pengine[17732]:  warning: Forcing base-clone away from pipci001 after 1000000 failures (max=2)
pengine[17732]:   notice: Stop    dlm:0#011(pipci001)
pengine[17732]:   notice: Stop    p-fssapmnt:0#011(pipci001)
pengine[17732]:   notice: Calculated transition 3, saving inputs in /var/lib/pacemaker/pengine/pe-input-340.bz2
pengine[17732]:   notice: Watchdog will be used via SBD if fencing is required
pengine[17732]:   notice: On loss of CCM Quorum: Ignore
pengine[17732]:  warning: Processing failed op start for p-fssapmnt:0 on pipci001: unknown error (1)
pengine[17732]:  warning: Processing failed op start for p-fssapmnt:0 on pipci001: unknown error (1)
pengine[17732]:  warning: Forcing base-clone away from pipci001 after 1000000 failures (max=2)
pipci001 pengine[17732]:  warning: Forcing base-clone away from pipci001 after 1000000 failures (max=2)
pengine[17732]:   notice: Stop    dlm:0#011(pipci001)
pengine[17732]:   notice: Stop    p-fssapmnt:0#011(pipci001)
pengine[17732]:   notice: Calculated transition 4, saving inputs in /var/lib/pacemaker/pengine/pe-input-341.bz2
crmd[17733]:   notice: Processing graph 4 (ref=pe_calc-dc-1520613263-36) derived from /var/lib/pacemaker/pengine/pe-input-341.bz2
crmd[17733]:   notice: Initiating stop operation p-fssapmnt_stop_0 locally on pipci001
lrmd[17730]:   notice: executing - rsc:p-fssapmnt action:stop call_id:72
Filesystem(p-fssapmnt)[19189]: INFO: Running stop for /dev/mapper/sapmnt on /sapmnt
pipci001 lrmd[17730]:   notice: finished - rsc:p-fssapmnt action:stop call_id:72 pid:19189 exit-code:0 exec-time:83ms queue-time:0ms
pipci001 crmd[17733]:   notice: Result of stop operation for p-fssapmnt on pipci001: 0 (ok)
crmd[17733]:   notice: Initiating stop operation dlm_stop_0 locally on pipci001
pipci001 lrmd[17730]:   notice: executing - rsc:dlm action:stop call_id:74
pipci001 dlm_controld[19019]: 4636 shutdown ignored, active lockspaces


resource configuration:

primitive p-fssapmnt Filesystem \
               params device="/dev/mapper/sapmnt" directory="/sapmnt" fstype=ocfs2 \
               op monitor interval=20 timeout=40 \
               op start timeout=60 interval=0 \
               op stop timeout=60 interval=0
primitive dlm ocf:pacemaker:controld \
               op monitor interval=60 timeout=60 \
               op start interval=0 timeout=90 \
               op stop interval=0 timeout=100
clone base-clone base-group \
               meta interleave=true target-role=Started

cluster properties:
property cib-bootstrap-options: \
               have-watchdog=true \
               stonith-enabled=true \
               stonith-timeout=80 \
               startup-fencing=true \
               no-quorum-policy=ignore


Software versions:

kernel version: 4.4.114-94.11-default
pacemaker-1.1.16-4.8.x86_64
corosync-2.3.6-9.5.1.x86_64
ocfs2-kmp-default-4.4.114-94.11.3.x86_64
ocfs2-tools-1.8.5-1.35.x86_64
dlm-kmp-default-4.4.114-94.11.3.x86_64
libdlm3-4.0.7-1.28.x86_64
libdlm-4.0.7-1.28.x86_64


--
Regards,
Muhammad Sharfuddin


---
This email has been checked for viruses by Avast antivirus
software.
https://www.avast.com/antivirus

_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started:
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org