>>> Gang He wrote on 22.01.2021 at 09:44 in message:
>
> On 2021/1/22 16:17, Ulrich Windl wrote:
>> Gang He wrote on 22.01.2021 at 09:13 in message
>> <1fd1c07d-d12c-fea9-4b17-90a977fe7...@suse.com>:
>>> Hi Ulrich,
>>>
>>> I reviewed the crm configuration file, there are some comments as below,
>>> 1) The lvmlockd resource is used for shared VGs; if you do not plan to add
>>> any shared VG in your cluster, I suggest dropping this resource and its clone.
>>> 2) Second, the lvmlockd service depends on the DLM service; it creates
>>> "lvm_xxx" related lock spaces when any shared VG is created/activated.
>>> But some other resources also depend on DLM to create lock spaces to
>>> avoid race conditions, e.g. clustered MD, ocfs2, etc. Then the file
>>> system resource should start later than the lvm2 (lvmlockd) related
>>> resources. That means this order should be wrong:
>>> order ord_lockspace_fs__lvmlockd Mandatory: cln_lockspace_ocfs2 cln_lvmlockd
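>>> A minimal sketch of the fix, assuming the resource names above: if the
>>> clone is really unused, stop and remove it ("crm resource stop
>>> cln_lvmlockd", then "crm configure delete cln_lvmlockd"); otherwise
>>> reverse the constraint so that lvmlockd starts first:
>>> order ord_lvmlockd__lockspace_fs Mandatory: cln_lvmlockd cln_lockspace_ocfs2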
>>
>> But cln_lockspace_ocfs2 provides the shared filesystem that lvmlockd uses. I
>> thought for locking in a cluster it needs a cluster-wide filesystem.
>
> The ocfs2 file system resource only depends on the DLM resource if you use a
> shared raw disk (e.g. /dev/vdb3):
> primitive dlm ocf:pacemaker:controld \
> op start interval=0 timeout=90 \
> op stop interval=0 timeout=100 \
> op monitor interval=20 timeout=600
> primitive ocfs2-2 Filesystem \
> params device="/dev/vdb3" directory="/mnt/shared" fstype=ocfs2 \
> op monitor interval=20 timeout=40
> group base-group dlm ocfs2-2
> clone base-clone base-group
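> (The group/clone pair is shorthand for explicit constraints; as a rough
> sketch, it implies the equivalent of
> order ord_dlm__ocfs2 Mandatory: dlm ocfs2-2
> colocation col_ocfs2__dlm inf: ocfs2-2 dlm
> applied on every node by the clone.)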
>
> If you use an ocfs2 file system on top of a shared VG (e.g. /dev/vg1/lv1), you
> need to add lvmlockd/LVM-activate resources before the ocfs2 file system, e.g.:
> primitive dlm ocf:pacemaker:controld \
> op monitor interval=60 timeout=60
> primitive lvmlockd lvmlockd \
> op start timeout=90 interval=0 \
> op stop timeout=100 interval=0 \
> op monitor interval=30 timeout=90
> primitive ocfs2-2 Filesystem \
> params device="/dev/vg1/lv1" directory="/mnt/shared" fstype=ocfs2 \
> op monitor interval=20 timeout=40
> primitive vg1 LVM-activate \
> params vgname=vg1 vg_access_mode=lvmlockd activation_mode=shared \
> op start timeout=90s interval=0 \
> op stop timeout=90s interval=0 \
> op monitor interval=30s timeout=90s
> group base-group dlm lvmlockd vg1 ocfs2-2
> clone base-clone base-group
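> For completeness, creating the shared VG itself would look roughly like this
> (a sketch only; the device, size and cluster name are examples, and
> dlm/lvmlockd must already be running on the node):
> vgcreate --shared vg1 /dev/vdb3
> lvcreate -an -n lv1 -L 100G vg1
> lvchange -asy vg1/lv1
> mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster /dev/vg1/lv1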
Hi!
I don't see the problem:
As said before, the OCFS2 used for the lockspace does not use LVM itself; it sits
on a clustered MD (prm_lockspace_ocfs2 Filesystem, cln_lockspace_ocfs2).
That is co-located with DLM and the RAID (cln_lockspace_raid_md10). (The same
holds for cln_lvmlockd.)
Ordering is somewhat redundant as clustered RAID needs DLM, and OCFS needs DLM
and the RAID.
lvmlockd (prm_lvmlockd, cln_lvmlockd) is co-located with DLM (hmm... does that
mean it uses DLM and maybe does NOT need a shared filesystem?) and with
cln_lockspace_ocfs2.
Accordingly the ordering is that lvmlockd starts after DLM (cln_DLM) and after
OCFS2 (cln_lockspace_ocfs2).
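In crm syntax the constraints just described would look roughly like this
(reconstructed from the description above, not copied from the configuration):
colocation col_lvmlockd__DLM inf: cln_lvmlockd cln_DLM
colocation col_lvmlockd__lockspace_ocfs2 inf: cln_lvmlockd cln_lockspace_ocfs2
order ord_DLM__lvmlockd Mandatory: cln_DLM cln_lvmlockd
order ord_lockspace_ocfs2__lvmlockd Mandatory: cln_lockspace_ocfs2 cln_lvmlockd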
To summarize the related resources:
Node List:
  * Online: [ h16 h18 h19 ]
Full List of Resources:
  * Clone Set: cln_DLM [prm_DLM]:
    * Started: [ h16 h18 h19 ]
  * Clone Set: cln_lvmlockd [prm_lvmlockd]:
    * Started: [ h16 h18 h19 ]
  * Clone Set: cln_lockspace_raid_md10 [prm_lockspace_raid_md10]:
    * Started: [ h16 h18 h19 ]
  * Clone Set: cln_lockspace_ocfs2 [prm_lockspace_ocfs2]:
    * Started: [ h16 h18 h19 ]
Regards,
Ulrich
>
> Thanks
> Gang
>
>
>>
>>>
>>>
>>> Thanks
>>> Gang
>>>
>>> On 2021/1/21 20:08, Ulrich Windl wrote:
>>> Gang He wrote on 21.01.2021 at 11:30 in message
>>> <59b543ee-0824-6b91-d0af-48f66922b...@suse.com>:
> Hi Ulrich,
>
> Is the problem reproducible stably? Could you help to share your
> pacemaker crm configuration and OS/lvm2/resource-agents related version
> information?
OK, the problem occurred on every node, so I guess it's reproducible.
OS is SLES15 SP2 with all current updates (lvm2-2.03.05-8.18.1.x86_64,
pacemaker-2.0.4+20200616.2deceaa3a-3.3.1.x86_64,
resource-agents-4.4.0+git57.70549516-3.12.1.x86_64).
The configuration (somewhat trimmed) is attached.
The only VG the cluster node sees is:
ph16:~ # vgs
  VG  #PV #LV #SN Attr   VSize   VFree
  sys   1   3   0 wz--n- 222.50g    0
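(A shared VG would additionally show its lock type; as a sketch, assuming
current lvm2 reporting fields, "vgs -o+locktype,lockargs" makes that visible.)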
Regards,
Ulrich
> I feel the problem was probably caused by the lvmlockd resource agent
> script, which did not handle this corner case correctly.
>
> Thanks
> Gang
>
>
> On 2021/1/21 17:53, Ulrich Windl wrote:
>> Hi!
>>
>> I have a problem: For tests I had configured lvmlockd. Now that the tests
>> have ended, no LVM is used for cluster resources any more, but lvmlockd is
>> still configured.
>> Unfortunately I ran into this problem:
>> One OCFS2 mount was unmounted successfully, another holding the lockspace
>> for
>