Re: [ClusterLabs] Some unexpected DLM messages; OCFS2 related? "send_repeat_remove dir" / "send_repeat_remove dir"

2021-10-08 Thread Gang He via Users

Hello Ulrich,

See my comments inline.


On 2021/10/8 16:38, Ulrich Windl wrote:

Hi!

I just noticed these two messages on two nodes of a 3-node cluster:
Oct 08 10:00:14 h18 kernel: dlm: 790F9C237C2A45758135FE4945B7A744: 
send_repeat_remove dir 119 O09d835
Oct 08 10:00:14 h19 kernel: dlm: 790F9C237C2A45758135FE4945B7A744: 
receive_remove from 118 not found O09d835
These two messages were printed by the fs/dlm kernel module while it was
handling DLM lock resources.
The "dir" in "send_repeat_remove dir" does not refer to an OCFS2 directory; it
is an internal detail of the DLM implementation (its resource directory).
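If you want to look at the lockspace from user space, a rough sketch (the
lockspace name is the hex string from the log lines above; the debugfs path is
an assumption and needs debugfs mounted):

   dlm_tool ls                   # list DLM lockspaces and their state
   dlm_tool dump                 # dlm_controld debug buffer, shows lockspace activity
   ls /sys/kernel/debug/dlm/     # per-lockspace lock dumps from the kernel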


So far, have you encountered any OCFS2 file-system-level problems, e.g. a file
system hang?



Thanks
Gang



Due to "genuine configuration"TM node id 118 corresponds to node h18, while 119 
corresponds to (you guessed it!) node h19.

journalctl colors these messages in red, so I guess they are somewhat 
unexpected.

My guess is that the messages are related to the OCFS2 reflink snapshots that are
created every hour (and may take 15 seconds).
The kernel is 5.3.18-24.83-default (SLES15 SP2) on all nodes, and we actually 
had a lockup on OCFS2 snapshots with some older kernels.
So I wonder whether those messages may still be an indication of some problem.

My snapshots do not create any directories ("dir"), BTW. But the nodes 
create/rename/remove different files in the same directory.

Actually I have snapshots with these date stamps:
Change: 2021-10-08 10:00:10.275375897 +0200
Change: 2021-10-08 10:00:15.371632277 +0200
Change: 2021-10-08 10:00:15.371632277 +0200
Change: 2021-10-08 10:00:15.371632277 +0200
Change: 2021-10-08 10:00:15.371632277 +0200
Change: 2021-10-08 10:00:15.371632277 +0200
Change: 2021-10-08 10:00:15.443455304 +0200
Change: 2021-10-08 10:00:15.938216584 +0200
Change: 2021-10-08 10:00:15.974216964 +0200
Change: 2021-10-08 10:00:16.183466675 +0200
Change: 2021-10-08 10:00:16.223467289 +0200
Change: 2021-10-08 10:00:16.251467719 +0200
Change: 2021-10-08 10:00:17.375484990 +0200
Change: 2021-10-08 10:00:18.187497465 +0200
Change: 2021-10-08 10:00:18.843647739 +0200

Regards,
Ulrich




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/





Re: [ClusterLabs] Problem with high load (IO)

2021-09-29 Thread Gang He via Users




On 2021/9/29 16:20, Lentes, Bernd wrote:



- On Sep 29, 2021, at 4:37 AM, Gang He g...@suse.com wrote:


Hi Lentes,

Thanks for your feedback.
I have some questions:
1) How do you clone these VM images from each OCFS2 node via reflink?
Did you encounter any problems during this step?
What I mean is: this is a shared file system, so you do not need to clone all VM
images from each node; that would be duplicated work.
2) After the cloned VM images are created, how do you copy them? You copy
them to another backup file system, right?
The problem usually happens during this step?

Thanks
Gang


1) No problems during this step, the procedure just needs a few seconds.
reflink is a binary. See reflink --help
Yes, it is a cluster filesystem. I do the procedure just on one node,
so I don't have duplicates.

2) just with "cp source destination" to a NAS.
Yes, the problems appear during this step.
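For reference, the procedure being discussed is roughly the following sketch
(paths are hypothetical; only the cp to the NAS does bulk I/O):

   reflink /ocfs2/vm/vm1.img /ocfs2/vm/vm1.img.snap    # instant, shares extents on OCFS2
   cp /ocfs2/vm/vm1.img.snap /mnt/nas-backup/vm1.img   # slow copy to the backup filesystem
   rm /ocfs2/vm/vm1.img.snap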

OK, when you cp the cloned file to the NAS directory, the NAS directory is
another file system, right?
And during the copying process, the running original VM is affected,
right?


Thanks
Gang



Bernd



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?

2021-07-13 Thread Gang He



On 2021/7/12 15:52, Ulrich Windl wrote:

Hi!

can you give some details on what is necessary to trigger the problem?
There is an ABBA deadlock between the reflink command and the ocfs2
ocfs2_complete_recovery routine (this routine is triggered by a timer, by mount,
or by node recovery); the deadlock is not always encountered.

For more details, refer to the link below:
https://oss.oracle.com/pipermail/ocfs2-devel/2021-July/015671.html

Thanks
Gang


(I/O load, CPU load, concurrent operations on one node or on multiple nodes,
using reflink snapshots, using ioctl(FS_IOC_FIEMAP), etc.)

Regards,
Ulrich


Gang He  wrote on 11.07.2021 at 10:55 in message




Hi Ulrich,

Thanks for your update.
Based on some feedback from upstream, there is a patch (ocfs2:
initialize ip_next_orphan) which should fix this problem.
I can confirm the patch looks very relevant to your problem.
I will verify it next week, then let you know the result.

Thanks
Gang


From: Users  on behalf of Ulrich Windl

Sent: Friday, July 9, 2021 15:56
To: users@clusterlabs.org
Subject: [ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any
one else?

Hi!

An update on the issue:
SUSE support found out that the reason for the hanging processes is a
deadlock caused by a race condition (Kernel 5.3.18-24.64-default). Support is
working on a fix.
Today the cluster "fixed" the problem in an unusual way:

h19 kernel: Out of memory: Killed process 6838 (corosync) total-vm:261212kB,
anon-rss:31444kB, file-rss:7700kB, shmem-rss:121872kB

I doubt that was the best possible choice ;‑)

The dead corosync caused the DC (h18) to fence h19 (which was successful),
but the DC was fenced while it tried to recover resources, so the complete
cluster rebooted.

Regards,
Ulrich




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/




Re: [ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?

2021-07-11 Thread Gang He
Hi Ulrich,

Thanks for your update.
Based on some feedback from upstream, there is a patch (ocfs2: initialize
ip_next_orphan) which should fix this problem.
I can confirm the patch looks very relevant to your problem.
I will verify it next week, then let you know the result.

Thanks
Gang


From: Users  on behalf of Ulrich Windl 

Sent: Friday, July 9, 2021 15:56
To: users@clusterlabs.org
Subject: [ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one 
else?

Hi!

An update on the issue:
SUSE support found out that the reason for the hanging processes is a deadlock 
caused by a race condition (Kernel 5.3.18-24.64-default). Support is working on 
a fix.
Today the cluster "fixed" the problem in an unusual way:

h19 kernel: Out of memory: Killed process 6838 (corosync) total-vm:261212kB, 
anon-rss:31444kB, file-rss:7700kB, shmem-rss:121872kB

I doubt that was the best possible choice ;-)

The dead corosync caused the DC (h18) to fence h19 (which was successful), but 
the DC was fenced while it tried to recover resources, so the complete cluster 
rebooted.

Regards,
Ulrich




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/




Re: [ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?

2021-06-16 Thread Gang He

Hi Ulrich,

On 2021/6/15 17:01, Ulrich Windl wrote:

Hi Guys!

Just to keep you informed on the issue:
I was informed that I'm not the only one seeing this problem, and there seems
to be some "negative interference" between BtrFS reorganizing its extents
periodically and OCFS2 making reflink snapshots (a local cron job here) in
current SUSE SLES kernels. It seems that happens almost exactly at 0:00
o'clock.
We encountered the same hang in a local environment; the problem looks
like it is caused by a btrfs-balance job run, but I need to crash the
kernel for further analysis.
Hi Ulrich, do you know how to reproduce this hang reliably? e.g. run the
reflink snapshot script and trigger the btrfs-balance job at the same time?



Thanks
Gang



The only thing that BtrFS and OCFS2 have in common here is that BtrFS provides
the mount point for OCFS2.

Regards,
Ulrich


Ulrich Windl wrote on 02.06.2021 at 11:00 in message <60B748A4.E0C : 161 : 60728>:

Gang He  wrote on 02.06.2021 at 08:34 in message




om>


Hi Ulrich,

The hang problem looks like it matches an existing fix
(90bd070aae6c4fb5d302f9c4b9c88be60c8197ec
"ocfs2: fix deadlock between setattr and dio_end_io_write"), but that is not 100%
certain.
If possible, could you report a bug to SUSE, so that we can work on it
further.


Hi!

Actually a service request for the issue is open at SUSE. However I don't
know which L3 engineer is working on it.
I have some "funny" effects, like these:
On one node "ls" hangs, but can be interrupted with ^C; on another node "ls"



also hangs, but cannot be stopped with ^C or ^Z
(Most processes cannot even be killed with "kill -9")
"ls" on the directory also hangs, just as an "rm" for a non-existent file

What I really wonder is what triggered the effect, and more importantly how
to recover from it.
Initially I had suspected a rather full (95%) filesystem, but that still means
there are 24GB available.
The other suspect was concurrent creation of reflink snapshots while the
file being snapshotted changed (e.g. allocating a hole in a sparse file).

Regards,
Ulrich



Thanks
Gang


From: Users  on behalf of Ulrich Windl

Sent: Tuesday, June 1, 2021 15:14
To: users@clusterlabs.org
Subject: [ClusterLabs] Antw: Hanging OCFS2 Filesystem any one else?


Ulrich Windl wrote on 31.05.2021 at 12:11 in message <60B4B65A.A8F : 161 : 60728>:

Hi!

We have an OCFS2 filesystem shared between three cluster nodes (SLES 15 SP2,
Kernel 5.3.18-24.64-default). The filesystem is filled up to about 95%, and
we have an odd effect:
A stat() system call to some of the files hangs indefinitely (state "D").
("ls -l" and "rm" also hang, but I suspect those are calling stat()
internally, too.)
My only suspect is that the effect might be related to the 95% being used.
The other suspect is that concurrent reflink calls may trigger the effect.

Did anyone else experience something similar?


Hi!

I have some details:
It seems there is a reader/writer deadlock trying to allocate additional
blocks for a file.
The stacktrace looks like this:
Jun 01 07:56:31 h16 kernel:  rwsem_down_write_slowpath+0x251/0x620
Jun 01 07:56:31 h16 kernel:  ? __ocfs2_change_file_space+0xb3/0x620 [ocfs2]

Jun 01 07:56:31 h16 kernel:  __ocfs2_change_file_space+0xb3/0x620 [ocfs2]
Jun 01 07:56:31 h16 kernel:  ocfs2_fallocate+0x82/0xa0 [ocfs2]
Jun 01 07:56:31 h16 kernel:  vfs_fallocate+0x13f/0x2a0
Jun 01 07:56:31 h16 kernel:  ksys_fallocate+0x3c/0x70
Jun 01 07:56:31 h16 kernel:  __x64_sys_fallocate+0x1a/0x20
Jun 01 07:56:31 h16 kernel:  do_syscall_64+0x5b/0x1e0

That is the only writer (on that host), but there are multiple readers like
this:
Jun 01 07:56:31 h16 kernel:  rwsem_down_read_slowpath+0x172/0x300
Jun 01 07:56:31 h16 kernel:  ? dput+0x2c/0x2f0
Jun 01 07:56:31 h16 kernel:  ? lookup_slow+0x27/0x50
Jun 01 07:56:31 h16 kernel:  lookup_slow+0x27/0x50
Jun 01 07:56:31 h16 kernel:  walk_component+0x1c4/0x300
Jun 01 07:56:31 h16 kernel:  ? path_init+0x192/0x320
Jun 01 07:56:31 h16 kernel:  path_lookupat+0x6e/0x210
Jun 01 07:56:31 h16 kernel:  ? __put_lkb+0x45/0xd0 [dlm]
Jun 01 07:56:31 h16 kernel:  filename_lookup+0xb6/0x190
Jun 01 07:56:31 h16 kernel:  ? kmem_cache_alloc+0x3d/0x250
Jun 01 07:56:31 h16 kernel:  ? getname_flags+0x66/0x1d0
Jun 01 07:56:31 h16 kernel:  ? vfs_statx+0x73/0xe0
Jun 01 07:56:31 h16 kernel:  vfs_statx+0x73/0xe0
Jun 01 07:56:31 h16 kernel:  ? fsnotify_grab_connector+0x46/0x80
Jun 01 07:56:31 h16 kernel:  __do_sys_newstat+0x39/0x70
Jun 01 07:56:31 h16 kernel:  ? do_unlinkat+0x92/0x320
Jun 01 07:56:31 h16 kernel:  do_syscall_64+0x5b/0x1e0

So that will match the hanging stat() quite nicely!

However the PID displayed as holding the writer does not exist in the system
(on that node).

Regards,
Ulrich




Regards,
Ulrich










Re: [ClusterLabs] Antw: Hanging OCFS2 Filesystem any one else?

2021-06-02 Thread Gang He
Hi Ulrich,

The hang problem looks like it matches an existing fix
(90bd070aae6c4fb5d302f9c4b9c88be60c8197ec "ocfs2: fix deadlock between setattr
and dio_end_io_write"), but that is not 100% certain.
If possible, could you report a bug to SUSE, so that we can work on it
further.

Thanks
Gang


From: Users  on behalf of Ulrich Windl 

Sent: Tuesday, June 1, 2021 15:14
To: users@clusterlabs.org
Subject: [ClusterLabs] Antw: Hanging OCFS2 Filesystem any one else?

>>> Ulrich Windl wrote on 31.05.2021 at 12:11 in message <60B4B65A.A8F : 161 :
60728>:
> Hi!
>
> We have an OCFS2 filesystem shared between three cluster nodes (SLES 15 SP2,
> Kernel 5.3.18-24.64-default). The filesystem is filled up to about 95%, and
> we have an odd effect:
> A stat() system call to some of the files hangs indefinitely (state "D").
> ("ls -l" and "rm" also hang, but I suspect those are calling stat()
> internally, too).
> My only suspect is that the effect might be related to the 95% being used.
> The other suspect is that concurrent reflink calls may trigger the effect.
>
> Did anyone else experience something similar?

Hi!

I have some details:
It seems there is a reader/writer deadlock trying to allocate additional blocks 
for a file.
The stacktrace looks like this:
Jun 01 07:56:31 h16 kernel:  rwsem_down_write_slowpath+0x251/0x620
Jun 01 07:56:31 h16 kernel:  ? __ocfs2_change_file_space+0xb3/0x620 [ocfs2]
Jun 01 07:56:31 h16 kernel:  __ocfs2_change_file_space+0xb3/0x620 [ocfs2]
Jun 01 07:56:31 h16 kernel:  ocfs2_fallocate+0x82/0xa0 [ocfs2]
Jun 01 07:56:31 h16 kernel:  vfs_fallocate+0x13f/0x2a0
Jun 01 07:56:31 h16 kernel:  ksys_fallocate+0x3c/0x70
Jun 01 07:56:31 h16 kernel:  __x64_sys_fallocate+0x1a/0x20
Jun 01 07:56:31 h16 kernel:  do_syscall_64+0x5b/0x1e0

That is the only writer (on that host), but there are multiple readers like
this:
Jun 01 07:56:31 h16 kernel:  rwsem_down_read_slowpath+0x172/0x300
Jun 01 07:56:31 h16 kernel:  ? dput+0x2c/0x2f0
Jun 01 07:56:31 h16 kernel:  ? lookup_slow+0x27/0x50
Jun 01 07:56:31 h16 kernel:  lookup_slow+0x27/0x50
Jun 01 07:56:31 h16 kernel:  walk_component+0x1c4/0x300
Jun 01 07:56:31 h16 kernel:  ? path_init+0x192/0x320
Jun 01 07:56:31 h16 kernel:  path_lookupat+0x6e/0x210
Jun 01 07:56:31 h16 kernel:  ? __put_lkb+0x45/0xd0 [dlm]
Jun 01 07:56:31 h16 kernel:  filename_lookup+0xb6/0x190
Jun 01 07:56:31 h16 kernel:  ? kmem_cache_alloc+0x3d/0x250
Jun 01 07:56:31 h16 kernel:  ? getname_flags+0x66/0x1d0
Jun 01 07:56:31 h16 kernel:  ? vfs_statx+0x73/0xe0
Jun 01 07:56:31 h16 kernel:  vfs_statx+0x73/0xe0
Jun 01 07:56:31 h16 kernel:  ? fsnotify_grab_connector+0x46/0x80
Jun 01 07:56:31 h16 kernel:  __do_sys_newstat+0x39/0x70
Jun 01 07:56:31 h16 kernel:  ? do_unlinkat+0x92/0x320
Jun 01 07:56:31 h16 kernel:  do_syscall_64+0x5b/0x1e0

So that will match the hanging stat() quite nicely!

However the PID displayed as holding the writer does not exist in the system 
(on that node).

Regards,
Ulrich


>
> Regards,
> Ulrich
>
>
>
>




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/




Re: [ClusterLabs] OCFS2 fragmentation with snapshots

2021-05-20 Thread Gang He

Hi Ulrich,



On 2021/5/18 18:52, Ulrich Windl wrote:

Hi!

I thought using the reflink feature of OCFS2 would be just a nice way to make 
crash-consistent VM snapshots while they are running.
As it is a bit tricky to find out how much data is shared between snapshots, I
started to write a utility to examine the blocks allocated to the VM backing
files and snapshots.

Unfortunately (as it seems) OCFS2 fragments terribly under reflink snapshots.

Here is an example of a rather "good" file: It has 85 extents that are rather 
large (not that the extents are sorted by first block; in reality it's a bit worse):
DEBUG(5): update_stats: blk_list[0]: 3551627-3551632 (6, 0x2000)
DEBUG(5): update_stats: blk_list[1]: 3553626-3556978 (3353, 0x2000)
DEBUG(5): update_stats: blk_list[2]: 16777217-16780688 (3472, 0x2000)
DEBUG(5): update_stats: blk_list[3]: 16780689-16792832 (12144, 0x2000)
DEBUG(5): update_stats: blk_list[4]: 17301147-17304618 (3472, 0x2000)
DEBUG(5): update_stats: blk_list[5]: 17304619-17316762 (12144, 0x2000)
...
DEBUG(5): update_stats: blk_list[81]: 31178385-31190528 (12144, 0x2000)
DEBUG(5): update_stats: blk_list[82]: 31191553-31195024 (3472, 0x2000)
DEBUG(5): update_stats: blk_list[83]: 31195025-31207168 (12144, 0x2000)
DEBUG(5): update_stats: blk_list[84]: 31210641-31222385 (11745, 0x2001)
filesystem: 655360 blocks of size 16384
655360 (100%) blocks type 0x2000 (shared)

And here's a terrible example (33837 extents):
DEBUG(4): finalize_blockstats: blk_list[0]: 257778-257841 (64, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[1]: 257842-257905 (64, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[2]: 263503-263513 (11, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[3]: 263558-263558 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[4]: 263559-263569 (11, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[5]: 263587-263587 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[6]: 263597-263610 (14, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[7]: 270414-270415 (2, 0x2000)
...
DEBUG(4): finalize_blockstats: blk_list[90]: 382214-382406 (193, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[91]: 382791-382918 (128, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[92]: 382983-382990 (8, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[93]: 383520-383522 (3, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[94]: 384672-384692 (21, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[95]: 384860-384918 (59, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[96]: 385088-385089 (2, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[97]: 385090-385091 (2, 0x2000)
...
DEBUG(4): finalize_blockstats: blk_list[805]: 2769213-2769213 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[806]: 2769214-2769214 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[807]: 2769259-2769259 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[808]: 2769261-2769261 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[809]: 2769314-2769314 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[810]: 2772041-2772042 (2, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[811]: 2772076-2772076 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[812]: 2772078-2772078 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[813]: 2772079-2772080 (2, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[814]: 2772096-2772096 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[815]: 2772099-2772099 (1, 0x2000)
...
DEBUG(4): finalize_blockstats: blk_list[33829]: 39317682-39317704 (23, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33830]: 39317770-39317775 (6, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33831]: 39318022-39318045 (24, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33832]: 39318274-39318284 (11, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33833]: 39318327-39318344 (18, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33834]: 39319157-39319166 (10, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33835]: 39319172-39319184 (13, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33836]: 39319896-39319936 (41, 0x2000)
filesystem: 1966076 blocks of size 16384
mapped=1121733 (57%)
1007658 (51%) blocks type 0x2000 (shared)
114075 (6%) blocks type 0x2800 (unwritten|shared)

So I wonder (while understanding the principle of copy-on-write for reflink 
snapshots):
Is there a way to avoid or undo the fragmentation?


Since these files (the original file and the cloned files) share the same
extent tree, the extents are split (fragmented) as the files are written.
There is a defragmentation tool in ocfs2-tools upstream, but it obviously
does not work for this case (reflinked files).
The workaround is to copy the cloned (fragmented) file to a new file, and
then delete the cloned file.
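A minimal sketch of that workaround (paths are hypothetical; a plain cp on
OCFS2 allocates new extents instead of sharing the old, fragmented ones):

   cp /ocfs2/vm/vm1.snap /ocfs2/vm/vm1.snap.new
   rm /ocfs2/vm/vm1.snap
   mv /ocfs2/vm/vm1.snap.new /ocfs2/vm/vm1.snap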


Thanks
Gang



Regards,
Ulrich

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/




Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: Q: What is lvmlockd locking?

2021-01-22 Thread Gang He



On 2021/1/22 16:17, Ulrich Windl wrote:

Gang He  wrote on 22.01.2021 at 09:13 in message

<1fd1c07d-d12c-fea9-4b17-90a977fe7...@suse.com>:

Hi Ulrich,

I reviewed the crm configuration file; here are some comments:
1) The lvmlockd resource is used for shared VGs. If you do not plan to add
any shared VG to your cluster, I suggest dropping this resource and its clone.
2) The lvmlockd service depends on the DLM service; it will create
"lvm_xxx" lock spaces when any shared VG is created/activated.
But other resources also depend on DLM to create lock spaces to avoid
race conditions, e.g. clustered MD, ocfs2, etc. So the file
system resource should start later than the lvm2 (lvmlockd) related resources.
That means this order should be wrong:
order ord_lockspace_fs__lvmlockd Mandatory: cln_lockspace_ocfs2 cln_lvmlock


But cln_lockspace_ocfs2 provides the shared filesystem that lvmlockd uses. I
thought that for locking in a cluster it needs a cluster-wide filesystem.


The ocfs2 file system resource only depends on the DLM resource if you use a
shared raw disk (e.g. /dev/vdb3), e.g.:

primitive dlm ocf:pacemaker:controld \
op start interval=0 timeout=90 \
op stop interval=0 timeout=100 \
op monitor interval=20 timeout=600
primitive ocfs2-2 Filesystem \
params device="/dev/vdb3" directory="/mnt/shared" fstype=ocfs2 \
op monitor interval=20 timeout=40
group base-group dlm ocfs2-2
clone base-clone base-group

If you use an ocfs2 file system on top of a shared VG (e.g. /dev/vg1/lv1), you
need to add the lvmlockd/LVM-activate resources before the ocfs2 file system, e.g.:

primitive dlm ocf:pacemaker:controld \
op monitor interval=60 timeout=60
primitive lvmlockd lvmlockd \
op start timeout=90 interval=0 \
op stop timeout=100 interval=0 \
op monitor interval=30 timeout=90
primitive ocfs2-2 Filesystem \
params device="/dev/vg1/lv1" directory="/mnt/shared" fstype=ocfs2 \
op monitor interval=20 timeout=40
primitive vg1 LVM-activate \
params vgname=vg1 vg_access_mode=lvmlockd activation_mode=shared \
op start timeout=90s interval=0 \
op stop timeout=90s interval=0 \
op monitor interval=30s timeout=90s
group base-group dlm lvmlockd vg1 ocfs2-2
clone base-clone base-group

Thanks
Gang







Thanks
Gang

On 2021/1/21 20:08, Ulrich Windl wrote:

Gang He  wrote on 21.01.2021 at 11:30 in message

<59b543ee-0824-6b91-d0af-48f66922b...@suse.com>:

Hi Ulrich,

Can the problem be reproduced reliably? Could you share your
pacemaker crm configuration and the OS/lvm2/resource-agents related version
information?


OK, the problem occurred on every node, so I guess it's reproducible.
OS is SLES15 SP2 with all current updates (lvm2-2.03.05-8.18.1.x86_64,
pacemaker-2.0.4+20200616.2deceaa3a-3.3.1.x86_64,
resource-agents-4.4.0+git57.70549516-3.12.1.x86_64).

The configuration (somewhat trimmed) is attached.

The only VG the cluster node sees is:
ph16:~ # vgs
VG  #PV #LV #SN Attr   VSize   VFree
sys   1   3   0 wz--n- 222.50g0

Regards,
Ulrich


I feel the problem was probably caused by the lvmlockd resource agent script,
which does not handle this corner case correctly.

Thanks
Gang


On 2021/1/21 17:53, Ulrich Windl wrote:

Hi!

I have a problem: For tests I had configured lvmlockd. Now that the tests
have ended, no LVM is used for cluster resources any more, but lvmlockd is
still configured.

Unfortunately I ran into this problem:
One OCFS2 mount was unmounted successfully; another one holding the lockspace
for lvmlockd is still active.

lvmlockd shuts down. At least it says so.

Unfortunately that stop never succeeds (runs into a timeout).

My suspect is something like this:
Some non‑LVM lock exists for the now unmounted OCFS2 filesystem.
lvmlockd wants to access that filesystem for unknown reasons.

I don't understand what's going on.

The events at node shutdown were:
Some Xen PVM was live-migrated successfully to another node, but during that
there was a message like this:

Jan 21 10:20:13 h19 virtlockd[41990]: libvirt version: 6.0.0
Jan 21 10:20:13 h19 virtlockd[41990]: hostname: h19
Jan 21 10:20:13 h19 virtlockd[41990]: resource busy: Lockspace resource

'4c6bebd1f4bc581255b422a65d317f31deef91f777e51ba0daf04419dda7ade5' is not
locked

Jan 21 10:20:13 h19 libvirtd[41991]: libvirt version: 6.0.0
Jan 21 10:20:13 h19 libvirtd[41991]: hostname: h19
Jan 21 10:20:13 h19 libvirtd[41991]: resource busy: Lockspace resource

'4c6bebd1f4bc581255b422a65d317f31deef91f777e51ba0daf04419dda7ade5' is not
locked

Jan 21 10:20:13 h19 libvirtd[41991]: Unable to release lease on test-jeos4
Jan 21 10:20:13 h19 VirtualDomain(prm_xen_test-jeos4)[32786]: INFO:
test-jeos4: live migration to h18 succeeded.


Unfortunately the log message makes it practically impossible to guess what
the locked object actually is (an indirect lock using a SHA256 hash, it
seems).


Then the OCFS for the VM images unmounts successfully while the stop of
lvmlockd is still 

Re: [ClusterLabs] Antw: [EXT] Re: Q: What is lvmlockd locking?

2021-01-22 Thread Gang He

Hi Ulrich,

I reviewed the crm configuration file; here are some comments:
1) The lvmlockd resource is used for shared VGs. If you do not plan to add
any shared VG to your cluster, I suggest dropping this resource and its clone.
2) The lvmlockd service depends on the DLM service; it will create
"lvm_xxx" lock spaces when any shared VG is created/activated.
But other resources also depend on DLM to create lock spaces to avoid
race conditions, e.g. clustered MD, ocfs2, etc. So the file
system resource should start later than the lvm2 (lvmlockd) related resources.

That means this order should be wrong:
order ord_lockspace_fs__lvmlockd Mandatory: cln_lockspace_ocfs2 cln_lvmlock
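In other words, if both clones are kept, the two operands would have to be
swapped; a sketch (assuming the clone name truncated above is cln_lvmlockd):

order ord_lvmlockd_fs Mandatory: cln_lvmlockd cln_lockspace_ocfs2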


Thanks
Gang

On 2021/1/21 20:08, Ulrich Windl wrote:

Gang He  wrote on 21.01.2021 at 11:30 in message

<59b543ee-0824-6b91-d0af-48f66922b...@suse.com>:

Hi Ulrich,

Can the problem be reproduced reliably? Could you share your
pacemaker crm configuration and the OS/lvm2/resource-agents related version
information?


OK, the problem occurred on every node, so I guess it's reproducible.
OS is SLES15 SP2 with all current updates (lvm2-2.03.05-8.18.1.x86_64,
pacemaker-2.0.4+20200616.2deceaa3a-3.3.1.x86_64,
resource-agents-4.4.0+git57.70549516-3.12.1.x86_64).

The configuration (somewhat trimmed) is attached.

The only VG the cluster node sees is:
ph16:~ # vgs
   VG  #PV #LV #SN Attr   VSize   VFree
   sys   1   3   0 wz--n- 222.50g0

Regards,
Ulrich


I feel the problem was probably caused by the lvmlockd resource agent script,
which does not handle this corner case correctly.

Thanks
Gang


On 2021/1/21 17:53, Ulrich Windl wrote:

Hi!

I have a problem: For tests I had configured lvmlockd. Now that the tests

have ended, no LVM is used for cluster resources any more, but lvmlockd is
still configured.

Unfortunately I ran into this problem:
One OCFS2 mount was unmounted successfully; another one holding the lockspace

for

lvmlockd is still active.

lvmlockd shuts down. At least it says so.

Unfortunately that stop never succeeds (runs into a timeout).

My suspect is something like this:
Some non‑LVM lock exists for the now unmounted OCFS2 filesystem.
lvmlockd wants to access that filesystem for unknown reasons.

I don't understand what's going on.

The events at node shutdown were:
Some Xen PVM was live-migrated successfully to another node, but during that
there was a message like this:

Jan 21 10:20:13 h19 virtlockd[41990]: libvirt version: 6.0.0
Jan 21 10:20:13 h19 virtlockd[41990]: hostname: h19
Jan 21 10:20:13 h19 virtlockd[41990]: resource busy: Lockspace resource

'4c6bebd1f4bc581255b422a65d317f31deef91f777e51ba0daf04419dda7ade5' is not
locked

Jan 21 10:20:13 h19 libvirtd[41991]: libvirt version: 6.0.0
Jan 21 10:20:13 h19 libvirtd[41991]: hostname: h19
Jan 21 10:20:13 h19 libvirtd[41991]: resource busy: Lockspace resource

'4c6bebd1f4bc581255b422a65d317f31deef91f777e51ba0daf04419dda7ade5' is not
locked

Jan 21 10:20:13 h19 libvirtd[41991]: Unable to release lease on test‑jeos4
Jan 21 10:20:13 h19 VirtualDomain(prm_xen_test‑jeos4)[32786]: INFO:

test‑jeos4: live migration to h18 succeeded.


Unfortunately the log message makes it practically impossible to guess what
the locked object actually is (an indirect lock using a SHA256 hash, it
seems).


Then the OCFS for the VM images unmounts successfully while the stop of

lvmlockd is still busy:

Jan 21 10:20:16 h19 lvmlockd(prm_lvmlockd)[32945]: INFO: stop the

lockspaces

of shared VG(s)...

...
Jan 21 10:21:56 h19 pacemaker‑controld[42493]:  error: Result of stop

operation for prm_lvmlockd on h19: Timed Out


As said before: I don't have shared VGs any more. I don't understand.

On a node without VMs running I see:
h19:~ # lvmlockctl ‑d
1611221190 lvmlockd started
1611221190 No lockspaces found to adopt
1611222560 new cl 1 pi 2 fd 8
1611222560 recv client[10817] cl 1 dump_info . "" mode iv flags 0
1611222560 send client[10817] cl 1 dump result 0 dump_len 149
1611222560 send_dump_buf delay 0 total 149
1611222560 close client[10817] cl 1 fd 8
1611222563 new cl 2 pi 2 fd 8
1611222563 recv client[10818] cl 2 dump_log . "" mode iv flags 0

On a node with VMs running I see:
h16:~ # lvmlockctl ‑d
1611216942 lvmlockd started
1611216942 No lockspaces found to adopt
1611221684 new cl 1 pi 2 fd 8
1611221684 recv pvs[17159] cl 1 lock gl "" mode sh flags 0
1611221684 lockspace "lvm_global" not found for dlm gl, adding...
1611221684 add_lockspace_thread dlm lvm_global version 0
1611221684 S lvm_global lm_add_lockspace dlm wait 0 adopt 0
1611221685 S lvm_global lm_add_lockspace done 0
1611221685 S lvm_global R GLLK action lock sh
1611221685 S lvm_global R GLLK res_lock cl 1 mode sh
1611221685 S lvm_global R GLLK lock_dlm
1611221685 S lvm_global R GLLK res_lock rv 0 read vb 0 0 0
1611221685 S lvm_global R GLLK res_lock all versions zero
1611221685 S lvm_global R GLLK res_lock invalidate global state
1611221685 send pvs[17159] c

Re: [ClusterLabs] Q: What is lvmlockd locking?

2021-01-21 Thread Gang He

Hi Ulrich,

Can the problem be reproduced reliably? Could you share your
pacemaker crm configuration and the OS/lvm2/resource-agents related version
information?
I feel the problem was probably caused by the lvmlockd resource agent script,
which does not handle this corner case correctly.


Thanks
Gang


On 2021/1/21 17:53, Ulrich Windl wrote:

Hi!

I have a problem: For tests I had configured lvmlockd. Now that the tests have 
ended, no LVM is used for cluster resources any more, but lvmlockd is still 
configured.
Unfortunately I ran into this problem:
One OCFS2 mount was unmounted successfully; another one holding the lockspace for
lvmlockd is still active.
lvmlockd shuts down. At least it says so.

Unfortunately that stop never succeeds (runs into a timeout).

My suspect is something like this:
Some non-LVM lock exists for the now unmounted OCFS2 filesystem.
lvmlockd wants to access that filesystem for unknown reasons.

I don't understand what's going on.

The events at node shutdown were:
Some Xen PVM was live-migrated successfully to another node, but during that 
there was a message like this:
Jan 21 10:20:13 h19 virtlockd[41990]: libvirt version: 6.0.0
Jan 21 10:20:13 h19 virtlockd[41990]: hostname: h19
Jan 21 10:20:13 h19 virtlockd[41990]: resource busy: Lockspace resource 
'4c6bebd1f4bc581255b422a65d317f31deef91f777e51ba0daf04419dda7ade5' is not locked
Jan 21 10:20:13 h19 libvirtd[41991]: libvirt version: 6.0.0
Jan 21 10:20:13 h19 libvirtd[41991]: hostname: h19
Jan 21 10:20:13 h19 libvirtd[41991]: resource busy: Lockspace resource 
'4c6bebd1f4bc581255b422a65d317f31deef91f777e51ba0daf04419dda7ade5' is not locked
Jan 21 10:20:13 h19 libvirtd[41991]: Unable to release lease on test-jeos4
Jan 21 10:20:13 h19 VirtualDomain(prm_xen_test-jeos4)[32786]: INFO: test-jeos4: 
live migration to h18 succeeded.

Unfortunately the log message makes it practically impossible to guess what the 
locked object actually is (indirect lock using SHA256 as hash it seems).

Then the OCFS for the VM images unmounts successfully while the stop of 
lvmlockd is still busy:
Jan 21 10:20:16 h19 lvmlockd(prm_lvmlockd)[32945]: INFO: stop the lockspaces of 
shared VG(s)...
...
Jan 21 10:21:56 h19 pacemaker-controld[42493]:  error: Result of stop operation 
for prm_lvmlockd on h19: Timed Out

As said before: I don't have shared VGs any more. I don't understand.

On a node without VMs running I see:
h19:~ # lvmlockctl -d
1611221190 lvmlockd started
1611221190 No lockspaces found to adopt
1611222560 new cl 1 pi 2 fd 8
1611222560 recv client[10817] cl 1 dump_info . "" mode iv flags 0
1611222560 send client[10817] cl 1 dump result 0 dump_len 149
1611222560 send_dump_buf delay 0 total 149
1611222560 close client[10817] cl 1 fd 8
1611222563 new cl 2 pi 2 fd 8
1611222563 recv client[10818] cl 2 dump_log . "" mode iv flags 0

On a node with VMs running I see:
h16:~ # lvmlockctl -d
1611216942 lvmlockd started
1611216942 No lockspaces found to adopt
1611221684 new cl 1 pi 2 fd 8
1611221684 recv pvs[17159] cl 1 lock gl "" mode sh flags 0
1611221684 lockspace "lvm_global" not found for dlm gl, adding...
1611221684 add_lockspace_thread dlm lvm_global version 0
1611221684 S lvm_global lm_add_lockspace dlm wait 0 adopt 0
1611221685 S lvm_global lm_add_lockspace done 0
1611221685 S lvm_global R GLLK action lock sh
1611221685 S lvm_global R GLLK res_lock cl 1 mode sh
1611221685 S lvm_global R GLLK lock_dlm
1611221685 S lvm_global R GLLK res_lock rv 0 read vb 0 0 0
1611221685 S lvm_global R GLLK res_lock all versions zero
1611221685 S lvm_global R GLLK res_lock invalidate global state
1611221685 send pvs[17159] cl 1 lock gl rv 0
1611221685 recv pvs[17159] cl 1 lock vg "sys" mode sh flags 0
1611221685 lockspace "lvm_sys" not found
1611221685 send pvs[17159] cl 1 lock vg rv -210 ENOLS
1611221685 close pvs[17159] cl 1 fd 8
1611221685 S lvm_global R GLLK res_unlock cl 1 from close
1611221685 S lvm_global R GLLK unlock_dlm
1611221685 S lvm_global R GLLK res_unlock lm done
1611222582 new cl 2 pi 2 fd 8
1611222582 recv client[19210] cl 2 dump_log . "" mode iv flags 0

Note: "lvm_sys" may refer to VG sys used for the hypervisor.

Regards,
Ulrich



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/





Re: [ClusterLabs] Q: LVM-activate a shared LV

2020-12-11 Thread Gang He
Hi Ulrich,

Which Linux distribution/version do you use? Could you share the whole crm
configuration?

There is a crm configuration demo for your reference.
primitive dlm ocf:pacemaker:controld \
op start interval=0 timeout=90 \
op stop interval=0 timeout=100 \
op monitor interval=20 timeout=600
primitive libvirt_stonith stonith:external/libvirt \
params hostlist="ghe-nd1,ghe-nd2,ghe-nd3" 
hypervisor_uri="qemu+tcp://10.67.160.2/system" \
op monitor interval=60
primitive lvmlockd lvmlockd \
op start timeout=90 interval=0 \
op stop timeout=100 interval=0 \
op monitor interval=30 timeout=90
primitive ocfs2-rear Filesystem \
params device="/dev/TEST1_vg/test1_lv" directory="/rear" fstype=ocfs2 
options=acl \
op monitor interval=20 timeout=60 \
op start timeout=60 interval=0 \
op stop timeout=180 interval=0 
primitive test1_vg LVM-activate \
params vgname=TEST1_vg vg_access_mode=lvmlockd activation_mode=shared \
op start timeout=90s interval=0 \
op stop timeout=90s interval=0 \
op monitor interval=30s timeout=90s
group base-group dlm lvmlockd test1_vg ocfs2-rear
clone base-clone base-group
property cib-bootstrap-options: \
have-watchdog=false \
stonith-enabled=true \
dc-version="2.0.4+20200616.2deceaa3a-3.3.1-2.0.4+20200616.2deceaa3a" \
cluster-infrastructure=corosync \
cluster-name=cluster \
last-lrm-refresh=1606730020



Thanks
Gang 


From: Users  on behalf of Ulrich Windl 

Sent: Thursday, December 10, 2020 22:55
To: users@clusterlabs.org
Subject: [ClusterLabs] Q: LVM-activate a shared LV

Hi!

I configured a clustered LV (I think) for activation on three nodes, but it 
won't work. Error is:
 LVM-activate(prm_testVG0_test-jeos_activate)[48844]: ERROR:  LV locked by 
other host: testVG0/test-jeos Failed to lock logical volume testVG0/test-jeos.

primitive prm_testVG0_test-jeos_activate LVM-activate \
params vgname=testVG0 lvname=test-jeos activation_mode=shared 
vg_access_mode=lvmlockd \
op start timeout=90s interval=0 \
op stop timeout=90s interval=0 \
op monitor interval=60s timeout=90s
clone cln_testVG0_test-jeos_activate prm_testVG0_test-jeos_activate \
meta interleave=true

Is this a software bug, or am I using the wrong RA or configuration?

Regards,
Ulrich



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/




Re: [ClusterLabs] ocfs2 + pacemaker

2020-09-23 Thread Gang He




On 9/23/2020 6:02 PM, Michael Ivanov wrote:

Hallo Gang,

Thanks for the directions. I have read through the docs you pointed to 
and yes o2cb is not mentioned at all.
Does it mean ocfs2 can work without o2cb at all now? 

Yes.


Before I had to define an ocfs2-specific cluster
in /etc/ocfs2/cluster.conf and specify the name of this cluster when 
creating ocfs2 file system.
Should I understand that current ocfs2 implementation does not need this 
when configured under pacemaker?

Yes. You do not need to do that if you use pacemaker to set up the cluster.
You should refer to the SUSE HA doc for how to set up the pacemaker/corosync
cluster stack.

Then add a dlm resource clone and an ocfs2 (Filesystem) resource clone.
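A minimal crm sketch of that (device and mount point are placeholders):

primitive dlm ocf:pacemaker:controld \
        op monitor interval=20 timeout=600
primitive ocfs2-fs Filesystem \
        params device="/dev/sdX1" directory="/mnt/shared" fstype=ocfs2 \
        op monitor interval=20 timeout=40
group base-group dlm ocfs2-fs
clone base-clone base-group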


Thanks
Gang




Best regards,

On 23.09.2020 10:11, Gang He wrote:

Hello Michael,

The ocfs2:o2cb resource is provided by resource-agents on the SLES 11.x series.
For the newer SLES series (e.g. 12 or 15), there is no o2cb resource
agent in the resource-agents rpm, since this resource is not needed.
You can refer to the new SUSE HA guide for how to set up ocfs2 on a pacemaker
cluster.

e.g.
https://documentation.suse.com/sle-ha/15-SP1/single-html/SLE-HA-guide/#book-sleha-guide 



Thanks
Gang

On 9/23/2020 5:26 AM, Michael Ivanov wrote:

Hallo,

I am trying to get ocfs2 running under pacemaker. The description I 
found at
https://wiki.clusterlabs.org/wiki/Dual_Primary_DRBD_+_OCFS2#The_o2cb_Service 


refers to ocf:ocfs2:o2cb resource. But I cannot find it anywhere.

I'm building the cluster using debian/testing (pacemaker 2.0.4,
ocfs2-tools 1.8.6)


Best regards,



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



--
  \   / |  |
  (OvO) |  Mikhail Iwanow   |
  (^^^) |  Voice:   +7 (911) 223-1300   |
   \^/  |  E-mail:iv...@logit-ag.de|
   ^ ^  |   |


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/





Re: [ClusterLabs] ocfs2 + pacemaker

2020-09-23 Thread Gang He

Hello Michael,

The ocfs2:o2cb resource is provided by resource-agents on the SLES 11.x series.
For the newer SLES series (e.g. 12 or 15), there is no o2cb resource agent
in the resource-agents rpm, since this resource is not needed.
You can refer to the new SUSE HA guide for how to set up ocfs2 on a pacemaker
cluster.

e.g.
https://documentation.suse.com/sle-ha/15-SP1/single-html/SLE-HA-guide/#book-sleha-guide

Thanks
Gang

On 9/23/2020 5:26 AM, Michael Ivanov wrote:

Hallo,

I am trying to get ocfs2 running under pacemaker. The description I found at
https://wiki.clusterlabs.org/wiki/Dual_Primary_DRBD_+_OCFS2#The_o2cb_Service
refers to ocf:ocfs2:o2cb resource. But I cannot find it anywhere.

I'm building the cluster using debian/testing (pacemaker 2.0.4, ocfs2-tools
1.8.6)

Best regards,



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] SBD on shared disk

2020-02-05 Thread Gang He
Hello Strahil,

This kind of configuration is not recommended.
Why?
The SBD partition needs to be accessed by the cluster nodes reliably and frequently,
but the other partition (for the XFS file system) may be under extreme I/O
pressure;
in that case, the SBD partition I/O requests can starve until they time out.
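(For completeness: the timeouts configured on an existing SBD partition can be
checked with something like the following; the device path is a placeholder:

   sbd -d /dev/disk/by-id/shared-lun-part1 dump

If the watchdog/msgwait timeouts shown there are short, heavy I/O on the
neighbouring partition makes missing them even more likely.)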

Thanks
Gang 


From: Users  on behalf of Strahil Nikolov 

Sent: Thursday, February 6, 2020 5:15 AM
To: users@clusterlabs.org
Subject: [ClusterLabs] SBD on shared disk

Hello Community,

I'm preparing for my EX436 and I was wondering if there are any drawbacks if a 
shared LUN is split into 2 partitions and the first partition is used for SBD , 
while the second one for Shared File System (Either XFS for active/passive, or 
GFS2 for active/active).

Do you see any drawback in such implementation ?
Thanks in advance.

Best Regards,
Strahil Nikolov
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] DLM in the cluster can tolerate more than one node failure at the same time?

2019-10-23 Thread Gang He
Hi Christine,

Thanks for your explanation.
My question originally came from the concurrent-fencing feature in Pacemaker;
you have cleared up my doubt :-)

Thanks
Gang

> -Original Message-
> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of christine
> caulfield
> Sent: 2019-10-23 15:09
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] DLM in the cluster can tolerate more than one node
> failure at the same time?
> 
> On 22/10/2019 07:15, Gang He wrote:
> > Hi List,
> >
> > I remember that the master node has the full copy of a DLM lock
> > resource and the other nodes have their own lock status, so if one node
> fails (or is fenced), the DLM lock status can be recovered from the remaining
> nodes quickly.
> > My question is:
> > if more than one node fails at the same time, can the DLM
> lock service for the remaining nodes in the cluster still continue to work
> after recovery?
> >
> >
> 
> Yes. The local DLM keeps a copy of its own locks and the remaining DLM
> nodes will collaborate to re-master all of the locks that they know about.  
> The
> number of nodes that leave at one time has no impact on this.
> 
> Chrissie
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] gfs2: fsid=xxxx:work.3: fatal: filesystem consistency error

2019-10-21 Thread Gang He
Hi Bob,

> -Original Message-
> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Bob
> Peterson
> Sent: 2019-10-21 21:02
> To: Cluster Labs - All topics related to open-source clustering welcomed
> 
> Subject: Re: [ClusterLabs] gfs2: fsid=:work.3: fatal: filesystem 
> consistency
> error
> 
> - Original Message -
> > Hello List,
> >
> > I got a gfs2 file system consistency error from one user, who is using
> > kernel 4.12.14-95.29-default on SLE12SP4 (x86_64).
> > The error message is as below,
> > 2019-09-26T10:22:10.333792+02:00 node4 kernel: [ 3456.176234] gfs2:
> > fsid=:work.3: fatal: filesystem consistency error
> > 2019-09-26T10:22:10.333806+02:00 node4 kernel: [ 3456.176234]
> inode = 280
> > 342097926
> > 2019-09-26T10:22:10.333807+02:00 node4 kernel: [ 3456.176234]
> function =
> > gfs2_dinode_dealloc, file = ../fs/gfs2/super.c, line = 1459
> > 2019-09-26T10:22:10.333808+02:00 node4 kernel: [ 3456.176235] gfs2:
> > fsid=:work.3: about to withdraw this file system
> >
> > I cat the super.c file, the related code is,
> > 1451 static int gfs2_dinode_dealloc(struct gfs2_inode *ip)
> > 1452 {
> > 1453 struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
> > 1454 struct gfs2_rgrpd *rgd;
> > 1455 struct gfs2_holder gh;
> > 1456 int error;
> > 1457
> > 1458 if (gfs2_get_inode_blocks(&ip->i_inode) != 1) {
> > 1459 gfs2_consist_inode(ip);   <<== here
> > 1460 return -EIO;
> > 1461 }
> >
> >
> > It looks like upstream has fixed this bug? Who can help to point out
> > which patches need to be back-ported?
> >
> > Thanks
> > Gang
> 
> Hi,
> 
> Yes, we have made lots of patches since the 4.12 kernel, some of which may
> be relevant. However, that error often indicates file system corruption.
> (It means the block count for a dinode became corrupt.)
> 
> I've been working on a set of problems caused whenever gfs2 replays one of
> its journals during recovery, with a wide variety of symptoms, including that
> one. So it might be one of those. Some of my resulting patches are already
> pushed to upstream, but I'm not yet at the point where I can push them all.
> 
> I recommend doing a fsck.gfs2 on the volume to ensure consistency.

The customer has repaired it using fsck.gfs2; however, every time the
application workload starts (concurrent writing),
the filesystem becomes inaccessible, which also causes a stop operation failure of
the app resource and consequently a fence.
Do you have any suggestions for this case? It looks like there is a serious bug
with concurrent writing under some stress.

Thanks
Gang 

> 
> Regards,
> 
> Bob Peterson
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] DLM, cLVM, GFS2 and OCFS2 managed by systemd instead of crm ?

2019-10-15 Thread Gang He
Hello Lentes,

In a cluster environment, we usually need to fence (or dynamically add/delete)
nodes;
the full stack provided by pacemaker/corosync helps handle this
automatically and in an integrated way.


Thanks
Gang


From: Users  on behalf of Lentes, Bernd 

Sent: Wednesday, October 16, 2019 3:35 AM
To: Pacemaker ML 
Subject: [ClusterLabs] DLM, cLVM, GFS2 and OCFS2 managed by systemd instead of 
crm ?

Hi,

i'm a big fan of simple solutions (KISS).
Currently i have DLM, cLVM, GFS2 and OCFS2 managed by pacemaker.
They all are fundamental prerequisites for my resources (Virtual Domains).
To configure them i used clones and groups.
Why not having them managed by systemd to make the cluster setup more 
overseeable ?

Is there a strong reason that pacemaker cares about them ?

Bernd

--

Bernd Lentes
Systemadministration
Institut für Entwicklungsgenetik
Gebäude 35.34 - Raum 208
HelmholtzZentrum münchen
bernd.len...@helmholtz-muenchen.de
phone: +49 89 3187 1241
phone: +49 89 3187 3827
fax: +49 89 3187 2294
http://www.helmholtz-muenchen.de/idg

Perfekt ist wer keine Fehler macht
Also sind Tote perfekt


Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir'in Prof. Dr. Veronika von Messling
Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Kerstin Guenther
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] trace of Filesystem RA does not log

2019-10-14 Thread Gang He


> -Original Message-
> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Lentes,
> Bernd
> Sent: 2019-10-14 20:04
> To: Pacemaker ML 
> Subject: Re: [ClusterLabs] trace of Filesystem RA does not log
> 
> 
> >> -Original Message-
> >> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of
> >> Lentes, Bernd
>> Sent: 2019-10-11 22:32
> >> To: Pacemaker ML 
> >> Subject: [ClusterLabs] trace of Filesystem RA does not log
> >>
> >> Hi,
> >>
> >> occasionally the stop of a Filesystem resource for an OCFS2 Partition
> >> fails to stop.
> > Which SLE version are you using?
> > When ocfs2 file system stop fails, that means the umount process is hung?
> > Could you cat that process stack via /proc/xxx/stack?
> > Of course, you also can use o2locktop to identify if there is any
> > active/hanged dlm lock at that moment.
> >
> 
> I'm using SLES 12 SP4. I don't know exactly why umount isn't working or if it
> hangs; that's why I tried to trace the stop operation to get more info.
> I will test o2locktop.
> What do you mean by "/proc/xxx/stack"?
> The stack of which process should I investigate? umount?
Yes, you can use "pstree" command to find umount related processes,
Then cat the related processes' stack.
Usually, the last process hang point can help to find the root cause.
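A rough sketch of that (the PID is whatever the hanging umount shows up as):

   pstree -p | grep umount     # find the umount process and its children
   cat /proc/<PID>/stack       # kernel stack: the top frames show where it hangs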


Thanks
Gang
> 
> 
> Bernd
> 
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Prof. Dr. Veronika von Messling
> Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Kerstin Guenther
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] trace of Filesystem RA does not log

2019-10-13 Thread Gang He
Hello Lentes,



> -Original Message-
> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Lentes,
> Bernd
> Sent: 2019年10月11日 22:32
> To: Pacemaker ML 
> Subject: [ClusterLabs] trace of Filesystem RA does not log
> 
> Hi,
> 
> occasionally the stop of a Filesystem resource for an OCFS2 Partition fails to
> stop.
Which SLE version are you using?
When the ocfs2 file system stop fails, does that mean the umount process is hung?
Could you cat that process's stack via /proc/xxx/stack?
Of course, you can also use o2locktop to identify whether there is any active/hanging
DLM lock at that moment.

Thanks
Gang

> I'm currently tracing this RA hoping to find the culprit.
> I'm putting one of the two nodes into standby, hoping the error appears.
> Afterwards setting it online again and doing the same procedure with the
> other node.
> Of course now the error does not appear :-)) But i don't find any files under
> /var/lib/heartbeat/trace_ra/Filesystem for a stop operation.
> Resource is part of a group which is cloned.
> 
> I configured the tracing with "crm resource trace fs_ocfs2 stop".
> 
> Result:
> primitive fs_ocfs2 Filesystem \
> params device="/dev/vg_san/lv_ocfs2" directory="/mnt/ocfs2"
> fstype=ocfs2 \
> params fast_stop=no force_unmount=true \
> op monitor interval=30 timeout=20 \
> op start timeout=60 interval=0 \
> op stop timeout=60 interval=0 \
> op_params trace_ra=1 \
> meta is-managed=true target-role=Started
> 
> I expect log files for the stop operation in
> /var/lib/heartbeat/trace_ra/Filesystem.
> But i don't get any.
> 
> Thanks.
> 
> Bernd
> 
> --
> 
> Bernd Lentes
> Systemadministration
> Institut für Entwicklungsgenetik
> Gebäude 35.34 - Raum 208
> HelmholtzZentrum münchen
> bernd.len...@helmholtz-muenchen.de
> phone: +49 89 3187 1241
> phone: +49 89 3187 3827
> fax: +49 89 3187 2294
> http://www.helmholtz-muenchen.de/idg
> 
> Perfekt ist wer keine Fehler macht
> Also sind Tote perfekt
> 
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Prof. Dr. Veronika von Messling
> Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Kerstin Guenther
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] Where to find documentation for cluster MD?

2019-10-10 Thread Gang He
Hello Ulrich

Cluster MD belongs to the SLE HA extension product.
The related doc link is here, e.g.:
https://documentation.suse.com/sle-ha/15-SP1/single-html/SLE-HA-guide/#cha-ha-cluster-md

Thanks
Gang

> -Original Message-
> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Ulrich
> Windl
> Sent: 2019-10-09 15:13
> To: users@clusterlabs.org
> Subject: [ClusterLabs] Where to find documentation for cluster MD?
> 
> Hi!
> 
> In recent SLES there is "cluster MD", like in
> cluster-md-kmp-default-4.12.14-197.18.1.x86_64
> (/lib/modules/4.12.14-197.18-default/kernel/drivers/md/md-cluster.ko).
> However I could not find any manual page for it.
> 
> Where is the official documentation, meaning: Where is a description of the
> feature supported by SLES?
> 
> Regards,
> Ulrich
> 
> 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] File System does not do a recovery on fail over

2019-06-12 Thread Gang He
CCing our file system colleague Jeff into this loop.

From my view, I feel the file system recovery time usually depends on the file
system journal size, not the file system size.
Hello Jeff, do you think XFS would take 5-10 minutes to mount after an unclean
switchover?
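(As a quick check, the XFS log size can be read from xfs_info on the mounted
filesystem; the mount point is a placeholder:

   xfs_info /mnt/data    # look at the "log" line: bsize * blocks = journal size)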

Thanks
Gang

>>> On 6/12/2019 at 1:29 pm, in message, Indivar Nair wrote:
> Thanks, Gang
> 
> It is a very large file system - around 600TB.
> Could this be why it takes around 5 - 10mins to do journal recovery?
> 
> What we do as a workaround is -
> - Disable the filesystem resource on startup
> - Manually mount it (wait for as long as it takes)
> - Then umount it
> - Enable filesystem resource
> 
> But this doesn't seem like the right approach.
> 
> We have tried repairing the Filesystem when a failover happens, but it
> has never shown any major corruption.
> 
> Regards,
> 
> 
> Indivar Nair
> 
> 
> 
> On Tue, Jun 11, 2019 at 10:18 AM Gang He  wrote:
>>
>> Hi Indivar,
>>
>> See my comments inline.
>>
>> >>> On 6/11/2019 at 12:10 pm, in message
>> , Indivar
>> Nair  wrote:
>> > Hello ...,
>> >
>> > I have an Active-Passive cluster with two nodes hosting an XFS
>> > Filesystem over a CLVM Volume.
>> >
>> > If a failover happens, the volume is mounted on the other node without
>> > a recovery that usually happens to a volume that has not been cleanly
>> > unmounted.
>> > The FS journal is on the same volume.
>> >
>> > Now, when we fail it back (with a complete cluster shutdown and
>> > restart) on to its original node, it undergoes the automatic recovery.
>> >
>> > 1.
>> > Shouldn't it do an FS recovery during the failover to the other node?
>> > Note: The FS journal is on the same volume.
>> Usually, file system must do the log recovery during the file system is 
> mounted.
>>
>> >
>> > 2.
>> > Also, the failback usually fails because the FS check takes a
>> > considerable amount of time. How do I configure the mount not to fail
>> > when an automatic FS check is going on?
>> File system introduces a journal to avoiding take too long time for file 
> system recovery.
>> If the time is too long, maybe this is a file system problem, e.g. file 
> system is damaged.
>> Secondly, you can set the timeout value longer.
>>
>> Thanks
>> Gang
>>
>> >
>> > Any help/pointers would be highly appreciated.
>> >
>> > Thanks.
>> >
>> > Regards,
>> >
>> >
>> > Indivar Nair
>> > ___
>> > Manage your subscription:
>> > https://lists.clusterlabs.org/mailman/listinfo/users 
>> >
>> > ClusterLabs home: https://www.clusterlabs.org/ 
>>
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>>
>> ClusterLabs home: https://www.clusterlabs.org/ 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] File System does not do a recovery on fail over

2019-06-10 Thread Gang He
Hi Indivar,

See my comments inline.

>>> On 6/11/2019 at 12:10 pm, in message
, Indivar
Nair  wrote:
> Hello ...,
> 
> I have an Active-Passive cluster with two nodes hosting an XFS
> Filesystem over a CLVM Volume.
> 
> If a failover happens, the volume is mounted on the other node without
> a recovery that usually happens to a volume that has not been cleanly
> unmounted.
> The FS journal is on the same volume.
> 
> Now, when we fail it back (with a complete cluster shutdown and
> restart) on to its original node, it undergoes the automatic recovery.
> 
> 1.
> Shouldn't it do an FS recovery during the failover to the other node?
> Note: The FS journal is on the same volume.
Usually, the file system must do the log recovery while the file system is being mounted.
 
> 
> 2.
> Also, the failback usually fails because the FS check takes a
> considerable amount of time. How do I configure the mount not to fail
> when an automatic FS check is going on?
The file system introduces a journal to avoid taking too long for file system recovery.
If the recovery time is too long, maybe there is a file system problem, e.g. the file system is damaged.
Secondly, you can set the timeout value higher.
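For example, with pacemaker's Filesystem resource agent the start/stop timeouts can simply be raised in the crm configuration; a minimal sketch (the resource name, device, mount point and timeout values are only examples):

primitive fs-data Filesystem \
        params device="/dev/vg1/lv1" directory="/data" fstype=xfs \
        op start timeout=600s interval=0 \
        op stop timeout=300s interval=0 \
        op monitor interval=20s timeout=40s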

Thanks
Gang

> 
> Any help/pointers would be highly appreciated.
> 
> Thanks.
> 
> Regards,
> 
> 
> Indivar Nair
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Where do we download the source code of libdlm

2019-05-27 Thread Gang He
Hello Guys,

As the subject says, I want to download the source code of libdlm, to see its git log changes.
libdlm is used to build dlm_controld, dlm_stonith, dlm_tool, etc.


Thanks
Gang


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Q: repeating message " cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN"

2018-11-12 Thread Gang He
Hello Ulrich,

Could you reproduce this issue reliably? If yes, please share your steps.
We also encountered a similar issue: it looks like cmirrord cannot join the CPG (a corosync-related concept), then the resource times out and the node is fenced.

Thanks
Gang

>>> On 2018/11/12 at 15:46, in message
<5be92fc202a10002e...@gwsmtp1.uni-regensburg.de>, "Ulrich Windl"
 wrote:
> Hi!
> 
> While analyzing some odd cluster problem in SLES11 SP4, I found this message 
> repeating quite a lot (several times per second) with the same text:
> 
> [...more...]
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> [...many more...]
> 
> I wonder: Shouldn't the retry number be incremented? Or are these different 
> retries? If so, where is it visible?
> 
> The situation I'm analyzing is when a node should have been fenced, but 
> somehow it wasn't, but also just stopped working (seemed like frozen). The 
> last message a few minutes(!) before the other rnodes complained was:
> 
> Nov 10 22:04:18 h01 crmd[16596]:   notice: throttle_mode: High CIB load 
> detected: 1.246333
> (After this the node seemed dead/frozen).
> 
> Regards,
> Ulrich
> 
> 
> 
> ___
> Users mailing list: Users@clusterlabs.org 
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> Bugs: http://bugs.clusterlabs.org

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] VirtualDomain as resources and OCFS2

2018-09-11 Thread Gang He
Hello Lentes,

>>> On 2018/9/11 at 20:50, in message
<584818902.7776848.1536670226935.javamail.zim...@helmholtz-muenchen.de>,
"Lentes, Bernd"  wrote:

> 
> - On Sep 11, 2018, at 4:29 AM, Gang He g...@suse.com wrote:
> 
>> Hello Lentes,
>> 
>> It does not look like a OCFS2 or pacemaker problem, more like virtualization
>> problem.
>> From OCFS2/LVM2 perspective, if you use one LV for one VirtualDomain, that 
> means
>> the guest VMs on that VirtualDomain can not occupy the other LVs' storage
>> space.
> 
> Hi Gang,
> 
> i see that.
> 
>> If you use OCFS2 on one LV for all VirtualDomains, the guest VMs can share 
> the
>> storage space, but you can look at OCFS2 ACL mechanism, maybe this can help 
> to
>> limit each directory to use storage space.
> 
> The storage space isn't the problem. I have enough disks in my SAN, and most 
> of the guests will not grow
> or just grow very slowly.
> 
> Do you see s.th. else which contradicts to the use of one OCFS2 Volume for 
> all virtual guests ?
No,
I just suggest creating a separate directory for each VirtualDomain, to decouple the image file locations.

Thanks
Gang

> 
> Bernd
> 
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de 
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. med. Dr. h.c. Matthias H. Tschoep, Heinrich 
> Bassler, Dr. rer. nat. Alfons Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
> 
> 
> ___
> Users mailing list: Users@clusterlabs.org 
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> Bugs: http://bugs.clusterlabs.org

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] VirtualDomain as resources and OCFS2

2018-09-10 Thread Gang He
Hello Lentes,

It does not look like an OCFS2 or pacemaker problem, more like a virtualization problem.
From the OCFS2/LVM2 perspective, if you use one LV per VirtualDomain, that means the guest VMs on that VirtualDomain cannot occupy the other LVs' storage space.
If you use OCFS2 on one LV for all VirtualDomains, the guest VMs can share the storage space, but you can look at the OCFS2 ACL mechanism; maybe this can help to limit the storage space each directory can use.

Thanks
Gang



>>> On 2018/9/11 at 0:20, in message
<719144896.5605540.1536596432401.javamail.zim...@helmholtz-muenchen.de>,
"Lentes, Bernd"  wrote:
> Hi,
> 
> i'm establishing a cluster with virtual guests as resources which should 
> reside in a raw files on OCFS2 formatted logical volumes.
> My first idea was to create for each VirtualDomain its own logical volume, i 
> thought that would be well-structured. 
> But now i realize that my cluster configuration gets confusing. E.g. i need 
> for each OCFS2 Volume a resource, which must be cloned and which has an order 
> with its respective guest. I'm having about 10 Virtual Domains. 10 
> VirtualDomain resources, 10 clones, 10 orders. Confusing.
> Now i'm thinking of one Logical Volume for all VirtualDomains. Then the 
> cluster configuration would be more overseeable. Just one OCFS2 resource, 
> just one clone, just one order. 
> Is there any reason which contradicts to the use of only one LV and one 
> OCFS2 resource for several VirtualDomains ?
> 
> Thanks.
> 
> 
> Bernd
> 
> -- 
> 
> Bernd Lentes 
> Systemadministration 
> Institut für Entwicklungsgenetik 
> Gebäude 35.34 - Raum 208 
> HelmholtzZentrum münchen 
> [ mailto:bernd.len...@helmholtz-muenchen.de | 
> bernd.len...@helmholtz-muenchen.de ] 
> phone: +49 89 3187 1241 
> fax: +49 89 3187 2294 
> [ http://www.helmholtz-muenchen.de/idg | 
> http://www.helmholtz-muenchen.de/idg ] 
> 
> wer Fehler macht kann etwas lernen 
> wer nichts macht kann auch nichts lernen
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de 
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. med. Dr. h.c. Matthias H. Tschoep, Heinrich 
> Bassler, Dr. rer. nat. Alfons Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
> 
> 
> ___
> Users mailing list: Users@clusterlabs.org 
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> Bugs: http://bugs.clusterlabs.org
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Fwd: Re: [Cluster-devel] [PATCH] dlm: prompt the user SCTP is experimental

2018-04-16 Thread Gang He
Hi David and Mark,

I compiled my own DLM kernel module, getting rid of the "return -EINVAL;" line.
Then I did some tests with the new DLM kernel module, on a two-ring cluster with the "protocol=tcp" setting in /etc/dlm/dlm.conf (see the sketch after the list below).
1) If both networks were OK, all the tests passed.
2) If I broke the second ring network, all the tests passed (no effect at all, since the tcp protocol only uses the first ring's IP address).
3) If I broke the first ring network (e.g. ifconfig eth0 down on node3), the tests hung on the other nodes (e.g. node1 and node2) until node3 was rebooted manually or node3's network came back (e.g. ifconfig eth0 up on node3).
4) I switched the two-ring cluster into a one-ring cluster (by editing /etc/corosync/corosync.conf) and broke the network on one node; this node was fenced immediately.
5) But why was node3 not fenced in case 3)? It looks like a bug, since the tests hung and we had to reboot that node manually.
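For reference, the dlm.conf setting mentioned above is just one line, assuming the usual key=value syntax of /etc/dlm/dlm.conf (all other options left at their defaults):

# /etc/dlm/dlm.conf
protocol=tcp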

Thanks
Gang


>>> 
> On Thu, Apr 12, 2018 at 09:31:49PM -0600, Gang He wrote:
>> During this period, could we allow tcp protocol work (rather than return 
> error directly) under two-ring cluster?
>> If the user sets using TCP protocol in command-line or dlm configuration 
> file, could we use the first ring IP address to work?
>> I do not know why we return error directly in this case? there was any 
> concern before?
> 
> You're talking about this:
> 
> /* We don't support multi-homed hosts */
> if (dlm_local_addr[1] != NULL) {
> log_print("TCP protocol can't handle multi-homed hosts, "
>   "try SCTP");
> return -EINVAL;
> }
> 
> I think that should be ok to remove, and just use the first addr.
> Mark, do you see any reason to avoid that?
> 
> Dave

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] snapshoting of running VirtualDomain resources - OCFS2 ?

2018-03-18 Thread Gang He
Hi Lentes,


>>> 

> 
> - On Mar 15, 2018, at 3:47 AM, Gang He g...@suse.com wrote:
>> Just one comments, you have to make sure the VM file integrity before 
> calling
>> reflink.
>> 
> 
> Hi Gang,
> 
> how could i achieve that ? sync ? The disks of the VM's are configured 
> without cache,
> otherwise they can't be live migrated.
I am not familiar with VMs; what I mean is that you need to make sure the VM file is consistent when you reflink that file.
The VM file data can be in memory (page cache) or on disk; the OCFS2 file system can handle that,
but OCFS2 does not know whether the VM file data is consistent from the virt-manager angle.

Thanks
Gang

> 
> 
> Bernd
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de 
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
> 
> 
> ___
> Users mailing list: Users@clusterlabs.org 
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> Bugs: http://bugs.clusterlabs.org

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] snapshoting of running VirtualDomain resources - OCFS2 ?

2018-03-14 Thread Gang He
Hello Lentes,


>>> 
> Hi,
> 
> i have a 2-node-cluster with my services (web, db) running in VirtualDomain 
> resources.
> I have a SAN with cLVM, each guest lies in a dedicated logical volume with 
> an ext3 fs.
> 
> Currently i'm thinking about snapshoting the guests to make a backup in the 
> background. With cLVM that's not possible, you can't snapshot a lustered lv.
> Using virsh and qemu-img i didn't find a way to do this without a shutdown 
> of the guest, which i'd like to avoid.
> 
> I found that ocfs2 is able to make snapshots, oracle calls them reflinks.
> So formatting the logical volumes for the guests with ocfs2 would give me 
> the possibility to snapshot them.
> 
> I know that using ocfs2 for the lv's is oversized, but i didn't find another 
> way to solve my problem.
Yes, OCFS2 reflink can meet your requirement; this is also why ocfs2 introduced file cloning.

> 
> What do you think ? I'd like to avoid to shutdown my guests, that's too 
> risky. I experienced already several times that a shutdown can last very long
> because of problems with umounting filesystems because of open files or user 
> connected remotely (on windows guests).
Just one comment: you have to make sure of the VM file's integrity before calling reflink.
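For illustration, once the image is in a consistent state, a point-in-time copy can be taken with the reflink utility from ocfs2-tools; a small sketch with example paths:

# snapshot a guest image inside the ocfs2 mount (paths are examples)
reflink /mnt/ocfs2/vm1/disk.img /mnt/ocfs2/vm1/disk.img.snap-$(date +%Y%m%d)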

Thanks
Gang

> 
> 
> Bernd
> 
> -- 
> 
> Bernd Lentes 
> Systemadministration 
> Institut für Entwicklungsgenetik 
> Gebäude 35.34 - Raum 208 
> HelmholtzZentrum münchen 
> [ mailto:bernd.len...@helmholtz-muenchen.de | 
> bernd.len...@helmholtz-muenchen.de ] 
> phone: +49 89 3187 1241 
> fax: +49 89 3187 2294 
> [ http://www.helmholtz-muenchen.de/idg | 
> http://www.helmholtz-muenchen.de/idg ] 
> 
> no backup - no mercy
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de 
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
> 
> 
> ___
> Users mailing list: Users@clusterlabs.org 
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> Bugs: http://bugs.clusterlabs.org
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Gang He



>>> 
> Hello Gang,
> 
> to follow your instructions, I started the dlm resource via:
> 
>  crm resource start dlm
> 
> then mount/unmount the ocfs2 file system manually..(which seems to be 
> the fix of the situation).
> 
> Now resources are getting started properly on a single node.. I am happy 
> as the issue is fixed, but at the same time I am lost because I have no idea
> 
> how things get fixed here(merely by mounting/unmounting the ocfs2 file 
> systems)

From your description, I just wonder whether the DLM resource works normally in that situation.
Yan/Bin, do you have any comments about two-node clusters? Which configuration settings will affect corosync quorum/DLM?


Thanks
Gang


> 
> 
> --
> Regards,
> Muhammad Sharfuddin
> 
> On 3/12/2018 10:59 AM, Gang He wrote:
>> Hello Muhammad,
>>
>> Usually, ocfs2 resource startup failure is caused by mount command timeout 
> (or hanged).
>> The sample debugging method is,
>> remove ocfs2 resource from crm first,
>> then mount this file system manually, see if the mount command will be 
> timeout or hanged.
>> If this command is hanged, please watch where is mount.ocfs2 process hanged 
> via "cat /proc/xxx/stack" command.
>> If the back trace is stopped at DLM kernel module, usually the root cause is 
> cluster configuration problem.
>>
>>
>> Thanks
>> Gang
>>
>>
>>> On 3/12/2018 7:32 AM, Gang He wrote:
>>>> Hello Muhammad,
>>>>
>>>> I think this problem is not in ocfs2, the cause looks like the cluster
>>> quorum is missed.
>>>> For two-node cluster (does not three-node cluster), if one node is offline,
>>> the quorum will be missed by default.
>>>> So, you should configure two-node related quorum setting according to the
>>> pacemaker manual.
>>>> Then, DLM can work normal, and ocfs2 resource can start up.
>>> Yes its configured accordingly, no-quorum is set to "ignore".
>>>
>>> property cib-bootstrap-options: \
>>>have-watchdog=true \
>>>stonith-enabled=true \
>>>stonith-timeout=80 \
>>>startup-fencing=true \
>>>no-quorum-policy=ignore
>>>
>>>> Thanks
>>>> Gang
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> This two node cluster starts resources when both nodes are online but
>>>>> does not start the ocfs2 resources
>>>>>
>>>>> when one node is offline. e.g if I gracefully stop the cluster resources
>>>>> then stop the pacemaker service on
>>>>>
>>>>> either node, and try to start the ocfs2 resource on the online node, it
>>>>> fails.
>>>>>
>>>>> logs:
>>>>>
>>>>> pipci001 pengine[17732]:   notice: Start   dlm:0#011(pipci001)
>>>>> pengine[17732]:   notice: Start   p-fssapmnt:0#011(pipci001)
>>>>> pengine[17732]:   notice: Start   p-fsusrsap:0#011(pipci001)
>>>>> pipci001 pengine[17732]:   notice: Calculated transition 2, saving
>>>>> inputs in /var/lib/pacemaker/pengine/pe-input-339.bz2
>>>>> pipci001 crmd[17733]:   notice: Processing graph 2
>>>>> (ref=pe_calc-dc-1520613202-31) derived from
>>>>> /var/lib/pacemaker/pengine/pe-input-339.bz2
>>>>> crmd[17733]:   notice: Initiating start operation dlm_start_0 locally on
>>>>> pipci001
>>>>> lrmd[17730]:   notice: executing - rsc:dlm action:start call_id:69
>>>>> dlm_controld[19019]: 4575 dlm_controld 4.0.7 started
>>>>> lrmd[17730]:   notice: finished - rsc:dlm action:start call_id:69
>>>>> pid:18999 exit-code:0 exec-time:1082ms queue-time:1ms
>>>>> crmd[17733]:   notice: Result of start operation for dlm on pipci001: 0 
>>>>> (ok)
>>>>> crmd[17733]:   notice: Initiating monitor operation dlm_monitor_6
>>>>> locally on pipci001
>>>>> crmd[17733]:   notice: Initiating start operation p-fssapmnt_start_0
>>>>> locally on pipci001
>>>>> lrmd[17730]:   notice: executing - rsc:p-fssapmnt action:start call_id:71
>>>>> Filesystem(p-fssapmnt)[19052]: INFO: Running start for
>>>>> /dev/mapper/sapmnt on /sapmnt
>>>>> kernel: [ 4576.529938] dlm: Using TCP for communications
>>>>> kernel: [ 4576.530233] dlm: BFA9FF042AA045F4822C2A6A06020EE9: joining
>>>>>

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-12 Thread Gang He
Hello Muhammad,

Usually, an ocfs2 resource startup failure is caused by the mount command timing out (or hanging).
A simple debugging method is: 
remove the ocfs2 resource from crm first,
then mount this file system manually and see whether the mount command times out or hangs.
If the command hangs, please check where the mount.ocfs2 process is stuck via the "cat /proc/xxx/stack" command.
If the back trace stops in the DLM kernel module, usually the root cause is a cluster configuration problem.
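Put concretely, a minimal sketch of that debugging sequence (the resource, device and mount point names are taken from the logs in this thread and are only examples):

# keep dlm running, but take the file system resource out of cluster control
crm resource start dlm
crm resource stop p-fssapmnt
# mount by hand and see whether it times out or hangs
mount -t ocfs2 /dev/mapper/sapmnt /sapmnt
# from another shell, check where the mount process is stuck
cat /proc/$(pidof mount.ocfs2)/stack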


Thanks
Gang


>>> 
> On 3/12/2018 7:32 AM, Gang He wrote:
>> Hello Muhammad,
>>
>> I think this problem is not in ocfs2, the cause looks like the cluster 
> quorum is missed.
>> For two-node cluster (does not three-node cluster), if one node is offline, 
> the quorum will be missed by default.
>> So, you should configure two-node related quorum setting according to the 
> pacemaker manual.
>> Then, DLM can work normal, and ocfs2 resource can start up.
> Yes its configured accordingly, no-quorum is set to "ignore".
> 
> property cib-bootstrap-options: \
>   have-watchdog=true \
>   stonith-enabled=true \
>   stonith-timeout=80 \
>   startup-fencing=true \
>   no-quorum-policy=ignore
> 
>>
>> Thanks
>> Gang
>>
>>
>>> Hi,
>>>
>>> This two node cluster starts resources when both nodes are online but
>>> does not start the ocfs2 resources
>>>
>>> when one node is offline. e.g if I gracefully stop the cluster resources
>>> then stop the pacemaker service on
>>>
>>> either node, and try to start the ocfs2 resource on the online node, it
>>> fails.
>>>
>>> logs:
>>>
>>> pipci001 pengine[17732]:   notice: Start   dlm:0#011(pipci001)
>>> pengine[17732]:   notice: Start   p-fssapmnt:0#011(pipci001)
>>> pengine[17732]:   notice: Start   p-fsusrsap:0#011(pipci001)
>>> pipci001 pengine[17732]:   notice: Calculated transition 2, saving
>>> inputs in /var/lib/pacemaker/pengine/pe-input-339.bz2
>>> pipci001 crmd[17733]:   notice: Processing graph 2
>>> (ref=pe_calc-dc-1520613202-31) derived from
>>> /var/lib/pacemaker/pengine/pe-input-339.bz2
>>> crmd[17733]:   notice: Initiating start operation dlm_start_0 locally on
>>> pipci001
>>> lrmd[17730]:   notice: executing - rsc:dlm action:start call_id:69
>>> dlm_controld[19019]: 4575 dlm_controld 4.0.7 started
>>> lrmd[17730]:   notice: finished - rsc:dlm action:start call_id:69
>>> pid:18999 exit-code:0 exec-time:1082ms queue-time:1ms
>>> crmd[17733]:   notice: Result of start operation for dlm on pipci001: 0 (ok)
>>> crmd[17733]:   notice: Initiating monitor operation dlm_monitor_6
>>> locally on pipci001
>>> crmd[17733]:   notice: Initiating start operation p-fssapmnt_start_0
>>> locally on pipci001
>>> lrmd[17730]:   notice: executing - rsc:p-fssapmnt action:start call_id:71
>>> Filesystem(p-fssapmnt)[19052]: INFO: Running start for
>>> /dev/mapper/sapmnt on /sapmnt
>>> kernel: [ 4576.529938] dlm: Using TCP for communications
>>> kernel: [ 4576.530233] dlm: BFA9FF042AA045F4822C2A6A06020EE9: joining
>>> the lockspace group.
>>> dlm_controld[19019]: 4629 fence work wait for quorum
>>> dlm_controld[19019]: 4634 BFA9FF042AA045F4822C2A6A06020EE9 wait for quorum
>>> lrmd[17730]:  warning: p-fssapmnt_start_0 process (PID 19052) timed out
>>> kernel: [ 4636.418223] dlm: BFA9FF042AA045F4822C2A6A06020EE9: group
>>> event done -512 0
>>> kernel: [ 4636.418227] dlm: BFA9FF042AA045F4822C2A6A06020EE9: group join
>>> failed -512 0
>>> lrmd[17730]:  warning: p-fssapmnt_start_0:19052 - timed out after 6ms
>>> lrmd[17730]:   notice: finished - rsc:p-fssapmnt action:start call_id:71
>>> pid:19052 exit-code:1 exec-time:60002ms queue-time:0ms
>>> kernel: [ 4636.420628] ocfs2: Unmounting device (254,1) on (node 0)
>>> crmd[17733]:error: Result of start operation for p-fssapmnt on
>>> pipci001: Timed Out
>>> crmd[17733]:  warning: Action 11 (p-fssapmnt_start_0) on pipci001 failed
>>> (target: 0 vs. rc: 1): Error
>>> crmd[17733]:   notice: Transition aborted by operation
>>> p-fssapmnt_start_0 'modify' on pipci001: Event failed
>>> crmd[17733]:  warning: Action 11 (p-fssapmnt_start_0) on pipci001 failed
>>> (target: 0 vs. rc: 1): Error
>>> crmd[17733]:   notice: Transition 2 (Complete=5, Pending=0, Fired=0,
>>> Skipped=0, Incomplete=6,
>>> Source=/var/lib/pacemaker/pengine/pe-input-339.bz2): Com

Re: [ClusterLabs] single node fails to start the ocfs2 resource

2018-03-11 Thread Gang He
Hello Muhammad,

I think this problem is not in ocfs2; the cause looks like lost cluster quorum.
For a two-node cluster (unlike a three-node cluster), if one node is offline, quorum is lost by default.
So, you should configure the two-node related quorum settings according to the pacemaker manual.
Then DLM can work normally, and the ocfs2 resource can start up.
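For a corosync 2.x based stack, this usually means enabling the two_node flag of votequorum in corosync.conf; a minimal sketch (only the quorum section is shown):

quorum {
    provider: corosync_votequorum
    two_node: 1
}

Note that two_node implicitly enables wait_for_all, so both nodes have to be seen at least once after a cold start before the cluster becomes quorate.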


Thanks
Gang 


>>> 
> Hi,
> 
> This two node cluster starts resources when both nodes are online but 
> does not start the ocfs2 resources
> 
> when one node is offline. e.g if I gracefully stop the cluster resources 
> then stop the pacemaker service on
> 
> either node, and try to start the ocfs2 resource on the online node, it 
> fails.
> 
> logs:
> 
> pipci001 pengine[17732]:   notice: Start   dlm:0#011(pipci001)
> pengine[17732]:   notice: Start   p-fssapmnt:0#011(pipci001)
> pengine[17732]:   notice: Start   p-fsusrsap:0#011(pipci001)
> pipci001 pengine[17732]:   notice: Calculated transition 2, saving 
> inputs in /var/lib/pacemaker/pengine/pe-input-339.bz2
> pipci001 crmd[17733]:   notice: Processing graph 2 
> (ref=pe_calc-dc-1520613202-31) derived from 
> /var/lib/pacemaker/pengine/pe-input-339.bz2
> crmd[17733]:   notice: Initiating start operation dlm_start_0 locally on 
> pipci001
> lrmd[17730]:   notice: executing - rsc:dlm action:start call_id:69
> dlm_controld[19019]: 4575 dlm_controld 4.0.7 started
> lrmd[17730]:   notice: finished - rsc:dlm action:start call_id:69 
> pid:18999 exit-code:0 exec-time:1082ms queue-time:1ms
> crmd[17733]:   notice: Result of start operation for dlm on pipci001: 0 (ok)
> crmd[17733]:   notice: Initiating monitor operation dlm_monitor_6 
> locally on pipci001
> crmd[17733]:   notice: Initiating start operation p-fssapmnt_start_0 
> locally on pipci001
> lrmd[17730]:   notice: executing - rsc:p-fssapmnt action:start call_id:71
> Filesystem(p-fssapmnt)[19052]: INFO: Running start for 
> /dev/mapper/sapmnt on /sapmnt
> kernel: [ 4576.529938] dlm: Using TCP for communications
> kernel: [ 4576.530233] dlm: BFA9FF042AA045F4822C2A6A06020EE9: joining 
> the lockspace group.
> dlm_controld[19019]: 4629 fence work wait for quorum
> dlm_controld[19019]: 4634 BFA9FF042AA045F4822C2A6A06020EE9 wait for quorum
> lrmd[17730]:  warning: p-fssapmnt_start_0 process (PID 19052) timed out
> kernel: [ 4636.418223] dlm: BFA9FF042AA045F4822C2A6A06020EE9: group 
> event done -512 0
> kernel: [ 4636.418227] dlm: BFA9FF042AA045F4822C2A6A06020EE9: group join 
> failed -512 0
> lrmd[17730]:  warning: p-fssapmnt_start_0:19052 - timed out after 6ms
> lrmd[17730]:   notice: finished - rsc:p-fssapmnt action:start call_id:71 
> pid:19052 exit-code:1 exec-time:60002ms queue-time:0ms
> kernel: [ 4636.420628] ocfs2: Unmounting device (254,1) on (node 0)
> crmd[17733]:error: Result of start operation for p-fssapmnt on 
> pipci001: Timed Out
> crmd[17733]:  warning: Action 11 (p-fssapmnt_start_0) on pipci001 failed 
> (target: 0 vs. rc: 1): Error
> crmd[17733]:   notice: Transition aborted by operation 
> p-fssapmnt_start_0 'modify' on pipci001: Event failed
> crmd[17733]:  warning: Action 11 (p-fssapmnt_start_0) on pipci001 failed 
> (target: 0 vs. rc: 1): Error
> crmd[17733]:   notice: Transition 2 (Complete=5, Pending=0, Fired=0, 
> Skipped=0, Incomplete=6, 
> Source=/var/lib/pacemaker/pengine/pe-input-339.bz2): Complete
> pengine[17732]:   notice: Watchdog will be used via SBD if fencing is 
> required
> pengine[17732]:   notice: On loss of CCM Quorum: Ignore
> pengine[17732]:  warning: Processing failed op start for p-fssapmnt:0 on 
> pipci001: unknown error (1)
> pengine[17732]:  warning: Processing failed op start for p-fssapmnt:0 on 
> pipci001: unknown error (1)
> pengine[17732]:  warning: Forcing base-clone away from pipci001 after 
> 100 failures (max=2)
> pengine[17732]:  warning: Forcing base-clone away from pipci001 after 
> 100 failures (max=2)
> pengine[17732]:   notice: Stopdlm:0#011(pipci001)
> pengine[17732]:   notice: Stopp-fssapmnt:0#011(pipci001)
> pengine[17732]:   notice: Calculated transition 3, saving inputs in 
> /var/lib/pacemaker/pengine/pe-input-340.bz2
> pengine[17732]:   notice: Watchdog will be used via SBD if fencing is 
> required
> pengine[17732]:   notice: On loss of CCM Quorum: Ignore
> pengine[17732]:  warning: Processing failed op start for p-fssapmnt:0 on 
> pipci001: unknown error (1)
> pengine[17732]:  warning: Processing failed op start for p-fssapmnt:0 on 
> pipci001: unknown error (1)
> pengine[17732]:  warning: Forcing base-clone away from pipci001 after 
> 100 failures (max=2)
> pipci001 pengine[17732]:  warning: Forcing base-clone away from pipci001 
> after 100 failures (max=2)
> pengine[17732]:   notice: Stopdlm:0#011(pipci001)
> pengine[17732]:   notice: Stopp-fssapmnt:0#011(pipci001)
> pengine[17732]:   notice: Calculated transition 4, saving inputs in 
> /var/lib/pacemaker/pengine/pe-input-341.bz2
> crmd[17733]:   notice: Processing 

Re: [ClusterLabs] [Cluster-devel] DLM connection channel switch take too long time (> 5mins)

2018-03-08 Thread Gang He
Hello Digimer,



>>> 
> On 2018-03-08 12:10 PM, David Teigland wrote:
>>> I use active rrp_mode in corosync.conf and reboot the cluster to let the 
> configuration effective.
>>> But, the about 5 mins hang in new_lockspace() function is still here.
>> 
>> The last time I tested connection failures with sctp was several years
>> ago, but I recall seeing similar problems.  I had hoped that some of the
>> sctp changes might have helped, but perhaps they didn't.
>> Dave
> 
> To add to this; We found serious issues with DLM over sctp/rrp. Our
> solution was to remove RRP and reply on active/passive (mode=1) bonding.
> I do not believe you can make anything using DLM reliable on RRP in
> either active or passive mode.
Do you have detailed steps describing this workaround? 
What I mean is: how do we remove RRP and rely on active/passive (mode=1) bonding instead?
From the code, we have to use the sctp protocol in DLM on a two-ring cluster.
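If it helps, my understanding of that workaround is: drop the second ring from corosync.conf and bind the single remaining ring to a bonded interface, so redundancy is handled below corosync. A rough sketch (interface names are examples, not a tested recipe):

# build an active-backup (mode=1) bond from the two NICs
ip link add bond0 type bond mode active-backup
ip link set eth0 down; ip link set eth0 master bond0
ip link set eth1 down; ip link set eth1 master bond0
ip link set bond0 up
# corosync.conf then keeps only the ringnumber 0 interface, bound to bond0's network,
# and DLM can stay on the default TCP protocol.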

Thanks
Gang

> 
> -- 
> Digimer
> Papers and Projects: https://alteeve.com/w/ 
> "I am, somehow, less interested in the weight and convolutions of
> Einstein’s brain than in the near certainty that people of equal talent
> have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [Cluster-devel] DLM connection channel switch take too long time (> 5mins)

2018-03-08 Thread Gang He
Hello David,

If the sctp implementation did not fix this problem, is there any workaround for a two-ring cluster?
Could we use the TCP protocol in DLM on a two-ring cluster to bypass the connection channel switch issue?

Thanks
Gang 


>>> 
>>  I use active rrp_mode in corosync.conf and reboot the cluster to let the 
> configuration effective.
>> But, the about 5 mins hang in new_lockspace() function is still here.
> 
> The last time I tested connection failures with sctp was several years
> ago, but I recall seeing similar problems.  I had hoped that some of the
> sctp changes might have helped, but perhaps they didn't.
> Dave

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: [Cluster-devel] DLM connection channel switch take too long time (> 5mins)

2018-03-08 Thread Gang He



>>> "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de> 03/08/18 7:24 PM >>>
Hi!

What surprises me most is that a connect(...O_NONBLOCK) actually blocks:

EINPROGRESS
  The  socket  is  non-blocking  and the connection cannot be com-
  pleted immediately.
Yes, the behavior does not follow the O_NONBLOCK flag; blocking for about 5 minutes is far too long.

Thanks
Gang

Regards,
Ulrich


>>> "Gang He" <g...@suse.com> schrieb am 08.03.2018 um 10:48 in Nachricht
<5aa1776502f9000ad...@prv-mh.provo.novell.com>:
> Hi Feldhost,
> 
> I use active rrp_mode in corosync.conf and reboot the cluster to let the 
> configuration effective.
> But, the about 5 mins hang in new_lockspace() function is still here.
> 
> Thanks
> Gang
>  
> 
>>>> 
>> Hi, so try to use active mode.
>> 
>> https://www.suse.com/documentation/sle_ha/book_sleha/data/sec_ha_installatio 
> 
>> n_terms.html
>> 
>> That fixes I saw in 4.14.*
>> 
>>> On 8 Mar 2018, at 09:12, Gang He <g...@suse.com> wrote:
>>> 
>>> Hi Feldhost,
>>> 
>>> 
>>>>>> 
>>>> Hello Gang He,
>>>> 
>>>> which type of corosync rrp_mode you use? Passive or Active? 
>>> clvm1:/etc/corosync # cat corosync.conf  | grep rrp_mode
>>>rrp_mode:   passive
>>> 
>>> Did you try test both?
>>> No, only this mode. 
>>> Also, what kernel version you use? I see some SCTP fixes in latest kernels.
>>> clvm1:/etc/corosync # uname -r
>>> 4.4.114-94.11-default
>>> It looks that sock->ops->connect() function is blocked for too long time 
>>> before 
>> return, under broken network situation. 
>>> In normal network, sock->ops->connect() function returns very quickly.
>>> 
>>> Thanks
>>> Gang
>>> 
>>>> 
>>>>> On 8 Mar 2018, at 08:52, Gang He <g...@suse.com> wrote:
>>>>> 
>>>>> Hello list and David Teigland,
>>>>> 
>>>>> I got a problem under a two rings cluster, the problem can be reproduced 
>>>> with the below steps.
>>>>> 1) setup a two rings cluster with two nodes.
>>>>> e.g. 
>>>>> clvm1(nodeid 172204569)  addr_list eth0 10.67.162.25 eth1 192.168.152.240
>>>>> clvm2(nodeid 172204570)  addr_list eth0 10.67.162.26 eth1 192.168.152.103
>>>>> 
>>>>> 2) the whole cluster works well, then I put eth0 down on node clvm2, and 
>>>> restart pacemaker service on that node.
>>>>> ifconfig eth0 down
>>>>> rcpacemaker restart
>>>>> 
>>>>> 3) the whole cluster still work well (that means corosync is very smooth 
>>>>> to 
>>>> switch to the other ring).
>>>>> Then, I can mount ocfs2 file system on node clvm2 quickly with the 
>>>>> command 
>>>>> mount /dev/sda /mnt/ocfs2 
>>>>> 
>>>>> 4) Next, I do the same mount on node clvm1, the mount command will be 
>>>>> hanged 
> 
>> 
>>>> for about 5 mins, and finally the mount command is done.
>>>>> But, if we setup a ocfs2 file system resource in pacemaker,
>>>>> the pacemaker resource agent will consider ocfs2 file system resource 
>>>> startup failure before this command returns,
>>>>> the pacemaker will fence node clvm1. 
>>>>> This problem is impacting our customer's estimate, since they think the 
>>>>> two 
>>>> rings can be switched smoothly.
>>>>> 
>>>>> According to this problem, I can see the mount command is hanged with the 
>>>> below back trace,
>>>>> clvm1:/ # cat /proc/6688/stack
>>>>> [] new_lockspace+0x92d/0xa70 [dlm]
>>>>> [] dlm_new_lockspace+0x69/0x160 [dlm]
>>>>> [] user_cluster_connect+0xc8/0x350 [ocfs2_stack_user]
>>>>> [] ocfs2_cluster_connect+0x192/0x240 [ocfs2_stackglue]
>>>>> [] ocfs2_dlm_init+0x31c/0x570 [ocfs2]
>>>>> [] ocfs2_fill_super+0xb33/0x1200 [ocfs2]
>>>>> [] mount_bdev+0x1a0/0x1e0
>>>>> [] mount_fs+0x3a/0x170
>>>>> [] vfs_kern_mount+0x62/0x110
>>>>> [] do_mount+0x213/0xcd0
>>>>> [] SyS_mount+0x85/0xd0
>>>>> [] entry_SYSCALL_64_fastpath+0x1e/0xb6
>>>>> [] 0x
>>>>> 
>>>>> The root cause is in sctp_connect_to_sock() function

Re: [ClusterLabs] [Cluster-devel] DLM connection channel switch take too long time (> 5mins)

2018-03-08 Thread Gang He
Hi Feldhost,

I used active rrp_mode in corosync.conf and rebooted the cluster to make the configuration take effect.
But the roughly 5-minute hang in the new_lockspace() function is still there.

Thanks
Gang
 

>>> 
> Hi, so try to use active mode.
> 
> https://www.suse.com/documentation/sle_ha/book_sleha/data/sec_ha_installatio 
> n_terms.html
> 
> That fixes I saw in 4.14.*
> 
>> On 8 Mar 2018, at 09:12, Gang He <g...@suse.com> wrote:
>> 
>> Hi Feldhost,
>> 
>> 
>>>>> 
>>> Hello Gang He,
>>> 
>>> which type of corosync rrp_mode you use? Passive or Active? 
>> clvm1:/etc/corosync # cat corosync.conf  | grep rrp_mode
>>rrp_mode:   passive
>> 
>> Did you try test both?
>> No, only this mode. 
>> Also, what kernel version you use? I see some SCTP fixes in latest kernels.
>> clvm1:/etc/corosync # uname -r
>> 4.4.114-94.11-default
>> It looks that sock->ops->connect() function is blocked for too long time 
>> before 
> return, under broken network situation. 
>> In normal network, sock->ops->connect() function returns very quickly.
>> 
>> Thanks
>> Gang
>> 
>>> 
>>>> On 8 Mar 2018, at 08:52, Gang He <g...@suse.com> wrote:
>>>> 
>>>> Hello list and David Teigland,
>>>> 
>>>> I got a problem under a two rings cluster, the problem can be reproduced 
>>> with the below steps.
>>>> 1) setup a two rings cluster with two nodes.
>>>> e.g. 
>>>> clvm1(nodeid 172204569)  addr_list eth0 10.67.162.25 eth1 192.168.152.240
>>>> clvm2(nodeid 172204570)  addr_list eth0 10.67.162.26 eth1 192.168.152.103
>>>> 
>>>> 2) the whole cluster works well, then I put eth0 down on node clvm2, and 
>>> restart pacemaker service on that node.
>>>> ifconfig eth0 down
>>>> rcpacemaker restart
>>>> 
>>>> 3) the whole cluster still work well (that means corosync is very smooth 
>>>> to 
>>> switch to the other ring).
>>>> Then, I can mount ocfs2 file system on node clvm2 quickly with the command 
>>>> mount /dev/sda /mnt/ocfs2 
>>>> 
>>>> 4) Next, I do the same mount on node clvm1, the mount command will be 
>>>> hanged 
> 
>>> for about 5 mins, and finally the mount command is done.
>>>> But, if we setup a ocfs2 file system resource in pacemaker,
>>>> the pacemaker resource agent will consider ocfs2 file system resource 
>>> startup failure before this command returns,
>>>> the pacemaker will fence node clvm1. 
>>>> This problem is impacting our customer's estimate, since they think the 
>>>> two 
>>> rings can be switched smoothly.
>>>> 
>>>> According to this problem, I can see the mount command is hanged with the 
>>> below back trace,
>>>> clvm1:/ # cat /proc/6688/stack
>>>> [] new_lockspace+0x92d/0xa70 [dlm]
>>>> [] dlm_new_lockspace+0x69/0x160 [dlm]
>>>> [] user_cluster_connect+0xc8/0x350 [ocfs2_stack_user]
>>>> [] ocfs2_cluster_connect+0x192/0x240 [ocfs2_stackglue]
>>>> [] ocfs2_dlm_init+0x31c/0x570 [ocfs2]
>>>> [] ocfs2_fill_super+0xb33/0x1200 [ocfs2]
>>>> [] mount_bdev+0x1a0/0x1e0
>>>> [] mount_fs+0x3a/0x170
>>>> [] vfs_kern_mount+0x62/0x110
>>>> [] do_mount+0x213/0xcd0
>>>> [] SyS_mount+0x85/0xd0
>>>> [] entry_SYSCALL_64_fastpath+0x1e/0xb6
>>>> [] 0x
>>>> 
>>>> The root cause is in sctp_connect_to_sock() function in lowcomms.c,
>>>> 1075
>>>> 1076 log_print("connecting to %d", con->nodeid);
>>>> 1077
>>>> 1078 /* Turn off Nagle's algorithm */
>>>> 1079 kernel_setsockopt(sock, SOL_TCP, TCP_NODELAY, (char *)&one,
>>>> 1080   sizeof(one));
>>>> 1081
>>>> 1082 result = sock->ops->connect(sock, (struct sockaddr *)&daddr, addr_len,
>>>> 1083                O_NONBLOCK);  <<= here, this call
>>> blocks for > 5 mins before returning ETIMEDOUT(-110).
>>>> 1084 printk(KERN_ERR "sctp_connect_to_sock connect: %d\n", result);
>>>> 1085
>>>> 1086 if (result == -EINPROGRESS)
>>>> 1087 result = 0;
>>>> 1088 if (result == 0)
>>>> 1089 goto out;
>>>> 
>>>> Then, I want to know if this problem was found/fixed before? 
>>>> it looks DLM can not switch the second ring very quickly, this will impact 
>>> the above application (e.g. CLVM, ocfs2) to create a new lock space before 
>>> it's startup.
>>>> 
>>>> Thanks
>>>> Gang
>>>> 
>>>> 
>>>> ___
>>>> Users mailing list: Users@clusterlabs.org 
>>>> https://lists.clusterlabs.org/mailman/listinfo/users 
>>>> 
>>>> Project Home: http://www.clusterlabs.org 
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>>>> Bugs: http://bugs.clusterlabs.org
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [Cluster-devel] DLM connection channel switch take too long time (> 5mins)

2018-03-08 Thread Gang He
Hi Feldhost,


>>> 
> Hello Gang He,
> 
> which type of corosync rrp_mode you use? Passive or Active? 
clvm1:/etc/corosync # cat corosync.conf  | grep rrp_mode
rrp_mode:   passive

Did you try test both?
No, only this mode. 
Also, what kernel version you use? I see some SCTP fixes in latest kernels.
clvm1:/etc/corosync # uname -r
4.4.114-94.11-default
It looks like the sock->ops->connect() function blocks for too long before returning when the network is broken. 
On a normal network, the sock->ops->connect() function returns very quickly.

Thanks
Gang

> 
>> On 8 Mar 2018, at 08:52, Gang He <g...@suse.com> wrote:
>> 
>> Hello list and David Teigland,
>> 
>> I got a problem under a two rings cluster, the problem can be reproduced 
> with the below steps.
>> 1) setup a two rings cluster with two nodes.
>> e.g. 
>> clvm1(nodeid 172204569)  addr_list eth0 10.67.162.25 eth1 192.168.152.240
>> clvm2(nodeid 172204570)  addr_list eth0 10.67.162.26 eth1 192.168.152.103
>> 
>> 2) the whole cluster works well, then I put eth0 down on node clvm2, and 
> restart pacemaker service on that node.
>> ifconfig eth0 down
>> rcpacemaker restart
>> 
>> 3) the whole cluster still work well (that means corosync is very smooth to 
> switch to the other ring).
>> Then, I can mount ocfs2 file system on node clvm2 quickly with the command 
>> mount /dev/sda /mnt/ocfs2 
>> 
>> 4) Next, I do the same mount on node clvm1, the mount command will be hanged 
> for about 5 mins, and finally the mount command is done.
>> But, if we setup a ocfs2 file system resource in pacemaker,
>> the pacemaker resource agent will consider ocfs2 file system resource 
> startup failure before this command returns,
>> the pacemaker will fence node clvm1. 
>> This problem is impacting our customer's estimate, since they think the two 
> rings can be switched smoothly.
>> 
>> According to this problem, I can see the mount command is hanged with the 
> below back trace,
>> clvm1:/ # cat /proc/6688/stack
>> [] new_lockspace+0x92d/0xa70 [dlm]
>> [] dlm_new_lockspace+0x69/0x160 [dlm]
>> [] user_cluster_connect+0xc8/0x350 [ocfs2_stack_user]
>> [] ocfs2_cluster_connect+0x192/0x240 [ocfs2_stackglue]
>> [] ocfs2_dlm_init+0x31c/0x570 [ocfs2]
>> [] ocfs2_fill_super+0xb33/0x1200 [ocfs2]
>> [] mount_bdev+0x1a0/0x1e0
>> [] mount_fs+0x3a/0x170
>> [] vfs_kern_mount+0x62/0x110
>> [] do_mount+0x213/0xcd0
>> [] SyS_mount+0x85/0xd0
>> [] entry_SYSCALL_64_fastpath+0x1e/0xb6
>> [] 0x
>> 
>> The root cause is in sctp_connect_to_sock() function in lowcomms.c,
>> 1075
>> 1076 log_print("connecting to %d", con->nodeid);
>> 1077
>> 1078 /* Turn off Nagle's algorithm */
>> 1079 kernel_setsockopt(sock, SOL_TCP, TCP_NODELAY, (char *)&one,
>> 1080   sizeof(one));
>> 1081
>> 1082 result = sock->ops->connect(sock, (struct sockaddr *)&daddr, addr_len,
>> 1083                O_NONBLOCK);  <<= here, this call
>> blocks for > 5 mins before returning ETIMEDOUT(-110).
>> 1084 printk(KERN_ERR "sctp_connect_to_sock connect: %d\n", result);
>> 1085
>> 1086 if (result == -EINPROGRESS)
>> 1087 result = 0;
>> 1088 if (result == 0)
>> 1089 goto out;
>> 
>> Then, I want to know if this problem was found/fixed before? 
>> it looks DLM can not switch the second ring very quickly, this will impact 
> the above application (e.g. CLVM, ocfs2) to create a new lock space before 
> it's startup.
>> 
>> Thanks
>> Gang
>> 
>> 
>> ___
>> Users mailing list: Users@clusterlabs.org 
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>> 
>> Project Home: http://www.clusterlabs.org 
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>> Bugs: http://bugs.clusterlabs.org
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] DLM connection channel switch take too long time (> 5mins)

2018-03-07 Thread Gang He
Hello list and David Teigland,

I hit a problem on a two-ring cluster; it can be reproduced with the steps below.
1) setup a two rings cluster with two nodes.
e.g. 
clvm1(nodeid 172204569)  addr_list eth0 10.67.162.25 eth1 192.168.152.240
clvm2(nodeid 172204570)  addr_list eth0 10.67.162.26 eth1 192.168.152.103

2) the whole cluster works well, then I put eth0 down on node clvm2, and 
restart pacemaker service on that node.
ifconfig eth0 down
rcpacemaker restart

3) the whole cluster still works well (that means corosync switches to the other ring very smoothly).
Then, I can mount the ocfs2 file system on node clvm2 quickly with the command 
mount /dev/sda /mnt/ocfs2 

4) Next, I do the same mount on node clvm1; the mount command hangs for about 5 mins, and finally the mount command completes.
But, if we set up an ocfs2 file system resource in pacemaker,
the pacemaker resource agent will consider the ocfs2 file system resource startup a failure before this command returns,
and pacemaker will fence node clvm1. 
This problem is impacting our customer's evaluation, since they expect the two rings to be switched over smoothly.

Regarding this problem, I can see the mount command hangs with the back trace below,
clvm1:/ # cat /proc/6688/stack
[] new_lockspace+0x92d/0xa70 [dlm]
[] dlm_new_lockspace+0x69/0x160 [dlm]
[] user_cluster_connect+0xc8/0x350 [ocfs2_stack_user]
[] ocfs2_cluster_connect+0x192/0x240 [ocfs2_stackglue]
[] ocfs2_dlm_init+0x31c/0x570 [ocfs2]
[] ocfs2_fill_super+0xb33/0x1200 [ocfs2]
[] mount_bdev+0x1a0/0x1e0
[] mount_fs+0x3a/0x170
[] vfs_kern_mount+0x62/0x110
[] do_mount+0x213/0xcd0
[] SyS_mount+0x85/0xd0
[] entry_SYSCALL_64_fastpath+0x1e/0xb6
[] 0x

The root cause is in sctp_connect_to_sock() function in lowcomms.c,
1075
1076 log_print("connecting to %d", con->nodeid);
1077
1078 /* Turn off Nagle's algorithm */
1079 kernel_setsockopt(sock, SOL_TCP, TCP_NODELAY, (char *)&one,
1080   sizeof(one));
1081
1082 result = sock->ops->connect(sock, (struct sockaddr *)&daddr, addr_len,
1083                O_NONBLOCK);  <<= here, this call
blocks for > 5 mins before returning ETIMEDOUT(-110).
1084 printk(KERN_ERR "sctp_connect_to_sock connect: %d\n", result);
1085
1086 if (result == -EINPROGRESS)
1087 result = 0;
1088 if (result == 0)
1089 goto out;

Then, I want to know if this problem was found/fixed before? 
It looks like DLM cannot switch to the second ring very quickly, and this will prevent the applications above (e.g. CLVM, ocfs2) from creating a new lock space during their startup.

Thanks
Gang


___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Corosync OCFS2

2017-11-08 Thread Gang He
Hello David,

If you want to use OCFS2 with the Pacemaker stack, you do not need ocfs2_controld in the new version.
You do not need to configure an o2cb resource either.

I can give you a crm demo from a SLE12SP3 environment (actually there has not been any change since SLE12SP1):

crm(live/tb-node1)configure# show
node 1084784015: tb-node2
node 1084784039: tb-node1
node 1084784110: tb-node3
primitive dlm ocf:pacemaker:controld \
op monitor interval=60 timeout=60
primitive fs1 Filesystem \
params directory="/mnt/shared" fstype=ocfs2 device="/dev/sdb1" \
op start timeout=60s interval=0 \
op stop timeout=60s interval=0 \
op monitor interval=20s timeout=40s
primitive stonith-libvirt stonith:external/libvirt \
params hostlist="tb-node1,tb-node2,tb-node3" 
hypervisor_uri="qemu+tcp://192.168.125.1/system" \
op monitor interval=60 timeout=120 \
meta target-role=Started
group base-group dlm fs1
clone base-clone base-group \
meta interleave=true
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.17-3.1-36d2962a8 \
cluster-infrastructure=corosync \
cluster-name=hacluster \
stonith-enabled=true \
placement-strategy=balanced
rsc_defaults rsc-options: \
resource-stickiness=1 \
migration-threshold=3
op_defaults op-options: \
timeout=600 \
record-pending=true
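Once this configuration is active, the base-clone should start dlm and mount the file system on every node; a quick check from any node is simply:

crm_mon -r -1
mount | grep ocfs2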


Thanks
Gang



>>> 
> I'm trying to set up a 2 node cluster using OCFS2 with a Pacemaker and
> Corosync stack on Debian. I attempted to ocf:heartbeat:o2cb to satisfy
> the o2cb requirement of OCFS2, but found that the required daemon
> o2cb_controld.pcmk is not available for Debian because it was dependent
> on OpenAIS which is no longer part of Corosync. I've reviewed the
> relevant code for this daemon, but I am not familiar with the Corosync
> or OpenAIS APIs in order to make the necessary conversion. The relevant
> code is less than 200 lines long and can be found here:
> https://oss.oracle.com/git/gitweb.cgi?p=ocfs2-tools.git;a=blob;f=ocfs2_contro 
> ld/pacemaker.c;h=18f776a748ca4d39f06c9bad84c7faf5fe0c6910;hb=HEAD
> Can someone take a look at this code and tell me if it can be converted
> to Corosync, and if so point me in the direction of how to begin? Is
> Corosync CPG the replacement for OpenAIS? 
> 
> I'm able to get OCFS2 working with lsb:o2cb, but OCFS2 fails the
> ping_pong test provided with ctdb which is my ultimate goal here. From
> my understanding, o2cb must use o2cb_controld.pcmk in order for OCFS2 to
> function correctly in regards to ctdb. I obviously haven't been able to
> test this configuration due to the current OpenAIS requirement of
> o2cb_controld.pcmk. 
> 
> Thanks, 
> 
> David Ellingsworth


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] PSA Ubuntu 16.04 and OCSF2 corruption

2017-04-11 Thread Gang He
Hello Kyle,

From the ocfs2 code, ocfs2 supports the fstrim operation in a cluster environment.
If file system corruption was encountered, it should be a bug.
By the way, it is recommended to run the fstrim operation from only one node in the cluster.
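To put that concretely: disable whatever schedules fstrim on all nodes but one (per the bug above it is a weekly util-linux job on Ubuntu 16.04), and run the trim from that single node only, e.g.:

# on the one node allowed to trim (mount point is an example)
fstrim -v /mnt/shared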


Thanks
Gang 


>>> 
> Hello,
> 
> Just opened what I think is a bug with the Ubuntu util-linux package:
> 
> https://bugs.launchpad.net/ubuntu/+source/util-linux/+bug/1681410 
> 
> TL;DR
> 
> The 'fstrim' command is run weekly on all filesystems.  If you're using 
> ocfs2 and the same filesystem is mounted on multiple Ubuntu 16.04 servers, 
> this fstrim is run at the same time to the same device from all servers.  I'm 
> positive this is what's causing my filesystem corruption issues, which occurs 
> a minute or two after fstrim is scheduled to run.
> 
> -Kyle
> 
> ___
> Users mailing list: Users@clusterlabs.org 
> http://lists.clusterlabs.org/mailman/listinfo/users 
> 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> Bugs: http://bugs.clusterlabs.org


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] HA/Clusterlabs Summit 2017 Proposal

2017-02-06 Thread Gang He
Hi Kristoffer,

The meeting looks very attractive.
Just one question, does the meeting have any website to archive the previous 
topics/presentations/materials?


Thanks
Gang 


>>> 
> Hi everyone!
> 
> The last time we had an HA summit was in 2015, and the intention then
> was to have SUSE arrange the next meetup in the following year. We did
> try to find a date that would be suitable for everyone, but for various
> reasons there was never a conclusion and 2016 came and went.
> 
> Well, I'd like to give it another try this year! This time, I've already
> got a proposal for a place and date: September 7-8 in Nuremberg, Germany
> (SUSE main office). I've got the new event area in the SUSE office
> already reserved for these dates.
> 
> My suggestion is to do a two day event similar to the one in Brno, but I
> am open to any suggestions as to format and content. The main reason for
> having the event would be for everyone to have a chance to meet and get
> to know each other, but it's also an opportunity to discuss the future
> of Clusterlabs and the direction going forward.
> 
> Any thoughts or feedback are more than welcome! Let me know if you are
> interested in coming or unable to make it.
> 
> Cheers,
> Kristoffer
> 
> -- 
> // Kristoffer Grönlund
> // kgronl...@suse.com 
> 
> ___
> Users mailing list: Users@clusterlabs.org 
> http://lists.clusterlabs.org/mailman/listinfo/users 
> 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> Bugs: http://bugs.clusterlabs.org

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org