Re: [ClusterLabs] mdraid - pacemaker resource agent

2022-12-09 Thread Roger Zhou via Users
Mastercard Mountain View, Central Park | Leopard -Original Message- From: Roger Zhou Sent: Thursday 8 December 2022 05:56 To: Cluster Labs - All topics related to open-source clustering welcomed ; Jelen, Piotr Cc: Nielsen, Laust Subject: {EXTERNAL} Re: [ClusterLabs] mdraid - pacemaker

Re: [ClusterLabs] mdraid - pacemaker resource agent

2022-12-07 Thread Roger Zhou via Users
On 12/7/22 18:44, Jelen, Piotr wrote: Hi ClusterLabs team, I would like to ask if this resource agent was tested and if it can be used in production? resource-agents/mdraid at main · ClusterLabs/resource-agents · GitHub

Re: [ClusterLabs] Q: fence_kdump and fence_kdump_send

2022-02-25 Thread Roger Zhou via Users
On 2/24/22 20:21, Ulrich Windl wrote: Hi! After reading about fence_kdump and fence_kdump_send I wonder: Does anybody use that in production? Having the networking and bonding in initrd does not sound like a good idea to me. I assume one of the motivations for fence_kdump is to reduce the

Re: [ClusterLabs] what is the "best" way to completely shutdown a two-node cluster ?

2022-02-10 Thread Roger Zhou via Users
On 2/9/22 17:46, Lentes, Bernd wrote: - On Feb 7, 2022, at 4:13 PM, Jehan-Guillaume de Rorthais j...@dalibo.com wrote: On Mon, 7 Feb 2022 14:24:44 +0100 (CET) "Lentes, Bernd" wrote: Hi, i'm currently changing a bit in my cluster because i realized that my configuration for a power

Re: [ClusterLabs] Possible timing bug in SLES15

2021-10-12 Thread Roger Zhou via Users
On 10/12/21 3:32 PM, Ulrich Windl wrote: Hi! I just examined the corosync.service unit in SLES15. It contains: # /usr/lib/systemd/system/corosync.service [Unit] Description=Corosync Cluster Engine Documentation=man:corosync man:corosync.conf man:corosync_overview

Re: [ClusterLabs] (no subject)

2021-09-02 Thread Roger Zhou via Users
On 9/3/21 10:09 AM, ?? via Users wrote: HELLO! I built a two node corosync + pacemaker cluster, and the main end runs on node0. There are two network ports with the same network segment IP on node0. I This need attention a bit. "Usually not a good idea to connect two

Re: [ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?

2021-07-12 Thread Roger Zhou
On 7/9/21 3:56 PM, Ulrich Windl wrote: [...] h19 kernel: Out of memory: Killed process 6838 (corosync) total-vm:261212kB, anon-rss:31444kB, file-rss:7700kB, shmem-rss:121872kB I doubt that was the best possible choice ;-) The dead corosync caused the DC (h18) to fence h19 (which was

Re: [ClusterLabs] Antw: [EXT] Correctly stop pacemaker on 2-node cluster with SBD and failed devices?

2021-06-16 Thread Roger Zhou
On 6/16/21 3:03 PM, Andrei Borzenkov wrote: We thought that access to storage was restored, but one step was missing so devices appeared empty. At this point I tried to restart the pacemaker. But as soon as I stopped pacemaker SBD rebooted nodes ‑ which is logical, as quorum was now lost.

Re: [ClusterLabs] Q: VirtualDomain RA

2021-03-02 Thread Roger Zhou
On 3/1/21 7:17 PM, Ulrich Windl wrote: Hi! I have a question about the VirtualDomain RA (as in SLES15 SP2): Why does the RA "undefine", then "create" a domain instead of just "start"ing a domain? I mean: Assuming that an "installation" does "define" the domains, why bother with configuration

Re: [ClusterLabs] Q: What is lvmlockd locking?

2021-01-22 Thread Roger Zhou
On 1/22/21 6:58 PM, Ulrich Windl wrote: Roger Zhou schrieb am 22.01.2021 um 11:26 in Nachricht <8dcd53e2-b65b-aafe-ae29-7bdeea3b8...@suse.com>: On 1/22/21 5:45 PM, Ulrich Windl wrote: Roger Zhou schrieb am 22.01.2021 um 10:18 in Nachricht : Could be the naming of lv

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Re: Q: What is lvmlockd locking?

2021-01-22 Thread Roger Zhou
On 1/22/21 5:45 PM, Ulrich Windl wrote: Roger Zhou schrieb am 22.01.2021 um 10:18 in Nachricht : Could be the naming of lvmlockd and virtlockd mislead you, I guess. I agree that there is one "virtlockd" name in the resources that refers to lvmlockd. That is confusin

Re: [ClusterLabs] Antw: Re: Antw: [EXT] Re: Q: What is lvmlockd locking?

2021-01-22 Thread Roger Zhou
On 1/22/21 4:17 PM, Ulrich Windl wrote: Gang He schrieb am 22.01.2021 um 09:13 in Nachricht <1fd1c07d-d12c-fea9-4b17-90a977fe7...@suse.com>: Hi Ulrich, I reviewed the crm configuration file, there are some comments as below, 1) lvmlockd resource is used for shared VG, if you do not plan to

Re: [ClusterLabs] Antw: [EXT] Re: Questions about the infamous TOTEM retransmit list

2021-01-13 Thread Roger Zhou
On 1/13/21 3:31 PM, Ulrich Windl wrote: Roger Zhou schrieb am 13.01.2021 um 05:32 in Nachricht <97ac2305-85b4-cbb0-7133-ac1372143...@suse.com>: On 1/12/21 4:23 PM, Ulrich Windl wrote: Hi! Before setting up our first pacemaker cluster we thought one low-speed redundant network

Re: [ClusterLabs] Questions about the infamous TOTEM retransmit list

2021-01-13 Thread Roger Zhou
On 1/12/21 4:23 PM, Ulrich Windl wrote: Hi! Before setting up our first pacemaker cluster we thought one low-speed redundant network would be good in addition to the normal high-speed network. However as is seems now (SLES15 SP2) there is NO reasonable RRP mode to drive such a configuration

Re: [ClusterLabs] Antw: Re: Antw: [EXT] delaying start of a resource

2020-12-17 Thread Roger Zhou
Here is a tool intended to standardize the approach to simulating split-brain: https://software.opensuse.org/package/python3-cluster-preflight-check After installation, simply run the command: `ha-cluster-preflight-check --split-brain-iptables` Thanks, Roger On 12/17/20 4:14 PM, Gabriele Bulfon

Re: [ClusterLabs] Antw: Another word of warning regarding VirtualDomain and Live Migration

2020-12-16 Thread Roger Zhou
On 12/16/20 5:06 PM, Ulrich Windl wrote: Hi! (I changed the subject of the thread) VirtualDomain seems to be broken, as it does not handle a failed live-migration correctly: With my test-VM running on node h16, this happened when I tried to move it away (for testing): Dec 16 09:28:46 h19

Re: [ClusterLabs] crm enhancement proposal (configure grep): Opinions?

2020-12-16 Thread Roger Zhou
Hi Ulrich, Sounds reasonable and handy! Can you create a GitHub issue to track this? Thanks, Roger On 11/30/20 8:47 PM, Ulrich Windl wrote: Hi! What would users of crm shell think about this enhancement proposal: crm configure grep That command would search the configuration for any

Re: [ClusterLabs] resource management of standby node

2020-12-08 Thread Roger Zhou
On 12/1/20 4:03 PM, Ulrich Windl wrote: Ken Gaillot schrieb am 30.11.2020 um 19:52 in Nachricht : ... Though there's nothing wrong with putting all nodes in standby. Another alternative would be to set the stop-all-resources cluster property. Hi Ken, thanks for the valuable feedback!

Re: [ClusterLabs] Antw: [EXT] Re: Q: high-priority messages from DLM?

2020-12-08 Thread Roger Zhou
On 12/8/20 6:48 PM, Strahil Nikolov wrote: Nope, but if you don't use clustered FS, you could also use plain LVM + tags. As far as I know you need dlm and clvmd for clustered FS. FYI, clvmd has been dropped since lvm2 v2_03, and is replaced by lvmlockd. BTW, lvmlockd (or its predecessor clvmd) is

Re: [ClusterLabs] Antw: [EXT] sbd v1.4.2

2020-12-08 Thread Roger Zhou
Great news for the new version, first of all! On 12/8/20 8:12 PM, Klaus Wenninger wrote: On 12/8/20 11:51 AM, Klaus Wenninger wrote: On 12/3/20 9:29 AM, Reid Wahl wrote: On Thu, Dec 3, 2020 at 12:03 AM Ulrich Windl [...] ‑ add robustness against misconfiguration / improve documentation

Re: [ClusterLabs] Q: "crm node status" display

2020-12-08 Thread Roger Zhou
Can you create a GitHub issue before we lose track of this? Thank you Ulrich! https://github.com/ClusterLabs/crmsh/issues BR, Roger On 11/20/20 2:50 PM, Ulrich Windl wrote: Hi! Setting up a new cluster with SLES15 SP2, I'm wondering: "crm node status" displays XML. Is that the way it should

Re: [ClusterLabs] Antw: [EXT] Re: Setting up HA cluster on Raspberry pi4 with ubuntu 20.04 aarch64 architecture

2020-06-16 Thread Roger Zhou
On 6/15/20 3:44 PM, Ulrich Windl wrote: Strahil Nikolov schrieb am 12.06.2020 um 14:00 in Nachricht <22726_1591963256_5EE36E78_22726_156_1_03FA2901-B9CC-4CE7-8952-283A864E1C72@yaho .com>: Out of curiosity , are you running it on sles/opensuse? I think it is easier with 'crm cluster

Re: [ClusterLabs] Mirrored cLVM/Xen PVM Performance question for block device

2020-05-25 Thread Roger Zhou
On 5/20/20 2:50 PM, Ulrich Windl wrote: Hi! I have a performance question regarding delay for reading blocks in a PV Xen VM. First a little background: Originally to monitor NFS outages, I developed a tool "iotwatch" (short: IOTW) that reads the first block of a block device or file (or

Re: [ClusterLabs] SBD restarted the node while pacemaker in maintenance mode

2019-12-26 Thread Roger Zhou
On 12/24/19 11:48 AM, Jerry Kross wrote: > Hi, > The pacemaker cluster manages a 2 node database cluster configured to use 3 > iscsi disk targets in its stonith configuration. The pacemaker cluster was > put > in maintenance mode but we see SBD writing to the system logs. And just after >

Re: [ClusterLabs] Dual Primary DRBD + OCFS2

2019-11-19 Thread Roger Zhou
On 11/19/19 4:51 PM, Илья Насонов wrote: > Hello! > > Configured a cluster (2-node DRBD+DLM+CFS2) and it works. > > I heard the opinion that OCFS2 file system is better. Found an old > cluster setup > description:https://wiki.clusterlabs.org/wiki/Dual_Primary_DRBD_%2B_OCFS2 > > but as I

Re: [ClusterLabs] Antw: Re: fencing on iscsi device not working

2019-11-06 Thread Roger Zhou
On 11/7/19 1:55 AM, Andrei Borzenkov wrote: > 06.11.2019 18:55, Ken Gaillot пишет: >> On Wed, 2019-11-06 at 08:04 +0100, Ulrich Windl wrote: >> Ken Gaillot schrieb am 05.11.2019 um >> 16:05 in >>> >>> Nachricht >>> : Coincidentally, the documentation for the pcmk_host_check default

Re: [ClusterLabs] fencing on iscsi device not working

2019-11-04 Thread Roger Zhou
On 11/3/19 12:56 AM, wf...@niif.hu wrote: > Andrei Borzenkov writes: > >> According to documentation, pcmk_host_list is used only if >> pcmk_host_check=static-list which is not default, by default pacemaker >> queries agent for nodes it can fence and fence_scsi does not return >> anything. >

Re: [ClusterLabs] Stupid DRBD/LVM Global Filter Question

2019-10-30 Thread Roger Zhou
On 10/30/19 6:17 AM, Eric Robinson wrote: > If I have an LV as a backing device for a DRBD disk, can someone explain > why I need an LVM filter? It seems to me that we would want the LV to be > always active under both the primary and secondary DRBD devices, and > there should be no need or

Re: [ClusterLabs] volume group won't start in a nested DRBD setup

2019-10-29 Thread Roger Zhou
On 10/29/19 12:30 PM, Andrei Borzenkov wrote: >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Activating volume group vg0 >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Reading all physical >> volumes. This may take a while... Found volume group "vmspace" using >> metadata type

Re: [ClusterLabs] Safe way to stop pacemaker on both nodes of a two node cluster

2019-10-20 Thread Roger Zhou
On 10/21/19 12:28 AM, Valentin Vidić wrote: > On Sun, Oct 20, 2019 at 09:24:31PM +0530, Dileep V Nair wrote: >> I am confused about the best way to stop pacemaker on both nodes of a >> two node cluster. The options I know of are >> 1. Put the cluster in Maintenance Mode, stop the

Re: [ClusterLabs] Antw: Re: DLM, cLVM, GFS2 and OCFS2 managed by systemd instead of crm ?

2019-10-16 Thread Roger Zhou
On 10/16/19 3:19 PM, Ulrich Windl wrote: >>>> Roger Zhou schrieb am 16.10.2019 um 08:54 in Nachricht > : >> Hi Bernd, >> >> Apart from Ken's insights. >> >> I try to put it simple between systemd vs. pacemaker: >> >> pacemaker

Re: [ClusterLabs] DLM, cLVM, GFS2 and OCFS2 managed by systemd instead of crm ?

2019-10-16 Thread Roger Zhou
Hi Bernd, Apart from Ken's insights, let me put the difference between systemd and pacemaker simply: pacemaker manages dependencies among nodes, while systemd does not. Cheers, Roger On 10/16/19 5:16 AM, Ken Gaillot wrote: > On Tue, 2019-10-15 at 21:35 +0200, Lentes, Bernd wrote: >> Hi, >> >> i'm a

Re: [ClusterLabs] SBD with shared device - loss of both interconnect and shared device?

2019-10-10 Thread Roger Zhou
On 10/9/19 3:28 PM, Andrei Borzenkov wrote: > What happens if both interconnect and shared device is lost by node? I > assume node will reboot, correct? > From my understanding of the Pacemaker integration feature in `man sbd`: yes, sbd will self-fence upon losing access to the sbd disk when the

Re: [ClusterLabs] Where to find documentation for cluster MD?

2019-10-10 Thread Roger Zhou
In addition to the admin guide, there are some more advanced articles about the internals: https://lwn.net/Articles/674085/ https://www.kernel.org/doc/Documentation/driver-api/md/md-cluster.rst Cheers, Roger On 10/10/19 4:27 PM, Gang He wrote: > Hello Ulrich > > Cluster MD belongs to SLE HA

Re: [ClusterLabs] Gracefully stop nodes one by one with disk-less sbd

2019-08-13 Thread Roger Zhou
On 8/12/19 9:24 PM, Klaus Wenninger wrote: [...] > If you shutdown solely pacemaker one-by-one on all nodes > and these shutdowns are considered graceful then you are > not gonna experience any reboots (e.g. 3 node cluster). While revisit what you said, then run `systemctl stop pacemaker`

Re: [ClusterLabs] Antw: Re: Gracefully stop nodes one by one with disk-less sbd

2019-08-12 Thread Roger Zhou
On 8/12/19 2:48 PM, Ulrich Windl wrote: Andrei Borzenkov schrieb am 09.08.2019 um 18:40 in > Nachricht <217d10d8-022c-eaf6-28ae-a4f58b2f9...@gmail.com>: >> 09.08.2019 16:34, Yan Gao пишет: [...] >> >> Lack of cluster wide shutdown mode was mentioned more than once on this >> list. I

Re: [ClusterLabs] corosync.service (and sbd.service) are not stopper on pacemaker shutdown when corosync-qdevice is used

2019-08-09 Thread Roger Zhou
On 8/9/19 3:39 PM, Jan Friesse wrote: > Roger Zhou napsal(a): >> >> On 8/9/19 2:27 PM, Roger Zhou wrote: >>> >>> On 7/29/19 12:24 AM, Andrei Borzenkov wrote: >>>> corosync.service sets StopWhenUnneeded=yes which normally stops it when >&g

Re: [ClusterLabs] corosync.service (and sbd.service) are not stopper on pacemaker shutdown when corosync-qdevice is used

2019-08-09 Thread Roger Zhou
On 8/9/19 2:27 PM, Roger Zhou wrote: > > On 7/29/19 12:24 AM, Andrei Borzenkov wrote: >> corosync.service sets StopWhenUnneeded=yes which normally stops it when >> pacemaker is shut down. One more thought: it makes sense to add "RefuseManualStop=true" to pacemaker.se

Re: [ClusterLabs] corosync.service (and sbd.service) are not stopper on pacemaker shutdown when corosync-qdevice is used

2019-08-09 Thread Roger Zhou
On 7/29/19 12:24 AM, Andrei Borzenkov wrote: > corosync.service sets StopWhenUnneeded=yes which normally stops it when > pacemaker is shut down. `systemctl stop corosync.service` is the right command to stop the whole cluster stack. It stops pacemaker and corosync-qdevice first, and stops SBD too.

Re: [ClusterLabs] Feedback wanted: Node reaction to fabric fencing

2019-07-25 Thread Roger Zhou
On 7/25/19 1:33 AM, Ken Gaillot wrote: > Hi all, > > A recent bugfix (clbz#5386) brings up a question. > > A node may receive notification of its own fencing when fencing is > misconfigured (for example, an APC switch with the wrong plug number) > or when fabric fencing is used that doesn't

[ClusterLabs] What's the best practice to scale-out/increase the cluster size? (was: "node is unclean" leads to gratuitous reboot)

2019-07-11 Thread Roger Zhou
On 7/11/19 2:15 AM, Michael Powell wrote: > Thanks to you and Andrei for your responses. In our particular situation, we > want to be able to operate with either node in stand-alone mode, or with both > nodes protected by HA. I did not mention this, but I am working on upgrading > our

Re: [ClusterLabs] Where do we download the source code of libdlm

2019-05-27 Thread Roger Zhou
David set up a new home for it more than two years ago: https://pagure.io/dlm Cheers, Roger On 5/27/19 5:04 PM, Gang He wrote: Hello Guys, As the subject said, I want to download the source code of libdlm, to see its git log changes. libdlm is used to build dlm_controld, dlm_stonith,

Re: [ClusterLabs] Anyone have a document on how to configure VMWare fencing on Suse Linux

2018-12-16 Thread Roger Zhou
The following command will give you the detailed information: crm ra info stonith:external/vcenter Hope it is useful. Cheers, Roger On 12/14/18 12:29 AM, Dileep V Nair wrote: Hi, I am using pacemaker for my clusters and shared sbd disk as the Stonith mechanism. Now I have an issue

Re: [ClusterLabs] Antw: LVM resource and DAS - would two resources off one DAS...

2017-08-04 Thread roger zhou
On 07/27/2017 09:20 PM, Ulrich Windl wrote: Hi! I think it will work, because the cluster does not monitor the PVs or partitions or LUNs. It just checks whether you can activate the LVs (i.e.: the VG). That's what I know... Regards, Ulrich lejeczek schrieb am

Re: [ClusterLabs] clustered MD - beyond RAID1

2015-12-25 Thread roger zhou
On 12/22/2015 10:33 AM, Tejas Rao wrote: On 12/21/2015 20:50, Aaron Knister wrote: [...] I'm curious now, Redhat doesn't support SW raid failover? I did some googling and found this: https://access.redhat.com/solutions/231643 While I can't read the solution I have to figure that they're

Re: [ClusterLabs] (no subject) --> JUNK email

2015-10-09 Thread roger zhou
TaMen说我挑食 <974120...@qq.com>, You'd better include a word like JUNK or TEST in your email title, to avoid misleading people here. Digimer, You are really nice! I suspect this user just sent a junk email to confirm that the subscription is not in digest format ;) Regards, Roger