-----Original Message-----
From: Roger Zhou
Sent: Thursday 8 December 2022 05:56
To: Cluster Labs - All topics related to open-source clustering welcomed
; Jelen, Piotr
Cc: Nielsen, Laust
Subject: {EXTERNAL} Re: [ClusterLabs] mdraid - pacemaker
On 12/7/22 18:44, Jelen, Piotr wrote:
Hi ClusterLabs team,
I would like to ask if this resource agent was tested and if it can be used in
production?
resource-agents/mdraid at main · ClusterLabs/resource-agents · GitHub
On 2/24/22 20:21, Ulrich Windl wrote:
Hi!
After reading about fence_kdump and fence_kdump_send I wonder:
Does anybody use that in production?
Having the networking and bonding in initrd does not sound like a good idea to
me.
I assume one of the motivations for fence_kdump is to reduce the
On 2/9/22 17:46, Lentes, Bernd wrote:
- On Feb 7, 2022, at 4:13 PM, Jehan-Guillaume de Rorthais j...@dalibo.com
wrote:
On Mon, 7 Feb 2022 14:24:44 +0100 (CET)
"Lentes, Bernd" wrote:
Hi,
I'm currently changing a bit in my cluster because I realized that my
configuration for a power
On 10/12/21 3:32 PM, Ulrich Windl wrote:
Hi!
I just examined the corosync.service unit in SLES15. It contains:
# /usr/lib/systemd/system/corosync.service
[Unit]
Description=Corosync Cluster Engine
Documentation=man:corosync man:corosync.conf man:corosync_overview
On 9/3/21 10:09 AM, ?? via Users wrote:
HELLO!
I built a two-node corosync + pacemaker cluster, and the main end runs on
node0. There are two network ports with the same network segment IP on node0. I
This needs a bit of attention.
"Usually not a good idea to connect two
On 7/9/21 3:56 PM, Ulrich Windl wrote:
[...]
h19 kernel: Out of memory: Killed process 6838 (corosync) total-vm:261212kB,
anon-rss:31444kB, file-rss:7700kB, shmem-rss:121872kB
I doubt that was the best possible choice ;-)
The dead corosync caused the DC (h18) to fence h19 (which was
On 6/16/21 3:03 PM, Andrei Borzenkov wrote:
We thought that access to storage was restored, but one step was
missing so devices appeared empty.
At this point I tried to restart the pacemaker. But as soon as I
stopped pacemaker SBD rebooted nodes ‑ which is logical, as quorum was
now lost.
On 3/1/21 7:17 PM, Ulrich Windl wrote:
Hi!
I have a question about the VirtualDomain RA (as in SLES15 SP2):
Why does the RA "undefine", then "create" a domain instead of just "start"ing a
domain?
I mean: Assuming that an "installation" does "define" the domains, why bother with configuration
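For background, a minimal illustration of the libvirt commands this maps to (domain name and XML path are made-up examples): `virsh define` plus `start` manage a persistent domain, while `create` boots a transient one straight from the XML.
  virsh define /root/testvm.xml    # register testvm as a persistent domain
  virsh start testvm               # start the persistent domain
  virsh destroy testvm             # stop it again
  virsh undefine testvm            # drop the persistent definition
  virsh create /root/testvm.xml    # start a transient domain directly from the XML file
Presumably the RA prefers the transient route so the domain only exists on the node where the cluster started it, but that is exactly the question raised here.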
On 1/22/21 6:58 PM, Ulrich Windl wrote:
Roger Zhou wrote on 22.01.2021 at 11:26 in message
<8dcd53e2-b65b-aafe-ae29-7bdeea3b8...@suse.com>:
On 1/22/21 5:45 PM, Ulrich Windl wrote:
Roger Zhou wrote on 22.01.2021 at 10:18 in message
:
Could be the naming of lv
On 1/22/21 5:45 PM, Ulrich Windl wrote:
Roger Zhou wrote on 22.01.2021 at 10:18 in message
:
Maybe the naming of lvmlockd and virtlockd misled you, I guess.
I agree that there is one "virtlockd" name in the resources that refers to
lvmlockd. That is confusing
On 1/22/21 4:17 PM, Ulrich Windl wrote:
Gang He wrote on 22.01.2021 at 09:13 in message
<1fd1c07d-d12c-fea9-4b17-90a977fe7...@suse.com>:
Hi Ulrich,
I reviewed the crm configuration file; there are some comments below:
1) lvmlockd resource is used for shared VG, if you do not plan to
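For illustration only, a rough crm shell sketch of the usual shared-VG wiring with lvmlockd (resource names, VG name, and timeouts are placeholders, not taken from this thread):
  primitive dlm ocf:pacemaker:controld op monitor interval=60s
  primitive lvmlockd ocf:heartbeat:lvmlockd op monitor interval=30s
  group g-locking dlm lvmlockd
  clone cl-locking g-locking meta interleave=true
  primitive vg1 ocf:heartbeat:LVM-activate \
    params vgname=vg1 vg_access_mode=lvmlockd activation_mode=shared \
    op monitor interval=30s
  clone cl-vg1 vg1 meta interleave=true
  order o-locking-before-vg1 Mandatory: cl-locking cl-vg1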
On 1/13/21 3:31 PM, Ulrich Windl wrote:
Roger Zhou wrote on 13.01.2021 at 05:32 in message
<97ac2305-85b4-cbb0-7133-ac1372143...@suse.com>:
On 1/12/21 4:23 PM, Ulrich Windl wrote:
Hi!
Before setting up our first pacemaker cluster we thought one low-speed
redundant network
On 1/12/21 4:23 PM, Ulrich Windl wrote:
Hi!
Before setting up our first pacemaker cluster we thought one low-speed
redundant network would be good in addition to the normal high-speed network.
However as it seems now (SLES15 SP2) there is NO reasonable RRP mode to drive
such a configuration
Here is a tool intended to standardize the approach to simulating split-brain
https://software.opensuse.org/package/python3-cluster-preflight-check
After installation, simply run the command:
`ha-cluster-preflight-check --split-brain-iptables`
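The underlying idea is roughly the following (a sketch only, not necessarily what the tool does internally; <peer-ip> is a placeholder): cut the traffic to the other node so each side believes its peer has died.
  iptables -A INPUT -s <peer-ip> -j DROP
  iptables -A OUTPUT -d <peer-ip> -j DROP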
Thanks,
Roger
On 12/17/20 4:14 PM, Gabriele Bulfon
On 12/16/20 5:06 PM, Ulrich Windl wrote:
Hi!
(I changed the subject of the thread)
VirtualDomain seems to be broken, as it does not handle a failed live-migration
correctly:
With my test-VM running on node h16, this happened when I tried to move it away
(for testing):
Dec 16 09:28:46 h19
Hi Ulrich,
Sounds reasonable and handy! Can you create the github issue to track this?
Thanks,
Roger
On 11/30/20 8:47 PM, Ulrich Windl wrote:
Hi!
What would users of crm shell think about this enhancement proposal:
crm configure grep
That command would search the configuration for any
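Until such a subcommand exists, a rough equivalent from the shell would be (the pattern is just an example):
  crm configure show | grep -n 'virtlockd'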
On 12/1/20 4:03 PM, Ulrich Windl wrote:
Ken Gaillot wrote on 30.11.2020 at 19:52 in message
:
...
Though there's nothing wrong with putting all nodes in standby. Another
alternative would be to set the stop-all-resources cluster property.
Hi Ken,
thanks for the valuable feedback!
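For reference, the two alternatives look like this in crm shell (node names are examples):
  # put every node into standby
  crm node standby node1
  crm node standby node2
  # ... or stop all resources cluster-wide while the nodes stay online
  crm configure property stop-all-resources=true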
On 12/8/20 6:48 PM, Strahil Nikolov wrote:
Nope,
but if you don't use clustered FS, you could also use plain LVM + tags.
As far as I know you need dlm and clvmd for clustered FS.
FYI, clvmd has been dropped since lvm2 v2_03 and replaced by lvmlockd. BTW,
lvmlockd (or its predecessor clvmd) is
Great news for the new version, first of all!
On 12/8/20 8:12 PM, Klaus Wenninger wrote:
On 12/8/20 11:51 AM, Klaus Wenninger wrote:
On 12/3/20 9:29 AM, Reid Wahl wrote:
On Thu, Dec 3, 2020 at 12:03 AM Ulrich Windl
[...]
‑ add robustness against misconfiguration / improve documentation
Can you create the GitHub issues before we lose track? Thank you Ulrich!
https://github.com/ClusterLabs/crmsh/issues
BR,
Roger
On 11/20/20 2:50 PM, Ulrich Windl wrote:
Hi!
Setting up a new cluster with SLES15 SP2, I'm wondering: "crm node status"
displays XML. Is that the way it should
On 6/15/20 3:44 PM, Ulrich Windl wrote:
Strahil Nikolov wrote on 12.06.2020 at 14:00 in message
<22726_1591963256_5EE36E78_22726_156_1_03FA2901-B9CC-4CE7-8952-283A864E1C72@yaho.com>:
Out of curiosity, are you running it on SLES/openSUSE?
I think it is easier with 'crm cluster
On 5/20/20 2:50 PM, Ulrich Windl wrote:
Hi!
I have a performance question regarding delay for reading blocks in a PV Xen VM.
First a little background: Originally to monitor NFS outages, I developed a tool
"iotwatch" (short: IOTW) that reads the first block of a block device or file
(or
On 12/24/19 11:48 AM, Jerry Kross wrote:
> Hi,
> The pacemaker cluster manages a 2 node database cluster configured to use 3
> iscsi disk targets in its stonith configuration. The pacemaker cluster was
> put
> in maintenance mode but we see SBD writing to the system logs. And just after
>
On 11/19/19 4:51 PM, Илья Насонов wrote:
> Hello!
>
> Configured a cluster (2-node DRBD+DLM+CFS2) and it works.
>
> I heard the opinion that OCFS2 file system is better. Found an old
> cluster setup
> description:https://wiki.clusterlabs.org/wiki/Dual_Primary_DRBD_%2B_OCFS2
>
> but as I
On 11/7/19 1:55 AM, Andrei Borzenkov wrote:
> 06.11.2019 18:55, Ken Gaillot wrote:
>> On Wed, 2019-11-06 at 08:04 +0100, Ulrich Windl wrote:
>> Ken Gaillot wrote on 05.11.2019 at 16:05 in message:
Coincidentally, the documentation for the pcmk_host_check default
On 11/3/19 12:56 AM, wf...@niif.hu wrote:
> Andrei Borzenkov writes:
>
>> According to documentation, pcmk_host_list is used only if
>> pcmk_host_check=static-list which is not default, by default pacemaker
>> queries agent for nodes it can fence and fence_scsi does not return
>> anything.
>
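For illustration, a static-list fence_scsi configuration might look like this in crm shell (device path and node names are placeholders; with the default pcmk_host_check the agent would be queried for the nodes it can fence instead):
  primitive fence-scsi stonith:fence_scsi \
    params devices="/dev/disk/by-id/EXAMPLE-SHARED-DISK" \
    pcmk_host_list="node1 node2" pcmk_host_check=static-list \
    meta provides=unfencing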
On 10/30/19 6:17 AM, Eric Robinson wrote:
> If I have an LV as a backing device for a DRBD disk, can someone explain
> why I need an LVM filter? It seems to me that we would want the LV to be
> always active under both the primary and secondary DRBD devices, and
> there should be no need or
On 10/29/19 12:30 PM, Andrei Borzenkov wrote:
>> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Activating volume group vg0
>> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Reading all physical
>> volumes. This may take a while... Found volume group "vmspace" using
>> metadata type
On 10/21/19 12:28 AM, Valentin Vidić wrote:
> On Sun, Oct 20, 2019 at 09:24:31PM +0530, Dileep V Nair wrote:
>> I am confused about the best way to stop pacemaker on both nodes of a
>> two node cluster. The options I know of are
>> 1. Put the cluster in Maintenance Mode, stop the
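For reference, option 1 typically boils down to (a sketch, assuming crm shell):
  crm configure property maintenance-mode=true
  # then, on each of the two nodes:
  systemctl stop pacemaker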
On 10/16/19 3:19 PM, Ulrich Windl wrote:
>>>> Roger Zhou wrote on 16.10.2019 at 08:54 in message
> :
>> Hi Bernd,
>>
>> Apart from Ken's insights.
>>
>> I try to put it simple between systemd vs. pacemaker:
>>
>> pacemaker
Hi Bernd,
Apart from Ken's insights.
Let me try to put it simply, systemd vs. pacemaker:
pacemaker does manage dependencies among nodes, while systemd does not.
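To make that concrete, a small sketch with made-up resource names: constraints like these are evaluated across the whole cluster, whereas a systemd After=/Requires= only orders units on a single host.
  colocation col_app_with_db inf: p_app p_db
  order ord_db_before_app Mandatory: p_db p_app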
Cheers,
Roger
On 10/16/19 5:16 AM, Ken Gaillot wrote:
> On Tue, 2019-10-15 at 21:35 +0200, Lentes, Bernd wrote:
>> Hi,
>>
>> i'm a
On 10/9/19 3:28 PM, Andrei Borzenkov wrote:
> What happens if both interconnect and shared device is lost by node? I
> assume node will reboot, correct?
>
From my understanding of the Pacemaker integration feature in `man sbd`:
Yes, sbd will self-fence upon losing access to the sbd disk when the
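For context, the disk-based setup this refers to is normally driven by /etc/sysconfig/sbd plus a fencing resource; the values below are placeholders only:
  # /etc/sysconfig/sbd
  SBD_DEVICE="/dev/disk/by-id/EXAMPLE-SBD-DISK"
  SBD_WATCHDOG_DEV="/dev/watchdog"
  SBD_STARTMODE="always"
  # plus, in crm shell:
  primitive stonith-sbd stonith:external/sbd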
In addition to the admin guide, there are some more advanced articles
about the internals:
https://lwn.net/Articles/674085/
https://www.kernel.org/doc/Documentation/driver-api/md/md-cluster.rst
Cheers,
Roger
On 10/10/19 4:27 PM, Gang He wrote:
> Hello Ulrich
>
> Cluster MD belongs to SLE HA
On 8/12/19 9:24 PM, Klaus Wenninger wrote:
[...]
> If you shutdown solely pacemaker one-by-one on all nodes
> and these shutdowns are considered graceful then you are
> not gonna experience any reboots (e.g. 3 node cluster).
While revisiting what you said, I then ran `systemctl stop pacemaker`
On 8/12/19 2:48 PM, Ulrich Windl wrote:
Andrei Borzenkov wrote on 09.08.2019 at 18:40 in
> message <217d10d8-022c-eaf6-28ae-a4f58b2f9...@gmail.com>:
>> 09.08.2019 16:34, Yan Gao wrote:
[...]
>>
>> Lack of cluster wide shutdown mode was mentioned more than once on this
>> list. I
On 8/9/19 3:39 PM, Jan Friesse wrote:
> Roger Zhou napsal(a):
>>
>> On 8/9/19 2:27 PM, Roger Zhou wrote:
>>>
>>> On 7/29/19 12:24 AM, Andrei Borzenkov wrote:
>>>> corosync.service sets StopWhenUnneeded=yes which normally stops it when
On 8/9/19 2:27 PM, Roger Zhou wrote:
>
> On 7/29/19 12:24 AM, Andrei Borzenkov wrote:
>> corosync.service sets StopWhenUnneeded=yes which normally stops it when
>> pacemaker is shut down.
One more thought:
It makes sense to add "RefuseManualStop=true" to pacemaker.se
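A minimal sketch of what that could look like as a local drop-in (whether to ship such a default is exactly the open question here):
  # /etc/systemd/system/pacemaker.service.d/10-refuse-manual-stop.conf
  [Unit]
  RefuseManualStop=true
After `systemctl daemon-reload`, a plain `systemctl stop pacemaker` would then be rejected.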
On 7/29/19 12:24 AM, Andrei Borzenkov wrote:
> corosync.service sets StopWhenUnneeded=yes which normally stops it when
> pacemaker is shut down.
`systemctl stop corosync.service` is the right command to stop the whole
cluster stack.
It stops pacemaker and corosync-qdevice first, and stops SBD too.
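That behaviour can be checked on a running node, e.g.:
  systemctl show -p StopWhenUnneeded corosync.service
  systemctl list-dependencies --reverse corosync.service   # which units pull corosync in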
On 7/25/19 1:33 AM, Ken Gaillot wrote:
> Hi all,
>
> A recent bugfix (clbz#5386) brings up a question.
>
> A node may receive notification of its own fencing when fencing is
> misconfigured (for example, an APC switch with the wrong plug number)
> or when fabric fencing is used that doesn't
On 7/11/19 2:15 AM, Michael Powell wrote:
> Thanks to you and Andrei for your responses. In our particular situation, we
> want to be able to operate with either node in stand-alone mode, or with both
> nodes protected by HA. I did not mention this, but I am working on upgrading
> our
David set up a new home for it more than two years ago:
https://pagure.io/dlm
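For example (assuming pagure's usual HTTPS clone URL):
  git clone https://pagure.io/dlm.git
  cd dlm && git log --oneline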
Cheers,
Roger
On 5/27/19 5:04 PM, Gang He wrote:
Hello Guys,
As the subject said, I want to download the source code of libdlm, to see its
git log changes.
libdlm is used to build dlm_controld, dlm_stonith,
The following command will give you the detailed information:
crm ra info stonith:external/vcenter
Hope it is useful.
Cheers,
Roger
On 12/14/18 12:29 AM, Dileep V Nair wrote:
Hi,
I am using pacemaker for my clusters and shared sbd disk as the Stonith
mechanism. Now I have an issue
On 07/27/2017 09:20 PM, Ulrich Windl wrote:
Hi!
I think it will work, because the cluster does not monitor the PVs, partitions,
or LUNs. It just checks whether you can activate the LVs (i.e.: the VG). That's
what I know...
Regards,
Ulrich
lejeczek schrieb am
On 12/22/2015 10:33 AM, Tejas Rao wrote:
On 12/21/2015 20:50, Aaron Knister wrote:
[...]
I'm curious now, Redhat doesn't support SW raid failover? I did some
googling and found this:
https://access.redhat.com/solutions/231643
While I can't read the solution I have to figure that they're
TaMen说我挑食 <974120...@qq.com>,
You'd better compose your email subject with a word like JUNK or TEST, to
avoid misleading people here.
Digimer,
You are really nice! It seems suspicious to me that this user just sent a
junk email to confirm the subscription, not in digest format ;)
Regards,
Roger