Re: [ClusterLabs] poor performance for large resource

2024-11-04 Thread chenzu...@gmail.com

Hi Miroslav,
Thank you for your helpful suggestions! I followed your advice and used the pcs 
command with the -f option to create the cluster CIB configuration, and then 
pushed it using pcs cluster cib-push.
I'm happy to report that the configuration time has been reduced significantly, 
from the original 2 hours and 31 minutes to just 31 minutes. I appreciate your 
help in pointing me in the right direction.
Thanks again for your assistance!
Best regards,
Zufei Chen

attachment:
new create script: pcs_create.sh
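
(The attachment is not inline, so below is only a minimal sketch of the 
pattern, not the actual script; the resource names, device paths, and the 
ocf:heartbeat:Filesystem agent are assumptions based on the logs elsewhere 
in this thread.)

#!/bin/bash
# Build the CIB offline with 'pcs -f', then push the diff once.
pcs cluster cib > original.xml
cp original.xml new.xml

for i in $(seq 0 175); do
    pcs -f new.xml resource create "ost-$i" ocf:heartbeat:Filesystem \
        device="/dev/disk/by-id/ost-$i" directory="/lustre/ost-$i" \
        fstype=lustre
done

pcs cluster cib-push new.xml diff-against=original.xml
crm_resource --wait   # wait for the cluster to settle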

---
Message: 1
Date: Thu, 24 Oct 2024 14:01:57 +0200
From: Miroslav Lisik 
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] poor performance for large resource
configuration
Message-ID: 
Content-Type: text/plain; charset=UTF-8; format=flowed
 
On 10/21/24 13:07, zufei chen wrote:
> Hi all,
>
> Background:
>
>  1. lustre(2.15.5) + corosync(3.1.5) + pacemaker(2.1.0-8.el8) +
> pcs(0.10.8)
>  2. there are 11 nodes in total, divided into 3 groups. If a node
> fails within a group, the resources can only be taken over by
> nodes within that group.
>  3. Each node has 2 MDTs and 16 OSTs.
>
> Issues:
>
>  1. The resource configuration time progressively increases: the
> second (mdt-0) cost only 8s; the last (ost-175) cost 1min:37s.
>  2. The total time taken for the configuration is approximately 2
> hours and 31 minutes. Is there a way to improve it?
>
>
> attachment:
> create bash: pcs_create.sh
> create log: pcs_create.log
>
Hi,
 
you could try creating the cluster CIB configuration with pcs commands on a
file, using the '-f' option, and then pushing it to Pacemaker all at once:
 
pcs cluster cib > original.xml
cp original.xml new.xml
pcs -f new.xml <command>
...
...
pcs cluster cib-push new.xml diff-against=original.xml
 
And then wait for the cluster to settle into a stable state:
 
crm_resource --wait
 
Or, since pcs version 0.11.8, there is a pcs command:
 
pcs status wait [<timeout>]
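 
For example, to wait up to 300 seconds (the timeout argument is assumed to
be in seconds, as with other pcs wait options):
 
pcs status wait 300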
 
I hope this will help you to improve the performance.
 
Regards,
Miroslav
 
 
 




Re: [ClusterLabs] Lustre MDT/OST Mount Failures During Virtual Machine Reboot with Pacemaker

2025-03-14 Thread chenzu...@gmail.com

Thank you for your advice.

My understanding of the cause is as follows: 
During reboot, both the system (systemd) and Pacemaker unmount the Lustre 
resource simultaneously. 
If the system unmounts first and Pacemaker's unmount runs afterward, Pacemaker 
immediately returns success. 
However, at that point the system's unmount is not yet complete, so Pacemaker 
starts the mount on the target node prematurely, which triggers this issue.

My current modification is as follows: 
Add the following lines to the file 
`/usr/lib/systemd/system/resource-agents-deps.target`:
```
After=remote-fs.target  
Before=shutdown.target reboot.target halt.target
```

After making this modification, the issue no longer occurs during reboot.
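
(For reference, the same ordering can be applied without editing the packaged 
unit file, via a standard systemd drop-in; the directory below is the usual 
override location, and the file name is illustrative:)
```
# /etc/systemd/system/resource-agents-deps.target.d/lustre.conf
[Unit]
After=remote-fs.target
Before=shutdown.target reboot.target halt.target
```
Run `systemctl daemon-reload` afterwards so systemd picks up the drop-in.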




chenzu...@gmail.com
 


[ClusterLabs] Investigation of Corosync Heartbeat Loss: Simulating Network Failures with Redundant Network Configuration

2025-03-14 Thread chenzu...@gmail.com

Background: 
There are 11 physical machines, with two virtual machines running on each 
physical machine.
lustre-mds-nodexx runs the Lustre MDS service, and lustre-oss-nodexx runs the 
Lustre OSS service.
Each virtual machine is directly connected to two network interfaces, service1 
and service2.
Pacemaker is used to ensure high availability of the Lustre services.
lustre(2.15.5) + corosync(3.1.5) + pacemaker(2.1.0-8.el8) + pcs(0.10.8)

Issue: During testing, the service1 network interface on lustre-oss-node30 and 
lustre-oss-node40 was repeatedly brought up and down every second (to 
simulate a network failure).
The Corosync logs showed that heartbeats were lost, triggering a fencing action 
that powered off the nodes with lost heartbeats.
Given that Corosync is configured with redundant networks, why did the 
heartbeat loss occur? Is it due to a configuration issue, or is Corosync not 
designed to handle this scenario?

Other:
The configuration of corosync.conf can be found in the attached file 
corosync.conf.
Other relevant information is available in the attached file log.txt.
The script used for the up/down testing is attached as ip_up_and_down.sh.
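
(Since the attachment is not inline, here is a minimal sketch of the kind of 
flapping loop described; the interface name and the use of ip link are 
assumptions, and the real ip_up_and_down.sh may differ:)

#!/bin/bash
# Flap the service1 interface once per second to simulate a network failure.
IFACE=service1
while true; do
    ip link set "$IFACE" down
    sleep 1
    ip link set "$IFACE" up
    sleep 1
done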





chenzu...@gmail.com






[ClusterLabs] Lustre MDT/OST Mount Failures During Virtual Machine Reboot with Pacemaker

2025-03-02 Thread chenzu...@gmail.com
1. Background:
There are three physical servers, each running a KVM virtual machine. The 
virtual machines host Lustre services (MGS/MDS/OSS). Pacemaker is used to 
ensure high availability of the Lustre services.
lustre(2.15.5) + corosync(3.1.5) + pacemaker(2.1.0-8.el8) + pcs(0.10.8)
2. Problem:
When a reboot command is issued on one of the virtual machines, the MDT/OST 
resources are taken over by the virtual machines on the other nodes. However, 
mounting these resources fails during the switchover (Pacemaker retries the 
mount several times and eventually succeeds).
Workaround: before executing the reboot command, run pcs node standby 
<node> to move the resources away (a sketch follows below).
Question: I would like to know whether this is an inherent issue with 
Pacemaker.
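
A minimal sketch of the workaround sequence (the node name is a placeholder):

NODE=lustre-oss-node28      # placeholder node name
pcs node standby "$NODE"    # move resources off the node
crm_resource --wait         # wait for the cluster to settle
ssh "$NODE" reboot
# after the node has rejoined the cluster:
pcs node unstandby "$NODE"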
3. Analysis:
From the log analysis, it appears that the MDT/OST resources are being mounted 
on the target node before the unmount completes on the source node. 
Multiple Mount Protection (MMP) detects that the source node has updated the 
sequence number, which causes the mount operation to fail on the target 
node.
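(As background: with MMP enabled, the kmmpd thread on the node that has the 
device mounted periodically increments a sequence number in the MMP block; a 
node attempting to mount reads that block, waits roughly twice the update 
interval, and re-reads it, refusing to mount if the sequence changed in 
between. That is what the node 29 log below shows.)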
4. Logs:
Node 28 (Source Node):
Tue Feb 18 23:46:31 CST 2025    reboot

ll /dev/disk/by-id/virtio-ost-node28-3-36
lrwxrwxrwx 1 root root 9 Feb 18 23:47 /dev/disk/by-id/virtio-ost-node28-3-36 -> 
../../vdy

Tue Feb 18 23:46:31 CST 2025
* ost-36_start_0 on lustre-oss-node29 'error' (1): call=769, status='complete', 
exitreason='Couldn't mount device [/dev/disk/by-id/virtio-ost-node28-3-36] as 
/lustre/ost-36', last-rc-change='Tue Feb 18 23:46:32 2025', queued=0ms, 
exec=21472ms

Feb 18 23:46:31 lustre-oss-node28 systemd[1]: Unmounting /lustre/ost-36...
Feb 18 23:46:31 lustre-oss-node28 kernel: LDISKFS-fs warning (device vdy): 
kmmpd:186: czf MMP failure info: epoch:6609375025013, seq: 37, last update 
time: 1739893591, last update node: lustre-oss-node28, last update device: vdy
Feb 18 23:46:32 lustre-oss-node28 Filesystem(ost-36)[19748]: INFO: Running stop 
for /dev/disk/by-id/virtio-ost-node28-3-36 on /lustre/ost-36
Feb 18 23:46:32 lustre-oss-node28 pacemaker-controld[1700]: notice: Result of 
stop operation for ost-36 on lustre-oss-node28: ok
Feb 18 23:46:34 lustre-oss-node28 kernel: LDISKFS-fs warning (device vdy): 
kmmpd:258: czf set mmp seq clean
Feb 18 23:46:34 lustre-oss-node28 kernel: LDISKFS-fs warning (device vdy): 
kmmpd:258: czf MMP failure info: epoch:6612033802827, seq: 4283256144, last 
update time: 1739893594, last update node: lustre-oss-node28, last update 
device: vdy
Feb 18 23:46:34 lustre-oss-node28 systemd[1]: Unmounted /lustre/ost-36.

Node 29 (Target Node):
/dev/disk/by-id/virtio-ost-node28-3-36 -> ../../vdt

Feb 18 23:46:32 lustre-oss-node29 Filesystem(ost-36)[451114]: INFO: Running 
start for /dev/disk/by-id/virtio-ost-node28-3-36 on /lustre/ost-36
Feb 18 23:46:32 lustre-oss-node29 kernel: LDISKFS-fs warning (device vdt): 
ldiskfs_multi_mount_protect:350: MMP interval 42 higher than expected, please 
wait.
Feb 18 23:46:53 lustre-oss-node29 kernel: czf, not equel, Current time: 
23974372799987 ns, 37,4283256144
Feb 18 23:46:53 lustre-oss-node29 kernel: LDISKFS-fs warning (device vdt): 
ldiskfs_multi_mount_protect:364: czf MMP failure info: epoch:23974372801877, 
seq: 4283256144, last update time: 1739893594, last update node: 
lustre-oss-node28, last update device: vdy



chenzu...@gmail.com


Re: [ClusterLabs] Investigation of Corosync Heartbeat Loss: Simulating Network Failures with Redundant Network Configuration

2025-05-27 Thread chenzu...@gmail.com


The server-side configuration IP addresses are similar and belong to the same 
subnet:
lustre-mds-node32
service1: 10.255.153.236
service2: 10.255.153.237
lustre-oss-node32
service1: 10.255.153.238
service2: 10.255.153.239
lustre-mds-node40
service1: 10.255.153.240
service2: 10.255.153.241
lustre-oss-node40
service1: 10.255.153.242
service2: 10.255.153.243
lustre-mds-node41
service1: 10.255.153.244
service2: 10.255.153.245
lustre-oss-node41
service1: 10.255.153.246
service2: 10.255.153.247
Root Cause
The root cause is that messages sent to service2 fail to receive a reply from 
the correct interface: replies are sent from service1 instead of service2, 
which leads to communication failures.
Solution
The solution is to configure policy-based routing on the server side, similar 
to the ARP-flux handling for multi-rail (MR) nodes described in the LNet 
Router Config Guide (https://wiki.lustre.org/LNet_Router_Config_Guide). A 
sketch of such a configuration follows.
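
(A minimal sketch for one node, lustre-oss-node32; the addresses are from the 
list above, while the /24 prefix, the table names, and the sysctl choices are 
assumptions:)

# Give each interface its own routing table so replies leave via the
# interface that owns the source address.
echo "201 t_service1" >> /etc/iproute2/rt_tables
echo "202 t_service2" >> /etc/iproute2/rt_tables

ip route add 10.255.153.0/24 dev service1 src 10.255.153.238 table t_service1
ip rule add from 10.255.153.238 table t_service1

ip route add 10.255.153.0/24 dev service2 src 10.255.153.239 table t_service2
ip rule add from 10.255.153.239 table t_service2

# Optional: suppress ARP flux between same-subnet interfaces.
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2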





chenzu...@gmail.com
 


[ClusterLabs] Incorrect Node Fencing Issue in Lustre Cluster During Network Failure Simulation

2025-06-09 Thread chenzu...@gmail.com
Jun 09 17:54:50 [1412] lustre-mds-node40 corosync notice  [QUORUM] Sync 
members[3]: 1 2 3
Jun 09 17:54:50 [1412] lustre-mds-node40 corosync notice  [QUORUM] Sync 
left[1]: 4
Jun 09 17:54:50 [1412] lustre-mds-node40 corosync notice  [TOTEM ] A new 
membership (1.45) was formed. Members left: 4
Jun 09 17:54:50 [1412] lustre-mds-node40 corosync notice  [TOTEM ] Failed to 
receive the leave message. failed: 4


Jun 09 17:54:29 [8913] lustre-mds-node41 corosync info[KNET  ] link: host: 
1 link: 0 is down
Jun 09 17:54:29 [8913] lustre-mds-node41 corosync info[KNET  ] host: host: 
1 (passive) best link: 1 (pri: 1)
Jun 09 17:54:30 [8913] lustre-mds-node41 corosync info[KNET  ] link: host: 
1 link: 1 is down
Jun 09 17:54:30 [8913] lustre-mds-node41 corosync info[KNET  ] host: host: 
1 (passive) best link: 1 (pri: 1)
Jun 09 17:54:30 [8913] lustre-mds-node41 corosync warning [KNET  ] host: host: 
1 has no active links
Jun 09 17:54:36 [8913] lustre-mds-node41 corosync notice  [TOTEM ] Token has 
not been received in 8475 ms
Jun 09 17:54:39 [8913] lustre-mds-node41 corosync notice  [TOTEM ] A processor 
failed, forming new configuration: token timed out (11300ms), waiting 13560ms 
for consensus.
Jun 09 17:54:47 [8913] lustre-mds-node41 corosync info[KNET  ] rx: host: 1 
link: 1 is up
Jun 09 17:54:47 [8913] lustre-mds-node41 corosync info[KNET  ] link: 
Resetting MTU for link 1 because host 1 joined
Jun 09 17:54:47 [8913] lustre-mds-node41 corosync info[KNET  ] host: host: 
1 (passive) best link: 1 (pri: 1)
Jun 09 17:54:47 [8913] lustre-mds-node41 corosync info[KNET  ] pmtud: 
Global data MTU changed to: 1397
Jun 09 17:54:50 [8913] lustre-mds-node41 corosync notice  [QUORUM] Sync 
members[3]: 1 2 3
Jun 09 17:54:50 [8913] lustre-mds-node41 corosync notice  [QUORUM] Sync 
left[1]: 4
Jun 09 17:54:50 [8913] lustre-mds-node41 corosync notice  [TOTEM ] A new 
membership (1.45) was formed. Members left: 4
Jun 09 17:54:50 [8913] lustre-mds-node41 corosync notice  [TOTEM ] Failed to 
receive the leave message. failed: 4


Jun 09 17:54:28 [8900] lustre-mds-node42 corosync info[KNET  ] link: host: 
1 link: 0 is down
Jun 09 17:54:28 [8900] lustre-mds-node42 corosync info[KNET  ] host: host: 
1 (passive) best link: 1 (pri: 1)
Jun 09 17:54:30 [8900] lustre-mds-node42 corosync info[KNET  ] link: host: 
1 link: 1 is down
Jun 09 17:54:30 [8900] lustre-mds-node42 corosync info[KNET  ] host: host: 
1 (passive) best link: 1 (pri: 1)
Jun 09 17:54:30 [8900] lustre-mds-node42 corosync warning [KNET  ] host: host: 
1 has no active links
Jun 09 17:54:36 [8900] lustre-mds-node42 corosync notice  [TOTEM ] Token has 
not been received in 8475 ms
Jun 09 17:54:45 [8900] lustre-mds-node42 corosync info[KNET  ] rx: host: 1 
link: 1 is up
Jun 09 17:54:45 [8900] lustre-mds-node42 corosync info[KNET  ] link: 
Resetting MTU for link 1 because host 1 joined
Jun 09 17:54:45 [8900] lustre-mds-node42 corosync info[KNET  ] host: host: 
1 (passive) best link: 1 (pri: 1)
Jun 09 17:54:45 [8900] lustre-mds-node42 corosync info[KNET  ] pmtud: 
Global data MTU changed to: 1397
Jun 09 17:54:50 [8900] lustre-mds-node42 corosync notice  [QUORUM] Sync 
members[3]: 1 2 3
Jun 09 17:54:50 [8900] lustre-mds-node42 corosync notice  [QUORUM] Sync 
left[1]: 4
Jun 09 17:54:50 [8900] lustre-mds-node42 corosync notice  [TOTEM ] A new 
membership (1.45) was formed. Members left: 4
Jun 09 17:54:50 [8900] lustre-mds-node42 corosync notice  [TOTEM ] Failed to 
receive the leave message. failed: 4




/etc/corosync/corosync.conf
totem {
    version: 2
    cluster_name: mds_cluster
    transport: knet
    crypto_cipher: aes256
    crypto_hash: sha256
    cluster_uuid: 11f2c4097ac44d5981769a9ed579c99e
    token: 10000
}

nodelist {
    node {
        ring0_addr: 10.255.153.240
        ring1_addr: 10.255.153.241
        name: lustre-mds-node40
        nodeid: 1
    }

    node {
        ring0_addr: 10.255.153.244
        ring1_addr: 10.255.153.245
        name: lustre-mds-node41
        nodeid: 2
    }

    node {
        ring0_addr: 10.255.153.248
        ring1_addr: 10.255.153.249
        name: lustre-mds-node42
        nodeid: 3
    }

    node {
        ring0_addr: 10.255.153.236
        ring1_addr: 10.255.153.237
        name: lustre-mds-node32
        nodeid: 4
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
    timestamp: on
}
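
(For reference, the logged timeouts are consistent with this configuration 
assuming corosync defaults: with token: 10000 and the default 
token_coefficient of 650 ms, the runtime token timeout for a 4-node cluster 
is 10000 + (4 - 2) x 650 = 11300 ms, and the default consensus timeout is 
1.2 x 11300 = 13560 ms, matching "token timed out (11300ms), waiting 13560ms 
for consensus" in the logs above. The token line appears to have been 
truncated in transit; 10000 is the value implied by these timeouts.)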






chenzu...@gmail.com