Hi,

We have a fairly standard setup with 4 nodes: 1 primary and 3 secondaries, split across 2 geo clusters of 2 nodes each. The setup uses LVM volumes as DRBD lower devices. Everything is managed by Pacemaker using the LINBIT and Pacemaker OCF resource agents.
The DRBD kernel module version is 9.1.5 and drbd-utils is 9.20.2. I should mention that we deploy on AWS instances, using EBS for the block devices. Once DRBD is promoted, the filesystem is mounted on the active node (managed by the OCF Filesystem agent). The filesystem is XFS.

Occasionally (maybe 1 in 30 times), after a failover between the geo clusters moves the Primary role to the other cluster, the Filesystem OCF agent fails with this error:

    stderr [ mount: mount /dev/drbd0 on /mnt/audio failed: Resource temporarily unavailable ]

This happens even though DRBD has already been promoted to Primary. The logs are below. Does anyone know what could cause this? If there is some verbose logging we could enable, we are happy to do so.

Regards

Mar 13 09:59:18 ip-172-31-12-232 kernel: drbd audiodata: role( Secondary -> Primary )
Mar 13 09:59:18 ip-172-31-12-232 kernel: drbd audiodata: Preparing cluster-wide state change 2445677710 (1->3 499/145)
Mar 13 09:59:18 ip-172-31-12-232 crmd[1571]: notice: Result of promote operation for audiodata on ip-172-31-12-232: 0 (ok)
Mar 13 09:59:18 ip-172-31-12-232 crmd[1571]: notice: Initiating notify operation audiodata_post_notify_promote_0 on ip-172-31-12-173
Mar 13 09:59:18 ip-172-31-12-232 kernel: drbd audiodata ip-172-31-12-173: Aborting local state change 2445677710 to yield to remote state change 1509202161.
Mar 13 09:59:20 ip-172-31-12-232 kernel: drbd audiodata: Aborting cluster-wide state change 2445677710 (2054ms) rv = -19
Mar 13 09:59:20 ip-172-31-12-232 kernel: drbd audiodata ip-172-31-12-173: Preparing remote state change 1509202161
Mar 13 09:59:20 ip-172-31-12-232 kernel: drbd audiodata ip-172-31-12-173: Aborting remote state change 1509202161
Mar 13 09:59:20 ip-172-31-12-232 kernel: drbd audiodata/0 drbd0 ip-172-31-12-173: repl( WFBitMapS -> SyncSource )
Mar 13 09:59:20 ip-172-31-12-232 kernel: drbd audiodata/0 drbd0 ip-172-31-12-173: Began resync as SyncSource (will sync 2076 KB [519 bits set]).
Mar 13 09:59:20 ip-172-31-12-232 kernel: drbd audiodata: Preparing cluster-wide state change 522239102 (1->3 499/145)
Mar 13 09:59:21 ip-172-31-12-232 kernel: drbd audiodata/0 drbd0 ip-172-31-12-56: drbd_sync_handshake:
Mar 13 09:59:21 ip-172-31-12-232 kernel: drbd audiodata/0 drbd0 ip-172-31-12-56: self 9141832129BF9D9C:0000000000000000:FCAD090A6554F6EA:0000000000000000 bits:0 flags:20
Mar 13 09:59:21 ip-172-31-12-232 kernel: drbd audiodata/0 drbd0 ip-172-31-12-56: peer 9141832129BF9D9C:0000000000000000:FCAD090A6554F6EA:0000000000000000 bits:0 flags:1120
Mar 13 09:59:21 ip-172-31-12-232 kernel: drbd audiodata/0 drbd0 ip-172-31-12-56: uuid_compare()=no-sync by rule=lost-quorum
Mar 13 09:59:21 ip-172-31-12-232 kernel: drbd audiodata ip-172-31-12-173: Aborting local state change 522239102 to yield to remote state change 2672355414.
Mar 13 09:59:21 ip-172-31-12-232 kernel: drbd audiodata: Aborting cluster-wide state change 522239102 (96ms) rv = -19
Mar 13 09:59:21 ip-172-31-12-232 kernel: drbd audiodata ip-172-31-12-173: Preparing remote state change 2672355414
Mar 13 09:59:21 ip-172-31-12-232 awsvip(audio-awsalias)[8149]: INFO: secondary_private_ip has been successfully brought up (172.31.12.90)
Mar 13 09:59:21 ip-172-31-12-232 crmd[1571]: notice: Result of start operation for audio-awsalias on ip-172-31-12-232: 0 (ok)
Mar 13 09:59:21 ip-172-31-12-232 crmd[1571]: notice: Initiating notify operation audiodata_post_notify_promote_0 locally on ip-172-31-12-232
Mar 13 09:59:21 ip-172-31-12-232 crmd[1571]: notice: Result of notify operation for audiodata on ip-172-31-12-232: 0 (ok)
Mar 13 09:59:21 ip-172-31-12-232 pengine[1570]: notice: * Start audio-fs ( ip-172-31-12-232 )
Mar 13 09:59:21 ip-172-31-12-232 pengine[1570]: notice: * Start audio-cleanup ( ip-172-31-12-232 )
Mar 13 09:59:21 ip-172-31-12-232 pengine[1570]: notice: * Start audio-nginx ( ip-172-31-12-232 )
Mar 13 09:59:21 ip-172-31-12-232 crmd[1571]: notice: Initiating monitor operation audiodata_monitor_5000 on ip-172-31-12-173
Mar 13 09:59:21 ip-172-31-12-232 crmd[1571]: notice: Initiating start operation audio-fs_start_0 locally on ip-172-31-12-232
Mar 13 09:59:21 ip-172-31-12-232 crmd[1571]: notice: Initiating monitor operation audio-awsalias_monitor_5000 locally on ip-172-31-12-232
Mar 13 09:59:22 ip-172-31-12-232 Filesystem(audio-fs)[8661]: INFO: Running start for /dev/drbd0 on /mnt/audio
Mar 13 09:59:23 ip-172-31-12-232 kernel: drbd audiodata ip-172-31-12-173: Aborting remote state change 2672355414
Mar 13 09:59:23 ip-172-31-12-232 kernel: drbd audiodata: Preparing cluster-wide state change 145509007 (1->3 499/145)
Mar 13 09:59:23 ip-172-31-12-232 kernel: drbd audiodata/0 drbd0 ip-172-31-12-56: drbd_sync_handshake:
Mar 13 09:59:23 ip-172-31-12-232 kernel: drbd audiodata/0 drbd0 ip-172-31-12-56: self 9141832129BF9D9C:0000000000000000:FCAD090A6554F6EA:0000000000000000 bits:0 flags:20
Mar 13 09:59:23 ip-172-31-12-232 kernel: drbd audiodata/0 drbd0 ip-172-31-12-56: peer 9141832129BF9D9C:0000000000000000:FCAD090A6554F6EA:0000000000000000 bits:0 flags:1120
Mar 13 09:59:23 ip-172-31-12-232 kernel: drbd audiodata/0 drbd0 ip-172-31-12-56: uuid_compare()=no-sync by rule=lost-quorum
Mar 13 09:59:23 ip-172-31-12-232 kernel: drbd audiodata ip-172-31-12-173: Aborting local state change 145509007 to yield to remote state change 1845370428.
Mar 13 09:59:23 ip-172-31-12-232 kernel: drbd audiodata: Aborting cluster-wide state change 145509007 (91ms) rv = -19
Mar 13 09:59:23 ip-172-31-12-232 kernel: drbd audiodata ip-172-31-12-173: Preparing remote state change 1845370428
Mar 13 09:59:23 ip-172-31-12-232 kernel: drbd audiodata/0 drbd0 ip-172-31-12-173: updated UUIDs 9141832129BF9D9C:0000000000000000:4C4977DFD426BCE0:FCAD090A6554F6EA
Mar 13 09:59:23 ip-172-31-12-232 kernel: drbd audiodata/0 drbd0 ip-172-31-12-173: Resync done (total 2 sec; paused 0 sec; 1036 K/sec)
Mar 13 09:59:23 ip-172-31-12-232 kernel: drbd audiodata/0 drbd0 ip-172-31-12-173: pdsk( Inconsistent -> UpToDate ) repl( SyncSource -> Established )
Mar 13 09:59:23 ip-172-31-12-232 kernel: drbd audiodata ip-172-31-12-173: helper command: /sbin/drbdadm unfence-peer
Mar 13 09:59:23 ip-172-31-12-232 kernel: drbd audiodata ip-172-31-12-173: helper command: /sbin/drbdadm unfence-peer exit code 0
Mar 13 09:59:25 ip-172-31-12-232 Filesystem(audio-fs)[8661]: ERROR: Couldn't mount device [/dev/drbd0] as /mnt/audio
Mar 13 09:59:25 ip-172-31-12-232 lrmd[1568]: notice: audio-fs_start_0:8661:stderr [ mount: mount /dev/drbd0 on /mnt/audio failed: Resource temporarily unavailable ]
Mar 13 09:59:25 ip-172-31-12-232 lrmd[1568]: notice: audio-fs_start_0:8661:stderr [ ocf-exit-reason:Couldn't mount device [/dev/drbd0] as /mnt/audio ]
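"Resource temporarily unavailable" is the C library's text for EAGAIN. In the excerpt above, the Filesystem agent starts its mount at 09:59:22 while remote state change 2672355414 (prepared at 09:59:21) has not yet been aborted (it is aborted at 09:59:23). To check whether the other failed mounts line up the same way, a minimal sketch like the one below could pull the still-open DRBD state changes out of syslog. It assumes the classic syslog timestamp format and the DRBD 9 message wording shown in this log; the regexes and the function name `open_state_changes` are hypothetical helpers, not a LINBIT tool, and only the first "Running start" line in the input is considered.

```python
import errno
import os
import re
import sys

# Assumption: classic syslog timestamps ("Mar 13 09:59:18") and the DRBD 9
# wording seen in the log, e.g. "Preparing cluster-wide state change N" and
# "Aborting remote state change N". "Aborting local state change ... to yield"
# lines are deliberately not matched; a cluster-wide abort for the same ID follows.
STATE_CHANGE = re.compile(
    r"^(\w{3} +\d+ [\d:]{8}) .*?"
    r"(Preparing|Aborting) (cluster-wide|remote) state change (\d+)"
)
# The OCF Filesystem agent logs this line when its start operation begins.
MOUNT_START = re.compile(r"^(\w{3} +\d+ [\d:]{8}) .*Filesystem\(.*Running start")


def open_state_changes(lines):
    """Return {change_id: (kind, prepared_at)} for DRBD state changes that were
    prepared but not yet aborted when the first mount attempt began.
    Simplification: only "Aborting ..." messages are treated as closing a change."""
    pending = {}
    for line in lines:
        if MOUNT_START.match(line):
            return dict(pending)
        m = STATE_CHANGE.match(line)
        if not m:
            continue
        ts, action, kind, change_id = m.groups()
        if action == "Preparing":
            pending[change_id] = (kind, ts)
        else:
            pending.pop(change_id, None)
    return dict(pending)


# A few lines copied from the log in this post.
SAMPLE = [
    "Mar 13 09:59:20 ip-172-31-12-232 kernel: drbd audiodata: Preparing cluster-wide state change 522239102 (1->3 499/145)",
    "Mar 13 09:59:21 ip-172-31-12-232 kernel: drbd audiodata: Aborting cluster-wide state change 522239102 (96ms) rv = -19",
    "Mar 13 09:59:21 ip-172-31-12-232 kernel: drbd audiodata ip-172-31-12-173: Preparing remote state change 2672355414",
    "Mar 13 09:59:22 ip-172-31-12-232 Filesystem(audio-fs)[8661]: INFO: Running start for /dev/drbd0 on /mnt/audio",
    "Mar 13 09:59:23 ip-172-31-12-232 kernel: drbd audiodata ip-172-31-12-173: Aborting remote state change 2672355414",
]

if __name__ == "__main__":
    # Shows the errno text that mount(8) printed.
    print("EAGAIN ->", os.strerror(errno.EAGAIN))
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            lines = f.read().splitlines()
    else:
        lines = SAMPLE
    for change_id, (kind, ts) in open_state_changes(lines).items():
        print(f"{kind} state change {change_id} (prepared {ts}) still open at mount start")
```

Run without arguments, it uses the sample lines. Passed a syslog file cut down to a single failover, it lists the state changes that were in flight when the mount began. Comparing good and bad failovers this way might show whether the failures line up with open state changes, but that correlation is only a hypothesis to check.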
_______________________________________________
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user