Re: [DRBD-user] DRBD 8.0 life cycle
Hi Lars,

On 26.08.21 15:03, Lars Ellenberg wrote:
> As we put on our web page at https://linbit.com/solutions-rfq/, DRBD 8 < 8.4
> is end of life. DRBD 8.4 left "active maintenance" years ago, but needs to
> be supported while existing customers pay for existing deployments. The
> relevant platform here is RHEL 7, which ends its "Extended Lifecycle
> Support" in 2026, so that is the end date for our 8.4 support as well.

DRBD 8.4 (at least the kernel module) is part of the vanilla Linux kernel. Will DRBD be removed from the upstream kernel, or replaced with DRBD 9.x? Is there any ongoing work or progress related to this?

Regards,
Michael

___
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user
[DRBD-user] DRBD 9: UpToDate but verify reports resource to be almost completely out of sync, no resynchronisation on disconnect/connect however
Hi all,

In our three-node, single-master DRBD cluster we experienced the following situation. Check state:

root@host101010:/home/admins# drbdadm status vm
vm role:Primary
  disk:UpToDate
  host101011 role:Secondary
    peer-disk:UpToDate
  host101020 role:Secondary
    peer-disk:UpToDate

Everything looks nice. Let's perform a verification:

root@host101010:/home/admins# drbdadm verify vm

Running:

root@host101010:/home/admins# drbdadm status vm
vm role:Primary
  disk:UpToDate
  host101011 role:Secondary
    replication:VerifyS peer-disk:UpToDate done:0.09
  host101020 role:Secondary
    replication:VerifyS peer-disk:UpToDate done:0.11

The results are remarkable:

[Wed May 5 11:07:55 2021] drbd vm/0 drbd0 host101011: Online verify done (total 7112 sec; paused 0 sec; 29484 K/sec)
[Wed May 5 11:07:55 2021] drbd vm/0 drbd0 host101011: Online verify found 1 4k blocks out of sync!
[Wed May 5 11:07:55 2021] drbd vm/0 drbd0 host101011: repl( VerifyS -> Established )
[Wed May 5 11:07:55 2021] drbd vm/0 drbd0 host101011: helper command: /sbin/drbdadm out-of-sync
[Wed May 5 11:07:55 2021] drbd vm/0 drbd0 host101011: helper command: /sbin/drbdadm out-of-sync exit code 0
[Wed May 5 11:11:05 2021] drbd vm/0 drbd0 host101020: Online verify done (total 7302 sec; paused 0 sec; 28720 K/sec)
[Wed May 5 11:11:05 2021] drbd vm/0 drbd0 host101020: Online verify found 41790818 4k blocks out of sync!
[Wed May 5 11:11:05 2021] drbd vm/0 drbd0 host101020: repl( VerifyS -> Established )
[Wed May 5 11:11:05 2021] drbd vm/0 drbd0 host101020: helper command: /sbin/drbdadm out-of-sync
[Wed May 5 11:11:05 2021] drbd vm/0 drbd0 host101020: helper command: /sbin/drbdadm out-of-sync exit code 0

Verify claims almost the whole 150G device to be out of sync.
Let's perform disconnect/connect:

[Wed May 5 11:44:30 2021] drbd vm: Preparing cluster-wide state change 1977682193 (0->1 496/16)
[Wed May 5 11:44:30 2021] drbd vm: State change 1977682193: primary_nodes=1, weak_nodes=FFFA
[Wed May 5 11:44:30 2021] drbd vm: Committing cluster-wide state change 1977682193 (0ms)
[Wed May 5 11:44:30 2021] drbd vm1054 host101011: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
[Wed May 5 11:44:30 2021] drbd vm/0 drbd0 host101011: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
[Wed May 5 11:44:30 2021] drbd vm host101011: ack_receiver terminated
[Wed May 5 11:44:30 2021] drbd vm host101011: Terminating ack_recv thread
[Wed May 5 11:44:30 2021] drbd vm host101011: Restarting sender thread
[Wed May 5 11:44:30 2021] drbd vm host101011: Connection closed
[Wed May 5 11:44:30 2021] drbd vm host101011: helper command: /sbin/drbdadm disconnected
[Wed May 5 11:44:30 2021] drbd vm host101011: helper command: /sbin/drbdadm disconnected exit code 0
[Wed May 5 11:44:30 2021] drbd vm host101011: conn( Disconnecting -> StandAlone )
[Wed May 5 11:44:30 2021] drbd vm host101011: Terminating receiver thread
[Wed May 5 11:44:30 2021] drbd vm: Preparing cluster-wide state change 2927240878 (0->2 496/16)
[Wed May 5 11:44:30 2021] drbd vm: State change 2927240878: primary_nodes=1, weak_nodes=FFFE
[Wed May 5 11:44:30 2021] drbd vm host101020: Cluster is now split
[Wed May 5 11:44:30 2021] drbd vm: Committing cluster-wide state change 2927240878 (0ms)
[Wed May 5 11:44:30 2021] drbd vm host101020: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
[Wed May 5 11:44:30 2021] drbd vm/0 drbd0 host101020: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
[Wed May 5 11:44:30 2021] drbd vm host101020: ack_receiver terminated
[Wed May 5 11:44:30 2021] drbd vm host101020: Terminating ack_recv thread
[Wed May 5 11:44:30 2021] drbd vm host101020: Restarting sender thread
[Wed May 5 11:44:30 2021] drbd vm host101020: Connection closed
[Wed May 5 11:44:30 2021] drbd vm host101020: helper command: /sbin/drbdadm disconnected
[Wed May 5 11:44:30 2021] drbd vm host101020: helper command: /sbin/drbdadm disconnected exit code 0
[Wed May 5 11:44:30 2021] drbd vm host101020: conn( Disconnecting -> StandAlone )
[Wed May 5 11:44:30 2021] drbd vm host101020: Terminating receiver thread
[Wed May 5 11:44:33 2021] drbd vm host101011: conn( StandAlone -> Unconnected )
[Wed May 5 11:44:33 2021] drbd vm host101011: Starting receiver thread (from drbd_w_vm [38063])
[Wed May 5 11:44:33 2021] drbd vm host101011: conn( Unconnected -> Connecting )
[Wed May 5 11:44:33 2021] drbd vm host101020: conn( StandAlone -> Unconnected )
[Wed May 5 11:44:33 2021] drbd vm host101020: Starting receiver thread (from drbd_w_vm [38063])
[Wed May 5 11:44:33 2021] drbd vm host101020: conn(
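When reading verify results like the ones in this post, a small awk helper can total the reported out-of-sync blocks per peer and convert them to GiB. This is a sketch; it assumes `dmesg -T`-style lines shaped like the excerpt above (peer name immediately before "Online", count immediately after "found").

```shell
# Sketch: summarise DRBD "Online verify found N 4k blocks out of sync"
# messages per peer, converting 4k-block counts to GiB.
verify_summary() {
  awk '/Online verify found/ {
    peer = ""; blocks = 0
    for (i = 1; i <= NF; i++) {
      # the field before "Online" is the peer name (with trailing colon)
      if ($i == "Online" && peer == "") { peer = $(i-1); sub(/:$/, "", peer) }
      # the field after "found" is the block count
      if ($i == "found") blocks = $(i+1)
    }
    printf "%s: %d blocks out of sync (%.1f GiB)\n", peer, blocks, blocks * 4096 / 1073741824
  }'
}
# Typical use: dmesg -T | verify_summary
```

For the second peer above this reports roughly 159 GiB, which matches "almost the whole 150G device".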
Re: [DRBD-user] Control syncer speed
Hi all,

On 31.03.21 11:32, Michael Hierweck wrote:
> Hi all,
>
> with DRBD 9.0.25 we experience problems with the syncer rate. Using
> default settings our IO system gets overloaded, especially when multiple
> resources are resynced in parallel.
>
> Neither
>
> drbdadm peer-device-options --c-plan-ahead=X --c-max-sync-rate=Y
>
> nor
>
> drbdadm peer-device-options --c-plan-ahead=0 --resync-rate=Y
>
> seems to take any effect, even when setting Y to very low values, such
> as 1M.

We were able to slow down the DRBD sync by applying:

drbdadm peer-device-options --c-plan-ahead=0 --resync-rate=100

(Yes, 100, not 100M.)

Syslog:

Resync done (total 1212 sec; paused 0 sec; 43256 K/sec)

I cannot explain how this setting fits with the result.

Thanks in advance,
Michael
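For the archives: the same knobs can also be pinned in the resource file rather than set at runtime via drbdadm. This is a sketch only; the resource name and the 10M figure are examples, and how unsuffixed numbers are interpreted is exactly the open question in this thread, so always give an explicit unit suffix.

```
resource r0 {
  disk {
    c-plan-ahead  0;    # disable the dynamic resync controller
    resync-rate   10M;  # fixed rate; example value, note the explicit suffix
  }
}
```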
[DRBD-user] Control syncer speed
Hi all,

with DRBD 9.0.25 we experience problems with the syncer rate. Using default settings our IO system gets overloaded, especially when multiple resources are resynced in parallel.

Neither

drbdadm peer-device-options --c-plan-ahead=X --c-max-sync-rate=Y

nor

drbdadm peer-device-options --c-plan-ahead=0 --resync-rate=Y

seems to take any effect, even when setting Y to very low values, such as 1M.

Thanks in advance,
Michael
[DRBD-user] Is it possible to make DRBD open the backing device with O_DIRECT?
Hi all,

is it possible to make DRBD open the backing device with O_DIRECT?

Thanks in advance,
Michael
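For experimenting from userspace, `dd oflag=direct` shows whether a given target accepts O_DIRECT at all (some filesystems reject it with EINVAL). A sketch; the target path is whatever you pass in, and note the probe is destructive: it writes one 4k block of zeros to the target.

```shell
# Sketch: probe whether a target accepts O_DIRECT writes from userspace.
# WARNING: destructive -- writes one 4096-byte block of zeros to the target.
probe_odirect() {
  target=$1
  if dd if=/dev/zero of="$target" bs=4096 count=1 oflag=direct 2>/dev/null; then
    echo "O_DIRECT write ok: $target"
  else
    echo "O_DIRECT not supported: $target"
  fi
  # no cleanup here: the caller decides whether the target is a scratch file
}
# e.g. probe_odirect /dev/vg0/scratch   (hypothetical scratch device)
```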
[DRBD-user] Suspicious Performance Gain
Hi all,

we use hardware RAID controllers with BBU and 2.5" SAS3 drives in RAID 10. On top we use LVM fat provisioning. DRBD is 8.4.

1) HWR10 => LVM => XFS
2) HWR10 => LVM => DRBD "C" => QEMU/Virtio => XFS

DRBD settings:
  disk-flushes = no
  md-flushes = no

Virtio settings:
  cache = none

Performing benchmarks with fio (direct=1, sync=1, bs=1k,2k,4k,...,128k) leads to suspicious results. While 1) leads to expected results, such as 200-400 IOPS on random writes, 2) leads to about 4 times better results.

(I did the same benchmarks some years ago on the same hardware, but with an older software stack, and remember that DRBD protocol "C" led to a performance decrease of about factor 2. Factor 2 seems to be a plausible value for synchronous replication.)

What's going wrong here?

Thanks in advance,
Michael

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
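The benchmark parameters map onto a fio job file along these lines. A sketch: the target device and runtime are placeholders, and only the 4k job out of the 1k...128k series is shown.

```ini
; sketch of the random-write benchmark described above (direct=1, sync=1)
[global]
direct=1
sync=1
rw=randwrite
ioengine=psync
runtime=60
time_based

[randwrite-4k]
; example target -- point this at a scratch LV, never a production volume
filename=/dev/vg0/benchlv
bs=4k
```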
Re: [DRBD-user] drbd 8.4.7 out-of-sync on disabled host page cache in VM
On 23.04.19 09:28, Michael Hierweck wrote:
> On 23.04.19 09:05, Armin Schindler wrote:
>> On 20.04.2019 14:38, a...@sysgo.com wrote:
>>>> On 13 March 2019 at 11:47 Roland Kammerer wrote:
>>>>
>>>> On Tue, Mar 12, 2019 at 09:08:42AM +0100, Armin Schindler wrote:
>>>>> On 3/11/19 1:42 PM, Roland Kammerer wrote:
>>>>>> On Mon, Mar 11, 2019 at 11:13:11AM +0100, Armin Schindler wrote:
>>>>>>> 2 hosts Debian 9 (stretch) with default DRBD version 8.4.7.
>>>>>>
>>>>>> Please retry with the current 8.4.11 version of DRBD. You can get it
>>>>>> from here:
>>>>>> https://www.linbit.com/en/drbd-community/drbd-download/
>>>>>
>>>>> Okay, thanks. I will test 8.4.11.
>>>>>
>>>>> Do I need to change/update the tools as well or just the kernel driver?
>>>>> I currently use drbd-utils 8.9.10.
>>>>
>>>> They should be fine. I don't remember any non-corner-case fixes for 8.4
>>>> in drbd-utils.
>>>
>>> I tried version 8.4.11 and the problem persists.
>>> When using a Qemu/KVM virtio disk with a caching mode that uses the host
>>> page cache, or when using just a filesystem like ext4 (without Qemu/KVM)
>>> on the host, the drbd device gets out of sync after a while.
>
> Same here:
>
> LVM (thick) => DRBD => Virtio (cache=none or cache=directsync)
>
> After some weeks of running about 80 VMs on 4 nodes, some of the VM
> backings report out-of-sync blocks. We are running an active/passive
> cluster with locally attached storage.
>
> We were not able to reproduce this behaviour when using
> cache="writethrough" or cache="writeback".
>
> We have been running this setup since 2011/2012. The first years we were
> fine, but about 3 years ago we ran into serious trouble because
> out-of-sync blocks led to damaged file system journals.
>
> The issue was discussed in 2014:
>
> https://lists.gt.net/drbd/users/25227
>
> We love(d) DRBD because of its simplicity and reliability. (Ceph is much
> more complex...) However, we wonder whether DRBD can still be considered
> the kind of "simple and reliable" it was some years ago.
>
> Even if the situation might be introduced by virtio block driver
> optimizations some years ago (no stable pages anymore?), a solution is
> needed.

Remark: I would expect stacks such as "rbd => virtio" and maybe "ZFS ZVOL => virtio" to require stable pages, too, for the same reason as DRBD does: checksum calculation.

http://lkml.iu.edu/hypermail/linux/kernel/1511.0/04497.html
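On the stable-pages point: recent Linux kernels expose per device whether stable page writes are enforced, via `stable_pages_required` in the bdi sysfs directory. A sketch of checking it (a reading of 1 means pages are kept stable while under writeback, which is what checksum-computing layers like DRBD want):

```shell
# Sketch: list which block devices currently require stable page writes.
stable_pages_report() {
  for f in /sys/block/*/bdi/stable_pages_required; do
    [ -r "$f" ] || continue          # skip if the sysfs entry is absent
    dev=${f#/sys/block/}; dev=${dev%%/*}
    printf '%s: %s\n' "$dev" "$(cat "$f")"
  done
}
# e.g. stable_pages_report | grep drbd
```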
Re: [DRBD-user] drbd 8.4.7 out-of-sync on disabled host page cache in VM
On 23.04.19 09:05, Armin Schindler wrote:
> On 20.04.2019 14:38, a...@sysgo.com wrote:
>>> On 13 March 2019 at 11:47 Roland Kammerer wrote:
>>>
>>> On Tue, Mar 12, 2019 at 09:08:42AM +0100, Armin Schindler wrote:
>>>> On 3/11/19 1:42 PM, Roland Kammerer wrote:
>>>>> On Mon, Mar 11, 2019 at 11:13:11AM +0100, Armin Schindler wrote:
>>>>>> 2 hosts Debian 9 (stretch) with default DRBD version 8.4.7.
>>>>>
>>>>> Please retry with the current 8.4.11 version of DRBD. You can get it
>>>>> from here:
>>>>> https://www.linbit.com/en/drbd-community/drbd-download/
>>>>
>>>> Okay, thanks. I will test 8.4.11.
>>>>
>>>> Do I need to change/update the tools as well or just the kernel driver?
>>>> I currently use drbd-utils 8.9.10.
>>>
>>> They should be fine. I don't remember any non-corner-case fixes for 8.4
>>> in drbd-utils.
>>
>> I tried version 8.4.11 and the problem persists.
>> When using a Qemu/KVM virtio disk with a caching mode that uses the host
>> page cache, or when using just a filesystem like ext4 (without Qemu/KVM)
>> on the host, the drbd device gets out of sync after a while.

Same here:

LVM (thick) => DRBD => Virtio (cache=none or cache=directsync)

After some weeks of running about 80 VMs on 4 nodes, some of the VM backings report out-of-sync blocks. We are running an active/passive cluster with locally attached storage.

We were not able to reproduce this behaviour when using cache="writethrough" or cache="writeback".

We have been running this setup since 2011/2012. The first years we were fine, but about 3 years ago we ran into serious trouble because out-of-sync blocks led to damaged file system journals.

The issue was discussed in 2014:

https://lists.gt.net/drbd/users/25227

We love(d) DRBD because of its simplicity and reliability. (Ceph is much more complex...) However, we wonder whether DRBD can still be considered the kind of "simple and reliable" it was some years ago.

Even if the situation might be introduced by virtio block driver optimizations some years ago (no stable pages anymore?), a solution is needed.

Michael
[DRBD-user] Complete decoupling of LINSTOR from DRBD (Q1 2019)
Hi all,

Linbit announces the complete decoupling of LINSTOR from DRBD (Q1 2019).

https://www.linbit.com/en/linstor/

Does this mean Linbit will abandon DRBD? (What technologies might replace or coexist with DRBD in a LINSTOR cluster?)

Best regards,
Michael
[DRBD-user] DRBD on ZVOL
Hi everybody,

would anyone tell me which settings regarding disk flushes (barriers, flushes, drain) are recommended when using a ZVOL as backing device for DRBD?

Are there recommendations regarding ZFS settings, such as disabling the primary or secondary caches?

Of course, reliability is more important than speed.

Thanks in advance,
Michael
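To anchor the question, these are the drbd.conf knobs involved, shown with what I believe are the defaults. A sketch only, not a recommendation: the resource name is made up, and whether a ZVOL honours flushes the way these settings assume is exactly what is being asked here.

```
resource r0 {
  disk {
    # the write-ordering knobs the question refers to (assumed defaults)
    disk-barrier no;
    disk-flushes yes;
    disk-drain   yes;
    md-flushes   yes;
  }
}
```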
[DRBD-user] DRBD and LVM Snapshots
Hi all,

up to now we use LVM (thick) provisioning to provide virtual disks for VMs:

RAID => LVM => LV => DRBD => KVM/QEMU (virtio)

We would like to be able to take snapshots of our virtual disks in order to use them as a read-only source for consistent backups. Therefore we would need to introduce an LVM thin provisioning layer somewhere.

A. TP below DRBD

RAID => LVM => LVM-TP => LV => DRBD => KVM/QEMU (virtio)

- simple setup
- I suppose going back to an older snapshot would break DRBD and require a complete resync/verify run

B. TP on top of DRBD

RAID => LVM => LV => DRBD => LVM => LVM-TP => LV => KVM/QEMU (virtio)

- complex setup
- more layers affect reliability and performance
- snapshots are replicated and consistent
- additional disk space for snapshots is limited by the DRBD volume size and cannot be shared across volumes

I would like to discuss the pros and cons of both approaches, also considering internal vs. external metadata.

Thanks in advance,
Michael
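To make variant A concrete, a backup cycle could look roughly like this. A sketch: the VG/LV names are invented, and the DRY_RUN guard only prints the commands so the sequence can be shown without a live volume group.

```shell
# Sketch of a snapshot-based backup cycle for variant A (thin pool below DRBD).
# VG/LV names are examples. With DRY_RUN=1 (the default here) the commands
# are only printed, not executed.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

run lvcreate -s -n vm1_backup vg0/vm1   # thin snapshot of the DRBD backing LV
run lvchange -ay -Ky vg0/vm1_backup     # thin snapshots are created inactive
# ... run the read-only backup against /dev/vg0/vm1_backup here ...
run lvremove -y vg0/vm1_backup
```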