Re: [DRBD-user] The Problem of File System Corruption w/DRBD
Il 2021-06-04 15:08 Eric Robinson ha scritto:
> Those are all good points. Since the three legs of the information security triad are confidentiality, integrity, and availability, this is ultimately a security issue. We all know that information security is not about eliminating all possible risks, as that is an unattainable goal. It is about mitigating risks to acceptable levels. So I guess it boils down to how each person evaluates the risks in their own environment. Over my 38-year career, and especially the past 15 years of using Linux HA, I've seen more filesystem-type issues than the other possible issues you mentioned, so that one tends to feature more prominently on my risk radar.

For the very limited goal of protecting against filesystem corruption, you can use a snapshot/CoW layer such as thin LVM. Keep multiple rolling snapshots and you can recover from sudden filesystem corruption. However, this simply moves the SPOF down to the CoW layer (thin LVM, which is quite complex by itself and can be considered a stripped-down filesystem/allocator) or up to the application layer (where corruptions are relatively common).

That said, nowadays mature filesystems such as EXT4 and XFS can be corrupted (barring obscure bugs) only by:
- a double mount from different machines;
- a direct write to the underlying raw disks;
- a serious hardware issue.

For what it is worth, I am now accustomed to ZFS's strong data integrity guarantees, but I fully realize that this does *not* protect against every corruption scenario by itself, not even on XFS-over-ZVOL-over-DRBD-over-ZFS setups. If anything, a more complex filesystem (and I/O setup) has *greater* chances of exposing uncommon bugs.

So: I strongly advise placing your filesystem on top of a snapshot layer, but do not expect this to shield you from every storage-related issue.

Regards.

-- Danti Gionatan Supporto Tecnico Assyoma S.r.l.
- www.assyoma.it email: g.da...@assyoma.it - i...@assyoma.it GPG public key ID: FF5F32A8 ___ Star us on GITHUB: https://github.com/LINBIT drbd-user mailing list drbd-user@lists.linbit.com https://lists.linbit.com/mailman/listinfo/drbd-user
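The rolling-snapshot idea above can be sketched with thin LVM. This is only a sketch: the names (vg0, thinpool, data) and sizes are hypothetical, and the commands require root and a real volume group:

```shell
lvcreate --type thin-pool -L 100G -n thinpool vg0     # the CoW layer
lvcreate -V 50G --thinpool vg0/thinpool -n data vg0   # a thin volume on it
mkfs.xfs /dev/vg0/data
# Take a rolling snapshot (e.g. hourly from cron); prune old ones with lvremove:
lvcreate -s -n data_$(date +%F_%H%M) vg0/data
# After a corruption, activate a snapshot (thin snapshots skip auto-activation,
# hence -K) and mount it read-only to recover files:
lvchange -ay -K vg0/data_2021-06-04_1200
mount -o ro /dev/vg0/data_2021-06-04_1200 /mnt
```

As the post notes, this protects against filesystem-level damage but moves the single point of failure into the thin pool itself.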
Re: [DRBD-user] DRBD Synchronization Rate Questions
Il 23-12-2019 19:48 cpt...@yahoo.co.jp ha scritto:
> 1. Can DRBD only provide up to 80 MB/s of network bandwidth per volume during the initial full synchronization? We did an initial full synchronization in DRBD with the following settings: (I use a network with a speed of 1.25 GB/s.)

Some years ago, I had similar problems with 10GBase-T Ethernet until I enabled jumbo frames. After that I got over 600 MB/s for single-volume synchronization. So, try enabling jumbo frames on both servers and on any switch used between them.

Regards.
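Enabling and verifying jumbo frames can be sketched as follows; the interface name (eth1) and peer address (10.0.0.2) are placeholders for your replication link:

```shell
# The ICMP payload that exactly fills an MTU-9000 frame:
# 9000 bytes MTU - 20 bytes IPv4 header - 8 bytes ICMP header = 8972.
PAYLOAD=$((9000 - 20 - 8))
echo "ping payload for MTU 9000: $PAYLOAD"
# On both nodes (and on any switch in the replication path, which must also
# be configured to accept jumbo frames):
#   ip link set dev eth1 mtu 9000
# Verify that unfragmented jumbo frames actually pass end-to-end
# (-M do forbids fragmentation, so an undersized path will report an error):
#   ping -M do -s "$PAYLOAD" 10.0.0.2
```

If the ping fails with "message too long", some hop in the path is still running the default 1500-byte MTU.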
Re: [DRBD-user] ERROR: meta parameter misconfigured, expected clone-max -le 2, but found unset
On 02/10/2017 15:35, Lars Ellenberg wrote:
> Usually a result of having (temporarily?) only a "primitive", without a corresponding "ms" resource definition in the CIB. Once you have fixed the config, you should no longer get it, and should be able to clear previous fail-counts by doing a "resource cleanup".
>
>> So it seems the OCF script found in /usr/lib/ocf/resource.d/linbit/drbd does not find the metadata it expects in the resource definition. However, metadata *are* specified in the resource file. Any suggestions on how to fix the problem?
>
> Don't put a "primitive" DRBD definition live without the corresponding "ms" definition. If you need to, populate a "shadow" CIB first, and only commit it to "live" once it is fully populated.

Hi Lars, thank you for pointing me in the right direction! As I am using "pcs" rather than "crm" to configure/manage the cluster, I had some difficulty following the examples in the DRBD User Guide. Creating the resource while immediately specifying the master/slave parameters worked, indeed:

pcs resource create drbd_vol1 ocf:linbit:drbd drbd_resource=vol1 \
    ignore_missing_notifications=true \
    op monitor interval=5s timeout=30s role="Slave" \
    monitor interval=15s timeout=30s role="Master" \
    master master-max=1 master-node-max=1

Thanks again.
Re: [DRBD-user] DRBD over ZFS - or the other way around?
Il 15-09-2017 17:12 David Bruzos ha scritto:
> Hi Danti, the behavior you are experiencing is normal. As I pointed out previously, I've been using this setup for many years and I've seen the same thing. You will encounter it when the filesystem is written to the DRBD device without the use of a partition table. As a side note, I've had some nasty stability issues with the in-kernel DRBD version (4.4/4.9 kernels) when running on ZFS, but DRBD 8.4.10 and ZFS 0.6.5.11 seem to be running great. I also run the storage as part of dom0, which many admins don't recommend, but generally it works alright. The stability issues were typically rare, happened under high I/O load, and were DRBD-related deadlock-type crashes. Again, those problems seem to be resolved in the latest DRBD 8 release.
> David

Hi David, thanks again for taking the time to report your findings. I plan to use CentOS, which has no built-in DRBD support, so I will use ELRepo's DRBD packages plus the official ZFS 0.7.x repository.

Regards.
Re: [DRBD-user] DRBD over ZFS - or the other way around?
Il 06-09-2017 16:22 David Bruzos ha scritto:
> I've used DRBD devices on top of ZFS zvols for years now and have been very satisfied with the performance and possibilities that configuration allows. I use DRBD 8.x on the latest ZFS, mainly on Xen hypervisors running a mix of Linux and Windows VMs with both SSDs and mechanical drives. I've also done similar things in the past with DRBD and LVM. The DRBD-on-ZFS combination is the most flexible and elegant. You can use snapshots and streams to do data migrations across the Internet with minimal downtime, while getting storage-level redundancy and integrity from ZFS and realtime replication from DRBD. A few scripts can automate the creation/removal of devices and coordinate VM migrations, and things just work. Also, you can then use ZFS streams for offsite backups (if you need that kind of thing). Another thing is that you may not need realtime replication for some workloads, so in those cases you can run directly on ZFS and omit the DRBD device. At least for me, that great flexibility is what makes running my own configuration worth it. Just my 25 cents!
> David

Hi David, thank you for your input. It was greatly appreciated.

Regards.
Re: [DRBD-user] DRBD over ZFS - or the other way around?
Il 06-09-2017 16:03 Yannis Milios ha scritto:
> ...I mean by cloning it first, since a snapshot does not appear as a blockdev to the system but a clone does.

Hi, this is incorrect: ZVOL snapshots certainly can appear as regular block devices. You simply need to set the "snapdev=visible" property.

Regards.
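For reference, exposing ZVOL snapshots as block devices takes a single property. A sketch only: the pool/volume names (tank/vol1) are those used elsewhere in the thread, and the commands require a live ZFS pool:

```shell
zfs set snapdev=visible tank      # the property is inherited by all zvols below
zfs snapshot tank/vol1@snap1
ls -l /dev/zvol/tank/vol1@snap1   # the snapshot now appears as a read-only blockdev
```

Setting `snapdev=hidden` (the default) makes the snapshot device nodes disappear again.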
Re: [DRBD-user] DRBD over ZFS - or the other way around?
On 06/09/2017 15:31, Yannis Milios wrote:
> If your topology is like the following: HDD -> ZFS (ZVOL) -> DRBD -> XFS, then I believe it should make sense to always mount at the DRBD level and not at the ZVOL level, which happens to be the underlying blockdev for DRBD.

Sure! Directly mounting the DRBD-backing ZVOL would, at the bare minimum, ruin the replication with the peer. I was speaking about mounting ZVOL *snapshots* to access previous data versions.

Regards.
Re: [DRBD-user] DRBD over ZFS - or the other way around?
Hi,

On 06/09/2017 13:28, Jan Schermer wrote:
> Not sure you can mount a snapshot (I always create a clone).

The only difference is that snapshots are read-only, while clones are read-write. This is why I used the "-o ro,norecovery" options while mounting XFS.

> However I never saw anything about a "drbd" filesystem - what distribution is this? Apparently it tries to be too clever…

It is CentOS 7.3 x86_64. Actually, I *really* like what the mount command is doing: by checking the end of the device and discovering the DRBD metadata there, it prevents accidental double mounts of the main (DRBD-backing) block device. I was only wondering whether this happens only to me, or whether it is "normal" to have to specify the filesystem type when mounting snapshot volumes with DRBD.

Regards.
Re: [DRBD-user] DRBD over ZFS - or the other way around?
On 19/08/2017 10:24, Yannis Milios wrote:
> Option (b) seems more suitable for a 2-node drbd8 cluster in a primary/secondary setup. Haven't tried it, so I cannot tell if there are any pitfalls. My only concern in such a setup would be if drbd silently corrupts the data at the lower level and zfs is not aware of that. Also, if you are *not* going to use live migration, and you can afford losing some seconds of data on the secondary node in favor of better performance on the primary node, then you could consider using protocol A instead of C for the replication link.

Hi all, I "revive" this old thread to let you know I settled on using DRBD 8.4 on top of ZVOLs. I have a question for anyone using DRBD on top of a snapshot-capable backend (eg: ZFS, LVM, etc)...

When snapshotting a DRBD block device, trying to mount the snapshot (not the original volume!) results in the following error message:

[root@master7 tank]# mount /dev/zvol/tank/vol1\@snap1 /mnt/
mount: unknown filesystem type 'drbd'

To successfully mount the snapshot volume, I need to specify the volume's filesystem, for example (the other options are XFS-specific):

[root@master7 tank]# mount -t xfs /dev/zvol/tank/vol1\@snap1 /mnt/ -o ro,norecovery,nouuid

Is that the right approach? Or am I missing something?

Thanks.
Re: [DRBD-user] DRBD Dual Primary + GFS2 for redundant KVM hosts
On 28/08/2017 14:13, Yannis Milios wrote:
>> I use lvm-based block devices on critical installations, actually. However, backing up block devices is a hassle compared to regular files: you basically need to use ddrescue or, even worse, plain dd. Especially with the latter, you need to be *extremely* careful, as its "alias name" (data-destroyer) is there for a reason.
>
> I use proxmox, which has implemented vzdump for backing up raw devices. No issues so far and no need to use dd. https://pve.proxmox.com/wiki/VZDump In addition to that, ZFS snapshots can be used as a quick point-in-time backup.
> Yannis

Fair enough. I'll surely try it.

Thanks.
Re: [DRBD-user] DRBD Dual Primary + GFS2 for redundant KVM hosts
Il 26-08-2017 14:28 Yannis Milios ha scritto:
> Ah I see, sorry I wasn't aware of that. Is there a particular reason for using file-based VMs?

Simply for ease of management :)

> since generally speaking the performance on raw devices is much better, and in DRBD9 you can leverage thin LVM or ZFS-based snapshots.

I use lvm-based block devices on critical installations, actually. However, backing up block devices is a hassle compared to regular files: you basically need to use ddrescue or, even worse, plain dd. Especially with the latter, you need to be *extremely* careful, as its "alias name" (data-destroyer) is there for a reason.

Thanks.
Re: [DRBD-user] DRBD Dual Primary + GFS2 for redundant KVM hosts
Il 26-08-2017 13:56 Yannis Milios ha scritto:
> Have you considered HA NFS over a 2-node DRBD8 cluster? Should work well with most hypervisors (qcow2, raw, vmdk based).
> Yannis

Hi Yannis, yes, I considered that. However, as this would be a converged setup (ie: virtual machines running on the same nodes that export the storage), it would require a loopback NFS mount. From what I know, this is (or was?) discouraged due to possible livelocks in the NFS kernel process.

Thanks.
Re: [DRBD-user] DRBD Dual Primary + GFS2 for redundant KVM hosts
Il 25-08-2017 22:01 Digimer ha scritto:
> On 2017-08-25 03:37 PM, Gionatan Danti wrote:
> The overhead of clustered locking is likely such that your VM performance would not be good, I think.

Mmm... I need to do some more testing with fio, it seems ;)

> With raw clustered LVs backing the servers, you don't need cluster locking on a per-IO basis, only on LV create/change/delete. Because LVM is sitting on top of DRBD (in dual-primary), live migration is no trouble at all and performance is good, too.

True.

> GFS2, being a cluster FS, will work fine if a node is lost, provided it is fenced successfully. It wouldn't be much of a cluster FS otherwise. :)

So no problem with quorum? The loss of one system in a two-node cluster seems to wreak havoc on other cluster filesystems (Gluster, for example...).

Thanks.
Re: [DRBD-user] DRBD Dual Primary + GFS2 for redundant KVM hosts
Il 25-08-2017 21:46 Remolina, Diego J ha scritto:
> Danti, have you considered using something other than drbd for VM disk storage? Glusterfs, for example, is really good for that. You may want to give it a try. You should look at enabling sharding from the get-go to speed up heals when a node goes down. I would not use Glusterfs for a file server, as its performance is abysmal when dealing with lots of small files. I would definitely use DRBD for a file server, as it is great at handling lots of small files whereas Glusterfs is horrible. But for VMs, I think gluster offers many niceties: easy to set up, good performance, no need to deal with GFS2, etc.
> HTH, Diego

Hi Diego, sure, I have an ongoing discussion on the gluster mailing list about how to do that. However, having been served so well by DRBD in the last 4 years, I am somewhat reluctant to abandon it for another, probably not so battle-tested, technology. In particular, it seems that to be useful (ie: stable enough) for VM disk storage, Gluster needs a 3-way cluster and sharding enabled (which I would like to avoid). In contrast, DRBD + GFS2 (or DRBD + LVM) can be used in a 2-way cluster without problems.

Thanks.
Re: [DRBD-user] DRBD Dual Primary + GFS2 for redundant KVM hosts
Il 25-08-2017 14:34 Digimer ha scritto:
> Our Anvil! project (https://www.alteeve.com/w/Build_an_m2_Anvil!) is basically this, except we put the VMs on clustered LVs and use gfs2 to store install media and the server XML files. I would NOT put the image files on gfs2, as the distributed locking overhead would hurt performance a fair bit.

Hi, I contemplated using an active/active lvm-based configuration, but I would really like to use files as per-VM disks. So, do you feel GFS2 would be inadequate for VM disk storage? What if a node crashes/reboots? Will GFS2 continue to work?

Thanks.
[DRBD-user] DRBD Dual Primary + GFS2 for redundant KVM hosts
Hi list, in my endless search for information :) I was testing DRBD in dual-primary mode + GFS2. The goal is "the same": running a replicated, 2-node KVM setup where the disk images are located on the GFS2 filesystem. I know how to integrate DRBD with corosync/pacemaker, but I wonder how the system will perform under load.

More specifically, it is my understanding that GFS2 inevitably has some significant overhead compared to a traditional, single-node filesystem, and this can lead to decreased performance. This should be especially true when restarting (or live-migrating) a virtual machine on the other host: as the first node has cached a significant portion of the VM disk, GFS2 will, on every first read on the new host, fire its coherency protocol to make sure the first node's in-memory cached data are updated.

The overhead should be lowered by not using cache at all (ie: cache=none, which implies O_DIRECT), but this will also cause degraded performance in the common case (ie: when all is working correctly and the VMs run on the first node only).

So I ask: do any of you have direct experience with a similar setup? How do you feel about it? Any other suggestions?

Thanks.
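For reference, the cache=none setting discussed above is selected per disk in the libvirt domain XML. A sketch with hypothetical paths:

```xml
<disk type='file' device='disk'>
  <!-- cache='none' opens the image with O_DIRECT, bypassing the host page
       cache, so migration does not depend on host-side cache coherency -->
  <driver name='qemu' type='qcow2' cache='none'/>
  <source file='/gfs2/vms/vm1.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>
```

The trade-off is exactly the one described in the post: safer migrations, but no host read caching during normal single-node operation.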
Re: [DRBD-user] DRBD over ZFS - or the other way around?
Il 18-08-2017 17:22 Veit Wahlich ha scritto:
> Yes, I regard qemu -> DRBD -> volume management [-> RAID] -> disk as the most recommendable solution for this scenario. I personally go with LVM thinp for volume management, but ZVOLs should do the trick, too. With named resources (named after VMs) and multiple volumes per resource (for multiple VM disks), this works very well for us for hundreds of VMs. Having a cluster-wide unified system for numbering VMs is very advisable, as it allows calculating the ports and minor numbers for both the DRBD and KVM/qemu configuration. Example:
> * number VMs from 0 to 999, padded with leading zeros;
> * number volumes from 0 to 99, padded with leading zeros;
> * DRBD resource port: 10 followed by the VM number;
> * VNC/SPICE unencrypted port: 11 followed by the VM number;
> * SPICE TLS port: 12 followed by the VM number;
> * DRBD minor: VM number followed by the volume number.
> Let's say your VM gets number 123, it has 3 virtual disks and uses VNC:
> * DRBD resource port: 10123
> * VNC port: 11123
> * DRBD minor of volume/VM disk 0: 12300
> * DRBD minor of volume/VM disk 1: 12301
> * DRBD minor of volume/VM disk 2: 12302
> Best regards, // Veit

Hi Veit, excellent advice! Thank you.
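Veit's convention can be sketched as a tiny shell calculation. VM number 123 with 3 disks is the worked example from the post; nothing else is assumed:

```shell
vm=123
printf -v vm3 '%03d' "$vm"            # VM number, zero-padded to 3 digits
echo "DRBD resource port: 10${vm3}"   # -> 10123
echo "VNC port: 11${vm3}"             # -> 11123
for vol in 0 1 2; do
  printf -v vol2 '%02d' "$vol"        # volume number, zero-padded to 2 digits
  echo "DRBD minor of disk ${vol}: ${vm3}${vol2}"   # -> 12300, 12301, 12302
done
```

With this scheme, every port and minor number is derivable from the VM number alone, which is what makes automated resource generation practical for hundreds of VMs.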
Re: [DRBD-user] DRBD over ZFS - or the other way around?
Il 18-08-2017 17:09 Yannis Milios ha scritto:
> Personally I'm using option (a) on a 3-node proxmox cluster and drbd9. Replica count per VM is 2, and all 3 nodes act as both drbd control-volume and satellite nodes. I can live-migrate VMs between all nodes and snapshot them by using the drbdmanage utility (which uses zfs snapshots+clones).

Hi Yannis, thank you for describing your setup!

> Option (b) seems more suitable for a 2-node drbd8 cluster in a primary/secondary setup. Haven't tried it, so I cannot tell if there are any pitfalls. My only concern in such a setup would be if drbd silently corrupts the data at the lower level and zfs is not aware of that.

I think such a silent corruption would be better caught by ZFS when it happens at the lower layer (ie: in the zpool, at VDEV level) rather than at the upper layers (ie: DRBD on a ZVOL). This is the strongest argument for why, on the ZFS list, I was advised to use DRBD on the raw devices + ZFS on the higher layer. From what I read on this list, however, basically no one is running ZFS over DRBD over raw disks, so I am somewhat worried about potential hidden pitfalls.

> Also, if you are *not* going to use live migration, and you can afford losing some seconds of data on the secondary node in favor of better performance on the primary node, then you could consider using protocol A instead of C for the replication link.

Sure. On another installation, I am using protocol B with great success.

Thanks.
Re: [DRBD-user] DRBD over ZFS - or the other way around?
Il 18-08-2017 14:40 Veit Wahlich ha scritto:
> To clarify: Am Freitag, den 18.08.2017, 14:34 +0200 schrieb Veit Wahlich:
>> hosts simultaneously, enables VM live migration and your hosts may even
> VM live migration requires a primary/primary configuration of the DRBD resource accessed by the VM, but only during migration. The resource can be reconfigured to allow-two-primaries, and this setting reverted afterwards, on the fly.

Hi Veit, this is interesting. So you suggest using DRBD on top of ZVOLs?

Thanks.
Re: [DRBD-user] DRBD over ZFS - or the other way around?
Il 18-08-2017 12:58 Julien Escario ha scritto:
> If you design with a single big resource, a simple split brain and you're screwed.
> Julien

Hi, I plan to use a primary/secondary setup with manual failover. In other words, split brain should not be possible at all.

Thanks.
[DRBD-user] DRBD over ZFS - or the other way around?
Hi list, I am discussing how to have a replicated ZFS setup on the ZoL mailing list, and DRBD is obviously on the radar ;)

It seems that three possibilities exist:
a) DRBD over ZVOLs (with one DRBD resource per ZVOL);
b) ZFS over DRBD over the raw disks (with one DRBD resource per disk);
c) ZFS over DRBD over a single huge and sparse ZVOL (see for an example: http://v-optimal.nl/index.php/2016/02/04/ha-zfs/)

Which option do you feel is the better one? On the ZoL list there seems to be a preference for option b - create a DRBD resource for each disk and let ZFS manage the DRBD devices. Any thoughts on that?

Thanks.
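For illustration, option (a) maps naturally to one small resource file per ZVOL. A DRBD 8.4-style sketch; the hostnames (node-a, node-b), addresses, and port are hypothetical:

```
# /etc/drbd.d/vol1.res -- one resource per ZVOL (option a), sketch only
resource vol1 {
    protocol  C;
    device    /dev/drbd0;
    disk      /dev/zvol/tank/vol1;   # the ZVOL backing this resource
    meta-disk internal;
    on node-a { address 192.168.100.1:7789; }
    on node-b { address 192.168.100.2:7789; }
}
```

Option (b) would instead have one such file per raw disk, with the zpool built on top of the resulting /dev/drbdX devices.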
Re: [DRBD-user] Dual primary and LVM
Il 07-08-2017 12:59 Lars Ellenberg ha scritto:
> DRBD does not at all interact with the layers above it, so it does not know, and does not care, which entities may or may not have cached data they read earlier. Any entities that need cache coherency across multiple instances need to coordinate in some way. But that is not DRBD-specific at all, and not even specific to clustering or multi-node setups. This means that if you intend to use something that is NOT cluster-aware (or multi-instance-aware) itself, you may need to add your own band-aid locking and flushing "somewhere".
>
> I remember that "in the old days", kernel buffer pages could linger for quite some time, even if the corresponding device was no longer open, which caused problems with migrating VMs even with something like a shared SCSI device. Integration scripts added explicit calls to sync and blockdev --flushbufs and the like... The kernel then learned to invalidate cache pages on last close, so these hacks are no longer necessary (as long as no one keeps the device open when not actively used). The other alternative is to always use "direct IO".
>
> You can (destructively!) experiment with dual-primary DRBD. Make both nodes primary; on node A:
>
> watch -n1 "dd if=/dev/drbd0 bs=4096 count=1 | strings"
> watch -n1 "dd if=/dev/drbd0 bs=4096 count=1 iflag=direct | strings"
>
> On node B:
>
> while sleep 0.1; do date +%F_%T.%N | dd of=/dev/drbd0 bs=4096 conv=sync oflag=direct; done
>
> (conv=sync pads with NUL to a full bs; oflag=direct makes sure it finds its way to DRBD and not just into buffer cache pages.) You should see both "watch" thingies show the date changes written on the other node. If you then do on node A: sleep 10 < /dev/drbd0, the non-direct watch should show the same date for ten seconds, because it gets its data from the buffer cache, and the device is kept open by the sleep.
>
> Once the "open count" of the device drops to zero again, the kernel will invalidate the pages, and the next read will need to re-read from disk (just as the "direct" read always does). You can then do "sleep 5 < /dev/drbd0; sleep 10 < /dev/drbd0 & wait", and see the non-direct watch update the date just once after 5 seconds, and then again once the sleep 10 has finished...
>
> Again, this does not really have anything to do with DRBD, but with how the kernel treats block devices, and whether and how entities coordinate alternating and concurrent access to "things". You can easily have two entities on the same node corrupt a boring plain-text file on a classic filesystem on just a single node, if they both assume "exclusive access" and don't coordinate properly.

Great explanation Lars, thank you very much!
Re: [DRBD-user] Dual primary and LVM
Il 27-07-2017 13:09 Igor Cicimov ha scritto:
> And in case of live migration I'm sure the tool you decide to use will freeze the guest and make a sync() call to flush the OS cache *before* stopping and starting the guest on the other node.

Yeah, I was busy architecting a valid example [1] until I realized that libvirt/KVM surely issues a sync() before live-migrating the guest :p

Thank you Igor.

[1] For reference, here is an example showing how the kernel buffers can sometimes act as a writeback cache. Consider the following command and output:

[root@localhost ~]# dd if=/dev/zero of=/dev/sdb bs=1M count=64
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 0.981192 s, 68.4 MB/s

As you can see, write bandwidth is in line with the underlying disk's real performance. This means that the close() call (issued by dd just before exiting) flushes the buffers. Now, let's concurrently open the block device in *read-only* mode and repeat the write:

[root@localhost ~]# exec 3< /dev/sdb
[root@localhost ~]# dd if=/dev/zero of=/dev/sdb bs=1M count=64

This time, the write bandwidth reported by dd is about 1.5 GB/s, which is clearly far more than the disk's real write speed. This means that the close() call returned *before* the buffers were flushed.

I don't fully understand if, and how, this correlates in real life with an open LVM device on top of a DRBD resource in a dual-primary setup. Maybe it opens a small window of opportunity for an application that writes to a raw block device, without issuing proper sync/fsync calls, to see stale data in the event of a node migration in a dual-primary setup (with no power loss or hardware failure happening). However, this is a very contrived scenario. Moreover, I am not sure how DRBD interacts with the kernel's buffers. Time for some more tests, it seems... ;)
Re: [DRBD-user] Dual primary and LVM
Il 27-07-2017 10:23 Igor Cicimov ha scritto:
> When in cluster mode LVM will not use the local cache; that's part of the configuration you need to do during setup.

Hi Igor, I am not referring to LVM's metadata cache. I am speaking about the kernel I/O buffers (ie: the ones you can see under the "buffers" column of "free -m") which, in some cases, work similarly to a "real" pagecache.

Thanks.
Re: [DRBD-user] Dual primary and LVM
Il 27-07-2017 09:38 Gionatan Danti ha scritto:
> Thanks for your input. I also read your excellent suggestions at the link Igor posted.

To clarify: the main reason I am asking about the feasibility of a dual-primary DRBD setup with LVs on top of it is cache coherency. Let me take a step back: the explanation given for denying even read access on a secondary node is broken cache coherency/consistency: if the read/write node writes something the secondary node had previously read, the latter will not recognize the changes made by the first node. The canonical solution to this problem is a dual-primary setup with a clustered filesystem (eg: GFS2), which not only arbitrates write access, but also maintains read cache consistency.

Now, let's remove the clustered filesystem layer, leaving only "naked" LVs. How is read cache coherency maintained in this case? As no filesystem is layered on top of the raw LVs, there is no real pagecache at work, but the kernel's buffers remain - and they need to be kept coherent. How does DRBD achieve this? Does it update the receiving kernel's I/O buffers each time the other node writes something?

Thanks.
Re: [DRBD-user] Dual primary and LVM
Il 27-07-2017 03:59 Digimer ha scritto:
> To clarify; the clustered locking in clvmd doesn't do anything to restrict access to the storage. Its job is only to inform the other nodes that something changed on the underlying shared storage when it makes a change.

Hi Digimer, yes, I am fully aware of that.

> To make sure a VM runs on only one node at a time, you must use pacemaker to handle that (and stonith to prevent split-brains, but I am sure you know that already). If you use clvmd on top of DRBD, make your life simple and don't use it below DRBD at the same time. It's technically possible but it is needless complexity. You seem to already be planning this, but just to make it clear. Also note; if you don't use clvmd, then you will probably need to do a pv/vg/lvscan to pick up changes. This is quite risky though, and certainly not recommended if you run DRBD in dual-primary.

Thanks for your input. I also read your excellent suggestions at the link Igor posted.
Re: [DRBD-user] Dual primary and LVM
On 27-07-2017 03:52 Igor Cicimov wrote: I would recommend going through this lengthy post http://lists.linbit.com/pipermail/drbd-user/2011-January/015236.html [1] covering all the pros and cons of several possible scenarios.

Hi Igor, thanks for the link, very interesting thread!

The easiest scenario for dual-primary DRBD would be a DRBD device per VM, so something like this:

RAID -> PV -> VG -> LV -> DRBD -> VM

where you don't even need LVM locking (since that layer is not even exposed to the user), and it is great for dual-primary KVM clusters. You get live migration and also keep the resizing functionality, since you can grow the underlying LV and then the DRBD itself to increase the VM disk size, let's say. The VM needs to be started on one node only, of course, so you (or your software) need to make sure this is always the case. One huge drawback of this approach, though, is the large number of DRBD devices to maintain in the case of hundreds of VMs! Although since you have already committed to a different approach, this might not be possible at this point. Note: in this case you don't even need dual-primary, since the DRBD for each VM can be independently promoted to primary on any node at any time. In the case of a single DRBD it is all-or-nothing, so there is no possibility of migrating individual VMs.

Yeah, I considered such a solution in the past. Its strong appeal lies in being able to live-migrate without going for a dual-primary setup. However, I would like to avoid creating/deleting DRBD devices for each added/removed virtual machine. One possible solution is to use ganeti, which automates resource and VM creation, right?

Now, adding LVs on top of DRBD is a bit more complicated. I guess your current setup is something like this?

RAID -> DRBD -> PV -> VG -> LV1 -> VM1
                            LV2 -> VM2
                            LV3 -> VM3
                            ...

In this case, when DRBD is in dual-primary, the DLM/cLVM setup is imperative so that LVM knows who has write access.
But then on VM migration "something" needs to shut down the VM on Node1 to release the LVM lock and start the VM on Node2. Same as above: as long as each VM is running on *only one* node you should be fine; the moment you start it on both, you will probably corrupt your VM. Software like Proxmox should be able to help you on both points. Which brings me to an important question I should have asked at the very beginning, actually: what do you use to manage the cluster, if anything?

Pacemaker surely is a possibility, but for manual, non-automated live migration I should be fine with libvirt's integrated locking: https://libvirt.org/locking.html. It avoids (at the application level) the concurrent starting/running of any virtual machine.

Thank you very much, Igor.
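[For what it is worth, the libvirt locking mentioned above (virtlockd) is enabled through a small qemu.conf change. A sketch, assuming the lockd plugin shipped with libvirt; see the linked page for the lockspace options:

```
# /etc/libvirt/qemu.conf (fragment)
lock_manager = "lockd"   # acquire a lease on each disk before the VM may start;
                         # a second start of the same VM on another host fails
                         # instead of corrupting the shared disk
```

The virtlockd daemon must also be running on both hosts (e.g. `systemctl enable --now virtlockd`). By default the lease is taken directly against the disk path; for LV-backed disks, the lockspace configuration in qemu-lockd.conf determines how leases are shared between hosts, so that part deserves careful reading before relying on it.]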
[DRBD-user] Dual primary and LVM
Hi all, I have a possibly naive question about a dual-primary setup involving LVM devices on top of DRBD. The main question is: using cLVM or native LVM locking, can I safely use an LV block device on the first node, *close it*, and reopen it on the second one? No filesystem is involved, and no host is expected to concurrently use the same LV.

Scenario: two CentOS 7 + DRBD 8.4 nodes with LVs on top of DRBD on top of a physical RAID array. Basically, DRBD replicates anything written to the specific hardware array.

Goal: a redundant virtual machine setup, where VMs can be live-migrated between the two hosts.

Current setup: I currently run a single-primary, dual-node setup, where the second host has no access at all to any LV. This setup has worked very well over the past years, but it forbids live migration (the secondary host has no access to the LV-based vdisks attached to the VMs, so it is impossible to live-migrate the running VMs).

I thought to use a dual-primary setup to have the LVs available on *both* nodes, using a lock manager to arbitrate access to them. How do you see such a solution? Is it workable? Or would you recommend using a clustered filesystem on top of the dual-primary DRBD device?

Thanks.
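[For illustration, the close-and-reopen handoff asked about above could look like the following pseudocode-style sketch; the VG/LV names are placeholders, and the lock-manager and fencing coordination, which is the hard part, is deliberately elided:

```
# on node1: stop the consumer of the LV, then deactivate it
lvchange -an vg0/lv_data    # closes the LV and releases it locally
                            # (with cLVM, -aey activates exclusively cluster-wide)

# on node2: activate the LV and start using it
lvchange -ay vg0/lv_data
```

The safety of this sequence rests entirely on the "deactivate before activate" ordering being enforced by something (a cluster manager or lock manager), not on DRBD itself.]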
[DRBD-user] DRBD Dual Primary (writable/writeble) setup over VDSL WAN links
Hi list, I have a question about the feasibility of a dual-primary setup over two VDSL WAN links. I already searched the list and found similar questions, but without a definitive conclusion. My goal is to have a shared fileserver across two distant offices, using DRBD to create the illusion of a single block device. I plan to use a GFS2 (or OCFS2) filesystem, exporting it to users (on both sides) via Samba shares.

Please note that:
1) both servers are CentOS 6.x x86_64 machines, with DRBD packages provided by ELRepo (DRBD 8.4.x)
2) the main and remote offices are connected with relatively fast (~100 Mb/s) but high-latency (+30 ms RTT) VDSL links
3) write speed should be as high as possible

My current understanding (correct me if wrong) is that:
1) to not impair write speed, I should use protocol A
2) GFS2 (or any cluster filesystem) is quite sensitive to latency, so some operations (eg: metadata mangling, locking, reading a file already opened remotely) will be slowed down anyway
3) all will work (more or less) fine unless the WAN/VPN goes down

It is the "WAN failed" scenario that puzzles me. Let me ask some questions:
1) at the block/device level, will both hosts (being both primary nodes) continue to accept writes? If so, they will end up in a "split brain" scenario, right?
2) will the higher-level filesystem (eg: GFS2) block until the remote connection is re-established?
3) what will happen at connection recovery?

I understand that I can try this myself (and I am going to, indeed); however, I can only test on a fast LAN, not on a (relatively) slow WAN link. Thank you all.
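[As a point of reference on understanding (1) above, the replication protocol is a one-line setting in the resource's net section. A sketch, with the resource name as a placeholder:

```
resource wanshare {
    net {
        protocol A;   # asynchronous: a local write completes once it reaches
                      # the local disk and the TCP send buffer, hiding the
                      # +30 ms RTT from write latency, but the peer may lag
        # NOTE: dual-primary (allow-two-primaries) requires synchronous
        # protocol C, so protocol A conflicts with the GFS2 plan as stated
    }
}
```

In other words, the two goals listed above, dual-primary for a cluster filesystem and protocol A for write speed, cannot both hold at once, which foreshadows the answer below.]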
Re: [DRBD-user] DRBD Dual Primary (writable/writeble) setup over VDSL WAN links
On 23/10/15 08:23, Digimer wrote: No, it is not feasible. Your cluster will block every time the network faults, and it will stay blocked because the only path to fence the peer will go down with the WAN. This is to say nothing of the performance issues.

I understand. Thank you very much.