Re: [DRBD-user] The Problem of File System Corruption w/DRBD

2021-06-06 Thread Gionatan Danti

Il 2021-06-04 15:08 Eric Robinson ha scritto:

Those are all good points. Since the three legs of the information
security triad are confidentiality, integrity, and availability, this
is ultimately a security issue. We all know that information security
is not about eliminating all possible risks, as that is an
unattainable goal. It is about mitigating risks to acceptable levels.
So I guess it boils down to how each person evaluates the risks in
their own environment. Over my 38-year career, and especially the past
15 years of using Linux HA, I've seen more filesystem-type issues than
the other possible issues you mentioned, so that one tends to feature
more prominently on my risk radar.


For the very limited goal of protecting against filesystem corruption, you 
can use a snapshot/CoW layer such as thin LVM. Keep multiple rolling snapshots 
and you can recover from a sudden filesystem corruption. However, this 
simply moves the SPOF down to the CoW layer (thin LVM, which is quite 
complex by itself and can be considered a stripped-down 
filesystem/allocator) or up to the application layer (where corruption 
is relatively common).
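
As a rough sketch of what I mean with thin LVM rolling snapshots (volume 
group names, LV names and sizes below are just placeholders):

# create a thin pool and a thin volume to hold the filesystem
lvcreate --type thin-pool -L 500G -n pool vg0
lvcreate -V 400G --thinpool vg0/pool -n data

# take a rolling snapshot (eg: from cron), keeping a few of them around
lvcreate -s vg0/data -n data_snap_$(date +%Y%m%d)

# to recover, activate a snapshot and mount it read-only
lvchange -ay -K vg0/data_snap_20210606
mount -o ro /dev/vg0/data_snap_20210606 /mnt/recover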


That said, nowadays a mature filesystem such as EXT4 or XFS can be corrupted 
(barring obscure bugs) only by:

- a double mount from different machines;
- a direct write to the underlying raw disks;
- a serious hardware issue.

For what it is worth, I am now accustomed to ZFS's strong data integrity 
guarantees, but I fully realize that this does *not* protect against every 
corruption scenario by itself, not even in 
XFS-over-ZVOL-over-DRBD-over-ZFS setups. If anything, a more complex 
filesystem (and I/O stack) has a *greater* chance of exposing uncommon 
bugs.


So: I strongly advise placing your filesystem on top of a snapshot layer, 
but do not expect this to shield you from every storage-related issue.

Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD Synchronization Rate Questions

2020-01-10 Thread Gionatan Danti

Il 23-12-2019 19:48 cpt...@yahoo.co.jp ha scritto:

1. Can DRBD only provide up to 80 MB/s of network bandwidth per volume 
during the initial full synchronization?
We did an initial full synchronization in DRBD with the following 
settings:

(I use a network with a speed of 1.25 GB/s.)


Some years ago, I had similar problems with 10GBase-T Ethernet until I 
enabled jumbo frames. After that, I got over 600 MB/s for single-volume 
synchronization.


So, try enabling jumbo frames on both servers and on any switch used 
between them.
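
For reference, this is roughly what it boils down to on Linux (the 
interface name is only an example, and the switch must allow jumbo 
frames as well):

# raise the MTU on the replication interface of both nodes
ip link set dev eth1 mtu 9000

# verify that 9000-byte frames really pass end to end
# (8972 = 9000 - 20 bytes IP header - 8 bytes ICMP header)
ping -M do -s 8972 <peer-replication-ip>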


Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] ERROR: meta parameter misconfigured, expected clone-max -le 2, but found unset

2017-10-03 Thread Gionatan Danti
On 02/10/2017 15:35, Lars Ellenberg wrote:

Usually a result of having (temporarily?) only a "primitive", without
corresponding "ms" resource definition in the cib.

Once you fixed the config, you should no longer get it,
and be able to clear previous fail-counts by doing a "resource cleanup".



So it seems the OCF script found in
/usr/lib/ocf/resource.d/linbit/drbd does not find the metadata it expects in the
resource definition. However, the metadata *are* specified in the resource file.

Any suggestions on how to fix the problem?


Don't put a "primitive" DRBD definition live
without the corresponding "ms" definition.
If you need to, populate a "shadow" cib first,
and only commit that to "live" once it is fully populated.


Hi Lars,
thank you for pointing me in the right direction! As I am using "pcs" 
rather than "crm" to configure/manage the cluster, I had some 
difficulty following the examples in the DRBD User Guide.


Creating the resource while immediately specifying the master/slave 
parameters did indeed work:


pcs resource create drbd_vol1 ocf:linbit:drbd drbd_resource=vol1 
ignore_missing_notifications=true op monitor interval=5s timeout=30s 
role="Slave" monitor interval=15s timeout=30s role="Master" master 
master-max=1 master-node-max=1
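
For completeness, the "shadow cib" route Lars hinted at can also be 
followed with pcs by building the configuration in an offline CIB file 
and pushing it in one step. A rough sketch (file and resource names are 
just placeholders):

pcs cluster cib drbd_cfg
pcs -f drbd_cfg resource create drbd_vol1 ocf:linbit:drbd \
drbd_resource=vol1 op monitor interval=15s
pcs -f drbd_cfg resource master drbd_vol1_ms drbd_vol1 \
master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
pcs cluster cib-push drbd_cfg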


Thanks again.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD over ZFS - or the other way around?

2017-09-16 Thread Gionatan Danti

Il 15-09-2017 17:12 David Bruzos ha scritto:

Hi Danti,
The behavior you are experiencing is normal.  As I pointed out
previously, I've been using this setup for many years and I've seen
the same thing.  You will encounter that when the filesystem is
written on the DRBD device without the use of a partition table.
As a side note, I've had some nasty stability issues with the DRBD
version in the kernel (4.4/4.9 kernels) when running on ZFS, but DRBD
8.4.10 and ZFS 0.6.5.11 seem to be running great.  I also run the
storage as part of dom0, which many admins don't recommend, but
generally it works alright.  The stability issues were typically rare
and happened under high I/O loads and were DRBD related deadlock type
crashes.  Again, those problems seem to be resolved in the latest DRBD 
8 RELEASE.

David


Hi David, thanks again for taking the time to report your findings.
I plan to use CentOS, which has no built-in DRBD support, so I will use 
ELRepo's DRBD packages + the official ZFS 0.7.x repository.


Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD over ZFS - or the other way around?

2017-09-06 Thread Gionatan Danti

Il 06-09-2017 16:22 David Bruzos ha scritto:

I've used DRBD devices on top of ZFS zvols for years now and have been
very satisfied with the performance and possibilities that that
configuration allows for.  I use DRBD 8.x on ZFS latest mainly on Xen
hypervisors running a mix of Linux and Windows VMs with both SSD and
mechanical drives.  I've also done similar things in the past with
DRBD and LVM.  The DRBD on ZFS combination is the most flexible and
elegant.  You can use snapshotting and streams to do data migrations
across the Internet with minimal down time while getting storage level
redundancy and integrity from ZFS and realtime replication from DRBD.
A few scripts can automate the creation/removal of devices and
coordinate VM migrations and things will just work.  Also, you can
then use ZFS streams for offsite backups (if you need that kind of
thing).
Another thing is that you may not need the realtime replication for
some workloads, so in those cases you can just run directly on ZFS and
omit the DRBD device.  At least for me, that great flexibility is what
makes running my own configuration worth it.

Just my 25 cents!

David


Hi David,
thank you for your input. It was greatly appreciated.

Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD over ZFS - or the other way around?

2017-09-06 Thread Gionatan Danti

Il 06-09-2017 16:03 Yannis Milios ha scritto:

...I mean by cloning it first, since snapshot does not appear as
blockdev to the system but the clone does.


Hi, this is incorrect: ZVOL snapshots surely can appear as regular block 
devices. You simply need to set the "snapdev=visible" property.
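
A minimal sketch, with placeholder pool/volume names:

zfs set snapdev=visible tank/vol1
ls -l /dev/zvol/tank/vol1@snap1   # the snapshot now shows up as a block device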


Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD over ZFS - or the other way around?

2017-09-06 Thread Gionatan Danti

On 06/09/2017 15:31, Yannis Milios wrote:
If your topology is like the following:  HDD -> ZFS (ZVOL) -> DRBD -> 
XFS then I believe it should make sense to always mount at the DRBD 
level and not at the ZVOL level which happens to be the underlying 
blockdev for DRBD.
Sure! Directly mounting the DRBD-backing ZVOL would, at the bare 
minimum, ruin the replication with the peer.


I was speaking about mounting ZVOL *snapshots* to access previous data 
versions.


Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD over ZFS - or the other way around?

2017-09-06 Thread Gionatan Danti

Hi,

On 06/09/2017 13:28, Jan Schermer wrote:

Not sure you can mount snapshot (I always create a clone).


the only difference is that snapshots are read-only, while clones are 
read-write. This is why I used the "-o ro,norecovery" option while 
mounting XFS.



However I never saw anything about “drbd” filesystem - what distribution is 
this? Apparently it tries to be too clever…


It is a CentOS 7.3 x86_64 box. Actually, I *really* like what the mount 
command is doing: by checking the end of the device and discovering the DRBD 
metadata, it prevents accidental double mounts of the main (DRBD-backing) 
block device.
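
As a side note, the signature that mount is picking up can be inspected 
directly; a quick sketch (both tools are part of util-linux on CentOS 7):

blkid -p /dev/zvol/tank/vol1@snap1   # low-level probe; should report the DRBD signature
wipefs /dev/zvol/tank/vol1@snap1     # with no options, it only lists the signatures it finds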


I was only wondering if it is something that only happens to me, or if it 
is "normal" to have to specify the filesystem type when mounting snapshot 
volumes with DRBD.


Regards.


--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD over ZFS - or the other way around?

2017-09-06 Thread Gionatan Danti

On 19/08/2017 10:24, Yannis Milios wrote:
Option (b) seems more suitable for a 2-node drbd8 cluster in a 
primary/secondary setup. I haven't tried it, so I cannot tell if there are 
any pitfalls. My only concern in such a setup would be if drbd silently 
corrupts the data at the lower level and zfs is not aware of that. Also, 
if you are *not* going to use live migration, and you can afford losing 
some seconds of data on the secondary node in favor of better 
performance on the primary node, then you could consider using protocol 
A instead of C for the replication link.


Hi all,
I "revive" this old thread to let you know I settled to use DRBD 8.4 on 
top of ZVOLs.


I have a question for anyone using DRBD on top of a snapshot-capable 
backend (eg: ZFS, LVM, etc)...


When snapshotting a DRBD block device, trying to mount it (the snapshot, 
not the original volume!) results in the following error message:


[root@master7 tank]# mount /dev/zvol/tank/vol1\@snap1 /mnt/
mount: unknown filesystem type 'drbd'

To successfully mount the snapshot volume, I need to specify the volume 
filesystem, for example (the other options are xfs-specific):


[root@master7 tank]# mount -t xfs /dev/zvol/tank/vol1\@snap1 /mnt/ -o 
ro,norecovery,nouuid


Is that the right approach? Or am I missing something?
Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD Dual Primary + GFS2 for redundant KVM hosts

2017-08-28 Thread Gionatan Danti

On 28/08/2017 14:13, Yannis Milios wrote:

I use lvm-based block devices on critical installations, actually.
However, backing up block devices is a hassle compared to regular
files: you basically need to use ddrescue or, even worse, plain dd.
Especially with the latter, you need to be *extremely* careful, as its
"alias name" (data-destroyer) exists for a reason.


I use proxmox which has implemented vzdump for backing up raw devices. 
No issues so far and no need to use dd.


https://pve.proxmox.com/wiki/VZDump

In addition to that, ZFS snapshots can be used as a quick point in time 
backup.


Yannis


Fair enough. I'll surely try it.
Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD Dual Primary + GFS2 for redundant KVM hosts

2017-08-26 Thread Gionatan Danti

Il 26-08-2017 14:28 Yannis Milios ha scritto:

Ah I see, sorry I wasn't aware of that. Is there a particular reason
for using file-based VMs?


Simply for ease of management :)


since generally speaking the performance on
raw devices is much better and in DRBD9 you can leverage thin lvm
or zfs based snapshots.


I use lvm-based block devices on critical installations, actually. 
However, backing up block devices is a hassle compared to regular 
files: you basically need to use ddrescue or, even worse, plain dd. 
Especially with the latter, you need to be *extremely* careful, as its 
"alias name" (data-destroyer) exists for a reason.


Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD Dual Primary + GFS2 for redundant KVM hosts

2017-08-26 Thread Gionatan Danti

Il 26-08-2017 13:56 Yannis Milios ha scritto:

Have you considered a HA NFS over a 2-node DRBD8 cluster ? Should work
well on most hypervisors (qcow2,raw,vmdk based).

Yannis


Hi Yannis, yes, I considered that.
However, as this would be a converged setup (ie: virtual machines run on 
the same node exporting the storage), this would require a loopback NFS 
mount. From what I know, this is (was?) discouraged due to possible 
livelock on the NFS kernel process.


Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD Dual Primary + GFS2 for redundant KVM hosts

2017-08-25 Thread Gionatan Danti

Il 25-08-2017 22:01 Digimer ha scritto:


The overhead of clustered locking is likely such that your VM
performance would not be good, I think.


Mmm... I need to do some more testing with fio, it seems ;)


With raw clustered LVs backing the servers, you don't need cluster
locking on a per-IO basis, only on LV create/change/delete. Because LVM
is sitting on top of DRBD (in dual-primary), live-migration is no
trouble at all and performance is good, too.


True.


GFS2, being a cluster FS, will work fine if a node is lost, provided it
is fenced successfully. It wouldn't be much of a cluster FS otherwise. :)


So no problem with quorum? The loss of a node in a two-node cluster 
seems to wreak havoc on other cluster filesystems (Gluster, for 
example...)


Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD Dual Primary + GFS2 for redundant KVM hosts

2017-08-25 Thread Gionatan Danti

Il 25-08-2017 21:46 Remolina, Diego J ha scritto:

Danti,

Have you considered using something other than drbd for VM disk
storage? Glusterfs for example is really good for that. You may want
to give it a try. You should look at enabling sharding from the get-go
to speed up heals when a node goes down.

I would not use Glusterfs for a file server as its performance is
abysmal when dealing with lots of small files. I would definitely use
DRBD for a file server as it is great in handling lots of small files
whereas Glusterfs is horrible. But for VMs, I think gluster offers
many niceties, easy to setup, good performance, no need to deal with
GFS2, etc.

HTH,

Diego


Hi Diego,
sure, I have an ongoing discussion on the Gluster mailing list about 
how to do that.


However, having been served so well by DRBD in the last 4 years, I am 
somewhat reluctant to abandon it for another, probably not so 
battle-tested, technology.


In particular, it seems that to be useful (ie: stable enough) for VM 
disk storage, Gluster needs a 3-way cluster and sharding enabled (which I 
would like to avoid). In contrast, DRBD + GFS2 (or DRBD + LVM) can be 
used in a 2-way cluster without problems.


Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD Dual Primary + GFS2 for redundant KVM hosts

2017-08-25 Thread Gionatan Danti

Il 25-08-2017 14:34 Digimer ha scritto:


Our Anvil! project (https://www.alteeve.com/w/Build_an_m2_Anvil!) is
basically this, except we put the VMs on clustered LVs and use gfs2 to
store install media and the server XML files.

I would NOT put the image files on gfs2, as the distributed locking
overhead would hurt performance a fair bit.


Hi, I contemplated using an active/active lvm-based configuration, but I 
would really like to use files as per-VM disks.
So, do you feel GFS2 would be inadequate for VM disk storage? What if a 
node crashes/reboots? Will GFS2 continue to work?


Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] DRBD Dual Primary + GFS2 for redundant KVM hosts

2017-08-25 Thread Gionatan Danti

Hi list,
in my endless search for information :) I was testing DRBD in dual 
primary mode + GFS2. The goal is "the same": running a replicated, 
2-node KVM setup where disk images are located on the GFS2 filesystem.


I know how to integrate DRBD with corosync/pacemaker, but I wonder how 
the system will perform under load. More specifically, it is my 
understanding that GFS2 inevitably has some significant overhead compared 
to a traditional, single-node filesystem, and this can lead to decreased 
performance.


This should be especially true when restarting (or live-migrating) a 
virtual machine on the other host: as the first node has cached a 
significant portion of the VM disk, GFS2 will, on every first read on the 
new host, fire its coherency protocol to be sure to update the first 
node's in-memory cached data. The overhead should be lowered by not using 
the cache at all (ie: cache=none, which implies O_DIRECT), but this will 
also cause degraded performance in the common case (ie: when all is 
working correctly and VMs run on the first node only).
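
For clarity, cache=none is simply the per-disk cache mode handed down to 
QEMU; a hedged example with virt-install (paths and names are made up), 
with the same effect obtainable by setting cache='none' in the disk 
driver element of an existing domain XML:

virt-install --name vm1 --memory 4096 --vcpus 2 --import \
--disk path=/gfs2/images/vm1.raw,format=raw,bus=virtio,cache=none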


So I ask: do any of you have direct experience with a similar setup? How do 
you feel about it? Any other suggestions?

Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD over ZFS - or the other way around?

2017-08-18 Thread Gionatan Danti

Il 18-08-2017 17:22 Veit Wahlich ha scritto:


Yes, I regard qemu -> DRBD -> volume management [-> RAID] -> disk the
most recommendable solution for this scenario.

I personally go with LVM thinp for volume management, but ZVOLs should
do the trick, too.

With named resources (named after VMs) and multiple volumes per
resource (for multiple VM disks), this works very well for us for
hundreds of VMs.

Having a cluster-wide unified system for numbering VMs is very
advisable, as it allows you to calculate the ports and minor numbers for
both DRBD and KVM/qemu configuration.

Example:
* numbering VMs from 0 to 999 as <VVV>, padded with leading zeros
* numbering volumes from 0 to 99 as <NN>, padded with leading zeros
* DRBD resource port: 10<VVV>
* VNC/SPICE unencrypted port: 11<VVV>
* SPICE TLS port: 12<VVV>
* DRBD minor: <VVV><NN>

Let's say your VM gets number 123, it has 3 virtual disks and uses VNC:
* DRBD resource port: 10123
* VNC port: 11123
* DRBD minor of volume/VM disk 0: 12300
* DRBD minor of volume/VM disk 1: 12301
* DRBD minor of volume/VM disk 2: 12302

Best regards,
// Veit


Hi Veit, excellent advice!
Thank you.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD over ZFS - or the other way around?

2017-08-18 Thread Gionatan Danti

Il 18-08-2017 17:09 Yannis Milios ha scritto:

Personally I'm using option (a) on a 3-node proxmox cluster and drbd9.
Replica count per VM is 2 and all 3 nodes act as both drbd control
volume and satellite nodes. I can live migrate VMs between all nodes
and snapshot them by using the drbdmanage utility (which is using zfs
snapshot+clones).


Hi Yannis, thank you for describing your setup!


Option (b) seems more suitable for a 2-node drbd8 cluster in a
primary/secondary setup. I haven't tried it, so I cannot tell if there
are any pitfalls. My only concern in such a setup would be if drbd
silently corrupts the data at the lower level and zfs is not aware of
that.


I think that such a silent corruption will be better caught by ZFS when 
it happens at the lower layer (ie: the zpool, at the VDEV level) rather than 
when it happens at the upper layers (ie: DRBD on a ZVOL). This is the strongest 
argument for why, on the ZFS list, I was advised to use DRBD on the RAW devices + 
ZFS at the higher layer.


From what I read on this list, however, basically no-one is using ZFS 
over DRBD over RAW disks, so I am somewhat worried about some potential, 
hidden pitfalls.



Also, if you are *not* going to use live migration, and you can
afford losing some seconds of data on the secondary node in favor of
better performance on the primary node, then you could consider using
protocol A instead of C for the replication link.


Sure. On other installation, I am using protocol B with great success.

Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD over ZFS - or the other way around?

2017-08-18 Thread Gionatan Danti

Il 18-08-2017 14:40 Veit Wahlich ha scritto:

To clarify:

On Friday, 2017-08-18 at 14:34 +0200, Veit Wahlich wrote:
hosts simultaneously, enables VM live migration and your hosts may even


VM live migration requires a primary/primary configuration of the DRBD
resource accessed by the VM, but only during migration. The resource
can be reconfigured with allow-two-primaries, and this setting reverted
afterwards on the fly.


Hi Veit, this is interesting.
So you suggest using DRBD on top of ZVOLs?

Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD over ZFS - or the other way around?

2017-08-18 Thread Gionatan Danti

Il 18-08-2017 12:58 Julien Escario ha scritto:

If you design with a single big resource, a simple split brain and
you're screwed.

Julien


Hi, I plan to use a primary/secondary setup, with manual failover.
In other words, split brain should not be possible at all.

Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] DRBD over ZFS - or the other way around?

2017-08-17 Thread Gionatan Danti

Hi list,
I am discussing how to have a replicated ZFS setup on the ZoL mailing 
list, and DRBD is obviously on the radar ;)


It seems that three possibilities exist:

a) DRBD over ZVOLs (with one DRBD resource per ZVOL);
b) ZFS over DRBD over the RAW disks (with one DRBD resource per disk);
c) ZFS over DRBD over a single huge and sparse ZVOL (see this example: 
http://v-optimal.nl/index.php/2016/02/04/ha-zfs/)


Which option do you feel is the better one? On the ZoL list there seems to 
be a preference for option b: create a DRBD resource for each disk 
and let ZFS manage the DRBD devices.


Any thoughts on that?
Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Dual primary and LVM

2017-08-11 Thread Gionatan Danti

Il 07-08-2017 12:59 Lars Ellenberg ha scritto:

DRBD does not at all interact with the layers above it,
so it does not know, and does not care, which entities may or may not
have cached data they read earlier.
Any entities that need cache coherency accross multiple instances
need to coordinate in some way.

But that is not DRBD specific at all,
and not even specific to clustering or multi-node setups.

This means that if you intend to use something that is NOT cluster aware
(or multi-instance aware) itself, you may need to add your own band-aid
locking and flushing "somewhere".

I remember that "in the old days", kernel buffer pages may linger for
quite some time, even if the corresponding devices was no longer open,
which caused problems with migrating VMs even with something as a 
shared

scsi device.  Integration scripts added explicit calls to sync and
blockdev --flushbufs and the like...

The kernel then learned to invalidate cache pages on last close,
so these hacks are no longer necessary (as long as no-one keeps
the device open when not actively used).

The other alternative is to always use "direct IO".

You can (destructively!) experiment with dual primary drbd,
make both nodes primary,

on node A,
watch -n1 "dd if=/dev/drbd0 bs=4096 count=1 | strings"
watch -n1 "dd if=/dev/drbd0 bs=4096 count=1 iflag=direct | strings"

on node B,
while sleep 0.1; do date +%F_%T.%N | dd of=/dev/drbd0 bs=4096
conv=sync oflag=direct; done

conv=sync pads with NUL to full bs,
oflag=direct makes sure it finds its way to DRBD
and not just into buffer cache pages

You should see both "watch" thingies show the date changes written on
the other node.

If you then do on "node A": sleep 10 < /dev/drbd0,
the non-direct watch should show the same date for ten seconds,
because it gets its data from buffer cache, and the device is kept open
by the sleep.

Once the "open count" of the device drops down to zero again, the 
kernel

will invalidate the pages, and the next read will need to re-read from
disk (just as the "direct" read always does).

You can then do
"sleep 10 wait",

and see the non-direct watch update the date just once after 5 seconds,
and then again once the sleep 10 has finished...

Again, this does not really have anything to do with DRBD,
but with how the kernel treats block devices,
and if and how entities coordinate alternating and concurrent access
to "things".

You can easily have two entities on the same node corrupt a boring 
plain

text file on a classic file system on just a single node, if they both
assume "exclusive access", and don't coordinate properly.


Great explanation Lars, thank you very much!

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Dual primary and LVM

2017-07-27 Thread Gionatan Danti

Il 27-07-2017 13:09 Igor Cicimov ha scritto:


​And in case of live migration I'm sure the tool you decide to use
will freeze the guest and make a sync() call to flush the OS cache
*before* stopping and starting the guest on the other node.



Yeah, I was busy architecting a valid example [1] until I realized that 
libvirt/KVM surely issues a sync() before live migrating the guest :p


Thank you Igor.


[1] For reference, here is an example showing how the kernel buffers 
can sometimes act as a writeback cache...


Consider the following command and output:

[root@localhost ~]# dd if=/dev/zero of=/dev/sdb bs=1M count=64
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 0.981192 s, 68.4 MB/s

As you can see, write bandwidth is in line with the underlying disk's real 
performance. This means that the close() call (issued by dd just before 
exiting) flushes the buffers.


Now, let's concurrently open the block device in *read only* mode and 
repeat the same write:

[root@localhost ~]# exec 3</dev/sdb
[root@localhost ~]# dd if=/dev/zero of=/dev/sdb bs=1M count=64

This time, the write bandwidth reported by dd is about 1.5 GB/s, which is 
clearly much more than the disk's real write speed. This means that the 
close() call returned *before* the buffers were flushed.


I don't fully understand if, and how, this correlates in real life with an 
opened LVM device on top of a DRBD resource in a dual-primary setup. 
Maybe it opens a small window of opportunity for an application writing to 
a raw block device, and not issuing proper sync/fsync calls, to see 
stale data in the event of a node migration in a dual-primary setup 
(without any power loss or hardware failure happening).


However, this is a very contrived scenario. Moreover, I am not sure 
how DRBD interacts with the kernel's buffers. Time to do some more tests, it 
seems ... ;)


--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Dual primary and LVM

2017-07-27 Thread Gionatan Danti

Il 27-07-2017 10:23 Igor Cicimov ha scritto:


When in cluster mode, LVM will not use the local cache; that's part of the
configuration you need to do during setup.



Hi Igor, I am not referring to LVM's metadata cache. I am speaking about the 
kernel I/O buffers (ie: the ones you can see from "free -m" under the 
buffers column) which, in some cases, work similarly to a "real" 
pagecache.


Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Dual primary and LVM

2017-07-27 Thread Gionatan Danti

Il 27-07-2017 09:38 Gionatan Danti ha scritto:


Thanks for your input. I also read your excellent suggestions on the link
Igor posted.



To clarify: the main reason I am asking about the feasibility of a 
dual-primary DRBD setup with LVs on top of it is cache coherency. 
Let me take a step back: the usual explanation for denying even read access 
on a secondary node is broken cache coherency/consistency: if the 
read/write node writes something the secondary node had previously read, 
the latter will not recognize the changes done by the first node. The 
canonical solution to this problem is to use a dual-primary setup with a 
clustered filesystem (eg: GFS2) which not only arbitrates write access, 
but also maintains read cache consistency.


Now, let's remove the clustered filesystem layer, leaving "naked" LVs 
only. How is read cache coherency maintained in this case? As no 
filesystem is layered on top of the raw LVs, there is no real pagecache 
at work, but the kernel's buffers remain - and they need to be kept 
coherent. How does DRBD achieve this? Does it update the receiving node's 
kernel I/O buffers each time the other node writes something?


Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Dual primary and LVM

2017-07-27 Thread Gionatan Danti

Il 27-07-2017 03:59 Digimer ha scritto:


To clarify; The clustered locking in clvmd doesn't do anything to
restrict access to the storage, its job is only to inform other nodes
that something is changing on the underlying shared storage when it
makes a change.



Hi Digimer,
yes, I am full aware of that.



To make sure a VM runs on only one node at a time, then you must use
pacemaker to handle that (and stonith to prevent split-brains, but I am
sure you know that already).

If you use clvmd on top of DRBD, make your life simple and don't use it
below DRBD at the same time. It's technically possible but it is
needless complexity. You seem to already be planning this, but just to
make it clear.

Also note; If you don't use clvmd, then you will probably need to do a
pv/vg/lvscan to pick up changes. This is quite risky though and
certainly not recommended if you run DRBD in dual-primary.



Thanks for your input. I also read your excellent suggestions on the link 
Igor posted.


--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Dual primary and LVM

2017-07-27 Thread Gionatan Danti

Il 27-07-2017 03:52 Igor Cicimov ha scritto:


​I would recommend going through this lengthy post
http://lists.linbit.com/pipermail/drbd-user/2011-January/015236.html
[1] covering all pros and cons of several possible scenarios.



Hi Igor, thanks for the link, very interesting thread!


The easiest scenario for dual-primary DRBD would be a DRBD device per
VM, so something like this RAID -> PV -> VG -> LV -> DRBD -> VM, where
you don't even need LVM locking (since that layer is not even exposed
to the user) and is great for dual-primary KVM clusters. You get live
migration and also keep the resizing functionality too since you can
grow the underlying LV and then the DRBD it self to increase the VM
disk size lets say. The VM needs to be started on one node only of
course so you (or your software) need to make sure this is always the
case. One huge drawback of this approach though is the large number of
DRBD device to maintain in case of hundreds of VM's! Although since
you have already committed to different approach this might not be
possible at this point.

Note: In this case though you don't even need dual primary since the
DRBD for each VM can be independently promoted to primary on any node
at any time. In case of single DRBD it is all-or-nothing so no
possibility of migrating individual VMs.


Yeah, I considered such a solution in the past. Its strong appeal is 
being able to live-migrate without going for a dual-primary 
setup. However, I would like to avoid creating/deleting DRBD devices for 
each added/removed virtual machine. One possible solution is to use 
Ganeti, which automates resource and VM creation, right?



Now, adding LVs on top of DRBD is a bit more complicated. I guess your
current setup is something like this?

                            LV1 -> VM1
RAID -> DRBD -> PV -> VG -> LV2 -> VM2
                            LV3 -> VM3
                            ...

In this case when DRBD is in dual-primary the DLM/cLVM setup is
imperative so the LVMs know who has the write access. But then on VM
migration "something" needs to shutdown the VM on Node1 to release the
LVM lock and start the VM on Node2. Same as above, as long as each VM
is running on *only one* node you should be fine, the moment you start
it on both you will probably corrupt your VM. Software like Proxmox
should be able to help you on both points.

Which brings me to an important question I should have asked at the
very beginning actually: what do you use to manage the cluster?? (if
anything)


Pacemaker surely is a possibility, but for manual, non-automated live 
migration, I should be fine with libvirt's integrated locking: 
https://libvirt.org/locking.html. It avoids (at the application level) the 
concurrent starting/running of any virtual machine.
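
For reference, the basic configuration I have in mind is roughly this (a 
sketch for CentOS 7; the shared/indirect lockspace setup needed for 
cross-host protection is left out):

# enable the lockd plugin in /etc/libvirt/qemu.conf
sed -i 's/^#\?lock_manager.*/lock_manager = "lockd"/' /etc/libvirt/qemu.conf

# start the lock daemon and restart libvirt
systemctl enable virtlockd
systemctl start virtlockd
systemctl restart libvirtd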


Thank you very much, Igor.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] Dual primary and LVM

2017-07-26 Thread Gionatan Danti

Hi all,
I have a possibly naive question about a dual primary setup involving 
LVM devices on top of DRBD.


The main question is: using cLVM or native LVM locking, can I safely use 
a LV block device on the first node, *close it*, and reopen it on the 
second one? No filesystem is involved and no host is expected to 
concurrently use the same LV.


Scenario: two CentOS 7 + DRBD 8.4 nodes with LVs on top of DRBD on top 
of a physical RAID array. Basically, DRBD replicates anything written to 
the specific hardware array.


Goal: having a redundant virtual machine setup, where VMs can be 
live-migrated between the two hosts.


Current setup: I currently run a single-primary, two-node setup, where 
the second host has no access at all to any LV. This setup has worked very 
well in the past years, but it forbids using live migration (the 
secondary host has no access to the LV-based vdisks attached to the VMs, 
so it is impossible to live-migrate the running VMs).


I thought of using a dual-primary setup to have the LVs available on 
*both* nodes, using a lock manager to arbitrate access to them.


How do you see such a solution? Is it workable? Or would you recommend 
using a clustered filesystem on top of the dual-primary DRBD device?


Thanks.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] DRBD Dual Primary (writable/writeble) setup over VDSL WAN links

2015-10-23 Thread Gionatan Danti

Hi list,
I have a question about the feasibility of a dual-primary setup over two 
VDSL WAN links. I already searched the list and found similar 
questions, but without reaching a definitive conclusion.


My goal is to have a shared fileserver across two distant offices, using 
DRBD to create the illusion of a single block device. I plan to use a 
GFS2 (or OCFS2) filesystem, exporting it to users (on both sides) via 
Samba shares.


Please note that:
1) both servers are CentOS 6.x x86_64 machines, with DRBD packages 
provided by ELRepo (DRBD 8.4.x)
2) the main and remote offices are connected with relatively fast (~100 
Mb/s) but high-latency (30+ ms RTT) VDSL links

3) write speed should be as high as possible

My current understanding (correct me if wrong) is that:
1) to not impair write speed, I should use protocol A
2) GFS2 (or any cluster filesystem) is quite sensitive to latency, so 
some operations (eg: metadata mangling, locking, reading already 
remote-opened files) will be slowed down anyway

3) all will work (more or less) fine unless the WAN/VPN goes down.

It is the "WAN failed" scenario that puzzle me. Let me do some questions:
1) at the block/device level, will both hosts (being both primary nodes) 
continue to accept writes? If so, will they end in a "split brain" 
scenario, right?
2) will the higher level filesystem (eg: GFS2) will block until the 
remote connection is re-established?

3) what will happens at connection recover?

I understand that I can try this myself (and I am going to do it, indeed); 
however, I can only test on a fast LAN, not on a (relatively) slow WAN link.


Thank you all.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD Dual Primary (writable/writeble) setup over VDSL WAN links

2015-10-23 Thread Gionatan Danti


On 23/10/15 08:23, Digimer wrote:


No, it is not feasible.

Your cluster will block every time the network faults and will stay
blocked because the only path to fence the peer will go down with the WAN.

This is to say nothing about performance issues.



I understand.
Thank you very much.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user