Re: [DRBD-user] DRBD 8.0 life cycle

2021-08-26 Thread Michael Hierweck

Hi Lars,

On 26.08.21 15:03, Lars Ellenberg wrote:


> As we put on our web page at https://linbit.com/solutions-rfq/
> DRBD 8 < 8.4 is end of life.
> DRBD 8.4 left "active maintenance" years ago, but needs to be supported
> while existing customers pay for existing deployments.  The "relevant" platform
> here is RHEL 7, which ends its "Extended Lifecycle Support" in 2026,
> so that's the end date for our 8.4 support as well.


DRBD 8.4 (at least the kernel module) is part of the vanilla Linux kernel. Will DRBD be removed 
from the upstream kernel or replaced with DRBD 9.x? Is there any ongoing work/progress related 
to this?


Regards,

Michael


[DRBD-user] DRBD 9: UpToDate but verify reports resource to be almost completely out of sync, no resynchronisation on disconnect/connect however

2021-05-06 Thread Michael Hierweck

Hi all,

In our three-node, single-master DRBD cluster we experienced the following
situation.



Check state:

root@host101010:/home/admins# drbdadm status vm
vm role:Primary
 disk:UpToDate
 host101011 role:Secondary
   peer-disk:UpToDate
 host101020 role:Secondary
   peer-disk:UpToDate

Everything looks nice.



Let's perform a verification:

root@host101010:/home/admins# drbdadm verify vm

Running:

root@host101010:/home/admins# drbdadm status vm
vm role:Primary
 disk:UpToDate
 host101011 role:Secondary
   replication:VerifyS peer-disk:UpToDate done:0.09
 host101020 role:Secondary
   replication:VerifyS peer-disk:UpToDate done:0.11



The results are remarkable:

[Wed May  5 11:07:55 2021] drbd vm/0 drbd0 host101011: Online verify done (total 7112 sec; paused 0 sec; 29484 K/sec)
[Wed May  5 11:07:55 2021] drbd vm/0 drbd0 host101011: Online verify found 1 4k blocks out of sync!
[Wed May  5 11:07:55 2021] drbd vm/0 drbd0 host101011: repl( VerifyS -> Established )
[Wed May  5 11:07:55 2021] drbd vm/0 drbd0 host101011: helper command: /sbin/drbdadm out-of-sync
[Wed May  5 11:07:55 2021] drbd vm/0 drbd0 host101011: helper command: /sbin/drbdadm out-of-sync exit code 0
[Wed May  5 11:11:05 2021] drbd vm/0 drbd0 host101020: Online verify done (total 7302 sec; paused 0 sec; 28720 K/sec)
[Wed May  5 11:11:05 2021] drbd vm/0 drbd0 host101020: Online verify found 41790818 4k blocks out of sync!
[Wed May  5 11:11:05 2021] drbd vm/0 drbd0 host101020: repl( VerifyS -> Established )
[Wed May  5 11:11:05 2021] drbd vm/0 drbd0 host101020: helper command: /sbin/drbdadm out-of-sync
[Wed May  5 11:11:05 2021] drbd vm/0 drbd0 host101020: helper command: /sbin/drbdadm out-of-sync exit code 0


Verify reports almost the entire 150G device as out of sync.
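
For a cross-check, the out-of-sync counters can also be read directly from the kernel,
independent of the verify log. Assuming the resource name vm as above:

drbdsetup status vm --verbose --statistics

The per-peer statistics printed there include an out-of-sync counter (in KiB).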


Let's perform disconnect/connect:
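(The commands themselves are not shown here; roughly, this corresponds to
drbdadm disconnect vm followed by drbdadm connect vm.)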

[Wed May  5 11:44:30 2021] drbd vm: Preparing cluster-wide state change 1977682193 (0->1 496/16)
[Wed May  5 11:44:30 2021] drbd vm: State change 1977682193: primary_nodes=1, weak_nodes=FFFA
[Wed May  5 11:44:30 2021] drbd vm: Committing cluster-wide state change 1977682193 (0ms)
[Wed May  5 11:44:30 2021] drbd vm1054 host101011: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
[Wed May  5 11:44:30 2021] drbd vm/0 drbd0 host101011: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
[Wed May  5 11:44:30 2021] drbd vm host101011: ack_receiver terminated
[Wed May  5 11:44:30 2021] drbd vm host101011: Terminating ack_recv thread
[Wed May  5 11:44:30 2021] drbd vm host101011: Restarting sender thread
[Wed May  5 11:44:30 2021] drbd vm host101011: Connection closed
[Wed May  5 11:44:30 2021] drbd vm host101011: helper command: /sbin/drbdadm disconnected
[Wed May  5 11:44:30 2021] drbd vm host101011: helper command: /sbin/drbdadm disconnected exit code 0
[Wed May  5 11:44:30 2021] drbd vm host101011: conn( Disconnecting -> StandAlone )
[Wed May  5 11:44:30 2021] drbd vm host101011: Terminating receiver thread
[Wed May  5 11:44:30 2021] drbd vm: Preparing cluster-wide state change 2927240878 (0->2 496/16)
[Wed May  5 11:44:30 2021] drbd vm: State change 2927240878: primary_nodes=1, weak_nodes=FFFE
[Wed May  5 11:44:30 2021] drbd vm host101020: Cluster is now split
[Wed May  5 11:44:30 2021] drbd vm: Committing cluster-wide state change 2927240878 (0ms)
[Wed May  5 11:44:30 2021] drbd vm host101020: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
[Wed May  5 11:44:30 2021] drbd vm/0 drbd0 host101020: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
[Wed May  5 11:44:30 2021] drbd vm host101020: ack_receiver terminated
[Wed May  5 11:44:30 2021] drbd vm host101020: Terminating ack_recv thread
[Wed May  5 11:44:30 2021] drbd vm host101020: Restarting sender thread
[Wed May  5 11:44:30 2021] drbd vm host101020: Connection closed
[Wed May  5 11:44:30 2021] drbd vm host101020: helper command: /sbin/drbdadm disconnected
[Wed May  5 11:44:30 2021] drbd vm host101020: helper command: /sbin/drbdadm disconnected exit code 0
[Wed May  5 11:44:30 2021] drbd vm host101020: conn( Disconnecting -> StandAlone )
[Wed May  5 11:44:30 2021] drbd vm host101020: Terminating receiver thread
[Wed May  5 11:44:33 2021] drbd vm host101011: conn( StandAlone -> Unconnected )
[Wed May  5 11:44:33 2021] drbd vm host101011: Starting receiver thread (from drbd_w_vm [38063])
[Wed May  5 11:44:33 2021] drbd vm host101011: conn( Unconnected -> Connecting )
[Wed May  5 11:44:33 2021] drbd vm host101020: conn( StandAlone -> Unconnected )
[Wed May  5 11:44:33 2021] drbd vm host101020: Starting receiver thread (from drbd_w_vm [38063])
[Wed May  5 11:44:33 2021] drbd vm host101020: conn( 

Re: [DRBD-user] Control syncer speed

2021-03-31 Thread Michael Hierweck

Hi all,

On 31.03.21 11:32, Michael Hierweck wrote:

> Hi all,
>
> with DRBD 9.0.25 we are experiencing problems with the syncer rate.
>
> Using the default settings, our IO system gets overloaded, especially when multiple
> resources are being resynced in parallel.
>
> Neither of the following commands seems to take any effect, even when setting Y to
> very low values such as 1M:
>
> drbdadm peer-device-options --c-plan-ahead=X --c-max-sync-rate=Y
> drbdadm peer-device-options --c-plan-ahead=0 --resync-rate=Y


We were able to slow down the DRBD sync by applying:

drbdadm peer-device-options --c-plan-ahead=0 --resync-rate=100 

(Yes, 100, not 100M.)

Syslog:

Resync done (total 1212 sec; paused 0 sec; 43256 K/sec)

I cannot explain how this setting fits with the result.
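
For reference, the same knobs can also be set persistently in the resource
configuration. This is only a minimal sketch, assuming a resource named vm; note that
the dynamic controller's ceiling option is called c-max-rate:

resource vm {
  disk {
    c-plan-ahead 0;    # disable the dynamic resync controller
    resync-rate 10M;   # fixed resync rate used while the controller is off
  }
}

followed by "drbdadm adjust vm" to apply the change.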

Thanks in advance,

Michael


[DRBD-user] Control syncer speed

2021-03-31 Thread Michael Hierweck

Hi all,

with DRBD 9.0.25 we are experiencing problems with the syncer rate.

Using the default settings, our IO system gets overloaded, especially when multiple
resources are being resynced in parallel.


Neither of the following commands seems to take any effect, even when setting Y to
very low values such as 1M:

drbdadm peer-device-options --c-plan-ahead=X --c-max-sync-rate=Y
drbdadm peer-device-options --c-plan-ahead=0 --resync-rate=Y
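
To see which values the kernel actually has in effect (rather than what drbdadm was
invoked with), something like the following should help; the resource name vm is assumed:

drbdsetup show vm --show-defaults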

Thanks in advance,

Michael


[DRBD-user] Is it possible to make DRBD open the backing device with O_DIRECT?

2021-01-11 Thread Michael Hierweck

Hi all,

is it possible to make DRBD open the backing device with O_DIRECT?

Thanks in advance,

Michael


[DRBD-user] Suspicious Performance Gain

2019-06-21 Thread Michael Hierweck
Hi all,

we use hardware RAID controllers with BBU and 2.5" SAS3 drives in RAID 10. On top
we use LVM thick (fat) provisioning. DRBD is 8.4.

1) HWR10 => LVM => XFS
2) HWR10 => LVM => DRBD "C" => QEMU/Virtio => XFS

DRBD settings:
disk-flushes = no
md-flushes = no

Virtio settings:
cache = none

Performing benchmarks with fio (direct=1, sync=1, bs=1k,2k,4k,...,128k) leads to
suspicious results.

While 1) leads to the expected results of 200-400 IOPS on random writes, 2) leads to
results about 4 times better.
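
For reference, the benchmarks were of roughly this shape. This is a sketch only; the
exact fio job is not reproduced here, and the target device (/dev/vdb inside the guest
for setup 2) is an assumption:

fio --name=randwrite --filename=/dev/vdb --rw=randwrite \
    --direct=1 --sync=1 --bs=4k --iodepth=1 \
    --runtime=60 --time_based --group_reporting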

(I did the same benchmarks some years ago on the same hardware but with an older
software stack and remember that DRBD protocol "C" led to a performance decrease of
about factor 2. Factor 2 seems to be a reasonable value for synchronous replication.)

What's going wrong here?

Thanks in advance,

Michael


Re: [DRBD-user] drbd 8.4.7 out-of-sync on disabled host page cache in VM

2019-04-23 Thread Michael Hierweck
On 23.04.19 09:28, Michael Hierweck wrote:
> On 23.04.19 09:05, Armin Schindler wrote:
>> On 20.04.2019 14:38, a...@sysgo.com wrote:
>>>> On 13 March 2019 at 11:47 Roland Kammerer  
>>>> wrote:
>>>>
>>>>
>>>> On Tue, Mar 12, 2019 at 09:08:42AM +0100, Armin Schindler wrote:
>>>>> On 3/11/19 1:42 PM, Roland Kammerer wrote:
>>>>>> On Mon, Mar 11, 2019 at 11:13:11AM +0100, Armin Schindler wrote:
>>>>>>> 2 hosts Debian 9 (stretch) with default DRBD version 8.4.7.
>>>>>>
>>>>>> Please retry with the current 8.4.11 version of DRBD. You can get it from
>>>>>> here:
>>>>>> https://www.linbit.com/en/drbd-community/drbd-download/
>>>>>
>>>>> Okay, thanks. I will test 8.4.11.
>>>>>
>>>>> Do I need to change/update the tools as well or just the kernel driver?
>>>>> I currently use drbd-utils 8.9.10.
>>>>
>>>> They should be fine. I don't remember any non-corner cases fixes for 8.4
>>>> in drbd-utils.
>>>
>>> I tried version 8.4.11 and the problem persists.
>>> When using a Qemu/KVM virtio disk with a caching mode that uses the host page cache,
>>> or when using just a filesystem like ext4 (without Qemu/KVM) on the host, the
>>> drbd device gets out of sync after a while.
> 
> Same here:
> 
> LVM (thick) => DRBD => Virtio (cache=none or cache=directsync)
> 
> After some weeks of running about 80 VMs on 4 nodes, some of the VM backing devices
> report out-of-sync blocks. We are running an active/passive cluster with locally
> attached storage.
> 
> We were not able to reproduce this behaviour when using cache="writethrough" or
> cache="writeback".
> 
> We have been running this setup since 2011/2012. The first years we were fine, but
> about 3 years ago we ran into serious trouble because out-of-sync blocks led to
> damaged file systems (journals).
> 
> The issue was discussed in 2014:
> 
> https://lists.gt.net/drbd/users/25227
> 
> We love(d) DRBD because of its simplicity and reliability. (Ceph is much more
> complex...) However, we wonder whether DRBD can still be considered as "simple and
> reliable" as it was some years ago.
> 
> Even if the situation might have been introduced by virtio block driver optimizations
> some years ago (no stable pages anymore?), a solution is needed.

Remark:

I would expect stacks such as "rbd => virtio" and maybe "ZFS ZVOL => virtio" to require
stable pages, too, for the same reason DRBD does: checksum calculation.

http://lkml.iu.edu/hypermail/linux/kernel/1511.0/04497.html
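
As an aside, whether a given block device requests stable pages from the kernel can be
checked via sysfs. A quick check, with drbd0 only as an example device name:

cat /sys/block/drbd0/bdi/stable_pages_required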



Re: [DRBD-user] drbd 8.4.7 out-of-sync on disabled host page cache in VM

2019-04-23 Thread Michael Hierweck
On 23.04.19 09:05, Armin Schindler wrote:
> On 20.04.2019 14:38, a...@sysgo.com wrote:
>>> On 13 March 2019 at 11:47 Roland Kammerer wrote:
>>>
>>>
>>> On Tue, Mar 12, 2019 at 09:08:42AM +0100, Armin Schindler wrote:
>>>> On 3/11/19 1:42 PM, Roland Kammerer wrote:
>>>>> On Mon, Mar 11, 2019 at 11:13:11AM +0100, Armin Schindler wrote:
>>>>>> 2 hosts Debian 9 (stretch) with default DRBD version 8.4.7.
>>>>>
>>>>> Please retry with the current 8.4.11 version of DRBD. You can get it from
>>>>> here:
>>>>> https://www.linbit.com/en/drbd-community/drbd-download/
>>>>
>>>> Okay, thanks. I will test 8.4.11.
>>>>
>>>> Do I need to change/update the tools as well or just the kernel driver?
>>>> I currently use drbd-utils 8.9.10.
>>>
>>> They should be fine. I don't remember any non-corner cases fixes for 8.4
>>> in drbd-utils.
>>
>> I tried version 8.4.11 and the problem persists.
>> When using a Qemu/KVM virtio disk with a caching mode that uses the host page cache,
>> or when using just a filesystem like ext4 (without Qemu/KVM) on the host, the
>> drbd device gets out of sync after a while.

Same here:

LVM (thick) => DRBD => Virtio (cache=none or cache=directsync)

After some weeks of running about 80 VMs on 4 nodes, some of the VM backing devices
report out-of-sync blocks. We are running an active/passive cluster with locally
attached storage.

We were not able to reproduce this behaviour when using cache="writethrough" or 
cache="writeback".

We have been running this setup since 2011/2012. The first years we were fine, but
about 3 years ago we ran into serious trouble because out-of-sync blocks led to damaged
file systems (journals).

The issue was discussed in 2014:

https://lists.gt.net/drbd/users/25227

We love(d) DRBD because of its simplicity and reliability. (Ceph is much more
complex...) However, we wonder whether DRBD can still be considered as "simple and
reliable" as it was some years ago.

Even if the situation might have been introduced by virtio block driver optimizations
some years ago (no stable pages anymore?), a solution is needed.


Michael



[DRBD-user] Complete decoupling of LINSTOR from DRBD (Q1 2019)

2018-11-23 Thread Michael Hierweck
Hi all,

Linbit announces the complete decoupling of LINSTOR from DRBD (Q1 2019).

https://www.linbit.com/en/linstor/

Does this mean Linbit will abandon DRBD? (What technologies might replace or co-exist
with DRBD in a LINSTOR cluster?)

Best regards,

Michael


[DRBD-user] DRBD on ZVOL

2018-05-12 Thread Michael Hierweck
Hi everybody,

would anyone tell me which settings regarding disk flushes (barriers,
flushes, drain) are recommended when using a ZVOL as the backing device for
DRBD? Are there recommendations regarding ZFS settings, such as disabling the
primary or secondary caches?

Of course, reliability is more important than speed.
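
For concreteness, these are the knobs I am asking about. Shown only to make the question
precise, not as a recommendation; the pool/zvol names (tank/drbd0) are made up:

# drbd.conf, disk section of the resource
disk {
    disk-flushes yes;
    md-flushes yes;
}

# ZFS properties on the zvol
zfs set primarycache=metadata tank/drbd0
zfs set secondarycache=none tank/drbd0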

Thanks in advance,

Michael


[DRBD-user] DRBD and LVM Snapshots

2017-05-14 Thread Michael Hierweck
Hi all,

up to now we use LVM (thick) provisioning to provide virtual disks for
VMs: RAID => LVM => LV => DRBD => KVM/QEMU (virtio).

We would like to be able to take snapshots of our virtual disks in
order to use them as a read-only source for consistent backups.
Therefore we would need to introduce an LVM thin provisioning layer
somewhere.

A. TP below DRBD
RAID => LVM => LVM-TP => LV => DRBD => KVM/QEMU (virtio)

- simple setup
- I suppose going back to an older snapshot would break DRBD and
require a complete resync/verify run

B. TP on top of DRBD
RAID => LVM => LV => DRBD => LVM => LVM-TP => LV => KVM/QEMU (virtio)
- complex setup
- more layers affect reliability and performance
- snapshots are replicated and consistent
- additional disk space for snapshots is limited by the DRBD volume size
and cannot be shared across volumes.


I would like to discuss the pros and the cons of both approaches while
considering internal vs. external metadata.
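
For concreteness, the backup step I have in mind would look roughly like this. A sketch
only, with made-up VG/LV names (vg0, vm_disk), assuming LVM thin snapshots:

# take a snapshot of the thin LV backing the VM disk
lvcreate --snapshot --name vm_disk_backup vg0/vm_disk
# thin snapshots are created with activation skipped by default
lvchange -ay -K vg0/vm_disk_backup
# ... back up /dev/vg0/vm_disk_backup (read-only) ...
lvremove -y vg0/vm_disk_backup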


Thanks in advance

Michael