Dear DRBD users,

after many years of successfully using the in-kernel module for two-node setups, we've decided to move to version 9.x in order to build clusters with more nodes, but we are facing a serious problem.
Our setup is pretty simple: a three-node cluster (HV01, HV02 and HV03, each with a 4-port Ethernet controller cross-connecting each node to the others via a dedicated port) and 2x2TB SAS HDDs in each node (RAID-1, /dev/sda). We are using an extended partition (/dev/sda4) to create the partitions backing the DRBD devices, as follows:

    /dev/sda4  210126848 3905945599 3695818752  1.7T  5 Extended
    /dev/sda5  210128896  566644735  356515840  170G 83 Linux
    /dev/sda6  566646784 1405507583  838860800  400G 83 Linux

The OS is the latest Gentoo Linux with a vanilla 6.0.9 kernel (compiled from the original source code, no patches applied), and the DRBD kernel module is Version: 9.2.1 (api:2/proto:86-121), compiled from source without any patch/change. The three nodes are identical in hardware and software configuration (the whole / partition is cloned; only the network parameters differ).

We create a resource as follows (no other configs present, global.conf empty):

    resource "r01" {
        protocol C;
        volume 0 {
            device minor 1;
            disk "/dev/sda5";
            meta-disk internal;
        }
        volume 1 {
            device minor 2;
            disk "/dev/sda6";
            meta-disk internal;
        }
        on HV01 { node-id 0; }
        on HV02 { node-id 1; }
        on HV03 { node-id 2; }
        connection {
            host HV01 address 172.31.255.1:7012;
            host HV02 address 172.31.255.2:7021;
        }
        connection {
            host HV01 address 172.30.255.1:7013;
            host HV03 address 172.30.255.3:7031;
        }
        connection {
            host HV02 address 172.29.255.2:7023;
            host HV03 address 172.29.255.3:7032;
        }
    }

=====================================================

    # drbdadm create-md --force r01                    (on each node)
    # drbdadm up r01                                   (on each node)
    # drbdadm create-md --force r01                    (on each node)
    # drbdadm new-current-uuid --clear-bitmap r01/0    (on the primary node HV01)
    # drbdadm new-current-uuid --clear-bitmap r01/1    (on the primary node HV01)
    # drbdadm primary r01                              (on the primary node)
    # mkfs.ext4 /dev/drbd1                             (on the primary node)
    # mount /dev/drbd1 /mnt/r01                        (on the primary node)

Everything is fine up to this point.
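As a quick cross-check (pure arithmetic on the figures above, nothing measured), the sector counts in the fdisk listing convert exactly to the printed partition sizes:

```python
# Sector counts from the fdisk listing above (512-byte sectors).
sda5_sectors = 356515840   # /dev/sda5
sda6_sectors = 838860800   # /dev/sda6

GiB = 2 ** 30
print(sda5_sectors * 512 / GiB)  # 170.0 -> matches the 170G shown
print(sda6_sectors * 512 / GiB)  # 400.0 -> matches the 400G shown
```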
The status is the following:

    HV01 ~ # drbdsetup status --verbose --statistics
    r01 node-id:0 role:Primary suspended:no force-io-failures:no
        write-ordering:flush
      volume:0 minor:1 disk:UpToDate backing_dev:/dev/sda5 quorum:yes
          size:178247004 read:0 written:0 al-writes:0 bm-writes:0
          upper-pending:0 lower-pending:0 al-suspended:no blocked:no
      volume:1 minor:2 disk:UpToDate backing_dev:/dev/sda6 quorum:yes
          size:419404764 read:0 written:0 al-writes:0 bm-writes:0
          upper-pending:0 lower-pending:0 al-suspended:no blocked:no
      HV02 node-id:1 connection:Connected role:Secondary congested:no
          ap-in-flight:0 rs-in-flight:0
        volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
            received:0 sent:0 out-of-sync:0 pending:0 unacked:0
        volume:1 replication:Established peer-disk:UpToDate resync-suspended:no
            received:0 sent:0 out-of-sync:0 pending:0 unacked:0
      HV03 node-id:2 connection:Connected role:Secondary congested:no
          ap-in-flight:0 rs-in-flight:0
        volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
            received:0 sent:0 out-of-sync:0 pending:0 unacked:0
        volume:1 replication:Established peer-disk:UpToDate resync-suspended:no
            received:0 sent:0 out-of-sync:0 pending:0 unacked:0

    HV02 ~ # drbdsetup status --verbose --statistics
    r01 node-id:1 role:Secondary suspended:no force-io-failures:no
        write-ordering:flush
      volume:0 minor:1 disk:UpToDate backing_dev:/dev/sda5 quorum:yes
          size:178247004 read:0 written:0 al-writes:0 bm-writes:0
          upper-pending:0 lower-pending:0 al-suspended:no blocked:no
      volume:1 minor:2 disk:UpToDate backing_dev:/dev/sda6 quorum:yes
          size:419404764 read:0 written:0 al-writes:0 bm-writes:0
          upper-pending:0 lower-pending:0 al-suspended:no blocked:no
      HV01 node-id:0 connection:Connected role:Primary congested:no
          ap-in-flight:0 rs-in-flight:0
        volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
            received:0 sent:0 out-of-sync:0 pending:0 unacked:0
        volume:1 replication:Established peer-disk:UpToDate resync-suspended:no
            received:0 sent:0 out-of-sync:0 pending:0 unacked:0
      HV03 node-id:2 connection:Connected role:Secondary congested:no
          ap-in-flight:0 rs-in-flight:0
        volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
            received:0 sent:0 out-of-sync:0 pending:0 unacked:0
        volume:1 replication:Established peer-disk:UpToDate resync-suspended:no
            received:0 sent:0 out-of-sync:0 pending:0 unacked:0

    HV03 ~ # drbdsetup status --verbose --statistics
    r01 node-id:2 role:Secondary suspended:no force-io-failures:no
        write-ordering:flush
      volume:0 minor:1 disk:UpToDate backing_dev:/dev/sda5 quorum:yes
          size:178247004 read:0 written:0 al-writes:0 bm-writes:0
          upper-pending:0 lower-pending:0 al-suspended:no blocked:no
      volume:1 minor:2 disk:UpToDate backing_dev:/dev/sda6 quorum:yes
          size:419404764 read:0 written:0 al-writes:0 bm-writes:0
          upper-pending:0 lower-pending:0 al-suspended:no blocked:no
      HV01 node-id:0 connection:Connected role:Primary congested:no
          ap-in-flight:0 rs-in-flight:0
        volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
            received:0 sent:0 out-of-sync:0 pending:0 unacked:0
        volume:1 replication:Established peer-disk:UpToDate resync-suspended:no
            received:0 sent:0 out-of-sync:0 pending:0 unacked:0
      HV02 node-id:1 connection:Connected role:Secondary congested:no
          ap-in-flight:0 rs-in-flight:0
        volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
            received:0 sent:0 out-of-sync:0 pending:0 unacked:0
        volume:1 replication:Established peer-disk:UpToDate resync-suspended:no
            received:0 sent:0 out-of-sync:0 pending:0 unacked:0

But when starting a copy of (e.g.)
a 10GB file, this error comes up immediately on both secondary nodes:

    [Tue Dec 6 16:18:12 2022] drbd r01/0 drbd1 HV01: ASSERTION d->bi_size <= DRBD_MAX_BIO_SIZE FAILED in read_in_block
    [Tue Dec 6 16:18:12 2022] drbd r01 HV01: error receiving P_DATA, e: -5 l: 2093056!
    [Tue Dec 6 16:18:12 2022] drbd r01 HV01: conn( Connected -> ProtocolError ) peer( Primary -> Unknown )
    [Tue Dec 6 16:18:12 2022] drbd r01/0 drbd1: disk( UpToDate -> Consistent )
    [Tue Dec 6 16:18:12 2022] drbd r01/0 drbd1 HV01: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
    [Tue Dec 6 16:18:12 2022] drbd r01/1 drbd2: disk( UpToDate -> Consistent )
    [Tue Dec 6 16:18:12 2022] drbd r01/1 drbd2 HV01: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
    [Tue Dec 6 16:18:12 2022] drbd r01 HV01: Terminating sender thread
    [Tue Dec 6 16:18:12 2022] drbd r01 HV01: Starting sender thread (from drbd_r_r01 [13658])
    [Tue Dec 6 16:18:12 2022] drbd r01/1 drbd2 HV03: pdsk( UpToDate -> Consistent )
    [Tue Dec 6 16:18:12 2022] drbd r01: Preparing cluster-wide state change 120227718 (1->-1 0/0)
    [Tue Dec 6 16:18:12 2022] drbd r01 HV01: Connection closed
    [Tue Dec 6 16:18:12 2022] drbd r01 HV01: helper command: /sbin/drbdadm disconnected
    [Tue Dec 6 16:18:12 2022] drbd r01 HV01: helper command: /sbin/drbdadm disconnected exit code 0
    [Tue Dec 6 16:18:12 2022] drbd r01 HV01: conn( ProtocolError -> Unconnected )
    [Tue Dec 6 16:18:12 2022] drbd r01 HV01: Restarting receiver thread
    [...]
And the corresponding messages on the other secondary node:

    [Tue Dec 6 16:18:13 2022] drbd r01/0 drbd1 HV01: ASSERTION d->bi_size <= DRBD_MAX_BIO_SIZE FAILED in read_in_block
    [Tue Dec 6 16:18:13 2022] drbd r01 HV01: error receiving P_DATA, e: -5 l: 2093056!
    [Tue Dec 6 16:18:13 2022] drbd r01 HV01: conn( Connected -> ProtocolError ) peer( Primary -> Unknown )
    [Tue Dec 6 16:18:13 2022] drbd r01/0 drbd1: disk( UpToDate -> Consistent )
    [Tue Dec 6 16:18:13 2022] drbd r01/0 drbd1 HV01: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
    [Tue Dec 6 16:18:13 2022] drbd r01/1 drbd2: disk( UpToDate -> Consistent )
    [Tue Dec 6 16:18:13 2022] drbd r01/1 drbd2 HV01: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
    [Tue Dec 6 16:18:13 2022] drbd r01 HV01: Terminating sender thread
    [Tue Dec 6 16:18:13 2022] drbd r01 HV01: Starting sender thread (from drbd_r_r01 [13627])
    [Tue Dec 6 16:18:13 2022] drbd r01: Preparing cluster-wide state change 1546196417 (2->-1 0/0)
    [Tue Dec 6 16:18:13 2022] drbd r01/0 drbd1 HV02: pdsk( UpToDate -> Consistent )
    [Tue Dec 6 16:18:13 2022] drbd r01 HV01: Connection closed
    [Tue Dec 6 16:18:13 2022] drbd r01 HV01: helper command: /sbin/drbdadm disconnected
    [Tue Dec 6 16:18:13 2022] drbd r01 HV01: helper command: /sbin/drbdadm disconnected exit code 0
    [Tue Dec 6 16:18:13 2022] drbd r01 HV01: conn( ProtocolError -> Unconnected )
    [Tue Dec 6 16:18:13 2022] drbd r01 HV01: Restarting receiver thread
    [...]

The problem is 100% reproducible; the only thing that changes is the d->bi_size value (it also changes in the other error messages during the copy). DRBD_MAX_BIO_SIZE is 1048576. Copying the same file over the same network interface to the same partition with scp or other tools works perfectly, at maximum hardware speed.
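For context, the length reported in the P_DATA error is roughly double the limit printed in the assertion (simple arithmetic on the logged values):

```python
# Figures taken verbatim from the kernel log above.
DRBD_MAX_BIO_SIZE = 1048576   # 1 MiB, the limit named in the assertion
failed_len = 2093056          # the "l:" value from the P_DATA error

print(failed_len > DRBD_MAX_BIO_SIZE)            # True -> the assertion fires
print(round(failed_len / DRBD_MAX_BIO_SIZE, 2))  # 2.0 -> roughly twice the cap
```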
I've tried:

- changing the --peer-max-bio-size
- changing the primary node
- connecting only the primary node with a secondary node
- checking that max_sectors_kb, max_hw_sectors_kb and nr_requests are equal for both the physical disk and the DRBD device
- changing the MTU of the network interfaces

Nothing seems to have an effect. Every time an intensive write operation starts, this error immediately shows up on each secondary node. Could someone give us a hint? We are totally stuck at the moment.

Thank you in advance and best regards,
Andrea
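P.S. In case anyone wants to reproduce the queue-limit comparison mentioned above, this is roughly how we gathered the values (a minimal sketch; the parameter names are the standard block-layer sysfs ones, and the device paths should be adapted to your setup):

```python
import os

QUEUE_PARAMS = ("max_sectors_kb", "max_hw_sectors_kb", "nr_requests")

def read_queue_limits(queue_dir, params=QUEUE_PARAMS):
    """Read block-layer queue tunables from a sysfs-style directory."""
    limits = {}
    for p in params:
        with open(os.path.join(queue_dir, p)) as f:
            limits[p] = int(f.read().strip())
    return limits

# On a real node, compare e.g.:
#   read_queue_limits("/sys/block/sda/queue")     # backing disk
#   read_queue_limits("/sys/block/drbd1/queue")   # DRBD device
```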
_______________________________________________
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user