Hi all,
I'm trying to join a new node into an existing 2-node cluster, but the
new node always ends up Diskless after the first resync attempt.
I'm using DRBD 9 on Ubuntu 16.04 LTS:
ii  drbd-dkms          9.0.9-1ppa1~xenial1    all    RAID 1 over TCP/IP for Linux module source
ii  drbd-utils         9.1.1-1ppa1~xenial1    amd64  RAID 1 over TCP/IP for Linux (user utilities)
ii  python-drbdmanage  0.99.10-1ppa1~xenial1  all    DRBD distributed resource management utility
The existing 2-node cluster:
- Node hv2 / HW-Raid5 = 5x 1TB (512n) / LVM-Crypt / Drbd uses LvmThinLv
- Node hv3 / HW-Raid5 = 3x 2TB (512n) / LVM-Crypt / Drbd uses LvmThinLv
The new one:
- Node hv1 / 1x 6TB (4Kn) / LVM-Crypt / Drbd uses LvmThinLv
(I had previously tried SW-RAID 1 as well, with the same result.)
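For reference, the 512n vs. 4Kn difference between the nodes can be read
straight from sysfs. Here is a small helper I use (parameterized on the
directory only so it is easy to exercise against a copy of the tree; on a
live node you point it at /sys/block, and the device names in the comment
are just examples):

```shell
# print_sector_sizes DIR
# List logical/physical block sizes of every block device under DIR
# (use /sys/block on a live node). On hv2/hv3 the RAID disks report
# 512/512; on hv1 the 4Kn disk reports 4096/4096.
print_sector_sizes() {
    for q in "$1"/*/queue; do
        [ -r "$q/logical_block_size" ] || continue
        printf '%s logical=%s physical=%s\n' \
            "$(basename "$(dirname "$q")")" \
            "$(cat "$q/logical_block_size")" \
            "$(cat "$q/physical_block_size")"
    done
}

# On each node: print_sector_sizes /sys/block
```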
State before joining:
root@hv2 ~ # drbd-overview
Resources:
0:.drbdctrl/0 Connected(2*) Second/Primar UpToDa/UpToDa
1:.drbdctrl/1 Connected(2*) Second/Primar UpToDa/UpToDa
100:vm_dc2/0 Connected(2*) Primar/Second UpToDa/UpToDa
*dc2 sda scsi
101:vm_dc1/0 Connected(2*) Primar/Second UpToDa/UpToDa
*dc1 sda scsi
... (some more resources, all in the same state)
Then I add hv1:
root@hv3 ~ # drbdmanage add-node hv1 192.168.42.2
In dmesg of hv1 I see:
[ 1186.205695] drbd .drbdctrl/0 drbd0: logical block size of local
backend does not match (drbd:512, backend:4096); was this a late attach?
[ 1186.205702] drbd .drbdctrl/0 drbd0: logical block sizes do not match
(me:512, peer:512); this may cause problems.
In dmesg of hv2 and hv3 I see:
[ 7765.951161] drbd .drbdctrl/0 drbd0: logical block sizes do not match
(me:512, peer:4096); this may cause problems.
[ 7765.951165] drbd .drbdctrl/0 drbd0: current Primary must NOT adjust
logical block size (512 -> 4096); hope for the best.
State still looks good:
root@hv1 ~ # drbd-overview
0:.drbdctrl/0 Connected(3*) Seco(hv1,hv2)/Prim(hv3)
UpTo(hv1)/UpTo(hv2,hv3)
1:.drbdctrl/1 Connected(3*) Seco(hv2,hv1)/Prim(hv3)
UpTo(hv1)/UpTo(hv2,hv3)
Then I assign a resource to it:
root@hv3:~# drbdmanage assign-resource vm_dc2 hv1
and hv1 ends up Diskless :-( In dmesg of hv1 I see:
[ 4564.041711] drbd vm_dc2/0 drbd100 hv2: helper command: /sbin/drbdadm
before-resync-target
[ 4564.044700] drbd vm_dc2/0 drbd100 hv2: helper command: /sbin/drbdadm
before-resync-target exit code 0 (0x0)
[ 4564.044759] drbd vm_dc2/0 drbd100 hv2: repl( WFBitMapT -> SyncTarget )
[ 4564.044765] drbd vm_dc2/0 drbd100 hv3: resync-susp( no -> connection
dependency )
[ 4564.044978] drbd vm_dc2/0 drbd100 hv2: Began resync as SyncTarget
(will sync 16777216 KB [4194304 bits set]).
[ 4564.048031] drbd vm_dc2/0 drbd100 hv3: receive bitmap stats
[Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[ 4564.050267] drbd vm_dc2/0 drbd100 hv3: send bitmap stats
[Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[ 4564.050285] drbd vm_dc2/0 drbd100 hv3: helper command: /sbin/drbdadm
before-resync-target
[ 4564.053016] drbd vm_dc2/0 drbd100 hv3: helper command: /sbin/drbdadm
before-resync-target exit code 0 (0x0)
[ 4564.053074] drbd vm_dc2/0 drbd100 hv3: repl( WFBitMapT -> PausedSyncT )
[ 4564.053286] drbd vm_dc2/0 drbd100 hv3: Began resync as PausedSyncT
(will sync 16777216 KB [4194304 bits set]).
[ 4636.342184] sd 4:0:0:0: [sda] Bad block number requested
[ 4636.346976] drbd vm_dc2/0 drbd100: write: error=10 s=2050s
[ 4636.347015] drbd vm_dc2/0 drbd100: disk( Inconsistent -> Failed )
[ 4636.347021] drbd vm_dc2/0 drbd100 hv2: repl( SyncTarget -> Established )
[ 4636.347026] drbd vm_dc2/0 drbd100 hv3: repl( PausedSyncT ->
Established ) resync-susp( connection dependency -> no )
[ 4636.347063] drbd vm_dc2/0 drbd100: Local IO failed in
drbd_endio_write_sec_final. Detaching...
[ 4636.354204] drbd vm_dc2/0 drbd100: disk( Failed -> Diskless )
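One observation, if I read the log right: the failing offset fits the
earlier block-size warnings. The kernel counts sectors in 512-byte units,
so s=2050 starts at byte 2050*512, which is not a multiple of 4096; a 4Kn
backend has to reject such a request, which would explain the sd "Bad
block number requested". A quick check:

```shell
# A 4Kn backend needs requests aligned to 4096 bytes, i.e. the
# 512-byte sector number must be a multiple of 4096/512 = 8.
aligned_4k() {
    if [ $(( $1 % 8 )) -eq 0 ]; then echo aligned; else echo unaligned; fi
}
aligned_4k 2050   # the failing write from the log above -> unaligned
aligned_4k 2048   # this offset would have been fine -> aligned
```

So it looks like resync from the 512n peers issues writes at 512-byte
sector granularity that the 4Kn disk cannot take. If that is right, maybe
presenting the disk with 512-byte logical sectors (e.g. via losetup
--sector-size 512, which needs util-linux >= 2.30) could work around it,
but I have not tried that.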
Any ideas what is going wrong?
Thanks and regards
Sebastian Hasait
_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user