Hi Felix, 

>> Jan 26 15:44:14 node1 kernel: [177694.517283] block drbd0: Requested 
>> state change failed by peer : Refusing to be Primary while peer is not 
>> outdated (-7) 

> This is odd. I don't think DRBD should attempt to become primary when 
> you issue a stop command. 
I agree. I don't understand why this happens when I attempt to stop the service and 
remove the module. Does anyone know what this error means and why it would 
occur when stopping DRBD? 


> Shouldn't pacemaker be stopping and starting this service for you? 
It is; however, I discovered that entries for DRBD still existed in /etc/rc*. I 
removed those, so now pacemaker is the only way to start/stop DRBD. 
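For the record, on Debian/Ubuntu the init-script links can be removed cleanly instead of deleting the /etc/rc* symlinks by hand. A sketch, assuming the init script is named "drbd":

```shell
# Stop any running instance first (pacemaker becomes the only manager afterwards)
/etc/init.d/drbd stop

# Remove all /etc/rc*.d/ start/stop symlinks for the drbd init script
update-rc.d -f drbd remove

# Verify nothing is left behind
ls /etc/rc*.d/ | grep drbd || echo "no drbd links remain"
```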


> I'm not sure it's normal for DRBD to outdate its disk on disconnection, but 
> it does seem to make sense. 
This is probably because I have configured resource-level fencing using dopd. I 
believe the peer would fence this node once the connection is lost. 
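For reference, the dopd setup I mean is the usual resource-level fencing configuration in drbd.conf; roughly (the handler path may differ by distribution):

```
resource mount1 {
  disk {
    fencing resource-only;
  }
  handlers {
    # dopd's outdater: the peer invokes this to mark our disk Outdated
    fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
  }
}
```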


After removing the /etc/rc*drbd entries and rebooting the nodes a couple of 
times, I am now able to stop heartbeat, which stops DRBD successfully. Now, 
however, one of my DRBD resources (ms_drbd_mount2) will not promote to master: 

Online: [ node1 ] 
OFFLINE: [ node2 ] 


 Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1] 
     Masters: [ node1 ] 
     Stopped: [ p_drbd_mount1:1 ] 
 Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2] 
     Slaves: [ node1 ] 
     Stopped: [ p_drbd_mount2:1 ] 
 Resource Group: g_apache 
     p_fs_varwww (ocf::heartbeat:Filesystem): Started node1 
     p_apache (ocf::heartbeat:apache): Started node1 
 Resource Group: g_mount1 
     p_fs_mount1 (ocf::heartbeat:Filesystem): Started node1 
     p_ip_nfs (ocf::heartbeat:IPaddr2): Started node1 


Any attempt to start it via crm resource [promote|start|stop|cleanup] does 
nothing. I am able to manually set the DRBD resource as primary. I took node2 
offline in the hopes that it would start with just one node active, but it 
still remains slave. I see some error messages in the log about migrating the 
resource from node2: 
pengine: [30681]: WARN: common_apply_stickiness: Forcing ms_drbd_crm away from 
node after 1000000 failures (max=1000000) 


However, shouldn't it have migrated already when that node went offline? How 
can I determine what is preventing the DRBD resource from being promoted? The 
syslog contains 

crmd: [30323]: info: te_rsc_command: Initiating action 43: monitor 
p_drbd_mount2:0_monitor_30000 on node1 (local) 
crmd: [30323]: info: do_lrm_rsc_op: Performing 
key=43:111:0:f84ff0aa-9a17-4b66-954d-8c3011a3441e 
op=p_drbd_mount2:0_monitor_30000 ) 
lrmd: [30320]: info: rsc:p_drbd_mount2:0 monitor[192] (pid 14960) 
lrmd: [30320]: info: operation monitor[192] on p_drbd_mount2:0 for client 
30323: pid 14960 exited with return code 0 
crmd: [30323]: info: process_lrm_event: LRM operation 
p_drbd_mount2:0_monitor_30000 (call=192, rc=0, cib-update=619, confirmed=false) 
ok 
crmd: [30323]: info: match_graph_event: Action p_drbd_mount2:0_monitor_30000 
(43) confirmed on node1 (rc=0) 
pengine: [30681]: notice: unpack_rsc_op: Operation 
p_drbd_mount1:0_last_failure_0 found resource p_drbd_mount1:0 active in master 
mode on node1 
pengine: [30681]: notice: unpack_rsc_op: Operation 
p_drbd_mount2:0_last_failure_0 found resource p_drbd_mount2:0 active on node1 
pengine: [30681]: notice: common_apply_stickiness: ms_drbd_mount1 can fail 
999998 more times on node2 before being forced off 
pengine: [30681]: notice: common_apply_stickiness: ms_drbd_mount1 can fail 
999998 more times on node2 before being forced off 
pengine: [30681]: WARN: common_apply_stickiness: Forcing ms_drbd_mount2 away 
from node2 after 1000000 failures (max=1000000) 
pengine: [30681]: WARN: common_apply_stickiness: Forcing ms_drbd_mount2 away 
from node2 after 1000000 failures (max=1000000) 
pengine: [30681]: notice: LogActions: Leave p_drbd_mount1:0#011(Master node1) 
pengine: [30681]: notice: LogActions: Leave p_drbd_mount1:1#011(Stopped) 
pengine: [30681]: notice: LogActions: Leave p_drbd_mount2:0#011(Slave node1) 
pengine: [30681]: notice: LogActions: Leave p_drbd_mount2:1#011(Stopped) 
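For reference, this is roughly how I have been inspecting and clearing the failure counts behind those "Forcing ... away" messages (crm shell syntax; resource and node names are from my setup):

```shell
# Show the accumulated failcount for the clone instance on node2
crm_failcount -G -r p_drbd_mount2:1 -N node2

# Clear the failed-op history so the policy engine reconsiders node2
crm resource cleanup ms_drbd_mount2

# Explicitly delete the failcount attribute for the primitive on node2
crm resource failcount p_drbd_mount2 delete node2
```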


I've attached my configuration (as output by crm configure show). 


Thanks, 


Andrew 


----- Original Message -----

From: "Felix Frank" < [email protected] > 
To: "Andrew Martin" < [email protected] > 
Sent: Friday, January 27, 2012 2:52:05 AM 
Subject: Re: [DRBD-user] Removing DRBD Kernel Module Blocks 

Hi, 

On 01/26/2012 11:18 PM, Andrew Martin wrote: 
> I am using DRBD with pacemaker+heartbeat for a HA cluster. There are no 

fair choice. 

> mounted filesystems at this time. Below is a copy of the kernel log 

So the DRBDs are idle and managed by pacemaker, correct? 

> after I attempted to stop the drbd service: 

Shouldn't pacemaker be stopping and starting this service for you? 

> Jan 26 15:44:14 node1 kernel: [177694.517283] block drbd0: Requested 
> state change failed by peer : Refusing to be Primary while peer is not 
> outdated (-7) 

This is odd. I don't think DRBD should attempt to become primary when 
you issue a stop command. 

> Jan 26 15:44:14 node1 kernel: [177694.873466] block drbd0: peer( Primary 
> -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> 
> Outdated ) pdsk( UpToDate -> DUnknown ) 

I'm not sure it's normal for DRBD to outdate its disk on disconnection, 
but it does seem to make sense. 

> Jan 26 15:44:14 node1 kernel: [177695.209668] block drbd0: disk( 
> Outdated -> Diskless ) 

This looks funny as well. But may just be correct. 

Do you stop pacemaker before stopping DRBD? 
What happens if you disable pacemaker, drbdadm down all and then stop DRBD? 
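I.e., something along these lines (assuming the init scripts live in the usual locations):

```shell
# Take cluster management out of the picture first
/etc/init.d/heartbeat stop

# Tear down all DRBD resources, then the service and the module
drbdadm down all
/etc/init.d/drbd stop
lsmod | grep -q '^drbd' && rmmod drbd
```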

Regards, 
Felix 

primitive p_apache ocf:heartbeat:apache \
        params configfile="/etc/apache2/apache2.conf" statusurl="http://localhost/server-status" \
        op monitor interval="40s"
primitive p_drbd_mount2 ocf:linbit:drbd \
        params drbd_resource="mount2" \
        op monitor interval="15" role="Master" \
        op monitor interval="30" role="Slave"
primitive p_drbd_mount1 ocf:linbit:drbd \
        params drbd_resource="mount1" \
        op monitor interval="15" role="Master" \
        op monitor interval="30" role="Slave"
primitive p_exportfs_mount2 ocf:heartbeat:exportfs \
        params fsid="2" directory="/mnt/storage/mount2" options="rw,mountpoint" clientspec="192.168.0.0/255.255.255.0" wait_for_leasetime_on_stop="true" \
        op start interval="0" timeout="40" \
        op stop interval="0" timeout="10" \
        op monitor interval="30s" timeout="40"
primitive p_exportfs_mount1 ocf:heartbeat:exportfs \
        params fsid="1" directory="/mnt/storage/mount1" options="rw,mountpoint" clientspec="192.168.0.0/255.255.255.0" wait_for_leasetime_on_stop="true" \
        op start interval="0" timeout="40" \
        op stop interval="0" timeout="10" \
        op monitor interval="30s" timeout="40"
primitive p_fs_mount2 ocf:heartbeat:Filesystem \
        params device="/dev/drbd1" directory="/mnt/storage/mount2" fstype="ext3" \
        op monitor interval="10s"
primitive p_fs_mount1 ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/mnt/storage/mount1" fstype="ext3" \
        op monitor interval="10s"
primitive p_fs_varwww ocf:heartbeat:Filesystem \
        params device="/mnt/storage/mount1/var/www" directory="/var/www" fstype="none" options="bind" \
        op monitor interval="10s"
primitive p_ip_nfs ocf:heartbeat:IPaddr2 \
        params ip="10.52.4.8" cidr_netmask="16" \
        op monitor interval="30s" \
        meta is-managed="true"
primitive p_lsb_nfsserver lsb:nfs-kernel-server \
        op monitor interval="30s" \
        meta multiple-active="stop_start" target-role="Stopped" is-managed="false"
primitive p_lsb_nmb lsb:nmbd \
        op monitor interval="30s"
primitive p_lsb_smb lsb:smbd \
        op monitor interval="30s"
group g_apache p_fs_varwww p_apache \
        meta target-role="Started"
group g_nfs p_fs_mount2 p_lsb_smb p_lsb_nmb p_lsb_nfsserver p_exportfs_mount1 p_exportfs_mount2 \
        meta target-role="Stopped"
group g_mount1 p_fs_mount1 p_ip_nfs \
        meta target-role="Started"
ms ms_drbd_mount2 p_drbd_mount2 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master" is-managed="true"
ms ms_drbd_mount1 p_drbd_mount1 \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
colocation c_mount2_drbd-fileservers inf: ms_drbd_mount2:Master g_nfs
colocation c_mount1_drbd-mount1 inf: ms_drbd_mount1:Master g_mount1
colocation c_mount1_mount1-apache inf: g_mount1 g_apache
order o_mount2_drbd-fileservers inf: ms_drbd_mount2 g_nfs
order o_mount1_drbd-mount1 inf: ms_drbd_mount1 g_mount1
order o_mount1_mount1-apache inf: g_mount1 g_apache
order o_mount1_mount1fs-before-exportfs inf: p_fs_mount1 p_exportfs_mount1
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1327700806"
rsc_defaults $id="rsc-options" \
        resource-stickiness="200"

_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
