Hi,

On Fri, Mar 23, 2018 at 9:01 AM, Lozenkov Sergei <slozen...@gmail.com> wrote:
> Hello.
> I have two Debian 9 servers with configured Corosync-Pacemaker-DRBD. All
> work well for month.
> After some servers issues (with reboots) I have situation that pacemaker
> could not switch drbd node with such errors:
>
> Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com lrmd: notice:
> operation_finished: drbd_nfs_stop_0:3667:stderr [ 1: State change
> failed: (-12) Device is held open by someone ]
>
> Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com lrmd: notice:
> operation_finished: drbd_nfs_stop_0:3667:stderr [ Command 'drbdsetup-84
> secondary 1' terminated with exit code 11 ]
>
> Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com lrmd: info:
> log_finished: finished - rsc:drbd_nfs action:stop call_id:47 pid:3667
> exit-code:1 exec-time:20002ms queue-time:0ms
>
> Mar 16 06:25:11 [880] nfs01-az-eus.tech-corps.com crmd: error:
> process_lrm_event: Result of stop operation for drbd_nfs on
> nfs01-az-eus.tech-corps.com: Timed Out | call=47 key=drbd_nfs_stop_0
> timeout=20000ms
>
> Mar 16 06:25:11 [880] nfs01-az-eus.tech-corps.com crmd: notice:
> process_lrm_event: nfs01-az-eus.tech-corps.com-drbd_nfs_stop_0:47 [
> 1: State change failed: (-12) Device is held open by someone\nCommand
> 'drbdsetup-84 secondary 1' terminated with exit code 11\n1: State change
> failed: (-12) Device is held open by someone\nCommand 'drbdsetup-84
> secondary 1' terminated with exit code 11\n1: State change failed: (-12)
> Device is held open by someone\nCommand 'drbdsetup-84 secondary 1'
> terminated with exit
>
> I tried to resolve the issue with many googled receipts but all attempts
> were unsuccessful.
> As well I have another two node cluster with exactly the same
> configuration and it works without any issues.
>
> Right now I placed nodes to standby mode and manually raised all services.
> Please, could You help me to analyze and solve the problem?
> Thanks
>
> Here are my configuration files:
> --- CRM CONFIG ---
> crm configure show
> node 171049224: nfs01-az-eus.tech-corps.com \
>         attributes standby=off
> node 171049225: nfs02-az-eus.tech-corps.com \
>         attributes standby=on
> primitive drbd_nfs ocf:linbit:drbd \
>         params drbd_resource=nfs \
>         op monitor interval=29s role=Master \
>         op monitor interval=31s role=Slave
> primitive fs_nfs Filesystem \
>         params device="/dev/drbd1" directory="/data" fstype=ext4 \
>         meta is-managed=true
> primitive nfs lsb:nfs-kernel-server \
>         op monitor interval=5s
> primitive nmbd lsb:nmbd \
>         op monitor interval=5s
> primitive smbd lsb:smbd \
>         op monitor interval=5s
> group NFS fs_nfs nfs nmbd smbd
> ms ms_drbd_nfs drbd_nfs \
>         meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> notify=true
> order fs-nfs-before-nfs inf: fs_nfs:start nfs:start
> order fs-nfs-before-nmbd inf: fs_nfs:start nmbd:start
> order fs-nfs-before-smbd inf: fs_nfs:start smbd:start
> order ms-drbd-nfs-before-fs-nfs inf: ms_drbd_nfs:promote fs_nfs:start
> colocation ms-drbd-nfs-with-ha inf: ms_drbd_nfs:Master NFS
> order nmbd-before-smbd inf: nmbd:start smbd:start
> property cib-bootstrap-options: \
>         have-watchdog=false \
>         dc-version=1.1.16-94ff4df \
>         cluster-infrastructure=corosync \
>         cluster-name=debian \
>         stonith-enabled=false \
>         no-quorum-policy=ignore
>
>
>
> --- DRBD GLOBAL ---
> cat /etc/drbd.d/global_common.conf | grep -v '#'
>
> global {
>         usage-count no;
> }
>
> common {
>         protocol C;
>
>         handlers {
>
>         }
>
>         startup {
>         }
>
>         options {
>         }
>
>         disk {
>         }
>
>         net {
>         }
> }
>
>
> --- DRBD -RESOURCE ---
> cat /etc/drbd.d/nfs.res | grep -v '#'
> resource nfs{
>         meta-disk internal;
>         device /dev/drbd1;
>         syncer {
>                 verify-alg sha1;
>                 rate 100M;
>         }
>
>         net{
>                 max-buffers 8000;
>                 max-epoch-size 8000;
>                 unplug-watermark 16;
>                 sndbuf-size 0;
>         }
>
>         disk{
>                 disk-barrier no;
>                 disk-flushes no;
>         }
>
>         on nfs01-az-eus.tech-corps.com{
>                 disk /dev/sdc1;
>                 address 10.50.1.8:7789;
>         }
>
>         on nfs02-az-eus.tech-corps.com{
>                 disk /dev/sdc1;
>                 address 10.50.1.9:7789;
>         }
> }
>
>
>
>
> --
> Segey L
>
> _______________________________________________
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
>

Did you check with fuser what is holding the device/filesystem busy?
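If fuser is not at hand (on Debian it comes with psmisc, typically run as something like `fuser -vm /data`), roughly the same check can be sketched by walking /proc. This is only an illustration, not a replacement for fuser or lsof: the helper names `find_open_handles` and `mounts_of` are made up for this example, and the paths /data and /dev/drbd1 are taken from the configuration quoted above. It only sees user-space file descriptors, so a holder inside the kernel would not show up in the first check; a filesystem that is still mounted also keeps the block device open, which is why the second helper looks at the mount table.

```python
import os


def find_open_handles(target, proc_root="/proc"):
    """Return sorted (pid, path) pairs for open files at or below `target`.

    Hypothetical helper: reads the /proc/<pid>/fd symlinks, so it needs
    root to see other users' processes and misses in-kernel holders.
    """
    target = os.path.normpath(target)
    holders = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited meanwhile, or permission denied
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if link == target or link.startswith(target + os.sep):
                holders.append((int(entry), link))
    return sorted(holders)


def mounts_of(device, mounts_file="/proc/mounts"):
    """Return the mount points on which `device` is currently mounted."""
    points = []
    with open(mounts_file) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 2 and fields[0] == device:
                points.append(fields[1])
    return points


# Example use on the node where the stop fails (run as root):
#   print(find_open_handles("/data"))       # open files below the mount point
#   print(find_open_handles("/dev/drbd1"))  # the device node opened directly
#   print(mounts_of("/dev/drbd1"))          # is the filesystem still mounted?
```

If all three come back empty while `drbdsetup-84 secondary 1` still reports "Device is held open by someone", the holder is probably not an ordinary process, and fuser would likely show nothing either.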