Hi,

this is $cat /proc/drbd:

version: 8.3.11 (api:88/proto:86-96)
srcversion: DA5A13F16DE6553FC7CE9B2
 1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:0 dw:0 dr:1616 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:327516
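
The peer stays in WFConnection / Primary/Unknown, so only this node is up, which is expected since the second node is switched off on purpose. To double-check the DRBD side by hand I use something like the following (a rough sketch; "clusterdata" is the resource name from my configuration below):

  drbdadm cstate clusterdata   # connection state, here: WFConnection
  drbdadm role clusterdata     # roles, here: Primary/Unknown
  drbdadm dstate clusterdata   # disk states, here: UpToDate/DUnknown
  drbdadm connect clusterdata  # retry the connection once the peer is back

So DRBD itself looks fine to me.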
I tried to mount the DRBD resource by hand with:

  mount /dev/drbd/by-res/clusterdata /mnt/cluster

and with

  mount /dev/drbd/by-disk/mapper/turrel-cluster_storage /mnt/cluster

and with

  mount /dev/drbd1 /mnt/cluster

Each attempt produces these log entries:

Feb 16 15:00:52 turrel kernel: [80365.686822] dlm_new_lockspace error -512
Feb 16 15:00:52 turrel kernel: [80539.590344] GFS2: fsid=: Trying to join cluster "lock_dlm", "tumba:data"
Feb 16 15:00:52 turrel kernel: [80539.603545] dlm: Using TCP for communications
Feb 16 15:00:52 turrel dlm_controld[855]: process_uevent online@ error -17 errno 11

Both tasks hang; only kill -9 helps. After killing the task I get this log entry:

Feb 16 15:02:50 turrel kernel: [80657.576111] dlm: data: group join failed -512 0

I can check the gfs2 filesystem with fsck.gfs2 /dev/drbd1:

Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
gfs2_fsck complete

So, what is going wrong? I can't figure it out.
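
My guess (not verified yet) is that the dlm join is blocked because cman does not consider the single remaining node quorate. This is how I plan to check it; a rough sketch, assuming the cluster name "tumba" from the GFS2 fsid above and the standard tools from the cman/dlm packages:

  cman_tool status   # expected votes, total votes, quorum state
  cman_tool nodes    # membership as cman sees it
  dlm_tool ls        # lockspaces dlm_controld knows about

I also want to verify that /etc/cluster/cluster.conf allows two-node operation, i.e. that it contains something like this (just my assumption of what it should look like):

  <cluster name="tumba" config_version="1">
    <cman two_node="1" expected_votes="1"/>
    ...
  </cluster>

If that is the wrong direction, please correct me.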
> Hi,
>
> I have a problem with my test configuration. I built an Active/Active cluster
> Ubuntu(11.10)+DRBD+Cman+Pacemaker+gfs2+Xen for testing purposes.
> Now I am doing some availability tests: I am trying to start the cluster on
> one node.
>
> The trouble is that the Filesystem primitive ClusterFS (fstype=gfs2) does not
> start when one of the two nodes is switched off.
>
> Here is my configuration:
>
> node blaster \
>         attributes standby="off"
> node turrel \
>         attributes standby="off"
> primitive ClusterData ocf:linbit:drbd \
>         params drbd_resource="clusterdata" \
>         op monitor interval="60s"
> primitive ClusterFS ocf:heartbeat:Filesystem \
>         params device="/dev/drbd/by-res/clusterdata" directory="/mnt/cluster" fstype="gfs2" \
>         op start interval="0" timeout="60s" \
>         op stop interval="0" timeout="60s" \
>         op monitor interval="60s" timeout="60s"
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>         params ip="192.168.122.252" cidr_netmask="32" clusterip_hash="sourceip" \
>         op monitor interval="30s"
> primitive SSH-stonith stonith:ssh \
>         params hostlist="turrel blaster" \
>         op monitor interval="60s"
> primitive XenDom ocf:heartbeat:Xen \
>         params xmfile="/etc/xen/xen1.example.com.cfg" \
>         meta allow-migrate="true" is-managed="true" target-role="Stopped" \
>         utilization cores="1" mem="512" \
>         op monitor interval="30s" timeout="30s" \
>         op start interval="0" timeout="90s" \
>         op stop interval="0" timeout="300s"
> ms ClusterDataClone ClusterData \
>         meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> clone ClusterFSClone ClusterFS \
>         meta target-role="Started" is-managed="true"
> clone IP ClusterIP \
>         meta globally-unique="true" clone-max="2" clone-node-max="2"
> clone SSH-stonithClone SSH-stonith
> location prefere-blaster XenDom 50: blaster
> colocation XenDom-with-ClusterFS inf: XenDom ClusterFSClone
> colocation fs_on_drbd inf: ClusterFSClone ClusterDataClone:Master
> order ClusterFS-after-ClusterData inf: ClusterDataClone:promote ClusterFSClone:start
> order XenDom-after-ClusterFS inf: ClusterFSClone XenDom
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
>         cluster-infrastructure="cman" \
>         expected-quorum-votes="2" \
>         stonith-enabled="true" \
>         no-quorum-policy="ignore" \
>         last-lrm-refresh="1329194925"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="100"
>
> Here is $crm resource show:
>
>  Master/Slave Set: ClusterDataClone [ClusterData]
>      Masters: [ turrel ]
>      Stopped: [ ClusterData:1 ]
>  Clone Set: IP [ClusterIP] (unique)
>      ClusterIP:0 (ocf::heartbeat:IPaddr2) Started
>      ClusterIP:1 (ocf::heartbeat:IPaddr2) Started
>  Clone Set: ClusterFSClone [ClusterFS]
>      Stopped: [ ClusterFS:0 ClusterFS:1 ]
>  Clone Set: SSH-stonithClone [SSH-stonith]
>      Started: [ turrel ]
>      Stopped: [ SSH-stonith:1 ]
>  XenDom (ocf::heartbeat:Xen) Stopped
>
> I tried:
>
> crm(live)resource# cleanup ClusterFSClone
> Cleaning up ClusterFS:0 on turrel
> Cleaning up ClusterFS:1 on turrel
> Waiting for 3 replies from the CRMd... OK
>
> I can only see warning messages in /var/log/cluster/corosync.log:
>
> Feb 14 16:25:56 turrel pengine: [1640]: WARN: unpack_rsc_op: Processing failed op ClusterFS:0_start_0 on turrel: unknown exec error (-2)
>
> and
>
> Feb 14 16:25:56 turrel pengine: [1640]: WARN: common_apply_stickiness: Forcing ClusterFSClone away from turrel after 1000000 failures (max=1000000)
> Feb 14 16:25:56 turrel pengine: [1640]: WARN: common_apply_stickiness: Forcing ClusterFSClone away from turrel after 1000000 failures (max=1000000)
>
> Please direct me: what else do I need to check?
>
> Best regards,
> Dmitriy Bogomolov

Best regards,
Dmitriy Bogomolov

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org