Folks, I am still struggling with this problem. At the moment, I cannot get my OCFS2 filesystem to start at all. OCFS2 worked until I expanded my cluster from two nodes to four.
I see this in /var/log/syslog. In particular, note the "FATAL: Module scsi_hostadapter not found." on the last line:

Dec 10 16:48:03 aztestc1 crmd: [2416]: info: do_lrm_rsc_op: Performing key=71:14:0:a766cb8e-4813-483e-a127-d67cf25979ea op=p_fs_share_plesk:0_start_0 )
Dec 10 16:48:03 aztestc1 lrmd: [2413]: debug: on_msg_perform_op:2396: copying parameters for rsc p_fs_share_plesk:0
Dec 10 16:48:03 aztestc1 lrmd: [2413]: debug: on_msg_perform_op: add an operation operation start[29] on p_fs_share_plesk:0 for client 2416, its parameters: CRM_meta_notify_start_resource=[p_fs_share_plesk:0 p_fs_share_plesk:1 ] CRM_meta_notify_stop_resource=[ ] fstype=[ocfs2] CRM_meta_notify_demote_resource=[ ] CRM_meta_notify_master_uname=[ ] CRM_meta_notify_promote_uname=[ ] CRM_meta_timeout=[60000] options=[rw,noatime] CRM_meta_name=[start] CRM_meta_notify_inactive_resource=[p_fs_share_plesk:0 p_fs_share_plesk:1 ] CRM_meta_notify_start_uname=[aztestc1 aztestc2 ] crm_feature_set=[3.0 to the operation list.
Dec 10 16:48:03 aztestc1 lrmd: [2413]: info: rsc:p_fs_share_plesk:0 start[29] (pid 4528)
Dec 10 16:48:03 aztestc1 crmd: [2416]: debug: get_xpath_object: No match for //cib_update_result//diff-added//crm_config in /notify/cib_update_result/diff
Dec 10 16:48:03 aztestc1 crmd: [2416]: debug: get_xpath_object: No match for //cib_update_result//diff-added//crm_config in /notify/cib_update_result/diff
Dec 10 16:48:03 aztestc1 lrmd: [2413]: debug: rsc:p_drbd_share_plesk:1 monitor[16] (pid 4530)
Dec 10 16:48:03 aztestc1 crmd: [2416]: debug: get_xpath_object: No match for //cib_update_result//diff-added//crm_config in /notify/cib_update_result/diff
Dec 10 16:48:03 aztestc1 Filesystem[4528]: INFO: Running start for /dev/drbd/by-res/shareplesk on /shareplesk
Dec 10 16:48:03 aztestc1 drbd[4530]: DEBUG: shareplesk: Calling /usr/sbin/crm_master -Q -l reboot -v 10000
Dec 10 16:48:03 aztestc1 lrmd: [2413]: info: RA output: (p_fs_share_plesk:0:start:stderr) FATAL: Module scsi_hostadapter not found.
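For what it's worth, the same mount can be tried by hand, outside Pacemaker, to see whether it is the filesystem itself or the resource agent that fails. A sketch (untested on this cluster; device and options are copied from the configuration below):

```
# Check that the ocfs2 pieces are in place before blaming the RA.
lsmod | grep -E 'ocfs2|drbd'        # is the ocfs2 kernel module loaded?
cat /sys/fs/ocfs2/cluster_stack     # should read "cman" for this setup
# Try the exact mount the Filesystem RA performs:
mount -t ocfs2 -o rw,noatime /dev/drbd/by-res/shareplesk /shareplesk
dmesg | tail -n 20                  # any o2cb/ocfs2 errors from the kernel?
```

If the manual mount also prints the scsi_hostadapter message but succeeds, the message may be modprobe noise rather than the real failure.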
DRBD is running in dual-primary mode:

root@aztestc1:~# service drbd status
drbd driver loaded OK; device status:
version: 8.3.11 (api:88/proto:86-96)
srcversion: 71955441799F513ACA6DA60
m:res         cs         ro               ds                 p  mounted  fstype
1:shareplesk  Connected  Primary/Primary  UpToDate/UpToDate  C

Pacemaker looks happy, apart from the two failed filesystem starts:

root@aztestc1:~# crm_mon -1
============
Last updated: Mon Dec 10 16:59:40 2012
Last change: Mon Dec 10 16:48:02 2012 via crmd on aztestc3
Stack: cman
Current DC: aztestc3 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
4 Nodes configured, unknown expected votes
10 Resources configured.
============

Online: [ aztestc3 aztestc4 aztestc1 aztestc2 ]

Clone Set: cl_fencing [p_stonith]
    Started: [ aztestc2 aztestc1 aztestc4 aztestc3 ]
Clone Set: cl_o2cb [p_o2cb]
    Started: [ aztestc1 aztestc2 ]
Master/Slave Set: ms_drbd_share_plesk [p_drbd_share_plesk]
    Masters: [ aztestc2 aztestc1 ]

Failed actions:
    p_fs_share_plesk:1_start_0 (node=aztestc2, call=31, rc=1, status=complete): unknown error
    p_fs_share_plesk:0_start_0 (node=aztestc1, call=29, rc=1, status=complete): unknown error

Here is my complete configuration, which does not work:

node aztestc1 \
        attributes standby="off"
node aztestc2 \
        attributes standby="off"
node aztestc3 \
        attributes standby="off"
node aztestc4 \
        attributes standby="off"
primitive p_drbd_share_plesk ocf:linbit:drbd \
        params drbd_resource="shareplesk" \
        op monitor interval="15s" role="Master" timeout="20s" \
        op monitor interval="20s" role="Slave" timeout="20s" \
        op start interval="0" timeout="240s" \
        op stop interval="0" timeout="100s"
primitive p_fs_share_plesk ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/shareplesk" directory="/shareplesk" fstype="ocfs2" options="rw,noatime" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive p_o2cb ocf:pacemaker:o2cb \
        params stack="cman" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        op monitor interval="10" timeout="20"
primitive p_stonith stonith:fence_ec2 \
        params pcmk_host_check="static-list" pcmk_host_list="aztestc1 aztestc2 aztestc3 aztestc4" \
        op monitor interval="600s" timeout="300s" \
        op start start-delay="10s" interval="0"
ms ms_drbd_share_plesk p_drbd_share_plesk \
        meta master-max="2" notify="true" interleave="true" clone-max="2" is-managed="true" target-role="Started"
clone cl_fencing p_stonith \
        meta target-role="Started"
clone cl_fs_share_plesk p_fs_share_plesk \
        meta clone-max="2" interleave="true" notify="true" globally-unique="false" target-role="Started"
clone cl_o2cb p_o2cb \
        meta clone-max="2" interleave="true" globally-unique="false" target-role="Started"
location lo_drbd_plesk3 ms_drbd_share_plesk -inf: aztestc3
location lo_drbd_plesk4 ms_drbd_share_plesk -inf: aztestc4
location lo_fs_plesk3 cl_fs_share_plesk -inf: aztestc3
location lo_fs_plesk4 cl_fs_share_plesk -inf: aztestc4
location lo_o2cb3 cl_o2cb -inf: aztestc3
location lo_o2cb4 cl_o2cb -inf: aztestc4
order o_20plesk inf: ms_drbd_share_plesk:promote cl_o2cb:start
order o_40fs_plesk inf: cl_o2cb cl_fs_share_plesk
property $id="cib-bootstrap-options" \
        stonith-enabled="true" \
        stonith-timeout="180s" \
        no-quorum-policy="freeze" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="cman" \
        last-lrm-refresh="1355179514"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

Here is my previous two-node configuration, which "mostly" worked. Sometimes I had to run "crm resource cleanup cl_fs_share" by hand to get the filesystem to mount, but otherwise everything was fine.
node aztestc1 \
        attributes standby="off"
node aztestc2 \
        attributes standby="off"
primitive p_drbd_share ocf:linbit:drbd \
        params drbd_resource="share" \
        op monitor interval="15s" role="Master" timeout="20s" \
        op monitor interval="20s" role="Slave" timeout="20s" \
        op start interval="0" timeout="240s" \
        op stop interval="0" timeout="100s"
primitive p_fs_share ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/share" directory="/share" fstype="ocfs2" options="rw,noatime" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive p_o2cb ocf:pacemaker:o2cb \
        params stack="cman" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        op monitor interval="10" timeout="20"
primitive p_stonith stonith:fence_ec2 \
        params pcmk_host_check="static-list" pcmk_host_list="aztestc1 aztestc2" \
        op monitor interval="600s" timeout="300s" \
        op start start-delay="10s" interval="0"
ms ms_drbd_share p_drbd_share \
        meta master-max="2" notify="true" interleave="true" clone-max="2" is-managed="true" target-role="Started"
clone cl_fencing p_stonith \
        meta target-role="Started"
clone cl_fs_share p_fs_share \
        meta interleave="true" notify="true" globally-unique="false" target-role="Started"
clone cl_o2cb p_o2cb \
        meta interleave="true" globally-unique="false"
order o_ocfs2 inf: ms_drbd_share:promote cl_o2cb
order o_share inf: cl_o2cb cl_fs_share
property $id="cib-bootstrap-options" \
        stonith-enabled="true" \
        stonith-timeout="180s" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="cman" \
        last-lrm-refresh="1354808774"

Thoughts? Ideas? Suggestions?

Thank you,
-- Art Z.

--
Art Zemon, President
Hen's Teeth Network [http://www.hens-teeth.net/]
for reliable web hosting and programming
(866)HENS-NET / (636)447-3030 ext. 200 / www.hens-teeth.net

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org