Folks, I am still struggling with this problem. At the moment, I cannot get my OCFS2 filesystem to start at all. OCFS2 worked until I expanded my cluster from two nodes to four.
I see this in /var/log/syslog. In particular, note the "FATAL: Module scsi_hostadapter not found." on the last line:

Dec 10 16:48:03 aztestc1 crmd: [2416]: info: do_lrm_rsc_op: Performing key=71:14:0:a766cb8e-4813-483e-a127-d67cf25979ea op=p_fs_share_plesk:0_start_0 )
Dec 10 16:48:03 aztestc1 lrmd: [2413]: debug: on_msg_perform_op:2396: copying parameters for rsc p_fs_share_plesk:0
Dec 10 16:48:03 aztestc1 lrmd: [2413]: debug: on_msg_perform_op: add an operation operation start[29] on p_fs_share_plesk:0 for client 2416, its parameters: CRM_meta_notify_start_resource=[p_fs_share_plesk:0 p_fs_share_plesk:1 ] CRM_meta_notify_stop_resource=[ ] fstype=[ocfs2] CRM_meta_notify_demote_resource=[ ] CRM_meta_notify_master_uname=[ ] CRM_meta_notify_promote_uname=[ ] CRM_meta_timeout=[60000] options=[rw,noatime] CRM_meta_name=[start] CRM_meta_notify_inactive_resource=[p_fs_share_plesk:0 p_fs_share_plesk:1 ] CRM_meta_notify_start_uname=[aztestc1 aztestc2 ] crm_feature_set=[3.0 to the operation list.
Dec 10 16:48:03 aztestc1 lrmd: [2413]: info: rsc:p_fs_share_plesk:0 start[29] (pid 4528)
Dec 10 16:48:03 aztestc1 crmd: [2416]: debug: get_xpath_object: No match for //cib_update_result//diff-added//crm_config in /notify/cib_update_result/diff
Dec 10 16:48:03 aztestc1 crmd: [2416]: debug: get_xpath_object: No match for //cib_update_result//diff-added//crm_config in /notify/cib_update_result/diff
Dec 10 16:48:03 aztestc1 lrmd: [2413]: debug: rsc:p_drbd_share_plesk:1 monitor[16] (pid 4530)
Dec 10 16:48:03 aztestc1 crmd: [2416]: debug: get_xpath_object: No match for //cib_update_result//diff-added//crm_config in /notify/cib_update_result/diff
Dec 10 16:48:03 aztestc1 Filesystem[4528]: INFO: Running start for /dev/drbd/by-res/shareplesk on /shareplesk
Dec 10 16:48:03 aztestc1 drbd[4530]: DEBUG: shareplesk: Calling /usr/sbin/crm_master -Q -l reboot -v 10000
Dec 10 16:48:03 aztestc1 lrmd: [2413]: info: RA output: (p_fs_share_plesk:0:start:stderr) FATAL: Module scsi_hostadapter not found.
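For what it's worth, the same mount can be tried by hand, outside Pacemaker, to see whether it is the filesystem itself or the resource agent that fails. A sketch (untested on this cluster; device and options are copied from the configuration below):

```
# Check that the ocfs2 pieces are in place before blaming the RA.
lsmod | grep -E 'ocfs2|drbd'        # is the ocfs2 kernel module loaded?
cat /sys/fs/ocfs2/cluster_stack     # should read "cman" for this setup
# Try the exact mount the Filesystem RA performs:
mount -t ocfs2 -o rw,noatime /dev/drbd/by-res/shareplesk /shareplesk
dmesg | tail -n 20                  # any o2cb/ocfs2 errors from the kernel?
```

If the manual mount also prints the scsi_hostadapter message but succeeds, the message may be modprobe noise rather than the real failure.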
DRBD is running in dual-primary mode:

root@aztestc1:~# service drbd status
drbd driver loaded OK; device status:
version: 8.3.11 (api:88/proto:86-96)
srcversion: 71955441799F513ACA6DA60
m:res         cs         ro               ds                 p  mounted  fstype
1:shareplesk  Connected  Primary/Primary  UpToDate/UpToDate  C

Pacemaker looks happy, apart from the two failed filesystem starts:

root@aztestc1:~# crm_mon -1
============
Last updated: Mon Dec 10 16:59:40 2012
Last change: Mon Dec 10 16:48:02 2012 via crmd on aztestc3
Stack: cman
Current DC: aztestc3 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
4 Nodes configured, unknown expected votes
10 Resources configured.
============

Online: [ aztestc3 aztestc4 aztestc1 aztestc2 ]

Clone Set: cl_fencing [p_stonith]
    Started: [ aztestc2 aztestc1 aztestc4 aztestc3 ]
Clone Set: cl_o2cb [p_o2cb]
    Started: [ aztestc1 aztestc2 ]
Master/Slave Set: ms_drbd_share_plesk [p_drbd_share_plesk]
    Masters: [ aztestc2 aztestc1 ]

Failed actions:
    p_fs_share_plesk:1_start_0 (node=aztestc2, call=31, rc=1, status=complete): unknown error
    p_fs_share_plesk:0_start_0 (node=aztestc1, call=29, rc=1, status=complete): unknown error

Here is my complete configuration, which does not work:

node aztestc1 \
        attributes standby="off"
node aztestc2 \
        attributes standby="off"
node aztestc3 \
        attributes standby="off"
node aztestc4 \
        attributes standby="off"
primitive p_drbd_share_plesk ocf:linbit:drbd \
        params drbd_resource="shareplesk" \
        op monitor interval="15s" role="Master" timeout="20s" \
        op monitor interval="20s" role="Slave" timeout="20s" \
        op start interval="0" timeout="240s" \
        op stop interval="0" timeout="100s"
primitive p_fs_share_plesk ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/shareplesk" directory="/shareplesk" fstype="ocfs2" options="rw,noatime" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive p_o2cb ocf:pacemaker:o2cb \
        params stack="cman" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        op monitor interval="10" timeout="20"
primitive p_stonith stonith:fence_ec2 \
        params pcmk_host_check="static-list" pcmk_host_list="aztestc1 aztestc2 aztestc3 aztestc4" \
        op monitor interval="600s" timeout="300s" \
        op start start-delay="10s" interval="0"
ms ms_drbd_share_plesk p_drbd_share_plesk \
        meta master-max="2" notify="true" interleave="true" clone-max="2" is-managed="true" target-role="Started"
clone cl_fencing p_stonith \
        meta target-role="Started"
clone cl_fs_share_plesk p_fs_share_plesk \
        meta clone-max="2" interleave="true" notify="true" globally-unique="false" target-role="Started"
clone cl_o2cb p_o2cb \
        meta clone-max="2" interleave="true" globally-unique="false" target-role="Started"
location lo_drbd_plesk3 ms_drbd_share_plesk -inf: aztestc3
location lo_drbd_plesk4 ms_drbd_share_plesk -inf: aztestc4
location lo_fs_plesk3 cl_fs_share_plesk -inf: aztestc3
location lo_fs_plesk4 cl_fs_share_plesk -inf: aztestc4
location lo_o2cb3 cl_o2cb -inf: aztestc3
location lo_o2cb4 cl_o2cb -inf: aztestc4
order o_20plesk inf: ms_drbd_share_plesk:promote cl_o2cb:start
order o_40fs_plesk inf: cl_o2cb cl_fs_share_plesk
property $id="cib-bootstrap-options" \
        stonith-enabled="true" \
        stonith-timeout="180s" \
        no-quorum-policy="freeze" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="cman" \
        last-lrm-refresh="1355179514"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

Here is my previous two-node configuration, which "mostly" worked. Sometimes I had to run "crm resource cleanup cl_fs_share" by hand to get the filesystem to mount, but otherwise everything was fine.
node aztestc1 \
        attributes standby="off"
node aztestc2 \
        attributes standby="off"
primitive p_drbd_share ocf:linbit:drbd \
        params drbd_resource="share" \
        op monitor interval="15s" role="Master" timeout="20s" \
        op monitor interval="20s" role="Slave" timeout="20s" \
        op start interval="0" timeout="240s" \
        op stop interval="0" timeout="100s"
primitive p_fs_share ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/share" directory="/share" fstype="ocfs2" options="rw,noatime" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive p_o2cb ocf:pacemaker:o2cb \
        params stack="cman" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        op monitor interval="10" timeout="20"
primitive p_stonith stonith:fence_ec2 \
        params pcmk_host_check="static-list" pcmk_host_list="aztestc1 aztestc2" \
        op monitor interval="600s" timeout="300s" \
        op start start-delay="10s" interval="0"
ms ms_drbd_share p_drbd_share \
        meta master-max="2" notify="true" interleave="true" clone-max="2" is-managed="true" target-role="Started"
clone cl_fencing p_stonith \
        meta target-role="Started"
clone cl_fs_share p_fs_share \
        meta interleave="true" notify="true" globally-unique="false" target-role="Started"
clone cl_o2cb p_o2cb \
        meta interleave="true" globally-unique="false"
order o_ocfs2 inf: ms_drbd_share:promote cl_o2cb
order o_share inf: cl_o2cb cl_fs_share
property $id="cib-bootstrap-options" \
        stonith-enabled="true" \
        stonith-timeout="180s" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="cman" \
        last-lrm-refresh="1354808774"

Thoughts? Ideas? Suggestions?

Thank you,
-- Art Z.

--
Art Zemon, President
Hen's Teeth Network [http://www.hens-teeth.net/]
for reliable web hosting and programming
(866)HENS-NET / (636)447-3030 ext. 200 / www.hens-teeth.net

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org