In the version you have, the clone placement wasn't completely stable
("stable" being my term for the behavior you want :-).
This will be addressed in SP2, which is due out "soon", IIRC.

If you can't wait for SP2, try either the latest interim build or 2.1.3.
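In case it's useful, here's one way to double-check which build you're actually running before upgrading (a sketch: `rpm` assumes an RPM-based distro like SLES, and `cibadmin` has to be run on a cluster node):

```shell
# Installed heartbeat package version (RPM-based distros such as SLES)
rpm -q heartbeat

# The CRM also records the running version in the CIB as "dc-version"
cibadmin -Q -o crm_config | grep dc-version
```

The dc-version string includes the exact changeset the DC was built from, which is handy when reporting problems against interim builds.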

On Dec 28, 2007 11:03 AM, Takekazu Okamoto <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I have a question about setting up a shared storage resource.
> Could anybody help me? I would appreciate it.
>
> I have two nodes which share storage.
> OCFS2 file systems are on it, and they are managed by EVMS.
>
> >> following is my configuration
> node2:~ # crm_mon -1
>
>
> ============
> Last updated: Fri Dec 28 18:06:54 2007
> Current DC: node2 (9e7c6cd5-35d2-4256-9003-8b299a784a60)
> 2 Nodes configured.
> 4 Resources configured.
> ============
>
> Node: node1 (00b9f7f9-c7c2-492e-ad11-fc72c761e1a7): online
> Node: node2 (9e7c6cd5-35d2-4256-9003-8b299a784a60): online
>
> Clone Set: stonithcloneset
>     stonithclone:0      (stonith:external/ssh): Started node1
>     stonithclone:1      (stonith:external/ssh): Started node2
> Clone Set: evmscloneset
>     evmsclone:0 (heartbeat::ocf:EvmsSCC):       Started node1
>     evmsclone:1 (heartbeat::ocf:EvmsSCC):       Started node2
> Clone Set: imagestorecloneset
>     imagestoreclone:0   (heartbeat::ocf:Filesystem):    Started node1
>     imagestoreclone:1   (heartbeat::ocf:Filesystem):    Started node2
> Clone Set: configstorecloneset
>     configstoreclone:0  (heartbeat::ocf:Filesystem):    Started node1
>     configstoreclone:1  (heartbeat::ocf:Filesystem):    Started node2
> <<
>
> And there are order constraints.
>
> >> constraints
> node2:~ # cibadmin -Q -o constraints
>  <constraints>
>    <rsc_order id="evmsorderconstraints-01" from="imagestorecloneset"
> type="after" to="evmscloneset"/>
>    <rsc_order id="evmsorderconstraints-02" from="configstorecloneset"
> type="after" to="evmscloneset"/>
>  </constraints>
> <<
>
> On this system, I tried to emulate a failure of node1 by killing the
> heartbeat process on node1.
>
> >> my emulation command
> node2:~ # ssh node1 pkill heartbeat
> <<
>
> Then node2 detects that node1 is dead.
>
> >> /var/log/messages on node2
> Dec 28 18:18:22 node2 heartbeat: [4231]: WARN: node node1: is dead
> <<
>
> I don't want any resources on node2 to be stopped,
> because node2 is still alive without any problems.
> However, the cluster tried to stop the clone Filesystem resources on node2.
>
> >> following is the portion of /var/log/messages
> Dec 28 18:18:23 node2 pengine: [4593]: info: native_start_constraints:
> Ordering evmsclone:1_start_0 after node1 recovery
> Dec 28 18:18:23 node2 pengine: [4593]: info: native_stop_constraints:
> evmsclone:0_stop_0 is implicit after node1 is fenced
> Dec 28 18:18:23 node2 pengine: [4593]: info: native_stop_constraints:
> Re-creating actions for evmscloneset
> Dec 28 18:18:23 node2 pengine: [4593]: notice: NoRoleChange: Leave
> resource evmsclone:1 (node2)
> Dec 28 18:18:23 node2 pengine: [4593]: notice: StopRsc:   node1 Stop
> evmsclone:0
> Dec 28 18:18:23 node2 pengine: [4593]: info: native_start_constraints:
> Ordering imagestoreclone:1_start_0 after node1 recovery
> Dec 28 18:18:23 node2 pengine: [4593]: info: native_stop_constraints:
> imagestoreclone:0_stop_0 is implicit after node1 is fenced
> Dec 28 18:18:23 node2 pengine: [4593]: info: native_stop_constraints:
> Re-creating actions for imagestorecloneset
> Dec 28 18:18:23 node2 pengine: [4593]: notice: NoRoleChange: Leave
> resource imagestoreclone:1   (node2)
> Dec 28 18:18:23 node2 pengine: [4593]: notice: StopRsc:   node1 Stop
> imagestoreclone:0
> Dec 28 18:18:23 node2 pengine: [4593]: info: native_start_constraints:
> Ordering configstoreclone:1_start_0 after node1 recovery
> Dec 28 18:18:23 node2 pengine: [4593]: info: native_stop_constraints:
> configstoreclone:0_stop_0 is implicit after node1 is fenced
> Dec 28 18:18:23 node2 pengine: [4593]: info: native_stop_constraints:
> Re-creating actions for configstorecloneset
> Dec 28 18:18:23 node2 pengine: [4593]: notice: NoRoleChange: Leave
> resource configstoreclone:1  (node2)
> Dec 28 18:18:23 node2 pengine: [4593]: notice: StopRsc:   node1 Stop
> configstoreclone:0
>
> Dec 28 18:18:23 node2 tengine: [4592]: info: send_rsc_command:
> Initiating action 72: evmsclone:1_pre_notify_stop_0 on node2
> Dec 28 18:18:23 node2 tengine: [4592]: info: send_rsc_command:
> Initiating action 75: imagestoreclone:1_pre_notify_stop_0 on node2
> Dec 28 18:18:23 node2 tengine: [4592]: info: send_rsc_command:
> Initiating action 77: configstoreclone:1_pre_notify_stop_0 on node2
>
> Dec 28 18:18:23 node2 Filesystem[24615]: [24682]: INFO: Running notify
> for /dev/evms/vmsharedclustercontainer/imagestore on
> /var/lib/xen/images
> Dec 28 18:18:23 node2 Filesystem[24625]: [24693]: INFO: Running notify
> for /dev/evms/vmsharedclustercontainer/configstore on /etc/xen/vm
> Dec 28 18:18:23 node2 Filesystem[24615]: [24702]: INFO:
> C996482914C647BEBC470F8DB4293A3E: notify: pre for stop
> Dec 28 18:18:23 node2 Filesystem[24615]: [24704]: INFO:
> C996482914C647BEBC470F8DB4293A3E: notify active: node2
> Dec 28 18:18:23 node2 Filesystem[24625]: [24705]: INFO:
> 68AEC97644E641DB8B6201C474199FD6: notify: pre for stop
> Dec 28 18:18:23 node2 Filesystem[24615]: [24706]: INFO:
> C996482914C647BEBC470F8DB4293A3E: notify stop: node1 node2
> Dec 28 18:18:23 node2 Filesystem[24615]: [24707]: INFO:
> C996482914C647BEBC470F8DB4293A3E: notify start: node2
> Dec 28 18:18:23 node2 Filesystem[24625]: [24708]: INFO:
> 68AEC97644E641DB8B6201C474199FD6: notify active: node2
> Dec 28 18:18:23 node2 Filesystem[24615]: [24709]: INFO:
> C996482914C647BEBC470F8DB4293A3E: ignoring pre-notify for stop.
> Dec 28 18:18:23 node2 crmd: [4530]: info: process_lrm_event: LRM
> operation imagestoreclone:1_notify_0 (call=22, rc=0) complete
> Dec 28 18:18:23 node2 tengine: [4592]: info: match_graph_event: Action
> imagestoreclone:1_pre_notify_stop_0 (75) confirmed on node2 (rc=0)
> Dec 28 18:18:23 node2 tengine: [4592]: info: te_pseudo_action: Pseudo
> action 41 fired and confirmed
> Dec 28 18:18:23 node2 tengine: [4592]: info: te_pseudo_action: Pseudo
> action 38 fired and confirmed
> Dec 28 18:18:23 node2 tengine: [4592]: info: send_rsc_command:
> Initiating action 29: imagestoreclone:1_stop_0 on node2
> Dec 28 18:18:23 node2 crmd: [4530]: info: do_lrm_rsc_op: Performing
> op=imagestoreclone:1_stop_0
> key=29:10:8b545ce3-0d15-446e-aed3-3c89c8bfd392)
> Dec 28 18:18:23 node2 lrmd: [4527]: info: perform_op:2792: operations
> on resource imagestoreclone:1 already delayed
> Dec 28 18:18:23 node2 Filesystem[24625]: [24710]: INFO:
> 68AEC97644E641DB8B6201C474199FD6: notify stop: node1 node2
> Dec 28 18:18:23 node2 Filesystem[24625]: [24718]: INFO:
> 68AEC97644E641DB8B6201C474199FD6: notify start: node2
> Dec 28 18:18:23 node2 Filesystem[24625]: [24719]: INFO:
> 68AEC97644E641DB8B6201C474199FD6: ignoring pre-notify for stop.
> Dec 28 18:18:23 node2 crmd: [4530]: info: process_lrm_event: LRM
> operation configstoreclone:1_notify_0 (call=23, rc=0) complete
> Dec 28 18:18:23 node2 tengine: [4592]: info: match_graph_event: Action
> configstoreclone:1_pre_notify_stop_0 (77) confirmed on node2 (rc=0)
> Dec 28 18:18:23 node2 tengine: [4592]: info: te_pseudo_action: Pseudo
> action 56 fired and confirmed
> Dec 28 18:18:23 node2 tengine: [4592]: info: te_pseudo_action: Pseudo
> action 53 fired and confirmed
> Dec 28 18:18:23 node2 tengine: [4592]: info: send_rsc_command:
> Initiating action 44: configstoreclone:1_stop_0 on node2
> Dec 28 18:18:23 node2 crmd: [4530]: info: do_lrm_rsc_op: Performing
> op=configstoreclone:1_stop_0
> key=44:10:8b545ce3-0d15-446e-aed3-3c89c8bfd392)
> Dec 28 18:18:23 node2 lrmd: [4527]: info: rsc:configstoreclone:1: stop
> Dec 28 18:18:23 node2 crmd: [4530]: info: process_lrm_event: LRM
> operation configstoreclone:1_monitor_20000 (call=20, rc=-2) Cancelled
> Dec 28 18:18:23 node2 crmd: [4530]: info: process_lrm_event: LRM
> operation imagestoreclone:1_monitor_20000 (call=17, rc=-2) Cancelled
> Dec 28 18:18:23 node2 Filesystem[24731]: [24778]: INFO: Running stop
> for /dev/evms/vmsharedclustercontainer/configstore on /etc/xen/vm
> Dec 28 18:18:23 node2 Filesystem[24731]: [24788]: INFO: Trying to
> unmount /etc/xen/vm
> Dec 28 18:18:24 node2 lrmd: [4527]: info: rsc:imagestoreclone:1: stop
> Dec 28 18:18:24 node2 Filesystem[24791]: [24823]: INFO: Running stop
> for /dev/evms/vmsharedclustercontainer/imagestore on
> /var/lib/xen/images
> Dec 28 18:18:24 node2 Filesystem[24791]: [24833]: INFO: Trying to
> unmount /var/lib/xen/images
> <<
>
> As you can see, the volumes on node2 were unmounted.
>
> If there are NOT any order constraints, the clone Filesystem resources on
> node2 do not stop.
> Both volumes stay mounted.
>
> Could anybody guide me on how to configure an OCFS2 Filesystem resource
> that depends on EVMS?
>
> >> My configuration
> cibadmin -Q
>
>  <cib admin_epoch="0" have_quorum="true" ignore_dtd="false"
> num_peers="2" cib_feature_revision="2.0" generated="true"
> num_updates="56" epoch="178" cib-last-written="Fri
>  Dec 28 17:42:33 2007" ccm_transition="2"
> dc_uuid="00b9f7f9-c7c2-492e-ad11-fc72c761e1a7">
>    <configuration>
>      <crm_config>
>        <cluster_property_set id="cib-bootstrap-options">
>          <attributes>
>            <nvpair id="cib-bootstrap-options-dc-version"
> name="dc-version" value="2.1.2.2007121923-node:
> dde7e7e4f00ae1b9f95ae2bd404d88d2beceab40"/>
>            <nvpair id="cib-bootstrap-options-stonith-enabled"
> name="stonith-enabled" value="true"/>
>            <nvpair id="cib-bootstrap-options-stop-orphan-resources"
> name="stop-orphan-resources" value="true"/>
>            <nvpair id="cib-bootstrap-options-no-quorum-policy"
> name="no-quorum-policy" value="stop"/>
>            <nvpair id="cib-bootstrap-options-last-lrm-refresh"
> name="last-lrm-refresh" value="1198822112"/>
>          </attributes>
>        </cluster_property_set>
>      </crm_config>
>      <nodes>
>        <node id="00b9f7f9-c7c2-492e-ad11-fc72c761e1a7" uname="node1"
> type="normal"/>
>        <node id="9e7c6cd5-35d2-4256-9003-8b299a784a60" uname="node2"
> type="normal"/>
>      </nodes>
>      <resources>
>        <clone globally_unique="false" id="stonithcloneset">
>          <primitive id="stonithclone" class="stonith"
> type="external/ssh" provider="heartbeat">
>            <operations>
>              <op name="monitor" interval="5s"
> id="stonithclone-op-01"/>
>            </operations>
>            <instance_attributes id="stonithclone">
>              <attributes>
>                <nvpair id="stonithclone-01" name="hostlist"
> value="node1,node2"/>
>              </attributes>
>            </instance_attributes>
>          </primitive>
>          <meta_attributes id="stonithcloneset_meta_attrs">
>            <attributes>
>              <nvpair name="target_role"
> id="stonithcloneset_metaattr_target_role" value="started"/>
>            </attributes>
>          </meta_attributes>
>        </clone>
>        <clone id="evmscloneset">
>          <meta_attributes id="evmscloneset_meta_attrs">
>            <attributes>
>              <nvpair id="evmscloneset_metaattr_target_role"
> name="target_role" value="started"/>
>              <nvpair id="evmscloneset_metaattr_clone_max"
> name="clone_max" value="2"/>
>              <nvpair id="evmscloneset_metaattr_clone_node_max"
> name="clone_node_max" value="1"/>
>              <nvpair id="evmscloneset_metaattr_notify" name="notify"
> value="true"/>
>              <nvpair id="evmscloneset_metaattr_globally_unique"
> name="globally_unique" value="false"/>
>            </attributes>
>          </meta_attributes>
>          <primitive id="evmsclone" class="ocf" type="EvmsSCC"
> provider="heartbeat"/>
>        </clone>
>        <clone id="imagestorecloneset">
>          <meta_attributes id="imagestorecloneset_meta_attrs">
>            <attributes>
>              <nvpair name="target_role"
> id="imagestorecloneset_metaattr_target_role" value="started"/>
>              <nvpair id="imagestorecloneset_metaattr_clone_max"
> name="clone_max" value="2"/>
>              <nvpair id="imagestorecloneset_metaattr_clone_node_max"
> name="clone_node_max" value="1"/>
>              <nvpair id="imagestorecloneset_metaattr_notify"
> name="notify" value="true"/>
>              <nvpair id="imagestorecloneset_metaattr_globally_unique"
> name="globally_unique" value="false"/>
>            </attributes>
>          </meta_attributes>
>          <primitive id="imagestoreclone" class="ocf" type="Filesystem"
> provider="heartbeat">
>            <instance_attributes id="imagestoreclone_instance_attrs">
>              <attributes>
>                <nvpair id="dc97d7bc-a2a5-426a-b1da-44b708810ce7"
> name="device" value="/dev/evms/vmsharedclustercontainer/imagestore"/>
>                <nvpair id="8b402efb-d209-43ea-bcd1-fe3ee65fb079"
> name="directory" value="/var/lib/xen/images"/>
>                <nvpair id="0cc389c9-0d5d-42a3-9eae-84a819dda735"
> name="fstype" value="ocfs2"/>
>              </attributes>
>            </instance_attributes>
>            <operations>
>              <op id="b15208c5-ef3c-406c-978e-0ca5b2539338"
> name="monitor" interval="20" timeout="60"/>
>              <op id="93464d88-34f8-46f5-8b69-3bcca1665d7b" name="stop"
> timeout="60"/>
>              <op id="9c75dd74-215c-40d8-961b-9aa72a7f1e2d" name="start"
> timeout="60"/>
>            </operations>
>          </primitive>
>        </clone>
>        <clone id="configstorecloneset">
>          <meta_attributes id="configstorecloneset_meta_attrs">
>            <attributes>
>              <nvpair name="target_role"
> id="configstorecloneset_metaattr_target_role" value="started"/>
>              <nvpair id="configstorecloneset_metaattr_clone_max"
> name="clone_max" value="2"/>
>              <nvpair id="configstorecloneset_metaattr_clone_node_max"
> name="clone_node_max" value="1"/>
>              <nvpair id="configstorecloneset_metaattr_notify"
> name="notify" value="true"/>
>              <nvpair id="configstorecloneset_metaattr_globally_unique"
> name="globally_unique" value="false"/>
>            </attributes>
>          </meta_attributes>
>          <primitive id="configstoreclone" class="ocf" type="Filesystem"
> provider="heartbeat">
>            <instance_attributes id="configstoreclone_instance_attrs">
>              <attributes>
>                <nvpair id="a794a5db-aab2-4fe6-b395-7749c8225d55"
> name="device" value="/dev/evms/vmsharedclustercontainer/configstore"/>
>                <nvpair id="bd9e4513-6544-4063-8aeb-02470a7bef7a"
> name="directory" value="/etc/xen/vm"/>
>                <nvpair id="3d0c237a-fcd2-47b6-83ef-d0744ae3b460"
> name="fstype" value="ocfs2"/>
>              </attributes>
>            </instance_attributes>
>            <operations>
>              <op id="2504373d-673e-4eb5-9c2f-59d256ede3d8"
> name="monitor" interval="20" timeout="60"/>
>              <op id="9898a6f6-ac64-47b5-b82e-0595b16e0b99" name="stop"
> timeout="60"/>
>              <op id="c3187abb-86f1-455f-8312-9b682f45def8" name="start"
> timeout="60"/>
>            </operations>
>          </primitive>
>        </clone>
>      </resources>
>      <constraints>
>        <rsc_order id="evmsorderconstraints-01"
> from="imagestorecloneset" type="after" to="evmscloneset"/>
>        <rsc_order id="evmsorderconstraints-02"
> from="configstorecloneset" type="after" to="evmscloneset"/>
>      </constraints>
>    </configuration>
> <<
>
> Heartbeat is 2.1.2.
> My Linux distro is SLES 10 SP1 x86_64 with the latest patches.
>
>
> Thank you,
> Takekazu Okamoto
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>