On 5/10/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> On 5/8/07, Rene Purcell <[EMAIL PROTECTED]> wrote:
> > Ok.. so there's something I probably don't understand.. each node should
> > have the right privileges (if we are talking about filesystem permissions)
> > because when one of my two nodes fails (node1), the resource (vm1) can
> > start on the available node (node2).. the problem happens when the failed
> > node (node1) comes back online.. the resource (vm1) is supposed to shut
> > down on node2 and restart on node1, isn't it ?
>
> well your biggest problem is that it's not starting anywhere right now
> you need to fix that first

In fact the resource is able to start on each node.. I just sent another
mail with all the config files I used and the log.. maybe you'll be able to
understand what's happening here..!

Thanks

> > We already tried this setup with a 32-bit SLES and everything was
> > working.. I just want to know where the problem can be.. is it my
> > configuration ? it's supposed to be exactly the same as my old setup..
>
> clearly something is different :-)
>
> > is it the 64-bit version of SLES ?
>
> unlikely
>
> > when I set in the default config:
> > symmetric cluster = yes
> > default resource stickiness = INFINITY
> >
> > and I add a place constraint, score INFINITY, expression #uname eq node1
> >
> > my resource is not supposed to go back to its original node ?? like if I
> > set the auto_failback option in heartbeat V1 ??
> >
> > I'm sorry if my previous post was not clear..
> >
> > On 5/8/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> > > On 5/8/07, Rene Purcell <[EMAIL PROTECTED]> wrote:
> > > > On 5/8/07, Rene Purcell <[EMAIL PROTECTED]> wrote:
> > > > > On 5/8/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> > > > > > grep ERROR logfile
> > > > > >
> > > > > > try this for starters:
> > > > > >
> > > > > > May 7 16:31:41 qclsles01 lrmd: [5020]: info: RA output: (resource_qclvmsles02:stop:stderr) Error: the domain 'resource_qclvmsles02' does not exist.
> > > > > > May 7 16:31:41 qclsles01 lrmd: [5020]: info: RA output: (resource_qclvmsles02:stop:stdout) Domain resource_qclvmsles02 terminated
> > > > > > May 7 16:31:41 qclsles01 crmd: [22028]: WARN: process_lrm_event: lrm.c LRM operation (35) stop_0 on resource_qclvmsles02 Error: (4) insufficient privileges
> > > > >
> > > > > yup I saw that.. it's weird. Heartbeat shuts down the VM, then reports
> > > > > these errors.. and if I clean up the resource it restarts on the
> > > > > correct node..
> > > > > There should be something I missed lol
> > > > >
> > > > > > On 5/7/07, Rene Purcell <[EMAIL PROTECTED]> wrote:
> > > > > > > I would like to know if someone has tried the Novell setup described in
> > > > > > > "http://www.novell.com/linux/technical_library/has.pdf" with an x86_64
> > > > > > > arch ?
> > > > > > >
> > > > > > > I've tested this setup with a classic x86 arch and everything was ok...
> > > > > > > I double-checked my config and everything looks good, but my VM never
> > > > > > > starts on its original node when it comes back online... and I can't
> > > > > > > find why!
> > > > > > >
> > > > > > > here's the log when my node1 comes back.. we can see the VM shutting
> > > > > > > down, and after that nothing happens on the other node..
> > > > > > >
> > > > > > > May 7 16:31:25 qclsles01 cib: [22024]: info: cib_diff_notify: notify.c Update (client: 6403, call:13): 0.65.1020 -> 0.65.1021 (ok)
> > > > > > > May 7 16:31:25 qclsles01 tengine: [22591]: info: te_update_diff: callbacks.c Processing diff (cib_update): 0.65.1020 -> 0.65.1021
> > > > > > > May 7 16:31:25 qclsles01 tengine: [22591]: info: extract_event: events.c Aborting on transient_attributes changes
> > > > > > > May 7 16:31:25 qclsles01 tengine: [22591]: info: update_abort_priority: utils.c Abort priority upgraded to 1000000
> > > > > > > May 7 16:31:25 qclsles01 tengine: [22591]: info: update_abort_priority: utils.c Abort action 0 superceeded by 2
> > > > > > > May 7 16:31:26 qclsles01 cib: [22024]: info: activateCibXml: io.c CIB size is 161648 bytes (was 158548)
> > > > > > > May 7 16:31:26 qclsles01 cib: [22024]: info: cib_diff_notify: notify.c Update (client: 6403, call:14): 0.65.1021 -> 0.65.1022 (ok)
> > > > > > > May 7 16:31:26 qclsles01 haclient: on_event:evt:cib_changed
> > > > > > > May 7 16:31:26 qclsles01 tengine: [22591]: info: te_update_diff: callbacks.c Processing diff (cib_update): 0.65.1021 -> 0.65.1022
> > > > > > > May 7 16:31:26 qclsles01 tengine: [22591]: info: match_graph_event: events.c Action resource_qclvmsles02_stop_0 (9) confirmed
> > > > > > > May 7 16:31:26 qclsles01 cib: [25889]: info: write_cib_contents: io.c Wrote version 0.65.1022 of the CIB to disk (digest: e71c271759371d44c4bad24d50b2421d)
> > > > > > > May 7 16:31:39 qclsles01 kernel: xenbr0: port 3(vif12.0) entering disabled state
> > > > > > > May 7 16:31:39 qclsles01 kernel: device vif12.0 left promiscuous mode
> > > > > > > May 7 16:31:39 qclsles01 kernel: xenbr0: port 3(vif12.0) entering disabled state
> > > > > > > May 7 16:31:39 qclsles01 logger: /etc/xen/scripts/vif-bridge: offline XENBUS_PATH=backend/vif/12/0
> > > > > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/12/768
> > > > > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/12/832
> > > > > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/12/5632
> > > > > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/vif-bridge: brctl delif xenbr0 vif12.0 failed
> > > > > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/vif-bridge: ifconfig vif12.0 down failed
> > > > > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge offline for vif12.0, bridge xenbr0.
> > > > > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vbd/12/5632
> > > > > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vbd/12/768
> > > > > > > May 7 16:31:40 qclsles01 ifdown: vif12.0
> > > > > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vif/12/0
> > > > > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vbd/12/832
> > > > > > > May 7 16:31:40 qclsles01 ifdown: Interface not available and no configuration found.
> > > > > > > May 7 16:31:41 qclsles01 lrmd: [5020]: info: RA output: (resource_qclvmsles02:stop:stderr) Error: the domain 'resource_qclvmsles02' does not exist.
> > > > > > > May 7 16:31:41 qclsles01 lrmd: [5020]: info: RA output: (resource_qclvmsles02:stop:stdout) Domain resource_qclvmsles02 terminated
> > > > > > > May 7 16:31:41 qclsles01 crmd: [22028]: WARN: process_lrm_event: lrm.c LRM operation (35) stop_0 on resource_qclvmsles02 Error: (4) insufficient privileges
> > > > > > > May 7 16:31:41 qclsles01 cib: [22024]: info: activateCibXml: io.c CIB size is 164748 bytes (was 161648)
> > > > > > > May 7 16:31:41 qclsles01 crmd: [22028]: info: do_state_transition: fsa.c qclsles01: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
> > > > > > > May 7 16:31:41 qclsles01 tengine: [22591]: info: te_update_diff: callbacks.c Processing diff (cib_update): 0.65.1022 -> 0.65.1023
> > > > > > > May 7 16:31:41 qclsles01 cib: [22024]: info: cib_diff_notify: notify.c Update (client: 22028, call:100): 0.65.1022 -> 0.65.1023 (ok)
> > > > > > > May 7 16:31:41 qclsles01 crmd: [22028]: info: do_state_transition: fsa.c All 2 cluster nodes are eligable to run resources.
> > > > > > > May 7 16:31:41 qclsles01 tengine: [22591]: ERROR: match_graph_event: events.c Action resource_qclvmsles02_stop_0 on qclsles01 failed (target: 0 vs. rc: 4): Error
> > > > > > > May 7 16:31:41 qclsles01 tengine: [22591]: info: match_graph_event: events.c Action resource_qclvmsles02_stop_0 (10) confirmed
> > > > > > > May 7 16:31:41 qclsles01 tengine: [22591]: info: run_graph: graph.c ====================================================
> > > > > > > May 7 16:31:41 qclsles01 tengine: [22591]: notice: run_graph: graph.c Transition 12: (Complete=3, Pending=0, Fired=0, Skipped=2, Incomplete=0)
> > > > > > > May 7 16:31:41 qclsles01 haclient: on_event:evt:cib_changed
> > > > > > > May 7 16:31:41 qclsles01 cib: [26190]: info: write_cib_contents: io.c Wrote version 0.65.1023 of the CIB to disk (digest: c80326e44b5a106fe9a384240c4a3cc9)
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_pe_message: [generation] <cib generated="true" admin_epoch="0" have_quorum="true" num_peers="2" cib_feature_revision="1.3" ccm_transition="10" dc_uuid="46ef9c7b-5f6e-4cc0-a0bb-94227b605170" epoch="65" num_updates="1023"/>
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: unpack_config: unpack.c No value specified for cluster preference: default_action_timeout
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c Default stickiness: 1000000
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c Default failure stickiness: -500
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c STONITH of failed nodes is disabled
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c STONITH will reboot nodes
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c Cluster is symmetric - resources can run anywhere by default
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c On loss of CCM Quorum: Stop ALL resources
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c Orphan resources are stopped
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c Orphan resource actions are stopped
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: unpack_config: unpack.c No value specified for cluster preference: remove_after_stop
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c Stopped resources are removed from the status section: false
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c By default resources are managed
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: determine_online_status: unpack.c Node qclsles02 is online
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: determine_online_status: unpack.c Node qclsles01 is online
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: unpack_rsc_op: unpack.c Processing failed op (resource_qclvmsles02_stop_0) for resource_qclvmsles02 on qclsles01
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: unpack_rsc_op: unpack.c Handling failed stop for resource_qclvmsles02 on qclsles01
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_orphan_resource: Orphan resource <lrm_resource id="resource_NFS" type="nfs" class="lsb" provider="heartbeat">
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_orphan_resource: Orphan resource <lrm_rsc_op id="resource_NFS_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" transition_key="27:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" transition_magic="0:0;27:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" call_id="9" crm_feature_set="1.0.6" rc_code="0" op_status="0" interval="0" op_digest="08b7001b97ccdaa1ca23a9f165256bc1"/>
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_orphan_resource: Orphan resource <lrm_rsc_op id="resource_NFS_stop_0" operation="stop" crm-debug-origin="build_active_RAs" transition_key="28:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" transition_magic="0:0;28:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" call_id="10" crm_feature_set="1.0.6" rc_code="0" op_status="0" interval="0" op_digest="08b7001b97ccdaa1ca23a9f165256bc1"/>
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_orphan_resource: Orphan resource </lrm_resource>
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: process_orphan_resource: unpack.c Nothing known about resource resource_NFS running on qclsles01
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: create_fake_resource: Orphan resource <lrm_resource id="resource_NFS" type="nfs" class="lsb" provider="heartbeat">
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: create_fake_resource: Orphan resource <lrm_rsc_op id="resource_NFS_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" transition_key="27:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" transition_magic="0:0;27:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" call_id="9" crm_feature_set="1.0.6" rc_code="0" op_status="0" interval="0" op_digest="08b7001b97ccdaa1ca23a9f165256bc1"/>
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: create_fake_resource: Orphan resource <lrm_rsc_op id="resource_NFS_stop_0" operation="stop" crm-debug-origin="build_active_RAs" transition_key="28:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" transition_magic="0:0;28:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" call_id="10" crm_feature_set="1.0.6" rc_code="0" op_status="0" interval="0" op_digest="08b7001b97ccdaa1ca23a9f165256bc1"/>
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: create_fake_resource: Orphan resource </lrm_resource>
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_orphan_resource: unpack.c Making sure orphan resource_NFS is stopped
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: resource_qclvmsles01 (heartbeat::ocf:Xen): Started qclsles01
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: resource_qclvmsles02 (heartbeat::ocf:Xen): Started qclsles01 (unmanaged) FAILED
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: resource_NFS (lsb:nfs): Stopped
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: notice: NoRoleChange: native.c Leave resource resource_qclvmsles01 (qclsles01)
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: notice: NoRoleChange: native.c Move resource resource_qclvmsles02 (qclsles01 -> qclsles02)
> > > > > > > May 7 16:31:41 qclsles01 crmd: [22028]: info: do_state_transition: fsa.c qclsles01: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: custom_action: utils.c Action resource_qclvmsles02_stop_0 stop is for resource_qclvmsles02 (unmanaged)
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: custom_action: utils.c Action resource_qclvmsles02_start_0 start is for resource_qclvmsles02 (unmanaged)
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: notice: stage8: allocate.c Created transition graph 13.
> > > > > > > May 7 16:31:41 qclsles01 tengine: [22591]: info: unpack_graph: unpack.c Unpacked transition 13: 0 actions in 0 synapses
> > > > > > > May 7 16:31:41 qclsles01 crmd: [22028]: info: do_state_transition: fsa.c qclsles01: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: process_pe_message: pengine.c No value specified for cluster preference: pe-input-series-max
> > > > > > > May 7 16:31:41 qclsles01 tengine: [22591]: info: run_graph: graph.c Transition 13: (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0)
> > > > > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_pe_message: pengine.c Transition 13: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-100.bz2
> > > > > > > May 7 16:31:41 qclsles01 tengine: [22591]: info: notify_crmd: actions.c Transition 13 status: te_complete - (null)
> > > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > > --
> > > > > > > René Jr Purcell
> > > > > > > Project Manager, Security and Systems
> > > > > > > Techno Centre Logiciels Libres, http://www.tc2l.ca/
> > > > > > > Phone: (418) 681-2929 #124
> > > > > > > _______________________________________________
> > > > > > > Linux-HA mailing list
> > > > > > > Linux-HA@lists.linux-ha.org
> > > > > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > > > > > See also: http://linux-ha.org/ReportingProblems
> > > >
> > > > ah, and how am I supposed to know which node is concerned in the log ?
> > > > I can read:
> > > >
> > > > "May 7 16:31:41 qclsles01 crmd: [22028]: WARN: process_lrm_event: lrm.c LRM operation (35) stop_0 on resource_qclvmsles02 Error: (4) insufficient privileges"
> > > >
> > > > on my first node and the same message, except for the hostname, on my
> > > > second node.. so which one has a privileges problem ?
> > >
> > > both

--
René Jr Purcell
Project Manager, Security and Systems
Techno Centre Logiciels Libres, http://www.tc2l.ca/
Phone: (418) 681-2929 #124
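
On the question raised in the thread of how to tell which node a log message concerns: in the quoted lines the hostname is the field right after the timestamp, so the same WARN logged on both nodes differs only in that field. Below is a minimal sketch of the "grep ERROR logfile" idea that also groups hits by host and daemon. It assumes the syslog layout seen in the quoted lines (timestamp, host, daemon, [pid], level, message); the name `triage` and the `SAMPLE` list are illustrative only and are not part of heartbeat.

```python
import re
from collections import defaultdict

# Layout assumed from the log lines quoted in this thread, e.g.
#   "May 7 16:31:41 qclsles01 crmd: [22028]: WARN: process_lrm_event: ..."
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+"  # "May 7 16:31:41"
    r"(?P<host>\S+)\s+"                       # node that wrote the line
    r"(?P<daemon>[\w-]+):\s+"                 # crmd, tengine, pengine, ...
    r"\[(?P<pid>\d+)\]:\s+"
    r"(?P<level>[A-Za-z]+):\s+"               # info, notice, WARN, ERROR
    r"(?P<msg>.*)$"
)

def triage(lines, levels=("WARN", "ERROR")):
    """Group messages at the given severities by (host, daemon).

    Lines that do not follow the assumed layout (kernel, logger, ifdown
    entries without a "[pid]: LEVEL:" part) are skipped.
    """
    found = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("level") in levels:
            found[(m.group("host"), m.group("daemon"))].append(m.group("msg"))
    return dict(found)

# A few lines copied from the log quoted in this thread.
SAMPLE = [
    "May 7 16:31:40 qclsles01 ifdown: vif12.0",
    "May 7 16:31:41 qclsles01 lrmd: [5020]: info: RA output: "
    "(resource_qclvmsles02:stop:stdout) Domain resource_qclvmsles02 terminated",
    "May 7 16:31:41 qclsles01 crmd: [22028]: WARN: process_lrm_event: lrm.c "
    "LRM operation (35) stop_0 on resource_qclvmsles02 Error: (4) insufficient privileges",
    "May 7 16:31:41 qclsles01 tengine: [22591]: ERROR: match_graph_event: events.c "
    "Action resource_qclvmsles02_stop_0 on qclsles01 failed (target: 0 vs. rc: 4): Error",
]

if __name__ == "__main__":
    for (host, daemon), msgs in sorted(triage(SAMPLE).items()):
        for msg in msgs:
            print(f"{host} {daemon}: {msg}")
```

Running the same function over the log from each node and comparing the host keys is one way to check whether a failure was reported on one node or on both.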