Re: [Pacemaker] First confused (then enlightened ? :)
On 2011-02-14 22:37, Carlos G Mendioroz wrote:

Andrew Beekhof @ 14/02/2011 05:44 -0300 dixit:

- Is it still the case that Heartbeat is not to be considered for new deployments? (I read something along that line)

pretty much http://www.clusterlabs.org/wiki/FAQ#Should_I_Run_Pacemaker_on_Heartbeat_or_Coroysnc.3F

That was the place I was referring to. Still, the thing is pretty confusing. DRBD talks about heartbeat in its description.

Happy to take a patch for the User's Guide.

When you install pacemaker using package managers, heartbeat is in the dependency list. And it goes on...

Only on Debian/Ubuntu, which has a dependency on corosync OR heartbeat -- something that many other distros don't even support --, all of which is to make rolling upgrades from the previous major release possible -- something that no other distro supports.

Florian

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
[Pacemaker] [Problem]post_notify_start_0 is carried out in the node that disappeared.
Hi all,

We are testing failure handling during startup of a Master/Slave resource.

Step 1) We start the first node and load the cib.

Last updated: Thu Feb 10 16:32:12 2011
Stack: Heartbeat
Current DC: srv01 (c7435833-8bc5-43aa-8195-c666b818677f) - partition with quorum
Version: 1.0.10-b0266dd5ffa9c51377c68b1f29d6bc84367f51dd
1 Nodes configured, unknown expected votes
5 Resources configured.

Online: [ srv01 ]

prmIpPostgreSQLDB (ocf::heartbeat:IPaddr2): Started srv01
Resource Group: grpStonith2
    prmStonith2-2 (stonith:external/ssh): Started srv01
    prmStonith2-3 (stonith:meatware): Started srv01
Master/Slave Set: msPostgreSQLDB
    Masters: [ srv01 ]
    Stopped: [ prmApPostgreSQLDB:1 ]
Clone Set: clnPingd
    Started: [ srv01 ]
    Stopped: [ prmPingd:1 ]

Migration summary:
* Node srv01:

Step 2) We modify the Stateful RA on the second node to add a sleep:

(snip)
stateful_start() {
    ocf_log info "Start of Stateful."
    sleep 120    # added sleep
    stateful_check_state master
(snip)

Step 3) We start the second node.

Step 4) We confirm that the start is sleeping, then reboot the second node.

[root@srv02 ~]# ps -ef | grep sleep

Step 5) The first node detects the disappearance of the second node.

* But STONITH is delayed because post_notify_start_0 for the second node is still carried out.
* On the srv02 node that disappeared, post_notify_start_0 is not needed.
* STONITH should be carried out immediately. (STONITH is currently kept waiting until post_notify_start_0 times out.)
(snip)
Feb 10 16:33:18 srv01 crmd: [4293]: info: ccm_event_detail: NEW MEMBERSHIP: trans=3, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
Feb 10 16:33:18 srv01 crmd: [4293]: info: ccm_event_detail: CURRENT: srv01 [nodeid=0, born=3]
Feb 10 16:33:18 srv01 crmd: [4293]: info: ccm_event_detail: LOST: srv02 [nodeid=1, born=2]
Feb 10 16:33:18 srv01 crmd: [4293]: info: ais_status_callback: status: srv02 is now lost (was member)
Feb 10 16:33:18 srv01 crmd: [4293]: info: crm_update_peer: Node srv02: id=1 state=lost (new) addr=(null) votes=-1 born=2 seen=2 proc=0200
Feb 10 16:33:18 srv01 crmd: [4293]: info: erase_node_from_join: Removed node srv02 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
Feb 10 16:33:18 srv01 crmd: [4293]: info: populate_cib_nodes_ha: Requesting the list of configured nodes
Feb 10 16:33:19 srv01 crmd: [4293]: info: te_pseudo_action: Pseudo action 36 fired and confirmed
Feb 10 16:33:19 srv01 crmd: [4293]: info: te_pseudo_action: Pseudo action 39 fired and confirmed
Feb 10 16:33:19 srv01 crmd: [4293]: info: te_rsc_command: Initiating action 75: notify prmApPostgreSQLDB:0_post_notify_start_0 on srv01 (local)
Feb 10 16:33:19 srv01 crmd: [4293]: info: do_lrm_rsc_op: Performing key=75:7:0:6918f8dc-fe1a-4c28-8aff-e8ac7a5e7143 op=prmApPostgreSQLDB:0_notify_0 )
Feb 10 16:33:19 srv01 lrmd: [4290]: info: rsc:prmApPostgreSQLDB:0:24: notify
Feb 10 16:33:19 srv01 cib: [4289]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/99, version=0.9.22): ok (rc=0)
Feb 10 16:33:19 srv01 crmd: [4293]: info: te_rsc_command: Initiating action 76: notify prmApPostgreSQLDB:1_post_notify_start_0 on srv02
Feb 10 16:33:19 srv01 lrmd: [4290]: info: RA output: (prmApPostgreSQLDB:0:notify:stdout) usage: /usr/lib/ocf/resource.d//pacemaker/Stateful {start|stop|promote|demote|monitor|validate-all|meta-data} Expects to have a fully populated OCF RA-compliant environment set.
Feb 10 16:33:19 srv01 crmd: [4293]: info: process_lrm_event: LRM operation prmApPostgreSQLDB:0_notify_0 (call=24, rc=0, cib-update=101, confirmed=true) ok
Feb 10 16:33:19 srv01 crmd: [4293]: info: match_graph_event: Action prmApPostgreSQLDB:0_post_notify_start_0 (75) confirmed on srv01 (rc=0)
Feb 10 16:33:19 srv01 crmd: [4293]: info: te_pseudo_action: Pseudo action 40 fired and confirmed
Feb 10 16:34:39 srv01 crmd: [4293]: WARN: action_timer_callback: Timer popped (timeout=2, abort_level=100, complete=false)
Feb 10 16:34:39 srv01 crmd: [4293]: ERROR: print_elem: Aborting transition, action lost: [Action 76]: Failed (id: prmApPostgreSQLDB:1_post_notify_start_0, loc: srv02, priority: 100)
Feb 10 16:34:39 srv01 crmd: [4293]: info: abort_transition_graph: action_timer_callback:486 - Triggered transition abort (complete=0) : Action lost
Feb 10 16:34:39 srv01 crmd: [4293]: WARN: cib_action_update: rsc_op 76: prmApPostgreSQLDB:1_post_notify_start_0 on srv02 timed out
Feb 10 16:34:39 srv01 crmd: [4293]: info: run_graph:
Feb 10 16:34:39 srv01 crmd: [4293]: notice: run_graph: Transition 7 (Complete=16, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pengine/pe-input-7.bz2): Stopped
Feb 10 16:34:39 srv01 crmd: [4293]: info:
Re: [Pacemaker] First confused (then enlightened ? :)
Hi,

snip
Is there a searchable repository of the list content so I may find out if some of my doubts are already explained?

Answering myself, I found that this (and some related lists) are archived and indexed at GossamerThreads, http://www.gossamer-threads.com/lists/linuxha . I usually find that indexing a list like this is an invaluable tool, so here it is for the record.

For future reference, maybe this method will help someone else. From http://www.clusterlabs.org/wiki/Mailing_lists there are 3 main archives:
- http://oss.clusterlabs.org/pipermail/pacemaker
- http://lists.linux-ha.org/pipermail/linux-ha
- http://lists.linux-foundation.org/pipermail/openais
+ 1 for drbd
- http://lists.linbit.com/pipermail/drbd-user/

What I do is take the gzipped archives from all of the above, extract them as text and index them with Google Desktop for quick reference. Here's the one-liner to do that:

for i in http://oss.clusterlabs.org/pipermail/pacemaker \
         http://lists.linux-ha.org/pipermail/linux-ha \
         http://lists.linux-foundation.org/pipermail/openais \
         http://lists.linbit.com/pipermail/drbd-user/ ; do
    mkdir -p $(pwd)/${i##*/}
    for j in $(wget $i -O - 2>/dev/null | awk -F '"' -v var=$i '/\.gz/ {print var"/"$2}') ; do
        wget $j -P $(pwd)/${i##*/} 2>/dev/null
    done
    gunzip $(pwd)/${i##*/}/*.gz 2>/dev/null
done

Regards,
Dan

--
Dan Frincu
CCNA, RHCE
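Once the archives are unpacked as plain text, even a plain grep pass works as a quick poor-man's index when a desktop search tool isn't handy. A minimal sketch (the helper name and directory layout are illustrative, matching the per-list directories created by the download loop above):

```shell
#!/bin/sh
# List the extracted archive files that mention a keyword,
# case-insensitively. $1 = keyword, $2 = directory holding the
# extracted mbox text files (e.g. ./pacemaker from the loop above).
search_archives() {
    grep -ril "$1" "$2" 2>/dev/null
}
```

e.g. `search_archives stonith ./pacemaker` prints the monthly archive files that mention STONITH.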
Re: [Pacemaker] First confused (then enlightened ? :)
On Mon, Feb 14, 2011 at 10:37 PM, Carlos G Mendioroz t...@huapi.ba.ar wrote:

Andrew Beekhof @ 14/02/2011 05:44 -0300 dixit:

- Is it still the case that Heartbeat is not to be considered for new deployments? (I read something along that line)

pretty much http://www.clusterlabs.org/wiki/FAQ#Should_I_Run_Pacemaker_on_Heartbeat_or_Coroysnc.3F

That was the place I was referring to.

That section is 100% accurate.

Still, the thing is pretty confusing. DRBD talks about heartbeat in its description. When you install pacemaker using package managers, heartbeat is in the dependency list. And it goes on...

Well, it's still completely supported, but as a developer community we're moving away from it. So particularly for people coming to clustering for the first time, it doesn't make much sense to learn a deprecated/dead technology.

also have a look at clusters from scratch: http://www.clusterlabs.org/doc

Reading now. Nice doc. Would you accept errata items?

Of course. Well, actually, only for the 1.1 version - that's the only version we generate from docbook format.

Maybe I should PM you, but on page 10, pcmk-2 is supposed to be 19.168.9.42. Typo^2? 192.168.122.102?

That's been fixed in the 1.1 version.

Also, "...add additional entries for the three machines." Three?

Yeah, adding a third machine was going to be part of the guide. But I never got to that part. Fixed.
--
Carlos G Mendioroz t...@huapi.ba.ar LW7 EQI Argentina
Re: [Pacemaker] [Problem]post_notify_start_0 is carried out in the node that disappeared.
On Tue, Feb 15, 2011 at 9:32 AM, renayama19661...@ybb.ne.jp wrote:

Hi all,

We are testing failure handling during startup of a Master/Slave resource.

Step 1) We start the first node and load the cib.

Last updated: Thu Feb 10 16:32:12 2011
Stack: Heartbeat
Current DC: srv01 (c7435833-8bc5-43aa-8195-c666b818677f) - partition with quorum
Version: 1.0.10-b0266dd5ffa9c51377c68b1f29d6bc84367f51dd
1 Nodes configured, unknown expected votes
5 Resources configured.

Online: [ srv01 ]

prmIpPostgreSQLDB (ocf::heartbeat:IPaddr2): Started srv01
Resource Group: grpStonith2
    prmStonith2-2 (stonith:external/ssh): Started srv01
    prmStonith2-3 (stonith:meatware): Started srv01
Master/Slave Set: msPostgreSQLDB
    Masters: [ srv01 ]
    Stopped: [ prmApPostgreSQLDB:1 ]
Clone Set: clnPingd
    Started: [ srv01 ]
    Stopped: [ prmPingd:1 ]

Migration summary:
* Node srv01:

Step 2) We modify the Stateful RA on the second node to add a sleep:

(snip)
stateful_start() {
    ocf_log info "Start of Stateful."
    sleep 120    # added sleep
    stateful_check_state master
(snip)

Step 3) We start the second node.

Step 4) We confirm that the start is sleeping, then reboot the second node.

[root@srv02 ~]# ps -ef | grep sleep

Step 5) The first node detects the disappearance of the second node.

* But STONITH is delayed because post_notify_start_0 for the second node is still carried out.

Wait, what? Why would post_notify_start_0 of prmApPostgreSQLDB block stonith? You didn't put a stonith resource in an ordering constraint, did you?

* On the srv02 node that disappeared, post_notify_start_0 is not needed.
* STONITH should be carried out immediately. (STONITH is currently kept waiting until post_notify_start_0 times out.)
(snip)
Feb 10 16:33:18 srv01 crmd: [4293]: info: ccm_event_detail: NEW MEMBERSHIP: trans=3, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
Feb 10 16:33:18 srv01 crmd: [4293]: info: ccm_event_detail: CURRENT: srv01 [nodeid=0, born=3]
Feb 10 16:33:18 srv01 crmd: [4293]: info: ccm_event_detail: LOST: srv02 [nodeid=1, born=2]
Feb 10 16:33:18 srv01 crmd: [4293]: info: ais_status_callback: status: srv02 is now lost (was member)
Feb 10 16:33:18 srv01 crmd: [4293]: info: crm_update_peer: Node srv02: id=1 state=lost (new) addr=(null) votes=-1 born=2 seen=2 proc=0200
Feb 10 16:33:18 srv01 crmd: [4293]: info: erase_node_from_join: Removed node srv02 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
Feb 10 16:33:18 srv01 crmd: [4293]: info: populate_cib_nodes_ha: Requesting the list of configured nodes
Feb 10 16:33:19 srv01 crmd: [4293]: info: te_pseudo_action: Pseudo action 36 fired and confirmed
Feb 10 16:33:19 srv01 crmd: [4293]: info: te_pseudo_action: Pseudo action 39 fired and confirmed
Feb 10 16:33:19 srv01 crmd: [4293]: info: te_rsc_command: Initiating action 75: notify prmApPostgreSQLDB:0_post_notify_start_0 on srv01 (local)
Feb 10 16:33:19 srv01 crmd: [4293]: info: do_lrm_rsc_op: Performing key=75:7:0:6918f8dc-fe1a-4c28-8aff-e8ac7a5e7143 op=prmApPostgreSQLDB:0_notify_0 )
Feb 10 16:33:19 srv01 lrmd: [4290]: info: rsc:prmApPostgreSQLDB:0:24: notify
Feb 10 16:33:19 srv01 cib: [4289]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/99, version=0.9.22): ok (rc=0)
Feb 10 16:33:19 srv01 crmd: [4293]: info: te_rsc_command: Initiating action 76: notify prmApPostgreSQLDB:1_post_notify_start_0 on srv02
Feb 10 16:33:19 srv01 lrmd: [4290]: info: RA output: (prmApPostgreSQLDB:0:notify:stdout) usage: /usr/lib/ocf/resource.d//pacemaker/Stateful {start|stop|promote|demote|monitor|validate-all|meta-data} Expects to have a fully populated OCF RA-compliant environment set.
Feb 10 16:33:19 srv01 crmd: [4293]: info: process_lrm_event: LRM operation prmApPostgreSQLDB:0_notify_0 (call=24, rc=0, cib-update=101, confirmed=true) ok
Feb 10 16:33:19 srv01 crmd: [4293]: info: match_graph_event: Action prmApPostgreSQLDB:0_post_notify_start_0 (75) confirmed on srv01 (rc=0)
Feb 10 16:33:19 srv01 crmd: [4293]: info: te_pseudo_action: Pseudo action 40 fired and confirmed
Feb 10 16:34:39 srv01 crmd: [4293]: WARN: action_timer_callback: Timer popped (timeout=2, abort_level=100, complete=false)
Feb 10 16:34:39 srv01 crmd: [4293]: ERROR: print_elem: Aborting transition, action lost: [Action 76]: Failed (id: prmApPostgreSQLDB:1_post_notify_start_0, loc: srv02, priority: 100)
Feb 10 16:34:39 srv01 crmd: [4293]: info: abort_transition_graph: action_timer_callback:486 - Triggered transition abort (complete=0) : Action lost
Feb 10 16:34:39 srv01 crmd: [4293]: WARN: cib_action_update: rsc_op 76: prmApPostgreSQLDB:1_post_notify_start_0 on srv02 timed out
Feb 10
Re: [Pacemaker] First confused (then enlightened ? :)
Andrew Beekhof @ 15/02/2011 04:25 -0300 dixit:

For what I understand, you want the brains of the action in pacemaker, so VRRP, HSRP or (U)CARP seem more a trouble than a solution. (i.e. twin head) right?

In other words, it seems to better align with the solution idea to have pacemaker decide and some script-set do the changing.

What you typically want to avoid is having two isolated entities trying to make decisions in the cluster - pulling it to pieces in the process.

Right, makes a lot of sense: only one boss in the office and one place to define policy. But to integrate with other protocols designed to be independent, like VRRP or (U)CARP, the dependency has to be implemented.

Something like DRBD solves this by using crm_master to tell Pacemaker which instance it would like promoted, but not actually doing the promotion itself. I don't know if this is feasible for your application.

In my case, it seems better to get rid of VRRP and use the more comprehensive view of pacemaker. Nevertheless, I don't see the concerns of MAC mutation being addressed anywhere. And I have my suspicions about ARP caches too.

Both would be properties of the RA itself rather than Pacemaker or Heartbeat. So if you can script MAC mutation, you can also create an RA for it (or add it to an existing one).

Is there a guide to implementing RAs? I've seen that the shell can list them. Are they embedded, or just a directory listing of entities found in some predefined places?

I'm currently thinking about a couple of ideas:
- using mac-vlan to move an active MAC from one server to another
- using bonding to have something like a MEC, multi-chassis ether channel (i.e. a way to not only migrate the MAC but also to signal the migration to the attachment switch using 802.1ad)

Are there any statistics on how much time it takes to migrate an IP address with the current resource? (IPaddr2 I guess) I'm looking for a subsecond delay from failure detection, and, I guess it's obvious, an active-standby setup.
I've not done any measurements lately. Mostly it's dependent on how long the RA takes.

Ok, now I'm getting into the RA arena I guess. For speedy failover, I would need a hot-standby approach. Is that a state pacemaker knows about?

--
Carlos G Mendioroz t...@huapi.ba.ar LW7 EQI Argentina
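The crm_master mechanism mentioned in this thread (as used by the drbd agent) boils down to the RA voting on where Pacemaker should promote, instead of promoting anything itself. A minimal sketch of what that could look like inside an RA's monitor path follows; the helper function name is hypothetical, and CRM_MASTER is indirected only so the sketch can be exercised without a live cluster:

```shell
#!/bin/sh
# Hypothetical helper an RA might call from its monitor action.
# The real tool is crm_master; "-l reboot" scopes the score to the
# current node lifetime (cleared on reboot).
CRM_MASTER="${CRM_MASTER:-crm_master -l reboot}"

update_master_score() {
    # $1 = "ok" when the local instance is fit to be promoted
    if [ "$1" = "ok" ]; then
        # Advertise this node as a promotion candidate.
        $CRM_MASTER -v 100
    else
        # Withdraw our vote; Pacemaker will promote elsewhere.
        $CRM_MASTER -D
    fi
}
```

Pacemaker then promotes the instance with the highest master score the next time the policy engine runs; the RA never calls promote on its own.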
Re: [Pacemaker] [Problem]post_notify_start_0 is carried out in the node that disappeared.
Hi Andrew,

Thank you for the comment.

Perhaps I misunderstood - does the node fail _while_ we're running post_notify_start_0? Is that the ordering you're talking about?

Yes. I think that stonith does not have to wait for post_notify_start_0 of the inoperative node.

If so, then the crmd is already supposed to be smart enough not to bother waiting for those actions - perhaps the logic got broken at some point.

If you need detailed information, please contact me.

Best Regards,
Hideo Yamauchi.

--- On Tue, 2011/2/15, Andrew Beekhof and...@beekhof.net wrote:

On Feb 15, 2011, at 12:10 PM, renayama19661...@ybb.ne.jp wrote:

Hi Andrew,

Thank you for the comment.

Sorry... I may have understood your opinion by mistake.

Wait, what? Why would post_notify_start_0 of prmApPostgreSQLDB block stonith?

Yes. I was able to see it like that. post_notify_start_0 in crmd seems to keep processing waiting until the time-out.

Feb 10 16:33:19 srv01 crmd: [4293]: info: te_rsc_command: Initiating action 76: notify prmApPostgreSQLDB:1_post_notify_start_0 on srv02
Feb 10 16:33:19 srv01 lrmd: [4290]: info: RA output: (prmApPostgreSQLDB:0:notify:stdout) usage: /usr/lib/ocf/resource.d//pacemaker/Stateful {start|stop|promote|demote|monitor|validate-all|meta-data} Expects to have a fully populated OCF RA-compliant environment set.
Feb 10 16:33:19 srv01 crmd: [4293]: info: process_lrm_event: LRM operation prmApPostgreSQLDB:0_notify_0 (call=24, rc=0, cib-update=101, confirmed=true) ok
Feb 10 16:33:19 srv01 crmd: [4293]: info: match_graph_event: Action prmApPostgreSQLDB:0_post_notify_start_0 (75) confirmed on srv01 (rc=0)
Feb 10 16:33:19 srv01 crmd: [4293]: info: te_pseudo_action: Pseudo action 40 fired and confirmed
Feb 10 16:34:39 srv01 crmd: [4293]: WARN: action_timer_callback: Timer popped (timeout=2, abort_level=100, complete=false)
Feb 10 16:34:39 srv01 crmd: [4293]: ERROR: print_elem: Aborting transition, action lost: [Action 76]: Failed (id: prmApPostgreSQLDB:1_post_notify_start_0, loc: srv02, priority: 100)
Feb 10 16:34:39 srv01 crmd: [4293]: info: abort_transition_graph: action_timer_callback:486 - Triggered transition abort (complete=0) : Action lost
Feb 10 16:34:39 srv01 crmd: [4293]: WARN: cib_action_update: rsc_op 76: prmApPostgreSQLDB:1_post_notify_start_0 on srv02 timed out

You didn't put a stonith resource in an ordering constraint, did you?

I did not set any ordering for stonith.

We want to carry out STONITH without waiting for the time-out of post_notify_start_0. Is there a method to solve this problem?

Perhaps I misunderstood - does the node fail _while_ we're running post_notify_start_0? Is that the ordering you're talking about? If so, then the crmd is already supposed to be smart enough not to bother waiting for those actions - perhaps the logic got broken at some point.

Best Regards,
Hideo Yamauchi.

--- On Tue, 2011/2/15, Andrew Beekhof and...@beekhof.net wrote:

On Tue, Feb 15, 2011 at 9:32 AM, renayama19661...@ybb.ne.jp wrote:

Hi all,

We are testing failure handling during startup of a Master/Slave resource.

Step 1) We start the first node and load the cib.

Last updated: Thu Feb 10 16:32:12 2011
Stack: Heartbeat
Current DC: srv01 (c7435833-8bc5-43aa-8195-c666b818677f) - partition with quorum
Version: 1.0.10-b0266dd5ffa9c51377c68b1f29d6bc84367f51dd
1 Nodes configured, unknown expected votes
5 Resources configured.

Online: [ srv01 ]

prmIpPostgreSQLDB (ocf::heartbeat:IPaddr2): Started srv01
Resource Group: grpStonith2
    prmStonith2-2 (stonith:external/ssh): Started srv01
    prmStonith2-3 (stonith:meatware): Started srv01
Master/Slave Set: msPostgreSQLDB
    Masters: [ srv01 ]
    Stopped: [ prmApPostgreSQLDB:1 ]
Clone Set: clnPingd
    Started: [ srv01 ]
    Stopped: [ prmPingd:1 ]

Migration summary:
* Node srv01:

Step 2) We modify the Stateful RA on the second node to add a sleep:

(snip)
stateful_start() {
    ocf_log info "Start of Stateful."
    sleep 120    # added sleep
    stateful_check_state master
(snip)

Step 3) We start the second node.

Step 4) We confirm that the start is sleeping, then reboot the second node.

[root@srv02 ~]# ps -ef | grep sleep

Step 5) The first node detects the disappearance of the second node.

* But STONITH is delayed because post_notify_start_0 for the second node is still carried out.

Wait, what? Why would post_notify_start_0 of prmApPostgreSQLDB block stonith? You didn't put a stonith resource in an ordering constraint, did you?

* On the srv02 node that disappeared, post_notify_start_0 is not needed.
* STONITH should be carried out immediately. (STONITH is kept waiting to time-out of
Re: [Pacemaker] [Problem]post_notify_start_0 is carried out in the node that disappeared.
On Tue, Feb 15, 2011 at 3:01 PM, renayama19661...@ybb.ne.jp wrote:

Hi Andrew,

Thank you for the comment.

Perhaps I misunderstood - does the node fail _while_ we're running post_notify_start_0? Is that the ordering you're talking about?

Yes. I think that stonith does not have to wait for post_notify_start_0 of the inoperative node.

If so, then the crmd is already supposed to be smart enough not to bother waiting for those actions - perhaps the logic got broken at some point.

If you need detailed information, please contact me.

Should be enough in the bug, I'll follow up there.
[Pacemaker] Packages for Opensuse 11.3 don't build / install
Hi,

the packages from rpm-next (64bit) for openSUSE 11.3 do not install there (at least true for 1.1.4 and 1.1.5). The plugin is in ./usr/lib/lcrso/pacemaker.lcrso but should be in ./usr/lib64/lcrso/pacemaker.lcrso

I think the patch below (borrowed from the 'official' packages) cures it.

Regards
Holger

diff -r 43a11c0daae4 pacemaker.spec
--- a/pacemaker.spec	Mon Feb 14 15:25:13 2011 +0100
+++ b/pacemaker.spec	Tue Feb 15 17:50:27 2011 +0100
@@ -1,3 +1,7 @@
+%if 0%{?suse_version}
+%define _libexecdir %{_libdir}
+%endif
+
 %global gname haclient
 %global uname hacluster
 %global pcmk_docdir %{_docdir}/%{name}
[Pacemaker] Pacemaker/Corosync Professional Services Help
Dear Mailing List:

My company is investigating the use of Pacemaker/Corosync and would like to create a proof of concept. We need professional services from a hands-on developer with detailed knowledge of Pacemaker/Corosync, Linux scripting experience, and general knowledge of the Linux OS. Please contact me if you or someone you know is able to provide such a service.

__
Leo Papadopoulos (leo.papadopou...@ipc.com)
Chief Technology Officer
IPC Systems
777 Commerce Drive
Fairfield, CT 06825-5500
Virtual Number: +1 (203) 539-0448
[Pacemaker] corosync+pacemaker support disk heartbeating?
Hi all,

I think there is something wrong with cluster communication, resulting in node reboots, so I want to use disk heartbeating. I use corosync-1.2.2-1.1.el5 and pacemaker-1.0.9.1-1.el5. Is there any guide that tells me how to realize disk heartbeating with corosync and pacemaker?

Thanks a lot
Re: [Pacemaker] First confused (then enlightened ? :)
On Tue, Feb 15, 2011 at 1:02 PM, Carlos G Mendioroz t...@huapi.ba.ar wrote:

Andrew Beekhof @ 15/02/2011 04:25 -0300 dixit:

For what I understand, you want the brains of the action in pacemaker, so VRRP, HSRP or (U)CARP seem more a trouble than a solution. (i.e. twin head) right?

In other words, it seems to better align with the solution idea to have pacemaker decide and some script-set do the changing.

What you typically want to avoid is having two isolated entities trying to make decisions in the cluster - pulling it to pieces in the process.

Right, makes a lot of sense: only one boss in the office and one place to define policy. But to integrate with other protocols designed to be independent, like VRRP or (U)CARP, the dependency has to be implemented.

Something like DRBD solves this by using crm_master to tell Pacemaker which instance it would like promoted, but not actually doing the promotion itself. I don't know if this is feasible for your application.

In my case, it seems better to get rid of VRRP and use the more comprehensive view of pacemaker. Nevertheless, I don't see the concerns of MAC mutation being addressed anywhere. And I have my suspicions about ARP caches too.

Both would be properties of the RA itself rather than Pacemaker or Heartbeat. So if you can script MAC mutation, you can also create an RA for it (or add it to an existing one).

Is there a guide to implementing RAs? I've seen that the shell can list them. Are they embedded, or just a directory listing of entities found in some predefined places?
http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html - The OCF Resource Agent Developer's Guide
http://www.linux-ha.org/wiki/Resource_Agents - Resource Agents
http://www.linux-ha.org/wiki/OCF_Resource_Agents - OCF Resource Agents
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html#ap-ocf - OCF Resource Agents
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Clusters_from_Scratch/index.html#id2281146 - Listing Resource Agents

HTH

I'm currently thinking about a couple of ideas:
- using mac-vlan to move an active MAC from one server to another
- using bonding to have something like a MEC, multi-chassis ether channel (i.e. a way to not only migrate the MAC but also to signal the migration to the attachment switch using 802.1ad)

Are there any statistics on how much time it takes to migrate an IP address with the current resource? (IPaddr2 I guess) I'm looking for a subsecond delay from failure detection, and, I guess it's obvious, an active-standby setup.

I've not done any measurements lately. Mostly it's dependent on how long the RA takes.

Ok, now I'm getting into the RA arena I guess. For speedy failover, I would need a hot-standby approach. Is that a state pacemaker knows about?

--
Carlos G Mendioroz t...@huapi.ba.ar LW7 EQI Argentina

--
Dan Frincu
CCNA, RHCE
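To the question of whether agents are "embedded": they are plain executables discovered under predefined directories (OCF agents under /usr/lib/ocf/resource.d/<provider>/), not built into the shell. As a rough illustration of what the developer's guide linked above walks through, here is a stripped-down OCF-style agent skeleton; the agent name and state-file path are made up, and a real agent would also need parameter metadata and a validate-all action:

```shell
#!/bin/sh
# Skeleton of an OCF-style resource agent. Action names and exit
# codes follow the OCF convention (0 = success, 7 = not running).
OCF_SUCCESS=0
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

# Hypothetical state marker standing in for a real service.
STATEFILE="${STATEFILE:-/tmp/minimal-ra.state}"

ra_start()   { touch "$STATEFILE"; }
ra_stop()    { rm -f "$STATEFILE"; }
ra_monitor() { [ -f "$STATEFILE" ] || return $OCF_NOT_RUNNING; }

ra_metadata() {
cat <<'EOF'
<?xml version="1.0"?>
<resource-agent name="minimal">
  <version>0.1</version>
  <actions>
    <action name="start"     timeout="20s"/>
    <action name="stop"      timeout="20s"/>
    <action name="monitor"   timeout="20s" interval="10s"/>
    <action name="meta-data" timeout="5s"/>
  </actions>
</resource-agent>
EOF
}

# Dispatch on the action name the cluster passes as $1
# (skipped when the file is sourced without arguments).
if [ $# -gt 0 ]; then
    case "$1" in
        start)     ra_start ;;
        stop)      ra_stop ;;
        monitor)   ra_monitor ;;
        meta-data) ra_metadata ;;
        *)         exit $OCF_ERR_UNIMPLEMENTED ;;
    esac
fi
```

Dropping such a script (executable) into a provider directory is what makes it show up in the shell's listing; the listing is just a directory scan plus a meta-data call.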
Re: [Pacemaker] First confused (then enlightened ? :)
On Wed, Feb 16, 2011 at 9:32 AM, Dan Frincu df.clus...@gmail.com wrote:

On Tue, Feb 15, 2011 at 1:02 PM, Carlos G Mendioroz t...@huapi.ba.ar wrote:

Andrew Beekhof @ 15/02/2011 04:25 -0300 dixit:

For what I understand, you want the brains of the action in pacemaker, so VRRP, HSRP or (U)CARP seem more a trouble than a solution. (i.e. twin head) right?

In other words, it seems to better align with the solution idea to have pacemaker decide and some script-set do the changing.

What you typically want to avoid is having two isolated entities trying to make decisions in the cluster - pulling it to pieces in the process.

Right, makes a lot of sense: only one boss in the office and one place to define policy. But to integrate with other protocols designed to be independent, like VRRP or (U)CARP, the dependency has to be implemented.

Something like DRBD solves this by using crm_master to tell Pacemaker which instance it would like promoted, but not actually doing the promotion itself. I don't know if this is feasible for your application.

In my case, it seems better to get rid of VRRP and use the more comprehensive view of pacemaker. Nevertheless, I don't see the concerns of MAC mutation being addressed anywhere. And I have my suspicions about ARP caches too.

Both would be properties of the RA itself rather than Pacemaker or Heartbeat. So if you can script MAC mutation, you can also create an RA for it (or add it to an existing one).

Is there a guide to implementing RAs? I've seen that the shell can list them. Are they embedded, or just a directory listing of entities found in some predefined places?
http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html - The OCF Resource Agent Developer's Guide
http://www.linux-ha.org/wiki/Resource_Agents - Resource Agents
http://www.linux-ha.org/wiki/OCF_Resource_Agents - OCF Resource Agents
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html#ap-ocf - OCF Resource Agents
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Clusters_from_Scratch/index.html#id2281146 - Listing Resource Agents

And not to forget:

http://www.linux-ha.org/doc/users-guide/users-guide.html - The Linux-HA User's Guide
http://www.linux-ha.org/doc/man-pages/man-pages.html - Linux-HA Manual Pages

HTH

I'm currently thinking about a couple of ideas:
- using mac-vlan to move an active MAC from one server to another
- using bonding to have something like a MEC, multi-chassis ether channel (i.e. a way to not only migrate the MAC but also to signal the migration to the attachment switch using 802.1ad)

Are there any statistics on how much time it takes to migrate an IP address with the current resource? (IPaddr2 I guess) I'm looking for a subsecond delay from failure detection, and, I guess it's obvious, an active-standby setup.

I've not done any measurements lately. Mostly it's dependent on how long the RA takes.

Ok, now I'm getting into the RA arena I guess. For speedy failover, I would need a hot-standby approach. Is that a state pacemaker knows about?
--
Carlos G Mendioroz t...@huapi.ba.ar LW7 EQI Argentina

--
Dan Frincu
CCNA, RHCE