Re: [Pacemaker] Pacemaker cluster with different operating systems
Hi Andrew,

Yes, I added the "pcmk" file under the "/etc/corosync/service.d/" folder. Since you think the setup should work, here are the details of the problem. Could you please check for any problem? I hope I have given enough information about the setup and the problem.

[root@pmidea1 ~]# more /etc/redhat-release
Red Hat Enterprise Linux Server release 6.2 (Santiago)
[root@pmidea1 ~]# rpm -qa|egrep "corosync|pacemaker"
pacemaker-libs-1.1.8-4.el6.x86_64
corosync-1.4.3-26.2.x86_64
pacemaker-cluster-libs-1.1.8-4.el6.x86_64
corosynclib-1.4.3-26.2.x86_64
pacemaker-cli-1.1.8-4.el6.x86_64
pacemaker-1.1.8-4.el6.x86_64
[root@pmidea1 ~]# more /etc/corosync/service.d/pcmk
service {
    name: pacemaker
    ver: 1
}

[root@pmidea2 ~]# more /etc/redhat-release
Red Hat Enterprise Linux Server release 6.2 (Santiago)
[root@pmidea2 ~]# rpm -qa|egrep "corosync|pacemaker"
pacemaker-libs-1.1.8-4.el6.x86_64
corosync-1.4.3-26.2.x86_64
pacemaker-cluster-libs-1.1.8-4.el6.x86_64
corosynclib-1.4.3-26.2.x86_64
pacemaker-cli-1.1.8-4.el6.x86_64
pacemaker-1.1.8-4.el6.x86_64
[root@pmidea2 ~]# more /etc/corosync/service.d/pcmk
service {
    name: pacemaker
    ver: 1
}

[root@pmosidea ~]# more /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
[root@pmosidea ~]# rpm -qa|egrep "corosync|pacemaker"
pacemaker-cluster-libs-1.1.8-2.el5
corosynclib-1.4.1-7.el5.1
pacemaker-1.1.8-2.el5
corosync-1.4.1-7.el5.1
pacemaker-cli-1.1.8-2.el5
pacemaker-libs-1.1.8-2.el5
[root@pmosidea ~]# more /etc/corosync/service.d/pcmk
service {
    name: pacemaker
    ver: 1
}

Corosync membership:

[root@pmidea1 ~]# corosync-objctl | grep member
totem.interface.member.memberaddr=10.34.38.46
totem.interface.member.memberaddr=10.34.38.47
totem.interface.member.memberaddr=10.34.38.48
runtime.totem.pg.mrp.srp.members.791028234.ip=r(0) ip(10.34.38.47)
runtime.totem.pg.mrp.srp.members.791028234.join_count=1
runtime.totem.pg.mrp.srp.members.791028234.status=joined
runtime.totem.pg.mrp.srp.members.774251018.ip=r(0) ip(10.34.38.46)
runtime.totem.pg.mrp.srp.members.774251018.join_count=1
runtime.totem.pg.mrp.srp.members.774251018.status=joined
runtime.totem.pg.mrp.srp.members.807805450.ip=r(0) ip(10.34.38.48)
runtime.totem.pg.mrp.srp.members.807805450.join_count=1
runtime.totem.pg.mrp.srp.members.807805450.status=joined

[root@pmosidea ~]# corosync-objctl | grep member
totem.interface.member.memberaddr=10.34.38.46
totem.interface.member.memberaddr=10.34.38.47
totem.interface.member.memberaddr=10.34.38.48
runtime.totem.pg.mrp.srp.members.774251018.ip=r(0) ip(10.34.38.46)
runtime.totem.pg.mrp.srp.members.774251018.join_count=1
runtime.totem.pg.mrp.srp.members.774251018.status=joined
runtime.totem.pg.mrp.srp.members.791028234.ip=r(0) ip(10.34.38.47)
runtime.totem.pg.mrp.srp.members.791028234.join_count=1
runtime.totem.pg.mrp.srp.members.791028234.status=joined
runtime.totem.pg.mrp.srp.members.807805450.ip=r(0) ip(10.34.38.48)
runtime.totem.pg.mrp.srp.members.807805450.join_count=1
runtime.totem.pg.mrp.srp.members.807805450.status=joined

crm_mon output from pmidea1:

Last updated: Thu Mar 7 08:53:25 2013
Last change: Thu Mar 7 02:47:51 2013 via crmd on pmidea2
Stack: openais
Current DC: pmidea2 - partition with quorum
Version: 1.1.8-4.el6-394e906
3 Nodes configured, 2 expected votes
0 Resources configured.

Online: [ pmidea1 pmidea2 ]
OFFLINE: [ pmosidea ]

crm_mon output from pmosidea:

Last updated: Thu Mar 7 08:53:21 2013
Last change: Thu Mar 7 03:16:50 2013 via crmd on pmosidea
Stack: openais
Current DC: pmosidea - partition WITHOUT quorum
Version: 1.1.8-2.el5-394e906
3 Nodes configured, 2 expected votes
0 Resources configured.

Node pmidea1: pending
Node pmidea2: pending
Online: [ pmosidea ]

-----Original Message-----
From: pacemaker-requ...@oss.clusterlabs.org [mailto:pacemaker-requ...@oss.clusterlabs.org]
Sent: Thursday, 7 March 2013 08:02
To: pacemaker@oss.clusterlabs.org
Subject: Pacemaker Digest, Vol 64, Issue 30

Message: 2
Date: Thu, 7 Mar 2013 16:36:17 +1100
From: Andrew Beekhof
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Pacemaker cluster with different operating systems
Message-ID:
Content-Type: text/plain; charset=ISO-8859-1

On Thu, Mar 7, 2013 at 4:09 PM, Osman Findik wrote:
> Hi all,
> We are using pacemaker with RHEL 6.2 successfully to manage pair of
> MySQL databases. Pacemaker is coming from Red Hat High Availability Add-on.
> Its version is 1.1.6.
> Our need is to add an observer to this cluster but our existing servers
> are all RHEL 5.x servers. We could not locate same version of pacemaker
> in clusterlabs repo.
> So we tried to install provided rpms from clusterlabs repo to RHEL 5.5 and
> RHEL 6.2 servers.
> Provided rpm version for RHEL 5 is pacemaker 1.1.8.2.
> Provided rpm version for RHEL 6 is pacemaker 1.1.8.4.
>
> In this setup although servers are members of the cluster over corosync,
> they could not see each other from pacemaker.
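For reference, member addresses like the ones reported by corosync-objctl above would normally come from a UDPU member list in corosync.conf, along the lines of the following sketch. This is not the poster's actual file; the bindnetaddr and overall layout are assumptions, only the three memberaddr values are taken from the output above.

    totem {
        version: 2
        transport: udpu
        interface {
            ringnumber: 0
            bindnetaddr: 10.34.38.0
            member {
                memberaddr: 10.34.38.46
            }
            member {
                memberaddr: 10.34.38.47
            }
            member {
                memberaddr: 10.34.38.48
            }
        }
    }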
Re: [Pacemaker] [Question]About "sequential" designation of resource_set.
Hi Andrew,

> > You use the resource sets _instead_ of a group.
> > If you want group.ordered=false, then use a colocation set (with
> > sequential=true).

I used a "resource_set" in the "colocation" constraint. However, the result did not change. Is this result caused by a mistake in my configuration?

Case 1) sequential=false

(snip)
(snip)

[root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
Mar 8 00:20:52 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar 8 00:20:52 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar 8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
Mar 8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local)
Mar 8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
Mar 8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
Mar 8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating action 6: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar 8 00:20:56 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating action 3: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar 8 00:20:56 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating action 5: start vip-master_start_0 on rh63-heartbeat1
Mar 8 00:20:58 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating action 1: stop vip-master_stop_0 on rh63-heartbeat1

Case 2) sequential=true

(snip)
(snip)

[root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
Mar 7 23:54:44 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar 7 23:54:44 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar 7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
Mar 7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local)
Mar 7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
Mar 7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
Mar 7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating action 6: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar 7 23:54:49 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating action 3: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar 7 23:54:49 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating action 5: start vip-master_start_0 on rh63-heartbeat1
Mar 7 23:54:51 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating action 1: stop vip-master_stop_0 on rh63-heartbeat1

Best Regards,
Hideo Yamauchi.
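For reference, a colocation set of the kind being tested here might look roughly like the following CIB fragment; the constraint and set IDs are illustrative, not taken from the poster's configuration. A colocation constraint by itself imposes no start order, which may be why both cases above log the same "Initiating action" sequence.

    <rsc_colocation id="col-master-group" score="INFINITY">
      <resource_set id="col-master-group-set" sequential="true">
        <resource_ref id="vip-master"/>
        <resource_ref id="vip-rep"/>
      </resource_set>
    </rsc_colocation>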
___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] [RFC] Automatic nodelist synchronization between corosync and pacemaker
07.03.2013 03:37, Andrew Beekhof wrote: > On Thu, Mar 7, 2013 at 2:41 AM, Vladislav Bogdanov > wrote: >> 06.03.2013 08:35, Andrew Beekhof wrote: > >>> So basically, you want to be able to add/remove nodes from nodelist.* >>> in corosync.conf and have pacemaker automatically add/remove them from >>> itself? >> >> Not corosync.conf, but cmap which is initially (partially) filled with >> values from corosync.conf. >> >>> >>> If corosync.conf gets out of sync (admin error or maybe a node was >>> down when you updated last) they might well get added back - I assume >>> you're ok with that? >>> Because there's no real way to know the difference between "added >>> back" and "not removed from last time". >> >> Sorry, can you please reword? > > When node-A comes up with "node-X" that no-one else has, the cluster > has no way to know if node-X was just added, or if the admin forgot to > remove it on node-A. Exactly that is not problem if node does not appear in CIB until it is seen online. If node-A comes up, then it is just booted, and that means that it didn't see node-X online yet (if it is not actually online of course). And then node-X is not added to CIB. > >>> Or are you planning to never update the on-disk corosync.conf and only >>> modify the in-memory nodelist? >> >> That depends on the actual use case I think. >> >> Hm. Interesting, how corosync behave when new dynamic nodes are added to >> cluster... I mean following: we have static corosync.conf with nodelist >> containing f.e. 3 entries, then we add fourth entry via cmap and boot >> fourth node. What should be in corosync.conf of that node? > > I don't know actually. Try it and see if it works without the local > node being defined? > >> I believe in >> wont work without that _its_ fourth entry. Ugh. If so, then no fully >> dynamic "elastic" cluster which I was dreaming of is still possible >> because out-of-the-box when using dynamic nodelist. >> >> The only way to have this I see is to have static nodelist in >> corosync.conf with all possible nodes predefined. And never edit it in >> cmap. So, my original point >> * Remove nodes from CIB when they are removed from a nodelist. >> does not fit. >> >> By elastic I mean what was discussed on corosync list when Fabio started >> with votequorum design and what then appeared in votequorum manpage: >> === >> allow_downscale: 1 >> >> Enables allow downscale (AD) feature (default: 0). >> >> The general behaviour of votequorum is to never decrease expected votes >> or quorum. >> >> When AD is enabled, both expected votes and quorum are recalculated >> when a node leaves the cluster in a clean state (normal corosync shut- >> down process) down to configured expected_votes. > > But thats very different to removing the node completely. > You still want to know its in a sane state. Isn't it enough to trust corosync here? Of course if it supplies some event that "node X leaved cluster in a clean state and we lowered expected_votes and quorum. Clean corosync shutdown means that either 'no more corosync clients remain and it was safe to shutdown' or 'corosync has a bug'. Pacemaker is corosync client, and corosync should not stop in a clean state if pacemaker is still running there. And 'pacemaker is not running on node X' means that pacemaker instances on other nodes accepted that. Otherwise node is scheduled to stonith and there is no 'clean' shutdown. Am I correct here? 
> >> Example use case: >> >> 1) N node cluster (where N is any value higher than 3) >> 2) expected_votes set to 3 in corosync.conf >> 3) only 3 nodes are running >> 4) admin requires to increase processing power and adds 10 nodes >> 5) internal expected_votes is automatically set to 13 >> 6) minimum expected_votes is 3 (from configuration) >> - up to this point this is standard votequorum behavior - >> 7) once the work is done, admin wants to remove nodes from the cluster >> 8) using an ordered shutdown the admin can reduce the cluster size >>automatically back to 3, but not below 3, where normal quorum >>operation will work as usual. >> = >> >> What I would expect from pacemaker, is to automatically remove nodes >> down to 3 at step 8 (just follow quorum) if AD is enabled AND pacemaker >> is instructed to follow that (with some other cmap switch). And also to >> reduce number of allocated clone instances. Sure, all nodes must have >> equal number of votes (1). >> >> Is it ok for you? > > Not really. > We simply don't have enough information to do the removal. > All we get is "node gone", we have to do a fair bit of work to > calculate if it was clean at the time or not (and clean to corosync > doesn't always imply clean to pacemaker). Please see above. There is always (at least with mcp model) some time frame between pacemaker stop and corosync stop events. And pacemaker should accept "node leave" after first one (doesn't it mark node as 'pending' in that state?). And second event (corosync stop)
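For illustration of the in-memory nodelist edits discussed above, adding a fourth node on a running corosync 2.x member could be done roughly as follows with corosync-cmapctl; the index, nodeid and address are made up for the example and may need adjusting.

    # append a new entry to the runtime nodelist (values illustrative)
    corosync-cmapctl -s nodelist.node.3.nodeid u32 4
    corosync-cmapctl -s nodelist.node.3.ring0_addr str 10.34.38.49

    # check what the membership layer now knows about
    corosync-cmapctl | grep ^nodelist.node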
Re: [Pacemaker] [Question]About "sequential" designation of resource_set.
Hi Andrew, Thank you for comment. It was colocation. I make modifications and confirm movement. Many Thanks! Hideo Yamauchi. --- On Thu, 2013/3/7, Andrew Beekhof wrote: > Oh! > > You use the resource sets _instead_ of a group. > If you want group.ordered=false, then use a colocation set (with > sequential=true). > If you want group.colocated=false, then use an ordering set (with > sequential=true). > > Hope that helps :) > > On Thu, Mar 7, 2013 at 3:16 PM, wrote: > > Hi Andrew, > > > > Thank you for comments. > > > >> > Case 3) group resource_set sequential=false > >> > * Start of vip-rep waits for start of vip-master and is published. > >> > * I expected a result same as the first case. > >> > >> Me too. Have you got the relevant PE file? > > > > I attached the thing which just collected hb_report. > > > > Best Regards, > > Hideo Yamauchi. > > > > > > > > --- On Thu, 2013/3/7, Andrew Beekhof wrote: > > > >> On Thu, Mar 7, 2013 at 1:27 PM, wrote: > >> > Hi Andrew, > >> > > >> > I tried "resource_set sequential" designation. > >> > * http://www.gossamer-threads.com/lists/linuxha/pacemaker/84578 > >> > > >> > I caused an error in start of the vip-master resource and confirmed > >> > movement. > >> > > >> > (snip) > >> > > >> > >> >type="Dummy2"> > >> > > >> > >> >on-fail="restart" timeout="60s"/> > >> > >> >name="monitor" on-fail="restart" timeout="60s"/> > >> > >> >on-fail="block" timeout="60s"/> > >> > > >> > > >> > >> >type="Dummy"> > >> > > >> > >> >on-fail="stop" timeout="60s"/> > >> > >> >on-fail="restart" timeout="60s"/> > >> > >> >on-fail="block" timeout="60s"/> > >> > > >> > > >> > > >> > (snip) > >> > > >> > By the ordered designation of the group resource, the difference that I > >> > expected appeared.( Case 1 and Case 2) > >> > However, by the "sequential" designation, the difference that I expected > >> > did not appear.(Case 3 and Case 4) > >> > > >> > (snip) > >> > > >> > > >> > >> >id="test-order-resource_set"> ---> or "false" > >> > > >> > > >> > > >> > > >> > > >> > (snip) > >> > > >> > > >> > Case 1) group meta_attribute ordered=false > >> > * Start of vip-rep is published without waiting for start of vip-master. 
> >> > > >> > [root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log > >> > Mar 7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > >> > Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - > >> > no waiting > >> > Mar 7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > >> > Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 > >> > (local) - no waiting > >> > Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > >> > Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1 > >> > Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > >> > Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 > >> > (local) > >> > Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > >> > Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1 > >> > Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > >> > Initiating action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local) > >> > Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > >> > Initiating action 6: probe_complete probe_complete on rh63-heartbeat2 > >> > (local) - no waiting > >> > Mar 7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > >> > Initiating action 3: probe_complete probe_complete on rh63-heartbeat1 - > >> > no waiting > >> > Mar 7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > >> > Initiating action 5: start vip-master_start_0 on rh63-heartbeat1 > >> > Mar 7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > >> > Initiating action 7: start vip-rep_start_0 on rh63-heartbeat1 > >> > Mar 7 19:41:26 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > >> > Initiating action 8: monitor vip-rep_monitor_1 on rh63-heartbeat1 > >> > Mar 7 19:41:27 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > >> > Initiating action 2: stop vip-master_stop_0 on rh63-heartbeat1 > >> > Mar 7 19:41:28 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > >> > Initiating action 6: stop vip-rep_stop_0 on rh63-heartbeat1 > >> > > >> > > >> > Case 2) group meta_attribute ordered=true > >> > * Start of vip-rep waits for start of vip-master and is published. > >> > > >> > [root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log > >> > Mar 7 19:34:37 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: > >> > Initiating action 2: probe_complete probe_co
Re: [Pacemaker] [Question]About "sequential" designation of resource_set.
Oh! You use the resource sets _instead_ of a group. If you want group.ordered=false, then use a colocation set (with sequential=true). If you want group.colocated=false, then use an ordering set (with sequential=true). Hope that helps :) On Thu, Mar 7, 2013 at 3:16 PM, wrote: > Hi Andrew, > > Thank you for comments. > >> > Case 3) group resource_set sequential=false >> > * Start of vip-rep waits for start of vip-master and is published. >> > * I expected a result same as the first case. >> >> Me too. Have you got the relevant PE file? > > I attached the thing which just collected hb_report. > > Best Regards, > Hideo Yamauchi. > > > > --- On Thu, 2013/3/7, Andrew Beekhof wrote: > >> On Thu, Mar 7, 2013 at 1:27 PM, wrote: >> > Hi Andrew, >> > >> > I tried "resource_set sequential" designation. >> > * http://www.gossamer-threads.com/lists/linuxha/pacemaker/84578 >> > >> > I caused an error in start of the vip-master resource and confirmed >> > movement. >> > >> > (snip) >> > >> > > > type="Dummy2"> >> > >> > > > on-fail="restart" timeout="60s"/> >> > > > on-fail="restart" timeout="60s"/> >> > > > on-fail="block" timeout="60s"/> >> > >> > >> > > > type="Dummy"> >> > >> > > > on-fail="stop" timeout="60s"/> >> > > > on-fail="restart" timeout="60s"/> >> > > > on-fail="block" timeout="60s"/> >> > >> > >> > >> > (snip) >> > >> > By the ordered designation of the group resource, the difference that I >> > expected appeared.( Case 1 and Case 2) >> > However, by the "sequential" designation, the difference that I expected >> > did not appear.(Case 3 and Case 4) >> > >> > (snip) >> > >> > >> > > > id="test-order-resource_set"> ---> or "false" >> > >> > >> > >> > >> > >> > (snip) >> > >> > >> > Case 1) group meta_attribute ordered=false >> > * Start of vip-rep is published without waiting for start of vip-master. 
>> > >> > [root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log >> > Mar 7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: >> > Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no >> > waiting >> > Mar 7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: >> > Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 >> > (local) - no waiting >> > Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: >> > Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1 >> > Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: >> > Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 >> > (local) >> > Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: >> > Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1 >> > Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: >> > Initiating action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local) >> > Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: >> > Initiating action 6: probe_complete probe_complete on rh63-heartbeat2 >> > (local) - no waiting >> > Mar 7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: >> > Initiating action 3: probe_complete probe_complete on rh63-heartbeat1 - no >> > waiting >> > Mar 7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: >> > Initiating action 5: start vip-master_start_0 on rh63-heartbeat1 >> > Mar 7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: >> > Initiating action 7: start vip-rep_start_0 on rh63-heartbeat1 >> > Mar 7 19:41:26 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: >> > Initiating action 8: monitor vip-rep_monitor_1 on rh63-heartbeat1 >> > Mar 7 19:41:27 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: >> > Initiating action 2: stop vip-master_stop_0 on rh63-heartbeat1 >> > Mar 7 19:41:28 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: >> > Initiating action 6: stop vip-rep_stop_0 on rh63-heartbeat1 >> > >> > >> > Case 2) group meta_attribute ordered=true >> > * Start of vip-rep waits for start of vip-master and is published. >> > >> > [root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log >> > Mar 7 19:34:37 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: >> > Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no >> > waiting >> > Mar 7 19:34:37 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: >> > Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 >> > (local) - no waiting >> > Mar 7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: >> > Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1 >> > Mar 7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc
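The other half of the mapping Andrew describes above (group.colocated=false, i.e. ordering only) would be an ordering set rather than a colocation set. A rough sketch, with illustrative IDs:

    <rsc_order id="order-master-group">
      <resource_set id="order-master-group-set" sequential="true">
        <resource_ref id="vip-master"/>
        <resource_ref id="vip-rep"/>
      </resource_set>
    </rsc_order>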
Re: [Pacemaker] Pacemaker cluster with different operating systems
On Thu, Mar 7, 2013 at 4:09 PM, Osman Findik wrote: > Hi all, > We are using pacemaker with RHEL 6.2 successfully to manage pair of MySQL > databases. Pacemaker is coming from Red Hat High Availability Add-on. Its > version is 1.1.6 > Our need is to add an observer to this cluster but our existing servers are > all RHEL 5.x servers. We could not locate same version of pacemaker in > clusterlabs repo. > So we tried to install provided rpms from clusterlabs repo to RHEL 5.5 and > RHEL 6.2 servers. > Provided rpm version for RHEL 5 is pacemaker 1.1.8.2. > Provided rpm version for RHEL 6 is pacemaker 1.1.8.4. > > In this setup although servers are members of the cluster over corosync, they > could not see each other from pacemaker. > I also tried to install 1.1.8.1 rpm packages in order to use same pacemaker > releases, but that is also failed. > > Before giving details of the errors, my question is do you think a hybrid > setup with different OSs (RHEL 5.x and RHEL 6.x) is possible? It should be. You're using the pacemaker plugin for corosync? ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
[Pacemaker] Pacemaker cluster with different operating systems
Hi all, We are using pacemaker with RHEL 6.2 successfully to manage pair of MySQL databases. Pacemaker is coming from Red Hat High Availability Add-on. Its version is 1.1.6 Our need is to add an observer to this cluster but our existing servers are all RHEL 5.x servers. We could not locate same version of pacemaker in clusterlabs repo. So we tried to install provided rpms from clusterlabs repo to RHEL 5.5 and RHEL 6.2 servers. Provided rpm version for RHEL 5 is pacemaker 1.1.8.2. Provided rpm version for RHEL 6 is pacemaker 1.1.8.4. In this setup although servers are members of the cluster over corosync, they could not see each other from pacemaker. I also tried to install 1.1.8.1 rpm packages in order to use same pacemaker releases, but that is also failed. Before giving details of the errors, my question is do you think a hybrid setup with different OSs (RHEL 5.x and RHEL 6.x) is possible? Thanks, ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] [Question]About "sequential" designation of resource_set.
On Thu, Mar 7, 2013 at 1:27 PM, wrote: > Hi Andrew, > > I tried "resource_set sequential" designation. > * http://www.gossamer-threads.com/lists/linuxha/pacemaker/84578 > > I caused an error in start of the vip-master resource and confirmed movement. > > (snip) > > type="Dummy2"> > > on-fail="restart" timeout="60s"/> > on-fail="restart" timeout="60s"/> > on-fail="block" timeout="60s"/> > > > > > on-fail="stop" timeout="60s"/> > on-fail="restart" timeout="60s"/> > on-fail="block" timeout="60s"/> > > > > (snip) > > By the ordered designation of the group resource, the difference that I > expected appeared.( Case 1 and Case 2) > However, by the "sequential" designation, the difference that I expected did > not appear.(Case 3 and Case 4) > > (snip) > > > > ---> or "false" > > > > > > (snip) > > > Case 1) group meta_attribute ordered=false > * Start of vip-rep is published without waiting for start of vip-master. > > [root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log > Mar 7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no > waiting > Mar 7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 (local) > - no waiting > Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1 > Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local) > Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1 > Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > Initiating action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local) > Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > Initiating action 6: probe_complete probe_complete on rh63-heartbeat2 (local) > - no waiting > Mar 7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > Initiating action 3: probe_complete probe_complete on rh63-heartbeat1 - no > waiting > Mar 7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > Initiating action 5: start vip-master_start_0 on rh63-heartbeat1 > Mar 7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > Initiating action 7: start vip-rep_start_0 on rh63-heartbeat1 > Mar 7 19:41:26 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > Initiating action 8: monitor vip-rep_monitor_1 on rh63-heartbeat1 > Mar 7 19:41:27 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > Initiating action 2: stop vip-master_stop_0 on rh63-heartbeat1 > Mar 7 19:41:28 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: > Initiating action 6: stop vip-rep_stop_0 on rh63-heartbeat1 > > > Case 2) group meta_attribute ordered=true > * Start of vip-rep waits for start of vip-master and is published. 
> > [root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log > Mar 7 19:34:37 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: > Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no > waiting > Mar 7 19:34:37 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: > Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 (local) > - no waiting > Mar 7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: > Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1 > Mar 7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: > Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local) > Mar 7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: > Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1 > Mar 7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: > Initiating action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local) > Mar 7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: > Initiating action 6: probe_complete probe_complete on rh63-heartbeat2 (local) > - no waiting > Mar 7 19:35:43 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: > Initiating action 3: probe_complete probe_complete on rh63-heartbeat1 - no > waiting > Mar 7 19:35:43 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: > Initiating action 5: start vip-master_start_0 on rh63-heartbeat1 > Mar 7 19:35:45 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: > Initiating action 1: stop vip-master_stop_0 on rh63-heartbeat1 > > > Case 3) group resource_set sequential=false > * Start of vip-rep waits
[Pacemaker] [Question]About "sequential" designation of resource_set.
Hi Andrew, I tried "resource_set sequential" designation. * http://www.gossamer-threads.com/lists/linuxha/pacemaker/84578 I caused an error in start of the vip-master resource and confirmed movement. (snip) (snip) By the ordered designation of the group resource, the difference that I expected appeared.( Case 1 and Case 2) However, by the "sequential" designation, the difference that I expected did not appear.(Case 3 and Case 4) (snip) ---> or "false" (snip) Case 1) group meta_attribute ordered=false * Start of vip-rep is published without waiting for start of vip-master. [root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log Mar 7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no waiting Mar 7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1 Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local) Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1 Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local) Mar 7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating action 6: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting Mar 7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating action 3: probe_complete probe_complete on rh63-heartbeat1 - no waiting Mar 7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating action 5: start vip-master_start_0 on rh63-heartbeat1 Mar 7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating action 7: start vip-rep_start_0 on rh63-heartbeat1 Mar 7 19:41:26 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating action 8: monitor vip-rep_monitor_1 on rh63-heartbeat1 Mar 7 19:41:27 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating action 2: stop vip-master_stop_0 on rh63-heartbeat1 Mar 7 19:41:28 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating action 6: stop vip-rep_stop_0 on rh63-heartbeat1 Case 2) group meta_attribute ordered=true * Start of vip-rep waits for start of vip-master and is published. 
[root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log Mar 7 19:34:37 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no waiting Mar 7 19:34:37 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting Mar 7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1 Mar 7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local) Mar 7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1 Mar 7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local) Mar 7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating action 6: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting Mar 7 19:35:43 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating action 3: probe_complete probe_complete on rh63-heartbeat1 - no waiting Mar 7 19:35:43 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating action 5: start vip-master_start_0 on rh63-heartbeat1 Mar 7 19:35:45 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating action 1: stop vip-master_stop_0 on rh63-heartbeat1 Case 3) group resource_set sequential=false * Start of vip-rep waits for start of vip-master and is published. * I expected a result same as the first case. [root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log Mar 7 19:43:50 rh63-heartbeat2 crmd: [19113]: info: te_rsc_command: Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no waiting Mar 7 19:43:50 rh63-heartbeat2 crmd: [19113]: info: te_rsc_command: Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 (local) - no waitin
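For readers without the attached hb_report: a group carrying the ordered meta attribute, as tested in Cases 1 and 2 above, is normally written roughly like the fragment below. The IDs, resource classes and omitted operations are illustrative; the real XML is in the snipped configuration.

    <group id="master-group">
      <meta_attributes id="master-group-meta_attributes">
        <nvpair id="master-group-ordered" name="ordered" value="false"/>
      </meta_attributes>
      <primitive id="vip-master" class="ocf" provider="heartbeat" type="Dummy2">
        <!-- start/monitor/stop operations as shown in the quoted snippets -->
      </primitive>
      <primitive id="vip-rep" class="ocf" provider="heartbeat" type="Dummy">
        <!-- start/monitor/stop operations as shown in the quoted snippets -->
      </primitive>
    </group>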
Re: [Pacemaker] The correction request of the log of booth
Hi, Jiaju 2013/3/6 Jiaju Zhang : > On Wed, 2013-03-06 at 15:13 +0900, yusuke iida wrote: >> Hi, Jiaju >> >> There is a request about the log of booth. >> >> I want you to change a log level when a ticket expires into "info" from >> "debug". >> >> I think that this log is important since it means what occurred. >> >> And I want you to add the following information to log. >> * Which ticket is it? >> * Who had a ticket? >> >> For example, I want you to use the following forms. >> info: lease expires ... owner [0] ticket [ticketA] > > Sounds great, will improve that;) Thank you for accepting. Many thanks! Yusuke > > Thanks, > Jiaju > -- METRO SYSTEMS CO., LTD Yusuke Iida Mail: yusk.i...@gmail.com ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] [Problem][crmsh]The designation of the 'ordered' attribute becomes the error.
Hi Dejan, The problem was settled with your patch. However, I have a question. I want to use "resource_set" which Mr. Andrew proposed, but do not understand a method to use with crm shell. I read two next cib.xml and confirmed it with crm shell. Case 1) sequential="false". (snip) (snip) * When I confirm it with crm shell ... (snip) group master-group vip-master vip-rep order test-order : _rsc_set_ ( vip-master vip-rep ) (snip) Case 2) sequential="true" (snip) (snip) * When I confirm it with crm shell ... (snip) group master-group vip-master vip-rep xml \ \ \ \ \ (snip) Does the designation of "sequential=true" have to describe it in xml? Is there a right method to appoint an attribute of "resource_set" with crm shell? Possibly is not "resource_set" usable with crm shell of Pacemaker1.0.13? Best Regards, Hideo Yamauchi. --- On Thu, 2013/3/7, renayama19661...@ybb.ne.jp wrote: > Hi Dejan, > Hi Andrew, > > Thank you for comment. > I confirm the movement of the patch and report it. > > Best Regards, > Hideo Yamauchi. > > --- On Wed, 2013/3/6, Dejan Muhamedagic wrote: > > > Hi Hideo-san, > > > > On Wed, Mar 06, 2013 at 10:37:44AM +0900, renayama19661...@ybb.ne.jp wrote: > > > Hi Dejan, > > > Hi Andrew, > > > > > > As for the crm shell, the check of the meta attribute was revised with > > > the next patch. > > > > > > * http://hg.savannah.gnu.org/hgweb/crmsh/rev/d1174f42f4b3 > > > > > > This patch was backported in Pacemaker1.0.13. > > > > > > * > > >https://github.com/ClusterLabs/pacemaker-1.0/commit/fa1a99ab36e0ed015f1bcbbb28f7db962a9d1abc#shell/modules/cibconfig.py > > > > > > However, the ordered,colocated attribute of the group resource is treated > > > as an error when I use crm Shell which adopted this patch. > > > > > > -- > > > (snip) > > > ### Group Configuration ### > > > group master-group \ > > > vip-master \ > > > vip-rep \ > > > meta \ > > > ordered="false" > > > (snip) > > > > > > [root@rh63-heartbeat1 ~]# crm configure load update test2339.crm > > > INFO: building help index > > > crm_verify[20028]: 2013/03/06_17:57:18 WARN: unpack_nodes: Blind faith: > > > not fencing unseen nodes > > > WARNING: vip-master: specified timeout 60s for start is smaller than the > > > advised 90 > > > WARNING: vip-master: specified timeout 60s for stop is smaller than the > > > advised 100 > > > WARNING: vip-rep: specified timeout 60s for start is smaller than the > > > advised 90 > > > WARNING: vip-rep: specified timeout 60s for stop is smaller than the > > > advised 100 > > > ERROR: master-group: attribute ordered does not exist -> WHY? > > > Do you still want to commit? y > > > -- > > > > > > If it chooses `yes` by a confirmation message, it is reflected, but it is > > > a problem that error message is displayed. > > > * The error occurs in the same way when I appoint colocated attribute. > > > AndI noticed that there was not explanation of ordered,colocated of > > > the group resource in online help of Pacemaker. > > > > > > I think that the designation of the ordered,colocated attribute should > > > not become the error in group resource. > > > In addition, I think that ordered,colocated should be added to online > > > help. > > > > These attributes are not listed in crmsh. Does the attached patch > > help? > > > > Thanks, > > > > Dejan > > > > > > Best Regards, > > > Hideo Yamauchi. 
> > > > > > > > > ___ > > > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > > > > > Project Home: http://www.clusterlabs.org > > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > > > Bugs: http://bugs.clusterlabs.org > > > > ___ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
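If the crm shell shipped with Pacemaker 1.0.13 cannot express the sequential attribute of a resource_set directly, one possible workaround is to keep the constraint as a raw XML fragment and push it with cibadmin, for example (the file name is illustrative):

    # create the constraint from an XML fragment containing the resource_set
    cibadmin -C -o constraints -x resource-set-constraint.xml

    # later changes to the same fragment can be applied with --modify
    cibadmin -M -o constraints -x resource-set-constraint.xml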
Re: [Pacemaker] [RFC] Automatic nodelist synchronization between corosync and pacemaker
On Thu, Mar 7, 2013 at 2:41 AM, Vladislav Bogdanov wrote: > 06.03.2013 08:35, Andrew Beekhof wrote: >> So basically, you want to be able to add/remove nodes from nodelist.* >> in corosync.conf and have pacemaker automatically add/remove them from >> itself? > > Not corosync.conf, but cmap which is initially (partially) filled with > values from corosync.conf. > >> >> If corosync.conf gets out of sync (admin error or maybe a node was >> down when you updated last) they might well get added back - I assume >> you're ok with that? >> Because there's no real way to know the difference between "added >> back" and "not removed from last time". > > Sorry, can you please reword? When node-A comes up with "node-X" that no-one else has, the cluster has no way to know if node-X was just added, or if the admin forgot to remove it on node-A. >> Or are you planning to never update the on-disk corosync.conf and only >> modify the in-memory nodelist? > > That depends on the actual use case I think. > > Hm. Interesting, how corosync behave when new dynamic nodes are added to > cluster... I mean following: we have static corosync.conf with nodelist > containing f.e. 3 entries, then we add fourth entry via cmap and boot > fourth node. What should be in corosync.conf of that node? I don't know actually. Try it and see if it works without the local node being defined? > I believe in > wont work without that _its_ fourth entry. Ugh. If so, then no fully > dynamic "elastic" cluster which I was dreaming of is still possible > because out-of-the-box when using dynamic nodelist. > > The only way to have this I see is to have static nodelist in > corosync.conf with all possible nodes predefined. And never edit it in > cmap. So, my original point > * Remove nodes from CIB when they are removed from a nodelist. > does not fit. > > By elastic I mean what was discussed on corosync list when Fabio started > with votequorum design and what then appeared in votequorum manpage: > === > allow_downscale: 1 > > Enables allow downscale (AD) feature (default: 0). > > The general behaviour of votequorum is to never decrease expected votes > or quorum. > > When AD is enabled, both expected votes and quorum are recalculated > when a node leaves the cluster in a clean state (normal corosync shut- > down process) down to configured expected_votes. But thats very different to removing the node completely. You still want to know its in a sane state. > Example use case: > > 1) N node cluster (where N is any value higher than 3) > 2) expected_votes set to 3 in corosync.conf > 3) only 3 nodes are running > 4) admin requires to increase processing power and adds 10 nodes > 5) internal expected_votes is automatically set to 13 > 6) minimum expected_votes is 3 (from configuration) > - up to this point this is standard votequorum behavior - > 7) once the work is done, admin wants to remove nodes from the cluster > 8) using an ordered shutdown the admin can reduce the cluster size >automatically back to 3, but not below 3, where normal quorum >operation will work as usual. > = > > What I would expect from pacemaker, is to automatically remove nodes > down to 3 at step 8 (just follow quorum) if AD is enabled AND pacemaker > is instructed to follow that (with some other cmap switch). And also to > reduce number of allocated clone instances. Sure, all nodes must have > equal number of votes (1). > > Is it ok for you? Not really. We simply don't have enough information to do the removal. 
All we get is "node gone", we have to do a fair bit of work to calculate if it was clean at the time or not (and clean to corosync doesn't always imply clean to pacemaker). So back to the start, why do you need pacemaker to forget about the other 10 nodes? (because everything apart from that should already work). > >> >>> > That > would be OK if number of clone instances does not raise with that... Why? If clone-node-max=1, then you'll never have more than the number of active nodes - even if clone-max is greater. >>> >>> Active (online) or known (existing in a section)? >>> I've seen that as soon as node appears in even in offline state, >>> new clone instance is allocated. >> >> $num_known instances will "exist", but only $num_active will be running. > > Yep, that's what I say. I see them in crm_mon or 'crm status' and they > make my life harder ;) > That remaining instances are "allocated" but not running. > > I can agree that this issue is very "cosmetic" one, but its existence > conflicts with my perfectionism so I'd like to resolve it ;) > >> >>> >>> Also, on one cluster with post-1.1.7 with openais plugin I have 16 nodes >>> configured in totem.interface.members, but only three nodes in >>> CIB section, And I'm able to allocate at least 8-9 instances of clones >>> with clone-max. >> >> Yes, but did you set clone-node-max? One is the global maximum, the >> other is the per-node maximum. >> >>> I be
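On the clone-instance side of this discussion, the global and per-node caps are separate meta attributes; in crm shell syntax that looks roughly like the following (resource names illustrative):

    clone cl-myservice myservice \
        meta clone-max="3" clone-node-max="1"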
Re: [Pacemaker] Pacemaker resource migration behaviour
On Thu, Mar 7, 2013 at 11:23 AM, Andrew Beekhof wrote: > On Wed, Mar 6, 2013 at 8:02 PM, James Guthrie wrote: >> Hi Andrew, >> >> Thanks for looking into this. We have since decided not to perform a >> failover on the failure of one of the sub-* resources for operational >> reasons. As a result, I can't reliably test if this issue is actually fixed >> in the current HEAD. (Speaking of which, do you have a date set yet for >> 1.1.9?) ASAP. I'm hoping in a couple of hours from now, otherwise tomorrow. >> >> On Mar 6, 2013, at 8:39 AM, Andrew Beekhof wrote: >> >>> I'm still very confused about why you're using master/slave though. >> >> The reason I went with master-slave was that we want the init script started >> on the "master" host and stopped on the "slave". > > You get those semantics from a normal primitive resource. > >> With a master-slave I have a monitor operation on the slave ensuring that >> the resource will be stopped on the slave if it were to be started manually >> (something I can't be sure wouldn't happen). AFAIK this wouldn't be the case >> with a "standard" resource. > > I think 1.1.8 allowed for operations with role=Stopped which would do > this for the highly paranoid :-) > >> >> Regards, >> James >> >> >> ___ >> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker >> >> Project Home: http://www.clusterlabs.org >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> Bugs: http://bugs.clusterlabs.org ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] Pacemaker resource migration behaviour
On Wed, Mar 6, 2013 at 8:02 PM, James Guthrie wrote: > Hi Andrew, > > Thanks for looking into this. We have since decided not to perform a failover > on the failure of one of the sub-* resources for operational reasons. As a > result, I can't reliably test if this issue is actually fixed in the current > HEAD. (Speaking of which, do you have a date set yet for 1.1.9?) > > On Mar 6, 2013, at 8:39 AM, Andrew Beekhof wrote: > >> I'm still very confused about why you're using master/slave though. > > The reason I went with master-slave was that we want the init script started > on the "master" host and stopped on the "slave". You get those semantics from a normal primitive resource. > With a master-slave I have a monitor operation on the slave ensuring that the > resource will be stopped on the slave if it were to be started manually > (something I can't be sure wouldn't happen). AFAIK this wouldn't be the case > with a "standard" resource. I think 1.1.8 allowed for operations with role=Stopped which would do this for the highly paranoid :-) > > Regards, > James > > > ___ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
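For the "highly paranoid" variant Andrew mentions, a monitor operation for the Stopped role can sit next to the normal one. A crm shell sketch, with illustrative resource name, class and intervals (the two monitors must use different intervals):

    primitive p_myinit lsb:myservice \
        op monitor interval="30s" \
        op monitor interval="45s" role="Stopped"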
Re: [Pacemaker] Pacemaker resource migration behaviour
On Wed, Mar 6, 2013 at 6:59 PM, James Guthrie wrote: > On Mar 6, 2013, at 7:34 AM, Andrew Beekhof wrote: > >> On Wed, Feb 6, 2013 at 11:41 PM, James Guthrie wrote: >>> Hi David, >>> >>> Unfortunately crm_report doesn't work correctly on my hosts as we have >>> compiled from source with custom paths and apparently the crm_report and >>> associated tools are not built to use the paths that can be customised with >>> autoconf. >> >> It certainly tries to: >> >> https://github.com/beekhof/pacemaker/blob/master/tools/report.common#L99 >> >> What does it say on your system (or, what paths did you give to autoconf)? > > You are correct, it does try to - there are a few problems though: > - the hardcoded depth (-maxdepth 5) that is used to search for the files is > no good on my host > - the fact that it assumes the local state did would be /var (despite what > was configured in autoconf) > > In my case all files are in the path /opt/OSAGpcmk/pcmk > > I submitted a pull-request which I was hoping to get some comment on, but > didn't. I don't comment much while I'm asleep. But I've applied your subsequent pull requests. > > https://github.com/ClusterLabs/pacemaker/pull/225 > > I know that it's not a complete solution and would suggest I resubmit the > pull request in two parts: > 1. Using the localstatedir and exec_prefix as configured in autoconf. > 2. Make the maxdepth parameter default to 5, but be overridable with a flag > to crm_report. > > Regards, > James > ___ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
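For a relocated build like the one described, the paths crm_report should be deriving come from the usual autoconf switches at build time, e.g. (paths are illustrative, based on the /opt/OSAGpcmk/pcmk layout mentioned above):

    ./configure --prefix=/opt/OSAGpcmk/pcmk \
                --exec-prefix=/opt/OSAGpcmk/pcmk \
                --sysconfdir=/opt/OSAGpcmk/pcmk/etc \
                --localstatedir=/opt/OSAGpcmk/pcmk/var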
Re: [Pacemaker] Block stonith when drbd inconsistent
If you put drbd into maintenance mode, we'd not perform any state changes (stop/stop/promote/demote) on it that could fail. That would likely do what you're after. On Thu, Mar 7, 2013 at 4:59 AM, Jan Škoda wrote: > Dne 6.3.2013 06:38, Andrew Beekhof napsal(a): >> Nodes shouldn't be being fenced so often. Do you know what is causing >> this to happen? > I know that this shouldn't happen frequently, but not having access to > uptodate data is certainly unwanted and there should be a way to prevent it. > > DRBD is quite prone to demote failures, especially when filesystem can > not be umounted for some reason. Blocked process for example can't be > killed and filesystems accessed by it can't be unmounted. This problem > is causing 90% of fencing for me. > > -- > Honza 'Lefty' Škoda http://www.jskoda.cz/ > > > ___ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
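Two common ways to take the DRBD resource out of cluster control temporarily, sketched in crm shell (the resource name ms-drbd0 is illustrative):

    # cluster-wide: nothing is started, stopped, promoted or demoted
    crm configure property maintenance-mode=true

    # per resource: leave just the DRBD master/slave resource unmanaged
    crm resource unmanage ms-drbd0

The second form simply sets is-managed=false on the resource, so monitors may still run but no recovery actions are taken.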
Re: [Pacemaker] [Problem][crmsh]The designation of the 'ordered' attribute becomes the error.
Hi Dejan, Hi Andrew, Thank you for comment. I confirm the movement of the patch and report it. Best Regards, Hideo Yamauchi. --- On Wed, 2013/3/6, Dejan Muhamedagic wrote: > Hi Hideo-san, > > On Wed, Mar 06, 2013 at 10:37:44AM +0900, renayama19661...@ybb.ne.jp wrote: > > Hi Dejan, > > Hi Andrew, > > > > As for the crm shell, the check of the meta attribute was revised with the > > next patch. > > > > * http://hg.savannah.gnu.org/hgweb/crmsh/rev/d1174f42f4b3 > > > > This patch was backported in Pacemaker1.0.13. > > > > * > >https://github.com/ClusterLabs/pacemaker-1.0/commit/fa1a99ab36e0ed015f1bcbbb28f7db962a9d1abc#shell/modules/cibconfig.py > > > > However, the ordered,colocated attribute of the group resource is treated > > as an error when I use crm Shell which adopted this patch. > > > > -- > > (snip) > > ### Group Configuration ### > > group master-group \ > > vip-master \ > > vip-rep \ > > meta \ > > ordered="false" > > (snip) > > > > [root@rh63-heartbeat1 ~]# crm configure load update test2339.crm > > INFO: building help index > > crm_verify[20028]: 2013/03/06_17:57:18 WARN: unpack_nodes: Blind faith: not > > fencing unseen nodes > > WARNING: vip-master: specified timeout 60s for start is smaller than the > > advised 90 > > WARNING: vip-master: specified timeout 60s for stop is smaller than the > > advised 100 > > WARNING: vip-rep: specified timeout 60s for start is smaller than the > > advised 90 > > WARNING: vip-rep: specified timeout 60s for stop is smaller than the > > advised 100 > > ERROR: master-group: attribute ordered does not exist -> WHY? > > Do you still want to commit? y > > -- > > > > If it chooses `yes` by a confirmation message, it is reflected, but it is a > > problem that error message is displayed. > > * The error occurs in the same way when I appoint colocated attribute. > > AndI noticed that there was not explanation of ordered,colocated of the > > group resource in online help of Pacemaker. > > > > I think that the designation of the ordered,colocated attribute should not > > become the error in group resource. > > In addition, I think that ordered,colocated should be added to online help. > > These attributes are not listed in crmsh. Does the attached patch > help? > > Thanks, > > Dejan > > > > Best Regards, > > Hideo Yamauchi. > > > > > > ___ > > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > > > Project Home: http://www.clusterlabs.org > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > > Bugs: http://bugs.clusterlabs.org > ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
[Pacemaker] Does LVM resouce agent conflict with UDEV rules?
Dear all,

on my test setup (iSCSI on top of LVM on top of DRBD) I was not able to reliably migrate the volume group from one node to the other. I have two logical volumes that are mapped to two different LUNs on one iSCSI target. During migration of the resource group (LVM + iSCSI) the LVM resource often failed to start properly on the target node. I noticed that sometimes not all LVs would be activated.

From the log files I suspected that udev could have something to do with this. In fact, disabling the udev rule

SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="lvm*|LVM*",\
RUN+="watershed sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange -a y'"

seems to resolve the problem for me.

Question(s): Is this a known problem? Am I supposed to tune udev when managing LVM resources with a cluster resource manager? Or should this be considered a bug in either udev or the LVM resource agent?

I am using:
pacemaker 1.1.6-2ubuntu3
resource-agents 1:3.9.2-5ubuntu4.1
udev 175-0ubuntu9.2
on Ubuntu 12.04

There is also a bug report on launchpad (although it is related only to 12.10 there): https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/1088081

Kind regards,
Sven

___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
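Besides disabling the udev rule, a common approach is to stop the OS from auto-activating cluster-managed volume groups by whitelisting only the local ones in lvm.conf, roughly like this sketch (volume group name and tag are illustrative):

    # /etc/lvm/lvm.conf
    activation {
        # only VGs/LVs matching this list are activated outside the cluster
        volume_list = [ "vg_system", "@local" ]
    }

This is the basis of the tag-based "HA-LVM" setup, where the ocf:heartbeat:LVM agent then activates the shared VG via its exclusive/tag parameters; on its own, volume_list is not a complete fix, so whether it suits this particular stack is a separate question.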
Re: [Pacemaker] Block stonith when drbd inconsistent
On 6.3.2013 06:38, Andrew Beekhof wrote:
> Nodes shouldn't be being fenced so often. Do you know what is causing
> this to happen?

I know that this shouldn't happen frequently, but losing access to up-to-date data is certainly unwanted and there should be a way to prevent it. DRBD is quite prone to demote failures, especially when the filesystem cannot be unmounted for some reason. A blocked process, for example, cannot be killed, and the filesystems it is accessing cannot be unmounted. This problem causes 90% of the fencing for me.

--
Honza 'Lefty' Škoda http://www.jskoda.cz/

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
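For DRBD 8.3/8.4 with Pacemaker, the usual way to keep an outdated or Inconsistent peer from being promoted is DRBD's resource-level fencing hooks rather than node-level stonith. A sketch, assuming the crm-fence-peer.sh helpers shipped with the DRBD/Pacemaker integration are installed under /usr/lib/drbd/; this does not block node fencing by itself, but it constrains where the Master may run while the peer's data is not up to date:

  resource r0 {
    disk {
      fencing resource-only;
    }
    handlers {
      fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    # devices, hosts, etc. unchanged
  }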
Re: [Pacemaker] [RFC] Automatic nodelist synchronization between corosync and pacemaker
06.03.2013 08:35, Andrew Beekhof wrote: > On Thu, Feb 28, 2013 at 5:13 PM, Vladislav Bogdanov > wrote: >> 28.02.2013 07:21, Andrew Beekhof wrote: >>> On Tue, Feb 26, 2013 at 7:36 PM, Vladislav Bogdanov >>> wrote: 26.02.2013 11:10, Andrew Beekhof wrote: > On Mon, Feb 18, 2013 at 6:18 PM, Vladislav Bogdanov > wrote: >> Hi Andrew, all, >> >> I had an idea last night, that it may be worth implementing >> fully-dynamic cluster resize support in pacemaker, > > We already support nodes being added on the fly. As soon as they show > up in the membership we add them to the cib. Membership (runtime.totem.pg.mrp.srp.members) or nodelist (nodelist.node)? >>> >>> To my knowledge, only one (first) gets updated at runtime. >>> Even if nodelist.node could be updated dynamically, we'd have to poll >>> or be prompted to find out. >> >> It can, please see at the end of cmap_keys(8). >> Please also see cmap_track_add(3) for CMAP_TRACK_PREFIX flag (and my >> original message ;) ). > > ACK :) > >> I recall that when I migrated from corosync 1.4 to 2.0 (somewhere near pacemaker 1.1.8 release time) and replaced old-style UDPU member list with nodelist.node, I saw all nodes configured in that nodelist appeared in a CIB. For me that was a regression, because with old-style config (and corosync 1.4) CIB contained only nodes seen online (4 of 16). >>> >>> That was a loophole that only worked when the entire cluster had been >>> down and the section was empty. >> >> Aha, that is what I've been hit by. >> >>> People filed bugs explicitly asking for that loophole to be closed >>> because it was inconsistent with what the cluster did on every >>> subsequent startup. >> >> That is what I'm interested too. And what I propose should fix that too. > > Ah, I must have misparsed, I thought you were looking for the opposite > behaviour. > > So basically, you want to be able to add/remove nodes from nodelist.* > in corosync.conf and have pacemaker automatically add/remove them from > itself? Not corosync.conf, but cmap which is initially (partially) filled with values from corosync.conf. > > If corosync.conf gets out of sync (admin error or maybe a node was > down when you updated last) they might well get added back - I assume > you're ok with that? > Because there's no real way to know the difference between "added > back" and "not removed from last time". Sorry, can you please reword? > > Or are you planning to never update the on-disk corosync.conf and only > modify the in-memory nodelist? That depends on the actual use case I think. Hm. Interesting, how corosync behave when new dynamic nodes are added to cluster... I mean following: we have static corosync.conf with nodelist containing f.e. 3 entries, then we add fourth entry via cmap and boot fourth node. What should be in corosync.conf of that node? I believe in wont work without that _its_ fourth entry. Ugh. If so, then no fully dynamic "elastic" cluster which I was dreaming of is still possible because out-of-the-box when using dynamic nodelist. The only way to have this I see is to have static nodelist in corosync.conf with all possible nodes predefined. And never edit it in cmap. So, my original point * Remove nodes from CIB when they are removed from a nodelist. does not fit. By elastic I mean what was discussed on corosync list when Fabio started with votequorum design and what then appeared in votequorum manpage: === allow_downscale: 1 Enables allow downscale (AD) feature (default: 0). 
The general behaviour of votequorum is to never decrease expected votes or quorum. When AD is enabled, both expected votes and quorum are recalculated when a node leaves the cluster in a clean state (normal corosync shutdown process), down to the configured expected_votes.

Example use case:

1) N node cluster (where N is any value higher than 3)
2) expected_votes set to 3 in corosync.conf
3) only 3 nodes are running
4) admin requires to increase processing power and adds 10 nodes
5) internal expected_votes is automatically set to 13
6) minimum expected_votes is 3 (from configuration)

- up to this point this is standard votequorum behavior -

7) once the work is done, admin wants to remove nodes from the cluster
8) using an ordered shutdown the admin can reduce the cluster size automatically back to 3, but not below 3, where normal quorum operation will work as usual.
===

What I would expect from pacemaker is to automatically remove nodes down to 3 at step 8 (just follow quorum), if AD is enabled AND pacemaker is instructed to follow that (with some other cmap switch), and also to reduce the number of allocated clone instances. Sure, all nodes must have an equal number of votes (1).

Is it ok for you?

>
>>
>>> That would be OK if number of clone instances does not raise with that...
>>>
>>> Why? If clone-node-max=1, then you'll never have more than the number
>>> of a
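To make the allow_downscale discussion concrete, here is a sketch of the corosync 2.x pieces involved; the values, node index and address below are only illustrative:

  # corosync.conf - votequorum with allow downscale, as in the excerpt above
  quorum {
      provider: corosync_votequorum
      expected_votes: 3
      allow_downscale: 1
  }

  # Adding a node to the runtime nodelist through cmap, without touching
  # corosync.conf on disk (see cmap_keys(8) for the dynamically changeable keys)
  corosync-cmapctl -s nodelist.node.3.ring0_addr str 10.34.38.49
  corosync-cmapctl -s nodelist.node.3.nodeid u32 4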
Re: [Pacemaker] Pacemaker DRBD as Physical Volume on Encrypted RAID1
Thanks Lars - will digest your response a bit, get back - appreciate the help! -Original Message- From: Lars Ellenberg To: pacemaker Sent: Wed, Mar 6, 2013 5:40 am Subject: Re: [Pacemaker] Pacemaker DRBD as Physical Volume on Encrypted RAID1 On Mon, Mar 04, 2013 at 04:27:24PM -0500, senrab...@aol.com wrote: > Hi All: > > We're new to pacemaker (just got some great help from this forum > getting it working with LVM as backing device), and would like to > explore the Physical Volume option. We're trying configure on top of > an existing Encrypted RAID1 set up and employ LVM. > > NOTE: our goal is to run many virtual servers, each in its own > logical volume and it looks like putting LVM on top of the DRBD would > allow us to add logical volumes "on the fly", but also have a > "simpler" setup with one drbd device for all the logical volumes and > one related pacemaker config. Hence, exploring DRBD as a physical > volume. A single DRBD has a single "activity log", running "many virtual servers" from there will very likely cause the "worst possible" workload (many totally random writes). You really want to use DRBD 8.4.3, see https://blogs.linbit.com/p/469/843-random-writes-faster/ for why. > Q: For pacemaker to work, how do we do the DRBD disk/device mapping > in the drbd.conf file? And should we set things up and encrypt last, > or can we apply DRBD and Pacemaker to an existing Encypted RAID1 > setup? Neither Pacemaker nor DRBD do particularly care. If you want to stack the encryption layer on top of DRBD, fine. (you'd probably need to teach some pacemaker resource agent to "start" the encryption layer). If you want to stack DRBD on top of the encryption layer, just as fine. Unless you provide the decryption key in plaintext somewhere, failover will likely be easier to automate if you have DRBD on top of encryption, so if you want the real device encrypted, I'd recommend to put encryption below DRBD. Obviously, the DRBD replication traffic will still be "plaintext" in that case. > The examples we've seen show mapping between the drbd device and a > physical disk (e.g., sdb) in the drbd.conf, and then "pvcreate > /dev/drbdnum" and creating a volume group and logical volume on the > drbd device. > > So for this type of set up, drbd.conf might look like: > > device/dev/drbd1; > disk /dev/sdb; > address xx.xx.xx.xx:7789; > meta-disk internal; > > In our case, because we have an existing RAID1 (md2) and it's > encrypted (md2_crypt or /dev/dm-7 ... we're unsure which partition > actually has the data), any thoughts on how to do the DRBD mapping? > E.g., > > device /dev/drbd1 minor 1; > disk /dev/???; > address xx.xx.xx.xx:7789; > meta-disk internal; > > I.e., what goes in the "disk /dev/?;"? Would it be "disk /dev/md2_crypt;"? Yes. > And can we do our setup on an existing Encrypted RAID1 setup Yes. > (if we do pvcreate on drbd1, we get errors)? Huh? -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. 
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
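A crm shell sketch of the stack being discussed (one DRBD device used as an LVM physical volume, with the volume group following the DRBD Master). The names r1 and vg1 and all intervals are placeholders, not taken from the thread:

  primitive p_drbd_r1 ocf:linbit:drbd \
          params drbd_resource="r1" \
          op monitor interval="29s" role="Master" \
          op monitor interval="31s" role="Slave"
  ms ms_drbd_r1 p_drbd_r1 \
          meta master-max="1" clone-max="2" notify="true"
  primitive p_lvm_vg1 ocf:heartbeat:LVM \
          params volgrpname="vg1" \
          op monitor interval="30s" timeout="30s"
  colocation col_vg1_on_drbd inf: p_lvm_vg1 ms_drbd_r1:Master
  order o_drbd_before_vg1 inf: ms_drbd_r1:promote p_lvm_vg1:start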
Re: [Pacemaker] The correction request of the log of booth
On Wed, 2013-03-06 at 15:13 +0900, yusuke iida wrote: > Hi, Jiaju > > There is a request about the log of booth. > > I want you to change a log level when a ticket expires into "info" from > "debug". > > I think that this log is important since it means what occurred. > > And I want you to add the following information to log. > * Which ticket is it? > * Who had a ticket? > > For example, I want you to use the following forms. > info: lease expires ... owner [0] ticket [ticketA] Sounds great, will improve that;) Thanks, Jiaju ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] standby attribute and same resources running at the same time
Am 06.03.2013 um 05:14 schrieb Andrew Beekhof : > On Tue, Mar 5, 2013 at 4:20 AM, Leon Fauster > wrote: >> >> So far all good. I am doing some stress test now and noticed that rebooting >> one node (n2), that node (n2) will be marked as standby in the cib (shown on >> the >> other node (n1)). >> >> After rebooting the node (n2) crm_mon on that node shows that the other node >> (n1) >> is offline and begins to start the ressources. While the other node (n1) >> that wasn't >> rebooted still shows n2 as standby. At that point both nodes are runnnig the >> "same" >> resources. After a couple of minutes that situation is noticed and both nodes >> renegotiate the current state. Then one node take over the responsibility to >> provide >> the resources. On both nodes the previously rebooted node is still listed as >> standby. >> >> >> cat /var/log/messages |grep error >> Mar 4 17:32:33 cn1 pengine[1378]:error: native_create_actions: >> Resource resIP (ocf::IPaddr2) is active on 2 nodes attempting recovery >> Mar 4 17:32:33 cn1 pengine[1378]:error: native_create_actions: >> Resource resApache (ocf::apache) is active on 2 nodes attempting recovery >> Mar 4 17:32:33 cn1 pengine[1378]:error: process_pe_message: Calculated >> Transition 1: /var/lib/pacemaker/pengine/pe-error-6.bz2 >> Mar 4 17:32:48 cn1 crmd[1379]: notice: run_graph: Transition 1 >> (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, >> Source=/var/lib/pacemaker/pengine/pe-error-6.bz2): Complete >> >> >> crm_mon -1 >> Last updated: Mon Mar 4 17:49:08 2013 >> Last change: Mon Mar 4 10:22:53 2013 via crm_resource on cn1.localdomain >> Stack: cman >> Current DC: cn1.localdomain - partition with quorum >> Version: 1.1.8-7.el6-394e906 >> 2 Nodes configured, 2 expected votes >> 2 Resources configured. >> >> Node cn2.localdomain: standby >> Online: [ cn1.localdomain ] >> >> resIP (ocf::heartbeat:IPaddr2): Started cn1.localdomain >> resApache (ocf::heartbeat:apache):Started cn1.localdomain >> >> >> i checked the init scripts and found that the standby "behavior" comes >> from a function that is called on "service pacemaker stop" (added in >> rhel6.4). >> >> cman_pre_stop() >> { >>cname=`crm_node --name` >>crm_attribute -N $cname -n standby -v true -l reboot >>echo -n "Waiting for shutdown of managed resources" >> ... > > That will only last until the node comes back (the cluster will remove > it automatically), the core problem is that it appears not to have. > Can you file a bug and attach a crm_report for the period covered by > the restart? I used the redhat's bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=918502 as you are also the maintainer of the corresponding rpm. -- Thanks Leon ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
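Until the underlying problem is fixed, the leftover attribute can be inspected and removed by hand. A sketch using the node name from the output above; crm_attribute with the reboot lifetime targets the same transient attribute that cman_pre_stop() sets:

  # show the transient standby attribute
  crm_attribute -N cn2.localdomain -n standby -l reboot --query
  # clear it so the node is allowed to run resources again
  crm_attribute -N cn2.localdomain -n standby -l reboot --delete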
Re: [Pacemaker] [Problem][crmsh]The designation of the 'ordered' attribute becomes the error.
Hi Hideo-san,

On Wed, Mar 06, 2013 at 10:37:44AM +0900, renayama19661...@ybb.ne.jp wrote:
> Hi Dejan,
> Hi Andrew,
>
> As for the crm shell, the check of the meta attribute was revised with the
> next patch.
>
> * http://hg.savannah.gnu.org/hgweb/crmsh/rev/d1174f42f4b3
>
> This patch was backported in Pacemaker 1.0.13.
>
> * https://github.com/ClusterLabs/pacemaker-1.0/commit/fa1a99ab36e0ed015f1bcbbb28f7db962a9d1abc#shell/modules/cibconfig.py
>
> However, the ordered/colocated attributes of a group resource are treated
> as an error when I use a crm shell that has adopted this patch.
>
> --
> (snip)
> ### Group Configuration ###
> group master-group \
>     vip-master \
>     vip-rep \
>     meta \
>         ordered="false"
> (snip)
>
> [root@rh63-heartbeat1 ~]# crm configure load update test2339.crm
> INFO: building help index
> crm_verify[20028]: 2013/03/06_17:57:18 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> WARNING: vip-master: specified timeout 60s for start is smaller than the advised 90
> WARNING: vip-master: specified timeout 60s for stop is smaller than the advised 100
> WARNING: vip-rep: specified timeout 60s for start is smaller than the advised 90
> WARNING: vip-rep: specified timeout 60s for stop is smaller than the advised 100
> ERROR: master-group: attribute ordered does not exist -> WHY?
> Do you still want to commit? y
> --
>
> If I answer `yes` to the confirmation message the configuration is applied,
> but it is a problem that the error message is displayed at all.
> * The same error occurs when I specify the colocated attribute.
> And I noticed that there is no explanation of ordered/colocated for the
> group resource in the Pacemaker online help.
>
> I think that specifying the ordered/colocated attributes on a group resource
> should not be treated as an error.
> In addition, I think that ordered/colocated should be added to the online help.

These attributes are not listed in crmsh. Does the attached patch
help?

Thanks,

Dejan

> Best Regards,
> Hideo Yamauchi.
From 1f6ed514c8e53c79835aaaf26d152f2d840126f0 Mon Sep 17 00:00:00 2001
From: Dejan Muhamedagic
Date: Wed, 6 Mar 2013 11:57:54 +0100
Subject: [PATCH] Low: shell: add group meta attributes

---
 shell/modules/cibconfig.py | 2 ++
 shell/modules/vars.py.in   | 1 +
 2 files changed, 3 insertions(+)

diff --git a/shell/modules/cibconfig.py b/shell/modules/cibconfig.py
index 2dfaa92..1cf08fa 100644
--- a/shell/modules/cibconfig.py
+++ b/shell/modules/cibconfig.py
@@ -1152,6 +1152,8 @@ class CibContainer(CibObject):
             l += vars.clone_meta_attributes
         elif self.obj_type == "ms":
             l += vars.clone_meta_attributes + vars.ms_meta_attributes
+        elif self.obj_type == "group":
+            l += vars.group_meta_attributes
         rc = sanity_check_meta(self.obj_id,self.node,l)
         return rc
 
diff --git a/shell/modules/vars.py.in b/shell/modules/vars.py.in
index c83232e..dff86dc 100644
--- a/shell/modules/vars.py.in
+++ b/shell/modules/vars.py.in
@@ -117,6 +117,7 @@ class Vars(Singleton):
         "failure-timeout", "resource-stickiness", "target-role",
         "restart-type", "description",
         )
+    group_meta_attributes = ("ordered", "colocated")
     clone_meta_attributes = (
         "ordered", "notify", "interleave", "globally-unique",
         "clone-max", "clone-node-max", "clone-state", "description",
--
1.8.0

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
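Since the patch is in git format-patch form, it can be applied to a crmsh/pacemaker-1.0 source tree in the usual ways; the file name and directory below are assumptions:

  cd pacemaker-1.0
  git am 0001-Low-shell-add-group-meta-attributes.patch
  # or, without preserving the commit metadata:
  patch -p1 < 0001-Low-shell-add-group-meta-attributes.patch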
Re: [Pacemaker] Pacemaker DRBD as Physical Volume on Encrypted RAID1
On Mon, Mar 04, 2013 at 04:27:24PM -0500, senrab...@aol.com wrote: > Hi All: > > We're new to pacemaker (just got some great help from this forum > getting it working with LVM as backing device), and would like to > explore the Physical Volume option. We're trying configure on top of > an existing Encrypted RAID1 set up and employ LVM. > > NOTE: our goal is to run many virtual servers, each in its own > logical volume and it looks like putting LVM on top of the DRBD would > allow us to add logical volumes "on the fly", but also have a > "simpler" setup with one drbd device for all the logical volumes and > one related pacemaker config. Hence, exploring DRBD as a physical > volume. A single DRBD has a single "activity log", running "many virtual servers" from there will very likely cause the "worst possible" workload (many totally random writes). You really want to use DRBD 8.4.3, see https://blogs.linbit.com/p/469/843-random-writes-faster/ for why. > Q: For pacemaker to work, how do we do the DRBD disk/device mapping > in the drbd.conf file? And should we set things up and encrypt last, > or can we apply DRBD and Pacemaker to an existing Encypted RAID1 > setup? Neither Pacemaker nor DRBD do particularly care. If you want to stack the encryption layer on top of DRBD, fine. (you'd probably need to teach some pacemaker resource agent to "start" the encryption layer). If you want to stack DRBD on top of the encryption layer, just as fine. Unless you provide the decryption key in plaintext somewhere, failover will likely be easier to automate if you have DRBD on top of encryption, so if you want the real device encrypted, I'd recommend to put encryption below DRBD. Obviously, the DRBD replication traffic will still be "plaintext" in that case. > The examples we've seen show mapping between the drbd device and a > physical disk (e.g., sdb) in the drbd.conf, and then "pvcreate > /dev/drbdnum" and creating a volume group and logical volume on the > drbd device. > > So for this type of set up, drbd.conf might look like: > > device/dev/drbd1; > disk /dev/sdb; > address xx.xx.xx.xx:7789; > meta-disk internal; > > In our case, because we have an existing RAID1 (md2) and it's > encrypted (md2_crypt or /dev/dm-7 ... we're unsure which partition > actually has the data), any thoughts on how to do the DRBD mapping? > E.g., > > device /dev/drbd1 minor 1; > disk /dev/???; > address xx.xx.xx.xx:7789; > meta-disk internal; > > I.e., what goes in the "disk /dev/?;"? Would it be "disk > /dev/md2_crypt;"? Yes. > And can we do our setup on an existing Encrypted RAID1 setup Yes. > (if we do pvcreate on drbd1, we get errors)? Huh? -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
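Putting those answers together, a drbd.conf resource for DRBD on top of the dm-crypt device might look roughly like the sketch below. Hostnames, IP addresses and the exact /dev/mapper path are placeholders (check `dmsetup ls` or /dev/mapper for the real name of md2_crypt):

  resource r1 {
    device    /dev/drbd1 minor 1;
    disk      /dev/mapper/md2_crypt;
    meta-disk internal;
    on nodeA {
      address 10.0.0.1:7789;
    }
    on nodeB {
      address 10.0.0.2:7789;
    }
  }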
Re: [Pacemaker] Pacemaker resource migration behaviour
Hi Andrew, Thanks for looking into this. We have since decided not to perform a failover on the failure of one of the sub-* resources for operational reasons. As a result, I can't reliably test if this issue is actually fixed in the current HEAD. (Speaking of which, do you have a date set yet for 1.1.9?) On Mar 6, 2013, at 8:39 AM, Andrew Beekhof wrote: > I'm still very confused about why you're using master/slave though. The reason I went with master-slave was that we want the init script started on the "master" host and stopped on the "slave". With a master-slave I have a monitor operation on the slave ensuring that the resource will be stopped on the slave if it were to be started manually (something I can't be sure wouldn't happen). AFAIK this wouldn't be the case with a "standard" resource. Regards, James ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
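For reference, the pattern described here, a master/slave resource whose Slave role is also monitored so that a manually started copy is detected and stopped, looks roughly like this in crm shell syntax. The agent ocf:custom:app is hypothetical; a real OCF agent with promote/demote support is required, and the intervals are only illustrative:

  primitive app ocf:custom:app \
          op monitor interval="20s" role="Master" \
          op monitor interval="30s" role="Slave"
  ms ms_app app \
          meta master-max="1" clone-max="2" notify="false"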
Re: [Pacemaker] Pacemaker resource migration behaviour
On Mar 6, 2013, at 7:34 AM, Andrew Beekhof wrote:

> On Wed, Feb 6, 2013 at 11:41 PM, James Guthrie wrote:
>> Hi David,
>>
>> Unfortunately crm_report doesn't work correctly on my hosts as we have
>> compiled from source with custom paths and apparently the crm_report and
>> associated tools are not built to use the paths that can be customised with
>> autoconf.
>
> It certainly tries to:
>
> https://github.com/beekhof/pacemaker/blob/master/tools/report.common#L99
>
> What does it say on your system (or, what paths did you give to autoconf)?

You are correct, it does try to - there are a few problems though:
- the hardcoded depth (-maxdepth 5) used to search for the files is no good on my host
- it assumes the local state dir would be /var (despite what was configured in autoconf)

In my case all files are in the path /opt/OSAGpcmk/pcmk

I submitted a pull request which I was hoping to get some comments on, but didn't:
https://github.com/ClusterLabs/pacemaker/pull/225

I know that it's not a complete solution and would suggest I resubmit the pull request in two parts:
1. Use the localstatedir and exec_prefix as configured in autoconf.
2. Make the maxdepth parameter default to 5, but be overridable with a flag to crm_report.

Regards,
James

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
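For context, a custom-path build of this kind is typically produced with something like the following configure invocation; the exact flags used for the /opt/OSAGpcmk/pcmk installation are not shown in the thread, so treat this purely as an assumed illustration:

  ./configure --prefix=/opt/OSAGpcmk/pcmk \
              --exec-prefix=/opt/OSAGpcmk/pcmk \
              --sysconfdir=/opt/OSAGpcmk/pcmk/etc \
              --localstatedir=/opt/OSAGpcmk/pcmk/var
  make && make install
  # crm_report then has to look for PE inputs and logs under
  # /opt/OSAGpcmk/pcmk/var/lib/pacemaker instead of /var/lib/pacemaker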