Re: [Pacemaker] Balancing of clone resources (globally-unique=true)
On Fri, Nov 12, 2010 at 7:41 AM, Chris Picton wrote:
> I have attached the output as requested

Normally it would get balanced, but it's being pushed to 01 because there
are so many resources on 02:

  sort_node_weight: slb-test-02.ecntelecoms.za.net (12) > slb-test-01.ecntelecoms.za.net (2) : resources

So the cluster is trying to balance out the resources, just not at the
level you were expecting.

> On Thu, 11 Nov 2010 11:21:51 +0100, Andrew Beekhof wrote:
>>>> what version is this?
>>>
>>> This is 1.0.9
>>
>> Odd. I wouldn't have expected this behavior. Can you attach the output
>> from cibadmin -Ql please?
>>
>> On Tue, Nov 9, 2010 at 5:51 PM, Chris Picton wrote:
>>> From a previous thread (crm_resource - migrating/halt a cloned resource):
>>>
>>> Andrew Beekhof wrote:
>>>> bottom line, you don't get to choose where specific clone instances
>>>> get placed.
>>>
>>> In my case, I have a clone:
>>>
>>> primitive clusterip-9 ocf:heartbeat:IPaddr2 \
>>>     params ip="192.168.0.9" cidr_netmask="24" \
>>>         clusterip_hash="sourceip" nic="bondE" \
>>>     op monitor interval="30s" \
>>>     meta resource-stickiness="0"
>>>
>>> clone clusterip-9-clone clusterip-9 \
>>>     meta globally-unique="true" clone-max="2" \
>>>         clone-node-max="2" resource_stickiness="0"
>>>
>>> When I start the clone, both instances start on the same node:
>>>
>>> Clone Set: clusterip-9-clone (unique)
>>>     clusterip-9:0 (ocf::heartbeat:IPaddr2): Started slb-test-01.ecntelecoms.za.net
>>>     clusterip-9:1 (ocf::heartbeat:IPaddr2): Started slb-test-01.ecntelecoms.za.net
>>>
>>> The second node has a colocated set of standalone IP addresses
>>> running, so I assume that pacemaker is pushing both clusterip clones
>>> to the second node to balance resources.
>>>
>>> My scores look like this (0 for everything to do with this resource):
>>>
>>> clone_color: clusterip-9-clone allocation score on slb-test-01.ecntelecoms.za.net: 0
>>> clone_color: clusterip-9-clone allocation score on slb-test-02.ecntelecoms.za.net: 0
>>> clone_color: clusterip-9:0 allocation score on slb-test-01.ecntelecoms.za.net: 0
>>> clone_color: clusterip-9:0 allocation score on slb-test-02.ecntelecoms.za.net: 0
>>> clone_color: clusterip-9:1 allocation score on slb-test-01.ecntelecoms.za.net: 0
>>> clone_color: clusterip-9:1 allocation score on slb-test-02.ecntelecoms.za.net: 0
>>> native_color: clusterip-9:0 allocation score on slb-test-01.ecntelecoms.za.net: 0
>>> native_color: clusterip-9:0 allocation score on slb-test-02.ecntelecoms.za.net: 0
>>> native_color: clusterip-9:1 allocation score on slb-test-01.ecntelecoms.za.net: 0
>>> native_color: clusterip-9:1 allocation score on slb-test-02.ecntelecoms.za.net: 0
>>>
>>> Is there a way to request pacemaker to try to split the clones up, if
>>> possible, over the available nodes?
>>>
>>> Regards
>>> Chris

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
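Not an answer given in the thread, but a hedged sketch of the usual workaround: if you never need both instances on one node, lowering clone-node-max caps the number of instances per node, which forces them apart. This reuses Chris's resource names from above and trades away the ability for both instances to run on the surviving node during a failover:

```
# Sketch only (assumes the clusterip-9 primitive above).
# With clone-max="2" and clone-node-max="1" on a two-node cluster,
# the two unique instances must land on different nodes. The cost:
# if a node fails, its instance stops rather than moving to the
# survivor.
clone clusterip-9-clone clusterip-9 \
    meta globally-unique="true" clone-max="2" \
        clone-node-max="1"
```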
Re: [Pacemaker] colocation that doesn't
On Fri, Nov 5, 2010 at 4:07 AM, Vadym Chepkov wrote:
>
> On Nov 4, 2010, at 12:53 PM, Alan Jones wrote:
>
>> If I understand you correctly, the role of the second resource in the
>> colocation command was defaulting to that of the first, "Master", which
>> is not defined or is untested for non-ms resources.
>> Unfortunately, after changing that line to:
>>
>> colocation mystateful-ms-loc inf: mystateful-ms:Master myprim:Started
>>
>> ...it still doesn't work:
>>
>> myprim (ocf::pacemaker:DummySlow): Started node6.acme.com
>> Master/Slave Set: mystateful-ms
>>     Masters: [ node5.acme.com ]
>>     Slaves: [ node6.acme.com ]
>>
>> And after:
>>
>> location myprim-loc myprim -inf: node5.acme.com
>>
>> myprim (ocf::pacemaker:DummySlow): Started node6.acme.com
>> Master/Slave Set: mystateful-ms
>>     Masters: [ node6.acme.com ]
>>     Slaves: [ node5.acme.com ]
>>
>> What I would like to do is enable logging for the code that calculates
>> the weights, etc. It is obvious to me that the weights are calculated
>> differently for mystateful-ms based on the weights used in myprim.
>> Can you enable more verbose logging online, or do you have to recompile?
>> My version is 1.0.9-89bd754939df5150de7cd76835f98fe90851b677, which is
>> different from Vadym's.
>> BTW: Is there another release planned for the stable branch? 1.0.9.1 is
>> now 4 months old. I understand that I could take the top of tree, but I
>> would like to believe that others are running the same version. ;)
>> Thank you!
>> Alan
>>
>> On Thu, Nov 4, 2010 at 8:22 AM, Dejan Muhamedagic wrote:
>>> Hi,
>>>
>>> On Thu, Nov 04, 2010 at 06:51:59AM -0400, Vadym Chepkov wrote:
>>>> On Thu, Nov 4, 2010 at 5:37 AM, Dejan Muhamedagic wrote:
>>>>> This should be:
>>>>>
>>>>> colocation mystateful-ms-loc inf: mystateful-ms:Master myprim:Started
>>>>
>>>> Interesting, so in this case it is not necessary?
>>>>
>>>> colocation fs_on_drbd inf: WebFS WebDataClone:Master
>>>>
>>>> (taken from Cluster_from_Scratch) but the other way around it is?
>>>
>>> Yes, the role of the second resource defaults to the role of the
>>> first. Ditto for order and actions. A bit confusing, I know.
>>>
>>> Thanks,
>>>
>>> Dejan
>
> I did it a bit differently this time and I observe the same anomaly.
>
> First I started a stateful clone:
>
> primitive s1 ocf:pacemaker:Stateful
> ms ms1 s1 meta master-max="1" master-node-max="1" clone-max="2" \
>     clone-node-max="1" notify="true"
>
> Then a primitive:
>
> primitive d1 ocf:pacemaker:Dummy
>
> Made sure the Master and the primitive were running on different hosts:
>
> location ld1 d1 10: xen-12
>
> and then I added the constraint:
>
> colocation c1 inf: ms1:Master d1:Started
>
> Master/Slave Set: ms1
>     Masters: [ xen-11 ]
>     Slaves: [ xen-12 ]
> d1 (ocf::pacemaker:Dummy): Started xen-12
>
> It seems a colocation constraint is not enough to promote a clone.
> Looks like a bug.
>
> # ptest -sL | grep s1
> clone_color: ms1 allocation score on xen-11: 0
> clone_color: ms1 allocation score on xen-12: 0
> clone_color: s1:0 allocation score on xen-11: 11
> clone_color: s1:0 allocation score on xen-12: 0
> clone_color: s1:1 allocation score on xen-11: 0
> clone_color: s1:1 allocation score on xen-12: 6
> native_color: s1:0 allocation score on xen-11: 11
> native_color: s1:0 allocation score on xen-12: 0
> native_color: s1:1 allocation score on xen-11: -100
> native_color: s1:1 allocation score on xen-12: 6
> s1:0 promotion score on xen-11: 20
> s1:1 promotion score on xen-12: 20
>
> Vadym

Could you attach the result of cibadmin -Ql when the cluster is in this state please?
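For anyone tripped up by the same role default, a hedged sketch (reusing Vadym's resource names from above) that spells out both roles explicitly, plus the order constraint that is commonly paired with such a colocation. Whether the colocation alone should also drive promotion is exactly the open question in this thread:

```
# Both roles written out, so the second resource does not silently
# default to the role of the first (per Dejan's explanation above).
colocation c1 inf: ms1:Master d1:Started

# Commonly paired ordering: only start d1 once an instance of ms1
# has been promoted somewhere.
order o1 inf: ms1:promote d1:start
```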
Re: [Pacemaker] using xml for rules
On Thu, Nov 11, 2010 at 2:10 PM, Pavlos Parissis wrote:
> I removed "score=2" from [quoted XML lost in the archive]

Have a look at the schema files:

http://hg.clusterlabs.org/pacemaker/stable-1.0/raw-file/tip/xml/pacemaker.rng.in
http://hg.clusterlabs.org/pacemaker/stable-1.0/raw-file/tip/xml/rule.rng.in

For starters: s/boolean_op/boolean-op/

> [lines 245-260 of the quoted XML were stripped by the archive; the
> surviving fragments show rules with hours="6-23" weekdays="1-5" and
> weekdays="6-7", and nvpairs setting name="resource-stickiness" to
> value="INFINITY" and value="0"]
>
> and now I only get the following, and from these I can't figure out
> where exactly my mistake in the rules is; coffee didn't help that much:
>
> Relax-NG validity error : Extra element rsc_defaults in interleave
> /var/run/crm/cib-invalid.9yBICJ:245: element rsc_defaults: Relax-NG validity error : Element configuration failed to validate content
> /var/run/crm/cib-invalid.9yBICJ:1: element cib: Relax-NG validity error : Element cib failed to validate content
> cibadmin[9627]: 2010/11/11_14:07:26 ERROR: main: Call failed: Update does not conform to the configured schema/DTD
> Call failed: Update does not conform to the configured schema/DTD
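Since the quoted XML did not survive the archive, here is a hedged reconstruction of the general shape the schema expects for a time-based rsc_defaults rule; the ids and nesting are illustrative guesses from the surviving fragments, not the poster's original. Note `boolean-op` with a hyphen:

```xml
<rsc_defaults>
  <meta_attributes id="work-hours-stickiness">
    <!-- Illustrative: pin resources in place during working hours -->
    <rule id="work-hours" score="INFINITY" boolean-op="and">
      <date_expression id="work-hours-expr" operation="date_spec">
        <date_spec id="work-hours-spec" hours="6-23" weekdays="1-5"/>
      </date_expression>
    </rule>
    <nvpair id="work-hours-value" name="resource-stickiness" value="INFINITY"/>
  </meta_attributes>
</rsc_defaults>
```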
Re: [Pacemaker] drbd-xen and fencing
Don't use init.d/drbd; use the ocf script that comes with the drbd packages.

On Thu, Nov 11, 2010 at 2:19 PM, Vadym Chepkov wrote:
> Hi,
>
> I posted a less elaborate version of this question to the drbd mailing
> list but, unfortunately, didn't get a reply; maybe the audience of this
> list has more experience.
>
> I am trying to make xen live migration work reliably, but haven't been
> successful so far. Here is the problem.
>
> In a cluster configuration I have two types of resources: file systems
> on drbd, with explicit drbd resource configuration, and Xen resources
> with implicit configuration, using the drbd-xen block device helper.
> For the former everything works great, but the latter doesn't work
> quite as well.
>
> In order for the helper script to work, the drbd module has to be
> loaded and the underlying resources up. So I have to start the
> init.d/drbd script. I can't make it an lsb cluster resource, because
> stopping it would be disastrous for the file system resources.
> Enabling it in the startup sequence breaks
> /usr/lib/drbd/crm-unfence-peer.sh, because the cluster stack is not
> completely up by the time the drbd script finishes, and there is no
> way to configure only the specific resources that need to be
> initialized.
>
> Also, I can't find a way to fence the Xen resource. I tried
> fence-peer "/usr/lib/drbd/crm-fence-peer.sh -i xen_vsvn", where
> xen_svn is the name of the Xen primitive, but it doesn't work, so
> there is a danger of starting the Xen VM on an out-of-date node. There
> is also no way of monitoring the underlying drbd resources.
>
> I thought of adding the underlying drbd resource explicitly to the
> cluster, but I can't figure out what the configuration would be for
> "this resource can be master on both nodes, but if it's just on one,
> that's fine too". allow-two-primaries has to be enabled for live
> migration, and at the time of migration the resources are primary on
> both nodes, but when migration finishes it's primary/slave again.
> But if I configure the drbd resource in the cluster with
> meta master-max="2" master-node-max="1", the cluster insists on having
> them both primary all the time.
>
> Hope I didn't bore you to death and there is an elegant solution for
> this conundrum :)
>
> Thank you,
> Vadym
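The advice at the top, sketched in crm syntax; the resource and DRBD names here are placeholders, not from Vadym's cluster, and the thread's unresolved point still stands: with master-max="2" the cluster keeps both sides primary all the time, not only during migration.

```
# Manage DRBD with the OCF agent shipped with the drbd packages
# instead of init.d/drbd. Names are illustrative placeholders.
primitive p_drbd_xen ocf:linbit:drbd \
    params drbd_resource="xen_vm" \
    op monitor interval="29s" role="Master" \
    op monitor interval="31s" role="Slave"
ms ms_drbd_xen p_drbd_xen \
    meta master-max="2" master-node-max="1" \
        clone-max="2" notify="true"
```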
Re: [Pacemaker] understanding scores
On Fri, Nov 12, 2010 at 7:54 PM, Pavlos Parissis wrote:
> Hi,
>
> I am trying to understand how the scores are calculated based on the
> output of ptest -sL, and I have a few questions. Below are my scores,
> with a line-number column; at the bottom you will find my configuration.
>
> So, let's start:
>
> 1  group_color: pbx_service_01 allocation score on node-01: 200
> 2  group_color: pbx_service_01 allocation score on node-03: 10
> 3  group_color: ip_01 allocation score on node-01: 1200
> 4  group_color: ip_01 allocation score on node-03: 10
>
> So far so good: ip_01 has 1000 due to resource-stickiness="1000", plus
> 200 from the group location constraint.
>
> 5  group_color: fs_01 allocation score on node-01: 1000
> 6  group_color: fs_01 allocation score on node-03: 0
> 7  group_color: pbx_01 allocation score on node-01: 1000
> 8  group_color: pbx_01 allocation score on node-03: 0
> 9  group_color: sshd_01 allocation score on node-01: 1000
> 10 group_color: sshd_01 allocation score on node-03: 0
> 11 group_color: mailAlert-01 allocation score on node-01: 1000
> 12 group_color: mailAlert-01 allocation score on node-03: 0
>
> Hold on now: why do all the above resources have 1000 on node-01 and
> not 1200? It's only applied to ip_01; the rest inherit it from there.
>
> 13 native_color: ip_01 allocation score on node-01: 5200
>
> 5 resources x 1000 from resource-stickiness="1000" plus, right? What
> is the difference between native and group?

Many things, can you be specific?

> 14 native_color: ip_01 allocation score on node-03: 10
> 15 clone_color: ms-drbd_01 allocation score on node-01: 4100
>
> why 4100?

probably the promotion score

> 16 clone_color: ms-drbd_01 allocation score on node-03: -100
>
> I guess this comes out of the colocation constraint

that's usually the reason

> 17 clone_color: drbd_01:0 allocation score on node-01: 11100
>
> I am lost now, so I will stop here :-)
>
> 18 clone_color: drbd_01:0 allocation score on node-03: 0
> 19 clone_color: drbd_01:1 allocation score on node-01: 100
> 20 clone_color: drbd_01:1 allocation score on node-03: 11000
> 21 native_color: drbd_01:0 allocation score on node-01: 11100
> 22 native_color: drbd_01:0 allocation score on node-03: 0
> 23 native_color: drbd_01:1 allocation score on node-01: -100
> 24 native_color: drbd_01:1 allocation score on node-03: 11000
> 25 drbd_01:0 promotion score on node-01: 18100
> 26 drbd_01:1 promotion score on node-03: -100
> 27 native_color: fs_01 allocation score on node-01: 15100
> 28 native_color: fs_01 allocation score on node-03: -100
> 29 native_color: pbx_01 allocation score on node-01: 3000
> 30 native_color: pbx_01 allocation score on node-03: -100
> 31 native_color: sshd_01 allocation score on node-01: 2000
> 32 native_color: sshd_01 allocation score on node-03: -100
> 33 native_color: mailAlert-01 allocation score on node-01: 1000
> 34 native_color: mailAlert-01 allocation score on node-03: -100
> 35 group_color: pbx_service_02 allocation score on node-02: 200
> 36 group_color: pbx_service_02 allocation score on node-03: 10
> 37 group_color: ip_02 allocation score on node-02: 1200
> 38 group_color: ip_02 allocation score on node-03: 10
> 39 group_color: fs_02 allocation score on node-02: 1000
> 40 group_color: fs_02 allocation score on node-03: 0
> 41 group_color: pbx_02 allocation score on node-02: 1000
> 42 group_color: pbx_02 allocation score on node-03: 0
> 43 group_color: sshd_02 allocation score on node-02: 1000
> 44 group_color: sshd_02 allocation score on node-03: 0
> 45 group_color: mailAlert-02 allocation score on node-02: 1000
> 46 group_color: mailAlert-02 allocation score on node-03: 0
> 47 native_color: ip_02 allocation score on node-02: 5200
> 48 native_color: ip_02 allocation score on node-03: 10
> 49 clone_color: ms-drbd_02 allocation score on node-02: 4100
> 50 clone_color: ms-drbd_02 allocation score on node-03: -100
> 51 clone_color: drbd_02:0 allocation score on node-02: 11100
> 52 clone_color: drbd_02:0 allocation score on node-03: 0
> 53 clone_color: drbd_02:1 allocation score on node-02: 100
> 54 clone_color: drbd_02:1 allocation score on node-03: 11000
> 55 native_color: drbd_02:0 allocation score on node-02: 11100
> 56 native_color: drbd_02:0 allocation score on node-03: 0
> 57 native_color: drbd_02:1 allocation score on node-02: -100
> 58 native_color: drbd_02:1 allocation score on node-03: 11000
> 59 drbd_02:0 promotion score on node-02: 18100
> 60 drbd_02:2 promotion score on none: 0
> 61 drbd_02:1 promotion score on node-03: -100
> 62 native_color: fs_02 allocation score on node-02: 15100
> 63 native_color: fs_02 allocation score on node-03: -100
> 64 native_color: pbx_02 allocation score on node-02: 3000
> 65 native_color: pbx_02 allocation score on node-03: -100
> 66 native_color: sshd_02 allocation score on node-02: 2000
> 67 native_color: sshd_02 allocation score on node-03: -
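The first puzzle in the message (why line 13 shows 5200 for ip_01) is the roll-up Pavlos guesses at: each of the five group members carries resource-stickiness="1000", and the group's 200-point location preference is added on top. A one-line sanity check of that arithmetic, assuming the five members listed in the scores above:

```shell
# 5 group members x 1000 stickiness, plus the 200 location preference
echo $(( 5 * 1000 + 200 ))   # prints 5200, matching line 13
```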
Re: [Pacemaker] making resource managed
On Fri, Nov 12, 2010 at 10:47 AM, Vadim S. Khondar wrote:
> On Wed, 2010-11-10 at 09:03 +0100, Andrew Beekhof wrote:
>> On Tue, Nov 9, 2010 at 2:14 PM, Vadim S. Khondar wrote:
>>> On Tue, 2010-11-09 at 09:49 +0100, Andrew Beekhof wrote:
>>>> being unmanaged is a side-effect of a) the resource failing to stop
>>>> and b) no fencing being configured
>>>> once you've fixed the error, run crm resource cleanup as misch suggested
>>>
>>> I understand that.
>>> However, for example, in a situation when the VPS fails to start (not to stop)
>>
>> It's failing to stop too:
>>
>> ca_stop_0 (node=ha-3, call=49, rc=1, status=complete): unknown error
>>
>> Possibly an ordering constraint. Otherwise, no idea.
>> Depends on how your resource agent works.
>
> No ordering constraints are explicitly listed in the configuration.

Right, I'm suggesting that might be the problem.

> :(
>
> Will try moving to v1.1.
>
> --
> Vadim S. Khondar
> v.khon...@o3.ua
Re: [Pacemaker] Pacemaker-1.1.4, when?
If someone can fix the patch so that the regression tests pass I'll apply it, but I won't have any time to work on it for at least a few weeks.

On Mon, Nov 15, 2010 at 2:59 AM, nozawat wrote:
> Hi Andrew and Nikola,
>
> I ran the regression tests myself, too, and got the same error.
>
> Regards,
> Tomo
>
> 2010/11/12 Nikola Ciprich
>>
>> (resent)
>> 1.1.4 with new glib2: tests pass smoothly
>> 1.1.4 + patch and older glib2 - all tests are segfaulting...
>>
>> ie:
>> Program terminated with signal 11, Segmentation fault.
>> #0  IA__g_str_hash (v=0x0) at gstring.c:95
>> 95        guint32 h = *p;
>> (gdb) bt
>> #0  IA__g_str_hash (v=0x0) at gstring.c:95
>> #1  0x7fe087bb6128 in g_hash_table_lookup_node (hash_table=0x1390ec0, key=0x0, value=0x13a3b00) at ghash.c:231
>> #2  IA__g_hash_table_insert (hash_table=0x1390ec0, key=0x0, value=0x13a3b00) at ghash.c:336
>> #3  0x7fe089367953 in convert_graph_action (resource=0x13a30a0, action=0x139cb80, status=0, rc=7) at unpack.c:308
>> #4  0x0040362a in exec_rsc_action (graph=0x1394fa0, action=0x139cb80) at crm_inject.c:359
>> #5  0x7fe089368642 in initiate_action (graph=0x1394fa0, action=0x139cb80) at graph.c:172
>> #6  0x7fe08936899d in fire_synapse (graph=0x1394fa0, synapse=0x139ba60) at graph.c:204
>> #7  0x7fe089368dbd in run_graph (graph=0x1394fa0) at graph.c:262
>> #8  0x0040428f in run_simulation (data_set=0x7fff712280a0) at crm_inject.c:540
>> #9  0x0040632a in main (argc=9, argv=0x7fff71228308) at crm_inject.c:1148
>>
>> On Fri, Nov 12, 2010 at 01:41:26PM +0100, Andrew Beekhof wrote:
>>> 2010/11/12 Nikola Ciprich:
>>>>> do the pe regression tests pass?
>>>> Hi Andrew,
>>>> how do I run PE tests? looking into the regression directory,
>>>> I'm a bit confused..
>>>
>>> either pengine/regression.sh from the top of the source directory, or
>>> from somewhere under /usr/share/pacemaker (check where the -devel
>>> package puts them)
>>>
>>>> n.
>>
>> --
>> Ing. Nikola CIPRICH
>> LinuxBox.cz, s.r.o.
>> 28. rijna 168, 709 01 Ostrava
>>
>> tel.: +420 596 603 142
>> fax: +420 596 621 273
>> mobil: +420 777 093 799
>> www.linuxbox.cz
>>
>> mobil servis: +420 737 238 656
>> email servis: ser...@linuxbox.cz
Re: [Pacemaker] Stonith Device APC AP7900
Rick Cone wrote:
> Perhaps I'll just use 1 outlet with the node name, with a power
> splitter to the 2 redundant power supplies to reduce the chances of
> problems.

IMO, if you're going to use a chassis with redundant power supplies, you're better off with a system that uses an ALOM/DRAC/iLO, or equivalent, that sits between the redundant power bus and the main boards, and then plug your redundant power supplies into two different circuits. Otherwise, for what's supposed to be an HA system, you're introducing unnecessary single points of failure.

I know the arguments about using an external device for fencing, but in the redundant power case, if you can't reach your ALOM/whatever you're already looking at multiple failures. But it's your datacenter.

FWIW, I can confirm that the AP7900 works just fine and should be added to the list of supported devices.

Devin
Re: [Pacemaker] Pacemaker-1.1.4, when?
Hi Andrew and Nikola,

I ran the regression tests myself, too, and got the same error.

Regards,
Tomo

2010/11/12 Nikola Ciprich
> (resent)
> 1.1.4 with new glib2: tests pass smoothly
> 1.1.4 + patch and older glib2 - all tests are segfaulting...
>
> ie:
> Program terminated with signal 11, Segmentation fault.
> #0  IA__g_str_hash (v=0x0) at gstring.c:95
> 95        guint32 h = *p;
> (gdb) bt
> #0  IA__g_str_hash (v=0x0) at gstring.c:95
> #1  0x7fe087bb6128 in g_hash_table_lookup_node (hash_table=0x1390ec0, key=0x0, value=0x13a3b00) at ghash.c:231
> #2  IA__g_hash_table_insert (hash_table=0x1390ec0, key=0x0, value=0x13a3b00) at ghash.c:336
> #3  0x7fe089367953 in convert_graph_action (resource=0x13a30a0, action=0x139cb80, status=0, rc=7) at unpack.c:308
> #4  0x0040362a in exec_rsc_action (graph=0x1394fa0, action=0x139cb80) at crm_inject.c:359
> #5  0x7fe089368642 in initiate_action (graph=0x1394fa0, action=0x139cb80) at graph.c:172
> #6  0x7fe08936899d in fire_synapse (graph=0x1394fa0, synapse=0x139ba60) at graph.c:204
> #7  0x7fe089368dbd in run_graph (graph=0x1394fa0) at graph.c:262
> #8  0x0040428f in run_simulation (data_set=0x7fff712280a0) at crm_inject.c:540
> #9  0x0040632a in main (argc=9, argv=0x7fff71228308) at crm_inject.c:1148
>
> On Fri, Nov 12, 2010 at 01:41:26PM +0100, Andrew Beekhof wrote:
>> 2010/11/12 Nikola Ciprich:
>>>> do the pe regression tests pass?
>>> Hi Andrew,
>>> how do I run PE tests? looking into the regression directory,
>>> I'm a bit confused..
>>
>> either pengine/regression.sh from the top of the source directory, or
>> from somewhere under /usr/share/pacemaker (check where the -devel
>> package puts them)
>>
>>> n.
>
> --
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28. rijna 168, 709 01 Ostrava
>
> tel.: +420 596 603 142
> fax: +420 596 621 273
> mobil: +420 777 093 799
> www.linuxbox.cz
>
> mobil servis: +420 737 238 656
> email servis: ser...@linuxbox.cz

[Attachment: regression.log.gz - GNU Zip compressed data]
Re: [Pacemaker] start filesystem like this is right?
Message: 6
Date: Sun, 14 Nov 2010 13:17:23 +0800 (CST)
From: jiaju liu
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] start filesystem like this is right?
Message-ID: <476924.89659...@web15703.mail.cnb.yahoo.com>

>>> start resource steps
>>>
>>> step(1)
>>> crm configure primitive vol_mpath0 ocf:heartbeat:Filesystem meta
>>> target-role=stopped params device=/dev/mapper/mpath0
>>> directory=/mnt/mapper/mpath0 fstype='lustre' op start timeout=300s
>>> op stop timeout=120s op monitor timeout=120s interval=60s op notify
>>> timeout=60s
>>>
>>> step(2)
>>> crm resource reprobe
>>>
>>> step(3)
>>> crm configure location vol_mpath0_location_manage datavol_mpath0 rule
>>> -inf: not_defined pingd_manage or pingd_manage lte 0
>>> crm configure location vol_mpath0_location_data datavol_mpath0 rule
>>> -inf: not_defined pingd_data or pingd_data lte 0

>> why do you have 2 location constraints? where are the definitions for
>> pingd_data and pingd_manage?

> because we have two networks: the manage network is ethernet, the data
> network is ib.
>
> the definitions of pingd:
>
> crm configure primitive pingd_data ocf:pacemaker:ping meta
> target-role=stopped params name=pingd_data op start timeout=100s op
> stop timeout=100s op monitor interval=90s timeout=100s
> crm_resource -p host_list -r pingd_data -v IP_list
> crm configure clone pingd_data_net pingd_data meta
> globally-unique=false target-role=stopped
> crm resource start pingd_data
>
> crm configure primitive pingd_manage ocf:pacemaker:ping meta
> target-role=stopped params name=pingd_manage op start timeout=90s op
> stop timeout=100s op monitor interval=90s timeout=100s
> crm_resource -p host_list -r pingd_manage -v IP_list
> crm configure clone pingd_manage_net pingd_manage meta
> globally-unique=false
> crm resource start pingd_manage

>>> step(4)
>>> crm resource start vol_mpath0
>>>
>>> delete resource steps
>>>
>>> step(1)
>>> crm resource stop vol_mpath0
>>>
>>> step(2)
>>> crm resource cleanup vol_mpath0
>>>
>>> step(3)
>>> crm configure delete vol_mpath0
>>>
>>> The above are my steps; are they right? I repeated these steps
>>> several times. At the beginning it worked well; after 5 or 6 times
>>> the resource could not start. Using crm resource start vol_mpath0
>>> again was no use.

>> Could be that your ping nodes are down?

> the node is ok. And do you know the way to check pingd? I also found
> something wrong with the cluster. I checked the log; I think the node
> could not get the situation from crm. I set the parameter
> no-quorum-policy=ignore. Is this the reason?
>
> pacemaker packages are
>     pacemaker-1.0.8-6.1.el5
>     pacemaker-libs-devel-1.0.8-6.1.el5
>     pacemaker-libs-1.0.8-6.1.el5
>
> openais packages are
>     openaislib-devel-1.1.0-1.el5
>     openais-1.1.0-1.el5
>     openaislib-1.1.0-1.el5
>
> corosync packages are
>     corosync-1.2.2-1.1.el5
>     corosynclib-devel-1.2.2-1.1.el5
>     corosynclib-1.2.2-1.1.el5
>
> who knows why? thanks a lot
Re: [Pacemaker] 2 node failover cluster + MySQL Master-Master replica setup
Ruzsinszky Attila wrote:
> That's what I said - I didn't see it either. But if you check the
> current RA:
>
> What do you think about this:
> http://www.lathiat.net/files/MySQL%20-%20DRBD%20&%20Pacemaker.pdf
> I can't see if this is a real M-M or M-S setup.

It's a Master-Slave setup. This is the PDF I mentioned.

Offtopic: The setup in that PDF is pretty basic. I think the person that wrote the document and myself share a lot of common views related to the configuration; however, I would advise using "drbdadm -- --clear-bitmap new-current-uuid mysql" instead of "drbdadm -- --overwrite-data-of-peer primary mysql", as the latter will start a synchronization process, which is pointless in this case as the DRBD block device is empty, so it will be synchronizing empty space, while the former synchronizes both servers' partitions "instantly" (starting from version 8.3).

Also, I'm impressed to see naming like "ms ms_drbd_mysql drbd_mysql", "colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master", "order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start" in official documents, as this is the naming I use as well when defining primitives, colocation and ordering constraints. I know it's not much, and it really doesn't matter how you name the resources and constraints as long as they are syntactically correct, but I just couldn't get used to the resource naming used in the DRBD documentation. Sorry guys, you do awesome work, but 'primitive p_drbd_r0 ocf:linbit:drbd params drbd_resource="r0"', "colocation c_drbd_r0-U_on_drbd_r0 inf: ms_drbd_r0-U ms_drbd_r0:Master" and other such naming confused the life out of me :)

Sorry for the Offtopic.

Regards,
Dan

> TIA,
> Ruzsi

--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania
Re: [Pacemaker] 2 node failover cluster + MySQL Master-Master replica setup
> Have you even read that PDF, it documents just that, a MS setup with MySQL

I've read many, many PDFs, HTMLs, READMEs etc. without finding a real working config. Anyway, which PDF do you mean?

>> Why not M-M?
> You have an obsession, you should see a doctor about that.

It is not my theory. I got this advice on the #mysql-ndb channel. I think I've already written that.

TIA,
Ruzsi
Re: [Pacemaker] 2 node failover cluster + MySQL Master-Master replica setup
>> So I guess there are 2 ways for a MS setup with MySQL.
> OK. And where is a cookbook for setting up M-S config?

Have you even read that PDF? It documents just that, a MS setup with MySQL ...

> Why not M-M?

You have an obsession, you should see a doctor about that.

-- 
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania
Re: [Pacemaker] 2 node failover cluster + MySQL Master-Master replica setup
> So I guess there are 2 ways for a MS setup with MySQL.

OK. And where is a cookbook for setting up an M-S config? Why not M-M?

I tried to install MySQL Workbench for SLES11 SP1. There are some broken dependencies. :-( Despite that, Workbench started (it complained about a missing SSH tunnel). I wanted to configure the RELOAD and SUPER privileges. I was surprised that I'm not able to do that with Workbench! :-( There are just some predefined roles. So I have to learn mysql + GRANT commands.

Pacemaker (mysql) wrote some lines in the messages file about missing privileges, like RELOAD and SUPER. Is that the only problem? I don't think so ... I'll check my DMC for mysql parameters ...

TIA,
Ruzsi
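Granting the two privileges the RA complained about could look like this; a sketch only, where 'cluster'@'localhost' and the password are hypothetical placeholders, not an account mentioned in this thread:

```
-- Hypothetical account for the Pacemaker mysql RA; RELOAD and
-- SUPER are the privileges reported missing in the messages file.
GRANT RELOAD, SUPER ON *.* TO 'cluster'@'localhost' IDENTIFIED BY 'secret';
FLUSH PRIVILEGES;
```

Whether further privileges are needed depends on the RA version in use; check its metadata and logs after granting these two.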
Re: [Pacemaker] 2 node failover cluster + MySQL Master-Master replica setup
Hi,

> I am pretty sure Linbit announced a mysql RA with replication capabilities.
> Haven't seen documentation though.
>
> # crm ra meta mysql|grep ^replica
> replication_user (string): MySQL replication user
> replication_passwd (string): MySQL replication user password
> replication_port (string, [3306]): MySQL replication port

You're probably using a newer version of resource-agents. I have resource-agents-1.0.3-2.el5 and:

# crm ra meta mysql|grep ^replica
# echo $?
# 1

I've found the patches for the MySQL RA though:
http://hydra.azilian.net/gitweb/?p=linux-ha/.git;a=summary

And the original thread:
http://www.mail-archive.com/linux...@lists.linux-ha.org/msg14992.html

The patches apply to a Master-Slave replication setup; I haven't tested them though.

> So now is DRBD+MySQL almost the only possibility?

So I guess there are 2 ways for a MS setup with MySQL.

Regards,
Dan

-- 
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania
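If those patches do apply, the new parameters would presumably be set on the primitive like any other RA parameter and the resource run as a master/slave set. A purely hypothetical sketch, untested, with made-up resource names and credentials; only the parameter names come from the `crm ra meta mysql` output quoted above:

```
primitive p_mysql ocf:heartbeat:mysql \
    params replication_user="repl" replication_passwd="secret" \
        replication_port="3306" \
    op monitor interval="30s"
ms ms_mysql p_mysql \
    meta master-max="1" clone-max="2" notify="true"
```

This follows the ms/colocation/order naming style discussed earlier in the thread.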
Re: [Pacemaker] (no subject)
On 13 November 2010 23:24, Bob Schatz wrote:
> Lunch this week?

Yes, why not. Where and at what time? Shall we go to the Pacemaker cafeteria like last time? They are always available for us :-)

Cheers,
Pavlos