Re: [Pacemaker] Balancing of clone resources (globally-unique=true)

2010-11-14 Thread Andrew Beekhof
On Fri, Nov 12, 2010 at 7:41 AM, Chris Picton  wrote:
> I have attached the output as requested

Normally it would get balanced, but it's being pushed to 01 because
there are so many resources on 02

   sort_node_weight: slb-test-02.ecntelecoms.za.net (12) > slb-test-01.ecntelecoms.za.net (2) : resources

So the cluster is trying to balance out the resources, just not at the
level you were expecting.
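
A minimal sketch of the tie-break that the sort_node_weight line points at. It assumes nodes are compared by allocation score first and by the number of resources already on them second; that ordering is an illustration only, not Pacemaker's actual allocator, and the function names are hypothetical. The counts 12 and 2 are taken from the log line, and all scores are 0 as in the ptest output quoted below.

```python
def pick_node(scores, resource_counts):
    """Highest score wins; on a tie, the node with fewer resources wins."""
    return min(scores, key=lambda node: (-scores[node], resource_counts[node], node))


def place_instances(instances, scores, resource_counts):
    """Place each clone instance in turn, counting it against its node."""
    counts = dict(resource_counts)  # copy so the caller's dict is untouched
    placement = {}
    for instance in instances:
        node = pick_node(scores, counts)
        placement[instance] = node
        counts[node] += 1  # the placed instance now loads that node
    return placement


scores = {"slb-test-01": 0, "slb-test-02": 0}
resource_counts = {"slb-test-01": 2, "slb-test-02": 12}
placement = place_instances(["clusterip-9:0", "clusterip-9:1"], scores, resource_counts)
print(placement)
```

Under these assumptions both instances land on slb-test-01, because even after the first placement it still holds far fewer resources than slb-test-02. With equal counts on both nodes the same sketch splits the instances one per node.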

> On Thu, 11 Nov 2010 11:21:51 +0100, Andrew Beekhof wrote:
>>>> what version is this?
>>>
>>> This is 1.0.9
>>
>> Odd.  I wouldn't have expected this behavior. Can you attach the
>> output from cibadmin -Ql please?
>>
>>>> On Tue, Nov 9, 2010 at 5:51 PM, Chris Picton wrote:
> From a previous thread (crm_resource - migrating/halt a cloned
> resource)
>
> Andrew Beekhof wrote:
>> bottom line, you don't get to choose where specific clone instances
>> get placed.
>
> In my case, I have a clone:
> primitive clusterip-9 ocf:heartbeat:IPaddr2 \
>        params ip="192.168.0.9" cidr_netmask="24" \
>        clusterip_hash="sourceip" nic="bondE" \
>        op monitor interval="30s" \
>        meta resource-stickiness="0"
>
> clone clusterip-9-clone clusterip-9 \
>        meta globally-unique="true" clone-max="2" \
>        clone-node-max="2" resource_stickiness="0"
>
> When I start the clone, both instances start on the same node:
>
> Clone Set: clusterip-9-clone (unique)
>     clusterip-9:0      (ocf::heartbeat:IPaddr2):       Started
> slb-test-01.ecntelecoms.za.net
>     clusterip-9:1      (ocf::heartbeat:IPaddr2):       Started
> slb-test-01.ecntelecoms.za.net
>
> The second node has a colocated set of standalone IP addresses
> running, so I assume that pacemaker is pushing both clusterip
> clones to the first node to balance resources.
>
> My scores look like (0 for everything to do with this resource)
> clone_color: clusterip-9-clone allocation score on
> slb-test-01.ecntelecoms.za.net: 0
> clone_color: clusterip-9-clone allocation score on
> slb-test-02.ecntelecoms.za.net: 0
> clone_color: clusterip-9:0 allocation score on
> slb-test-01.ecntelecoms.za.net: 0
> clone_color: clusterip-9:0 allocation score on
> slb-test-02.ecntelecoms.za.net: 0
> clone_color: clusterip-9:1 allocation score on
> slb-test-01.ecntelecoms.za.net: 0
> clone_color: clusterip-9:1 allocation score on
> slb-test-02.ecntelecoms.za.net: 0
> native_color: clusterip-9:0 allocation score on
> slb-test-01.ecntelecoms.za.net: 0
> native_color: clusterip-9:0 allocation score on
> slb-test-02.ecntelecoms.za.net: 0
> native_color: clusterip-9:1 allocation score on
> slb-test-01.ecntelecoms.za.net: 0
> native_color: clusterip-9:1 allocation score on
> slb-test-02.ecntelecoms.za.net: 0
>
>
>
> Is there a way to ask pacemaker to try to split the clones up over
> the available nodes if possible?
>
> Regards
>
> Chris
>
>
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>



Re: [Pacemaker] colocation that doesn't

2010-11-14 Thread Andrew Beekhof
On Fri, Nov 5, 2010 at 4:07 AM, Vadym Chepkov  wrote:
>
> On Nov 4, 2010, at 12:53 PM, Alan Jones wrote:
>
>> If I understand you correctly, the role of the second resource in the
>> colocation command was defaulting to that of the first, "Master", which
>> is not defined or is untested for non-ms resources.
>> Unfortunately, after changing that line to:
>>
>> colocation mystateful-ms-loc inf: mystateful-ms:Master myprim:Started
>>
>> ...it still doesn't work:
>>
>> myprim  (ocf::pacemaker:DummySlow):     Started node6.acme.com
>> Master/Slave Set: mystateful-ms
>>     Masters: [ node5.acme.com ]
>>     Slaves: [ node6.acme.com ]
>>
>> And after:
>> location myprim-loc myprim -inf: node5.acme.com
>>
>> myprim  (ocf::pacemaker:DummySlow):     Started node6.acme.com
>> Master/Slave Set: mystateful-ms
>>     Masters: [ node6.acme.com ]
>>     Slaves: [ node5.acme.com ]
>>
>> What I would like to do is enable logging for the code that calculates
>> the weights, etc.
>> It is obvious to me that the weights are calculated differently for
>> mystateful-ms based on the weights used in myprim.
>> Can you enable more verbose logging online or do you have to recompile?
>> My version is 1.0.9-89bd754939df5150de7cd76835f98fe90851b677 which is
>> different from Vadym's.
>> BTW: Is there another release planned for the stable branch?  1.0.9.1
>> is now 4 months old.
>> I understand that I could take the top of tree, but I would like to
>> believe that others are running the same version. ;)
>> Thank you!
>> Alan
>>
>> On Thu, Nov 4, 2010 at 8:22 AM, Dejan Muhamedagic wrote:
>>> Hi,
>>>
>>> On Thu, Nov 04, 2010 at 06:51:59AM -0400, Vadym Chepkov wrote:
>>>> On Thu, Nov 4, 2010 at 5:37 AM, Dejan Muhamedagic wrote:
>>>>> This should be:
>>>>>
>>>>> colocation mystateful-ms-loc inf: mystateful-ms:Master myprim:Started
>>>>
>>>> Interesting, so in this case it is not necessary?
>>>>
>>>> colocation fs_on_drbd inf: WebFS WebDataClone:Master
>>>> (taken from Cluster_from_Scratch)
>>>>
>>>> but the other way around it is?
>>>
>>> Yes, the role of the second resource defaults to the role of the
>>> first. Ditto for order and actions. A bit confusing, I know.
>>>
>>> Thanks,
>>>
>>> Dejan
>>>
>
>
> I did it a bit differently this time and I observe the same anomaly.
>
> First I started a stateful clone:
>
> primitive s1 ocf:pacemaker:Stateful
> ms ms1 s1 meta master-max="1" master-node-max="1" clone-max="2" \
>        clone-node-max="1" notify="true"
>
> Then a primitive:
>
> primitive d1 ocf:pacemaker:Dummy
>
> I made sure the Master and the primitive were running on different hosts:
> location ld1 d1 10: xen-12
>
> and then I added the constraint:
> colocation c1 inf: ms1:Master d1:Started
>
>  Master/Slave Set: ms1
>     Masters: [ xen-11 ]
>     Slaves: [ xen-12 ]
>  d1     (ocf::pacemaker:Dummy): Started xen-12
>
>
> It seems a colocation constraint is not enough to promote a clone. Looks
> like a bug.
>
> # ptest -sL|grep s1
> clone_color: ms1 allocation score on xen-11: 0
> clone_color: ms1 allocation score on xen-12: 0
> clone_color: s1:0 allocation score on xen-11: 11
> clone_color: s1:0 allocation score on xen-12: 0
> clone_color: s1:1 allocation score on xen-11: 0
> clone_color: s1:1 allocation score on xen-12: 6
> native_color: s1:0 allocation score on xen-11: 11
> native_color: s1:0 allocation score on xen-12: 0
> native_color: s1:1 allocation score on xen-11: -100
> native_color: s1:1 allocation score on xen-12: 6
> s1:0 promotion score on xen-11: 20
> s1:1 promotion score on xen-12: 20
>
> Vadym

Could you attach the result of cibadmin -Ql when the cluster is in
this state please?
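
The role-defaulting rule Dejan describes earlier in this thread (the role of the second resource defaults to the role of the first) can be sketched as follows. This is only an illustration of that rule as stated in the thread: parse_colocation is a hypothetical helper, not part of the crm shell, and it handles just the simple two-resource form used in these messages.

```python
def parse_colocation(line):
    """Split 'colocation <id> <score>: <rsc>[:role] <rsc>[:role]' into parts."""
    tokens = line.split()
    if len(tokens) != 5 or tokens[0] != "colocation" or not tokens[2].endswith(":"):
        raise ValueError("unsupported colocation form: %r" % line)
    _, constraint_id, score, first, second = tokens
    rsc1, _, role1 = first.partition(":")
    rsc2, _, role2 = second.partition(":")
    # The rule from the thread: a missing second role inherits the first role.
    effective_role2 = role2 or role1
    return {
        "id": constraint_id,
        "score": score.rstrip(":"),
        "first": (rsc1, role1 or None),
        "second": (rsc2, effective_role2 or None),
    }


implicit = parse_colocation("colocation mystateful-ms-loc inf: mystateful-ms:Master myprim")
explicit = parse_colocation("colocation c1 inf: ms1:Master d1:Started")
tutorial = parse_colocation("colocation fs_on_drbd inf: WebFS WebDataClone:Master")
print(implicit["second"], explicit["second"], tutorial["first"])
```

In the first constraint myprim silently ends up with the Master role, which is why spelling out myprim:Started was suggested. The Cluster_from_Scratch constraint needs no extra role because its first resource carries none to inherit.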



Re: [Pacemaker] using xml for rules

2010-11-14 Thread Andrew Beekhof
On Thu, Nov 11, 2010 at 2:10 PM, Pavlos Parissis
 wrote:
> I removed "score=2" from
> 

Have a look at the schema files:
   http://hg.clusterlabs.org/pacemaker/stable-1.0/raw-file/tip/xml/pacemaker.rng.in
   http://hg.clusterlabs.org/pacemaker/stable-1.0/raw-file/tip/xml/rule.rng.in

For starters: s/boolean_op/boolean-op/

>
>  245     
>  246       
>  247         
>  248           
>  249              hours="6-23" weekdays="1-5"/>
>  250           
>  251           
>  252              weekdays="6-7"/>
>  253           
>  254         
>  255          name="resource-stickiness" value="INFINITY"/>
>  256       
>  257       
>  258          name="resource-stickiness" value="0"/>
>  259       
>  260     
>
> and now I only get the errors below. From these I can't figure out where
> exactly my mistake in the rules is; coffee didn't help that much.
>
> Relax-NG validity error : Extra element rsc_defaults in interleave
> /var/run/crm/cib-invalid.9yBICJ:245: element rsc_defaults: Relax-NG
> validity error : Element configuration failed to validate content
> /var/run/crm/cib-invalid.9yBICJ:1: element cib: Relax-NG validity
> error : Element cib failed to validate content
> cibadmin[9627]: 2010/11/11_14:07:26 ERROR: main: Call failed: Update
> does not conform to the configured schema/DTD
> Call failed: Update does not conform to the configured schema/DTD
>



Re: [Pacemaker] drbd-xen and fencing

2010-11-14 Thread Andrew Beekhof
Don't use init.d/drbd, use the ocf script that comes with the drbd packages

On Thu, Nov 11, 2010 at 2:19 PM, Vadym Chepkov  wrote:
> Hi,
>
> I posted a less elaborate version of this question to the drbd mailing list
> but, unfortunately, didn't get a reply;
> maybe the audience of this list has more experience.
> I am trying to make Xen live migration work reliably, but haven't been
> successful so far.
> Here is the problem.
> In a cluster configuration I have two types of resources: file systems on
> drbd, with explicit drbd resource configuration, and
> Xen resources with implicit configuration, using the drbd-xen block device
> helper. The former works great, but the latter doesn't work quite as well.
> In order for the helper script to work, the drbd module has to be loaded and
> the underlying resources up. So I have to start the init.d/drbd script.
> I can't make it an lsb cluster resource, because a stop would be disastrous
> for the file system resources. Enabling it in the startup sequence breaks
> /usr/lib/drbd/crm-unfence-peer.sh, because the cluster stack is not completely
> up by the time the drbd script finishes, and there is no way to configure only
> the specific resources that need to be initialized.
> Also, I can't find a way to fence a Xen resource. I tried fence-peer
> "/usr/lib/drbd/crm-fence-peer.sh -i xen_vsvn",
> where xen_svn is the name of the Xen primitive, but it doesn't work,
> so there is a danger of starting a Xen VM on an out-of-date node. There
> is also no way of monitoring the underlying drbd resources.
> I thought of adding the underlying drbd resource explicitly to the cluster,
> but I can't figure out what the configuration would be
> for "this resource can be master on both nodes, but if just on one, it's
> fine too".
> allow-two-primaries has to be enabled for live migration, and during the
> migration the resources are primary on both nodes, but when migration
> finishes, it's primary/slave again. But if I configure the drbd resource in
> the cluster with meta master-max="2" master-node-max="1",
> the cluster insists on having them both primary all the time.
> I hope I didn't bore you to death and that there is an elegant solution for
> this conundrum :)
> Thank you,
> Vadym
>
>
>



Re: [Pacemaker] understanding scores

2010-11-14 Thread Andrew Beekhof
On Fri, Nov 12, 2010 at 7:54 PM, Pavlos Parissis
 wrote:
> Hi,
>
> I am trying to understand how the scores are calculated based on the
> output of ptest -sL and I have a few questions.
> Below are my scores with a line number column, and at the bottom you will
> find my configuration.
>
> So, let's start
>
> 1 group_color: pbx_service_01 allocation score on node-01: 200
>  2 group_color: pbx_service_01 allocation score on node-03: 10
>  3 group_color: ip_01 allocation score on node-01: 1200
>  4 group_color: ip_01 allocation score on node-03: 10
> so far so good, ip_01 has 1000 due to resource-stickiness="1000" plus
> 200 from the group location constraint
>
>  5 group_color: fs_01 allocation score on node-01: 1000
>  6 group_color: fs_01 allocation score on node-03: 0
>  7 group_color: pbx_01 allocation score on node-01: 1000
>  8 group_color: pbx_01 allocation score on node-03: 0
>  9 group_color: sshd_01 allocation score on node-01: 1000
>  10 group_color: sshd_01 allocation score on node-03: 0
>  11 group_color: mailAlert-01 allocation score on node-01: 1000
>  12 group_color: mailAlert-01 allocation score on node-03: 0
> hold on now, why do all the above resources have 1000 on node-01 and not
> 1200 as ip_01?

It's only applied to ip_01; the rest inherit it from there.

>
>  13 native_color: ip_01 allocation score on node-01: 5200
> 5 resources x 1000 from resource-stickiness="1000" plus the 200, right? What
> is the difference between native and group?

Many things, can you be specific?

>
>  14 native_color: ip_01 allocation score on node-03: 10
>  15 clone_color: ms-drbd_01 allocation score on node-01: 4100
> why 4100?

probably the promotion score

>
>  16 clone_color: ms-drbd_01 allocation score on node-03: -100
> I guess this comes out from the colocation constraint

that's usually the reason

>
>  17 clone_color: drbd_01:0 allocation score on node-01: 11100
> I am lost now so I will stop here :-)
>
>  18 clone_color: drbd_01:0 allocation score on node-03: 0
>  19 clone_color: drbd_01:1 allocation score on node-01: 100
>  20 clone_color: drbd_01:1 allocation score on node-03: 11000
>  21 native_color: drbd_01:0 allocation score on node-01: 11100
>  22 native_color: drbd_01:0 allocation score on node-03: 0
>  23 native_color: drbd_01:1 allocation score on node-01: -100
>  24 native_color: drbd_01:1 allocation score on node-03: 11000
>  25 drbd_01:0 promotion score on node-01: 18100
>  26 drbd_01:1 promotion score on node-03: -100
>  27 native_color: fs_01 allocation score on node-01: 15100
>  28 native_color: fs_01 allocation score on node-03: -100
>  29 native_color: pbx_01 allocation score on node-01: 3000
>  30 native_color: pbx_01 allocation score on node-03: -100
>  31 native_color: sshd_01 allocation score on node-01: 2000
>  32 native_color: sshd_01 allocation score on node-03: -100
>  33 native_color: mailAlert-01 allocation score on node-01: 1000
>  34 native_color: mailAlert-01 allocation score on node-03: -100
>  35 group_color: pbx_service_02 allocation score on node-02: 200
>  36 group_color: pbx_service_02 allocation score on node-03: 10
>  37 group_color: ip_02 allocation score on node-02: 1200
>  38 group_color: ip_02 allocation score on node-03: 10
>  39 group_color: fs_02 allocation score on node-02: 1000
>  40 group_color: fs_02 allocation score on node-03: 0
>  41 group_color: pbx_02 allocation score on node-02: 1000
>  42 group_color: pbx_02 allocation score on node-03: 0
>  43 group_color: sshd_02 allocation score on node-02: 1000
>  44 group_color: sshd_02 allocation score on node-03: 0
>  45 group_color: mailAlert-02 allocation score on node-02: 1000
>  46 group_color: mailAlert-02 allocation score on node-03: 0
>  47 native_color: ip_02 allocation score on node-02: 5200
>  48 native_color: ip_02 allocation score on node-03: 10
>  49 clone_color: ms-drbd_02 allocation score on node-02: 4100
>  50 clone_color: ms-drbd_02 allocation score on node-03: -100
>  51 clone_color: drbd_02:0 allocation score on node-02: 11100
>  52 clone_color: drbd_02:0 allocation score on node-03: 0
>  53 clone_color: drbd_02:1 allocation score on node-02: 100
>  54 clone_color: drbd_02:1 allocation score on node-03: 11000
>  55 native_color: drbd_02:0 allocation score on node-02: 11100
>  56 native_color: drbd_02:0 allocation score on node-03: 0
>  57 native_color: drbd_02:1 allocation score on node-02: -100
>  58 native_color: drbd_02:1 allocation score on node-03: 11000
>  59 drbd_02:0 promotion score on node-02: 18100
>  60 drbd_02:2 promotion score on none: 0
>  61 drbd_02:1 promotion score on node-03: -100
>  62 native_color: fs_02 allocation score on node-02: 15100
>  63 native_color: fs_02 allocation score on node-03: -100
>  64 native_color: pbx_02 allocation score on node-02: 3000
>  65 native_color: pbx_02 allocation score on node-03: -100
>  66 native_color: sshd_02 allocation score on node-02: 2000
>  67 native_color: sshd_02 allocation score on node-03: -
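
One way to read the node-01 numbers quoted in this message, offered as an observation about the arithmetic rather than a description of the policy engine's code: each group member appears to carry its own stickiness plus that of every member after it, and ip_01 additionally carries the 200 from the location constraint. The helper name is hypothetical. The 11100 added for fs_01 equals the drbd_01:0 score on node-01 (line 17 of the listing); that it arrives through a colocation with the DRBD master is an assumption, since the configuration is cut off in this archive.

```python
STICKINESS = 1000      # resource-stickiness quoted in the post
LOCATION_SCORE = 200   # group location constraint score quoted in the post
DRBD_SCORE = 11100     # drbd_01:0 on node-01, line 17 of the quoted listing

group = ["ip_01", "fs_01", "pbx_01", "sshd_01", "mailAlert-01"]


def rolled_up_stickiness(members, stickiness):
    """Each member gets its own stickiness plus that of all later members."""
    total = len(members)
    return {name: stickiness * (total - index) for index, name in enumerate(members)}


rolled = rolled_up_stickiness(group, STICKINESS)
ip_native = rolled["ip_01"] + LOCATION_SCORE   # compare with listing line 13
fs_native = rolled["fs_01"] + DRBD_SCORE       # compare with listing line 27
print(ip_native, fs_native, rolled["pbx_01"], rolled["sshd_01"], rolled["mailAlert-01"])
```

The values this produces can be checked against the native_color lines 13, 27, 29, 31 and 33 of the listing; the node-02 group follows the same pattern.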

Re: [Pacemaker] making resource managed

2010-11-14 Thread Andrew Beekhof
On Fri, Nov 12, 2010 at 10:47 AM, Vadim S. Khondar  wrote:
> On Wed, 2010-11-10 at 09:03 +0100, Andrew Beekhof wrote:
>> On Tue, Nov 9, 2010 at 2:14 PM, Vadim S. Khondar  wrote:
>> > On Tue, 2010-11-09 at 09:49 +0100, Andrew Beekhof wrote:
>> >> being unmanaged is a side-effect of a) the resource failing to stop
>> >> and b) no fencing being configured
>> >> once you've fixed the error, run crm resource cleanup as misch suggested
>> >>
>> >
>> > I understand that.
>> > However, for example, in a situation when a VPS fails to start (not to stop)
>>
>> It's failing to stop too:
>>
>> ca_stop_0 (node=ha-3, call=49, rc=1, status=complete): unknown error
>>    
>
>>
>> Possibly an ordering constraint.  Otherwise, no idea.
>> Depends on how your resource agent works.
>
> No ordering constraints explicitly listed in configuration.

Right, I'm suggesting that might be the problem.

> :(
>
> Will try with moving to v1.1.
>
> --
> Vadim S. Khondar
>
> v.khon...@o3.ua
>
>



Re: [Pacemaker] Pacemaker-1.1.4, when?

2010-11-14 Thread Andrew Beekhof
If someone can fix the patch so that the regression tests pass I'll
apply it, but I won't have any time to work on it for at least a few
weeks.

On Mon, Nov 15, 2010 at 2:59 AM, nozawat  wrote:
> Hi Andrew and Nikola,
>
>   I ran the regression tests myself as well, and got the same error.
>
> Regards,
> Tomo
>
>
> 2010/11/12 Nikola Ciprich 
>>
>> (resent)
>> 1.1.4 with new glib2: tests pass smoothly
>> 1.1.4 + patch and older glib2 - all tests are segfaulting...
>>
>> ie:
>> Program terminated with signal 11, Segmentation fault.
>> #0  IA__g_str_hash (v=0x0) at gstring.c:95
>> 95    guint32 h = *p;
>> (gdb) bt
>> #0  IA__g_str_hash (v=0x0) at gstring.c:95
>> #1  0x7fe087bb6128 in g_hash_table_lookup_node (hash_table=0x1390ec0,
>> key=0x0, value=0x13a3b00) at ghash.c:231
>> #2  IA__g_hash_table_insert (hash_table=0x1390ec0, key=0x0,
>> value=0x13a3b00) at ghash.c:336
>> #3  0x7fe089367953 in convert_graph_action (resource=0x13a30a0,
>> action=0x139cb80, status=0, rc=7) at unpack.c:308
>> #4  0x0040362a in exec_rsc_action (graph=0x1394fa0,
>> action=0x139cb80) at crm_inject.c:359
>> #5  0x7fe089368642 in initiate_action (graph=0x1394fa0,
>> action=0x139cb80) at graph.c:172
>> #6  0x7fe08936899d in fire_synapse (graph=0x1394fa0,
>> synapse=0x139ba60) at graph.c:204
>> #7  0x7fe089368dbd in run_graph (graph=0x1394fa0) at graph.c:262
>> #8  0x0040428f in run_simulation (data_set=0x7fff712280a0) at
>> crm_inject.c:540
>> #9  0x0040632a in main (argc=9, argv=0x7fff71228308) at
>> crm_inject.c:1148
>>
>>
>> On Fri, Nov 12, 2010 at 01:41:26PM +0100, Andrew Beekhof wrote:
>> > 2010/11/12 Nikola Ciprich :
>> > >> do the pe regression tests pass?
>> > > Hi Andrew,
>> > > how do I run PE tests? looking into regression directory,
>> > > I'm a bit confused..
>> >
>> > either pengine/regression.sh from the top of the source directory, or
>> > from somewhere under /usr/share/pacemaker (check where the -devel
>> > package puts them)
>> >
>> > > n.
>> > >
>> > > --
>> > > -
>> > > Ing. Nikola CIPRICH
>> > > LinuxBox.cz, s.r.o.
>> > > 28. rijna 168, 709 01 Ostrava
>> > >
>> > > tel.:   +420 596 603 142
>> > > fax:    +420 596 621 273
>> > > mobil:  +420 777 093 799
>> > > www.linuxbox.cz
>> > >
>> > > mobil servis: +420 737 238 656
>> > > email servis: ser...@linuxbox.cz
>> > > -
>> > >
>> >
>> >
>>
>>
>
>
>
>



Re: [Pacemaker] Stonith Device APC AP7900

2010-11-14 Thread Devin Reade
Rick Cone  wrote:

> Perhaps I'll just use 1 outlet with the node name,
> with a power splitter to the 2 redundant power supplies to reduce the
> chances of problems.

IMO, if you're going to use a chassis with redundant power supplies,
you're better off with a system that uses an ALOM/DRAC/iLO, or equivalent
that sits between the redundant power bus and the main boards, and
then plug your redundant power supplies into two different circuits.
Otherwise, for what's supposed to be an HA system, you're introducing
unnecessary single points of failure.  I know the arguments about
using an external device for fencing, but in the redundant power case
if you can't reach your ALOM/whatever you're already looking at
multiple failures.

But it's your datacenter.

FWIW, I can confirm that the AP7900 works just fine and should be 
added to the list of supported devices.

Devin




Re: [Pacemaker] Pacemaker-1.1.4, when?

2010-11-14 Thread nozawat
Hi Andrew and Nikola,

  I ran the regression tests myself as well, and got the same error.

Regards,
Tomo


2010/11/12 Nikola Ciprich 

> (resent)
> 1.1.4 with new glib2: tests pass smoothly
> 1.1.4 + patch and older glib2 - all tests are segfaulting...
>
> ie:
> Program terminated with signal 11, Segmentation fault.
> #0  IA__g_str_hash (v=0x0) at gstring.c:95
> 95    guint32 h = *p;
> (gdb) bt
> #0  IA__g_str_hash (v=0x0) at gstring.c:95
> #1  0x7fe087bb6128 in g_hash_table_lookup_node (hash_table=0x1390ec0,
> key=0x0, value=0x13a3b00) at ghash.c:231
> #2  IA__g_hash_table_insert (hash_table=0x1390ec0, key=0x0,
> value=0x13a3b00) at ghash.c:336
> #3  0x7fe089367953 in convert_graph_action (resource=0x13a30a0,
> action=0x139cb80, status=0, rc=7) at unpack.c:308
> #4  0x0040362a in exec_rsc_action (graph=0x1394fa0,
> action=0x139cb80) at crm_inject.c:359
> #5  0x7fe089368642 in initiate_action (graph=0x1394fa0,
> action=0x139cb80) at graph.c:172
> #6  0x7fe08936899d in fire_synapse (graph=0x1394fa0, synapse=0x139ba60)
> at graph.c:204
> #7  0x7fe089368dbd in run_graph (graph=0x1394fa0) at graph.c:262
> #8  0x0040428f in run_simulation (data_set=0x7fff712280a0) at
> crm_inject.c:540
> #9  0x0040632a in main (argc=9, argv=0x7fff71228308) at
> crm_inject.c:1148
>
>
> On Fri, Nov 12, 2010 at 01:41:26PM +0100, Andrew Beekhof wrote:
> > 2010/11/12 Nikola Ciprich :
> > >> do the pe regression tests pass?
> > > Hi Andrew,
> > > how do I run PE tests? looking into regression directory,
> > > I'm a bit confused..
> >
> > either pengine/regression.sh from the top of the source directory, or
> > from somewhere under /usr/share/pacemaker (check where the -devel
> > package puts them)
> >
> > > n.
> > >
> > >
> >
> >
>
>
>


regression.log.gz
Description: GNU Zip compressed data


Re: [Pacemaker] start filesystem like this is right?

2010-11-14 Thread jiaju liu








Message: 6
Date: Sun, 14 Nov 2010 13:17:23 +0800 (CST)
From: jiaju liu 
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] start filesystem like this is right?
Message-ID: <476924.89659...@web15703.mail.cnb.yahoo.com>
Content-Type: text/plain; charset="iso-8859-1"





>>> start resource steps
>>>
>>> step(1)
>>> crm configure primitive vol_mpath0 ocf:heartbeat:Filesystem meta
>>> target-role=stopped params device=/dev/mapper/mpath0
>>> directory=/mnt/mapper/mpath0 fstype='lustre' op start timeout=300s op
>>> stop timeout=120s op monitor timeout=120s interval=60s op notify
>>> timeout=60s
>>>
>>> step(2)
>>> crm resource reprobe
>>>
>>> step(3)
>>> crm configure location vol_mpath0_location_manage datavol_mpath0 rule
>>> -inf: not_defined pingd_manage or pingd_manage lte 0
>>>
>>> crm configure location vol_mpath0_location_data datavol_mpath0 rule -inf:
>>> not_defined pingd_data or pingd_data lte 0
>>
>> why do you have 2 location constraints? where are the definitions for
>> pingd_data and pingd_manage?
>
> Because we have two networks. The management network is Ethernet; the data
> network is IB.
>
> The definitions of pingd:
>
> crm configure primitive pingd_data ocf:pacemaker:ping meta target-role=stopped
> params name=pingd_data op start timeout=100s op stop timeout=100s op monitor
> interval=90s timeout=100s
>
> crm_resource -p host_list -r pingd_data -v IP_list
>
> crm configure clone pingd_data_net pingd_data meta
> globally-unique=false target-role=stopped
>
> crm resource start pingd_data
>
> crm configure primitive pingd_manage ocf:pacemaker:ping meta
> target-role=stopped params name=pingd_manage op start timeout=90s op stop
> timeout=100s op monitor interval=90s timeout=100s
>
> crm_resource -p host_list -r pingd_manage -v IP_list
>
> crm configure clone pingd_manage_net pingd_manage meta globally-unique=false
>
> crm resource start pingd_manage
>
>>> step(4)
>>> crm resource start vol_mpath0
>>>
>>> delete resource steps
>>>
>>> step(1)
>>> crm resource stop vol_mpath0
>>>
>>> step(2)
>>> crm resource cleanup vol_mpath0
>>>
>>> step(3)
>>> crm configure delete vol_mpath0
>>>
>>> Those are my steps. Are they right? I repeated these steps several times.
>>> At first it works well; after 5 or 6 times the resource could not start.
>>> I ran crm resource start vol_mpath0 again, to no avail.
>>
>> Could be that your ping nodes are down?
>
> The node is OK. Do you know a way to check pingd? I also found that
> something is wrong with the cluster. I checked the log; I think the node
> could not get the status from crm.

I set the parameter no-quorum-policy=ignore. Is this the reason?



>>> pacemaker packages are
>>>     pacemaker-1.0.8-6.1.el5
>>>     pacemaker-libs-devel-1.0.8-6.1.el5
>>>     pacemaker-libs-1.0.8-6.1.el5
>>>
>>> openais packages are
>>>     openaislib-devel-1.1.0-1.el5
>>>     openais-1.1.0-1.el5
>>>     openaislib-1.1.0-1.el5
>>>
>>> corosync packages are
>>>     corosync-1.2.2-1.1.el5
>>>     corosynclib-devel-1.2.2-1.1.el5
>>>     corosynclib-1.2.2-1.1.el5
>>>
>>> Does anyone know why? Thanks a lot.
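
A rough sketch of what the two location rules in step(3) express, assuming the rule text exactly as quoted ("-inf: not_defined X or X lte 0"). The function names are hypothetical and node attributes are modelled as a plain dict; this mirrors the rule logic only and does not query a live cluster.

```python
def rule_bans(node_attrs, attr_name):
    """True if '-inf: not_defined X or X lte 0' matches on this node."""
    value = node_attrs.get(attr_name)
    return value is None or float(value) <= 0


def can_run(node_attrs, attr_names=("pingd_manage", "pingd_data")):
    """The resource may run only if neither rule bans the node."""
    return not any(rule_bans(node_attrs, name) for name in attr_names)


healthy = {"pingd_manage": "1", "pingd_data": "1"}
data_net_down = {"pingd_manage": "1", "pingd_data": "0"}
attr_missing = {"pingd_manage": "1"}  # pingd_data never written on this node

print(can_run(healthy), can_run(data_net_down), can_run(attr_missing))
```

Read this way, a node is excluded not only when a ping target is unreachable but also when one of the two attributes has not been set at all, for example while a ping clone is still stopped, so both attributes are worth checking on every node when the resource refuses to start.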
>>>
>>>





Re: [Pacemaker] 2 node failover cluster + MySQL Master-Master replica setup

2010-11-14 Thread Dan Frincu

Ruzsinszky Attila wrote:

>> That's what I said - I didn't see it either.
>> but if you check the current RA:
>
> What do you think about this:
> http://www.lathiat.net/files/MySQL%20-%20DRBD%20&%20Pacemaker.pdf
>
> I can't see if this is a real M-M or M-S setup.
  

It's a Master-Slave setup.

This is the PDF I mentioned.

Off-topic: the setup in that PDF is pretty basic, and I think its author
and I share a lot of views on the configuration. However, I would advise
using "drbdadm -- --clear-bitmap new-current-uuid mysql" instead of
"drbdadm -- --overwrite-data-of-peer primary mysql". The latter starts a
full synchronization, which is pointless here: the DRBD block device is
empty, so it would only be synchronizing empty space. The former marks
both servers' partitions as in sync "instantly" (starting from version
8.3).

Also, I'm impressed to see naming like "ms ms_drbd_mysql drbd_mysql",
"colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master" and "order
mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start" in official
documents, as this is the naming I use as well when defining primitives,
colocation and ordering constraints. I know it doesn't really matter how
you name the resources and constraints, as long as they are syntactically
correct, but I just couldn't get used to the resource naming used in the
DRBD documentation. Sorry guys, you do awesome work, but 'primitive
p_drbd_r0 ocf:linbit:drbd params drbd_resource="r0"', "colocation
c_drbd_r0-U_on_drbd_r0 inf: ms_drbd_r0-U ms_drbd_r0:Master" and other
such naming confused the life out of me :)
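
The two drbdadm invocations compared above can be put side by side in a small sketch. The function name initial_sync_cmd is hypothetical, the resource name "mysql" is taken from the PDF, and the function only prints the command so it can be reviewed before being run on a real DRBD node:

```shell
#!/bin/sh
# Hypothetical helper: print the drbdadm command for the initial state
# of a DRBD resource.
#   $1 = DRBD resource name (e.g. "mysql")
#   $2 = "yes" if the backing devices are empty and the sync can be skipped
initial_sync_cmd() {
    res=$1
    skip_sync=$2
    if [ "$skip_sync" = "yes" ]; then
        # Empty devices: clear the bitmap instead of syncing (DRBD 8.3+).
        echo "drbdadm -- --clear-bitmap new-current-uuid $res"
    else
        # Existing data on this node: full sync towards the peer.
        echo "drbdadm -- --overwrite-data-of-peer primary $res"
    fi
}

# Usage (prints the command, does not execute it):
#   initial_sync_cmd mysql yes
```

Skipping the sync is only safe when both devices really are empty; with existing data on one side, the full synchronization is still the right choice.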


Sorry for the Offtopic.

Regards,

Dan

> TIA,
> Ruzsi

  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania



Re: [Pacemaker] 2 node failover cluster + MySQL Master-Master replica setup

2010-11-14 Thread Ruzsinszky Attila
> Have you even read that PDF, it documents just that, a MS setup with MySQL
I've read many, many PDFs, HTML pages, READMEs etc. without finding a
real working config. Anyway, which PDF do you mean?

> Why not M-M?
> You have an obsession, you should see a doctor about that.
It is not my theory. I got this advice from the #mysql-ndb channel.
I think I already wrote that.

TIA,
Ruzsi



Re: [Pacemaker] 2 node failover cluster + MySQL Master-Master replica setup

2010-11-14 Thread Dan Frincu



>> So I guess there are 2 ways for a MS setup with MySQL.
>
> OK.
> And where is a cookbook for setting up M-S config?

Have you even read that PDF? It documents just that, a MS setup with
MySQL ...

> Why not M-M?

You have an obsession, you should see a doctor about that.

--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania



Re: [Pacemaker] 2 node failover cluster + MySQL Master-Master replica setup

2010-11-14 Thread Ruzsinszky Attila
> So I guess there are 2 ways for a MS setup with MySQL.
OK.
And where is a cookbook for setting up M-S config?
Why not M-M?

I tried to install MySQL Workbench on SLES11 SP1.
There are some broken dependencies. :-(
Despite that, Workbench started (it complained about a
missing SSH tunnel).

I wanted to configure the RELOAD and SUPER privileges. I was
surprised that I'm not able to do that with Workbench! :-(
There are only some predefined roles. So I have to learn
mysql + GRANT commands.

Pacemaker (mysql) wrote some lines in the messages file about
missing privileges, like RELOAD and SUPER. Is that the only problem?
I don't think so ...
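
For the missing RELOAD and SUPER privileges mentioned above, a minimal sketch of the GRANT statement follows. The helper name grant_stmt and the account name in the usage comment are hypothetical; the function only prints the SQL so it can be checked before it is piped into the mysql client:

```shell
#!/bin/sh
# Hypothetical helper: print a GRANT statement giving the global
# RELOAD and SUPER privileges to one MySQL account.
#   $1 = MySQL user name, $2 = host part of the account
grant_stmt() {
    user=$1
    host=$2
    # RELOAD and SUPER are global privileges, so they are granted ON *.*
    printf "GRANT RELOAD, SUPER ON *.* TO '%s'@'%s';\n" "$user" "$host"
}

# Usage (assumed account name, run as a MySQL admin user):
#   grant_stmt ra_user localhost | mysql -u root -p
```

Which account the resource agent actually connects as depends on the parameters of the mysql primitive, so the account name has to match that configuration.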

I'll check my DMC for mysql parameters ...

TIA,
Ruzsi



Re: [Pacemaker] 2 node failover cluster + MySQL Master-Master replica setup

2010-11-14 Thread Dan Frincu

Hi,

> I am pretty sure Linbit announced mysql RA with replication capabilities.
> Haven't seen documentation though.
>
> # crm ra meta mysql|grep ^replica
> replication_user (string): MySQL replication user
> replication_passwd (string): MySQL replication user password
> replication_port (string, [3306]): MySQL replication port

You're probably using a newer version of resource-agents, I have
resource-agents-1.0.3-2.el5 and:

# crm ra meta mysql|grep ^replica
# echo $?
1
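
The check above relies on grep's exit status: 0 when a line matches, 1 when nothing matches. A minimal sketch of wrapping that in a function, where has_replication_params is a hypothetical name and the metadata is read from stdin so it can be tried without a cluster:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the RA metadata on stdin
# contains a parameter line starting with "replica", fail otherwise.
has_replication_params() {
    grep -q '^replica'
}

# Usage on a cluster node, fed from the same crm command as above:
#   if crm ra meta mysql | has_replication_params; then
#       echo "mysql RA has replication parameters"
#   else
#       echo "mysql RA has no replication parameters"
#   fi
```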

I've found the patches for the MySQL RA though

http://hydra.azilian.net/gitweb/?p=linux-ha/.git;a=summary

And the original thread

http://www.mail-archive.com/linux...@lists.linux-ha.org/msg14992.html

The patches apply to a Master-Slave replication setup; I haven't tested
them though.

> So now DRBD+MySQL is almost the only possibility?

So I guess there are 2 ways for a MS setup with MySQL.

Regards,

Dan

--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania



Re: [Pacemaker] (no subject)

2010-11-14 Thread Pavlos Parissis
On 13 November 2010 23:24, Bob Schatz  wrote:
>
> Lunch this week?

Yes, why not. Where and at what time? Shall we go to the Pacemaker
cafeteria like last time? They are always available for us :-)

Cheers,
Pavlos
