Re: [Pacemaker] don't want to restart clone resource

2012-02-07 Thread Fanghao Sha
Hi Andrew,
Is crm_report included in pacemaker-1.0.12-1.el5.centos?
I couldn't find it.

2012/2/4 Andrew Beekhof 

> On Fri, Feb 3, 2012 at 9:35 PM, Fanghao Sha  wrote:
> > Sorry, I don't know how to file a bug,
>
> See the links at the bottom of every mail on this list?
>
> > and I only have the "messages" file.
>
> man crm_report
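> 
> For example, to collect logs from around the incident into a report
> tarball (exact options vary by version; check the man page first):
> 
>   crm_report -f "2012-02-03 00:00" -t "2012-02-04 00:00" orphan-report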
>
> >
> > I have tried to set clone-max=3, and after removing node-1, the clone
> > resource running on node-2 was not restarted.
> > But when I added another node-3 to the cluster with "hb_addnode", the clone
> > resource running on node-2 became orphaned and was restarted.
> >
> > In the attached "messages" file,
> > I couldn't understand this line:
> > "find_clone: Internally renamed node-app-rsc:2 on node-2 to
> node-app-rsc:3
> > (ORPHAN)".
> >
> > 2012/2/2 Andrew Beekhof 
> >>
> >> On Thu, Feb 2, 2012 at 4:57 AM, Lars Ellenberg
> >>  wrote:
> >> > On Wed, Feb 01, 2012 at 03:43:55PM +0100, Andreas Kurz wrote:
> >> >> Hello,
> >> >>
> >> >> On 02/01/2012 10:39 AM, Fanghao Sha wrote:
> >> >> > Hi Lars,
> >> >> >
> >> >> > Yes, you are right. But how do I prevent the "orphaned" resources
> >> >> > from being stopped by default, please?
> >> >>
> >> >> crm configure property stop-orphan-resources=false
> >> >
> >> > Well, sure. But for "normal" orphans,
> >> > you actually want them to be stopped.
> >> >
> >> > No, pacemaker needs some additional smarts to recognize
> >> > that there actually are no orphans, maybe by first relabeling,
> >> > and only then checking for instance label > clone-max.
> >>
> >> Instance label doesn't come into the equation.
> >> It might look like it does on the outside, but it's more complicated than
> >> that.
> >>
> >> >
> >> > Did you file a bugzilla?
> >> > Has that made progress?
> >> >
> >> >
> >> > --
> >> > : Lars Ellenberg
> >> > : LINBIT | Your Way to High Availability
> >> > : DRBD/HA support and consulting http://www.linbit.com
> >> >
> >> > ___
> >> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> >> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >> >
> >> > Project Home: http://www.clusterlabs.org
> >> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >> > Bugs: http://bugs.clusterlabs.org
> >>
> >> ___
> >> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>
> >> Project Home: http://www.clusterlabs.org
> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >> Bugs: http://bugs.clusterlabs.org
> >
> >
> >
> > ___
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> >
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [ANNOUNCE] The Booth Cluster Ticket Manager - part of multi-site support in pacemaker

2012-02-07 Thread Jiaju Zhang
On Tue, 2012-02-07 at 11:09 +0900, 李泰勲 wrote:
> Hi Jiaju
> 
> I am testing how booth works while investigating the booth source code.
> 
> I don't yet fully understand the ticket grant and revoke process, and how
> the booth instances connect to each other, so I would like to understand
> how booth is meant to work, matching the source code you have published.
> 
> Could you give me some information about booth's sequences: the ticket
> grant, revoke, and lease logic, and how a ticket's expiry time works?

Generally speaking, you would initially grant a ticket to a certain site,
which means the corresponding resources can run at that site. In terms of
the lease logic, granting a ticket means that site holds the ticket lease.
The lease has an expiry time; once it passes, the lease is expired and the
corresponding resources can no longer run at that site.
If the site that was originally granted the ticket is alive, it will renew
the lease before the ticket expires; but if that site is broken, then when
the lease expires the lease logic goes into an election stage and a new
site gets the ticket lease, so the resources can run at the new site.
You can also revoke the ticket from a site, but in most cases you will not
want to do this. The scenarios I can think of are when the admin wants to
do some maintenance work, or wants to manage tickets manually.
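
For example, a grant/revoke might look like this (the exact booth client
options here are my assumption, so check booth's usage output; "ticketA"
and "rsc1" are placeholder names):

  # grant ticketA to the local site, and revoke it again for maintenance
  booth client grant -t ticketA
  booth client revoke -t ticketA

  # on the pacemaker side, make rsc1 depend on holding ticketA
  crm configure rsc_ticket rsc1-req-ticketA ticketA: rsc1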

> 
> when do you think the booth's working is fixed and completed ?

Oh, I have not finished it yet;) But I'm still working on it; since I
also have some other tasks, progress may not be fast these days;)

> 
> Is there anything to help you about booth's implementation or etc?

The framework is finished, but there are still some bugs in it, so the
code may not work for you for the time being. I'll be more than happy if
anyone can help fix bugs or develop new features;)
For the short term, I think adding man pages, documentation and some
automated test programs/scripts would be very good. For the long term,
I also have some new things in mind; maybe I should add a TODO file to
document them later.

Well, the primary thing for now is to fix the current bugs to make it
really work, and I myself will spend more time on it these two weeks;)

Thanks,
Jiaju

> 
> Best Regards, Taihun
> 
> (2011/12/05 15:18), Jiaju Zhang wrote:
> > Hello everyone,
> >
> > I'm happy to announce the Booth cluster ticket manager, which is part
> > of a key feature for pacemaker in 2011 - improving support for
> > multi-site clusters.
> >
> > Multi-site clusters can be considered “overlay” clusters, where each
> > cluster site corresponds to a cluster node in a traditional cluster. The
> > overlay cluster is managed by the booth mechanism, which guarantees that
> > the cluster resources will be highly available across the different
> > cluster sites. This is achieved by using so-called tickets, which act as
> > failover domains between cluster sites in case a site goes down.
> >
> > Booth is designed to be an add-on of pacemaker, and now it is also
> > hosted in ClusterLabs, together with pacemaker. You can find it from:
> >
> > https://github.com/ClusterLabs/booth
> >
> > Now, booth is still in heavy development, so it may not work for you for
> > the time being;) But I'll be working on it ...
> >
> > Review and comments are highly appreciated!
> >
> > Thanks,
> > Jiaju
> >
> >
> > ___
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> >
> >
> 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Error building Pacemaker on OS X Lion

2012-02-07 Thread Andrew Beekhof
On Wed, Feb 8, 2012 at 12:39 AM, i...@sdips.de  wrote:
> Am 06.02.12 22:00, schrieb Andrew Beekhof:
>> On Tue, Feb 7, 2012 at 1:05 AM, i...@sdips.de  wrote:
>>> Am 29.01.12 23:03, schrieb Andrew Beekhof:
 On Thu, Jan 26, 2012 at 10:25 PM, i...@sdips.de  wrote:
> I've started all over with MacPorts. After some struggle with gettext,
> the only configure invocation that works uses --prefix=/opt/local.
> But I'm stuck at the same issue building pacemaker.
>
> ./configure --prefix=/opt/local --with-initdir=/private/etc/mach_init.d
> --with-heartbeat
> .
> .
> .
> checking for struct lrm_ops.fail_rsc... yes
> checking for ll_cluster_new in -lhbclient... no
> configure: error: in `/Users/admin/1.1':
> configure: error: Unable to support Heartbeat: client libraries not found
> See `config.log' for more details
>
>
>
> The only error I've had during the build was in glue, where logd can't be
> built.
> Is this the missing part that prevents Pacemaker from building?
>
> cc1: warnings being treated as errors
> ha_logd.c: In function ‘logd_make_daemon’:
> ha_logd.c:527: warning: ‘daemon’ is deprecated (declared at
> /usr/include/stdlib.h:292)
> make[1]: *** [ha_logd.o] Error 1
> make: *** [all-recursive] Error 1
 It might be necessary to configure with --disable-fatal-warnings (or
 something of that kind)
>>> Sorry, that doesn't work either.
>>> The build process finished without the previous error, but now
>>> "shellfuncs" is missing.
>>>
>>>    /etc/mach_init.d/heartbeat start
>>>    /etc/mach_init.d/heartbeat: line 53: /opt/local/etc/ha.d/shellfuncs:
>>> No such file or directory
>>>
>>> The file isn't present on the system, hence it wasn't built?
>> There should be a similarly named file in the resource-agents package.
>> Evidently they changed the name and forgot to update heartbeat.
>
> my fault, resource-agents hadn't been installed yet.
> And now I'm running into some new build errors ;(
>
>    In file included from /opt/local/include/libnet.h:81,
>                 from send_arp.libnet.c:44:
>    /opt/local/include/./libnet/libnet-functions.h:85: warning: function
> declaration isn’t a prototype
>    In file included from send_arp.libnet.c:44:
>    /opt/local/include/libnet.h:87:2: error: #error "byte order has not
> been specified, you'll need to #define either LIBNET_LIL_ENDIAN or
> LIBNET_BIG_ENDIAN.  See the documentation regarding the
> libnet-config script."
>    send_arp.libnet.c: In function ‘main’:
>    send_arp.libnet.c:206: warning: comparison is always false due to
> limited range of data type
>    make[3]: *** [send_arp-send_arp.libnet.o] Error 1
>    make[2]: *** [all-recursive] Error 1
>    make[1]: *** [all-recursive] Error 1
>    make: *** [all] Error 2
>
> libnet is installed via MacPorts, and LIBNET_LIL_ENDIAN is defined in
> /opt/local/bin/libnet-config. What's wrong now?

The error message seems reasonably helpful.  Did you read the
documentation it refers to?
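
A typical fix (assuming MacPorts' libnet-config behaves like the upstream
script) is to pass its defines to the compiler, e.g.:

  CPPFLAGS="$(/opt/local/bin/libnet-config --defines)" ./configure ...

The defines have to reach the actual compile command line; having them
inside the libnet-config script alone is not enough.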

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Proper way to migrate multistate resource?

2012-02-07 Thread Chet Burgess
OK that makes sense. I will give that a try. Thank you.

--
Chet Burgess
c...@liquidreality.org



On Feb 7, 2012, at 5:08 , Lars Ellenberg wrote:

> On Tue, Feb 07, 2012 at 02:03:32PM +0100, Michael Schwartzkopff wrote:
>>> On Mon, Feb 06, 2012 at 04:48:26PM -0800, Chet Burgess wrote:
 Greetings,
 
 I'm somewhat new to pacemaker and have been playing around with a
 number of configurations in a lab. Most recently I've been testing a
 multistate resource using the ocf:pacemaker:Stateful example RA.
 
 While I've gotten the agent to work, and noticed that the resources
 migrate if I shut down or kill a node, I can't seem to figure out the
 proper way to migrate the resource between nodes when they are both
 up.
 
 For regular resources I've used "crm resource migrate " without
 issue. However, when I try this with a multistate resource it doesn't
 seem to work. When I run the command it just puts the slave node into
 a stopped state. If I try to tell it to migrate specifically to the
 slave node, it claims to already be running there (which I suppose in a
 sense it is).
>>> 
>>> the crm shell does not support roles for the "move" or "migrate" command
>>> (yet; maybe in newer versions. Dejan?).
>>> 
>>> What you need to do is set a location constraint on the role.
>>> * force master role off from one node:
>>> 
>>> location you-name-it resource-id \
>>> rule $role=Master -inf: \
>>> #uname eq node-where-it-should-be-slave
>>> 
>>> * or force master role off from all but one node,
>>>   note the double negation in this one:
>>> 
>>> location you-name-it resource-id \
>>> rule $role=Master -inf: \
>>> #uname ne node-where-it-should-be-master
>> 
>> These constraints would prevent the MS resource from running in Master
>> state on that node, even if the preferred node is no longer available.
>> This might not be what Chet wanted.
> 
> Well, it is just what crm resource migrate does, otherwise.
> 
> After migration, you obviously need to "unmigrate",
> i.e. delete that constraint again.
> 
> 
> -- 
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
> 
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Error building Pacemaker on OS X Lion

2012-02-07 Thread i...@sdips.de
Am 06.02.12 22:00, schrieb Andrew Beekhof:
> On Tue, Feb 7, 2012 at 1:05 AM, i...@sdips.de  wrote:
>> Am 29.01.12 23:03, schrieb Andrew Beekhof:
>>> On Thu, Jan 26, 2012 at 10:25 PM, i...@sdips.de  wrote:
 I've started all over with MacPorts. After some struggle with gettext,
 the only configure invocation that works uses --prefix=/opt/local.
 But I'm stuck at the same issue building pacemaker.

 ./configure --prefix=/opt/local --with-initdir=/private/etc/mach_init.d
 --with-heartbeat
 .
 .
 .
 checking for struct lrm_ops.fail_rsc... yes
 checking for ll_cluster_new in -lhbclient... no
 configure: error: in `/Users/admin/1.1':
 configure: error: Unable to support Heartbeat: client libraries not found
 See `config.log' for more details



 The only error I've had during the build was in glue, where logd can't be
 built.
 Is this the missing part that prevents Pacemaker from building?

 cc1: warnings being treated as errors
 ha_logd.c: In function ‘logd_make_daemon’:
 ha_logd.c:527: warning: ‘daemon’ is deprecated (declared at
 /usr/include/stdlib.h:292)
 make[1]: *** [ha_logd.o] Error 1
 make: *** [all-recursive] Error 1
>>> It might be necessary to configure with --disable-fatal-warnings (or
>>> something of that kind)
>> Sorry, that doesn't work either.
>> The build process finished without the previous error, but now
>> "shellfuncs" is missing.
>>
>>/etc/mach_init.d/heartbeat start
>>/etc/mach_init.d/heartbeat: line 53: /opt/local/etc/ha.d/shellfuncs:
>> No such file or directory
>>
>> The file isn't present on the system, hence it wasn't built?
> There should be a similarly named file in the resource-agents package.
> Evidently they changed the name and forgot to update heartbeat.
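> 
> A quick way to locate the renamed file once resource-agents is installed
> (assuming the same /opt/local prefix):
> 
>   find /opt/local -name '*shellfuncs*'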

my fault, resource-agents hadn't been installed yet.
And now I'm running into some new build errors ;(

In file included from /opt/local/include/libnet.h:81,
 from send_arp.libnet.c:44:
/opt/local/include/./libnet/libnet-functions.h:85: warning: function
declaration isn’t a prototype
In file included from send_arp.libnet.c:44:
/opt/local/include/libnet.h:87:2: error: #error "byte order has not
been specified, you'll need to #define either LIBNET_LIL_ENDIAN or
LIBNET_BIG_ENDIAN.  See the documentation regarding the
libnet-config script."
send_arp.libnet.c: In function ‘main’:
send_arp.libnet.c:206: warning: comparison is always false due to
limited range of data type
make[3]: *** [send_arp-send_arp.libnet.o] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

libnet is installed via MacPorts, and LIBNET_LIL_ENDIAN is defined in
/opt/local/bin/libnet-config. What's wrong now?
>> In general, is it possible to get pacemaker running under OS X?
>> Otherwise I'll stop investing more time in something that hasn't been tested.
> It's been a long time since I ran heartbeat anywhere, let alone on OS X.
> It did work at one point though (and hasn't changed much since); you
> might just need to tweak some init scripts.
>
 I appreciate any help.



 Am 25.01.12 01:27, schrieb Andrew Beekhof:
> Have you been following this?
>   http://www.clusterlabs.org/wiki/Install#Darwin.2FMacOS_X
>
> On Tue, Jan 24, 2012 at 9:58 PM, i...@sdips.de  wrote:
>> Hi all,
>>
>> after a clean install of cluster-glue and heartbeat, I have a problem
>> building Pacemaker 1.1.6 under OS X Lion.
>>
>> With the ./configure --prefix=/usr/local
>> --with-initdir=/private/etc/mach_init.d --with-heartbeat
>> --libexecdir=/usr/libexec/ I run into the following issue:
>>
>> configure: error: in `/Users/admin/1.1':
>> configure: error: Unable to support Heartbeat: client libraries not found
>> See `config.log' for more details
>>
>>
>> the "config.log" shows this:
>>
>> configure:4363: gcc -c conftest.c -o conftest2.o >&5
>> configure:4367: $? = 0
>> configure:4373: gcc -c conftest.c -o conftest2.o >&5
>> configure:4377: $? = 0
>> configure:4388: cc -c conftest.c >&5
>> configure:4392: $? = 0
>> configure:4400: cc -c conftest.c -o conftest2.o >&5
>> configure:4404: $? = 0
>> configure:4410: cc -c conftest.c -o conftest2.o >&5
>> configure:4414: $? = 0
>> configure:4432: result: yes
>> configure:4461: checking for gcc option to accept ISO C99
>> configure:4610: gcc  -c -g -O2  conftest.c >&5
>> conftest.c:62: error: expected ';', ',' or ')' before 'text'
>> conftest.c: In function 'main':
>> conftest.c:116: error: nested functions are disabled, use
>> -fnested-functions to re-enable
>> conftest.c:116: error: expected '=', ',', ';', 'asm' or '__attribute__'
>> before 'newvar'
>> conftest.c:116: error: 'newvar' undeclared (first use in this function)
>> conftest.c:116: error: (Each un

Re: [Pacemaker] Proper way to migrate multistate resource?

2012-02-07 Thread Lars Ellenberg
On Tue, Feb 07, 2012 at 02:03:32PM +0100, Michael Schwartzkopff wrote:
> > On Mon, Feb 06, 2012 at 04:48:26PM -0800, Chet Burgess wrote:
> > > Greetings,
> > > 
> > > I'm somewhat new to pacemaker and have been playing around with a
> > > number of configurations in a lab. Most recently I've been testing a
> > > multistate resource using the ocf:pacemaker:Stateful example RA.
> > > 
> > > While I've gotten the agent to work, and noticed that the resources
> > > migrate if I shut down or kill a node, I can't seem to figure out the
> > > proper way to migrate the resource between nodes when they are both
> > > up.
> > > 
> > > For regular resources I've used "crm resource migrate " without
> > > issue. However, when I try this with a multistate resource it doesn't
> > > seem to work. When I run the command it just puts the slave node into
> > > a stopped state. If I try to tell it to migrate specifically to the
> > > slave node, it claims to already be running there (which I suppose in a
> > > sense it is).
> > 
> > the crm shell does not support roles for the "move" or "migrate" command
> > (yet; maybe in newer versions. Dejan?).
> > 
> > What you need to do is set a location constraint on the role.
> >  * force master role off from one node:
> > 
> > location you-name-it resource-id \
> > rule $role=Master -inf: \
> > #uname eq node-where-it-should-be-slave
> > 
> >  * or force master role off from all but one node,
> >note the double negation in this one:
> > 
> > location you-name-it resource-id \
> > rule $role=Master -inf: \
> > #uname ne node-where-it-should-be-master
> 
> These constraints would prevent the MS resource from running in Master
> state on that node, even if the preferred node is no longer available.
> This might not be what Chet wanted.

Well, it is just what crm resource migrate does, otherwise.

After migration, you obviously need to "unmigrate",
i.e. delete that constraint again.
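
In crm shell terms that could look roughly like this (the constraint and
resource names below are placeholders):

  # delete the hand-made location constraint
  crm configure delete you-name-it

  # or, for a constraint created by "crm resource migrate"
  crm resource unmigrate resource-id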


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Proper way to migrate multistate resource?

2012-02-07 Thread Michael Schwartzkopff
> On Mon, Feb 06, 2012 at 04:48:26PM -0800, Chet Burgess wrote:
> > Greetings,
> > 
> > I'm somewhat new to pacemaker and have been playing around with a
> > number of configurations in a lab. Most recently I've been testing a
> > multistate resource using the ocf:pacemaker:Stateful example RA.
> > 
> > While I've gotten the agent to work, and noticed that the resources
> > migrate if I shut down or kill a node, I can't seem to figure out the
> > proper way to migrate the resource between nodes when they are both
> > up.
> > 
> > For regular resources I've used "crm resource migrate " without
> > issue. However, when I try this with a multistate resource it doesn't
> > seem to work. When I run the command it just puts the slave node into
> > a stopped state. If I try to tell it to migrate specifically to the
> > slave node, it claims to already be running there (which I suppose in a
> > sense it is).
> 
> the crm shell does not support roles for the "move" or "migrate" command
> (yet; maybe in newer versions. Dejan?).
> 
> What you need to do is set a location constraint on the role.
>  * force master role off from one node:
> 
>   location you-name-it resource-id \
>   rule $role=Master -inf: \
>   #uname eq node-where-it-should-be-slave
> 
>  * or force master role off from all but one node,
>note the double negation in this one:
> 
>   location you-name-it resource-id \
>   rule $role=Master -inf: \
>   #uname ne node-where-it-should-be-master

These constraints would prevent the MS resource from running in Master state
on that node, even if the preferred node is no longer available. This might
not be what Chet wanted.

Perhaps it would be easier to give the resource some extra points when it
runs in Master state on the preferred node; that way the preference is only a
score, so failover remains possible if the preferred node goes down:

location name-of-your-constraint resource-id \
  rule $role=Master 100: \
#uname eq name-of-the-preferred-node

-- 
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98
Fax: (089) 620 304 13


signature.asc
Description: This is a digitally signed message part.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Proper way to migrate multistate resource?

2012-02-07 Thread Lars Ellenberg
On Mon, Feb 06, 2012 at 04:48:26PM -0800, Chet Burgess wrote:
> Greetings,
> 
> I'm somewhat new to pacemaker and have been playing around with a
> number of configurations in a lab. Most recently I've been testing a
> multistate resource using the ocf:pacemaker:Stateful example RA.
> 
> While I've gotten the agent to work, and noticed that the resources
> migrate if I shut down or kill a node, I can't seem to figure out the
> proper way to migrate the resource between nodes when they are both
> up.
> 
> For regular resources I've used "crm resource migrate " without
> issue. However, when I try this with a multistate resource it doesn't
> seem to work. When I run the command it just puts the slave node into
> a stopped state. If I try to tell it to migrate specifically to the
> slave node, it claims to already be running there (which I suppose in a
> sense it is).

the crm shell does not support roles for the "move" or "migrate" command
(yet; maybe in newer versions. Dejan?).

What you need to do is set a location constraint on the role.
 * force master role off from one node:

location you-name-it resource-id \
rule $role=Master -inf: \
#uname eq node-where-it-should-be-slave

 * or force master role off from all but one node,
   note the double negation in this one:

location you-name-it resource-id \
rule $role=Master -inf: \
#uname ne node-where-it-should-be-master

Cheers,

Lars

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

> The only method I've found to safely and reliably migrate a multistate
> resource from one node to another is I think it has something to do
> with the resource constraints I used to prefer a particular node, but
> I'm not entirely sure how the constraints and the master/slave state
> updating stuff works.
> 
> Am I using the wrong tool to migrate a multistate resource or is my
> configuration wrong in some way?  Any input greatly appreciated. 
> Thank you.
> 
> 
> Configuration:
> r...@tst3.local1.mc:/home/cfb$ crm configure show
> node tst3.local1.mc.metacloud.com
> node tst4.local1.mc.metacloud.com
> primitive stateful-test ocf:pacemaker:Stateful \
>   op monitor interval="30s" role="Slave" \
>   op monitor interval="31s" role="Master"
> ms ms-test stateful-test \
>   meta clone-node-max="1" notify="false" master-max="1" 
> master-node-max="1" target-role="Master"
> location ms-test_constraint_1 ms-test 25: tst3.local1.mc.metacloud.com
> location ms-test_constraint_2 ms-test 20: tst4.local1.mc.metacloud.com
> property $id="cib-bootstrap-options" \
>   cluster-infrastructure="openais" \
>   dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
>   last-lrm-refresh="1325273678" \
>   expected-quorum-votes="2" \
>   no-quorum-policy="ignore" \
>   stonith-enabled="false"
> rsc_defaults $id="rsc-options" \
>   resource-stickiness="100"
> 
> --
> Chet Burgess
> c...@liquidreality.org
> 
> 
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Where is MAXMSG defined?

2012-02-07 Thread Lars Ellenberg
On Tue, Feb 07, 2012 at 01:13:19PM +0200, Adrian Fita wrote:
> Hi.
> 
> I can't find any trace of "define MAXMSG" in pacemaker's, corosync's,
> or heartbeat's source code. I tried "grep -R 'MAXMSG' *" and got
> nothing. Where is it defined?!

If you are asking about what I think you are,
then that would be in glue, in
include/clplumbing/ipc.h

But be careful when fiddling with it.

What are you trying to solve, btw?
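
For example, from a cluster-glue source checkout:

  grep -n MAXMSG include/clplumbing/ipc.h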

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Where is MAXMSG defined?

2012-02-07 Thread Adrian Fita
Hi.

I can't find any trace of "define MAXMSG" in pacemaker's, corosync's,
or heartbeat's source code. I tried "grep -R 'MAXMSG' *" and got
nothing. Where is it defined?!

Thanks.
--
Fita Adrian

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Stopping heartbeat service on one node lead to restart of resources on other node in cluster

2012-02-07 Thread neha chatrath
Hello,
I have a 2-node cluster with the following configuration:
node $id="9e53a111-0dca-496c-9461-a38f3eec4d0e" mcg2 \
   attributes standby="off"
node $id="a90981f8-d993-4411-89f4-aff7156136d2" mcg1 \
   attributes standby="off"
primitive ClusterIP ocf:mcg:MCG_VIPaddr_RA \
   params ip="192.168.115.50" cidr_netmask="255.255.255.0"
nic="bond1.115:1" \
   op monitor interval="40" timeout="20" \
   meta target-role="Started"
primitive EMS ocf:heartbeat:jboss \
   params jboss_home="/opt/jboss-5.1.0.GA"
java_home="/opt/jdk1.6.0_29/" \
   op start interval="0" timeout="240" \
   op stop interval="0" timeout="240" \
   op monitor interval="30s" timeout="40s"
primitive NDB_MGMT ocf:mcg:NDB_MGM_RA \
   op monitor interval="120" timeout="120"
primitive NDB_VIP ocf:heartbeat:IPaddr2 \
   params ip="192.168.117.50" cidr_netmask="255.255.255.255"
nic="bond0.117:1" \*
  * op monitor interval="30" timeout="10"
primitive Rmgr ocf:mcg:RM_RA \
   op monitor interval="60" role="Master" timeout="30"
on-fail="restart" \
   op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
primitive Tmgr ocf:mcg:TM_RA \
   op monitor interval="60" role="Master" timeout="30"
on-fail="restart" \
   op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
primitive mysql ocf:mcg:MYSQLD_RA \
   op monitor interval="180" timeout="200"
primitive ndbd ocf:mcg:NDBD_RA \
   op monitor interval="120" timeout="120"
primitive pimd ocf:mcg:PIMD_RA \
   op monitor interval="60" role="Master" timeout="30"
on-fail="restart" \
   op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
ms ms_Rmgr Rmgr \
   meta master-max="1" master-max-node="1" clone-max="2"
clone-node-max="1" interleave="true" notify="true"
ms ms_Tmgr Tmgr \
   meta master-max="1" master-max-node="1" clone-max="2"
clone-node-max="1" interleave="true" notify="true"
ms ms_pimd pimd \
   meta master-max="1" master-max-node="1" clone-max="2"
clone-node-max="1" interleave="true" notify="true"
clone EMS_CLONE EMS \
   meta globally-unique="false" clone-max="2" clone-node-max="1"
target-role="Started"
clone mysqld_clone mysql \
   meta globally-unique="false" clone-max="2" clone-node-max="1"
clone ndbdclone ndbd \
   meta globally-unique="false" clone-max="2" clone-node-max="1"
target-role="Started"
colocation ip_with_Pimd inf: ClusterIP ms_pimd:Master
colocation ip_with_RM inf: ClusterIP ms_Rmgr:Master
colocation ip_with_TM inf: ClusterIP ms_Tmgr:Master
colocation ndb_vip-with-ndb_mgm inf: NDB_MGMT NDB_VIP
order RM-after-mysqld inf: mysqld_clone ms_Rmgr
order TM-after-RM inf: ms_Rmgr ms_Tmgr
order ip-after-pimd inf: ms_pimd ClusterIP
order mysqld-after-ndbd inf: ndbdclone mysqld_clone
order pimd-after-TM inf: ms_Tmgr ms_pimd
property $id="cib-bootstrap-options" \
   dc-version="1.0.11-55a5f5be61c367cbd676c2f0ec4f1c62b38223d7" \
   cluster-infrastructure="Heartbeat" \
   no-quorum-policy="ignore" \
   stonith-enabled="false"
rsc_defaults $id="rsc-options" \
   migration_threshold="3" \
   resource-stickiness="100"*

With both nodes up and running, if the heartbeat service is stopped on
either node, the following resources are restarted on the other node:
mysqld_clone, ms_Rmgr, ms_Tmgr, ms_pimd, ClusterIP

From the Heartbeat debug logs, it seems the policy engine is initiating a
restart operation for the above resources, but the reason for this is not
clear.

Following are some excerpts from the logs:

"*Feb 07 11:06:31 MCG1 pengine: [20534]: info: determine_online_status:
Node mcg2 is shutting down
Feb 07 11:06:31 MCG1 pengine: [20534]: info: determine_online_status: Node
mcg1 is online
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: clone_print:  Master/Slave
Set: ms_Rmgr
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
Rmgr:0 active on mcg1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
Rmgr:0 active on mcg1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active:
Resource Rmgr:1 active on mcg2
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
Rmgr:1 active on mcg2
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print:  Masters: [
mcg1 ]
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print:  Slaves: [
mcg2 ]
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: clone_print:  Master/Slave
Set: ms_Tmgr
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
Tmgr:0 active on mcg1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
Tmgr:0 active on mcg1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
Tmgr:1 active on mcg2
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
Tmgr:1 active on mcg2
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print:  Masters: [
mcg1 ]
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print:  Slaves: [
mcg2 ]
Feb 07 11:06:31 MCG1 pengin