Re: [Pacemaker] Pacemaker cluster with different operating systems

2013-03-06 Thread Osman Findik
Hi Andrew,
Yes, I added "pcmk" file under "/etc/corosync/service.d/" folder.
Since you think the setup should work, here are the details of the problem.
Could you please check for any problem? I hope I have given enough information 
about the setup and problem.

[root@pmidea1 ~]# more /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.2 (Santiago)
[root@pmidea1 ~]# rpm -qa|egrep "corosync|pacemaker"
pacemaker-libs-1.1.8-4.el6.x86_64
corosync-1.4.3-26.2.x86_64
pacemaker-cluster-libs-1.1.8-4.el6.x86_64
corosynclib-1.4.3-26.2.x86_64
pacemaker-cli-1.1.8-4.el6.x86_64
pacemaker-1.1.8-4.el6.x86_64
[root@pmidea1 ~]# more /etc/corosync/service.d/pcmk 
service {
  name: pacemaker
  ver: 1
}

[root@pmidea2 ~]# more /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.2 (Santiago)
[root@pmidea2 ~]# rpm -qa|egrep "corosync|pacemaker"
pacemaker-libs-1.1.8-4.el6.x86_64
corosync-1.4.3-26.2.x86_64
pacemaker-cluster-libs-1.1.8-4.el6.x86_64
corosynclib-1.4.3-26.2.x86_64
pacemaker-cli-1.1.8-4.el6.x86_64
pacemaker-1.1.8-4.el6.x86_64
[root@pmidea2 ~]# more /etc/corosync/service.d/pcmk 
service {
  name: pacemaker
  ver: 1
}

[root@pmosidea ~]# more /etc/redhat-release 
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
[root@pmosidea ~]# rpm -qa|egrep "corosync|pacemaker"
pacemaker-cluster-libs-1.1.8-2.el5
corosynclib-1.4.1-7.el5.1
pacemaker-1.1.8-2.el5
corosync-1.4.1-7.el5.1
pacemaker-cli-1.1.8-2.el5
pacemaker-libs-1.1.8-2.el5
[root@pmosidea ~]# more /etc/corosync/service.d/pcmk 
service {
  name: pacemaker
  ver: 1
}
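
(A note on the files above: with "ver: 1" the corosync plugin only announces
Pacemaker; pacemakerd must be started as a separate service on every node. A
minimal sketch, assuming the init scripts shipped with these packages use the
usual names:

  service corosync start
  service pacemaker start

If the pacemaker service is not running on a node, corosync will still report it
as a member while crm_mon shows it offline.)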

Corosync membership:

[root@pmidea1 ~]# corosync-objctl | grep member
totem.interface.member.memberaddr=10.34.38.46
totem.interface.member.memberaddr=10.34.38.47
totem.interface.member.memberaddr=10.34.38.48
runtime.totem.pg.mrp.srp.members.791028234.ip=r(0) ip(10.34.38.47) 
runtime.totem.pg.mrp.srp.members.791028234.join_count=1
runtime.totem.pg.mrp.srp.members.791028234.status=joined
runtime.totem.pg.mrp.srp.members.774251018.ip=r(0) ip(10.34.38.46) 
runtime.totem.pg.mrp.srp.members.774251018.join_count=1
runtime.totem.pg.mrp.srp.members.774251018.status=joined
runtime.totem.pg.mrp.srp.members.807805450.ip=r(0) ip(10.34.38.48) 
runtime.totem.pg.mrp.srp.members.807805450.join_count=1
runtime.totem.pg.mrp.srp.members.807805450.status=joined

[root@pmosidea ~]# corosync-objctl | grep member
totem.interface.member.memberaddr=10.34.38.46
totem.interface.member.memberaddr=10.34.38.47
totem.interface.member.memberaddr=10.34.38.48
runtime.totem.pg.mrp.srp.members.774251018.ip=r(0) ip(10.34.38.46) 
runtime.totem.pg.mrp.srp.members.774251018.join_count=1
runtime.totem.pg.mrp.srp.members.774251018.status=joined
runtime.totem.pg.mrp.srp.members.791028234.ip=r(0) ip(10.34.38.47) 
runtime.totem.pg.mrp.srp.members.791028234.join_count=1
runtime.totem.pg.mrp.srp.members.791028234.status=joined
runtime.totem.pg.mrp.srp.members.807805450.ip=r(0) ip(10.34.38.48) 
runtime.totem.pg.mrp.srp.members.807805450.join_count=1
runtime.totem.pg.mrp.srp.members.807805450.status=joined
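
(A quick way to compare the two layers' views on any node, using the command
already shown above plus crm_node from the Pacemaker command-line tools:

  corosync-objctl | grep members    # membership as corosync sees it
  crm_node -l                       # membership as pacemaker sees it
)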

crm_mon outputs:

From pmidea1:
-
Last updated: Thu Mar  7 08:53:25 2013
Last change: Thu Mar  7 02:47:51 2013 via crmd on pmidea2
Stack: openais
Current DC: pmidea2 - partition with quorum
Version: 1.1.8-4.el6-394e906
3 Nodes configured, 2 expected votes
0 Resources configured.


Online: [ pmidea1 pmidea2 ]
OFFLINE: [ pmosidea ]

From pmosidea:
--
Last updated: Thu Mar  7 08:53:21 2013
Last change: Thu Mar  7 03:16:50 2013 via crmd on pmosidea
Stack: openais
Current DC: pmosidea - partition WITHOUT quorum
Version: 1.1.8-2.el5-394e906
3 Nodes configured, 2 expected votes
0 Resources configured.


Node pmidea1: pending
Node pmidea2: pending
Online: [ pmosidea ]





-Original Message-
From: pacemaker-requ...@oss.clusterlabs.org 
[mailto:pacemaker-requ...@oss.clusterlabs.org] 
Sent: Thursday, 07 March 2013 08:02
To: pacemaker@oss.clusterlabs.org
Subject: Pacemaker Digest, Vol 64, Issue 30

--

Message: 2
Date: Thu, 7 Mar 2013 16:36:17 +1100
From: Andrew Beekhof 
To: The Pacemaker cluster resource manager

Subject: Re: [Pacemaker] Pacemaker cluster with different operating
systems
Message-ID:

Content-Type: text/plain; charset=ISO-8859-1

On Thu, Mar 7, 2013 at 4:09 PM, Osman Findik  wrote:
> Hi all,
> We are using pacemaker with RHEL 6.2 successfully to manage pair of 
> MySQL databases. Pacemaker is coming from Red Hat High Availability Add-on. 
> Its version is 1.1.6 Our need is to add an observer to this cluster but our 
> existing servers are all RHEL 5.x servers. We could not locate same version 
> of pacemaker in clusterlabs repo.
> So we tried to install provided rpms from clusterlabs repo to RHEL 5.5 and 
> RHEL 6.2 servers.
> Provided rpm version for RHEL 5 is pacemaker 1.1.8.2.
> Provided rpm version for RHEL 6 is pacemaker 1.1.8.4.
>
> In this setup although servers are members of th

Re: [Pacemaker] [Question]About "sequential" designation of resource_set.

2013-03-06 Thread renayama19661014
Hi Andrew,

> > You use the resource sets _instead_ of a group.
> > If you want group.ordered=false, then use a colocation set (with
> > sequential=true).

In "colocation", I used "resource_set".
However, a result did not include the change.

Will this result be a mistake of my setting?

Case 1) sequential=false
(snip)

  

  
  

  

(snip)
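
(The constraint XML above did not survive the archive. Based on the description,
it was presumably a colocation constraint of roughly this shape, with only the
sequential flag differing between Case 1 and Case 2; the ids are guesses:

  <rsc_colocation id="test-colocation" score="INFINITY">
    <resource_set id="test-colocation-resource_set" sequential="false">
      <resource_ref id="vip-master"/>
      <resource_ref id="vip-rep"/>
    </resource_set>
  </rsc_colocation>
)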
[root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
Mar  8 00:20:52 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 2: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar  8 00:20:52 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 3: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local)
Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
Mar  8 00:20:55 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 6: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar  8 00:20:56 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 3: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar  8 00:20:56 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 5: start vip-master_start_0 on rh63-heartbeat1
Mar  8 00:20:58 rh63-heartbeat2 crmd: [22372]: info: te_rsc_command: Initiating 
action 1: stop vip-master_stop_0 on rh63-heartbeat1


Case 2) sequential=true
(snip)

  

  
  

  

(snip)
[root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
Mar  7 23:54:44 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 2: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar  7 23:54:44 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 3: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local)
Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
Mar  7 23:54:48 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 6: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar  7 23:54:49 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 3: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar  7 23:54:49 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 5: start vip-master_start_0 on rh63-heartbeat1
Mar  7 23:54:51 rh63-heartbeat2 crmd: [4]: info: te_rsc_command: Initiating 
action 1: stop vip-master_stop_0 on rh63-heartbeat1


Best Regards,
Hideo Yamauchi.


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [RFC] Automatic nodelist synchronization between corosync and pacemaker

2013-03-06 Thread Vladislav Bogdanov
07.03.2013 03:37, Andrew Beekhof wrote:
> On Thu, Mar 7, 2013 at 2:41 AM, Vladislav Bogdanov  
> wrote:
>> 06.03.2013 08:35, Andrew Beekhof wrote:
> 
>>> So basically, you want to be able to add/remove nodes from nodelist.*
>>> in corosync.conf and have pacemaker automatically add/remove them from
>>> itself?
>>
>> Not corosync.conf, but cmap which is initially (partially) filled with
>> values from corosync.conf.
>>
>>>
>>> If corosync.conf gets out of sync (admin error or maybe a node was
>>> down when you updated last) they might well get added back - I assume
>>> you're ok with that?
>>> Because there's no real way to know the difference between "added
>>> back" and "not removed from last time".
>>
>> Sorry, can you please reword?
> 
> When node-A comes up with "node-X" that no-one else has, the cluster
> has no way to know if node-X was just added, or if the admin forgot to
> remove it on node-A.

Exactly. That is not a problem if a node does not appear in the CIB until it has
been seen online. If node-A has just come up, then it has just booted, which means
it has not seen node-X online yet (assuming node-X is not actually online, of
course), and so node-X is not added to the CIB.
> 
>>> Or are you planning to never update the on-disk corosync.conf and only
>>> modify the in-memory nodelist?
>>
>> That depends on the actual use case I think.
>>
>> Hm. Interesting, how corosync behave when new dynamic nodes are added to
>> cluster... I mean following: we have static corosync.conf with nodelist
>> containing f.e. 3 entries, then we add fourth entry via cmap and boot
>> fourth node. What should be in corosync.conf of that node?
> 
> I don't know actually.  Try it and see if it works without the local
> node being defined?
> 
>> I believe it
>> won't work without that, i.e. without _its own_ fourth entry. Ugh. If so, then the
>> fully dynamic "elastic" cluster I was dreaming of is still not possible
>> out-of-the-box when using a dynamic nodelist.
>>
>> The only way to have this I see is to have static nodelist in
>> corosync.conf with all possible nodes predefined. And never edit it in
>> cmap. So, my original point
>> * Remove nodes from CIB when they are removed from a nodelist.
>> does not fit.
>>
>> By elastic I mean what was discussed on corosync list when Fabio started
>> with votequorum design and what then appeared in votequorum manpage:
>> ===
>> allow_downscale: 1
>>
>> Enables allow downscale (AD) feature (default: 0).
>>
>> The general behaviour of votequorum is to never decrease expected votes
>> or quorum.
>>
>> When  AD  is  enabled,  both expected votes and quorum are recalculated
>> when a node leaves the cluster in a clean state (normal corosync  shut-
>> down process) down to configured expected_votes.
> 
> But thats very different to removing the node completely.
> You still want to know its in a sane state.

Isn't it enough to trust corosync here?
That is, provided it supplies some event saying "node X left the cluster in a
clean state and we lowered expected_votes and quorum".

A clean corosync shutdown means either 'no more corosync clients
remain and it was safe to shut down' or 'corosync has a bug'.
Pacemaker is a corosync client, and corosync should not stop in a clean
state if Pacemaker is still running there.

And 'pacemaker is not running on node X' means that the Pacemaker instances
on the other nodes have accepted that. Otherwise the node is scheduled for
STONITH and there is no 'clean' shutdown.

Am I correct here?

> 
>> Example use case:
>>
>> 1) N node cluster (where N is any value higher than 3)
>> 2) expected_votes set to 3 in corosync.conf
>> 3) only 3 nodes are running
>> 4) admin requires to increase processing power and adds 10 nodes
>> 5) internal expected_votes is automatically set to 13
>> 6) minimum expected_votes is 3 (from configuration)
>> - up to this point this is standard votequorum behavior -
>> 7) once the work is done, admin wants to remove nodes from the cluster
>> 8) using an ordered shutdown the admin can reduce the cluster size
>>automatically back to 3, but not below 3, where normal quorum
>>operation will work as usual.
>> =
>>
>> What I would expect from pacemaker, is to automatically remove nodes
>> down to 3 at step 8 (just follow quorum) if AD is enabled AND pacemaker
>> is instructed to follow that (with some other cmap switch). And also to
>> reduce number of allocated clone instances. Sure, all nodes must have
>> equal number of votes (1).
>>
>> Is it ok for you?
> 
> Not really.
> We simply don't have enough information to do the removal.
> All we get is "node gone", we have to do a fair bit of work to
> calculate if it was clean at the time or not (and clean to corosync
> doesn't always imply clean to pacemaker).

Please see above.
There is always (at least with the mcp model) some time frame between the
pacemaker-stop and corosync-stop events. And pacemaker should accept the
"node leave" after the first one (doesn't it mark the node as 'pending' in that
state?). And the second event (corosync stop) 

Re: [Pacemaker] [Question]About "sequential" designation of resource_set.

2013-03-06 Thread renayama19661014
Hi Andrew,

Thank you for the comment.

It was the colocation constraint.
I will make the modifications and confirm the behaviour.

Many Thanks!
Hideo Yamauchi.

--- On Thu, 2013/3/7, Andrew Beekhof  wrote:

> Oh!
> 
> You use the resource sets _instead_ of a group.
> If you want group.ordered=false, then use a colocation set (with
> sequential=true).
> If you want group.colocated=false, then use an ordering set (with
> sequential=true).
> 
> Hope that helps :)
> 
> On Thu, Mar 7, 2013 at 3:16 PM,   wrote:
> > Hi Andrew,
> >
> > Thank you for comments.
> >
> >> > Case 3) group resource_set sequential=false
> >> >  * Start of vip-rep waits for start of vip-master and is published.
> >> >  * I expected a result same as the first case.
> >>
> >> Me too. Have you got the relevant PE file?
> >
> > I attached the thing which just collected hb_report.
> >
> > Best Regards,
> > Hideo Yamauchi.
> >
> >
> >
> > --- On Thu, 2013/3/7, Andrew Beekhof  wrote:
> >
> >> On Thu, Mar 7, 2013 at 1:27 PM,   wrote:
> >> > Hi Andrew,
> >> >
> >> > I tried "resource_set  sequential" designation.
> >> >  *  http://www.gossamer-threads.com/lists/linuxha/pacemaker/84578
> >> >
> >> > I caused an error in start of the vip-master resource and confirmed 
> >> > movement.
> >> >
> >> > (snip)
> >> >       
> >> >          >> >type="Dummy2">
> >> >           
> >> >              >> >on-fail="restart" timeout="60s"/>
> >> >              >> >name="monitor" on-fail="restart" timeout="60s"/>
> >> >              >> >on-fail="block" timeout="60s"/>
> >> >           
> >> >         
> >> >          >> >type="Dummy">
> >> >           
> >> >              >> >on-fail="stop" timeout="60s"/>
> >> >              >> >on-fail="restart" timeout="60s"/>
> >> >              >> >on-fail="block" timeout="60s"/>
> >> >           
> >> >         
> >> >       
> >> > (snip)
> >> >
> >> > By the ordered designation of the group resource, the difference that I 
> >> > expected appeared.( Case 1 and Case 2)
> >> > However, by the "sequential" designation, the difference that I expected 
> >> > did not appear.(Case 3 and Case 4)
> >> >
> >> > (snip)
> >> >     
> >> >         
> >> >                  >> >id="test-order-resource_set">  ---> or "false"
> >> >                         
> >> >                         
> >> >                 
> >> >         
> >> >     
> >> > (snip)
> >> >
> >> >
> >> > Case 1) group meta_attribute ordered=false
> >> >  * Start of vip-rep is published without waiting for start of vip-master.
> >> >
> >> > [root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
> >> > Mar  7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> >> > Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - 
> >> > no waiting
> >> > Mar  7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> >> > Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 
> >> > (local) - no waiting
> >> > Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> >> > Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
> >> > Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> >> > Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 
> >> > (local)
> >> > Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> >> > Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
> >> > Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> >> > Initiating action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
> >> > Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> >> > Initiating action 6: probe_complete probe_complete on rh63-heartbeat2 
> >> > (local) - no waiting
> >> > Mar  7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> >> > Initiating action 3: probe_complete probe_complete on rh63-heartbeat1 - 
> >> > no waiting
> >> > Mar  7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> >> > Initiating action 5: start vip-master_start_0 on rh63-heartbeat1
> >> > Mar  7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> >> > Initiating action 7: start vip-rep_start_0 on rh63-heartbeat1
> >> > Mar  7 19:41:26 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> >> > Initiating action 8: monitor vip-rep_monitor_1 on rh63-heartbeat1
> >> > Mar  7 19:41:27 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> >> > Initiating action 2: stop vip-master_stop_0 on rh63-heartbeat1
> >> > Mar  7 19:41:28 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> >> > Initiating action 6: stop vip-rep_stop_0 on rh63-heartbeat1
> >> >
> >> >
> >> > Case 2) group meta_attribute ordered=true
> >> >  * Start of vip-rep waits for start of vip-master and is published.
> >> >
> >> > [root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
> >> > Mar  7 19:34:37 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: 
> >> > Initiating action 2: probe_complete probe_co

Re: [Pacemaker] [Question]About "sequential" designation of resource_set.

2013-03-06 Thread Andrew Beekhof
Oh!

You use the resource sets _instead_ of a group.
If you want group.ordered=false, then use a colocation set (with
sequential=true).
If you want group.colocated=false, then use an ordering set (with
sequential=true).

Hope that helps :)
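
A rough sketch of the colocation-set form in CIB XML, using the resource names
from this thread (the ids and score are illustrative, not taken from Hideo's
actual configuration):

  <rsc_colocation id="col-vip" score="INFINITY">
    <resource_set id="col-vip-set" sequential="true">
      <resource_ref id="vip-master"/>
      <resource_ref id="vip-rep"/>
    </resource_set>
  </rsc_colocation>

This reproduces the colocation half of a group's behaviour without imposing a
start order; an ordering set is the counterpart when only the order matters.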

On Thu, Mar 7, 2013 at 3:16 PM,   wrote:
> Hi Andrew,
>
> Thank you for comments.
>
>> > Case 3) group resource_set sequential=false
>> >  * Start of vip-rep waits for start of vip-master and is published.
>> >  * I expected a result same as the first case.
>>
>> Me too. Have you got the relevant PE file?
>
> I attached the thing which just collected hb_report.
>
> Best Regards,
> Hideo Yamauchi.
>
>
>
> --- On Thu, 2013/3/7, Andrew Beekhof  wrote:
>
>> On Thu, Mar 7, 2013 at 1:27 PM,   wrote:
>> > Hi Andrew,
>> >
>> > I tried "resource_set  sequential" designation.
>> >  *  http://www.gossamer-threads.com/lists/linuxha/pacemaker/84578
>> >
>> > I caused an error in start of the vip-master resource and confirmed 
>> > movement.
>> >
>> > (snip)
>> >   
>> > > > type="Dummy2">
>> >   
>> > > > on-fail="restart" timeout="60s"/>
>> > > > on-fail="restart" timeout="60s"/>
>> > > > on-fail="block" timeout="60s"/>
>> >   
>> > 
>> > > > type="Dummy">
>> >   
>> > > > on-fail="stop" timeout="60s"/>
>> > > > on-fail="restart" timeout="60s"/>
>> > > > on-fail="block" timeout="60s"/>
>> >   
>> > 
>> >   
>> > (snip)
>> >
>> > By the ordered designation of the group resource, the difference that I 
>> > expected appeared.( Case 1 and Case 2)
>> > However, by the "sequential" designation, the difference that I expected 
>> > did not appear.(Case 3 and Case 4)
>> >
>> > (snip)
>> > 
>> > 
>> > > > id="test-order-resource_set">  ---> or "false"
>> > 
>> > 
>> > 
>> > 
>> > 
>> > (snip)
>> >
>> >
>> > Case 1) group meta_attribute ordered=false
>> >  * Start of vip-rep is published without waiting for start of vip-master.
>> >
>> > [root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
>> > Mar  7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
>> > Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no 
>> > waiting
>> > Mar  7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
>> > Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 
>> > (local) - no waiting
>> > Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
>> > Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
>> > Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
>> > Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 
>> > (local)
>> > Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
>> > Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
>> > Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
>> > Initiating action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
>> > Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
>> > Initiating action 6: probe_complete probe_complete on rh63-heartbeat2 
>> > (local) - no waiting
>> > Mar  7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
>> > Initiating action 3: probe_complete probe_complete on rh63-heartbeat1 - no 
>> > waiting
>> > Mar  7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
>> > Initiating action 5: start vip-master_start_0 on rh63-heartbeat1
>> > Mar  7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
>> > Initiating action 7: start vip-rep_start_0 on rh63-heartbeat1
>> > Mar  7 19:41:26 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
>> > Initiating action 8: monitor vip-rep_monitor_1 on rh63-heartbeat1
>> > Mar  7 19:41:27 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
>> > Initiating action 2: stop vip-master_stop_0 on rh63-heartbeat1
>> > Mar  7 19:41:28 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
>> > Initiating action 6: stop vip-rep_stop_0 on rh63-heartbeat1
>> >
>> >
>> > Case 2) group meta_attribute ordered=true
>> >  * Start of vip-rep waits for start of vip-master and is published.
>> >
>> > [root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
>> > Mar  7 19:34:37 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: 
>> > Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no 
>> > waiting
>> > Mar  7 19:34:37 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: 
>> > Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 
>> > (local) - no waiting
>> > Mar  7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: 
>> > Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
>> > Mar  7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc

Re: [Pacemaker] Pacemaker cluster with different operating systems

2013-03-06 Thread Andrew Beekhof
On Thu, Mar 7, 2013 at 4:09 PM, Osman Findik  wrote:
> Hi all,
> We are using pacemaker with RHEL 6.2 successfully to manage pair of MySQL 
> databases. Pacemaker is coming from Red Hat High Availability Add-on. Its 
> version is 1.1.6
> Our need is to add an observer to this cluster but our existing servers are 
> all RHEL 5.x servers. We could not locate same version of pacemaker in 
> clusterlabs repo.
> So we tried to install provided rpms from clusterlabs repo to RHEL 5.5 and 
> RHEL 6.2 servers.
> Provided rpm version for RHEL 5 is pacemaker 1.1.8.2.
> Provided rpm version for RHEL 6 is pacemaker 1.1.8.4.
>
> In this setup although servers are members of the cluster over corosync, they 
> could not see each other from pacemaker.
> I also tried to install 1.1.8.1 rpm packages in order to use same pacemaker 
> releases, but that is also failed.
>
> Before giving details of the errors, my question is do you think a hybrid 
> setup with different OSs (RHEL 5.x and RHEL 6.x) is possible?

It should be.
You're using the pacemaker plugin for corosync?

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Pacemaker cluster with different operating systems

2013-03-06 Thread Osman Findik
Hi all,
We are successfully using Pacemaker on RHEL 6.2 to manage a pair of MySQL
databases. Pacemaker comes from the Red Hat High Availability Add-On; its
version is 1.1.6.
We need to add an observer to this cluster, but all of our existing servers are
RHEL 5.x servers. We could not locate the same version of Pacemaker in the
clusterlabs repo.
So we tried to install the RPMs provided in the clusterlabs repo on the RHEL 5.5
and RHEL 6.2 servers.
The RPM version provided for RHEL 5 is pacemaker 1.1.8-2.
The RPM version provided for RHEL 6 is pacemaker 1.1.8-4.

In this setup, although the servers are members of the cluster at the corosync
level, they cannot see each other from Pacemaker.
I also tried installing the 1.1.8-1 RPM packages in order to run the same
Pacemaker release everywhere, but that also failed.

Before giving details of the errors, my question is: do you think a hybrid setup
with different OSes (RHEL 5.x and RHEL 6.x) is possible?

Thanks,
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Question]About "sequential" designation of resource_set.

2013-03-06 Thread Andrew Beekhof
On Thu, Mar 7, 2013 at 1:27 PM,   wrote:
> Hi Andrew,
>
> I tried "resource_set  sequential" designation.
>  *  http://www.gossamer-threads.com/lists/linuxha/pacemaker/84578
>
> I caused an error in start of the vip-master resource and confirmed movement.
>
> (snip)
>   
>  type="Dummy2">
>   
>  on-fail="restart" timeout="60s"/>
>  on-fail="restart" timeout="60s"/>
>  on-fail="block" timeout="60s"/>
>   
> 
> 
>   
>  on-fail="stop" timeout="60s"/>
>  on-fail="restart" timeout="60s"/>
>  on-fail="block" timeout="60s"/>
>   
> 
>   
> (snip)
>
> By the ordered designation of the group resource, the difference that I 
> expected appeared.( Case 1 and Case 2)
> However, by the "sequential" designation, the difference that I expected did 
> not appear.(Case 3 and Case 4)
>
> (snip)
> 
> 
>  
>  ---> or "false"
> 
> 
> 
> 
> 
> (snip)
>
>
> Case 1) group meta_attribute ordered=false
>  * Start of vip-rep is published without waiting for start of vip-master.
>
> [root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
> Mar  7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no 
> waiting
> Mar  7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 (local) 
> - no waiting
> Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
> Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local)
> Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
> Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> Initiating action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
> Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> Initiating action 6: probe_complete probe_complete on rh63-heartbeat2 (local) 
> - no waiting
> Mar  7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> Initiating action 3: probe_complete probe_complete on rh63-heartbeat1 - no 
> waiting
> Mar  7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> Initiating action 5: start vip-master_start_0 on rh63-heartbeat1
> Mar  7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> Initiating action 7: start vip-rep_start_0 on rh63-heartbeat1
> Mar  7 19:41:26 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> Initiating action 8: monitor vip-rep_monitor_1 on rh63-heartbeat1
> Mar  7 19:41:27 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> Initiating action 2: stop vip-master_stop_0 on rh63-heartbeat1
> Mar  7 19:41:28 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: 
> Initiating action 6: stop vip-rep_stop_0 on rh63-heartbeat1
>
>
> Case 2) group meta_attribute ordered=true
>  * Start of vip-rep waits for start of vip-master and is published.
>
> [root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
> Mar  7 19:34:37 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: 
> Initiating action 2: probe_complete probe_complete on rh63-heartbeat1 - no 
> waiting
> Mar  7 19:34:37 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: 
> Initiating action 3: probe_complete probe_complete on rh63-heartbeat2 (local) 
> - no waiting
> Mar  7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: 
> Initiating action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
> Mar  7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: 
> Initiating action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local)
> Mar  7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: 
> Initiating action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
> Mar  7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: 
> Initiating action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
> Mar  7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: 
> Initiating action 6: probe_complete probe_complete on rh63-heartbeat2 (local) 
> - no waiting
> Mar  7 19:35:43 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: 
> Initiating action 3: probe_complete probe_complete on rh63-heartbeat1 - no 
> waiting
> Mar  7 19:35:43 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: 
> Initiating action 5: start vip-master_start_0 on rh63-heartbeat1
> Mar  7 19:35:45 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: 
> Initiating action 1: stop vip-master_stop_0 on rh63-heartbeat1
>
>
> Case 3) group resource_set sequential=false
>  * Start of vip-rep waits

[Pacemaker] [Question]About "sequential" designation of resource_set.

2013-03-06 Thread renayama19661014
Hi Andrew,

I tried "resource_set  sequential" designation.
 *  http://www.gossamer-threads.com/lists/linuxha/pacemaker/84578

I caused an error in start of the vip-master resource and confirmed movement.

(snip)
  

  



  


  



  

  
(snip)

With the "ordered" setting on the group resource, the difference that I
expected appeared (Case 1 and Case 2).
However, with the "sequential" setting, the difference that I expected did
not appear (Case 3 and Case 4).

(snip)


  
---> or "false"





(snip)
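
(The constraint XML above was stripped by the archive. Judging from the id
"test-order-resource_set" that survives in a quoted fragment of this thread and
the crm shell rendering "order test-order : _rsc_set_ ( vip-master vip-rep )"
seen later in this digest, it was roughly:

  <rsc_order id="test-order">
    <resource_set id="test-order-resource_set" sequential="false"> <!-- or "true" -->
      <resource_ref id="vip-master"/>
      <resource_ref id="vip-rep"/>
    </resource_set>
  </rsc_order>
)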


Case 1) group meta_attribute ordered=false
 * The start of vip-rep is issued without waiting for the start of vip-master.

[root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
Mar  7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 2: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar  7 19:40:50 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 3: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local)
Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
Mar  7 19:41:24 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 6: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar  7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 3: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar  7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 5: start vip-master_start_0 on rh63-heartbeat1
Mar  7 19:41:25 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 7: start vip-rep_start_0 on rh63-heartbeat1 
Mar  7 19:41:26 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 8: monitor vip-rep_monitor_1 on rh63-heartbeat1
Mar  7 19:41:27 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 2: stop vip-master_stop_0 on rh63-heartbeat1
Mar  7 19:41:28 rh63-heartbeat2 crmd: [18992]: info: te_rsc_command: Initiating 
action 6: stop vip-rep_stop_0 on rh63-heartbeat1


Case 2) group meta_attribute ordered=true
 * The start of vip-rep is issued only after the start of vip-master.

[root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
Mar  7 19:34:37 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating 
action 2: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar  7 19:34:37 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating 
action 3: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar  7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating 
action 4: monitor vip-master_monitor_0 on rh63-heartbeat1
Mar  7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating 
action 7: monitor vip-master_monitor_0 on rh63-heartbeat2 (local)
Mar  7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating 
action 5: monitor vip-rep_monitor_0 on rh63-heartbeat1
Mar  7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating 
action 8: monitor vip-rep_monitor_0 on rh63-heartbeat2 (local)
Mar  7 19:35:42 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating 
action 6: probe_complete probe_complete on rh63-heartbeat2 (local) - no waiting
Mar  7 19:35:43 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating 
action 3: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar  7 19:35:43 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating 
action 5: start vip-master_start_0 on rh63-heartbeat1
Mar  7 19:35:45 rh63-heartbeat2 crmd: [18865]: info: te_rsc_command: Initiating 
action 1: stop vip-master_stop_0 on rh63-heartbeat1


Case 3) group resource_set sequential=false
 * The start of vip-rep is issued only after the start of vip-master.
 * I expected the same result as in the first case.

[root@rh63-heartbeat2 ~]# grep "Initiating action" /var/log/ha-log
Mar  7 19:43:50 rh63-heartbeat2 crmd: [19113]: info: te_rsc_command: Initiating 
action 2: probe_complete probe_complete on rh63-heartbeat1 - no waiting
Mar  7 19:43:50 rh63-heartbeat2 crmd: [19113]: info: te_rsc_command: Initiating 
action 3: probe_complete probe_complete on rh63-heartbeat2 (local) - no waitin

Re: [Pacemaker] The correction request of the log of booth

2013-03-06 Thread yusuke iida
Hi, Jiaju

2013/3/6 Jiaju Zhang :
> On Wed, 2013-03-06 at 15:13 +0900, yusuke iida wrote:
>> Hi, Jiaju
>>
>> There is a request about the log of booth.
>>
>> I want you to change a log level when a ticket expires into "info" from 
>> "debug".
>>
>> I think that this log is important since it means what occurred.
>>
>> And I want you to add the following information to log.
>>  * Which ticket is it?
>>  * Who had a ticket?
>>
>> For example, I want you to use the following forms.
>> info: lease expires ... owner [0] ticket [ticketA]
>
> Sounds great, will improve that;)
Thank you for accepting.

Many thanks!
Yusuke

>
> Thanks,
> Jiaju
>



--

METRO SYSTEMS CO., LTD

Yusuke Iida
Mail: yusk.i...@gmail.com


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem][crmsh]The designation of the 'ordered' attribute becomes the error.

2013-03-06 Thread renayama19661014
Hi Dejan,

The problem was resolved with your patch.

However, I have a question.
I want to use the "resource_set" that Andrew proposed, but I do not understand
how to use it with the crm shell.

I loaded the following two cib.xml variants and checked them with the crm shell.

Case 1) sequential="false". 
(snip)








(snip)
 * When I view it with the crm shell ...
(snip)
group master-group vip-master vip-rep
order test-order : _rsc_set_ ( vip-master vip-rep )
(snip)

Case 2) sequential="true"
(snip)

  

  
  

  

(snip)
 * When I view it with the crm shell ...
(snip)
   group master-group vip-master vip-rep
   xml  \
 \
 \
 \
 \

(snip)
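
(For comparison, a sketch of how the two variants would presumably be written in
crm shell text syntax, inferred from the Case 1 rendering above where parentheses
mark a sequential="false" set; whether this round-trips in the crmsh shipped with
Pacemaker 1.0.13 is exactly the open question below:

  order test-order : vip-master vip-rep       # plain list ~ sequential="true"
  order test-order : ( vip-master vip-rep )   # parenthesised ~ sequential="false"
)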

Does the "sequential=true" setting have to be described in XML?
Is there a proper way to specify an attribute of a "resource_set" with the crm
shell?
Or is "resource_set" perhaps not usable with the crm shell of Pacemaker 1.0.13?

Best Regards,
Hideo Yamauchi.

--- On Thu, 2013/3/7, renayama19661...@ybb.ne.jp  
wrote:

> Hi Dejan,
> Hi Andrew,
> 
> Thank you for comment.
> I confirm the movement of the patch and report it.
> 
> Best Regards,
> Hideo Yamauchi.
> 
> --- On Wed, 2013/3/6, Dejan Muhamedagic  wrote:
> 
> > Hi Hideo-san,
> > 
> > On Wed, Mar 06, 2013 at 10:37:44AM +0900, renayama19661...@ybb.ne.jp wrote:
> > > Hi Dejan,
> > > Hi Andrew,
> > > 
> > > As for the crm shell, the check of the meta attribute was revised with 
> > > the next patch.
> > > 
> > >  * http://hg.savannah.gnu.org/hgweb/crmsh/rev/d1174f42f4b3
> > > 
> > > This patch was backported in Pacemaker1.0.13.
> > > 
> > >  * 
> > >https://github.com/ClusterLabs/pacemaker-1.0/commit/fa1a99ab36e0ed015f1bcbbb28f7db962a9d1abc#shell/modules/cibconfig.py
> > > 
> > > However, the ordered,colocated attribute of the group resource is treated 
> > > as an error when I use crm Shell which adopted this patch.
> > > 
> > > --
> > > (snip)
> > > ### Group Configuration ###
> > > group master-group \
> > >         vip-master \
> > >         vip-rep \
> > >         meta \
> > >                 ordered="false"
> > > (snip)
> > > 
> > > [root@rh63-heartbeat1 ~]# crm configure load update test2339.crm 
> > > INFO: building help index
> > > crm_verify[20028]: 2013/03/06_17:57:18 WARN: unpack_nodes: Blind faith: 
> > > not fencing unseen nodes
> > > WARNING: vip-master: specified timeout 60s for start is smaller than the 
> > > advised 90
> > > WARNING: vip-master: specified timeout 60s for stop is smaller than the 
> > > advised 100
> > > WARNING: vip-rep: specified timeout 60s for start is smaller than the 
> > > advised 90
> > > WARNING: vip-rep: specified timeout 60s for stop is smaller than the 
> > > advised 100
> > > ERROR: master-group: attribute ordered does not exist  -> WHY?
> > > Do you still want to commit? y
> > > --
> > > 
> > > If it chooses `yes` by a confirmation message, it is reflected, but it is 
> > > a problem that error message is displayed.
> > >  * The error occurs in the same way when I appoint colocated attribute.
> > > And I noticed that there was not explanation of ordered,colocated of 
> > > the group resource in online help of Pacemaker.
> > > 
> > > I think that the designation of the ordered,colocated attribute should 
> > > not become the error in group resource.
> > > In addition, I think that ordered,colocated should be added to online 
> > > help.
> > 
> > These attributes are not listed in crmsh. Does the attached patch
> > help?
> > 
> > Thanks,
> > 
> > Dejan
> > > 
> > > Best Regards,
> > > Hideo Yamauchi.
> > > 
> > > 
> > > ___
> > > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > > 
> > > Project Home: http://www.clusterlabs.org
> > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > Bugs: http://bugs.clusterlabs.org
> > 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [RFC] Automatic nodelist synchronization between corosync and pacemaker

2013-03-06 Thread Andrew Beekhof
On Thu, Mar 7, 2013 at 2:41 AM, Vladislav Bogdanov  wrote:
> 06.03.2013 08:35, Andrew Beekhof wrote:

>> So basically, you want to be able to add/remove nodes from nodelist.*
>> in corosync.conf and have pacemaker automatically add/remove them from
>> itself?
>
> Not corosync.conf, but cmap which is initially (partially) filled with
> values from corosync.conf.
>
>>
>> If corosync.conf gets out of sync (admin error or maybe a node was
>> down when you updated last) they might well get added back - I assume
>> you're ok with that?
>> Because there's no real way to know the difference between "added
>> back" and "not removed from last time".
>
> Sorry, can you please reword?

When node-A comes up with "node-X" that no-one else has, the cluster
has no way to know if node-X was just added, or if the admin forgot to
remove it on node-A.

>> Or are you planning to never update the on-disk corosync.conf and only
>> modify the in-memory nodelist?
>
> That depends on the actual use case I think.
>
> Hm. Interesting, how corosync behave when new dynamic nodes are added to
> cluster... I mean following: we have static corosync.conf with nodelist
> containing f.e. 3 entries, then we add fourth entry via cmap and boot
> fourth node. What should be in corosync.conf of that node?

I don't know actually.  Try it and see if it works without the local
node being defined?
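
(For anyone trying that experiment, a sketch of how a fourth node entry is
typically injected into the runtime cmap; the index, nodeid and address are made
up, and the key layout follows corosync 2.x's nodelist schema:

  corosync-cmapctl -s nodelist.node.3.nodeid u32 4
  corosync-cmapctl -s nodelist.node.3.ring0_addr str 10.0.0.4
)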

> I believe it
> won't work without that, i.e. without _its own_ fourth entry. Ugh. If so, then the
> fully dynamic "elastic" cluster I was dreaming of is still not possible
> out-of-the-box when using a dynamic nodelist.
>
> The only way to have this I see is to have static nodelist in
> corosync.conf with all possible nodes predefined. And never edit it in
> cmap. So, my original point
> * Remove nodes from CIB when they are removed from a nodelist.
> does not fit.
>
> By elastic I mean what was discussed on corosync list when Fabio started
> with votequorum design and what then appeared in votequorum manpage:
> ===
> allow_downscale: 1
>
> Enables allow downscale (AD) feature (default: 0).
>
> The general behaviour of votequorum is to never decrease expected votes
> or quorum.
>
> When  AD  is  enabled,  both expected votes and quorum are recalculated
> when a node leaves the cluster in a clean state (normal corosync  shut-
> down process) down to configured expected_votes.

But thats very different to removing the node completely.
You still want to know its in a sane state.

> Example use case:
>
> 1) N node cluster (where N is any value higher than 3)
> 2) expected_votes set to 3 in corosync.conf
> 3) only 3 nodes are running
> 4) admin requires to increase processing power and adds 10 nodes
> 5) internal expected_votes is automatically set to 13
> 6) minimum expected_votes is 3 (from configuration)
> - up to this point this is standard votequorum behavior -
> 7) once the work is done, admin wants to remove nodes from the cluster
> 8) using an ordered shutdown the admin can reduce the cluster size
>automatically back to 3, but not below 3, where normal quorum
>operation will work as usual.
> =
>
> What I would expect from pacemaker, is to automatically remove nodes
> down to 3 at step 8 (just follow quorum) if AD is enabled AND pacemaker
> is instructed to follow that (with some other cmap switch). And also to
> reduce number of allocated clone instances. Sure, all nodes must have
> equal number of votes (1).
>
> Is it ok for you?

Not really.
We simply don't have enough information to do the removal.
All we get is "node gone", we have to do a fair bit of work to
calculate if it was clean at the time or not (and clean to corosync
doesn't always imply clean to pacemaker).

So back to the start, why do you need pacemaker to forget about the
other 10 nodes?
(because everything apart from that should already work).
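
(For context, the manual way to make Pacemaker forget a departed node is roughly
the following; the node name is hypothetical and the exact options vary between
1.1.x releases:

  crm_node -R old-node-3 --force
)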

>
>>
>>>

> That
> would be OK if number of clone instances does not raise with that...

 Why?  If clone-node-max=1, then you'll never have more than the number
 of active nodes - even if clone-max is greater.
>>>
>>> Active (online) or known (existing in a  section)?
>>> I've seen that as soon as node appears in  even in offline state,
>>> new clone instance is allocated.
>>
>> $num_known instances will "exist", but only $num_active will be running.
>
> Yep, that's what I say. I see them in crm_mon or 'crm status' and they
> make my life harder ;)
> That remaining instances are "allocated" but not running.
>
> I can agree that this issue is very "cosmetic" one, but its existence
> conflicts with my perfectionism so I'd like to resolve it ;)
>
>>
>>>
>>> Also, on one cluster with post-1.1.7 with openais plugin I have 16 nodes
>>> configured in totem.interface.members, but only three nodes in 
>>> CIB section, And I'm able to allocate at least 8-9 instances of clones
>>> with clone-max.
>>
>> Yes, but did you set clone-node-max?  One is the global maximum, the
>> other is the per-node maximum.
>>
>>> I be

Re: [Pacemaker] Pacemaker resource migration behaviour

2013-03-06 Thread Andrew Beekhof
On Thu, Mar 7, 2013 at 11:23 AM, Andrew Beekhof  wrote:
> On Wed, Mar 6, 2013 at 8:02 PM, James Guthrie  wrote:
>> Hi Andrew,
>>
>> Thanks for looking into this. We have since decided not to perform a 
>> failover on the failure of one of the sub-* resources for operational 
>> reasons. As a result, I can't reliably test if this issue is actually fixed 
>> in the current HEAD. (Speaking of which, do you have a date set yet for 
>> 1.1.9?)

ASAP.  I'm hoping in a couple of hours from now, otherwise tomorrow.

>>
>> On Mar 6, 2013, at 8:39 AM, Andrew Beekhof  wrote:
>>
>>> I'm still very confused about why you're using master/slave though.
>>
>> The reason I went with master-slave was that we want the init script started 
>> on the "master" host and stopped on the "slave".
>
> You get those semantics from a normal primitive resource.
>
>> With a master-slave I have a monitor operation on the slave ensuring that 
>> the resource will be stopped on the slave if it were to be started manually 
>> (something I can't be sure wouldn't happen). AFAIK this wouldn't be the case 
>> with a "standard" resource.
>
> I think 1.1.8 allowed for operations with role=Stopped which would do
> this for the highly paranoid :-)
>
>>
>> Regards,
>> James
>>
>>
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker resource migration behaviour

2013-03-06 Thread Andrew Beekhof
On Wed, Mar 6, 2013 at 8:02 PM, James Guthrie  wrote:
> Hi Andrew,
>
> Thanks for looking into this. We have since decided not to perform a failover 
> on the failure of one of the sub-* resources for operational reasons. As a 
> result, I can't reliably test if this issue is actually fixed in the current 
> HEAD. (Speaking of which, do you have a date set yet for 1.1.9?)
>
> On Mar 6, 2013, at 8:39 AM, Andrew Beekhof  wrote:
>
>> I'm still very confused about why you're using master/slave though.
>
> The reason I went with master-slave was that we want the init script started 
> on the "master" host and stopped on the "slave".

You get those semantics from a normal primitive resource.

> With a master-slave I have a monitor operation on the slave ensuring that the 
> resource will be stopped on the slave if it were to be started manually 
> (something I can't be sure wouldn't happen). AFAIK this wouldn't be the case 
> with a "standard" resource.

I think 1.1.8 allowed for operations with role=Stopped which would do
this for the highly paranoid :-)
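
(For the record, a sketch of such a pair of operations in crm shell syntax; the
resource id and intervals are illustrative, and monitors for different roles must
use different intervals:

  primitive p_app lsb:myservice \
      op monitor interval="10s" role="Started" timeout="20s" \
      op monitor interval="11s" role="Stopped" timeout="20s"
)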

>
> Regards,
> James
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker resource migration behaviour

2013-03-06 Thread Andrew Beekhof
On Wed, Mar 6, 2013 at 6:59 PM, James Guthrie  wrote:
> On Mar 6, 2013, at 7:34 AM, Andrew Beekhof  wrote:
>
>> On Wed, Feb 6, 2013 at 11:41 PM, James Guthrie  wrote:
>>> Hi David,
>>>
>>> Unfortunately crm_report doesn't work correctly on my hosts as we have 
>>> compiled from source with custom paths and apparently the crm_report and 
>>> associated tools are not built to use the paths that can be customised with 
>>> autoconf.
>>
>> It certainly tries to:
>>
>>   https://github.com/beekhof/pacemaker/blob/master/tools/report.common#L99
>>
>> What does it say on your system (or, what paths did you give to autoconf)?
>
> You are correct, it does try to - there are a few problems though:
> - the hardcoded depth (-maxdepth 5) that is used to search for the files is 
> no good on my host
> - the fact that it assumes the local state did would be /var (despite what 
> was configured in autoconf)
>
> In my case all files are in the path /opt/OSAGpcmk/pcmk
>
> I submitted a pull-request which I was hoping to get some comment on, but 
> didn't.

I don't comment much while I'm asleep. But I've applied your
subsequent pull requests.

>
> https://github.com/ClusterLabs/pacemaker/pull/225
>
> I know that it's not a complete solution and would suggest I resubmit the 
> pull request in two parts:
> 1. Using the localstatedir and exec_prefix as configured in autoconf.
> 2. Make the maxdepth parameter default to 5, but be overridable with a flag 
> to crm_report.
>
> Regards,
> James
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Block stonith when drbd inconsistent

2013-03-06 Thread Andrew Beekhof
If you put drbd into maintenance mode, we'd not perform any state
changes (start/stop/promote/demote) on it that could fail.
That would likely do what you're after.
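
(A sketch of one command-line way to get a similar effect by unmanaging the
resource; "ms_drbd" is a hypothetical resource id, and depending on the Pacemaker
version a per-resource "maintenance" meta attribute may be available instead of
"is-managed":

  crm_resource --resource ms_drbd --meta --set-parameter is-managed --parameter-value false
)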

On Thu, Mar 7, 2013 at 4:59 AM, Jan Škoda  wrote:
> Dne 6.3.2013 06:38, Andrew Beekhof napsal(a):
>> Nodes shouldn't be being fenced so often.  Do you know what is causing
>> this to happen?
> I know that this shouldn't happen frequently, but not having access to
> uptodate data is certainly unwanted and there should be a way to prevent it.
>
> DRBD is quite prone to demote failures, especially when filesystem can
> not be umounted for some reason. Blocked process for example can't be
> killed and filesystems accessed by it can't be unmounted. This problem
> is causing 90% of fencing for me.
>
> --
> Honza 'Lefty' Škoda http://www.jskoda.cz/
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem][crmsh]The designation of the 'ordered' attribute becomes the error.

2013-03-06 Thread renayama19661014
Hi Dejan,
Hi Andrew,

Thank you for the comment.
I will verify the behaviour of the patch and report back.

Best Regards,
Hideo Yamauchi.

--- On Wed, 2013/3/6, Dejan Muhamedagic  wrote:

> Hi Hideo-san,
> 
> On Wed, Mar 06, 2013 at 10:37:44AM +0900, renayama19661...@ybb.ne.jp wrote:
> > Hi Dejan,
> > Hi Andrew,
> > 
> > As for the crm shell, the check of the meta attribute was revised with the 
> > next patch.
> > 
> >  * http://hg.savannah.gnu.org/hgweb/crmsh/rev/d1174f42f4b3
> > 
> > This patch was backported in Pacemaker1.0.13.
> > 
> >  * 
> >https://github.com/ClusterLabs/pacemaker-1.0/commit/fa1a99ab36e0ed015f1bcbbb28f7db962a9d1abc#shell/modules/cibconfig.py
> > 
> > However, the ordered,colocated attribute of the group resource is treated 
> > as an error when I use crm Shell which adopted this patch.
> > 
> > --
> > (snip)
> > ### Group Configuration ###
> > group master-group \
> >         vip-master \
> >         vip-rep \
> >         meta \
> >                 ordered="false"
> > (snip)
> > 
> > [root@rh63-heartbeat1 ~]# crm configure load update test2339.crm 
> > INFO: building help index
> > crm_verify[20028]: 2013/03/06_17:57:18 WARN: unpack_nodes: Blind faith: not 
> > fencing unseen nodes
> > WARNING: vip-master: specified timeout 60s for start is smaller than the 
> > advised 90
> > WARNING: vip-master: specified timeout 60s for stop is smaller than the 
> > advised 100
> > WARNING: vip-rep: specified timeout 60s for start is smaller than the 
> > advised 90
> > WARNING: vip-rep: specified timeout 60s for stop is smaller than the 
> > advised 100
> > ERROR: master-group: attribute ordered does not exist  -> WHY?
> > Do you still want to commit? y
> > --
> > 
> > If it chooses `yes` by a confirmation message, it is reflected, but it is a 
> > problem that error message is displayed.
> >  * The error occurs in the same way when I appoint colocated attribute.
> > And I noticed that there was not explanation of ordered,colocated of the 
> > group resource in online help of Pacemaker.
> > 
> > I think that the designation of the ordered,colocated attribute should not 
> > become the error in group resource.
> > In addition, I think that ordered,colocated should be added to online help.
> 
> These attributes are not listed in crmsh. Does the attached patch
> help?
> 
> Thanks,
> 
> Dejan
> > 
> > Best Regards,
> > Hideo Yamauchi.
> > 
> > 
> > ___
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > 
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Does LVM resouce agent conflict with UDEV rules?

2013-03-06 Thread Sven Arnold

Dear all,

On my test setup (iSCSI on top of LVM on top of DRBD) I was not able to 
reliably migrate the volume group from one node to the other.


I have two logical volumes that are mapped to two different LUNs on one 
iSCSI Target. During migration of the resource group (LVM + iSCSI) the 
LVM resource often failed to start properly on the target node. I 
noticed that sometimes not all LVs would be activated. From the log 
files I suspected that udev could have something to do with this.


In fact, disabling the udev rule

SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="lvm*|LVM*",\
  RUN+="watershed sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange -a y'"

seems to resolve the problem for me.
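
One way to disable it non-destructively is to mask the packaged rule with an 
empty override file. Roughly (a sketch only - the filename 85-lvm2.rules is an 
assumption, adjust it to whichever file ships the rule on your release):

  # an empty file of the same name in /etc/udev/rules.d/ masks the packaged rule
  touch /etc/udev/rules.d/85-lvm2.rules
  # make udev re-read its rules
  udevadm control --reload-rules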

Question(s):
Is this a known problem? Am I supposed to tune udev when managing LVM 
resources with a cluster resource manager? Or should this be considered 
a bug in either udev or the LVM resource agent?


I am using:
pacemaker 1.1.6-2ubuntu3
resource-agents 1:3.9.2-5ubuntu4.1
udev 175-0ubuntu9.2

on Ubuntu 12.04


There is also a bug report on launchpad (although it is related only to 
12.10 there):


https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/1088081

Kind regards,

Sven

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Block stonith when drbd inconsistent

2013-03-06 Thread Jan Škoda
On 6.3.2013 at 06:38, Andrew Beekhof wrote:
> Nodes shouldn't be being fenced so often.  Do you know what is causing
> this to happen?
I know that this shouldn't happen frequently, but not having access to
up-to-date data is certainly unwanted and there should be a way to prevent it.

DRBD is quite prone to demote failures, especially when the filesystem cannot
be unmounted for some reason. A blocked process, for example, can't be
killed and filesystems accessed by it can't be unmounted. This problem
is causing 90% of fencing for me.

-- 
Honza 'Lefty' Škoda http://www.jskoda.cz/



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [RFC] Automatic nodelist synchronization between corosync and pacemaker

2013-03-06 Thread Vladislav Bogdanov
06.03.2013 08:35, Andrew Beekhof wrote:
> On Thu, Feb 28, 2013 at 5:13 PM, Vladislav Bogdanov
>  wrote:
>> 28.02.2013 07:21, Andrew Beekhof wrote:
>>> On Tue, Feb 26, 2013 at 7:36 PM, Vladislav Bogdanov
>>>  wrote:
 26.02.2013 11:10, Andrew Beekhof wrote:
> On Mon, Feb 18, 2013 at 6:18 PM, Vladislav Bogdanov
>  wrote:
>> Hi Andrew, all,
>>
>> I had an idea last night, that it may be worth implementing
>> fully-dynamic cluster resize support in pacemaker,
>
> We already support nodes being added on the fly.  As soon as they show
> up in the membership we add them to the cib.

 Membership (runtime.totem.pg.mrp.srp.members) or nodelist (nodelist.node)?
>>>
>>> To my knowledge, only one (first) gets updated at runtime.
>>> Even if nodelist.node could be updated dynamically, we'd have to poll
>>> or be prompted to find out.
>>
>> It can, please see at the end of cmap_keys(8).
>> Please also see cmap_track_add(3) for CMAP_TRACK_PREFIX flag (and my
>> original message ;) ).
> 
> ACK :)
> 
>>

 I recall that when I migrated from corosync 1.4 to 2.0 (somewhere near
 pacemaker 1.1.8 release time) and replaced old-style UDPU member list
 with nodelist.node, I saw all nodes configured in that nodelist appeared
 in a CIB. For me that was a regression, because with old-style config
 (and corosync 1.4) CIB contained only nodes seen online (4 of 16).
>>>
>>> That was a loophole that only worked when the entire cluster had been
>>> down and the  section was empty.
>>
>> Aha, that is what I've been hit by.
>>
>>> People filed bugs explicitly asking for that loophole to be closed
>>> because it was inconsistent with what the cluster did on every
>>> subsequent startup.
>>
>> That is what I'm interested too. And what I propose should fix that too.
> 
> Ah, I must have misparsed, I thought you were looking for the opposite
> behaviour.
> 
> So basically, you want to be able to add/remove nodes from nodelist.*
> in corosync.conf and have pacemaker automatically add/remove them from
> itself?

Not corosync.conf, but cmap which is initially (partially) filled with
values from corosync.conf.
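
For illustration, adding a fourth node to the runtime nodelist via cmap could
look roughly like this (a sketch; index, nodeid and address are hypothetical
placeholders):

  # nodelist entries are zero-indexed, so node.3 is the fourth node
  corosync-cmapctl -s nodelist.node.3.nodeid u32 4
  corosync-cmapctl -s nodelist.node.3.ring0_addr str 192.168.1.14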

> 
> If corosync.conf gets out of sync (admin error or maybe a node was
> down when you updated last) they might well get added back - I assume
> you're ok with that?
> Because there's no real way to know the difference between "added
> back" and "not removed from last time".

Sorry, can you please reword?

> 
> Or are you planning to never update the on-disk corosync.conf and only
> modify the in-memory nodelist?

That depends on the actual use case I think.

Hm. Interesting: how does corosync behave when new dynamic nodes are added to
the cluster? I mean the following: we have a static corosync.conf with a nodelist
containing, for example, 3 entries, then we add a fourth entry via cmap and boot
the fourth node. What should be in the corosync.conf of that node? I believe it
won't work without its own fourth entry. Ugh. If so, then the fully dynamic
"elastic" cluster I was dreaming of is not possible out-of-the-box when using a
dynamic nodelist.

The only way I see to get this is to have a static nodelist in corosync.conf
with all possible nodes predefined, and to never edit it in cmap. So my original
point
* Remove nodes from CIB when they are removed from a nodelist.
does not fit.

By elastic I mean what was discussed on corosync list when Fabio started
with votequorum design and what then appeared in votequorum manpage:
===
allow_downscale: 1

Enables allow downscale (AD) feature (default: 0).

The general behaviour of votequorum is to never decrease expected votes
or quorum.

When  AD  is  enabled,  both expected votes and quorum are recalculated
when a node leaves the cluster in a clean state (normal corosync  shut-
down process) down to configured expected_votes.

Example use case:

1) N node cluster (where N is any value higher than 3)
2) expected_votes set to 3 in corosync.conf
3) only 3 nodes are running
4) admin requires to increase processing power and adds 10 nodes
5) internal expected_votes is automatically set to 13
6) minimum expected_votes is 3 (from configuration)
- up to this point this is standard votequorum behavior -
7) once the work is done, admin wants to remove nodes from the cluster
8) using an ordered shutdown the admin can reduce the cluster size
   automatically back to 3, but not below 3, where normal quorum
   operation will work as usual.
=
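
As a minimal sketch, enabling that would only need the quorum stanza of
corosync.conf to look something like this (key names as documented in
votequorum(5); values are illustrative):

  quorum {
          provider: corosync_votequorum
          expected_votes: 3
          allow_downscale: 1
  }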

What I would expect from pacemaker is to automatically remove nodes
down to 3 at step 8 (just follow quorum) if AD is enabled AND pacemaker
is instructed to follow it (with some other cmap switch), and also to
reduce the number of allocated clone instances. Sure, all nodes must have
an equal number of votes (1).

Is it ok for you?

> 
>>
>>>
 That
 would be OK if number of clone instances does not raise with that...
>>>
>>> Why?  If clone-node-max=1, then you'll never have more than the number
>>> of a

Re: [Pacemaker] Pacemaker DRBD as Physical Volume on Encrypted RAID1

2013-03-06 Thread senrabdet
Thanks Lars - will digest your response a bit, get back - appreciate the help!
 

 

 

-Original Message-
From: Lars Ellenberg 
To: pacemaker 
Sent: Wed, Mar 6, 2013 5:40 am
Subject: Re: [Pacemaker] Pacemaker DRBD as Physical Volume on Encrypted RAID1


On Mon, Mar 04, 2013 at 04:27:24PM -0500, senrab...@aol.com wrote:
> Hi All:
> 
> We're new to pacemaker (just got some great help from this forum
> getting it working with LVM as backing device), and would like to
> explore the Physical Volume option. We're trying configure on top of
> an existing Encrypted RAID1 set up and employ LVM.
> 
> NOTE:  our goal is to run many virtual servers, each in its own
> logical volume and it looks like putting LVM on top of the DRBD would
> allow us to add logical volumes "on the fly", but also have a
> "simpler" setup with one drbd device for all the logical volumes and
> one related pacemaker config.  Hence, exploring DRBD as a physical
> volume.


A single DRBD has a single "activity log",
running "many virtual servers" from there will very likely cause
the "worst possible" workload (many totally random writes).

You really want to use DRBD 8.4.3,
see https://blogs.linbit.com/p/469/843-random-writes-faster/
for why.


> Q:  For pacemaker to work, how do we do the DRBD disk/device mapping
> in the drbd.conf file?  And should we set things up and encrypt last,
> or can we apply DRBD and Pacemaker to an existing Encypted RAID1
> setup?


Neither Pacemaker nor DRBD do particularly care.

If you want to stack the encryption layer on top of DRBD, fine.
(you'd probably need to teach some pacemaker resource agent to "start"
the encryption layer).

If you want to stack DRBD on top of the encryption layer, just as fine.

Unless you provide the decryption key in plaintext somewhere, failover
will likely be easier to automate if you have DRBD on top of encryption,
so if you want the real device encrypted, I'd recommend to put
encryption below DRBD.

Obviously, the DRBD replication traffic will still be "plaintext" in
that case.

> The examples we've seen show mapping between the drbd device and a
> physical disk (e.g., sdb) in the drbd.conf, and then  "pvcreate
> /dev/drbdnum" and creating a volume group and logical volume on the
> drbd device.
> 
> So for this type of set up, drbd.conf might look like:
> 
> device /dev/drbd1;
> disk  /dev/sdb;
> address xx.xx.xx.xx:7789;
> meta-disk internal;
> 
> In our case, because we have an existing RAID1 (md2) and it's
> encrypted (md2_crypt or /dev/dm-7 ...  we're unsure which partition
> actually has the data), any thoughts on how to do the DRBD mapping?
> E.g., 
> 
> device /dev/drbd1 minor 1;
> disk /dev/???;
> address xx.xx.xx.xx:7789; 
> meta-disk internal;
> 
> I.e., what goes in the "disk /dev/?;"?  Would it be "disk 
> /dev/md2_crypt;"?

Yes.

> And can we do our setup on an existing Encrypted RAID1 setup

Yes.

> (if we do pvcreate on drbd1, we get errors)?

Huh?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

 
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] The correction request of the log of booth

2013-03-06 Thread Jiaju Zhang
On Wed, 2013-03-06 at 15:13 +0900, yusuke iida wrote:
> Hi, Jiaju
> 
> I have a request about booth's logging.
> 
> I would like the log level used when a ticket expires to be changed from 
> "debug" to "info".
> 
> I think this log message is important since it shows what occurred.
> 
> I would also like the following information added to the log:
>  * Which ticket is it?
>  * Who held the ticket?
> 
> For example, I would suggest the following form:
> info: lease expires ... owner [0] ticket [ticketA]

Sounds great, will improve that;)

Thanks,
Jiaju


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] standby attribute and same resources running at the same time

2013-03-06 Thread Leon Fauster
On 06.03.2013 at 05:14, Andrew Beekhof wrote:
> On Tue, Mar 5, 2013 at 4:20 AM, Leon Fauster  
> wrote:
>> 
>> So far all good. I am doing some stress tests now and noticed that when 
>> rebooting one node (n2), that node (n2) will be marked as standby in the CIB 
>> (as shown on the other node (n1)).
>> 
>> After rebooting the node (n2) crm_mon on that node shows that the other node 
>> (n1)
>> is offline and begins to start the resources. While the other node (n1) 
>> that wasn't
>> rebooted still shows n2 as standby. At that point both nodes are running the 
>> "same"
>> resources. After a couple of minutes that situation is noticed and both nodes
>> renegotiate the current state. Then one node takes over the responsibility to 
>> provide
>> the resources. On both nodes the previously rebooted node is still listed as 
>> standby.
>> 
>> 
>>  cat /var/log/messages |grep error
>>  Mar  4 17:32:33 cn1 pengine[1378]:error: native_create_actions: 
>> Resource resIP (ocf::IPaddr2) is active on 2 nodes attempting recovery
>>  Mar  4 17:32:33 cn1 pengine[1378]:error: native_create_actions: 
>> Resource resApache (ocf::apache) is active on 2 nodes attempting recovery
>>  Mar  4 17:32:33 cn1 pengine[1378]:error: process_pe_message: Calculated 
>> Transition 1: /var/lib/pacemaker/pengine/pe-error-6.bz2
>>  Mar  4 17:32:48 cn1 crmd[1379]:   notice: run_graph: Transition 1 
>> (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
>> Source=/var/lib/pacemaker/pengine/pe-error-6.bz2): Complete
>> 
>> 
>>  crm_mon -1
>>  Last updated: Mon Mar  4 17:49:08 2013
>>  Last change: Mon Mar  4 10:22:53 2013 via crm_resource on cn1.localdomain
>>  Stack: cman
>>  Current DC: cn1.localdomain - partition with quorum
>>  Version: 1.1.8-7.el6-394e906
>>  2 Nodes configured, 2 expected votes
>>  2 Resources configured.
>> 
>>  Node cn2.localdomain: standby
>>  Online: [ cn1.localdomain ]
>> 
>>  resIP (ocf::heartbeat:IPaddr2):   Started cn1.localdomain
>>  resApache (ocf::heartbeat:apache):Started cn1.localdomain
>> 
>> 
>> I checked the init scripts and found that the standby "behavior" comes
>> from a function that is called on "service pacemaker stop" (added in 
>> rhel6.4).
>> 
>> cman_pre_stop()
>> {
>>cname=`crm_node --name`
>>crm_attribute -N $cname -n standby -v true -l reboot
>>echo -n "Waiting for shutdown of managed resources"
>> ...
> 
> That will only last until the node comes back (the cluster will remove
> it automatically), the core problem is that it appears not to have.
> Can you file a bug and attach a crm_report for the period covered by
> the restart?


I used Red Hat's Bugzilla:

https://bugzilla.redhat.com/show_bug.cgi?id=918502

as you are also the maintainer of the corresponding rpm. 
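
For reference, clearing such a leftover attribute by hand should look roughly
like this (a sketch only; the node name is taken from the crm_mon output above):

  # drop the transient standby attribute that cman_pre_stop sets
  crm_attribute -N cn2.localdomain -n standby -D -l reboot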

--
Thanks
Leon











___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem][crmsh]The designation of the 'ordered' attribute becomes the error.

2013-03-06 Thread Dejan Muhamedagic
Hi Hideo-san,

On Wed, Mar 06, 2013 at 10:37:44AM +0900, renayama19661...@ybb.ne.jp wrote:
> Hi Dejan,
> Hi Andrew,
> 
> As for the crm shell, the check of the meta attributes was revised with the 
> following patch.
> 
>  * http://hg.savannah.gnu.org/hgweb/crmsh/rev/d1174f42f4b3
> 
> This patch was backported to Pacemaker 1.0.13.
> 
>  * 
> https://github.com/ClusterLabs/pacemaker-1.0/commit/fa1a99ab36e0ed015f1bcbbb28f7db962a9d1abc#shell/modules/cibconfig.py
> 
> However, the ordered,colocated attributes of the group resource are treated as 
> an error when I use a crm shell that includes this patch.
> 
> --
> (snip)
> ### Group Configuration ###
> group master-group \
>         vip-master \
>         vip-rep \
>         meta \
>                 ordered="false"
> (snip)
> 
> [root@rh63-heartbeat1 ~]# crm configure load update test2339.crm 
> INFO: building help index
> crm_verify[20028]: 2013/03/06_17:57:18 WARN: unpack_nodes: Blind faith: not 
> fencing unseen nodes
> WARNING: vip-master: specified timeout 60s for start is smaller than the 
> advised 90
> WARNING: vip-master: specified timeout 60s for stop is smaller than the 
> advised 100
> WARNING: vip-rep: specified timeout 60s for start is smaller than the advised 
> 90
> WARNING: vip-rep: specified timeout 60s for stop is smaller than the advised 
> 100
> ERROR: master-group: attribute ordered does not exist  -> WHY?
> Do you still want to commit? y
> --
> 
> If I choose `yes` at the confirmation message, the configuration is applied, 
> but it is a problem that this error message is displayed at all.
>  * The error occurs in the same way when I specify the colocated attribute.
> And I noticed that there was no explanation of ordered,colocated for the 
> group resource in the online help of Pacemaker.
> 
> I think that specifying the ordered,colocated attributes should not 
> be treated as an error for a group resource.
> In addition, I think that ordered,colocated should be added to the online help.

These attributes are not listed in crmsh. Does the attached patch
help?

Thanks,

Dejan
> 
> Best Regards,
> Hideo Yamauchi.
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
From 1f6ed514c8e53c79835aaaf26d152f2d840126f0 Mon Sep 17 00:00:00 2001
From: Dejan Muhamedagic 
Date: Wed, 6 Mar 2013 11:57:54 +0100
Subject: [PATCH] Low: shell: add group meta attributes

---
 shell/modules/cibconfig.py | 2 ++
 shell/modules/vars.py.in   | 1 +
 2 files changed, 3 insertions(+)

diff --git a/shell/modules/cibconfig.py b/shell/modules/cibconfig.py
index 2dfaa92..1cf08fa 100644
--- a/shell/modules/cibconfig.py
+++ b/shell/modules/cibconfig.py
@@ -1152,6 +1152,8 @@ class CibContainer(CibObject):
             l += vars.clone_meta_attributes
         elif self.obj_type == "ms":
             l += vars.clone_meta_attributes + vars.ms_meta_attributes
+        elif self.obj_type == "group":
+            l += vars.group_meta_attributes
         rc = sanity_check_meta(self.obj_id,self.node,l)
         return rc
 
diff --git a/shell/modules/vars.py.in b/shell/modules/vars.py.in
index c83232e..dff86dc 100644
--- a/shell/modules/vars.py.in
+++ b/shell/modules/vars.py.in
@@ -117,6 +117,7 @@ class Vars(Singleton):
         "failure-timeout", "resource-stickiness", "target-role",
         "restart-type", "description",
     )
+    group_meta_attributes = ("ordered", "colocated")
     clone_meta_attributes = (
         "ordered", "notify", "interleave", "globally-unique",
         "clone-max", "clone-node-max", "clone-state", "description",
-- 
1.8.0

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker DRBD as Physical Volume on Encrypted RAID1

2013-03-06 Thread Lars Ellenberg
On Mon, Mar 04, 2013 at 04:27:24PM -0500, senrab...@aol.com wrote:
> Hi All:
> 
> We're new to pacemaker (just got some great help from this forum
> getting it working with LVM as backing device), and would like to
> explore the Physical Volume option. We're trying to configure on top of
> an existing Encrypted RAID1 set up and employ LVM.
> 
> NOTE:  our goal is to run many virtual servers, each in its own
> logical volume and it looks like putting LVM on top of the DRBD would
> allow us to add logical volumes "on the fly", but also have a
> "simpler" setup with one drbd device for all the logical volumes and
> one related pacemaker config.  Hence, exploring DRBD as a physical
> volume.


A single DRBD has a single "activity log",
running "many virtual servers" from there will very likely cause
the "worst possible" workload (many totally random writes).

You really want to use DRBD 8.4.3,
see https://blogs.linbit.com/p/469/843-random-writes-faster/
for why.


> Q:  For pacemaker to work, how do we do the DRBD disk/device mapping
> in the drbd.conf file?  And should we set things up and encrypt last,
> or can we apply DRBD and Pacemaker to an existing Encypted RAID1
> setup?


Neither Pacemaker nor DRBD do particularly care.

If you want to stack the encryption layer on top of DRBD, fine.
(you'd probably need to teach some pacemaker resource agent to "start"
the encryption layer).

If you want to stack DRBD on top of the encryption layer, just as fine.

Unless you provide the decryption key in plaintext somewhere, failover
will likely be easier to automate if you have DRBD on top of encryption,
so if you want the real device encrypted, I'd recommend to put
encryption below DRBD.

Obviously, the DRBD replication traffic will still be "plaintext" in
that case.

> The examples we've seen show mapping between the drbd device and a
> physical disk (e.g., sdb) in the drbd.conf, and then  "pvcreate
> /dev/drbdnum" and creating a volume group and logical volume on the
> drbd device.
> 
> So for this type of set up, drbd.conf might look like:
> 
> device /dev/drbd1;
> disk  /dev/sdb;
> address xx.xx.xx.xx:7789;
> meta-disk internal;
> 
> In our case, because we have an existing RAID1 (md2) and it's
> encrypted (md2_crypt or /dev/dm-7 ...  we're unsure which partition
> actually has the data), any thoughts on how to do the DRBD mapping?
> E.g., 
> 
> device /dev/drbd1 minor 1;
> disk /dev/???;
> address xx.xx.xx.xx:7789; 
> meta-disk internal;
> 
> I.e., what goes in the "disk /dev/?;"?  Would it be "disk 
> /dev/md2_crypt;"?

Yes.
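
For illustration, a per-node resource stanza could then look roughly like this
(a sketch only; hostnames and addresses are placeholders, and the disk path
assumes the dm-crypt mapping appears as /dev/mapper/md2_crypt):

  resource r1 {
    on nodeA {
      device    /dev/drbd1 minor 1;
      disk      /dev/mapper/md2_crypt;
      address   10.0.0.1:7789;
      meta-disk internal;
    }
    on nodeB {
      device    /dev/drbd1 minor 1;
      disk      /dev/mapper/md2_crypt;
      address   10.0.0.2:7789;
      meta-disk internal;
    }
  }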

> And can we do our setup on an existing Encrypted RAID1 setup

Yes.

> (if we do pvcreate on drbd1, we get errors)?

Huh?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker resource migration behaviour

2013-03-06 Thread James Guthrie
Hi Andrew,

Thanks for looking into this. We have since decided not to perform a failover 
on the failure of one of the sub-* resources for operational reasons. As a 
result, I can't reliably test if this issue is actually fixed in the current 
HEAD. (Speaking of which, do you have a date set yet for 1.1.9?)

On Mar 6, 2013, at 8:39 AM, Andrew Beekhof  wrote:

> I'm still very confused about why you're using master/slave though.

The reason I went with master-slave was that we want the init script started on 
the "master" host and stopped on the "slave". With a master-slave resource I have a 
monitor operation on the slave, ensuring that the resource will be stopped there if 
it is ever started manually (something I can't rule out). AFAIK this wouldn't be the 
case with a "standard" resource.
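
For illustration, the shape of the configuration is roughly the following (a 
sketch; the resource and agent names here are placeholders, not our real ones):

  primitive p_app ocf:custom:app \
          op monitor interval="20s" role="Master" \
          op monitor interval="30s" role="Slave"
  ms ms_app p_app \
          meta master-max="1" clone-max="2" notify="false"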

Regards,
James


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker resource migration behaviour

2013-03-06 Thread James Guthrie
On Mar 6, 2013, at 7:34 AM, Andrew Beekhof  wrote:

> On Wed, Feb 6, 2013 at 11:41 PM, James Guthrie  wrote:
>> Hi David,
>> 
>> Unfortunately crm_report doesn't work correctly on my hosts as we have 
>> compiled from source with custom paths and apparently the crm_report and 
>> associated tools are not built to use the paths that can be customised with 
>> autoconf.
> 
> It certainly tries to:
> 
>   https://github.com/beekhof/pacemaker/blob/master/tools/report.common#L99
> 
> What does it say on your system (or, what paths did you give to autoconf)?

You are correct, it does try to - there are a few problems though:
- the hardcoded depth (-maxdepth 5) that is used to search for the files is no 
good on my host
- the fact that it assumes the local state dir would be /var (despite what was 
configured in autoconf)

In my case all files are in the path /opt/OSAGpcmk/pcmk
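
For context, a custom-path build along these lines (the exact flags are an 
assumption on my part, not a copy of our build script):

  ./configure --prefix=/opt/OSAGpcmk/pcmk \
              --exec-prefix=/opt/OSAGpcmk/pcmk \
              --localstatedir=/opt/OSAGpcmk/pcmk/var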

I submitted a pull-request which I was hoping to get some comment on, but 
didn't.

https://github.com/ClusterLabs/pacemaker/pull/225

I know that it's not a complete solution and would suggest I resubmit the pull 
request in two parts:
1. Using the localstatedir and exec_prefix as configured in autoconf.
2. Make the maxdepth parameter default to 5, but be overridable with a flag to 
crm_report.

Regards,
James
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org