Re: [Pacemaker] Pacemaker configuration with different dependencies

2013-04-17 Thread Ivor Prebeg
Hi Andreas, thank you for your answer.

Maybe my description was a little fuzzy, sorry for that.

What I want is the following:

* if l3_ping fails on a particular node, all services should go to standby on 
that node (which probably works fine with on-fail="standby") 

* if the SIP service (active/active) fails on a particular node, only the floating IP 
assigned to it should be migrated to the other node

* if any of the (active/active) services, be it the database or the java container, fails, 
both the database and the java container should be stopped and their floating IP migrated to 
the other node

* a failure of the SIP service should not affect the database or java container, and vice 
versa.

Hope this makes it clearer; I am not sure I understood how to build the 
dependency tree. 
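
As a rough illustration only (resource and constraint names below are made up,
parameters are omitted, and the active/active parts would really be clones or
master/slave resources), the shape of such a setup in crm shell syntax could be
separate resources tied together by explicit colocation/ordering instead of one
big group:

primitive p_l3_ping <your ping/connectivity agent> \
        op monitor interval="10s" on-fail="standby"
clone cl_l3_ping p_l3_ping
# SIP and its floating IP: the IP follows a node with a healthy SIP instance,
# nothing else depends on SIP
colocation col_ip_sip inf: ip_sip cl_sip
order ord_ip_sip inf: cl_sip ip_sip
# database + java container grouped, so a failure of either stops both;
# their floating IP follows the group, and SIP is not referenced here at all
group g_app p_db p_java
colocation col_ip_app inf: ip_app g_app
order ord_ip_app inf: g_app ip_app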

Thanks,
Ivor Prebeg

On Apr 16, 2013, at 2:50 PM, Andreas Mock  wrote:

> Hi Ivor,
>  
> I don't know whether I understand you completely right:
> If you want independence of resources don't put them into a group.
>  
> Look at
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Pacemaker_Explained/ch10.html
>  
> A group is made to tie together several resources without
> declaring all necessary colocations and orderings to get
> a desired behaviour.
>  
> Otherwise, name your resources and describe how they should be spread across
> your cluster (show the technical dependency).
>  
> Best regards
> Andreas
>  
>  
> From: Ivor Prebeg [mailto:ivor.pre...@gmail.com] 
> Sent: Tuesday, April 16, 2013 13:53
> To: pacemaker@oss.clusterlabs.org
> Subject: [Pacemaker] Pacemaker configuration with different dependencies
>  
> Hi guys,
> 
> I need some help with my Pacemaker configuration; it is all new to me and I can't 
> find a solution...
> 
> I have a two-node HA environment with services that I want to be partially 
> independent, in a pacemaker/heartbeat configuration.
> 
> There is an active/active SIP service with two floating IPs; it should just 
> migrate the floating IP when one SIP instance dies.
> 
> There are also two active/active master/slave services, a java container and an 
> RDBMS with replication between them, which should also fail over when one dies.
> 
> What I can't figure out is how to configure those two to be independent (put an 
> on-fail directive on the group). What I want is, e.g., in case my SIP service 
> fails, the java container stays active on that node, but the floating IP is moved 
> to the other node.
> 
> Another thing is, in case one of the RDBMS instances fails, I want to put the whole 
> service group on that node into standby, but leave the SIP service intact.
> 
> The whole node should go to standby (all services down) only when the L3_ping to 
> the gateway dies.
>  
> All suggestions and configuration examples are welcome.
> 
> Thanks in advance.
> 
>  
> Ivor Prebeg
>  
>  



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Question about recovery policy after "Too many failures to fence"

2013-04-17 Thread Kazunori INOUE
Hi Andrew,

> -Original Message-
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Wednesday, April 17, 2013 2:28 PM
> To: The Pacemaker cluster resource manager
> Cc: shimaza...@intellilink.co.jp
> Subject: Re: [Pacemaker] Question about recovery policy after "Too many
> failures to fence"
> 
> 
> On 11/04/2013, at 7:23 PM, Kazunori INOUE 
wrote:
> 
> > Hi Andrew,
> >
> > (13.04.08 12:01), Andrew Beekhof wrote:
> >>
> >> On 27/03/2013, at 7:45 PM, Kazunori INOUE 
> wrote:
> >>
> >>> Hi,
> >>>
> >>> I'm using pacemaker-1.1 (c7910371a5. the latest devel).
> >>>
> >>> When fencing failed 10 times, S_TRANSITION_ENGINE state is kept.
> >>> (related: https://github.com/ClusterLabs/pacemaker/commit/e29d2f9)
> >>>
> >>> How should I recover?  what kind of procedure should I make S_IDLE in?
> >>
> >> The intention was that the node should proceed to S_IDLE when this
occurs,
> so you shouldn't have to do anything and the cluster would try again once
the
> recheck-interval expired or a config change was made.
> >>
> >> I assume you're saying this does not occur?
> >>
> >
> > My understanding is that the cluster-recheck-interval timer is not active
> > while in S_TRANSITION_ENGINE.
> > So even after waiting for a long time, it was still S_TRANSITION_ENGINE.
> > * I attached crm_report.
> 
> I think
>https://github.com/beekhof/pacemaker/commit/ef8068e9
> should fix this part of the problem.
> 

I confirmed that this problem was fixed.
Thanks!!


> >
> > What do I have to do in order to make the cluster retry STONITH?
> > For example, I need to run 'crmadmin -E' to change config?
> >
> > 
> > Best Regards,
> > Kazunori INOUE
> >
> >>>
> >>>
> >>> Mar 27 15:34:34 dev2 crmd[17937]:   notice: tengine_stonith_callback:
> >>> Stonith operation 12/22:14:0:0927a8a0-8e09-494e-acf8-7fb273ca8c9e:
> Generic
> >>> Pacemaker error (-1001)
> >>> Mar 27 15:34:34 dev2 crmd[17937]:   notice: tengine_stonith_callback:
> >>> Stonith operation 12 for dev2 failed (Generic Pacemaker error):
aborting
> >>> transition.
> >>> Mar 27 15:34:34 dev2 crmd[17937]: info: abort_transition_graph:
> >>> tengine_stonith_callback:426 - Triggered transition abort (complete=0)
:
> >>> Stonith failed
> >>> Mar 27 15:34:34 dev2 crmd[17937]:   notice: tengine_stonith_notify:
Peer
> >>> dev2 was not terminated (st_notify_fence) by dev1 for dev2: Generic
> >>> Pacemaker error (ref=05f75ab8-34ae-4aae-bbc6-aa20dbfdc845) by client
> >>> crmd.17937
> >>> Mar 27 15:34:34 dev2 crmd[17937]:   notice: run_graph: Transition 14
> >>> (Complete=1, Pending=0, Fired=0, Skipped=8, Incomplete=0,
> >>> Source=/var/lib/pacemaker/pengine/pe-warn-2.bz2): Stopped
> >>> Mar 27 15:34:34 dev2 crmd[17937]:   notice: too_many_st_failures: Too
many
> >>> failures to fence dev2 (11), giving up
> >>>
> >>> $ crmadmin -S dev2
> >>> Status of crmd@dev2: S_TRANSITION_ENGINE (ok)
> >>>
> >>> $ crm_mon
> >>> Last updated: Wed Mar 27 15:35:12 2013
> >>> Last change: Wed Mar 27 15:33:16 2013 via cibadmin on dev1
> >>> Stack: corosync
> >>> Current DC: dev2 (3232261523) - partition with quorum
> >>> Version: 1.1.10-1.el6-c791037
> >>> 2 Nodes configured, unknown expected votes
> >>> 3 Resources configured.
> >>>
> >>>
> >>> Node dev2 (3232261523): UNCLEAN (online)
> >>> Online: [ dev1 ]
> >>>
> >>> prmDummy   (ocf::pacemaker:Dummy): Started dev2 FAILED
> >>> Resource Group: grpStonith1
> >>> prmStonith1(stonith:external/stonith-helper):  Started
> dev2
> >>> Resource Group: grpStonith2
> >>> prmStonith2(stonith:external/stonith-helper):  Started
> dev1
> >>>
> >>> Failed actions:
> >>>prmDummy_monitor_1 (node=dev2, call=23, rc=7, status=complete):
> not
> >>> running
> >>>
> >>> 
> >>> Best Regards,
> >>> Kazunori INOUE
> >>>
> >>>

Re: [Pacemaker] Question about the error when fencing failed

2013-04-17 Thread Kazunori INOUE
Hi Andrew,

I confirmed that this problem was fixed.
Thanks!


> -Original Message-
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Wednesday, April 17, 2013 2:04 PM
> To: The Pacemaker cluster resource manager
> Cc: shimaza...@intellilink.co.jp
> Subject: Re: [Pacemaker] Question about the error when fencing failed
> 
> This should solve your issue:
> 
>   https://github.com/beekhof/pacemaker/commit/dbbb6a6
> 
> On 11/04/2013, at 7:23 PM, Kazunori INOUE 
wrote:
> 
> > Hi Andrew,
> >
> > (13.04.08 11:04), Andrew Beekhof wrote:
> >>
> >> On 05/04/2013, at 3:21 PM, Kazunori INOUE 
> wrote:
> >>
> >>> Hi,
> >>>
> >>> When fencing failed (*1) on the following conditions, an error occurs
> >>> in stonith_perform_callback().
> >>>
> >>> - using fencing-topology. (*2)
> >>> - fence DC node. ($ crm node fence dev2)
> >>>
> >>> Apr  3 17:04:47 dev2 stonith-ng[2278]:   notice: handle_request:
Client
> crmd.2282.b9e69280 wants to fence (reboot) 'dev2' with device '(any)'
> >>> Apr  3 17:04:47 dev2 stonith-ng[2278]:   notice: handle_request:
> Forwarding complex self fencing request to peer dev1
> >>> Apr  3 17:04:47 dev2 stonith-ng[2278]: info: stonith_command:
> Processed st_fence from crmd.2282: Operation now in progress (-115)
> >>> Apr  3 17:04:47 dev2 pengine[2281]:  warning: process_pe_message:
> Calculated Transition 2: /var/lib/pacemaker/pengine/pe-warn-0.bz2
> >>> Apr  3 17:04:47 dev2 stonith-ng[2278]: info: stonith_command:
> Processed st_query from dev1: OK (0)
> >>> Apr  3 17:04:47 dev2 stonith-ng[2278]: info:
stonith_action_create:
> Initiating action list for agent fence_legacy (target=(null))
> >>> Apr  3 17:04:47 dev2 stonith-ng[2278]: info: stonith_command:
> Processed st_timeout_update from dev1: OK (0)
> >>> Apr  3 17:04:47 dev2 stonith-ng[2278]: info:
dynamic_list_search_cb:
> Refreshing port list for f-dev1
> >>> Apr  3 17:04:48 dev2 stonith-ng[2278]:   notice: remote_op_done:
> Operation reboot of dev2 by dev1 for crmd.2282@dev1.4494ed41: Generic
> Pacemaker error
> >>> Apr  3 17:04:48 dev2 stonith-ng[2278]: info: stonith_command:
> Processed st_notify reply from dev1: OK (0)
> >>> Apr  3 17:04:48 dev2 crmd[2282]:error: crm_abort:
> stonith_perform_callback: Triggered assert at st_client.c:1894 : call_id >
0
> >>> Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback:
Bad
> resultst_rc="-201" st_op="st_query" st_callid="0"
> st_clientid="b9e69280-e557-478e-aa94-fd7ca6a533b1"
> st_clientname="crmd.2282"
> st_remote_op="4494ed41-2306-4707-8406-fa066b7f3ef0" st_callopt="0"
> st_delegate="dev1">
> >>> Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback:
Bad
> result 
> >>> Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback:
Bad
> resultcount="1" src="dev1" state="4" st_target="dev2">
> >>> Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback:
Bad
> result 
> >>> Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback:
Bad
> resultst_device_action="reboot" st_delegate="dev1"
> st_remote_op="4494ed41-2306-4707-8406-fa066b7f3ef0" st_origin="dev1"
> st_clientid="b9e69280-e557-478e-aa94-fd7ca6a533b1"
> st_clientname="crmd.2282"/>
> >>> Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback:
Bad
> result 
> >>> Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback:
Bad
> result   
> >>> Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback:
Bad
> result 
> >>> Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback:
Bad
> result   
> >>> Apr  3 17:04:48 dev2 crmd[2282]:  warning: stonith_perform_callback:
> STONITH command failed: Generic Pacemaker error
> >>> Apr  3 17:04:48 dev2 crmd[2282]:   notice: tengine_stonith_notify:
Peer
> dev2 was not terminated (st_notify_fence) by dev1 for dev1: Generic
Pacemaker
> error (ref=4494ed41-2306-4707-8406-fa066b7f3ef0) by client crmd.2282
> >>> Apr  3 17:07:11 dev2 crmd[2282]:error:
> stonith_async_timeout_handler: Async call 2 timed out after 144000ms
> >>>
> >>> Is this the designed behavior?
> >>
> >> Definitely not :-(
> >> Is this the first fencing operation that has been initiated by the
cluster?
> >
> > Yes.
> > I attached crm_report.
> >
> >> Or has the cluster been running for some time?
> >>
> >
> > 
> > Best Regards,
> > Kazunori INOUE
> >
> >>>
> >>> *1: I added "exit 1" to reset() of stonith-plugin in order to make
> >>>fencing fail.
> >>>
> >>>  $ diff -u libvirt.ORG libvirt
> >>>  --- libvirt.ORG 2012-12-17 09:56:37.0 +0900
> >>>  +++ libvirt 2013-04-03 16:33:08.118157947 +0900
> >>>  @@ -240,6 +240,7 @@
> >>>   ;;
> >>>
> >>>   reset)
> >>>  +exit 1
> >>>   libvirt_check_config
> >>>   libvirt_set_domain_id $2
> >>>
> >>> *2:
> >>>  node $id="3232261523" dev2
> >>>  node $id="3232261525" dev1
> >>>  primitive f-dev1 stonith:external/libvirt \
> >>>  params pcmk_reboot_retries=

[Pacemaker] drbd resource operation monitor failed 'not configured'

2013-04-17 Thread Wolfgang Routschka
Hi guys,

one question today about a DRBD resource on a RHEL 6.x clone (Scientific Linux 6.4) 
with pacemaker/cman.

I configured the DRBD resource with pcs:

pcs resource create drbd ocf:linbit:drbd drbd_resource="drbd0" 
drbdconf=/usr/local/etc/drbd.conf op monitor interval="15" meta master-max="1" 
master-node-max="1" clone-max="2" clone-node-max="1" notify="true"

In the error log:

drbd(drbd)[8870]:   2013/04/17_11:35:57 ERROR: meta parameter 
misconfigured, expected clone-max -le 2, but found unset.
lrmd:   notice: operation_finished:  drbd_monitor_0:8870 [ 
2013/04/17_11:35:57 ERROR: meta parameter misconfigured, expected clone-max -le 
2, but found unset.
error: unpack_rsc_op:   Preventing r_test from re-starting anywhere in the 
cluster : operation monitor failed 'not configured' (rc=6)

For debugging I found ocf-tester:

ocf-tester  -n r_drbd /usr/lib/ocf/resource.d/linbit/drbd -o 
drbd_resource="drbd0"

Output:

* rc=6: Validation failed.  Did you supply enough options with -o ?
WARNING: You may be disappointed: This RA is intended for pacemaker 1.0 or 
better!
ERROR: meta parameter misconfigured, expected clone-max -le 2, but found unset.


What's wrong in my configuration?

I don't understand the problem, because the option is set.
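
For reference: clone-max, master-max etc. are meta attributes that the drbd agent
expects on a clone/master resource rather than on a plain primitive, so its
validation can find clone-max unset if the resource is not actually running as a
master/slave clone. A rough, untested sketch of the usual two-step layout with
pcs (depending on the pcs version; resource names here are hypothetical):

pcs resource create drbd ocf:linbit:drbd drbd_resource="drbd0" \
        drbdconf=/usr/local/etc/drbd.conf op monitor interval="15"
pcs resource master ms_drbd drbd master-max=1 master-node-max=1 \
        clone-max=2 clone-node-max=1 notify=true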

Version overview:

drbd-8.4.3
pacemaker- 1.1.8
corosync- 1.4.1
cman- 3.0.12.1

I hope somebody can help.

Greetings Wolfgang


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Question about the error when fencing failed

2013-04-17 Thread Andrew Beekhof

On 17/04/2013, at 6:52 PM, Kazunori INOUE  wrote:

> Hi Andrew,
> 
> I confirmed that this problem was fixed.

Excellent

> Thanks!

And thank you for bringing it to my attention :)

> 
> 
>> -Original Message-
>> From: Andrew Beekhof [mailto:and...@beekhof.net]
>> Sent: Wednesday, April 17, 2013 2:04 PM
>> To: The Pacemaker cluster resource manager
>> Cc: shimaza...@intellilink.co.jp
>> Subject: Re: [Pacemaker] Question about the error when fencing failed
>> 
>> This should solve your issue:
>> 
>>  https://github.com/beekhof/pacemaker/commit/dbbb6a6
>> 
>> On 11/04/2013, at 7:23 PM, Kazunori INOUE 
> wrote:
>> 
>>> Hi Andrew,
>>> 
>>> (13.04.08 11:04), Andrew Beekhof wrote:
 
 On 05/04/2013, at 3:21 PM, Kazunori INOUE 
>> wrote:
 
> Hi,
> 
> When fencing failed (*1) on the following conditions, an error occurs
> in stonith_perform_callback().
> 
> - using fencing-topology. (*2)
> - fence DC node. ($ crm node fence dev2)
> 
> Apr  3 17:04:47 dev2 stonith-ng[2278]:   notice: handle_request:
> Client
>> crmd.2282.b9e69280 wants to fence (reboot) 'dev2' with device '(any)'
> Apr  3 17:04:47 dev2 stonith-ng[2278]:   notice: handle_request:
>> Forwarding complex self fencing request to peer dev1
> Apr  3 17:04:47 dev2 stonith-ng[2278]: info: stonith_command:
>> Processed st_fence from crmd.2282: Operation now in progress (-115)
> Apr  3 17:04:47 dev2 pengine[2281]:  warning: process_pe_message:
>> Calculated Transition 2: /var/lib/pacemaker/pengine/pe-warn-0.bz2
> Apr  3 17:04:47 dev2 stonith-ng[2278]: info: stonith_command:
>> Processed st_query from dev1: OK (0)
> Apr  3 17:04:47 dev2 stonith-ng[2278]: info:
> stonith_action_create:
>> Initiating action list for agent fence_legacy (target=(null))
> Apr  3 17:04:47 dev2 stonith-ng[2278]: info: stonith_command:
>> Processed st_timeout_update from dev1: OK (0)
> Apr  3 17:04:47 dev2 stonith-ng[2278]: info:
> dynamic_list_search_cb:
>> Refreshing port list for f-dev1
> Apr  3 17:04:48 dev2 stonith-ng[2278]:   notice: remote_op_done:
>> Operation reboot of dev2 by dev1 for crmd.2282@dev1.4494ed41: Generic
>> Pacemaker error
> Apr  3 17:04:48 dev2 stonith-ng[2278]: info: stonith_command:
>> Processed st_notify reply from dev1: OK (0)
> Apr  3 17:04:48 dev2 crmd[2282]:error: crm_abort:
>> stonith_perform_callback: Triggered assert at st_client.c:1894 : call_id >
> 0
> Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback:
> Bad
>> result   > st_rc="-201" st_op="st_query" st_callid="0"
>> st_clientid="b9e69280-e557-478e-aa94-fd7ca6a533b1"
>> st_clientname="crmd.2282"
>> st_remote_op="4494ed41-2306-4707-8406-fa066b7f3ef0" st_callopt="0"
>> st_delegate="dev1">
> Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback:
> Bad
>> result 
> Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback:
> Bad
>> result   > count="1" src="dev1" state="4" st_target="dev2">
> Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback:
> Bad
>> result 
> Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback:
> Bad
>> result   > st_device_action="reboot" st_delegate="dev1"
>> st_remote_op="4494ed41-2306-4707-8406-fa066b7f3ef0" st_origin="dev1"
>> st_clientid="b9e69280-e557-478e-aa94-fd7ca6a533b1"
>> st_clientname="crmd.2282"/>
> Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback:
> Bad
>> result 
> Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback:
> Bad
>> result   
> Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback:
> Bad
>> result 
> Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback:
> Bad
>> result   
> Apr  3 17:04:48 dev2 crmd[2282]:  warning: stonith_perform_callback:
>> STONITH command failed: Generic Pacemaker error
> Apr  3 17:04:48 dev2 crmd[2282]:   notice: tengine_stonith_notify:
> Peer
>> dev2 was not terminated (st_notify_fence) by dev1 for dev1: Generic
> Pacemaker
>> error (ref=4494ed41-2306-4707-8406-fa066b7f3ef0) by client crmd.2282
> Apr  3 17:07:11 dev2 crmd[2282]:error:
>> stonith_async_timeout_handler: Async call 2 timed out after 144000ms
> 
> Is this the designed behavior?
 
 Definitely not :-(
 Is this the first fencing operation that has been initiated by the
> cluster?
>>> 
>>> Yes.
>>> I attached crm_report.
>>> 
 Or has the cluster been running for some time?
 
>>> 
>>> 
>>> Best Regards,
>>> Kazunori INOUE
>>> 
> 
> *1: I added "exit 1" to reset() of stonith-plugin in order to make
>   fencing fail.
> 
> $ diff -u libvirt.ORG libvirt
> --- libvirt.ORG 2012-12-17 09:56:37.0 +0900
> +++ libvirt 2013-04-03 16:33:08.118157947 +0900
> @@ -240,6 +240,7 @@
>  ;;
> 
>  reset)
> +exit 1
>  libvirt_check_c

Re: [Pacemaker] CentOS 6.4 - pacemaker 1.1.8 - heartbeat

2013-04-17 Thread Andreas

No one else using pacemaker and heartbeat on CentOS 6.4?

Any help is appreciated!

Thanks
Andreas


Am 09.04.2013 10:41, schrieb Andreas:

Am 09.04.2013 01:18, schrieb Andrew Beekhof:


On 08/04/2013, at 7:24 PM, Andreas  wrote:


Just tried this to isolate the problem.

Trying this there was definitely no other cluster component running!


You mean no other copies in the "ps" output?



Damn, I was hunting the wrong animal. Sorry for that!

Indeed there was an old process still running that caused these error
messages.

But the main problem still persists:

"Could not establish lrmd connection: Connection refused (111)"


Here is the output from:

  /usr/lib64/heartbeat/crmd -V

http://pastebin.com/tAcNNK8s

And the corresponding logfile entries:

http://pastebin.com/eq8JbstD


Thanks for looking in this!
Andreas



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] failure handling on a cloned resource

2013-04-17 Thread Johan Huysmans

Hi All,

I'm trying to set up a specific configuration in our cluster; however, I'm 
struggling with it.


This is what I'm trying to achieve:
On both nodes of the cluster a daemon must be running (tomcat).
Some failover addresses are configured and must be running on the node 
with a correctly running tomcat.


I have achieved this with a cloned tomcat resource and a colocation 
between the cloned tomcat and the failover addresses.
When I cause a failure in the tomcat on the node running the failover 
addresses, the failover addresses will fail over to the other node as 
expected.

crm_mon shows that this tomcat has a failure.
When I configure the tomcat resource with failure-timeout=0, the failure 
alarm in crm_mon isn't cleared even after the tomcat failure is fixed.
When I configure the tomcat resource with failure-timeout=30, the 
failure alarm in crm_mon is cleared after 30 seconds, even though tomcat 
is still failing.


What I expect is that pacemaker reports the failure for as long as it 
exists, and that pacemaker reports that everything is OK once everything 
is back to normal.


Am I doing something wrong in my configuration?
Or how can I achieve the setup I want?

Here is my configuration:

node CSE-1
node CSE-2
primitive d_tomcat ocf:custom:tomcat \
op monitor interval="15s" timeout="510s" on-fail="block" \
op start interval="0" timeout="510s" \
params instance_name="NMS" monitor_use_ssl="no" 
monitor_urls="/cse/health" monitor_timeout="120" \
        meta migration-threshold="1" failure-timeout="0"
primitive ip_1 ocf:heartbeat:IPaddr2 \
op monitor interval="10s" \
params nic="bond0" broadcast="10.1.1.1" iflabel="ha" ip="10.1.1.1"
primitive ip_2 ocf:heartbeat:IPaddr2 \
op monitor interval="10s" \
params nic="bond0" broadcast="10.1.1.2" iflabel="ha" ip="10.1.1.2"
group svc-cse ip_1 ip_2
clone cl_tomcat d_tomcat
colocation colo_tomcat inf: svc-cse cl_tomcat
order order_tomcat inf: cl_tomcat svc-cse
property $id="cib-bootstrap-options" \
dc-version="1.1.8-7.el6-394e906" \
cluster-infrastructure="cman" \
no-quorum-policy="ignore" \
stonith-enabled="false"

Thanks!

Greetings,
Johan Huysmans

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] HA KVM over DRBD primary/secondary configuration

2013-04-17 Thread Gerrit Jacobsen
High-availability KVM over DRBD active/passive with Pacemaker and Corosync:

- Two hardware nodes with two NICs each: one NIC directly connected to the other 
node for DRBD mirroring, the second to the network. Operating system: Debian 
Wheezy RC1 (no joy with Debian Squeeze)
- A spare partition on each machine is configured as a LVM volume group
- Inside the volume group we carve out logical volumes, one logical volume for 
each guest machine
- Logical volumes are mirrored with DRBD in an active/passive configuration. 
Each mirrored logical volume is used as a raw storage device for one KVM 
machine. Alternatively one could use a file system and store the VMs as files.
The setup allows guest machines to be moved flexibly across the 2 nodes, depending on 
load. Also, disk access is almost as fast as on physical hardware; no fancy 
cluster file system etc. is needed. Overall it is near bare-metal performance.

We used the amazing LCMC tool for the host, DRBD, Pacemaker and Corosync 
setup:
http://lcmc.sourceforge.net/ 

LCMC does lots of the setup automatically - a huge timesaver. Some fine-tuning had 
to be done from the command line with crm configure, as the colocation and order 
parameters could not be configured from LCMC.

Configuration for hosting HA KVM machines:

node ffmnode3 \
attributes standby="off"
node ffmnode4 \
attributes standby="off"
primitive res_VirtualDomain_1 ocf:heartbeat:VirtualDomain \
params config="/etc/libvirt/qemu/ffmttavpx.xml" \
operations $id="res_VirtualDomain_1-operations" \
op start interval="0" timeout="90" \
op stop interval="0" timeout="90" \
op monitor interval="10" timeout="30" start-delay="0" \
op migrate_from interval="0" timeout="60" \
op migrate_to interval="0" timeout="120" \
meta target-role="started" migration-threshold="1"
primitive res_drbd_1 ocf:linbit:drbd \
params drbd_resource="ffmttapx" \
operations $id="res_drbd_1-operations" \
op start interval="0" timeout="240" \
op promote interval="0" timeout="90" \
op demote interval="0" timeout="90" \
op stop interval="0" timeout="100" \
op monitor interval="10" timeout="20" start-delay="0" \
op notify interval="0" timeout="90" \
meta target-role="started" resource-stickiness="0"
ms ms_drbd_1 res_drbd_1 \
meta clone-max="2" notify="true" target-role="started"
colocation ms_drbd_1-with-ffmttavpx inf: res_VirtualDomain_1 ms_drbd_1:Master
order ms_drbd_1-before-ffmttavpx inf: ms_drbd_1:promote 
res_VirtualDomain_1:start
property $id="cib-bootstrap-options" \
expected-quorum-votes="2" \
stonith-enabled="false" \
dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
no-quorum-policy="ignore" \
cluster-infrastructure="openais"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"

I hope it is useful for someone.

Cheers

Gerry


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-17 Thread T.
Hi,

> b) If I can't do it with pcs, is there a reliable
> and secure way to do it with pacemaker low level tools?
why not just installing the crmsh from a different repository?

This is what I have done on CentOS 6.4.
-- 

Chau y hasta luego,

Thorolf


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-17 Thread Vadym Chepkov

On Apr 17, 2013, at 11:57 AM, T. wrote:

> Hi,
> 
>> b) If I can't do it with pcs, is there a reliable
>> and secure way to do it with pacemaker low level tools?
> why not just installing the crmsh from a different repository?
> 
> This is what I have done on CentOS 6.4.

My sentiments exactly. And "erase" is not the most important missing 
functionality. 
crm configure save and crm configure load (update | replace) are what made 
configurations easily manageable 
and trackable with version control software.
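
For reference, the crmsh round trip being referred to is along these lines (the
file name is arbitrary):

crm configure save /root/cluster.crm      # dump the configuration in crm syntax
# edit or version-control the file, then re-apply it:
crm configure load update /root/cluster.crm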

Cheers,
Vadym




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] CentOS 6.4 - pacemaker 1.1.8 - heartbeat

2013-04-17 Thread T.
Hi,

> No one else using pacemaker and heartbeat on CentOS 6.4?
no, I switched to corosync/pacemaker, but it does not have only advantages.

For me, the configuration is much more powerful, but also more
complicated via the crm shell.
-- 

Chau y hasta luego,

Thorolf


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-17 Thread Andreas Mock
Hi all,

thank you for your hints. 

Can you please point me to a repository where I can find
crmsh fitting to RHEL6.4 or clones?

Best regards
Andreas Mock



-Original Message-
From: Vadym Chepkov [mailto:vchep...@gmail.com] 
Sent: Wednesday, April 17, 2013 18:13
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] pcs equivalent of crm configure erase


On Apr 17, 2013, at 11:57 AM, T. wrote:

> Hi,
> 
>> b) If I can't do it with pcs, is there a reliable and secure way to 
>> do it with pacemaker low level tools?
> why not just installing the crmsh from a different repository?
> 
> This is what I have done on CentOS 6.4.

My sentiments exactly. And "erase" is not the most important missed
functionality. 
crm configure save, crm configure load (update | replace) is what made
configurations easily manageable and trackable with a version control
software.

Cheers,
Vadym




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] CentOS 6.4 - pacemaker 1.1.8 - heartbeat

2013-04-17 Thread Andreas Mock
Hi Thorolf,

both solutions, heartbeat + pacemaker and corosync + pacemaker, 
use pacemaker, which can be configured using the crm shell.
So I don't understand why the usage of the crm shell in your
case is more complicated (besides the fact that you can only
build a two-node cluster with heartbeat).


Best regards
Andreas Mock


-Original Message-
From: T. [mailto:nos...@godawa.de] 
Sent: Wednesday, April 17, 2013 18:47
To: pacema...@clusterlabs.org
Subject: Re: [Pacemaker] CentOS 6.4 - pacemaker 1.1.8 - heartbeat

Hi,

> No one else using pacemaker and heartbeat on CentOS 6.4?
no, I switched to corosync/pacemaker, but it has not only advantages.

For me, the configuration is much more powerful, but also more complicated
via the crm-shell.
-- 

Chau y hasta luego,

Thorolf


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] CentOS 6.4 - pacemaker 1.1.8 - heartbeat

2013-04-17 Thread T.
Hi,

> So, I don't understand why the usage of crm-shell in your
> case is more complicated?
because in the "past", with heartbeat (1), which I was used to, I only had to
put my resources into a file and sync it to the other node.

For me this was easier to understand, and I didn't have the config issues I
have now with the crm shell (see my other post).

But the new HA stack is much more flexible and modern than the old one, which I was
using for the last 6 years or longer.
-- 

Chau y hasta luego,

Thorolf


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-17 Thread T.
Hi,

> Can you please point me to a repository where I can find
> crmsh fitting to RHEL6.4 or clones?
I haven't looked whether there is a repo file; I just installed via RPM:

http://download.opensuse.org/repositories/network:/ha-clustering/CentOS_CentOS-6/x86_64/crmsh-1.2.5-55.3.x86_64.rpm

http://download.opensuse.org/repositories/network:/ha-clustering/CentOS_CentOS-6/x86_64/pssh-2.3.1-15.1.x86_64.rpm
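
A minimal install sketch, assuming exactly those two package files and that any
remaining dependencies are already satisfied:

rpm -Uvh \
  http://download.opensuse.org/repositories/network:/ha-clustering/CentOS_CentOS-6/x86_64/pssh-2.3.1-15.1.x86_64.rpm \
  http://download.opensuse.org/repositories/network:/ha-clustering/CentOS_CentOS-6/x86_64/crmsh-1.2.5-55.3.x86_64.rpm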


-- 
To Answer please replace "invalid" with "de" !
Zum Antworten bitte "invalid" durch "de" ersetzen !


Chau y hasta luego,

Thorolf


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] CentOS 6.4 - pacemaker 1.1.8 - heartbeat

2013-04-17 Thread Andreas Mock
Hi Thorolf,

ah, ok. You meant heartbeat 1. Yes, this is really pre-pacemaker time  ;-)

Best regards
Andreas


-Original Message-
From: T. [mailto:nos...@godawa.de] 
Sent: Wednesday, April 17, 2013 21:41
To: pacema...@clusterlabs.org
Subject: Re: [Pacemaker] CentOS 6.4 - pacemaker 1.1.8 - heartbeat

Hi,

> So, I don't understand why the usage of crm-shell in your case is more 
> complicated?
because in the "past", with the heartbeat (1) I was used, I only had to put
my resources into a file and sync it to the other node.

For me this was easier to understand and I hadn't the config issues I have
now with the crm shell (see my other post).

But the new HA is much more flexible and modern, than the old one, I was
using for the last 6 years or longer.
-- 

Chau y hasta luego,

Thorolf


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-17 Thread Andreas Mock
Thank you for the links.

Best regards
Andreas Mock


-Original Message-
From: T. [mailto:nos...@godawa.de] 
Sent: Wednesday, April 17, 2013 21:44
To: pacema...@clusterlabs.org
Subject: Re: [Pacemaker] pcs equivalent of crm configure erase

Hi,

> Can you please point me to a repository where I can find crmsh fitting 
> to RHEL6.4 or clones?
haven't looked if there is a repo-file, I just installed via RPM:

http://download.opensuse.org/repositories/network:/ha-clustering/CentOS_CentOS-6/x86_64/crmsh-1.2.5-55.3.x86_64.rpm

http://download.opensuse.org/repositories/network:/ha-clustering/CentOS_CentOS-6/x86_64/pssh-2.3.1-15.1.x86_64.rpm


--
To Answer please replace "invalid" with "de" !
Zum Antworten bitte "invalid" durch "de" ersetzen !


Chau y hasta luego,

Thorolf


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] CentOS 6.4 - pacemaker 1.1.8 - heartbeat

2013-04-17 Thread Andrew Beekhof

On 18/04/2013, at 5:41 AM, T.  wrote:

> Hi,
> 
>> So, I don't understand why the usage of crm-shell in your
>> case is more complicated?
> because in the "past", with the heartbeat (1) I was used, I only had to
> put my resources into a file and sync it to the other node.
> 
> For me this was easier to understand and I hadn't the config issues I
> have now with the crm shell (see my other post).

Seems appropriate :)

  http://blog.clusterlabs.org/blog/2009/configuring-heartbeat-v1-was-so-simple/

> 
> But the new HA is much more flexible and modern, than the old one, I was
> using for the last 6 years or longer.
> -- 
> 
> Chau y hasta luego,
> 
> Thorolf
> 
> 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-17 Thread Andrew Beekhof

On 18/04/2013, at 2:13 AM, Vadym Chepkov  wrote:

> 
> On Apr 17, 2013, at 11:57 AM, T. wrote:
> 
>> Hi,
>> 
>>> b) If I can't do it with pcs, is there a reliable
>>> and secure way to do it with pacemaker low level tools?
>> why not just installing the crmsh from a different repository?
>> 
>> This is what I have done on CentOS 6.4.
> 
> My sentiments exactly. And "erase" is not the most important missed 
> functionality. 
> crm configure save, crm configure load (update | replace) is what made 
> configurations easily manageable 
> and trackable with a version control software.

I'm sure Chris is listening.
Maybe he knows of some way to approximate this behaviour already.

> 
> Cheers,
> Vadym
> 
> 
> 
> 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-17 Thread Chris Feist

On 04/17/13 11:13, Vadym Chepkov wrote:


On Apr 17, 2013, at 11:57 AM, T. wrote:


Hi,


b) If I can't do it with pcs, is there a reliable
and secure way to do it with pacemaker low level tools?

why not just installing the crmsh from a different repository?

This is what I have done on CentOS 6.4.


My sentiments exactly. And "erase" is not the most important missed 
functionality.
crm configure save, crm configure load (update | replace) is what made 
configurations easily manageable
and trackable with a version control software.


There is currently a command in pcs ('pcs cluster cib' & 'pcs cluster push cib') 
to save and replace the current CIB; however, it will save the actual XML from 
the CIB, so reading/editing the file might be a little more complicated than 
the output from 'crm configure save'.
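
For example, a round trip along those lines might look like this (file name is
arbitrary; exact argument handling may differ between pcs versions):

pcs cluster cib cluster.xml          # dump the raw CIB XML to a file
# edit cluster.xml (or keep it in version control), then:
pcs cluster push cib cluster.xml     # replace the running CIB with the edited file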


Thanks!
Chris



Cheers,
Vadym








___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Patch] An error may occur to be behind with a stop of pingd.

2013-04-17 Thread renayama19661014
Hi All,

I sent the pull request of this patch.

 * https://github.com/ClusterLabs/pacemaker-1.0/pull/13

Best Regards,
Hideo Yamauchi.

--- On Wed, 2013/4/10, renayama19661...@ybb.ne.jp  
wrote:

> Hi All,
> 
> We confirmed a phenomenon where an error is generated because the stop of 
> pingd is delayed.
> 
> The problem seems to be that pingd does not receive SIGTERM until the 
> stand_alone_ping processing is completed.
> 
> 
> Apr 11 00:48:33 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
> 192.168.40.1 is unreachable (read)
> Apr 11 00:48:36 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
> 192.168.40.1 is unreachable (read)
> Apr 11 00:48:39 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
> 192.168.40.1 is unreachable (read)
> Apr 11 00:48:42 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
> 192.168.40.1 is unreachable (read)
> Apr 11 00:48:45 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
> 192.168.40.1 is unreachable (read)
> Apr 11 00:48:48 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
> 192.168.40.1 is unreachable (read)
> (snip)
> Apr 11 00:48:50 rh64-heartbeat1 heartbeat: [2413]: info: killing 
> /usr/lib64/heartbeat/crmd process group 2427 with signal 15
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: crm_signal_dispatch: 
> Invoking handler for signal 15: Terminated
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: crm_shutdown: Requesting 
> shutdown
> (snip)
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: te_rsc_command: 
> Initiating action 9: stop prmPingd:0_stop_0 on rh64-heartbeat1 (local)
> Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: cancel_op: operation 
> monitor[5] on prmPingd:0 for client 2427, its parameters: CRM_meta_clone=[0] 
> host_list=[192.168.40.1] name=[default_ping_set] attempts=[2] 
> CRM_meta_clone_node_max=[1] CRM_meta_clone_max=[1] CRM_meta_notify=[false] 
> CRM_meta_globally_unique=[false] crm_feature_set=[3.0.1] interval=[1] 
> timeout=[2] CRM_meta_on_fail=[restart] CRM_meta_name=[monitor] 
> multiplier=[100] CRM_meta_interval=[1] CRM_meta_timeout=[6]  cancelled
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: do_lrm_rsc_op: Performing 
> key=9:4:0:948901c2-4e97-4715-9f6b-1611810f8ef7 op=prmPingd:0_stop_0 )
> Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: rsc:prmPingd:0 stop[9] 
> (pid 2570)
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: process_lrm_event: LRM 
> operation prmPingd:0_monitor_1 (call=5, status=1, cib-update=0, 
> confirmed=true) Cancelled
> Apr 11 00:48:50 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
> 192.168.40.1 is unreachable (read)
> Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: operation stop[9] on 
> prmPingd:0 for client 2427: pid 2570 exited with return code 0
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: process_lrm_event: LRM 
> operation prmPingd:0_stop_0 (call=9, rc=0, cib-update=59, confirmed=true) ok
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: match_graph_event: Action 
> prmPingd:0_stop_0 (9) confirmed on rh64-heartbeat1 (rc=0)
> (snip)
> Apr 11 00:48:50 rh64-heartbeat1 heartbeat: [2413]: info: killing 
> /usr/lib64/heartbeat/ccm process group 2422 with signal 15
> Apr 11 00:48:50 rh64-heartbeat1 ccm: [2422]: info: received SIGTERM, going to 
> shut down
> Apr 11 00:48:51 rh64-heartbeat1 pingd: [2505]: ERROR: send_ipc_message: IPC 
> Channel to 2426 is not connected                        ---> ERROR
> Apr 11 00:48:51 rh64-heartbeat1 pingd: [2505]: info: attrd_update: Could not 
> send update: default_ping_set=0 for localhost
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBWRITE 
> process 2418 with signal 15
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBREAD 
> process 2419 with signal 15
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBFIFO 
> process 2417 with signal 15
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2417 
> exited. 3 remaining
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2418 
> exited. 2 remaining
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2419 
> exited. 1 remaining
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: rh64-heartbeat1 
> Heartbeat shutdown complete.
> Apr 11 00:48:53 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
> Connecting to cluster... 4 retries remaining                > Pingd 
> do not yet stop
> Apr 11 00:48:55 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
> Connecting to cluster... 3 retries remaining
> Apr 11 00:48:57 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
> Connecting to cluster... 2 retries remaining
> Apr 11 00:48:59 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
> Connecting to cluster

Re: [Pacemaker] [Problem][crmsh]The designation of the 'ordered' attribute becomes the error.

2013-04-17 Thread renayama19661014
Hi Dejan,
Hi Andreas,

> The shell in pacemaker v1.0.x is in maintenance mode and shipped
> along with the pacemaker code. The v1.1.x doesn't have the
> ordered and collocated meta attributes.

I sent the pull request of the patch which Mr. Dejan donated.
 * https://github.com/ClusterLabs/pacemaker-1.0/pull/14

Many Thanks!
Hideo Yamauchi.
--- On Tue, 2013/4/2, Dejan Muhamedagic  wrote:

> Hi,
> 
> On Mon, Apr 01, 2013 at 09:19:51PM +0200, Andreas Kurz wrote:
> > Hi Dejan,
> > 
> > On 2013-03-06 11:59, Dejan Muhamedagic wrote:
> > > Hi Hideo-san,
> > > 
> > > On Wed, Mar 06, 2013 at 10:37:44AM +0900, renayama19661...@ybb.ne.jp 
> > > wrote:
> > >> Hi Dejan,
> > >> Hi Andrew,
> > >>
> > >> As for the crm shell, the check of the meta attribute was revised with 
> > >> the next patch.
> > >>
> > >>  * http://hg.savannah.gnu.org/hgweb/crmsh/rev/d1174f42f4b3
> > >>
> > >> This patch was backported in Pacemaker1.0.13.
> > >>
> > >>  * 
> > >>https://github.com/ClusterLabs/pacemaker-1.0/commit/fa1a99ab36e0ed015f1bcbbb28f7db962a9d1abc#shell/modules/cibconfig.py
> > >>
> > >> However, the ordered,colocated attribute of the group resource is 
> > >> treated as an error when I use crm Shell which adopted this patch.
> > >>
> > >> --
> > >> (snip)
> > >> ### Group Configuration ###
> > >> group master-group \
> > >>         vip-master \
> > >>         vip-rep \
> > >>         meta \
> > >>                 ordered="false"
> > >> (snip)
> > >>
> > >> [root@rh63-heartbeat1 ~]# crm configure load update test2339.crm 
> > >> INFO: building help index
> > >> crm_verify[20028]: 2013/03/06_17:57:18 WARN: unpack_nodes: Blind faith: 
> > >> not fencing unseen nodes
> > >> WARNING: vip-master: specified timeout 60s for start is smaller than the 
> > >> advised 90
> > >> WARNING: vip-master: specified timeout 60s for stop is smaller than the 
> > >> advised 100
> > >> WARNING: vip-rep: specified timeout 60s for start is smaller than the 
> > >> advised 90
> > >> WARNING: vip-rep: specified timeout 60s for stop is smaller than the 
> > >> advised 100
> > >> ERROR: master-group: attribute ordered does not exist  -> WHY?
> > >> Do you still want to commit? y
> > >> --
> > >>
> > >> If I choose `yes` at the confirmation message, it is applied, but it 
> > >> is a problem that an error message is displayed.
> > >>  * The error occurs in the same way when I specify the colocated attribute.
> > >> And I noticed that there was no explanation of ordered/colocated for 
> > >> the group resource in the online help of Pacemaker.
> > >>
> > >> I think that specifying the ordered,colocated attributes should 
> > >> not cause an error for a group resource.
> > >> In addition, I think that ordered,colocated should be added to the online 
> > >> help.
> > > 
> > > These attributes are not listed in crmsh. Does the attached patch
> > > help?
> > 
> > Dejan, will this patch for the missing "ordered" and "collocated" group
> > meta-attribute be included in the next crmsh release? ... can't see the
> > patch in the current tip.
> 
> The shell in pacemaker v1.0.x is in maintenance mode and shipped
> along with the pacemaker code. The v1.1.x doesn't have the
> ordered and collocated meta attributes.
> 
> Thanks,
> 
> Dejan
> 
> 
> > Thanks & Regards,
> > Andreas
> > 
> > > 
> > > Thanks,
> > > 
> > > Dejan
> > >>
> > >> Best Regards,
> > >> Hideo Yamauchi.
> > >>
> > >>
> > 
> > 
> > 
> 
> 
> 
> 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailma

Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-17 Thread Vadym Chepkov

On Apr 17, 2013, at 8:04 PM, Chris Feist wrote:

> On 04/17/13 11:13, Vadym Chepkov wrote:
>> 
>> On Apr 17, 2013, at 11:57 AM, T. wrote:
>> 
>>> Hi,
>>> 
 b) If I can't do it with pcs, is there a reliable
 and secure way to do it with pacemaker low level tools?
>>> why not just installing the crmsh from a different repository?
>>> 
>>> This is what I have done on CentOS 6.4.
>> 
>> My sentiments exactly. And "erase" is not the most important missed 
>> functionality.
>> crm configure save, crm configure load (update | replace) is what made 
>> configurations easily manageable
>> and trackable with a version control software.
> 
> There is currently a command in pcs ('pcs cluster cib' & 'pcs cluster push 
> cib') to save and replace the current cib, however it will save the actual 
> xml from the cib, so reading/editing the file might be a little more 
> complicated than output from 'crm configure save'.

I might be missing something, but how is it different from the old dark cibadmin 
days ;) ?

Thanks,
Vadym




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How to display interface link status in corosync

2013-04-17 Thread Yuichi SEINO
Hi,

2013/4/15 Andrew Beekhof :
>
> On 15/04/2013, at 3:38 PM, Yuichi SEINO  wrote:
>
>> Hi,
>>
>> 2013/4/8 Andrew Beekhof :
>>> I'm not 100% sure what the best approach is here.
>>>
>>> Traditionally this is done with resource agents (ie. ClusterMon or ping) 
>>> which update attrd.
>>> We could potentially build it into attrd directly, but then we'd need to 
>>> think about how to turn it on/off.
>>>
>>> I think I'd lean towards a new agent+daemon or a new daemon launched by 
>>> ClusterMon.
>> I check to see if I implement this function by a new agent+daemon.
>> I have a question. I am not sure how to launch daemon by ClusterMon.
>> Do you mean to use "crm_mon -E"?
>
> No. I mean the same way the Apache agent starts httpd.
The apache RA has a parameter with which a binary path can be specified, so that
RA may be able to launch another daemon.
However, it seems that ClusterMon doesn't have a parameter with which a
binary path can be specified.
Can you launch another daemon without using such a parameter?

Sincerely,
Yuichi

>
>>
>> Sincerely,
>> Yuichi
>>
>>>
>>> On 04/04/2013, at 8:59 PM, Yuichi SEINO  wrote:
>>>
 Hi All,

 I want to display interface link status in corosync. So, I think that
 I will add this function to the part of "pacemakerd".
 I am going to display this status to "Node Attributes"  in crm_mon.
 When the state of link change, corosync can run the callback function.
 When it happens, we update attributes. And, this function need to
 start after "attrd" started. "pacemakerd" of mainloop start after
 sub-process started. So, I think that this is the best timing.

 I show the expected crm_mon.

 # crm_mon -fArc1
 Last updated: Thu Apr  4 08:08:08 2013
 Last change: Wed Apr  3 04:15:48 2013 via crmd on coro-n2
 Stack: corosync
 Current DC: coro-n1 (168427526) - partition with quorum
 Version: 1.1.9-c791037
 2 Nodes configured, unknown expected votes
 2 Resources configured.


 Online: [ coro-n1 coro-n2 ]

 Full list of resources:

 Clone Set: OFclone [openstack-fencing]
Started: [ coro-n1 coro-n2 ]

 Node Attributes:
 * Node coro-n1:
   + ringnumber(0)   : 10.10.0.6 is FAULTY
   + ringnumber(1)   : 10.20.0.6 is UP
 * Node coro-n2:
   + ringnumber(0)   : 10.10.0.7 is FAULTY
   + ringnumber(1)   : 10.20.0.7 is UP

 Migration summary:
 * Node coro-n2:
 * Node coro-n1:

 Tickets:


 Sincerely,
 Yuichi

>>



--
Yuichi SEINO
METROSYSTEMS CORPORATION
E-mail:seino.clust...@gmail.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org