Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-16 Thread Andreas Mock
Hi all,

I'd like to bring this topic up once again because
it's still unresolved for me:

a) How can I do the equivalent of 'crm configure erase'
in pcs? Is there a way?

b) If I can't do it with pcs, is there a reliable
and secure way to do it with pacemaker low level tools?

Thank you in advance.

Best regards
Andreas


-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net] 
Sent: Monday, April 15, 2013 05:49
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] pcs equivalent of crm configure erase


On 14/04/2013, at 5:52 PM, Andreas Mock andreas.m...@web.de wrote:

 Hi all,
  
 can someone tell me what the pcs equivalent to
 crm configure erase is?
  
 Is there a pcs cheat sheet showing the common tasks?
 Or a documentation?

pcs help should be reasonably informative, but I don't see anything
equivalent
Chris?

  
 Best regards
 Andreas
  




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] cman+pacemaker - way to join/leave cluster?

2013-04-16 Thread Yuriy Demchenko

Hi,

Can anyone point me to the correct procedure for adding/removing a node 
to/from a cluster on the cman+pacemaker stack, preferably without stopping 
the whole cluster?
pcs cluster node add|remove doesn't work, as there's no pcsd daemon 
in CentOS 6.4; the cman tools (cman_tool, ccs_tool, ccs_sync) don't work 
either, as they expect a configured PKI infrastructure, as the ricci 
service does...


I can just edit the CIB & cluster.conf and remove/add the node by hand, but it 
seems like a crude way - it requires a restart of cman & pacemaker on all nodes


--
Yuriy Demchenko


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] cman+pacemaker - way to join/leave cluster?

2013-04-16 Thread Daniel Black

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html-single/Pacemaker_Explained/index.html#_options_2

example 2.4

# cibadmin --query > tmp.xml
# vi tmp.xml
# cibadmin --replace --xml-file tmp.xml


- Original Message -
 Hi,
 
 Can anyone point me to the correct procedure for adding/removing a node
 to/from a cluster on the cman+pacemaker stack, preferably without
 stopping the whole cluster?
 pcs cluster node add|remove doesn't work, as there's no pcsd daemon
 in CentOS 6.4; the cman tools (cman_tool, ccs_tool, ccs_sync) don't work
 either, as they expect a configured PKI infrastructure, as the ricci
 service does...
 
 I can just edit the CIB & cluster.conf and remove/add the node by hand,
 but it seems like a crude way - requires a restart of cman & pacemaker
 on all nodes
 
 --
 Yuriy Demchenko
 
 

-- 
Daniel Black, Engineer @ Open Query (http://openquery.com)
Remote expertise  maintenance for MySQL/MariaDB server environments.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] cman+pacemaker - way to join/leave cluster?

2013-04-16 Thread Yuriy Demchenko

Yes, that's what I meant by editing the CIB & cluster.conf by hand.

However, there are two problems with it:
1. it requires a restart of cman on each node to forget the expelled node 
after the cluster.conf edit
2. after editing the CIB (removing all entries belonging to the expelled node) 
and replacing it via pcs cluster push cib.xml (or cibadmin --replace, 
doesn't matter), the cluster DC still automatically re-inserts node and 
node_state entries for the expelled node, which I cannot clean up:
<node_state id="node-2" uname="node-2" in_ccm="false" 
crmd="offline" join="down" expected="down" 
crm-debug-origin="do_cib_replaced"/>
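
I wonder whether telling Pacemaker itself to forget the node would help; a 
hedged sketch (untested on the CMAN stack in CentOS 6.4; the node name is an 
example):

# Remove the expelled node's node and node_state entries from the CIB:
crm_node --force --remove node-2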



Yuriy Demchenko

On 04/16/2013 12:16 PM, Daniel Black wrote:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html-single/Pacemaker_Explained/index.html#_options_2

example 2.4

# cibadmin --query > tmp.xml
# vi tmp.xml
# cibadmin --replace --xml-file tmp.xml


- Original Message -

Hi,

Can anyone point me to the correct procedure of adding/removing node
to/from cluster in cman+pacemaker stack, preferably without stopping
whole cluster?
pcs cluster node add|remove doesn't work, as there's no pcsd
daemon
in centos6.4; cman tools (cman_tool, ccs_tool, ccs_sync) - doesn't
work
either, as it expects configured pki infrastructure as ricci service
does...

I can just edit the CIB & cluster.conf and remove/add the node by hand, but
it seems like a crude way - requires a restart of cman & pacemaker on all
nodes

--
Yuriy Demchenko





___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] resource-stickness-issue

2013-04-16 Thread ravindra.rauthan
Hi All,
I have created a cluster with these versions on Fedora 17:
pacemaker-1.1.7-2.fc17.x86_64
corosync-2.0.0-1.fc17.x86_64

Everything is working fine for me except resource stickiness.

Any ideas on this?
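
For reference, a hedged example of how stickiness is usually configured as a 
resource default (crmsh syntax; the score is an example, not a recommendation):

# Make resources prefer to stay on their current node:
crm configure rsc_defaults resource-stickiness=100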


Regards,
Rauthan




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-16 Thread Rasto Levrinc
On Tue, Apr 16, 2013 at 9:38 AM, Andreas Mock andreas.m...@web.de wrote:
 Hi all,

 I'd like to bring this topic up once again because
 it's still unresolved for me:

 a) How can I do the equivalent of 'crm configure erase'
 in pcs? Is there a way?

 b) If I can't do it with pcs, is there a reliable
 and secure way to do it with pacemaker low level tools?

I don't think so. cibadmin has a drastic version of erase, but that is
probably not what you want. If you don't want to use any higher-level
tools, the best way is probably to loop and use pcs to remove the
resources, since it also removes the constraints; I'm not sure about other
objects.

something like:

for r in `crm_resource -l`; do pcs resource delete $r; done

But test it first, I haven't used pcs myself yet.
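
For example, you can preview what the loop would do by echoing the commands 
first (a hedged illustration):

# Dry run: print the delete commands without executing them
for r in `crm_resource -l`; do echo pcs resource delete $r; done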

Rasto

-- 
Dipl.-Ing. Rastislav Levrinc
rasto.levr...@gmail.com
Linux Cluster Management Console
http://lcmc.sf.net/

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] pcs: Return code handling not clean

2013-04-16 Thread Andreas Mock
Hi all,

as I don't really know where to address this
issue, I'm posting it here - on the one hand
as information for people scripting with the
help of 'pcs', and on the other hand in the
hope that a maintainer is listening
and will have a look at this.

Problem: when the cluster is down, a 'pcs resource'
shows an error message coming from a subprocess
call of 'crm_resource -L' but exits with an
error code of 0. That's something which can
be improved, especially since the Python code
does have error handling in other places.

So I guess it is a simple oversight.

Look at the following piece of code in
pcs/resource.py:

915 if len(argv) == 0:
916     args = ["crm_resource", "-L"]
917     output,retval = utils.run(args)
918     preg = re.compile(r'.*(stonith:.*)')
919     for line in output.split('\n'):
920         if not preg.match(line) and line != "":
921             print line
922     return

retval is totally ignored, while it is handled in
other places. As a result, the script always
returns with status 0.

Interestingly, the error handling of the utils.run call
used all over the module is IMHO a little bit inconsistent.
If I remember correctly, Andrew made some efforts in the
past to define a set of return codes coming from the
base cibXXX and crm_XXX tools (I really don't know
how finely they are differentiated). Why not pass them
through?
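
As an illustration, a minimal shell sketch of the desired pass-through 
behaviour (not the actual pcs code; the stonith filtering mirrors the 
snippet above):

output=$(crm_resource -L 2>&1); rc=$?
if [ $rc -ne 0 ]; then
    # Surface the message and crm_resource's own return code
    echo "$output" >&2
    exit $rc
fi
echo "$output" | grep -v 'stonith:'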

Best regards
Andreas Mock




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Pacemaker configuration with different dependencies

2013-04-16 Thread Ivor Prebeg
Hi guys,

I need some help with pacemaker configuration; it is all new to me and I can't 
find a solution...

I have a two-node HA environment with services that I want to be partially 
independent, in a pacemaker/heartbeat configuration.

There is an active/active sip service with two floating IPs; it should just 
migrate the floating IP when one sip instance dies.

There are also two active/active master/slave services, a java container and 
an rdbms with replication between them, which should also fail over when one 
dies.

What I can't figure out is how to configure those two to be independent (put 
an on-fail directive on the group). What I want is, e.g., in case my sip 
service fails, the java container stays active on that node, but the floating 
ip is moved to the other node.

Another thing: in case one of the rdbms instances fails, I want to put the 
whole service group on that node into standby, but leave the sip service 
intact.

The whole node should go to standby (all services down) only when the L3_ping 
to the gateway dies.

All suggestions and configuration examples are welcome.

Thanks in advance.


Ivor Prebeg




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker configuration with different dependencies

2013-04-16 Thread Andreas Mock
Hi Ivor,

 

I don't know whether I understand you completely right:

If you want independence of resources, don't put them into a group.

Look at
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Pacemaker_Explained/ch10.html

A group is made to tie together several resources without
declaring all the necessary colocations and orderings to get
the desired behaviour.
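
As an illustration, a hedged crmsh sketch of roughly what a two-member group 
implies (resource names are examples, not from your configuration):

# A group of (sipService, floatingIP) behaves roughly like:
crm configure colocation ip_with_sip inf: floatingIP sipService
crm configure order sip_before_ip inf: sipService floatingIP

Declaring only the constraints you actually want keeps the resources 
otherwise independent.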

 

Otherwise, name your resources and show how they should be spread across
your cluster (show the technical dependency).
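
For the L3 ping requirement, a common pattern (a hedged sketch in crmsh 
syntax; the names, gateway IP and scores are examples) is a cloned ping 
resource plus a location rule:

crm configure primitive p_ping ocf:pacemaker:ping \
    params host_list="192.168.1.1" multiplier="1000" \
    op monitor interval="10s"
crm configure clone cl_ping p_ping
# Keep resources off nodes that cannot reach the gateway:
crm configure location loc_need_gateway yourGroup \
    rule -inf: not_defined pingd or pingd lte 0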

 

Best regards

Andreas

 

 

From: Ivor Prebeg [mailto:ivor.pre...@gmail.com] 
Sent: Tuesday, April 16, 2013 13:53
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] Pacemaker configuration with different dependencies

 

Hi guys,

I need some help with pacemaker configuration; it is all new to me and I can't
find a solution...

I have a two-node HA environment with services that I want to be partially
independent, in a pacemaker/heartbeat configuration.

There is an active/active sip service with two floating IPs; it should just
migrate the floating IP when one sip instance dies.

There are also two active/active master/slave services, a java container and
an rdbms with replication between them, which should also fail over when one
dies.

What I can't figure out is how to configure those two to be independent (put
an on-fail directive on the group). What I want is, e.g., in case my sip
service fails, the java container stays active on that node, but the floating
ip is moved to the other node.

Another thing: in case one of the rdbms instances fails, I want to put the
whole service group on that node into standby, but leave the sip service
intact.

The whole node should go to standby (all services down) only when the L3_ping
to the gateway dies.

 

All suggestions and configuration examples are welcome.

Thanks in advance.

 

Ivor Prebeg

 

 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-16 Thread Andreas Mock
Hi Rastislav,

thank you for your hints.

In this case, to rely only on pcs, I could
probably use the following to get the list
of resources:

pcs resource show --all | perl -M5.010 -ane 'say $F[1] if $F[0] eq "Resource:"'

Best regards
Andreas Mock



-----Original Message-----
From: Rasto Levrinc [mailto:rasto.levr...@gmail.com] 
Sent: Tuesday, April 16, 2013 10:45
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] pcs equivalent of crm configure erase

On Tue, Apr 16, 2013 at 9:38 AM, Andreas Mock andreas.m...@web.de wrote:
 Hi all,

 I'd like to bring this topic up once again because
 it's still unresolved for me:

 a) How can I do the equivalent of 'crm configure erase'
 in pcs? Is there a way?

 b) If I can't do it with pcs, is there a reliable
 and secure way to do it with pacemaker low level tools?

I don't think so. cibadmin has a drastic version of erase, but that is
probably not what you want. If you don't want to use any higher-level
tools, the best way is probably to loop and use pcs to remove the
resources, since it also removes the constraints; I'm not sure about other
objects.

something like:

for r in `crm_resource -l`; do pcs resource delete $r; done

But test it first, I haven't used pcs myself yet.

Rasto

-- 
Dipl.-Ing. Rastislav Levrinc
rasto.levr...@gmail.com
Linux Cluster Management Console
http://lcmc.sf.net/



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-16 Thread pavan tc
On Fri, Apr 12, 2013 at 9:27 AM, pavan tc pavan...@gmail.com wrote:

  Absolutely none in the syslog. Only the regular monitor logs from my
 resource agent which continued to report as secondary.


 This is very strange, because the thing that caused the I_PE_CALC is a
 timer that goes off every 15 minutes.
 Which would seem to imply that there was a transition of some kind about
 when the failure happened - but somehow it didn't go into the logs.

 Could you post the complete logs from 14:00 to 14:30?


 Sure. Here goes. Attached are two logs and corosync.conf -
 1. syslog (Edited, messages from other modules removed. I have not touched
 the pacemaker/corosync related messages)
 2. corosync.log (Unedited)
 3. corosync.conf

 Wanted to mention a couple of things:
 -- 14:06 is when the system was coming back up from a reboot. I have
 started from the earliest message during boot to the point the I_PE_CALC
 timer popped and a promote was called.
 -- I see the following during boot up. Does that mean pacemaker did not
 start?
 Apr 10 14:06:26 corosync [pcmk  ] info: process_ais_conf: Enabling MCP
 mode: Use the Pacemaker init script to complete Pacemaker startup

 Could that contribute to any of this behaviour?

 I'll be glad to provide any other information.


Did anybody get a chance to look at the information attached in the
previous email?

Thanks,
Pavan



 Pavan


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] warning: reload: operation not recognized

2013-04-16 Thread Dejan Muhamedagic
Hi,

On Tue, Apr 09, 2013 at 08:23:43PM +, Xavier Lashmar wrote:
 Hi,
 
 I am investigating XML parsing errors appearing in my 
 /var/log/cluster/corosync.log files.  While doing so, I've been using the crm 
 tool to 'view' my configuration and I get the following interesting errors:
 
 crm(live)# configure
 WARNING: reload: operation not recognized

It's just not in the list of possible operations. And that check
seems to be misplaced a bit. Will fix that.

Thanks,

Dejan

 --- pacemaker / corosync versions ---
 
 # rpm -qa | grep pacemaker
 pacemaker-cluster-libs-1.1.8-7.el6.x86_64
 pacemaker-1.1.8-7.el6.x86_64
 pacemaker-libs-1.1.8-7.el6.x86_64
 pacemaker-cli-1.1.8-7.el6.x86_64
 
 # rpm -qa | grep corosync
 corosync-1.4.1-15.el6.x86_64
 corosynclib-1.4.1-15.el6.x86_64
 
 ---
 
 Any ideas what might be causing the 'operation not recognized' - these 
 commands work perfectly well on another cluster running pacemaker 1.1.7-6 and 
 corosync 1.4.1-7.  For example:
 
 # crm
 crm(live)# configure
 crm(live)configure#
 
 
 
 
 
 
 
 Xavier Lashmar
 Analyste de Systèmes | Systems Analyst
 Service étudiants, service de l'informatique et des communications/Student 
 services, computing and communications services.
 1 Nicholas Street (810)
 Ottawa ON K1N 7B7
 Tél. | Tel. 613-562-5800 (2120)
 
 
 
 
 







___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pcs: Return code handling not clean

2013-04-16 Thread Chris Feist

On 04/16/13 06:46, Andreas Mock wrote:

Hi all,

as I don't really know where to address this
issue, I'm posting it here - on the one hand
as information for people scripting with the
help of 'pcs', and on the other hand in the
hope that a maintainer is listening
and will have a look at this.

Problem: when the cluster is down, a 'pcs resource'
shows an error message coming from a subprocess
call of 'crm_resource -L' but exits with an
error code of 0. That's something which can
be improved, especially since the Python code
does have error handling in other places.

So I guess it is a simple oversight.

Look at the following piece of code in
pcs/resource.py:

915 if len(argv) == 0:
916     args = ["crm_resource", "-L"]
917     output,retval = utils.run(args)
918     preg = re.compile(r'.*(stonith:.*)')
919     for line in output.split('\n'):
920         if not preg.match(line) and line != "":
921             print line
922     return

retval is totally ignored, while it is handled in
other places. As a result, the script always
returns with status 0.


This is an oversight on my part; I've updated the code to check retval and 
return an error.  Currently I'm not passing through the full error code (I'm 
only returning 0 on success and 1 on failure).  However, if you think it would 
be useful to have this information, I would be happy to look at it and see what I 
can do.  I'm planning on eventually having pcs interpret the crm_resource error 
code and provide more user-friendly output instead of just a return code.


Thanks,
Chris



Interestingly, the error handling of the utils.run call
used all over the module is IMHO a little bit inconsistent.
If I remember correctly, Andrew made some efforts in the
past to define a set of return codes coming from the
base cibXXX and crm_XXX tools (I really don't know
how finely they are differentiated). Why not pass them
through?

Best regards
Andreas Mock








___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pcs: Return code handling not clean

2013-04-16 Thread Andrew Beekhof

On 17/04/2013, at 8:33 AM, Chris Feist cfe...@redhat.com wrote:

 On 04/16/13 06:46, Andreas Mock wrote:
 Hi all,
 
  as I don't really know where to address this
  issue, I'm posting it here - on the one hand
  as information for people scripting with the
  help of 'pcs', and on the other hand in the
  hope that a maintainer is listening
  and will have a look at this.
 
  Problem: when the cluster is down, a 'pcs resource'
  shows an error message coming from a subprocess
  call of 'crm_resource -L' but exits with an
  error code of 0. That's something which can
  be improved, especially since the Python code
  does have error handling in other places.
 
 So I guess it is a simple oversight.
 
 Look at the following piece of code in
 pcs/resource.py:
 
  915 if len(argv) == 0:
  916     args = ["crm_resource", "-L"]
  917     output,retval = utils.run(args)
  918     preg = re.compile(r'.*(stonith:.*)')
  919     for line in output.split('\n'):
  920         if not preg.match(line) and line != "":
  921             print line
  922     return
 
  retval is totally ignored, while it is handled in
  other places. As a result, the script always
  returns with status 0.
 
  This is an oversight on my part; I've updated the code to check retval and 
  return an error.  Currently I'm not passing through the full error code (I'm 
  only returning 0 on success and 1 on failure).  However, if you think it 
  would be useful to have this information, I would be happy to look at it and 
  see what I can do.  I'm planning on eventually having pcs interpret the 
  crm_resource error code and provide more user-friendly output instead of 
  just a return code.

there is a crm_perror binary that might be useful for this

 
 Thanks,
 Chris
 
 
  Interestingly, the error handling of the utils.run call
  used all over the module is IMHO a little bit inconsistent.
  If I remember correctly, Andrew made some efforts in the
  past to define a set of return codes coming from the
  base cibXXX and crm_XXX tools (I really don't know
  how finely they are differentiated). Why not pass them
  through?
 
 Best regards
 Andreas Mock
 
 
 
 
 
 
 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-16 Thread Chris Feist

On 04/14/13 02:52, Andreas Mock wrote:

Hi all,

can someone tell me what the pcs equivalent to

crm configure erase is?


From my understanding, 'crm configure erase' will remove everything from the 
configuration file except for the nodes.


Are you trying to clear your configuration out and start from scratch?

pcs has a destroy command (pcs cluster destroy), which will remove all 
pacemaker/corosync configuration and allow you to create your cluster from 
scratch.  Is this what you're looking for?


Or do you need a specific command to keep the cluster running, but reset the cib 
to its defaults?


Thanks!
Chris



Is there a pcs cheat sheet showing the common tasks?

Or a documentation?

Best regards

Andreas







___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Cleanup over secondary node

2013-04-16 Thread Daniel Bareiro
Hi Andrew.

On Monday, 15 April 2013 14:36:48 +1000,
Andrew Beekhof wrote:

  I'm testing Pacemaker+Corosync cluster with KVM virtual machines. When
  restarting a node, I got the following status:
  
  # crm status
  
  Last updated: Sun Apr 14 11:50:00 2013
  Last change: Sun Apr 14 11:49:54 2013
  Stack: openais
  Current DC: daedalus - partition with quorum
  Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
  2 Nodes configured, 2 expected votes
  8 Resources configured.
  
  
  Online: [ atlantis daedalus ]
  
  Resource Group: servicios
  fs_drbd_servicios  (ocf::heartbeat:Filesystem):Started daedalus
  clusterIP  (ocf::heartbeat:IPaddr2):   Started daedalus
  Mysql  (ocf::heartbeat:mysql): Started daedalus
  Apache (ocf::heartbeat:apache):Started daedalus
  Pure-FTPd  (ocf::heartbeat:Pure-FTPd): Started daedalus
  Asterisk   (ocf::heartbeat:asterisk):  Started daedalus
  Master/Slave Set: drbd_serviciosClone [drbd_servicios]
  Masters: [ daedalus ]
  Slaves: [ atlantis ]
  
  Failed actions:
 Asterisk_monitor_0 (node=atlantis, call=12, rc=5, status=complete): not 
  installed
  
  
  The problem is that if I do a cleanup of the Asterisk resource on the
  secondary, this has no effect. It seems that Pacemaker needs to have
  access to the config file for the resource.

 Not Pacemaker, the resource agent.
 Pacemaker runs a non-recurring monitor operation to see what state the
 service is in, it seems the asterisk agent needs that config file.
 
 I'd suggest changing the agent so that if the asterisk process is not
 running, the agent returns 7 (not running) before trying to access the
 config file.

I was reviewing the resource definition, assuming I might have made
some reference to the Asterisk configuration file there, but this was not the
case:

primitive Asterisk ocf:heartbeat:asterisk \
    params realtime="true" \
    op monitor interval="60s" \
    meta target-role="Started"

This agent is the one that is available in the resource-agents package
from Debian Backports repository:

atlantis:~# aptitude show resource-agents
Package: resource-agents
New: yes
State: installed
Automatically installed: yes
Version: 1:3.9.2-5~bpo60+1
Priority: optional
Section: admin
Maintainer: Debian HA Maintainers 
debian-ha-maintain...@lists.alioth.debian.org
Uncompressed Size: 2,228 k
Depends: libc6 (>= 2.4), libglib2.0-0 (>= 2.12.0), libnet1 (>= 1.1.2.1), 
libplumb2, libplumbgpl2, cluster-glue, python
Conflicts: cluster-agents (<= 1:1.0.4-1), rgmanager (<= 3.0.12-2+b1)
Replaces: cluster-agents (<= 1:1.0.4-1), rgmanager (<= 3.0.12-2+b1)
Description: Cluster Resource Agents
 The Cluster Resource Agents are a set of scripts to interface with several 
services to operate in a High Availability environment for both Pacemaker and
 rgmanager resource managers.
Homepage: https://github.com/ClusterLabs/resource-agents




Do you know if there is any way to get the behavior you suggested
using this agent?


Thanks for your reply.


Regards,
Daniel
-- 
Ing. Daniel Bareiro - GNU/Linux registered user #188.598
Proudly running Debian GNU/Linux with uptime:
21:54:06 up 52 days,  6:01, 11 users,  load average: 0.00, 0.02, 0.00


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Cleanup over secondary node

2013-04-16 Thread Andrew Beekhof

On 17/04/2013, at 11:28 AM, Daniel Bareiro daniel-lis...@gmx.net wrote:

 Hi Andrew.
 
 On Monday, 15 April 2013 14:36:48 +1000,
 Andrew Beekhof wrote:
 
 I'm testing Pacemaker+Corosync cluster with KVM virtual machines. When
 restarting a node, I got the following status:
 
 # crm status
 
 Last updated: Sun Apr 14 11:50:00 2013
 Last change: Sun Apr 14 11:49:54 2013
 Stack: openais
 Current DC: daedalus - partition with quorum
 Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
 2 Nodes configured, 2 expected votes
 8 Resources configured.
 
 
 Online: [ atlantis daedalus ]
 
 Resource Group: servicios
fs_drbd_servicios  (ocf::heartbeat:Filesystem):Started daedalus
clusterIP  (ocf::heartbeat:IPaddr2):   Started daedalus
Mysql  (ocf::heartbeat:mysql): Started daedalus
Apache (ocf::heartbeat:apache):Started daedalus
Pure-FTPd  (ocf::heartbeat:Pure-FTPd): Started daedalus
Asterisk   (ocf::heartbeat:asterisk):  Started daedalus
 Master/Slave Set: drbd_serviciosClone [drbd_servicios]
Masters: [ daedalus ]
Slaves: [ atlantis ]
 
 Failed actions:
   Asterisk_monitor_0 (node=atlantis, call=12, rc=5, status=complete): not 
 installed
 
 
  The problem is that if I do a cleanup of the Asterisk resource on the
  secondary, this has no effect. It seems that Pacemaker needs to have
  access to the config file for the resource.
 
 Not Pacemaker, the resource agent.
 Pacemaker runs a non-recurring monitor operation to see what state the
 service is in, it seems the asterisk agent needs that config file.
 
 I'd suggest changing the agent so that if the asterisk process is not
 running, the agent returns 7 (not running) before trying to access the
 config file.
 
 I was reviewing the resource definition assuming there I might have made
 some reference to the Asterisk configuration file, but this was not the
 case:
 
 primitive Asterisk ocf:heartbeat:asterisk \
     params realtime="true" \
     op monitor interval="60s" \
     meta target-role="Started"
 
 This agent is the one that is available in the resource-agents package
 from Debian Backports repository:
 
 atlantis:~# aptitude show resource-agents
 Package: resource-agents
 New: yes
 State: installed
 Automatically installed: yes
 Version: 1:3.9.2-5~bpo60+1
 Priority: optional
 Section: admin
 Maintainer: Debian HA Maintainers 
 debian-ha-maintain...@lists.alioth.debian.org
 Uncompressed Size: 2,228 k
 Depends: libc6 (>= 2.4), libglib2.0-0 (>= 2.12.0), libnet1 (>= 1.1.2.1), 
 libplumb2, libplumbgpl2, cluster-glue, python
 Conflicts: cluster-agents (<= 1:1.0.4-1), rgmanager (<= 3.0.12-2+b1)
 Replaces: cluster-agents (<= 1:1.0.4-1), rgmanager (<= 3.0.12-2+b1)
 Description: Cluster Resource Agents
 The Cluster Resource Agents are a set of scripts to interface with several 
 services to operate in a High Availability environment for both Pacemaker and
 rgmanager resource managers.
 Homepage: https://github.com/ClusterLabs/resource-agents
 
 
 
 
 Do you know if there is any way to get the behavior you suggested
 using this agent?

You'll have to edit it and submit the changes upstream.
If whatever it is looking for is not found when a monitor is requested, it 
should probably return 7 (STOPPED).
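
A minimal sketch of that kind of early return in an OCF monitor function 
(a hypothetical excerpt, not the actual agent code; OCF_NOT_RUNNING and 
OCF_SUCCESS come from ocf-shellfuncs):

asterisk_monitor() {
    # Report "not running" before touching any config file, so probes on
    # nodes without Asterisk installed don't fail with "not installed".
    if ! pgrep -x asterisk >/dev/null 2>&1; then
        return $OCF_NOT_RUNNING    # 7
    fi
    # ...only now read the config file and do the deeper health checks...
    return $OCF_SUCCESS
}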

 
 
 Thanks for your reply.
 
 
 Regards,
 Daniel
 -- 
 Ing. Daniel Bareiro - GNU/Linux registered user #188.598
 Proudly running Debian GNU/Linux with uptime:
 21:54:06 up 52 days,  6:01, 11 users,  load average: 0.00, 0.02, 0.00


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Question about the error when fencing failed

2013-04-16 Thread Andrew Beekhof
This should solve your issue:

https://github.com/beekhof/pacemaker/commit/dbbb6a6

On 11/04/2013, at 7:23 PM, Kazunori INOUE inouek...@intellilink.co.jp wrote:

 Hi Andrew,
 
 (13.04.08 11:04), Andrew Beekhof wrote:
 
 On 05/04/2013, at 3:21 PM, Kazunori INOUE inouek...@intellilink.co.jp 
 wrote:
 
 Hi,
 
 When fencing failed (*1) on the following conditions, an error occurs
 in stonith_perform_callback().
 
 - using fencing-topology. (*2)
 - fence DC node. ($ crm node fence dev2)
 
 Apr  3 17:04:47 dev2 stonith-ng[2278]:   notice: handle_request: Client 
 crmd.2282.b9e69280 wants to fence (reboot) 'dev2' with device '(any)'
 Apr  3 17:04:47 dev2 stonith-ng[2278]:   notice: handle_request: Forwarding 
 complex self fencing request to peer dev1
 Apr  3 17:04:47 dev2 stonith-ng[2278]: info: stonith_command: Processed 
 st_fence from crmd.2282: Operation now in progress (-115)
 Apr  3 17:04:47 dev2 pengine[2281]:  warning: process_pe_message: 
 Calculated Transition 2: /var/lib/pacemaker/pengine/pe-warn-0.bz2
 Apr  3 17:04:47 dev2 stonith-ng[2278]: info: stonith_command: Processed 
 st_query from dev1: OK (0)
 Apr  3 17:04:47 dev2 stonith-ng[2278]: info: stonith_action_create: 
 Initiating action list for agent fence_legacy (target=(null))
 Apr  3 17:04:47 dev2 stonith-ng[2278]: info: stonith_command: Processed 
 st_timeout_update from dev1: OK (0)
 Apr  3 17:04:47 dev2 stonith-ng[2278]: info: dynamic_list_search_cb: 
 Refreshing port list for f-dev1
 Apr  3 17:04:48 dev2 stonith-ng[2278]:   notice: remote_op_done: Operation 
 reboot of dev2 by dev1 for crmd.2282@dev1.4494ed41: Generic Pacemaker error
 Apr  3 17:04:48 dev2 stonith-ng[2278]: info: stonith_command: Processed 
 st_notify reply from dev1: OK (0)
 Apr  3 17:04:48 dev2 crmd[2282]:error: crm_abort: 
 stonith_perform_callback: Triggered assert at st_client.c:1894 : call_id > 0
 Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback: Bad 
 result   <st-reply st_origin="stonith_construct_reply" t="stonith-ng" 
 st_rc="-201" st_op="st_query" st_callid="0" 
 st_clientid="b9e69280-e557-478e-aa94-fd7ca6a533b1" 
 st_clientname="crmd.2282" 
 st_remote_op="4494ed41-2306-4707-8406-fa066b7f3ef0" st_callopt="0" 
 st_delegate="dev1">
 Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback: Bad 
 result <st_calldata>
 Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback: Bad 
 result   <st-reply t="st_notify" subt="broadcast" st_op="reboot" 
 count="1" src="dev1" state="4" st_target="dev2">
 Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback: Bad 
 result <st_calldata>
 Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback: Bad 
 result   <st_notify_fence state="4" st_rc="-201" st_target="dev2" 
 st_device_action="reboot" st_delegate="dev1" 
 st_remote_op="4494ed41-2306-4707-8406-fa066b7f3ef0" st_origin="dev1" 
 st_clientid="b9e69280-e557-478e-aa94-fd7ca6a533b1" 
 st_clientname="crmd.2282"/>
 Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback: Bad 
 result </st_calldata>
 Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback: Bad 
 result   </st-reply>
 Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback: Bad 
 result </st_calldata>
 Apr  3 17:04:48 dev2 crmd[2282]:error: stonith_perform_callback: Bad 
 result   </st-reply>
 Apr  3 17:04:48 dev2 crmd[2282]:  warning: stonith_perform_callback: 
 STONITH command failed: Generic Pacemaker error
 Apr  3 17:04:48 dev2 crmd[2282]:   notice: tengine_stonith_notify: Peer 
 dev2 was not terminated (st_notify_fence) by dev1 for dev1: Generic 
 Pacemaker error (ref=4494ed41-2306-4707-8406-fa066b7f3ef0) by client 
 crmd.2282
 Apr  3 17:07:11 dev2 crmd[2282]:error: stonith_async_timeout_handler: 
 Async call 2 timed out after 144000ms
 
 Is this the designed behavior?
 
 Definitely not :-(
 Is this the first fencing operation that has been initiated by the cluster?
 
 Yes.
 I attached crm_report.
 
 Or has the cluster been running for some time?
 
 
 
 Best Regards,
 Kazunori INOUE
 
 
 *1: I added "exit 1" to reset() of the stonith plugin in order to make
 fencing fail.
 
  $ diff -u libvirt.ORG libvirt
  --- libvirt.ORG 2012-12-17 09:56:37.0 +0900
  +++ libvirt 2013-04-03 16:33:08.118157947 +0900
  @@ -240,6 +240,7 @@
   ;;
 
   reset)
  +exit 1
   libvirt_check_config
   libvirt_set_domain_id $2
 
 *2:
  node $id="3232261523" dev2
  node $id="3232261525" dev1
  primitive f-dev1 stonith:external/libvirt \
  params pcmk_reboot_retries="1" hostlist="dev1" \
  hypervisor_uri="qemu+ssh://bl460g1n5/system"
  primitive f-dev2 stonith:external/libvirt \
  params pcmk_reboot_retries="1" hostlist="dev2" \
  hypervisor_uri="qemu+ssh://bl460g1n6/system"
  location rsc_location-f-dev1 f-dev1 \
  rule $id="rsc_location-f-dev1-rule" -inf: #uname eq dev1
  location rsc_location-f-dev2 f-dev2 \
  rule $id="rsc_location-f-dev2-rule" -inf: #uname eq dev2
  fencing_topology \

Re: [Pacemaker] Question about recovery policy after Too many failures to fence

2013-04-16 Thread Andrew Beekhof

On 11/04/2013, at 7:23 PM, Kazunori INOUE inouek...@intellilink.co.jp wrote:

 Hi Andrew,
 
 (13.04.08 12:01), Andrew Beekhof wrote:
 
 On 27/03/2013, at 7:45 PM, Kazunori INOUE inouek...@intellilink.co.jp 
 wrote:
 
 Hi,
 
 I'm using pacemaker-1.1 (c7910371a5. the latest devel).
 
 When fencing failed 10 times, S_TRANSITION_ENGINE state is kept.
 (related: https://github.com/ClusterLabs/pacemaker/commit/e29d2f9)
 
 How should I recover?  what kind of procedure should I make S_IDLE in?
 
 The intention was that the node should proceed to S_IDLE when this occurs, 
 so you shouldn't have to do anything and the cluster would try again once 
 the recheck-interval expired or a config change was made.
 
 I assume you're saying this does not occur?
 
 
 My understanding is that the cluster-recheck-interval timer is not active
 while in S_TRANSITION_ENGINE.
 So even though I waited for a long time, it was still S_TRANSITION_ENGINE.
 * I attached crm_report.

I think 
   https://github.com/beekhof/pacemaker/commit/ef8068e9
should fix this part of the problem.
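
In the meantime, any configuration change should also wake the DC up; for 
example (a hedged illustration - the property and value here are arbitrary):

# Any CIB configuration change nudges the DC to recompute:
crm_attribute --type crm_config --name cluster-recheck-interval --update 2min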

 
 What do I have to do in order to make the cluster retry STONITH?
 For example, do I need to run 'crmadmin -E' to change the config?
 
 
 Best Regards,
 Kazunori INOUE
 
 
 
 Mar 27 15:34:34 dev2 crmd[17937]:   notice: tengine_stonith_callback:
 Stonith operation 12/22:14:0:0927a8a0-8e09-494e-acf8-7fb273ca8c9e: Generic
 Pacemaker error (-1001)
 Mar 27 15:34:34 dev2 crmd[17937]:   notice: tengine_stonith_callback:
 Stonith operation 12 for dev2 failed (Generic Pacemaker error): aborting
 transition.
 Mar 27 15:34:34 dev2 crmd[17937]: info: abort_transition_graph:
 tengine_stonith_callback:426 - Triggered transition abort (complete=0) :
 Stonith failed
 Mar 27 15:34:34 dev2 crmd[17937]:   notice: tengine_stonith_notify: Peer
 dev2 was not terminated (st_notify_fence) by dev1 for dev2: Generic
 Pacemaker error (ref=05f75ab8-34ae-4aae-bbc6-aa20dbfdc845) by client
 crmd.17937
 Mar 27 15:34:34 dev2 crmd[17937]:   notice: run_graph: Transition 14
 (Complete=1, Pending=0, Fired=0, Skipped=8, Incomplete=0,
 Source=/var/lib/pacemaker/pengine/pe-warn-2.bz2): Stopped
 Mar 27 15:34:34 dev2 crmd[17937]:   notice: too_many_st_failures: Too many
 failures to fence dev2 (11), giving up
 
 $ crmadmin -S dev2
 Status of crmd@dev2: S_TRANSITION_ENGINE (ok)
 
 $ crm_mon
 Last updated: Wed Mar 27 15:35:12 2013
 Last change: Wed Mar 27 15:33:16 2013 via cibadmin on dev1
 Stack: corosync
 Current DC: dev2 (3232261523) - partition with quorum
 Version: 1.1.10-1.el6-c791037
 2 Nodes configured, unknown expected votes
 3 Resources configured.
 
 
 Node dev2 (3232261523): UNCLEAN (online)
 Online: [ dev1 ]
 
 prmDummy   (ocf::pacemaker:Dummy): Started dev2 FAILED
 Resource Group: grpStonith1
 prmStonith1(stonith:external/stonith-helper):  Started dev2
 Resource Group: grpStonith2
 prmStonith2(stonith:external/stonith-helper):  Started dev1
 
 Failed actions:
prmDummy_monitor_1 (node=dev2, call=23, rc=7, status=complete): not
 running
 
 
 Best Regards,
 Kazunori INOUE
 
 
 
 
 too-many-failures-to-fence.tar.bz2


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-16 Thread Andreas Mock
Hi Chris,

I would like to see something that lets you start your
pacemaker configuration (only) from scratch,
in a way that guarantees nothing is left over (constraints, etc.).

Best regards
Andreas


-----Original Message-----
From: Chris Feist [mailto:cfe...@redhat.com] 
Sent: Wednesday, April 17, 2013 00:23
To: The Pacemaker cluster resource manager
Cc: Andreas Mock
Subject: Re: [Pacemaker] pcs equivalent of crm configure erase

On 04/14/13 02:52, Andreas Mock wrote:
 Hi all,

 can someone tell me what the pcs equivalent to

 crm configure erase is?

 From my understanding, 'crm configure erase' will remove everything from
the configuration file except for the nodes.

Are you trying to clear your configuration out and start from scratch?

pcs has a destroy command (pcs cluster destroy), which will remove all
pacemaker/corosync configuration and allow you to create your cluster from
scratch.  Is this what you're looking for?

Or do you need a specific command to keep the cluster running, but reset the
cib to its defaults?

Thanks!
Chris


 Is there a pcs cheat sheet showing the common tasks?

 Or a documentation?

 Best regards

 Andreas







___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-04-16 Thread Andrew Beekhof

On 15/04/2013, at 7:08 PM, Pavlos Parissis pavlos.paris...@gmail.com wrote:

 Hoi,
 
 I upgraded 1st node and here are the logs
 https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.debuglog
 https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.debuglog
 
 Enabling tracing on the mentioned functions didn't give any more
 information, at least to me.

10:22:08 pacemakerd[53588]:   notice: crm_add_logfile: Additional logging 
available in /var/log/pacemaker.log

That's the file(s) we need :)

 
 Cheers,
 Pavlos
 
 
 On 15 April 2013 01:42, Andrew Beekhof and...@beekhof.net wrote:
 
 On 15/04/2013, at 7:31 AM, Pavlos Parissis pavlos.paris...@gmail.com wrote:
 
  On 12/04/2013 09:37 μμ, Pavlos Parissis wrote:
  Hoi,
 
  As I wrote to another post[1] I failed to upgrade to 1.1.8 for a 2 node
  cluster.
 
  Before the upgrade process both nodes are using CentOS 6.3, corosync
  1.4.1-7 and pacemaker-1.1.7.
 
  I followed the rolling upgrade process, so I stopped pacemaker and then
  corosync on node1 and upgraded to CentOS 6.4. The OS upgrade upgrades
  also pacemaker to 1.1.8-7 and corosync to 1.4.1-15.
  The upgrade of rpms went smoothly as I knew about the crmsh issue so I
  made sure I had crmsh rpm on my repos.
 
  Corosync started without any problems and both nodes could see each
  other[2]. But for some reason node2 failed to receive a reply on join
  offer from node1 and node1 never joined the cluster. Node1 formed a new
  cluster as it never got an reply from node2, so I ended up with a
  split-brain situation.
 
  Logs of node1 can be found here
  https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.log
  and of node2 here
  https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.log
 
 
  Doing a Disconnect & Reattach upgrade of both nodes at the same time
  brings me a working 1.1.8 cluster. Any attempt to make a 1.1.8 node
  join a cluster with a 1.1.7 node failed.
 
 There wasn't enough detail in the logs to suggest a solution, but if you add 
 the following to /etc/sysconfig/pacemaker and re-test, it might shed some 
 additional light on the problem.
 
 export PCMK_trace_functions=ais_dispatch_message
 
 Certainly there was no intention to make them incompatible.
 
 
  Cheers,
  Pavlos
 
 
 
 
 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pcs: Return code handling not clean

2013-04-16 Thread Andreas Mock
Hi Chris,

I just saw in the GitHub repo - which I found after posting here -
that you made a fix.

Thank you for the very fast reaction.

Best regards
Andreas

-----Original Message-----
From: Chris Feist [mailto:cfe...@redhat.com] 
Sent: Wednesday, April 17, 2013 00:34
To: The Pacemaker cluster resource manager; Andreas Mock
Subject: Re: [Pacemaker] pcs: Return code handling not clean

On 04/16/13 06:46, Andreas Mock wrote:
 Hi all,

 as I don't really know where to address this issue, I'm posting it 
 here - on the one hand as information for people scripting with 
 the help of 'pcs', and on the other hand in the hope that a 
 maintainer is listening and will have a look at this.

 Problem: when the cluster is down, a 'pcs resource'
 shows an error message coming from a subprocess call of 'crm_resource 
 -L' but exits with an error code of 0. That's something which can be 
 improved, especially since the Python code does have error handling in 
 other places.

 So I guess it is a simple oversight.

 Look at the following piece of code in
 pcs/resource.py:

 915 if len(argv) == 0:
 916     args = ["crm_resource", "-L"]
 917     output,retval = utils.run(args)
 918     preg = re.compile(r'.*(stonith:.*)')
 919     for line in output.split('\n'):
 920         if not preg.match(line) and line != "":
 921             print line
 922     return

 retval is totally ignored, while it is handled in other places. That 
 leads to the script returning with status 0.

This is an oversight on my part; I've updated the code to check retval and
return an error.  Currently I'm not passing through the full error code (I'm
only returning 0 on success and 1 on failure).  However, if you think it
would be useful to have this information, I would be happy to look at it and
see what I can do.  I'm planning on eventually having pcs interpret the
crm_resource error code and provide more user-friendly output instead of
just a return code.

Thanks,
Chris


 Interestingly, the error handling of the utils.run call used all over 
 the module is IMHO a little bit inconsistent.
 If I remember correctly, Andrew made some efforts in the past to define a 
 set of return codes coming from the base cibXXX and crm_XXX tools (I 
 really don't know how finely they are differentiated). Why not pass them 
 through?

 Best regards
 Andreas Mock








___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pacemakerd does not daemonize?

2013-04-16 Thread Andrew Beekhof

On 10/04/2013, at 2:21 PM, Andrei Belov defana...@gmail.com wrote:

 
 On Apr 10, 2013, at 2:06 , Andrew Beekhof and...@beekhof.net wrote:
 
 
 On 09/04/2013, at 4:13 PM, Andrei Belov defana...@gmail.com wrote:
 
 Hello pacemaker users,
 
 I noticed that neither the -p nor the -f option has any effect for 
 pacemakerd -
 pid_file is never used, and the -f option is marked as Legacy.
 
 Is the ability to run as a daemon disappeared completely?
 
 Is pacemakerd & insufficient?
 This is what the init script uses.
 
 That's ok, I just was a little confused by meaningless options in pacemakerd 
 --help.

I've updated it to:

[03:27 PM] beekhof@f17 ~/Development/sources/pacemaker/devel ☺ # mcp/pacemakerd 
--help
pacemakerd - Start/Stop Pacemaker

Usage: pacemakerd mode [options]
Options:
 -?, --help This text
 -$, --version  Version information
 -V, --verbose  Increase debug output
 -S, --shutdown Instruct Pacemaker to shutdown on this machine
 -F, --features Display the full version and list of features 
Pacemaker was built with

Additional Options:
 -f, --foreground   (Ignored) Pacemaker always runs in the 
foreground
 -p, --pid-file=value   (Ignored) Daemon pid file location

Report bugs to pacemaker@oss.clusterlabs.org


 
 
 Also I'd like to know if there are any reasons to worry about the following:
 
 Absolutely... four processes crashed/aborted.
 
 
 Apr 08 19:54:20 [6025] pacemakerd: info: pcmk_child_exit:   Child 
 process crmd exited (pid=6031, rc=0)
 Apr 08 19:54:20 [6025] pacemakerd: info: pcmk_child_exit:   Child 
 process pengine exited (pid=6030, rc=0)
 Apr 08 19:54:24 [6025] pacemakerd:   notice: pcmk_child_exit:   Child 
 process attrd terminated with signal 6 (pid=6029, core=128)
 Apr 08 19:54:29 [6025] pacemakerd:   notice: pcmk_child_exit:   Child 
 process lrmd terminated with signal 6 (pid=6028, core=128)
 Apr 08 19:54:33 [6025] pacemakerd:   notice: pcmk_child_exit:   Child 
 process stonith-ng terminated with signal 6 (pid=6027, core=128)
 Apr 08 19:54:38 [6025] pacemakerd:   notice: pcmk_child_exit:   Child 
 process cib terminated with signal 6 (pid=6026, core=128)
 
 Why might some helper daemons be terminated using abort()?
 
 Something _really_ bad happened.
 
 I suspect something wrong with pacemaker + libqb and QB_IPC_SOCKET.
 Would appreciate any advices - my knowledge of pacemaker/libqb internals
 is very limited.
 
 It looks like the reason for abort() is somewhere in 
 qb_ipcs_connection_unref():

This is on non-linux right?
I think Angus was of the opinion that $thing_i_cant_remember did reference 
counting a bit differently on non-linux.
I'm not sure he made much progress with it.  Can you confirm which arch this is 
before we continue?

 
 Core was generated by `/opt/local/libexec/pacemaker/attrd'.
 Program terminated with signal 6, Aborted.
 #0  0xfd7fff0e061a in _lwp_kill () from /lib/64/libc.so.1
 (gdb) bt
 #0  0xfd7fff0e061a in _lwp_kill () from /lib/64/libc.so.1
 #1  0xfd7fff0d4ddd in thr_kill () from /lib/64/libc.so.1
 #2  0xfd7fff06a971 in raise () from /lib/64/libc.so.1
 #3  0xfd7fff0400a1 in abort () from /lib/64/libc.so.1
 #4  0xfd7fff0403f5 in _assert () from /lib/64/libc.so.1
 #5  0xfd7fc021274e in qb_ipcs_connection_unref () from 
 /opt/local/lib/libqb.so.0
 #6  0x004044f9 in main ()
 
 Core was generated by `/opt/local/libexec/pacemaker/cib'.
 Program terminated with signal 6, Aborted.
 #0  0xfd7fff0f061a in _lwp_kill () from /lib/64/libc.so.1
 (gdb) bt
 #0  0xfd7fff0f061a in _lwp_kill () from /lib/64/libc.so.1
 #1  0xfd7fff0e4ddd in thr_kill () from /lib/64/libc.so.1
 #2  0xfd7fff07a971 in raise () from /lib/64/libc.so.1
 #3  0xfd7fff0500a1 in abort () from /lib/64/libc.so.1
 #4  0xfd7fff0503f5 in _assert () from /lib/64/libc.so.1
 #5  0xfd7fc021274e in qb_ipcs_connection_unref () from 
 /opt/local/lib/libqb.so.0
 #6  0x00410438 in cib_shutdown ()
 #7  0xfd7fbfc5533f in crm_signal_dispatch (source=0x49be80, 
 callback=optimized out, userdata=optimized out)
at mainloop.c:203
 #8  0xfd7fc555f9e0 in g_main_context_dispatch () from 
 /opt/local/lib/libglib-2.0.so.0
 #9  0xfd7fc555fd40 in g_main_context_iterate.isra.24 () from 
 /opt/local/lib/libglib-2.0.so.0
 #10 0xfd7fc5560152 in g_main_loop_run () from 
 /opt/local/lib/libglib-2.0.so.0
 #11 0x00411056 in cib_init ()
 #12 0x0041163e in main ()
 
 Core was generated by `/opt/local/libexec/pacemaker/lrmd'.
 Program terminated with signal 6, Aborted.
 #0  0xfd7fff0e061a in _lwp_kill () from /lib/64/libc.so.1
 (gdb) bt
 #0  0xfd7fff0e061a in _lwp_kill () from /lib/64/libc.so.1
 #1  0xfd7fff0d4ddd in thr_kill () from /lib/64/libc.so.1
 #2  0xfd7fff06a971 in raise () from /lib/64/libc.so.1
 #3  0xfd7fff0400a1 in abort () from /lib/64/libc.so.1
 #4  0xfd7fff0403f5 in _assert () from /lib/64/libc.so.1
 #5