Re: [Pacemaker] Some questions on the current state
Hi David,

thank you for your answers.

Best regards
Andreas Mock

-----Original Message-----
From: David Vossel [mailto:dvos...@redhat.com]
Sent: Monday, 12 January 2015 18:28
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Some questions on the current state

----- Original Message -----
> Hi Trevor,
>
> thank you for answering so fast.
>
> 2) Besides the fact that rpm packages are available, do you know how to
> make rpm packages from the git repository?

./autogen.sh && ./configure && make rpm

That will generate rpms from the source tree.

> 4) Is RHEL 7.x using corosync 2.x and the pacemaker plugin for cluster
> membership?

No. RHEL 7.x uses corosync 2.x and the new corosync votequorum API.
The plugins are a thing of the past for RHEL 7.

> Best regards
> Andreas Mock

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
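The build sequence David quotes can be wrapped in a small script. This is only a sketch: the run helper and the DRY_RUN variable are hypothetical names introduced for illustration, and by default each step is printed rather than executed, so the script can be tried outside a Pacemaker git checkout.

```shell
#!/bin/sh
# Sketch: build Pacemaker RPMs from a git checkout, following the
# "./autogen.sh && ./configure && make rpm" sequence from the reply above.
# DRY_RUN and run() are illustrative helpers, not part of Pacemaker.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"      # only show what would be executed
    else
        "$@"             # execute the step for real
    fi
}

build_rpms() {
    # Stop at the first failing step, as the && chain does.
    run ./autogen.sh && run ./configure && run make rpm
}

build_rpms
```

Running it with DRY_RUN=0 from the top of the source tree would execute the three steps for real; the build dependencies still have to be installed beforehand.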
Re: [Pacemaker] Some questions on the current state
Hi Trevor,

thank you for answering so fast.

2) Besides the fact that rpm packages are available, do you know how to
make rpm packages from the git repository?

4) Is RHEL 7.x using corosync 2.x and the pacemaker plugin for cluster
membership?

Best regards
Andreas Mock

> -----Original Message-----
> From: Trevor Hemsley [mailto:thems...@voiceflex.com]
> Sent: Monday, 12 January 2015 16:42
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Some questions on the current state
>
> On 12/01/15 15:09, Andreas Mock wrote:
> > Hi all,
> >
> > almost always when I'm forced to do some major upgrades
> > to our core machines in terms of hardware and/or software (OS),
> > I'm forced to have a look at the current state of pacemaker
> > based HA. Things are going on and things change. Projects
> > converge and diverge, tool(s)/chains come and go and
> > distributions' marketing strategies change. Therefore I want
> > to ask the following questions in the hope that list members
> > deeply involved can answer easily.
> >
> > 1) Are there pacemaker packages for RHEL 6.6 and clones?
> > If yes, where?
>
> In the CentOS (etc) base/updates repos. For RHEL they're in the HA
> channel.
>
> > 2) How can I create a pacemaker package 1.1.12 on my own from
> > the git sources?
>
> It's already in base/updates.
>
> > 3) How can I get the current versions of pcs and/or crmsh?
> > Is pcs competitive with crmsh by now?
>
> pcs is in el6.6 and now includes pcsd. You can get crmsh from an
> opensuse build repo for el6.
>
> > 4) Is the pacemaker HA solution of RHEL 7.x still bound to the use
> > of cman?
>
> No
>
> > 5) Where can I find a current workable version of the agents
> > for RHEL 6.6 (and clones) and RHEL 7.x?
>
> Probably you want the resource-agents package.
>
> T
[Pacemaker] Some questions on the current state
Hi all,

almost always when I'm forced to do some major upgrades to our core
machines in terms of hardware and/or software (OS), I'm forced to have a
look at the current state of pacemaker based HA. Things are going on and
things change. Projects converge and diverge, tool(s)/chains come and go
and distributions' marketing strategies change. Therefore I want to ask
the following questions in the hope that list members deeply involved can
answer easily.

1) Are there pacemaker packages for RHEL 6.6 and clones?
   If yes, where?

2) How can I create a pacemaker package 1.1.12 on my own from the git
   sources?

3) How can I get the current versions of pcs and/or crmsh?
   Is pcs competitive with crmsh by now?

4) Is the pacemaker HA solution of RHEL 7.x still bound to the use of
   cman?

5) Where can I find a current workable version of the agents for
   RHEL 6.6 (and clones) and RHEL 7.x?

It would be really nice if someone could give answers or helpful pointers
for answering the questions on my own.

Thank you all in advance.

Best regards
Andreas Mock
Re: [Pacemaker] Enabling pacemaker debug logging while running
Hi Andrew,

thank you for your answer. I found that blog entry before, but I'm
apparently unable to get the information I need out of it.

You write there:
"[...] If the level of detail in the cluster log file is still
insufficient, or you simply wish to go blind, you can turn on debugging
in Corosync/CMAN, or set PCMK_debug in /etc/sysconfig/pacemaker.[...]".

I did enable the debug option in cman as I described in my initial post.
But it seemed that this option change was only propagated to corosync,
not to pacemaker (and the resource agents).

Does this and the reference to PCMK_debug mean that I can't enable
debugging in pacemaker without a restart? Or is the "only" option the
blackbox feature?

Best regards
Andreas Mock

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Monday, 24 March 2014 00:36
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Enabling pacemaker debug logging while running

On 20 Mar 2014, at 11:24 pm, Andreas Mock wrote:

> Hi all,
>
> today I faced a problem which I couldn't solve by reading
> several man pages and other hints found on the web.
>
> I have a clone of RHEL 6.5, a cman based cluster and
> pacemaker 1.1.10+. I was able to change the value
> debug="on" in cluster.conf as described in the man page.
> I was able to propagate this change with 'cman_tool -r -S version'.
> The result was that I could see debug messages from
> the corosync layer, but not from pacemaker and the agents.
>
> What do I have to do to enable debug logging of pacemaker at
> runtime? (And how can I switch it off afterwards?)

http://blog.clusterlabs.org/blog/2013/pacemaker-logging/

> Best regards
> Andreas Mock
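The blog passage quoted above names PCMK_debug in /etc/sysconfig/pacemaker. As a rough sketch of what such an entry might look like (the per-daemon form in the comment is an assumption based on the sample file shipped with Pacemaker 1.1, and the file is normally read when the daemons start, so this fragment alone does not settle the runtime question asked in the thread):

```shell
# /etc/sysconfig/pacemaker (fragment; the file uses shell variable syntax)
# Turn on debug logging for all Pacemaker daemons:
PCMK_debug=yes
# Assumed alternative: restrict debug output to selected daemons, e.g.
# PCMK_debug=crmd,pengine
```

Setting the value back to "no" (or removing the line) and restarting Pacemaker would be the corresponding way to switch it off again.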
[Pacemaker] Enabling pacemaker debug logging while running
Hi all,

today I faced a problem which I couldn't solve by reading several man
pages and other hints found on the web.

I have a clone of RHEL 6.5, a cman based cluster and pacemaker 1.1.10+.
I was able to change the value debug="on" in cluster.conf as described in
the man page. I was able to propagate this change with
'cman_tool -r -S version'. The result was that I could see debug messages
from the corosync layer, but not from pacemaker and the agents.

What do I have to do to enable debug logging of pacemaker at runtime?
(And how can I switch it off afterwards?)

Best regards
Andreas Mock
[Pacemaker] Stopping clone on one node
Hi all,

probably a totally stupid question: how can I stop a clone resource on
one node? Is there a way with crm? The only thing which comes to my mind
is creating a -inf location constraint temporarily.

Best regards
Andreas Mock
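The temporary -inf location constraint mentioned in the question could be expressed with the crm shell roughly as follows. This is a sketch, not a confirmed answer from the list: cl_example and node1 are placeholder names, the constraint id scheme is made up for illustration, and the run helper with DRY_RUN only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: keep a clone instance off one node with a temporary
# -inf location constraint, then remove the constraint again.
# cl_example / node1 and the id scheme are hypothetical.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

constraint_id() {
    # $1 = clone resource, $2 = node
    echo "tmp-ban-$1-on-$2"
}

ban_clone_on_node() {
    run crm configure location "$(constraint_id "$1" "$2")" "$1" -inf: "$2"
}

unban_clone_on_node() {
    run crm configure delete "$(constraint_id "$1" "$2")"
}

ban_clone_on_node cl_example node1
unban_clone_on_node cl_example node1
```

Deleting the constraint afterwards matters: while it exists, the cluster will never start that clone instance on the node, even after a failover.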
Re: [Pacemaker] Solving a resource allocation problem
Hi Lars,

that's why I wrote: The interested reader of this list now knows why
I tried crm_simulate... :-)

Thank you
Andreas Mock

-----Original Message-----
From: Lars Marowsky-Bree [mailto:l...@suse.com]
Sent: Thursday, 19 September 2013 12:18
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Solving a resource allocation problem

On 2013-09-19T12:12:31, Andreas Mock wrote:

> For a solution where I like to push a certain resource
> to the new node (this service interruption doesn't
> hurt too much) while being sure that the other gets
> started on the newly upcoming node I have to balance
> the stickiness and negative constraint scores.

"negative" constraint scores are always absolute.

You can set stickiness per resource. So for the one that you want
shifted, just set it to zero, and to non-zero for the others.
Utilization can be used to perform the load balancing bit.

Is that not working?

> Therefore I would like to see the simulated scores.

That's something you've got a separate thread going for. ;-)

Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
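Lars's suggestion of a per-resource stickiness can be sketched with crm_resource. The resource names rsc_movable and rsc_pinned and the value 100 are placeholders chosen for illustration, and the run helper with DRY_RUN is a hypothetical wrapper that only prints the commands by default.

```shell
#!/bin/sh
# Sketch: set resource-stickiness as a meta attribute per resource.
# rsc_movable / rsc_pinned and the value 100 are hypothetical.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

set_stickiness() {
    # $1 = resource id, $2 = stickiness score
    run crm_resource --resource "$1" --meta \
        --set-parameter resource-stickiness --parameter-value "$2"
}

set_stickiness rsc_movable 0      # may be shifted to the rejoining node
set_stickiness rsc_pinned 100     # prefers to stay where it is
```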
Re: [Pacemaker] Solving a resource allocation problem
Hi Lars,

no, you're not missing something. I just mixed up two acceptable
solutions in the way I asked for them.

So, for letting the resources stay where they are, you're absolutely
right.

For a solution where I like to push a certain resource to the new node
(this service interruption doesn't hurt too much) while being sure that
the other gets started on the newly upcoming node, I have to balance the
stickiness and negative constraint scores. Therefore I would like to see
the simulated scores.

Thank you for answering.

Best regards
Andreas Mock

-----Original Message-----
From: Lars Marowsky-Bree [mailto:l...@suse.com]
Sent: Thursday, 19 September 2013 11:08
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Solving a resource allocation problem

On 2013-09-19T10:20:07, Andreas Mock wrote:

> Hi all,
>
> I need a hint how to solve a resource allocation problem
> on a two node cluster (pcmk 1.1.11).
>
> I have two resource blocks (some stacked resources, colocation inf)
> which shall run on separate nodes. I did this with a small negative
> colocation constraint. This works so far.
>
> But now I want to achieve the following. When one node is
> brought down all resources are moved correctly. But when I
> bring that node up again, then all resources which were
> on that node are pushed back because of that negative colocation.
>
> I would like the cluster to leave the resources on that one node
> and manually migrate (rebalance) the resources, avoiding another
> interruption of service.

Right, so your anti-colocation constraint is not actually a hard
requirement, you want it to be optional - scatter resources if possible,
but don't restart resources for it.

You can do this using the utilization feature,
placement-strategy="balanced" and resource-stickiness=inf.

Or am I missing something still?

Regards,
Lars
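The two cluster-wide settings Lars names could be applied with the crm shell along these lines. This is a sketch under assumptions: the run helper and DRY_RUN are hypothetical and only print the commands by default, and the utilization attributes on nodes and resources that the balanced strategy relies on are not shown here.

```shell
#!/bin/sh
# Sketch: enable balanced placement and make resources sticky by default.
# Utilization attributes for nodes/resources still have to be defined
# separately for placement-strategy=balanced to have an effect.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

apply_balanced_placement() {
    run crm configure property placement-strategy=balanced &&
    run crm configure rsc_defaults resource-stickiness=INFINITY
}

apply_balanced_placement
```

With an infinite default stickiness, running resources stay put when the second node returns; rebalancing then becomes a manual step, which matches what the original question asks for.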
[Pacemaker] Solving a resource allocation problem
Hi all,

I need a hint how to solve a resource allocation problem on a two node
cluster (pcmk 1.1.11).

I have two resource blocks (some stacked resources, colocation inf) which
shall run on separate nodes. I did this with a small negative colocation
constraint. This works so far.

But now I want to achieve the following. When one node is brought down
all resources are moved correctly. But when I bring that node up again,
then all resources which were on that node are pushed back because of
that negative colocation.

I would like the cluster to leave the resources on that one node and
manually migrate (rebalance) the resources, avoiding another interruption
of service.

I thought that resource stickiness is the right feature for that. But as
the resource blocks are a little complicated, the score calculation is
not straightforward.

How can I solve that problem? Is resource stickiness applied even in a
case where the node is brought down manually?

(The interested readers now know why I want to simulate this issue.)

Best regards
Andreas Mock
Re: [Pacemaker] Howto test/simulate the reaction of the cluster to node up and down
Hi Lars, hi Andrew,

thank you for your answers. But I'm still stuck.

When I have both nodes online and the resources are spread over these
nodes and I do a

crm_simulate -Ls -R -d node1

I see nicely what would happen to the cluster when the node goes down:
allocation scores and a transition summary showing the movements of the
resources.

But in the opposite case, that is, the node is down (service pacemaker
stop) and I want to simulate the node coming online with

crm_simulate -Ls -R -u node1

I see the current cluster status, the scores (without the node being
online, => -INFINITY) and no transitions.

It looks like another state transition is missing and I only see the
result of one of the one or more steps involved.

Best regards
Andreas Mock

-----Original Message-----
From: Lars Marowsky-Bree [mailto:l...@suse.com]
Sent: Thursday, 19 September 2013 09:20
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Howto test/simulate the reaction of the cluster to node up and down

On 2013-09-17T13:37:54, Andreas Mock wrote:

> I have the problem that after a node rejoins the cluster some
> resources are moved back to that node.
> Now I want to see the calculated scores to see where I
> have to adjust the stickiness to get the behaviour I like.
>
> I'm not sure how to use crm_simulate to get these values.
> When both nodes are online I can simulate a node down
> by crm_simulate -Ls -d .
> But how do I simulate the transition from a state where one
> node is down? When I bring down a node by 'service pacemaker stop'
> and try a crm_simulate -Ls -u I don't see resource transitions.

crm cib cibstatus
crm(live)cib cibstatus# node hex-1 online
crm(live)cib cibstatus# simulate nograph scores

For more details, see "help simulate"

Regards,
Lars
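The two crm_simulate invocations discussed in this thread differ only in the node-state flag, so they can be wrapped in one helper. This is a sketch using exactly the short options quoted in the messages above; node1 is a placeholder, and the run helper with DRY_RUN is a hypothetical wrapper that prints the command instead of contacting a cluster.

```shell
#!/bin/sh
# Sketch: wrap "crm_simulate -Ls -R -d <node>" / "-u <node>" from the thread.
# node1 is a placeholder; DRY_RUN and run() are illustrative helpers.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

simulate_node_change() {
    # $1 = up|down, $2 = node name
    case "$1" in
        up)   flag="-u" ;;
        down) flag="-d" ;;
        *)    echo "usage: simulate_node_change up|down NODE" >&2
              return 1 ;;
    esac
    # -L: use the live cluster, -s: show scores, -R: compute the response
    run crm_simulate -L -s -R "$flag" "$2"
}

simulate_node_change down node1
simulate_node_change up node1
```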
Re: [Pacemaker] Howto test/simulate the reaction of the cluster to node up and down
Hello Andreas,

thank you for your reply. I use 1.1.11-git.

What I did: I put one node down (service pacemaker stop) and then
executed a crm_simulate -Ls -u node, and I only see the output as said
before. When I bring up the node in reality, pacemaker moves resources
to that node. The output of crm_simulate doesn't reflect these
operations.

I don't know whether I'm doing something wrong or hitting a bug.

Anyway, thank you.

Best regards
Andreas Mock

-----Original Message-----
From: Andreas Kurz [mailto:andr...@hastexo.com]
Sent: Wednesday, 18 September 2013 15:45
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Howto test/simulate the reaction of the cluster to node up and down

On 2013-09-18 15:08, Andreas Mock wrote:
> Hi all,
>
> really nobody here with deeper experience of crm_simulate?
> Or with a hint for good documentation?

What Pacemaker version are you using? I did a quick test here on older
1.1.6 and 1.1.7 clusters and they show a nice output on
"crm_simulate -Ls -u testnode" with transitions and scores.

Regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now
Re: [Pacemaker] Howto test/simulate the reaction of the cluster to node up and down
Hi all,

really nobody here with deeper experience of crm_simulate?
Or with a hint for good documentation?

Best regards
Andreas Mock

-----Original Message-----
From: Andreas Mock [mailto:andreas.m...@web.de]
Sent: Tuesday, 17 September 2013 13:38
To: 'The Pacemaker cluster resource manager'
Subject: [Pacemaker] Howto test/simulate the reaction of the cluster to node up and down

Hi all,

I have the problem that after a node rejoins the cluster some
resources are moved back to that node.
Now I want to see the calculated scores to see where I
have to adjust the stickiness to get the behaviour I like.

I'm not sure how to use crm_simulate to get these values.
When both nodes are online I can simulate a node down
by crm_simulate -Ls -d .
But how do I simulate the transition from a state where one
node is down? When I bring down a node by 'service pacemaker stop'
and try a crm_simulate -Ls -u I don't see resource transitions.
I only see:
--8<
Performing requested modifications
 + Bringing node dis04 online
--8<

Any hints appreciated.

Best regards
Andreas Mock
[Pacemaker] Howto test/simulate the reaction of the cluster to node up and down
Hi all,

I have the problem that after a node rejoins the cluster some resources
are moved back to that node. Now I want to see the calculated scores to
see where I have to adjust the stickiness to get the behaviour I like.

I'm not sure how to use crm_simulate to get these values. When both nodes
are online I can simulate a node down by crm_simulate -Ls -d .
But how do I simulate the transition from a state where one node is down?
When I bring down a node by 'service pacemaker stop' and try a
crm_simulate -Ls -u I don't see resource transitions. I only see:
--8<
Performing requested modifications
 + Bringing node dis04 online
--8<

Any hints appreciated.

Best regards
Andreas Mock
Re: [Pacemaker] Problems with fence_ipmilan
Hi Digimer,

your hint concerning acpid was very valuable. I didn't know about that
recommendation. After disabling acpid I could stonith instantly, as I
like to do.

The video has no context. It was meant to make this dry stuff a little
bit funny. IMHO worth watching anyway.

Thank you!

Best regards
Andreas Mock

-----Original Message-----
From: Digimer [mailto:li...@alteeve.ca]
Sent: Tuesday, 17 September 2013 06:37
To: The Pacemaker cluster resource manager
Cc: Andreas Mock
Subject: Re: [Pacemaker] Problems with fence_ipmilan

On 16/09/13 16:53, Andreas Mock wrote:
> Hi all,
>
> I'm using (want to use) RHEL 6.4 fence_ipmilan for our IBM x3650 M4 (IMM).
> My problem is the following. In contrast to the documented behaviour
> a 'chassis power off' or a 'chassis power reset' is doing a soft reset as if
> you had pressed the power button of the server. That means the
> shutdown process is initiated.
>
> As you can imagine this is like stonithing this way:
> http://www.youtube.com/watch?v=fVJiwuk75Ig#t=1m23s
> Especially when a SAN volume is blocking in 'D' state.
>
> What I want is a hard reset. It seems that the only solution
> at the moment is to send a 'chassis power reset'.
> fence_ipmilan doesn't support that ipmi command at the
> moment.
>
> Has anybody experience with similar (bad) behaviour and workarounds?
>
> Best regards
> Andreas Mock

I can't watch the video (yay hotel internet \o/), so if there is context
there, I am missing it.

The FenceAgentAPI says that "reset" should be "off -> verify -> try on
but don't care if that fails". This is because "reset" doesn't have a
verifiable "off" state.

Next is that you probably have acpid enabled. Most (all?) systems will
instantly turn off if acpid is disabled. For this reason, Red Hat
actually recommends disabling acpid to help avoid this issue.

Third: with IPMI type fence devices, there is no way to prevent one fence
from starting after another one has started, because the devices are
independent. So to help deal with this, it's a good idea to set a
'delay="15"' on one of the nodes' fence methods. This way, if there is a
break and both nodes try to fence the other, the node with the delay will
not be fenced immediately.

Say you set the delay against node 1. Then there is a break and both
start a fence. Node 2 will see that node 1 has a delay of fifteen seconds
and pauses. Node 1 will see no delay against node 2, so it fences
immediately. Node 2 will be long dead before its timer expires, so you
avoid the dual fence. Had node 1 really crashed, node 2 would delay
15 seconds, then proceed with the fence.

digimer

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
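Digimer's acpid recommendation translates to two commands on a RHEL 6 style system with SysV init. The sketch below assumes that environment (service and chkconfig being available); the run helper and DRY_RUN are hypothetical and only print the commands by default.

```shell
#!/bin/sh
# Sketch: stop acpid now and keep it from starting at boot (RHEL 6, SysV init),
# so that an IPMI power-off is not turned into a slow, clean shutdown.
# DRY_RUN and run() are illustrative helpers.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

disable_acpid() {
    run service acpid stop &&
    run chkconfig acpid off
}

disable_acpid
```

This has to be done on every cluster node that can be fenced via IPMI, since the delay comes from the node being fenced, not from the node issuing the fence request.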
[Pacemaker] Problems with fence_ipmilan
Hi all,

I'm using (want to use) RHEL 6.4 fence_ipmilan for our IBM x3650 M4
(IMM). My problem is the following. In contrast to the documented
behaviour, a 'chassis power off' or a 'chassis power reset' is doing a
soft reset as if you had pressed the power button of the server. That
means the shutdown process is initiated.

As you can imagine this is like stonithing this way:
http://www.youtube.com/watch?v=fVJiwuk75Ig#t=1m23s
Especially when a SAN volume is blocking in 'D' state.

What I want is a hard reset. It seems that the only solution at the
moment is to send a 'chassis power reset'. fence_ipmilan doesn't support
that ipmi command at the moment.

Has anybody experience with similar (bad) behaviour and workarounds?

Best regards
Andreas Mock
Re: [Pacemaker] CMAN nodes online
Hi,

tell us on which OS you want to install and run cman et al.
Show us what you've done so far (e.g. communication paths, IP addresses).

Best regards
Andreas Mock

From: Gopalakrishnan N [mailto:gopalakrishnan...@gmail.com]
Sent: Monday, 16 September 2013 14:01
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] CMAN nodes online

Again, when I restarted pacemaker and cman, the nodes are no longer
online; back to square one. node1 shows only node1 online, and node2
says only node2 is online.

I don't know what's happening in the background...

Any advice would be appreciated.

Thanks.

On Mon, Sep 16, 2013 at 6:47 PM, Gopalakrishnan N wrote:

Hi guys,

I got it; basically it took some time to propagate and now both nodes are
showing online...

Thanks.

On Mon, Sep 16, 2013 at 6:39 PM, Gopalakrishnan N wrote:

I have configured CMAN as per the link
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html#_configuring_cman
but when I type cman_tool nodes only one node is online, even though
cluster.conf is propagated to the other node as well.

What could be the reason? On node1, cman_tool nodes shows only node1
online; on node2 it shows only node2 online.

How do I get both nodes online, given that the CMAN service is running on
both nodes?

Thanks in advance.

Regards,
Gopal
Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10
Hi Lars, hi all,

we took the time and tested drbd 8.4.4-rc in our problematic scenario.
We were able to reproduce the promote error regularly with drbd 8.4.3.
After installing 8.4.4-rc we were not able to get this error any more.

So, concerning the changes made to get around the known race condition,
8.4.4-rc seems to work. We didn't look at other aspects of the new
version. If there is something specific you would like us to test,
let us know.

Best regards
Andreas

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Lars Ellenberg
Sent: Tuesday, 10 September 2013 14:10
To: linux...@lists.linux-ha.org; pacemaker@oss.clusterlabs.org
Subject: Re: [Linux-HA] [Pacemaker] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10

On Mon, Sep 09, 2013 at 01:41:17PM +0200, Andreas Mock wrote:
> Hi Lars,
>
> here also my official "Thank you very much" for looking
> at the problem.

> I've been looking forward to the official release
> of drbd 8.4.4.
>
> Or do you need disoriented rc testers like me? ;-)

Why not? That's what release candidates are intended for.
You'd only have to confirm that it works for you now.

Respectively, that it still does not, in which case you better
report that now than after the release, right?

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
___
Linux-HA mailing list
linux...@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10
Hi Lars, here also my official "Thank you very much" looking at the problem. Also thank you for writing a summary that - coming from your knowing and insider standpoint - is much better than I could do while trying to understand all details presented by you here and in our offlist communication. Additionally such a post gains much more value for a list archive when sent by a famous drbd, HA, pacemaker contributor like you are. I've been looking forward to the official release of drbd 8.4.4. Or do you need disoriented rc testers like me? ;-) Best regards Andreas Mock -Ursprüngliche Nachricht- Von: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] Im Auftrag von Lars Ellenberg Gesendet: Montag, 9. September 2013 12:21 An: linux...@lists.linux-ha.org; pacemaker@oss.clusterlabs.org Betreff: Re: [Linux-HA] [Pacemaker] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10 On Mon, Sep 09, 2013 at 02:42:45PM +1000, Andrew Beekhof wrote: > > On 06/09/2013, at 5:51 PM, Lars Ellenberg wrote: > > > On Tue, Aug 27, 2013 at 06:51:45AM +0200, Andreas Mock wrote: > >> Hi Andrew, > >> > >> as this is a real showstopper at the moment I invested some other > >> hours to be sure (as far as possible) not having made an error. > >> > >> Some additions: > >> 1) I mirrored the whole mini drbd config to another pacemaker cluster. > >> Same result: pacemaker 1.1.8 works, pacemaker 1.1.10 not > >> 2) When I remove the target role Stopped from the drbd ms resource > >> and insert the config snippet related to the drbd device via crm -f > >> to a lean running pacemaker config (pacemaker cluster options, stonith > >> resources), > >> it seems to work. That means one of the nodes gets promoted. > >> > >> Then after stopping 'crm resource stop ms_drbd_xxx' and starting again > >> I see the same promotion error as described. > >> > >> The drbd resource agent is using /usr/sbin/crm_master. 
> >> Is there a possibility that feedback given through this client tool > >> is changing the timing behaviour of pacemaker? Or the way > >> transitions are scheduled? > >> Any idea that may be related to a change in pacemaker? > > > > I think that recent pacemaker allows for "start" and "promote" in the > > same transition. > > At least in the one case I saw logs of, this wasn't the case. > The PE computed: > > Current cluster status: > Online: [ db05 db06 ] > > r_stonith-db05(stonith:fence_imm):Started db06 > r_stonith-db06(stonith:fence_imm):Started db05 > Master/Slave Set: ms_drbd_fodb [r_drbd_fodb] > Slaves: [ db05 db06 ] > Master/Slave Set: ms_drbd_fodblog [r_drbd_fodblog] > Slaves: [ db05 db06 ] > > Transition Summary: > * Promote r_drbd_fodb:0 (Slave -> Master db05) > * Promote r_drbd_fodblog:0(Slave -> Master db05) > > and it was the promotion of r_drbd_fodb:0 that failed. Right. Off-list communication revealed that DRBD came up as "Consistent" only, which is a normal and expected state, when using resource level fencing. The promotion attempt then raced with the connection handshake. The DRBD fence-peer handler is run (because it's only Consistent, not UpToDate) and returns successfully, but due to that race, this result is ignored, DRBD stays "only Consistent", which is not good enough to be promoted ("need access to UpToDate data"). Once the handshake is done, that also results in "access to good data", which is why the next promotion attempt succeeds. Something in the timing of pacemaker actions has changed between the affected and unaffected versions. Apparently before there was enough time to do the connection handshake before the promote request was made. This race is fixed with DRBD 8.3.16 and 8.4.4 (currently rc1) You can avoid that race by not allowing Pacemaker to promote if DRBD is only "Consistent". Pacemaker will only attempt promotion, if there is a positive master score for the resource. 
The ocf:linbit:drbd RA hardcodes the master score for "Consistent" to 5. So you may edit the RA and remove the master score for the "only Consistent" case instead. (The fixed DRBD versions mentioned above also introduce a new "adjust_master_score" parameter, which makes this configurable.)

Or you can add a location constraint like this:

  location no-master-if-only-consistent ms_drbd_XY \
    rule $role="Master" -10: defined #uname

where "defined #uname" is a funny way to express "true", as in this constraint reduces the resulting master score by 10,
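
The arithmetic behind that constraint can be sketched in a few lines of shell. This is only an illustration, not part of the resource agent: the function names are made up, and the value 1000 used for an UpToDate disk is an assumption (the message only states that "Consistent" gets 5 and that the constraint subtracts 10). The one rule taken from the text is that Pacemaker attempts promotion only when the resulting master score is positive.

```shell
#!/bin/sh
# Hypothetical helper names; only the numbers 5 and -10 come from the thread.

# Sum of the master score set by the RA and the adjustment from the constraint.
effective_master_score() {
    echo $(( $1 + $2 ))
}

# Pacemaker only attempts promotion for a positive master score.
may_promote() {
    if [ "$(effective_master_score "$1" "$2")" -gt 0 ]; then
        echo yes
    else
        echo no
    fi
}

may_promote 5 0        # only "Consistent", no constraint
may_promote 5 -10      # only "Consistent", with the -10 rule
may_promote 1000 -10   # assumed UpToDate score (illustrative), with the -10 rule
```

With the constraint in place the "only Consistent" case ends up at -5, so no promotion is attempted until the score set by the agent rises above 10.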
Re: [Pacemaker] Resource ordering/colocating question (DRBD + LVM + FS)
Hi Heikki, it has to be crm_simulate -L -s. Sorry for the wrong command line parameters. Best regards Andreas -Ursprüngliche Nachricht- Von: Heikki Manninen [mailto:h...@iki.fi] Gesendet: Montag, 9. September 2013 10:46 An: The Pacemaker cluster resource manager Betreff: Re: [Pacemaker] Resource ordering/colocating question (DRBD + LVM + FS) Hello Andreas, thanks for your input, much appreciated. On 5.9.2013, at 16.39, "Andreas Mock" wrote: > 1) The second output of crm_mon show a resource IP_database > which is not shown in the initial crm_mon output and also > not in the config. => Reduce your problem/config to the > minimum being reproducible. True. I edited out the resource from the e-mail that did not have anything to do with the problem as such (works ok all the time). Just forgot to remove it from the second copy-paste also. And yes, no more IP resource in the configuration. > 2) Enable logging and look out which node is the DC. > There in the logs you find many many informations showing > what is going on. Hint: Open a terminal session with an > opened tail -f logfile. Watch it while inserting commands. > You'll get used to it. Seems that node #2 was the DC (also visible in the pcs status output). I have looked at the logs all the time, just not yet too familiar with the contents of pacemaker logging. Here's the thing that keeps repeating everytime those LVM and FS resources stay in stopped state: Sep 3 20:01:23 pgdbsrv02 pengine[1667]: notice: LogActions: Start LVM_vgdata01#011(pgdbsrv01.cl1.local - blocked) Sep 3 20:01:23 pgdbsrv02 pengine[1667]: notice: LogActions: Start FS_data01#011(pgdbsrv01.cl1.local - blocked) Sep 3 20:01:23 pgdbsrv02 pengine[1667]: notice: LogActions: Start LVM_vgdata02#011(pgdbsrv01.cl1.local - blocked) Sep 3 20:01:23 pgdbsrv02 pengine[1667]: notice: LogActions: Start FS_data02#011(pgdbsrv01.cl1.local - blocked) So what does blocked mean here? 
Is it that the node #1 in this case is in need of fencing/stonithing and thus being blocked or something else (I have a backgroud in the RHCS/HACMP/LifeKeeper etc. world). No quorum policy is set to ignore. > 3) The shown status of a drbd resource (crm_mon) doesn't show > you all informations of the drbd devices. Have a look at > drbd-overview on both nodes. (e.g. syncing status). True, DRBD is working fine on these occations. Connected, Synced etc. > 4) This setup CRIES for stonithing. Even in a test environment. > When stonith happens (this is what you see immediately) you > know something went wrong. This is a good indicator for > errors in agents or in the config. Believe me, as tedious stonithing > is the valuable it is for getting hints for bad cluster state. > On virtual machines stonithing is not as painful as on real > servers. Very much true. I have implemented some custom fencing/stonithing agents before on physical and virtual cluster environments. Problem being here is that I'm not aware of reasonably simple ways to implement stonith with VMware Fusion that I'm bound to use for this test setup. Have to dig more into this though. So fencing from cman cluster.conf is chained to pacemaker fencing and pacemaker stonithing is disabled, no quorum policy is ignore. > 5) Is the drbd fencing script enabled? If yes, in certain circumstances > -INF rules are inserted to deny promoting of "wrong" nodes. > You should grep for them 'cibadmin -Q | grep ' No, DRBD fencing is not enabled and split-brain recovery is done manually. > 6) crm_simulate -L -v gives you an output of the scores of > the resources on each node. I really don't know how to read it > exactly (Is there a documentation of that anywhere?), but it > gives you a hint where to look at, when resources don't start. > Especially the aggregation of stickiness values in groups are > sometimes misleading. 
Could be that I have some different version maybe, because -v is unknown option and: # crm_simulate -L -V Current cluster status: Online: [ pgdbsrv01.cl1.local pgdbsrv02.cl1.local ] Master/Slave Set: DRBD_ms_data01 [DRBD_data01] Masters: [ pgdbsrv01.cl1.local ] Slaves: [ pgdbsrv02.cl1.local ] Master/Slave Set: DRBD_ms_data02 [DRBD_data02] Masters: [ pgdbsrv01.cl1.local ] Slaves: [ pgdbsrv02.cl1.local ] Resource Group: GRP_data01 LVM_vgdata01(ocf::heartbeat:LVM): Stopped FS_data01 (ocf::heartbeat:Filesystem):Stopped Resource Group: GRP_data02 LVM_vgdata02(ocf::heartbeat:LVM): Stopped FS_data02 (ocf::heartbeat:Filesystem):Stopped Only shows that much. Original problem description left quoted below. -- Heikki M > -Ursprüngliche Nachricht- > Von: Heikki Manninen [mailto:h...@iki.fi] > Gesendet: Donnerstag, 5. September 2013 14:08 > An: pacemaker@oss.clusterlabs.org > Betreff: [Pacemak
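
For the "watch the logs" advice, a small filter can help to pick the interesting lines out of the noise. The following is a rough sketch, assuming the syslog line format shown in the excerpt above; the function name is made up, and it says nothing about why an action is blocked, it only lists the resources concerned.

```shell
#!/bin/sh
# Hypothetical helper: list resources whose start the policy engine reported as blocked.
# Reads syslog-style lines on stdin; the line format is taken from the log excerpt above.
blocked_resources() {
    # "#011" is the syslog escape for the tab between resource name and node,
    # so the resource name is everything after "Start" up to "#" or whitespace.
    sed -n 's/.*LogActions: Start[[:space:]]*\([^#[:space:]]*\).*blocked).*/\1/p' | sort -u
}

# Demo with two of the lines quoted in the message.
blocked_resources <<'EOF'
Sep 3 20:01:23 pgdbsrv02 pengine[1667]: notice: LogActions: Start LVM_vgdata01#011(pgdbsrv01.cl1.local - blocked)
Sep 3 20:01:23 pgdbsrv02 pengine[1667]: notice: LogActions: Start FS_data01#011(pgdbsrv01.cl1.local - blocked)
EOF
```

On the DC the same function could be fed from the live log, for example with tail -f piped through grep for pengine; the log file location depends on the local syslog setup.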
Re: [Pacemaker] Resource ordering/colocating question (DRBD + LVM + FS)
Hi Heikki,

just some comments to help you help yourself.

1) The second crm_mon output shows a resource IP_database which appears neither in the initial crm_mon output nor in the config.
   => Reduce your problem/config to the minimum that still reproduces it.
2) Enable logging and find out which node is the DC. The logs there contain a lot of information about what is going on. Hint: Open a terminal session running tail -f on the logfile and watch it while entering commands. You'll get used to it.
3) The status of a drbd resource shown by crm_mon doesn't give you all the information about the drbd devices. Have a look at drbd-overview on both nodes (e.g. for the sync status).
4) This setup CRIES for stonith. Even in a test environment. When stonith happens (you see this immediately) you know something went wrong. This is a good indicator for errors in agents or in the config. Believe me, as tedious as stonith is, it is just as valuable for getting hints about a bad cluster state. On virtual machines stonith is not as painful as on real servers.
5) Is the drbd fencing script enabled? If yes, in certain circumstances -INF rules are inserted to prevent promotion of "wrong" nodes. You should grep for them: 'cibadmin -Q | grep '
6) crm_simulate -L -v gives you an output of the scores of the resources on each node. I really don't know how to read it exactly (is there documentation of that anywhere?), but it gives you a hint where to look when resources don't start. Especially the aggregation of stickiness values in groups is sometimes misleading.
7) Pacemaker's behaviour sometimes changes between versions, and it is possible that you hit a bug. But this is hard to find out. One option: try a newer version.

Hope this helps.

Best regards
Andreas Mock

-----Original Message-----
From: Heikki Manninen [mailto:h...@iki.fi]
Sent: Thursday, 5 September 2013 14:08
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] Resource ordering/colocating question (DRBD + LVM + FS)

Hello,

I'm having a bit of a problem understanding what's going on with my simple two-node demo cluster here. My resources come up correctly after restarting the whole cluster but the LVM and Filesystem resources fail to start after a single node restart or standby/unstandby (after node comes back online - why do they even stop/start after the second node comes back?).

OS: CentOS 6.4 (cman stack)
Pacemaker: pacemaker-1.1.8-7.el6.x86_64
DRBD: drbd84-utils-8.4.3-1.el6.elrepo.x86_64

Everything is configured using: pcs-0.9.26-10.el6_4.1.noarch

Two DRBD resources configured and working: data01 & data02
Two nodes: pgdbsrv01.cl1.local & pgdbsrv02.cl1.local

Configuration:

node pgdbsrv01.cl1.local
node pgdbsrv02.cl1.local
primitive DRBD_data01 ocf:linbit:drbd \
    params drbd_resource="data01" \
    op monitor interval="30s"
primitive DRBD_data02 ocf:linbit:drbd \
    params drbd_resource="data02" \
    op monitor interval="30s"
primitive FS_data01 ocf:heartbeat:Filesystem \
    params device="/dev/mapper/vgdata01-lvdata01" directory="/data01" fstype="ext4" \
    op monitor interval="30s"
primitive FS_data02 ocf:heartbeat:Filesystem \
    params device="/dev/mapper/vgdata02-lvdata02" directory="/data02" fstype="ext4" \
    op monitor interval="30s"
primitive LVM_vgdata01 ocf:heartbeat:LVM \
    params volgrpname="vgdata01" exclusive="true" \
    op monitor interval="30s"
primitive LVM_vgdata02 ocf:heartbeat:LVM \
    params volgrpname="vgdata02" exclusive="true" \
    op monitor interval="30s"
group GRP_data01 LVM_vgdata01 FS_data01
group GRP_data02 LVM_vgdata02 FS_data02
ms DRBD_ms_data01 DRBD_data01 \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
ms DRBD_ms_data02 DRBD_data02 \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation colocation-GRP_data01-DRBD_ms_data01-INFINITY inf: GRP_data01 DRBD_ms_data01:Master
colocation colocation-GRP_data02-DRBD_ms_data02-INFINITY inf: GRP_data02 DRBD_ms_data02:Master
order order-DRBD_data01-GRP_data01-mandatory : DRBD_data01:promote GRP_data01:start
order order-DRBD_data02-GRP_data02-mandatory : DRBD_data02:promote GRP_data02:start
property $id="cib-bootstrap-options" \
    dc-version="1.1.8-7.el6-394e906" \
    cluster-infrastructure="cman" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    migration-threshold="1"
rsc_defaults $id="rsc_defaults-options" \
    resource-stickiness="100"

1) After starting the cluster, everything runs happily:

Last updated: Tue
[Pacemaker] Howto recover from node state UNCLEAN (online)
Hi all,

is there a way to recover from node state UNCLEAN (online) without rebooting?

Background:
- RHEL 6.4
- cman cluster with pacemaker
- stonith enabled and working
- resource monitoring failed on node 1
  => stop of the resource on node 1 failed
  => stonith of node 1 worked
- more or less in parallel, as the resource is a clone resource, resource monitoring failed on node 2
  => stop of the resource on node 2 failed
  => stonith of node 2 failed, as the stonith resource agent on node 1 was unreachable because node 1 had just been fenced
- an error message stated that stonith was giving up
  => node 2 ended up in the state above

Interestingly, a "service pacemaker stop" doesn't work, as pacemaker seems to be blocked by this node state.

The questions:
1) How to recover from this state without rebooting?
2) Is self-fencing allowed by now, so that a self-stonith device could be added to a fencing topology?

Best regards
Andreas Mock
Re: [Pacemaker] why not updated http://clusterlabs.org/rpm-next/ ..... ?
Thank you. I'll have a look at it.

Best regards
Andreas Mock

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Wednesday, 4 September 2013 07:05
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] why not updated http://clusterlabs.org/rpm-next/ . ?

On 04/09/2013, at 2:56 PM, "Andreas Mock" wrote:

> Hi Andrew,
>
> meanwhile I do know how to build it.
> Therefore it is really doable for dummies like me.
>
> Can you tell me how to build a certain git revision?
>
> I found out that 'make rpm' builds packages from the current git
> head.
>
> Can you also tell us how to set a certain rpm package name, so someone
> can distinguish several git head builds?
> Like 1.1.11-a4fdre and 1.1.11-5fa45?

This will get you most of the way there:

    make TAG=a4fdre WITH="--with pre_release" rpm

>
> Best regards
> Andreas Mock
>
>
> -----Original Message-----
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Wednesday, 4 September 2013 06:31
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] why not updated
> http://clusterlabs.org/rpm-next/ . ?
>
>
> On 23/08/2013, at 3:02 PM, Andreas Mock wrote:
>
>> Hi Andrew,
>>
>> I can only talk for myself: Please, please provide rpm-Packages of
>> pacemaker 1.1.10 + fitting for RHEL 6.x.
>>
>> Is this feasible with not too much effort for you? *
>
> I had hoped to get out of the packaging business by making it not too
> much effort for anyone :-)
>
> 1. Obtain RHEL6.x box
> 2. Install dependencies if you haven't already:
>    # sudo yum install -y yum-utils
>    # make rpm-dep
> 3. Build Pacemaker
>    # make release
>
> Not good?
>
>>
>> Best regards
>> Andreas Mock
>>
>> (* This is the question ;) )
>>
>> -----Original Message-----
>> From: Andrew Beekhof [mailto:and...@beekhof.net]
>> Sent: Friday, 23 August 2013 06:27
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] why not updated
>> http://clusterlabs.org/rpm-next/ . ?
>>
>>
>> On 20/08/2013, at 8:49 PM, Andrey Groshev wrote:
>>
>>> Hello Andrew!
>>> Why not updated http://clusterlabs.org/rpm-next/* ?
>>
>> No-one asked :)
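
Building on the make invocation quoted above, a dry run over several revisions might look like the sketch below. It only prints the commands; TAG and WITH are the make variables from the reply, while the function name is invented here and a4fdre / 5fa45 are the placeholder revisions from the question, not real commits.

```shell
#!/bin/sh
# Print (dry run) the rpm build command for one git revision.
# TAG and WITH are the make variables quoted in the reply above.
rpm_build_cmd() {
    printf 'make TAG=%s WITH="--with pre_release" rpm\n' "$1"
}

# Placeholder revisions taken from the question; replace with real commit ids.
for rev in a4fdre 5fa45; do
    rpm_build_cmd "$rev"
done
```

Piping the output to sh inside a Pacemaker checkout would run the builds one after another. The reply only promises that this gets "most of the way there", so whether the resulting package names differ exactly as asked is not confirmed in the thread.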
Re: [Pacemaker] why not updated http://clusterlabs.org/rpm-next/ ..... ?
Hi Andrew,

meanwhile I do know how to build it. Therefore it is really doable for dummies like me.

Can you tell me how to build a certain git revision?

I found out that 'make rpm' builds packages from the current git head.

Can you also tell us how to set a certain rpm package name, so someone can distinguish several git head builds? Like 1.1.11-a4fdre and 1.1.11-5fa45?

Best regards
Andreas Mock

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Wednesday, 4 September 2013 06:31
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] why not updated http://clusterlabs.org/rpm-next/ . ?

On 23/08/2013, at 3:02 PM, Andreas Mock wrote:

> Hi Andrew,
>
> I can only talk for myself: Please, please provide rpm-Packages of
> pacemaker 1.1.10 + fitting for RHEL 6.x.
>
> Is this feasible with not too much effort for you? *

I had hoped to get out of the packaging business by making it not too much effort for anyone :-)

1. Obtain RHEL6.x box
2. Install dependencies if you haven't already:
   # sudo yum install -y yum-utils
   # make rpm-dep
3. Build Pacemaker
   # make release

Not good?

>
> Best regards
> Andreas Mock
>
> (* This is the question ;) )
>
> -----Original Message-----
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Friday, 23 August 2013 06:27
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] why not updated
> http://clusterlabs.org/rpm-next/ . ?
>
>
> On 20/08/2013, at 8:49 PM, Andrey Groshev wrote:
>
>> Hello Andrew!
>> Why not updated http://clusterlabs.org/rpm-next/* ?
>
> No-one asked :)
Re: [Pacemaker] [Problem]Two error information is displayed.
Hi Hideo san,

the two lines are there to emphasize that you are not just in trouble but in real trouble... ;-)

But seriously: I see this phenomenon, too. (pacemaker 1.1.11-1.el6-4f672bc)

Best regards
Andreas Mock

-----Original Message-----
From: renayama19661...@ybb.ne.jp [mailto:renayama19661...@ybb.ne.jp]
Sent: Thursday, 29 August 2013 02:38
To: PaceMaker-ML
Subject: [Pacemaker] [Problem]Two error information is displayed.

Hi All,

Although the failure occurred only once, crm_mon displays two error entries.

----------------
[root@rh64-coro2 ~]# crm_mon -1 -Af
Last updated: Thu Aug 29 18:11:00 2013
Last change: Thu Aug 29 18:10:45 2013 via cibadmin on rh64-coro2
Stack: corosync
Current DC: NONE
1 Nodes configured
1 Resources configured

Online: [ rh64-coro2 ]

Node Attributes:
* Node rh64-coro2:

Migration summary:
* Node rh64-coro2:
   dummy: migration-threshold=1 fail-count=1 last-failure='Thu Aug 29 18:10:57 2013'

Failed actions:
    dummy_monitor_3000 on (null) 'not running' (7): call=11, status=complete, last-rc-change='Thu Aug 29 18:10:57 2013', queued=0ms, exec=0ms
    dummy_monitor_3000 on rh64-coro2 'not running' (7): call=11, status=complete, last-rc-change='Thu Aug 29 18:10:57 2013', queued=0ms, exec=0ms
----------------

The problem seems to be in the code that adds the failure information: the failed operation is added once before the node name is set on it and once again afterwards.

----------------
static void
unpack_rsc_op_failure(resource_t *rsc, node_t *node, int rc, xmlNode *xml_op,
                      enum action_fail_response *on_fail, pe_working_set_t * data_set)
{
    int interval = 0;
    bool is_probe = FALSE;
    action_t *action = NULL;
(snip)
    if (rc != PCMK_OCF_NOT_INSTALLED || is_set(data_set->flags, pe_flag_symmetric_cluster)) {
        if ((node->details->shutdown == FALSE) || (node->details->online == TRUE)) {
            add_node_copy(data_set->failed, xml_op);
        }
    }

    crm_xml_add(xml_op, XML_ATTR_UNAME, node->details->uname);
    if ((node->details->shutdown == FALSE) || (node->details->online == TRUE)) {
        add_node_copy(data_set->failed, xml_op);
    }
(snip)
----------------

Please revise the handling that adds the failure information.

Best Regards,
Hideo Yamauchi.
Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10
Hi Andrew, thank you having still an eye on that issue. I'll do my best to present the requested reports. Best regards Andreas Mock -Ursprüngliche Nachricht- Von: Andrew Beekhof [mailto:and...@beekhof.net] Gesendet: Mittwoch, 28. August 2013 00:12 An: General Linux-HA mailing list Cc: 'The Pacemaker cluster resource manager' Betreff: Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10 On 27/08/2013, at 2:51 PM, Andreas Mock wrote: > Hi Andrew, > > as this is a real showstopper at the moment I invested some other > hours to be sure (as far as possible) not having made an error. > > Some additions: > 1) I mirrored the whole mini drbd config to another pacemaker cluster. > Same result: pacemaker 1.1.8 works, pacemaker 1.1.10 not The version of drbd is the same too? > 2) When I remove the target role Stopped from the drbd ms resource and > insert the config snippet related to the drbd device via crm -f > to a lean running pacemaker config (pacemaker cluster options, stonith > resources), it seems to work. That means one of the nodes gets > promoted. > > Then after stopping 'crm resource stop ms_drbd_xxx' and starting again > I see the same promotion error as described. > > The drbd resource agent is using /usr/sbin/crm_master. > Is there a possibility that feedback given through this client tool is > changing the timing behaviour of pacemaker? Or the way transitions are > scheduled? > Any idea that may be related to a change in pacemaker? # git diff --stat Pacemaker-1.1.8..Pacemaker-1.1.10 | tail -n 1 1610 files changed, 109697 insertions(+), 62940 deletions(-) Needle, meet haystack. Particularly since I have no idea what that drbd error means. If you want me to have a look, you'll need to create a crm_report archive of "works" and "not works". Logs aren't enough. > > Best regards > Andreas Mock > > > -Ursprüngliche Nachricht- > Von: Andrew Beekhof [mailto:and...@beekhof.net] > Gesendet: Dienstag, 27. 
August 2013 05:02 > An: General Linux-HA mailing list > Cc: pacemaker@oss.clusterlabs.org > Betreff: Re: [Pacemaker] [Linux-HA] Probably a regression of the > linbit drbd agent between pacemaker 1.1.8 and 1.1.10 > > > On 27/08/2013, at 3:31 AM, Andreas Mock wrote: > >> Hi all, >> >> while the linbit drbd resource agent seems to work perfectly on >> pacemaker 1.1.8 (standard software repository) we have problems with >> the last release 1.1.10 and also with the newest head 1.1.11.xxx. >> >> As using drbd is not so uncommon I really hope to find interested >> people helping me out. I can provide as much debug information as you >> want. >> >> >> Environment: >> RHEL 6.4 clone (Scientific Linux 6.4) cman based cluster. >> DRBD 8.4.3 compiled from sources. >> 64bit >> >> - A drbd resource configured following the linbit documentation. >> - Manual start and stop (up/down) and setting primary of drbd >> resource working smoothly. >> - 2 nodes dis03-test/dis04-test >> >> >> >> - Following simple config on pacemaker 1.1.8 configure >> property no-quorum-policy=stop >> property stonith-enabled=true >> rsc_defaults resource-stickiness=2 >> primitive r_stonith-dis03-test stonith:fence_mock \ >> meta resource-stickiness="INFINITY" target-role="Started" \ >> op monitor interval="180" timeout="300" requires="nothing" \ >> op start interval="0" timeout="300" \ >> op stop interval="0" timeout="300" \ >> params vmname=dis03-test pcmk_host_list="dis03-test" >> primitive r_stonith-dis04-test stonith:fence_mock \ >> meta resource-stickiness="INFINITY" target-role="Started" \ >> op monitor interval="180" timeout="300" requires="nothing" \ >> op start interval="0" timeout="300" \ >> op stop interval="0" timeout="300" \ >> params vmname=dis04-test pcmk_host_list="dis04-test" >> location r_stonith-dis03_hates_dis03 r_stonith-dis03-test \ >> rule $id="r_stonith-dis03_hates_dis03-test_rule" -inf: #uname >> eq dis03-test >> location r_stonith-dis04_hates_dis04 r_stonith-dis04-test \ >> rule 
$id="r_stonith-dis04_hates_dis04-test_rule" -inf: #uname >> eq dis04-test >> primitive r_drbd_postfix ocf:linbit:drbd \ >> params drbd_resource="postfix" dr
Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10
Hi Andrew, as this is a real showstopper at the moment I invested some other hours to be sure (as far as possible) not having made an error. Some additions: 1) I mirrored the whole mini drbd config to another pacemaker cluster. Same result: pacemaker 1.1.8 works, pacemaker 1.1.10 not 2) When I remove the target role Stopped from the drbd ms resource and insert the config snippet related to the drbd device via crm -f to a lean running pacemaker config (pacemaker cluster options, stonith resources), it seems to work. That means one of the nodes gets promoted. Then after stopping 'crm resource stop ms_drbd_xxx' and starting again I see the same promotion error as described. The drbd resource agent is using /usr/sbin/crm_master. Is there a possibility that feedback given through this client tool is changing the timing behaviour of pacemaker? Or the way transitions are scheduled? Any idea that may be related to a change in pacemaker? Best regards Andreas Mock -Ursprüngliche Nachricht- Von: Andrew Beekhof [mailto:and...@beekhof.net] Gesendet: Dienstag, 27. August 2013 05:02 An: General Linux-HA mailing list Cc: pacemaker@oss.clusterlabs.org Betreff: Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10 On 27/08/2013, at 3:31 AM, Andreas Mock wrote: > Hi all, > > while the linbit drbd resource agent seems to work perfectly on > pacemaker 1.1.8 (standard software repository) we have problems with > the last release 1.1.10 and also with the newest head 1.1.11.xxx. > > As using drbd is not so uncommon I really hope to find interested > people helping me out. I can provide as much debug information as you > want. > > > Environment: > RHEL 6.4 clone (Scientific Linux 6.4) cman based cluster. > DRBD 8.4.3 compiled from sources. > 64bit > > - A drbd resource configured following the linbit documentation. > - Manual start and stop (up/down) and setting primary of drbd resource > working smoothly. 
> - 2 nodes dis03-test/dis04-test > > > > - Following simple config on pacemaker 1.1.8 configure >property no-quorum-policy=stop >property stonith-enabled=true >rsc_defaults resource-stickiness=2 >primitive r_stonith-dis03-test stonith:fence_mock \ >meta resource-stickiness="INFINITY" target-role="Started" \ >op monitor interval="180" timeout="300" requires="nothing" \ >op start interval="0" timeout="300" \ >op stop interval="0" timeout="300" \ >params vmname=dis03-test pcmk_host_list="dis03-test" >primitive r_stonith-dis04-test stonith:fence_mock \ >meta resource-stickiness="INFINITY" target-role="Started" \ >op monitor interval="180" timeout="300" requires="nothing" \ >op start interval="0" timeout="300" \ >op stop interval="0" timeout="300" \ >params vmname=dis04-test pcmk_host_list="dis04-test" >location r_stonith-dis03_hates_dis03 r_stonith-dis03-test \ >rule $id="r_stonith-dis03_hates_dis03-test_rule" -inf: #uname > eq dis03-test >location r_stonith-dis04_hates_dis04 r_stonith-dis04-test \ >rule $id="r_stonith-dis04_hates_dis04-test_rule" -inf: #uname > eq dis04-test >primitive r_drbd_postfix ocf:linbit:drbd \ >params drbd_resource="postfix" drbdconf="/usr/local/etc/drbd.conf" \ >op monitor interval="15s" timeout="60s" role="Master" \ >op monitor interval="45s" timeout="60s" role="Slave" \ >op start timeout="240" \ >op stop timeout="240" \ >meta target-role="Stopped" migration-threshold="2" >ms ms_drbd_postfix r_drbd_postfix \ >meta master-max="1" master-node-max="1" \ >clone-max="2" clone-node-max="1" \ >notify="true" \ >meta target-role="Stopped" > commit > > - Pacemaker is started from scratch > - Config above is applied by crm -f where has the above > config snippet. 
> > - After that crm_mon shows the following status > --8<- > Last updated: Mon Aug 26 18:42:47 2013 Last change: Mon Aug 26 > 18:42:42 2013 via cibadmin on dis03-test > Stack: cman > Current DC: dis03-test - partition with quorum > Version: 1.1.10-1.el6-9abe687 > 2 Nodes configured > 4 Resources configured > > > Online: [ dis03-test dis04-test ] > > Full list o
Re: [Pacemaker] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10
Hi Matthew,

thank you for that hint. I'll check once again. I'm pretty sure this is not the problem. But who knows... ;-)

Best regards
Andreas Mock

-----Original Message-----
From: Matthew O'Connor [mailto:m...@ecsorl.com]
Sent: Monday, 26 August 2013 21:12
To: The Pacemaker cluster resource manager
Cc: Andreas Mock; 'General Linux-HA mailing list'
Subject: Re: [Pacemaker] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10

On 08/26/2013 01:31 PM, Andreas Mock wrote:
> cat /proc/drbd
> version: 8.4.3 (api:1/proto:86-101)
> GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@dis03-test,
> 2013-07-24 17:19:24
>
> on both nodes. The drbd resource was shutdown previously in a clean state,
> so that any node can be the primary.
>

Not sure if this will be helpful or not, but I ran into similar symptoms when I manually upgraded to DRBD 8.4.3 from 8.3.11; it turned out my resource agent script for drbd was not up-to-date* and had problems when starting my drbd resources from a full stop. I could manually start them, and even take one node down and bring it back, but if the resource was completely stopped then neither node was able to start the resource back up. Making sure I had the resource agent that ships with 8.4.3 fixed this for me.

* In my case it was an issue with bad install paths, which were my own doing at the time.

-- Matthew

--
Thank you!
Matthew O'Connor
(GPG Key ID: 55F981C4)
[Pacemaker] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10
- In the log of the drbd agent I can find the following when the promotion request is handled on dis03-test

--8<-
++ drbdadm -c /usr/local/etc/drbd.conf primary postfix
0: State change failed: (-2) Need access to UpToDate data
Command 'drbdsetup primary 0' terminated with exit code 17
+ cmd_out=
+ ret=17
+ '[' 17 '!=' 0 ']'
+ ocf_log err 'postfix: Called drbdadm -c /usr/local/etc/drbd.conf primary postfix'
+ '[' 2 -lt 2 ']'
+ __OCF_PRIO=err
+ shift
--8<-

While this works without problems on pacemaker 1.1.8, it doesn't work here. The error message leads me to assume that there is a kind of race condition where pacemaker fires the promotion too early. Probably it has something to do with applying attributes from the drbd resource agent. But this is just a guess and I really don't know.

ONE ADDITIONAL piece of information: As soon as I do a resource cleanup on the "defective" node, the master is promoted as expected. That means a:

crm resource cleanup r_drbd_postfix dis03-test

results in the following:

--8<-
Last updated: Mon Aug 26 19:29:38 2013
Last change: Mon Aug 26 19:29:28 2013 via cibadmin on dis04-test
Stack: cman
Current DC: dis03-test - partition with quorum
Version: 1.1.10-1.el6-9abe687
2 Nodes configured
4 Resources configured

Online: [ dis03-test dis04-test ]

Full list of resources:

r_stonith-dis03-test (stonith:fence_mock): Started dis04-test
r_stonith-dis04-test (stonith:fence_mock): Started dis03-test
Master/Slave Set: ms_drbd_postfix [r_drbd_postfix]
    Masters: [ dis03-test ]
    Slaves: [ dis04-test ]

Migration summary:
* Node dis03-test:
* Node dis04-test:
--8<-

I really hope I can get some attention, as pacemaker 1.1.10 is a milestone for Andrew and drbd from linbit is surely a building block of many pacemaker based clusters.

Cluster log of DC dis03-test at http://pastebin.com/2S9Y6V3P
DRBD agent log at http://pastebin.com/ceYNEAhH

So, any help welcome.

Best regards
Andreas Mock
Re: [Pacemaker] why not updated http://clusterlabs.org/rpm-next/ ..... ?
Hi Andrew,

I can only speak for myself: Please, please provide rpm packages of pacemaker 1.1.10+ for RHEL 6.x. Is this feasible without too much effort for you? *

Best regards
Andreas Mock

(* This is the question ;) )

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Friday, 23 August 2013 06:27
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] why not updated http://clusterlabs.org/rpm-next/ ..... ?

On 20/08/2013, at 8:49 PM, Andrey Groshev wrote:
> Hello Andrew!
> Why not updated http://clusterlabs.org/rpm-next/* ?

No-one asked :)
[Pacemaker] Weird behaviour of crm_resource -N
Hi all,

I just wanted to clean up a failed stonith device with

crm_resource -C -r r_stonith -N node

but I made a mistake and used the wrong node name 'node'. 'node' does not exist in the cluster, but on the command line I get

8<-
Cleaning up r_stonith on node
Waiting for 1 replies from the CRMd
8<-

until it times out after 60 seconds with

8<-
Cleaning up r_stonith on node
Waiting for 1 replies from the CRMdNo messages received in 60 seconds.. aborting.
8<-

Is there a reason why there is no check against the cluster membership in advance, so that crm_resource could just say: No such node 'node' in cluster?

(crm_resource from latest git (1.1.10+))

Best regards
Andreas Mock
Re: [Pacemaker] Compiling head of git clone --depth 0 git://github.com/ClusterLabs/pacemaker.git
Thank you!

-----Original Message-----
From: David Vossel [mailto:dvos...@redhat.com]
Sent: Wednesday, 21 August 2013 23:38
To: The Pacemaker cluster resource manager
Cc: pacema...@clusterlabs.org
Subject: Re: [Pacemaker] Compiling head of git clone --depth 0 git://github.com/ClusterLabs/pacemaker.git

----- Original Message -----
> From: "Andreas Mock"
> To: "The Pacemaker cluster resource manager", pacema...@clusterlabs.org
> Sent: Wednesday, August 21, 2013 10:05:38 AM
> Subject: Re: [Pacemaker] Compiling head of git clone --depth 0
> git://github.com/ClusterLabs/pacemaker.git
>
> Hi all,
>
> for the archive:
> It seems that I have found it.
>
> A simple 'make rpm' does the job.
>
> I had problems because of a make run with another tag before. So I
> don't know how to clean up the whole build environment without
> deleting the whole git repository.

git reset --hard
git clean -f -d -x

> Best regards
> Andreas Mock
Re: [Pacemaker] Compiling head of git clone --depth 0 git://github.com/ClusterLabs/pacemaker.git
Hi all,

for the archive:
It seems that I have found it.

A simple 'make rpm' does the job.

I had problems because of a make run with another tag before. So I don't know how to clean up the whole build environment without deleting the whole git repository.

Best regards
Andreas Mock
[Pacemaker] Compiling head of git clone --depth 0 git://github.com/ClusterLabs/pacemaker.git
Hi all,

can someone tell me how I can compile and build an rpm from the current head of the git repository as

git clone --depth 0 git://github.com/ClusterLabs/pacemaker.git

?

Best regards
Andreas Mock
Re: [Pacemaker] 2 node Cluster
Hi Christian,

please show us how all these services should depend on each other. The dependency graph shown in the pacemaker docs is a nice way to do it. Then people here can give advice. Otherwise we can only guess what you want to achieve.

Best regards
Andreas Mock

From: Christian Gebler [mailto:geblerchrist...@googlemail.com]
Sent: Tuesday, 13 August 2013 11:57
To: The Pacemaker cluster resource manager
Subject: [Pacemaker] 2 node Cluster

Hi,

I am trying to set up a 2 node Pacemaker cluster with a few services (drbd, psql, ip, tomcat, nginx). All these services should run on one node all the time; if one service is down, everything must migrate to the other node. So I created one colocation and one order. That works fine, and all services run and migrate as expected.

But I have one problem: if I stop my Tomcat or Nginx (on the CRM CLI), the database and the ip go down too, but that should not happen. I have no idea how to fix this problem, so I hope you can help me. Or is the only solution to unmanage the service first?

Thx!
Chris

Here is my Config: http://goo.gl/FkeqlH
Re: [Pacemaker] New action for resource running in multiple nodes
Hi Adrián,

IMHO the effort would focus on the wrong issue. Make your network for clustering reliable. It is THE building block of a cluster besides the nodes.

- Additional network cards
- Different vendor
- Bonding
- Different path through switches

On a two-node cluster that never needs to grow beyond two nodes, I almost always use a crossover cable for one of the interconnects.

Best regards
Andreas Mock

P.S. The story sounds to me as if you also don't have stonith enabled. Another building block IMHO.

From: Adrián López Tejedor [mailto:adrian...@gmail.com]
Sent: Monday, 12 August 2013 16:26
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] New action for resource running in multiple nodes

Hi!

In the environment where we use corosync/pacemaker, we have recently been having some problems with the network used to maintain the cluster. These short interruptions cause the passive node (we have a two node active-passive configuration with apache tomcat) to think it is alone and start another instance of tomcat. A few seconds later the cluster reconnects, and the resource is found active on both nodes.

The default behaviour (as seen in http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-resource-options.html) is to stop both, and start one of them. For us, this implies that the service is down every time a short interruption in the network occurs.

Maybe a new option for "multiple-active" like "stop_old" and/or "stop_new" could be useful, stopping only the newest instance of the resource.

Thanks!
Adrián
Re: [Pacemaker] Problems with SBD fencing
Hi Dejan,

can you explain how the SBD agent works when this resource is running on exactly the node that has to be stonithed?

Thank you in advance.

Best regards
Andreas Mock

-----Original Message-----
From: Dejan Muhamedagic [mailto:deja...@fastmail.fm]
Sent: Tuesday, 6 August 2013 11:15
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Problems with SBD fencing

Hi,

On Thu, Aug 01, 2013 at 07:58:55PM +0200, Jan Christian Kaldestad wrote:
> Thanks for the explanation. But I'm quite confused about the SBD stonith
> resource configuration, as the SBD fencing wiki clearly states:
> "The sbd agent does not need to and should not be cloned. If all of your
> nodes run SBD, as is most likely, not even a monitor action provides a real
> benefit, since the daemon would suicide the node if there was a problem. "
>
> and also this thread
> http://oss.clusterlabs.org/pipermail/pacemaker/2012-March/013507.html mention
> that there should be only one SBD resource configured.
>
> Can someone please clarify? Should I configure 2 separate SBD resources,
> one for each cluster node?

No. One sbd resource is sufficient.

Thanks,

Dejan

>
> --
> Best regards
> Jan
Re: [Pacemaker] Problems with SBD fencing
Hi Jan,

first of all, I don't know the SBD fencing infrastructure (I just read the article you linked). But as far as I understand, the "normal" fencing (initiated on behalf of pacemaker) is done in the following way.

The SBD fencing resource (agent) writes a request for self-stonithing into one or more SBD partitions, where the SBD daemon is listening and hopefully reacting to it.

So, I'm pretty sure (without knowing) that you have to configure the stonith agent in a way that pacemaker knows how to talk to the stonith agent to kill a certain cluster node.

What is the problem in your scenario? The agent which should be contacted to stonith node2 is/was running on node2 and can't be reached anymore.

Because of that, stonith agent configuration in a two node cluster is most of the time done the following way: On every node runs a stonith agent. The stonith agent is configured to stonith the OTHER node. You have to be sure that this is technically always possible.

This can be achieved with resource clones or, which is IMHO simpler in a 2-node environment, with two stonith resources and a negative colocation constraint.

As far as I know there is also a self-stonith safety belt, implemented in a way that a stonith agent on a node to be shot is never contacted. (Do I remember correctly?)

I'm sure this may solve your problem.

Best regards
Andreas Mock

From: Jan Christian Kaldestad [mailto:janc...@gmail.com]
Sent: Thursday, 1 August 2013 15:46
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] Problems with SBD fencing

Hi,

I am evaluating the SLES HA Extension 11 SP3 product. The cluster consists of 2 nodes (active/passive), using an SBD stonith resource on a shared SAN disk. Configuration according to http://www.linux-ha.org/wiki/SBD_Fencing

The SBD daemon is running on both nodes, and the stonith resource (defined as primitive) is running on one node only. There is also a monitor operation for the stonith resource (interval=36000, timeout=20)

I am having some problems getting failover/fencing to work as expected in the following scenario:
- Node 1 is running the resources that I created (except stonith)
- Node 2 is running the stonith resource
- Disconnect Node 2 from the network by bringing the interface down
- Node 2 status changes to UNCLEAN (offline), but the stonith resource does not switch over to Node 1 and Node 2 does not reboot as I would expect.
- Checking the logs on Node 1, I notice the following:

Aug 1 12:00:01 slesha1n1i-u pengine[8915]: warning: pe_fence_node: Node slesha1n2i-u will be fenced because the node is no longer part of the cluster
Aug 1 12:00:01 slesha1n1i-u pengine[8915]: warning: determine_online_status: Node slesha1n2i-u is unclean
Aug 1 12:00:01 slesha1n1i-u pengine[8915]: warning: custom_action: Action stonith_sbd_stop_0 on slesha1n2i-u is unrunnable (offline)
Aug 1 12:00:01 slesha1n1i-u pengine[8915]: warning: stage6: Scheduling Node slesha1n2i-u for STONITH
Aug 1 12:00:01 slesha1n1i-u pengine[8915]: notice: LogActions: Move stonith_sbd (Started slesha1n2i-u -> slesha1n1i-u)
...
Aug 1 12:00:01 slesha1n1i-u crmd[8916]: notice: te_fence_node: Executing reboot fencing operation (24) on slesha1n2i-u (timeout=6)
Aug 1 12:00:01 slesha1n1i-u stonith-ng[8912]: notice: handle_request: Client crmd.8916.3144546f wants to fence (reboot) 'slesha1n2i-u' with device '(any)'
Aug 1 12:00:01 slesha1n1i-u stonith-ng[8912]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for slesha1n2i-u: 8c00ff7b-2986-4b2a-8b4a-760e8346349b (0)
Aug 1 12:00:01 slesha1n1i-u stonith-ng[8912]: error: remote_op_done: Operation reboot of slesha1n2i-u by slesha1n1i-u for crmd.8916@slesha1n1i-u.8c00ff7b: No route to host
Aug 1 12:00:01 slesha1n1i-u crmd[8916]: notice: tengine_stonith_callback: Stonith operation 3/24:3:0:8a0f32b2-f91c-4cdf-9cee-1ba9b6e187ab: No route to host (-113)
Aug 1 12:00:01 slesha1n1i-u crmd[8916]: notice: tengine_stonith_callback: Stonith operation 3 for slesha1n2i-u failed (No route to host): aborting transition.
Aug 1 12:00:01 slesha1n1i-u crmd[8916]: notice: tengine_stonith_notify: Peer slesha1n2i-u was not terminated (st_notify_fence) by slesha1n1i-u for slesha1n1i-u: No route to host (ref=8c00ff7b-2986-4b2a-8b4a-760e8346349b) by client crmd.8916
Aug 1 12:00:01 slesha1n1i-u crmd[8916]: notice: run_graph: Transition 3 (Complete=1, Pending=0, Fired=0, Skipped=5, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-15.bz2): Stopped
Aug 1 12:00:01 slesha1n1i-u pengine[8915]: notice: unpack_config: On loss of CCM Quorum: Ignore
Aug 1 12:00:01 slesha1n1i-u pengine[8915]: warning: pe_fence_node: Node slesha1n2i-u will be fenced because the node is no longer part of the cluster
Aug 1 12:00:01 slesha1n1i-u pengine[8915]: warning: determine_online_status: Node slesha1n2i-u is unclean
Aug 1 12:00:0
Re: [Pacemaker] order required if group is present?
Hi Stefan,

a) Yes, the ordered behaviour is intentional.
b) In former versions you could change this behaviour with an attribute. But this attribute is deprecated in newer versions of pacemaker.
c) The solution for starting resources in parallel is resource sets.

Best regards
Andreas Mock

P.S.: Always give information about the versions of the elements of the cluster stack you use. Behaviour has changed over time.

From: Bauer, Stefan (IZLBW Extern) [mailto:stefan.ba...@iz.bwl.de]
Sent: Thursday, 25 July 2013 12:53
To: Pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] order required if group is present?

Hi List,

I have 5 resources configured (p_bond1, p_conntrackd, p_vlan118, p_vlan119, p_openvpn). Additionally I have put all of them in a group with:

group cluster1 p_bond1,p_vlan118,p_vlan119,p_openvpn,p_conntrackd

With this, crm starts the resources in the order in which the group is defined (p_bond1, p_vlan118 and so on).

Is this expected behavior? If so, does it provide the function that `order` was made for?

Thanks in advance

Stefan
Re: [Pacemaker] Simulating that a node is down.
Hi Jacobo,

1) corosync communicates through 2 ports, don't forget the second one.
2) IMHO, when you block both ports, it's like a classical split brain. I've done it to test split brain and hopefully fencing behaviour.

Best regards
Andreas Mock

From: Jacobo García [mailto:jacobo.gar...@gmail.com]
Sent: Friday, 12 July 2013 11:04
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Simulating that a node is down.

Thanks Andreas for your kind answer, I'll add this to my test battery.

Also, my other question: is it a good idea to close the corosync port? Should corosync behave in an expected way? I am getting odd behaviors on this one, but I am not sure where to put the blame.

Thanks in advance.

Jacobo García López de Araujo
http://thebourbaki.com | http://twitter.com/clapkent
Re: [Pacemaker] Simulating that a node is down.
Hi Jacobo,

one very interesting thing is missing. Overload the node. Write a program/script which generates many IO operations and many flushes while requesting more and more memory from the OS until swapping begins. Ohhh, yes, swapping and IO is nice…

…then you can test your monitor and stop action timeouts… ;-)

Best regards
Andreas Mock

From: Jacobo García [mailto:jacobo.gar...@gmail.com]
Sent: Thursday, 11 July 2013 19:14
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] Simulating that a node is down.

Hello,

I am looking for different ways of testing that a node is down. I am finding a strange behavior with one of them (closing the UDP communication port with iptables). I would like to know if closing the port is a recommended way of achieving my testing purposes. Also I would like to know other ways of testing apart from the ones compiled in the list below:

1. Stopping corosync.
2. Shutting down the node.
3. Shutting down the eth0 interface.
4. Killing corosync process.
5. Closing the corosync communication port.

Thanks,

Jacobo García López de Araujo
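The overload test suggested in this message could be sketched roughly as follows. This is only an illustration of the idea (fsync-heavy writes combined with steadily growing memory use), not a tool from the thread; the function name, parameters and defaults are all assumptions, and the limits should be chosen with care on a machine you are prepared to bring to its knees.

```python
import os
import tempfile


def stress_node(rounds=1000, block_size=1 << 20, chunk_size=16 << 20,
                max_memory=None, directory=None):
    """Write and fsync blocks while holding on to more and more memory.

    Returns (bytes_written, bytes_held). Every name and default here is an
    illustrative assumption. With max_memory=None the allocation is only
    bounded by `rounds`, which is what eventually drives a node into swap.
    """
    held = []                          # references keep the allocations alive
    bytes_written = 0
    block = os.urandom(block_size)
    with tempfile.NamedTemporaryFile(dir=directory) as scratch:
        for _ in range(rounds):
            scratch.write(block)
            scratch.flush()
            os.fsync(scratch.fileno())  # push each write down to the disk
            bytes_written += block_size
            next_total = (len(held) + 1) * chunk_size
            if max_memory is None or next_total <= max_memory:
                # A filled bytes object, so the pages are really touched.
                held.append(b"\xa5" * chunk_size)
    return bytes_written, len(held) * chunk_size
```

Calling `stress_node()` on a cluster node while watching the monitor and stop operations of its resources is one way to see whether the configured timeouts survive heavy IO and swapping; the scratch file is removed automatically when the function returns.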
Re: [Pacemaker] cib_process_diff: Failed application of an update diff
Hi Johan,

I often forget it too, but even more than in years past you have to give detailed information about your stack:
- OS
- corosync version (or heartbeat)
- pacemaker version
- agent version
- etc.

Best regards
Andreas

-----Original Message-----
From: Johan Huysmans [mailto:johan.huysm...@inuits.be]
Sent: Wednesday, 10 July 2013 15:17
To: The Pacemaker cluster resource manager
Subject: [Pacemaker] cib_process_diff: Failed application of an update diff

Hi All,

Every time a resource fails or recovers, or any other action is performed, I see the following messages in my log. What can be the cause of this problem, and how can I see more information about this message (view the patch / diff which is failing)?

stonith-ng[25994]: warning: cib_process_diff: Diff 0.90.29 -> 0.90.30 from local not applied to 0.90.29: Failed application of an update diff
stonith-ng[25994]: notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)

thx!
Johan Huysmans
Re: [Pacemaker] Question concerning pacemaker-1-1-10-rc6
Hi Andrew,

is it much work to also provide the release candidate as a complete package on the mentioned site? Or is it against some policy?

Anyway: The last version pacemaker-1.1.10-3.1736.37b9108.git.el6.x86_64.rpm seems to work pretty well.

Best regards
Andreas

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Tuesday, 9 July 2013 04:09
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Question concerning pacemaker-1-1-10-rc6

On 08/07/2013, at 5:57 PM, Andreas Mock wrote:
> Hi Andrew,
>
> I'm taking the builds from
> http://clusterlabs.org/rpm-test-next/rhel-6/x86_64/
> to avoid compiling on my own.
> Do these build relate to the release candidates you're announcing?

Not at all, those are whatever I happen to be testing at the time.
They do include the git hash in the rpm names though.

> If yes, could you also announce the version strings?
>
> Best regards
> Andreas Mock
Re: [Pacemaker] Best way to notify stonith action
Hi all,

thank you for your recommendations. I just hoped that there is something pacemaker internal, e.g. sending traps via snmp or something like that.

Best regards
Andreas Mock

-----Original Message-----
From: Digimer [mailto:li...@alteeve.ca]
Sent: Monday, 8 July 2013 16:01
To: The Pacemaker cluster resource manager
Cc: Andreas Mock
Subject: Re: [Pacemaker] Best way to notify stonith action

On 08/07/13 03:48, Andreas Mock wrote:
> Hi all,
>
> I'm just wondering what the best way is to
> let an admin know that the cluster (rest of
> a cluster) has stonithed some other nodes?
>
> What is the recommended way?
> (The fact that the machine rebooted or is
> halted is not the problem. I want to know
> that stonithing was done)
>
> Best regards
> Andreas Mock

Personally, I have a little monitoring script I wrote that watches the cluster resources, local hardware (via the IPMI BMC), UPSes and what-not. It loops every 30 seconds and sends an email if/when anything of note changes. A node being fenced certainly raises a flag and emails go out.

My script is principally for cman + rgmanager, but it should be easy to craft your own, too. I just read in the current state of things, compare against the values in the last scan, decide whether to send an email or not, copy the just-read values over to the last-scan values, delete the "new" values and go back to sleep for 30 seconds.

hth

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
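The read / compare / notify / remember loop described in the quoted reply can be sketched in a few lines. This is not Digimer's script, just a minimal illustration of the pattern; `read_state` and `notify` are hypothetical callbacks (for example, one that parses cluster status into a dict and one that sends an email), and the state is assumed to be a flat dict of name to status.

```python
import time


def diff_states(previous, current):
    """Describe every key whose value differs between two scans."""
    changes = []
    for key in sorted(set(previous) | set(current)):
        old = previous.get(key, "<absent>")
        new = current.get(key, "<absent>")
        if old != new:
            changes.append(f"{key}: {old} -> {new}")
    return changes


def monitor(read_state, notify, scans, sleep=time.sleep, interval=30):
    """Run `scans` rounds of: sleep, read, compare with last scan, notify.

    read_state() returns a dict; notify(changes) is only called when
    something changed. Both are assumptions supplied by the caller.
    """
    last = read_state()
    for _ in range(scans):
        sleep(interval)
        now = read_state()
        changes = diff_states(last, now)
        if changes:
            notify(changes)
        last = now                     # the just-read values become the last scan
```

Passing `sleep` in as a parameter keeps the loop testable without waiting; in real use the default `time.sleep` with a 30 second interval matches the cadence mentioned in the message.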
[Pacemaker] Question concerning pacemaker-1-1-10-rc6
Hi Andrew,

I'm taking the builds from http://clusterlabs.org/rpm-test-next/rhel-6/x86_64/ to avoid compiling on my own. Do these builds relate to the release candidates you're announcing? If yes, could you also announce the version strings?

Best regards
Andreas Mock
[Pacemaker] Best way to notify stonith action
Hi all,

I'm just wondering what the best way is to let an admin know that the cluster (rest of a cluster) has stonithed some other nodes?

What is the recommended way? (The fact that the machine rebooted or is halted is not the problem. I want to know that stonithing was done)

Best regards
Andreas Mock
Re: [Pacemaker] Full API description for Fence Agent
Hi all,

after a longer debugging session and reading the documentation more than once, I got the fence agent to work. In this case, the fence agent is a program which can be used by cman/fenced (RHEL cluster) and, as a stonith device, by pacemaker running in this environment.

Just for completeness: https://fedorahosted.org/cluster/wiki/FenceAgentAPI

My findings on getting it to work:

a) Additionally (this was also said somewhere else before), the agent needs to implement the 'metadata' call, which prints an XML document on STDOUT. I adapted mine from other scripted fence agents. I couldn't find a spec for that XML. I let this call return 0 as well.

b) Contrary to the spec above, stonith_ng seems to send the parameters 'nodename' and 'port'. As my stonith agent doesn't need them, it threw an exception while scanning these parameters, which led to an error in the logs. => Now these parameters are accepted even when not used. Someone should clarify how to react to these parameters correctly when they are not used. See c).

c) One thing I missed when configuring the stonith agent in pacemaker was the parameter 'pcmk_host_list'. See http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_configuring_stonith.html for details. Without it, pacemaker couldn't know how to fence the node. This also showed up when issuing the command 'stonith_admin -l ', which had puzzled me before I solved the problem.

Additions and corrections are welcome, for the benefit of all fence agent programmers.

Best regards
Andreas Mock

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Monday, 8 July 2013 05:27
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Full API description for Fence Agent

On 04/07/2013, at 9:52 PM, Andreas Mock wrote:

> Hi Andrew,
>
> is there some kind of agreement how to tag a message?
> Like (DEBUG/TRACE/ERROR/WARN)?

No. But pacemaker obeys the general convention of "errors to stderr, everything else to stdout".
> Is there a way message level filtering is done? There is no filtering. > > Best regards > Andreas > > > -Ursprüngliche Nachricht- > Von: Andrew Beekhof [mailto:and...@beekhof.net] > Gesendet: Donnerstag, 4. Juli 2013 13:41 > An: The Pacemaker cluster resource manager > Betreff: Re: [Pacemaker] Full API description for Fence Agent > > > On 04/07/2013, at 7:24 PM, Andreas Mock wrote: > >> Hi digimer, >> >> I would like to take your offer and asking the following: >> >> The API documents says nothing about the correct way >> of giving messages back to the stonith daemon. >> So, what is the right way to write error/warn/info messages. >> >> Looking at the scripted agents available I can find a nice >> mixture of using STDERR and STDOUT. >> What is the rule here? >> Can you give insights, whether STDOUT/STDERR is captured by >> the calling program and logged somewher (and where)? > > In the case of pacemaker, we capture and log both. > >> >> By the way: How is it going with merging the stonith/fencing API? ;-) >> >> Best regards >> Andreas >> >> -Ursprüngliche Nachricht- >> Von: Digimer [mailto:li...@alteeve.ca] >> Gesendet: Dienstag, 11. Juni 2013 15:34 >> An: The Pacemaker cluster resource manager >> Cc: Andreas Mock >> Betreff: Re: [Pacemaker] Full API description for Fence Agent >> >> Hi Andreas, >> >> The metadata section of the document has not been added yet, but we >> are aware of it missing and are working to add it. The rest of the >> document is accurate though. If you build an agent to follow that API, >> it will work with red hat's cluster and pacemaker. >> >> In the meantime, it's not ideal, but if you call any other fence >> agent and pass '-o metadata', you will see the output that the cluster >> expects. It should be easy to adapt to your new agent. >> >> If you have any trouble, please don't hesitate to ask here and we >> will do our best to help. 
>> >> digimer >> >> On 06/11/2013 07:04 AM, Andreas Mock wrote: >>> Hi all, >>> >>> we need to implement a fence_agent (stonith agent) for >>> cman/corosync/pacemaker (RHEL 6.x). I found the following documentation >>> https://fedorahosted.org/cluster/wiki/FenceAgentAPI >>> >>> But in this document the required metadata action is not >>> described. Can anybody point me to a documentation which >>> is complete? >>> >>> Where is the schema of the xml returned by 'metadata'? >>> >
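The findings in this thread (a 'metadata' action that prints XML and returns 0, tolerating 'nodename' and 'port' even when unused, errors to stderr and everything else to stdout) can be sketched roughly as follows. This is only an illustration under stated assumptions: the agent name `fence_example`, the option names in `KNOWN_OPTIONS`, the default action and the XML body are all hypothetical, and the XML in particular is a placeholder that should be replaced by adapting the output of an installed agent called with '-o metadata', since no schema was found. The sketch also assumes options arrive as name=value lines on stdin.

```python
import sys
from typing import Dict, Iterable, TextIO

# Placeholder only: copy the real element set from the output of an installed
# fence agent called with "-o metadata"; this body is not taken from any spec.
METADATA = """<?xml version="1.0" ?>
<resource-agent name="fence_example" shortdesc="Hypothetical example agent">
  <parameters>
    <parameter name="action" unique="0" required="1">
      <content type="string" default="reboot"/>
      <shortdesc lang="en">Fencing action</shortdesc>
    </parameter>
  </parameters>
  <actions>
    <action name="on"/>
    <action name="off"/>
    <action name="reboot"/>
    <action name="status"/>
    <action name="metadata"/>
  </actions>
</resource-agent>
"""

# Options this hypothetical agent understands. 'nodename' and 'port' are
# accepted even though unused, because stonith_ng was observed to send them.
KNOWN_OPTIONS = {"action", "ipaddr", "login", "passwd", "nodename", "port"}


def parse_options(lines: Iterable[str], err: TextIO = sys.stderr) -> Dict[str, str]:
    """Parse name=value lines; warn about unknown names instead of failing."""
    options: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            print(f"ignoring malformed line: {line!r}", file=err)
            continue
        if name not in KNOWN_OPTIONS:
            print(f"ignoring unknown option: {name}", file=err)
            continue
        options[name] = value
    return options


def run(options: Dict[str, str], out: TextIO = sys.stdout,
        err: TextIO = sys.stderr) -> int:
    """Dispatch on 'action' and return the process exit code."""
    action = options.get("action", "reboot")  # assumption: default action
    if action == "metadata":
        out.write(METADATA)  # normal output goes to stdout
        return 0
    # Errors go to stderr; pacemaker was said to capture and log both streams.
    print(f"action {action!r} is not implemented in this sketch", file=err)
    return 1
```

A real agent would end with something like `sys.exit(run(parse_options(sys.stdin)))` and implement the on/off/reboot/status actions against its fence device. Warning about unknown options rather than raising is a design choice that avoids the log errors described in finding b).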
Re: [Pacemaker] Another question about fencing/stonithing
Thank you for your hint. There is a German saying which I try to translate: "You don't see the forest 'cause of all the trees" So, I'll see. Best regards Andreas Mock -Ursprüngliche Nachricht- Von: Digimer [mailto:li...@alteeve.ca] Gesendet: Freitag, 5. Juli 2013 17:22 An: Andreas Mock Cc: 'The Pacemaker cluster resource manager'; 'Marek Grac' Betreff: Re: AW: [Pacemaker] Another question about fencing/stonithing Andrew might know the trick. In theory, putting your agent into the /usr/sbin or /sbin directory (where ever the other agents are) should "just work". You're sure the exit codes are appropriate? I am sure they are, but just thinking out loud about too-obvious-to-see possible issues. On 05/07/13 11:17, Andreas Mock wrote: > Hi Digimer, > > sorry I forget to mention that I implemented the metadata-call > accordingly. But it may be the "registration" thing which > is necessary to make it know to the stonith/fencing daemon. > > I don't know. I'm wondering a little bit that there is no > pointer how to do it. > > Thank you for your answer! > > Best regards > Andreas Mock > > > -Ursprüngliche Nachricht- > Von: Digimer [mailto:li...@alteeve.ca] > Gesendet: Freitag, 5. Juli 2013 16:52 > An: The Pacemaker cluster resource manager > Cc: Andreas Mock; Marek Grac > Betreff: Re: [Pacemaker] Another question about fencing/stonithing > > On 05/07/13 03:34, Andreas Mock wrote: >> Hi all, >> >> I just wrote a stonith agent which IMHO implements the >> API spec found at https://fedorahosted.org/cluster/wiki/FenceAgentAPI. >> >> But it seems it has a problem when used as pacemaker stonith device. >> >> What has to be done, to have a stonith/fencing agent which implements >> both roles. I'm pretty sure something is missing. >> It's just a guess that it has something to do with listing "registered" >> agents. >> >> What is a registered stonith agent and what is done while registering it? 
>> >> When I configure my own fencing agent as packemaker stonith device >> and try to do a "stonith_admin --list=nodename" I get a "no such device" >> error. >> >> Any pointer appreciated. >> >> Best regards >> Andreas Mock > > The API doesn't (yet) cover the metadata action. The agents now have to > print out XML validation of valid attributes and elements for your > agent. If you call any existing fence_* agent with just -o metadata, you > will see the format. > > I know rhcs can be forced to see the new agent by putting it in the same > directory as the other agents and then running 'ccs_update_schema'. If > pacemaker doesn't immediately see it, then there might be an equivalent > command you can run. > > I will try to get the API updated. I'm not a cardinal source, but > something is better than nothing. Marek (who I have cc'ed) is, so I can > run the changes by him when done to ensure they're accurate. > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] Another question about fencing/stonithing
Hi Digimer, sorry I forget to mention that I implemented the metadata-call accordingly. But it may be the "registration" thing which is necessary to make it know to the stonith/fencing daemon. I don't know. I'm wondering a little bit that there is no pointer how to do it. Thank you for your answer! Best regards Andreas Mock -Ursprüngliche Nachricht- Von: Digimer [mailto:li...@alteeve.ca] Gesendet: Freitag, 5. Juli 2013 16:52 An: The Pacemaker cluster resource manager Cc: Andreas Mock; Marek Grac Betreff: Re: [Pacemaker] Another question about fencing/stonithing On 05/07/13 03:34, Andreas Mock wrote: > Hi all, > > I just wrote a stonith agent which IMHO implements the > API spec found at https://fedorahosted.org/cluster/wiki/FenceAgentAPI. > > But it seems it has a problem when used as pacemaker stonith device. > > What has to be done, to have a stonith/fencing agent which implements > both roles. I'm pretty sure something is missing. > It's just a guess that it has something to do with listing "registered" > agents. > > What is a registered stonith agent and what is done while registering it? > > When I configure my own fencing agent as packemaker stonith device > and try to do a "stonith_admin --list=nodename" I get a "no such device" > error. > > Any pointer appreciated. > > Best regards > Andreas Mock The API doesn't (yet) cover the metadata action. The agents now have to print out XML validation of valid attributes and elements for your agent. If you call any existing fence_* agent with just -o metadata, you will see the format. I know rhcs can be forced to see the new agent by putting it in the same directory as the other agents and then running 'ccs_update_schema'. If pacemaker doesn't immediately see it, then there might be an equivalent command you can run. I will try to get the API updated. I'm not a cardinal source, but something is better than nothing. Marek (who I have cc'ed) is, so I can run the changes by him when done to ensure they're accurate. 
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
[Pacemaker] Another question about fencing/stonithing
Hi all,

I just wrote a stonith agent which IMHO implements the API spec found at https://fedorahosted.org/cluster/wiki/FenceAgentAPI.

But it seems to have a problem when used as a pacemaker stonith device.

What has to be done to have a stonith/fencing agent which fulfils both roles? I'm pretty sure something is missing. It's just a guess that it has something to do with listing "registered" agents.

What is a registered stonith agent, and what is done while registering it?

When I configure my own fencing agent as a pacemaker stonith device and try to do a "stonith_admin --list=nodename", I get a "no such device" error.

Any pointers appreciated.

Best regards
Andreas Mock
Re: [Pacemaker] Full API description for Fence Agent
Hi Digimer, hi all,

there is a little thing in the API doc which is also unclear to me. It says:

"[...] status - this is not implemented by most agents nor used by fenced at this time. Return values:
0 if the fence device is reachable and the port is in the on state
1 if the fence device could not be contacted
2 if the fence device is reachable but is in the off state [...]"

What is meant by return code 2? Does it mean that I could contact the fence device and it says that the PORT is in the off state? How should I understand the state "fence device in off state"?

Best regards
Andreas Mock

-----Original Message-----
From: Andreas Mock [mailto:andreas.m...@web.de]
Sent: Thursday, 4 July 2013 11:25
To: 'Digimer'; 'The Pacemaker cluster resource manager'
Subject: Re: [Pacemaker] Full API description for Fence Agent

Hi digimer,

I would like to take you up on your offer and ask the following:

The API document says nothing about the correct way of giving messages back to the stonith daemon. So, what is the right way to write error/warn/info messages?

Looking at the scripted agents available I can find a nice mixture of using STDERR and STDOUT. What is the rule here? Can you give some insight into whether STDOUT/STDERR is captured by the calling program and logged somewhere (and where)?

By the way: How is it going with merging the stonith/fencing API? ;-)

Best regards
Andreas

-----Original Message-----
From: Digimer [mailto:li...@alteeve.ca]
Sent: Tuesday, 11 June 2013 15:34
To: The Pacemaker cluster resource manager
Cc: Andreas Mock
Subject: Re: [Pacemaker] Full API description for Fence Agent

Hi Andreas,

The metadata section of the document has not been added yet, but we are aware of it missing and are working to add it. The rest of the document is accurate though. If you build an agent to follow that API, it will work with red hat's cluster and pacemaker.
In the meantime, it's not ideal, but if you call any other fence agent and pass '-o metadata', you will see the output that the cluster expects. It should be easy to adapt to your new agent. If you have any trouble, please don't hesitate to ask here and we will do our best to help. digimer On 06/11/2013 07:04 AM, Andreas Mock wrote: > Hi all, > > we need to implement a fence_agent (stonith agent) for > cman/corosync/pacemaker (RHEL 6.x). I found the following documentation > https://fedorahosted.org/cluster/wiki/FenceAgentAPI > > But in this document the required metadata action is not > described. Can anybody point me to a documentation which > is complete? > > Where is the schema of the xml returned by 'metadata'? > > What has to be done that a fence_agent can also be used > by pacemaker? > > What is the right return code of action 'metadata'? > > Is there some explanation how the stonith/fence parts > play together? > > Best regards > Andreas Mock > > > > > > ___ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
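One possible reading of the 'status' return values quoted in this thread, written down as a small sketch. The interpretation of code 2 as "device reachable, port reports off" is an assumption (it is exactly the point the question above asks about and the thread does not confirm it), and the names `StatusCode` and `status_exit_code` are made up for illustration.

```python
from enum import IntEnum


class StatusCode(IntEnum):
    """Exit codes for the 'status' action as quoted from the API page."""
    ON = 0           # fence device reachable, port is in the on state
    UNREACHABLE = 1  # fence device could not be contacted
    OFF = 2          # assumed reading: device reachable, port is off


def status_exit_code(device_reachable: bool, port_is_on: bool) -> StatusCode:
    """Map what the agent learned about the device to an exit code."""
    if not device_reachable:
        # The port state is unknown here, so it is deliberately ignored.
        return StatusCode.UNREACHABLE
    return StatusCode.ON if port_is_on else StatusCode.OFF
```

An agent following this reading would pass the result to `sys.exit()` at the end of its 'status' action.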
Re: [Pacemaker] Full API description for Fence Agent
Hi Andrew, is there some kind of agreement how to tag a message? Like (DEBUG/TRACE/ERROR/WARN)? Is there a way message level filtering is done? Best regards Andreas -Ursprüngliche Nachricht- Von: Andrew Beekhof [mailto:and...@beekhof.net] Gesendet: Donnerstag, 4. Juli 2013 13:41 An: The Pacemaker cluster resource manager Betreff: Re: [Pacemaker] Full API description for Fence Agent On 04/07/2013, at 7:24 PM, Andreas Mock wrote: > Hi digimer, > > I would like to take your offer and asking the following: > > The API documents says nothing about the correct way > of giving messages back to the stonith daemon. > So, what is the right way to write error/warn/info messages. > > Looking at the scripted agents available I can find a nice > mixture of using STDERR and STDOUT. > What is the rule here? > Can you give insights, whether STDOUT/STDERR is captured by > the calling program and logged somewher (and where)? In the case of pacemaker, we capture and log both. > > By the way: How is it going with merging the stonith/fencing API? ;-) > > Best regards > Andreas > > -Ursprüngliche Nachricht- > Von: Digimer [mailto:li...@alteeve.ca] > Gesendet: Dienstag, 11. Juni 2013 15:34 > An: The Pacemaker cluster resource manager > Cc: Andreas Mock > Betreff: Re: [Pacemaker] Full API description for Fence Agent > > Hi Andreas, > > The metadata section of the document has not been added yet, but we > are aware of it missing and are working to add it. The rest of the > document is accurate though. If you build an agent to follow that API, > it will work with red hat's cluster and pacemaker. > > In the meantime, it's not ideal, but if you call any other fence > agent and pass '-o metadata', you will see the output that the cluster > expects. It should be easy to adapt to your new agent. > > If you have any trouble, please don't hesitate to ask here and we > will do our best to help. 
> > digimer > > On 06/11/2013 07:04 AM, Andreas Mock wrote: >> Hi all, >> >> we need to implement a fence_agent (stonith agent) for >> cman/corosync/pacemaker (RHEL 6.x). I found the following documentation >> https://fedorahosted.org/cluster/wiki/FenceAgentAPI >> >> But in this document the required metadata action is not >> described. Can anybody point me to a documentation which >> is complete? >> >> Where is the schema of the xml returned by 'metadata'? >> >> What has to be done that a fence_agent can also be used >> by pacemaker? >> >> What is the right return code of action 'metadata'? >> >> Is there some explanation how the stonith/fence parts >> play together? >> >> Best regards >> Andreas Mock >> >> >> >> >> >> ___ >> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker >> >> Project Home: http://www.clusterlabs.org >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> Bugs: http://bugs.clusterlabs.org >> > > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? > > > ___ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] Full API description for Fence Agent
Hi digimer, I would like to take your offer and asking the following: The API documents says nothing about the correct way of giving messages back to the stonith daemon. So, what is the right way to write error/warn/info messages. Looking at the scripted agents available I can find a nice mixture of using STDERR and STDOUT. What is the rule here? Can you give insights, whether STDOUT/STDERR is captured by the calling program and logged somewher (and where)? By the way: How is it going with merging the stonith/fencing API? ;-) Best regards Andreas -Ursprüngliche Nachricht- Von: Digimer [mailto:li...@alteeve.ca] Gesendet: Dienstag, 11. Juni 2013 15:34 An: The Pacemaker cluster resource manager Cc: Andreas Mock Betreff: Re: [Pacemaker] Full API description for Fence Agent Hi Andreas, The metadata section of the document has not been added yet, but we are aware of it missing and are working to add it. The rest of the document is accurate though. If you build an agent to follow that API, it will work with red hat's cluster and pacemaker. In the meantime, it's not ideal, but if you call any other fence agent and pass '-o metadata', you will see the output that the cluster expects. It should be easy to adapt to your new agent. If you have any trouble, please don't hesitate to ask here and we will do our best to help. digimer On 06/11/2013 07:04 AM, Andreas Mock wrote: > Hi all, > > we need to implement a fence_agent (stonith agent) for > cman/corosync/pacemaker (RHEL 6.x). I found the following documentation > https://fedorahosted.org/cluster/wiki/FenceAgentAPI > > But in this document the required metadata action is not > described. Can anybody point me to a documentation which > is complete? > > Where is the schema of the xml returned by 'metadata'? > > What has to be done that a fence_agent can also be used > by pacemaker? > > What is the right return code of action 'metadata'? > > Is there some explanation how the stonith/fence parts > play together? 
> > Best regards > Andreas Mock > > > > > > ___ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] Question to fencing/stonithing
Hi Leon,

thank you for the pointer to the manuals. I had read them already. My 2-node cluster does not seem to fence the other node at startup, and I do not have an explanation. That's the reason I asked (after reading the docs).

- CMAN_QUORUM_TIMEOUT=0
  As the inline doc says:
  # CMAN_QUORUM_TIMEOUT -- amount of time to wait for a quorate cluster on
  # startup quorum is needed by many other applications, so we may as
  # well wait here. If CMAN_QUORUM_TIMEOUT is zero, quorum will
  # be ignored.
  => quorum is ignored => the fence domain is created and enabled with the first node joining (isn't it?).

- As man fenced says: When the fence domain is first created in the cluster (by the first node to join it) and subsequently enabled (by the cluster gaining quorum) any nodes listed in cluster.conf that are not presently members of the corosync cluster are fenced.

- So, does "quorum is ignored" mean "you don't have quorum, but it doesn't matter", or does it mean that the first node gets quorum even if it's the one and only node?

So, my questions more precisely: Why does startup fencing not happen in my 2-node cluster? Is there a way to get this behaviour?

Best regards
Andreas Mock

-----Original Message-----
From: Leon Fauster [mailto:leonfaus...@googlemail.com]
Sent: Monday, 1 July 2013 19:45
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Question to fencing/stonithing

On 01.07.2013 at 14:28, Andreas Mock wrote:

> Hi all,
>
> just want to get clear about startup fencing.
>
> Scenario: RHEL 6.4, cman, 2-node-cluster, pacemaker, fence via
> pcmk-redirect. pacemaker stonith enabled, no-quorum-policy=ignore,
> CMAN_QUORUM_TIMEOUT=0
>
>
> When should a startup fencing operation occur?
> I thought a freshly starting node not seeing the other members in a
> timeout interval will try to stonith the other node to get sure that
> this one doesn't run resources. Is this true?
> Where is the config variable for that timeout?
>
> Can someone put light on that, please?
/etc/sysconfig/cman
man fenced

--
LF
[Pacemaker] Question to fencing/stonithing
Hi all,

I just want to get clear about startup fencing.

Scenario: RHEL 6.4, cman, 2-node cluster, pacemaker, fence via pcmk-redirect, pacemaker stonith enabled, no-quorum-policy=ignore, CMAN_QUORUM_TIMEOUT=0

When should a startup fencing operation occur? I thought a freshly starting node that does not see the other members within a timeout interval will try to stonith the other node to make sure that it isn't running resources. Is this true? Where is the config variable for that timeout?

Can someone shed some light on that, please?

Best regards
Andreas Mock
Re: [Pacemaker] known problem with corosync 1.4.1 on centos64 ?
Hi Andreas,

my two cents on your questions:

a) If you want to learn the most, take any distro, compile the components from source and then use them. => Most learned.

b) I don't know how others think about it, but I use a cluster to try to increase uptime. If I know that a distro's component is buggy and causes failures during the first steps with a more or less standard config (corosync/pacemaker/drbd + some service), I have two choices when I have to stick to a distro's repos:
1) Take the next release of the distro, 6.4 in your case. But it can have bugs too.
2) Ask why it is important to stick to the distro's repos for a certain software stack. In your case I don't know why it is "allowed" to build drbd from source but not "allowed" to build the cluster stack from source. Especially as getting your feet wet with corosync/pacemaker and all the rest is much more effort, compared to the effort of understanding, configuring and maintaining a cluster.

My policy is also to keep as close as possible to the distro's repos. But when I need a newer or more stable version of a piece of software, I have to use it.

Best regards
Andreas

From: andreas graeper [mailto:agrae...@googlemail.com]
Sent: Friday, 21 June 2013 15:00
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] known problem with corosync 1.4.1 on centos64 ?

hi,
> old version :
i shall maintain a centos63 with, except drbd (build from source), only standard-repos are used. for testing i installed newest centos64, but .. .
there is no chance to get rid of that centos63, but for learning/testing what are the best distros ? not in general, but for use with drbd+corosync+pacemaker.

2013/6/21 Lars Marowsky-Bree

On 2013-06-21T10:56:29, andreas graeper wrote:

> hi,
> when only i remove or add resources, corosync starts to eat up all cpu.
> drbd 8.4.1 (build from source)
> corosync 1.4.1

yes, corosync 1.4.1 had one such error, I recall. If you're building from source, why are you sticking to such an old version?
Regards, Lars -- Architect Storage/HA SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg) "Experience is the name everyone gives to their mistakes." -- Oscar Wilde ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] uname eq node-name
Hi Andrew,

can you tell me what the attribute #uname holds? Is it the node-name or the 'uname -n' of the node? (I just read http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Pacemaker_Explained/index.html#_which_resource_instance_is_promoted)

Is there an attribute like '#node' or '#nodename'?

Best regards
Andreas Mock

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Wednesday, 12 June 2013 06:45
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] uname eq node-name

On 12/06/2013, at 2:40 PM, "Andreas Mock" wrote:

> Hi Andrew,
>
> thank you for that information. You know, often one answer is followed
> by many other questions. The same here:
>
> Is there a tool, where a script is able to determine the node name
> based on the uname?
> For a script it is easy to find the nodename (uname -n) it is running
> on. But what has to be done when the script needs to know the
> node-name it is running on?

crm_node -n is a good place to start, but requires a running cluster.

>
> Best regards
> Andreas Mock
>
>
> -----Original Message-----
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Wednesday, 12 June 2013 00:27
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] uname eq node-name
>
>
> On 11/06/2013, at 2:33 AM, Andreas Mock wrote:
>
>> Hi all,
>>
>> I couldn't find a definitive source stating that a
>> corosync/pacemaker/cman cluster must follow the
>> rule: uname -n == node-name (== DNS-name of communication-IP)
>
> In older versions this is true (an artefact of our heartbeat heritage).
> However we have been chipping away at that in 1.1.9 and I am currently
> running corosync 2.x with pacemaker 1.1.10-rc4 and node-name != uname
> -n
>
>>
>> Can someone give a hint for related documentation?
>> >> The question arises when you want to configure a cman based cluster >> (cluster.conf) having a uname -n equal to the DNS-name of the >> external ip address but whant to route the cluster communication over >> the internal IP-adresse (cluster interconnect). >> I couldn't find a solution that doesn't use the DNS-names of the >> internal ip-addresses as node-names. >> >> Hints and rules welcome! >> >> Best regards >> Andreas Mock >> >> >> >> ___ >> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker >> >> Project Home: http://www.clusterlabs.org Getting started: >> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> Bugs: http://bugs.clusterlabs.org > > > ___ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org Getting started: > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > > > ___ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org Getting started: > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
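For scripts that need the cluster node name rather than the host name, the 'crm_node -n' hint from this thread could be wrapped roughly like this. It is a sketch, not an official helper: the function names are made up, and the fallback to the host name is an assumption that only gives the right answer on clusters where node-name and 'uname -n' happen to match.

```python
import socket
import subprocess
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> str:
    """Run a command and return its stdout; raises if it is missing or fails."""
    result = subprocess.run(cmd, capture_output=True, text=True,
                            check=True, timeout=10)
    return result.stdout


def cluster_node_name(run: Callable[[Sequence[str]], str] = _run) -> str:
    """Return the name printed by 'crm_node -n', else the local host name."""
    try:
        name = run(["crm_node", "-n"]).strip()
        if name:
            return name
    except (OSError, subprocess.SubprocessError):
        pass  # crm_node missing, cluster not running, or the call timed out
    # Fallback (assumption): host name, usually what 'uname -n' prints.
    return socket.gethostname()
```

Because 'crm_node -n' requires a running cluster, a caller that must not confuse the two names may prefer to raise an error instead of falling back.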
Re: [Pacemaker] clusterlabs.org down?
Hi Digimer,

oh...sorry...just stonithed the server while trying to reverse engineer the fence api... ;)

Best regards
Andreas Mock

-Original Message-
From: Digimer [mailto:li...@alteeve.ca]
Sent: Wednesday, 12 June 2013 16:45
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] clusterlabs.org down?

On 06/12/2013 10:41 AM, David Vossel wrote:
> - Original Message -
>> From: "Michael Schwartzkopff"
>> To: pacemaker@oss.clusterlabs.org
>> Sent: Wednesday, June 12, 2013 9:21:08 AM
>> Subject: [Pacemaker] clusterlabs.org down?
>
> yep, it is down for me as well.
>
> -- Vossel

Down here, too.

We should invent a technology that helps keep services available. Highly available, if you will. ;)

I'll show myself to the door...

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
Re: [Pacemaker] uname eq node-name
Hi Andrew, thank you for that information. You know, often one answer is followed by many other questions. The same here: Is there a tool, where a script is able to determine the node name based on the uname? For a script it is easy to find the nodename (uname -n) it is running on. But what has to be done when the script needs to know the node-name it is running on? Best regards Andreas Mock -Ursprüngliche Nachricht- Von: Andrew Beekhof [mailto:and...@beekhof.net] Gesendet: Mittwoch, 12. Juni 2013 00:27 An: The Pacemaker cluster resource manager Betreff: Re: [Pacemaker] uname eq node-name On 11/06/2013, at 2:33 AM, Andreas Mock wrote: > Hi all, > > I couldn't find a definitive source stating that a > corosync/pacemaker/cman cluster must follow the > rule: uname -n == node-name (== DNS-name of communication-IP) In older versions this is true (an artefact of our heartbeat heritage). However we have been chipping away at that in 1.1.9 and I am currently running corosync 2.x with pacemaker 1.1.10-rc4 and node-name != uname -n > > Can someone give a hint for related documentation? > > The question arises when you want to configure a cman based cluster > (cluster.conf) having a uname -n equal to the DNS-name of the external > ip address but whant to route the cluster communication over the > internal IP-adresse (cluster interconnect). > I couldn't find a solution that doesn't use the DNS-names of the > internal ip-addresses as node-names. > > Hints and rules welcome! 
> > Best regards > Andreas Mock > > > > ___ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org Getting started: > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
[Pacemaker] Full API description for Fence Agent
Hi all,

we need to implement a fence agent (stonith agent) for cman/corosync/pacemaker (RHEL 6.x). I found the following documentation:
https://fedorahosted.org/cluster/wiki/FenceAgentAPI
But this document does not describe the required metadata action.

Can anybody point me to documentation that is complete?
Where is the schema of the XML returned by 'metadata'?
What has to be done so that a fence agent can also be used by pacemaker?
What is the right return code of the action 'metadata'?
Is there some explanation of how the stonith/fence parts play together?

Best regards
Andreas Mock
Re: [Pacemaker] What kind of cluster stack at opensuse-repositories
Hi Lars,

thank you for answering. Could you tell me whether the stack is like Option 1 or Option 3 of this article?
http://blog.clusterlabs.org/blog/2012/pacemaker-and-cluster-filesystems/
If it's Option 1, when do you think SuSE will switch to Option 3?

Best regards
Andreas Mock

-Original Message-
From: Lars Marowsky-Bree [mailto:l...@suse.com]
Sent: Monday, 10 June 2013 19:49
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] What kind of cluster stack at opensuse-repositories

On 2013-06-10T19:25:38, Andreas Mock wrote:

> Am I right that these are packages for a RHEL 6.x system, but in the
> corosync-pacemaker fashion that SuSE has used for years now?

Yes. Those packages are scheduled for an update to the latest upstream versions as soon as we wrap up our current project, but we'll not have cman-based packages available there, I'm pretty sure.

Of course, OBS can build them if someone else maintains them ;-) No policy against it, just not our primary task.

Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
[Pacemaker] What kind of cluster stack at opensuse-repositories
Hi all,

I want to make sure that I understand this correctly: what do I find at
http://download.opensuse.org/repositories/network:/ha-clustering/RedHat_RHEL-6/x86_64/ ?

Am I right that I can't use this repository as a source for a more up-to-date replacement for the RHEL 6.x packages, because these packages are NOT built for a cman-corosync-pacemaker cluster?

Am I right that these are packages for a RHEL 6.x system, but in the corosync-pacemaker fashion that SuSE has used for years now?

Please help me sort this out.

Best regards
Andreas Mock
[Pacemaker] Differences in man pages
Hi all, hi Andrew,

while having your package set (pacemaker et al.) installed from http://clusterlabs.org/rpm-test-next/rhel-6/x86_64/ to (hopefully) help with debugging and testing, I noticed the following: the man page of 'crm_resource' doesn't mention some parameters (like -P) which do work and which are documented in the corresponding man page of the official RHEL package pacemaker-cli 1.1.8.

Is there a reason why that documentation was removed?

Best regards
Andreas Mock
[Pacemaker] uname eq node-name
Hi all,

I couldn't find a definitive source stating that a corosync/pacemaker/cman cluster must follow the rule:
uname -n == node-name (== DNS name of communication IP)

Can someone give a hint for related documentation?

The question arises when you want to configure a cman-based cluster (cluster.conf) with a uname -n equal to the DNS name of the external IP address, but want to route the cluster communication over the internal IP address (cluster interconnect). I couldn't find a solution that doesn't use the DNS names of the internal IP addresses as node names.

Hints and rules welcome!

Best regards
Andreas Mock
Re: [Pacemaker] The main road of the cluster stack evolution
Hi Ivan, my advice: Look at http://blog.clusterlabs.org/blog/2013/pacemaker-on-rhel6-dot-4/ and at the other blog entries there. It gives some good insight. Best regards Andreas Mock -Ursprüngliche Nachricht- Von: Халезов Иван [mailto:i.khale...@rts.ru] Gesendet: Montag, 10. Juni 2013 17:26 An: The Pacemaker cluster resource manager Betreff: [Pacemaker] The main road of the cluster stack evolution Hello everyone! I would like to ask a few questions about the main road of the cluster stack evolution. 1) The RedHat company is planning to drop corosync support and wants to switch to CMAN. ( http://www.gossamer-threads.com/lists/linuxha/pacemaker/84662 ) What do you think the main trend is? What is the most popular and better supported solution? Corosync, CMAN or something else? What cluster engine is better for Pacemaker at the moment? And what could be the best solution in 2-3 years? 2) What is the best tool for cluster management: crm, pcs or something else? Redhat switches to pcs and drops crm, but SUSE prefers crmsh tool. Why? What tool will you advice to use? 3) What version of pacemaker should I prefer for using on RedHat 6.3 (or 6.4) ? The version from the vendor (Pacemaker 1.1.7 for RedHat 6.3 and Pacemaker 1.1.8 for RedHat 6.4) or the upstream version from Github? I usually prefer software versions coming from the distribution, because I hope they are well-tested and supported by the vendor. But, as I know, Pacemaker is a teсhnology preview in RedHat 6, so they don't response for it stability. 
Also, all the same, I have to rebuild Redhat src.rpm package ( for adding corosync 2.3 support into pacemaker) With best regards, Ivan Khalezov ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] Removing resource from group without disturbing remaining resources in group
Hi Dejan, thanks for answering. I'll have a look at it and will see whether sets fit our needs better. Have a nice weekend. Best regards Andreas Mock -Ursprüngliche Nachricht- Von: Dejan Muhamedagic [mailto:deja...@fastmail.fm] Gesendet: Freitag, 7. Juni 2013 17:28 An: 'The Pacemaker cluster resource manager' Betreff: Re: [Pacemaker] Removing resource from group without disturbing remaining resources in group Hi, On Fri, Jun 07, 2013 at 04:45:34PM +0200, Andreas Mock wrote: > Hi Dejan, > > we need colocation and order constraints. If you need both, then groups are fine. > IMHO it's an use case for sets, but I have to admit > that I don't really understand how to configure them > with crm. A group gives a group resource id which I > can use as reference in the contraints. It makes the > config simple. If the configuration is simple, then you're OK. If the resource sets wouldn't make the configuration any better/simpler, best not to use them. Otherwise, the syntax is simple and should be available with the help (crm configure help order/colocation). Thanks, Dejan > > IP1-| > IP2---> service > IP3-| > > IP1 .. IP40 > > Advices welcome. > > Best regards > Andreas > > > > -Ursprüngliche Nachricht- > Von: Dejan Muhamedagic [mailto:deja...@fastmail.fm] > Gesendet: Freitag, 7. Juni 2013 15:55 > An: 'The Pacemaker cluster resource manager' > Betreff: Re: [Pacemaker] Removing resource from group without disturbing > remaining resources in group > > On Thu, Jun 06, 2013 at 06:15:26PM +0200, Andreas Mock wrote: > > Hi Florian, > > > > thank you very much for that method description. > > It seems that it does exactly what we want. By the way. > > It's the same use case as yours. Many IP for which we > > want a constraint handle (group). > > But wouldn't just a collocation constraint do if the order is not > important? > > Thanks, > > Dejan > > > Thank you! 
> > > > Best regards > > Andreas Mock > > > > > > -Ursprüngliche Nachricht- > > Von: Florian Crouzat [mailto:gen...@floriancrouzat.net] > > Gesendet: Donnerstag, 6. Juni 2013 16:50 > > An: pacemaker@oss.clusterlabs.org > > Betreff: Re: [Pacemaker] Removing resource from group without disturbing > > remaining resources in group > > > > Le 06/06/2013 16:35, Andreas Mock a écrit : > > > Hi all, > > > > > > is there a way to remove a resource from a group without > > > disturbing the other resources in the group. > > > > > > The following example: > > > - G1 has R1 R2 R3 > > > - All resources are started > > > - Stopping R1 would cause a stop of R2 R3 > > > - So, the idea was: > > > * crm configure edit => remove R1 from the group while running > > > * stop resource > > > * delete resource > > > > > > BUT: At some point (which we couldn't find out at > > > the moment) all remaining resources of the group are > > > restarted. It seems that the change of the implicit > > > dependency tree of the initial group forces a rebuild > > > of that tree including a restart of that group. > > > (Andrew: Is this assumption right?) > > > > > > So, is there are way to add/remove resources from > > > group without disturbing the other resources. > > > It's clear to me that the resources would restart > > > when the node assignment after removing would change. > > > > > > Hints welcome. > > > > > > > Approximative syntax, do not blame me ! 
> > > > * crm configure property maintenance-mode=true > > * crm resource stop R1 # it won't stop as it's in maintenance-mode > > * crm configure delete R1 > > * crm configure show # very that all references to R1 are gone > > * crm resource reprobe # the cluster double check the status of declared > > resources and sees that everything is fine and R1 doesn't exists anymore > > * crm_mon -Arf1 # double check that everything is "started (unmanaged)" > > and R1 is gone > > * crm_simulate -S -L -VVV # optional, to check what would happen when > > leaving maintenance-mode > > * crm configure property maintenance-mode=false > > > > If something goes wrong while in maintenance-mode, crm resource cleanup > > foo might be handy. Nothing should move, start or stop until you leave > > maintenance-mode anyway. I use this scenario very often, to add or > > remove IPaddr2 resources to a group of 30+ IPaddr2. >
Re: [Pacemaker] Removing resource from group without disturbing remaining resources in group
Hi Dejan, we need colocation and order constraints. IMHO it's an use case for sets, but I have to admit that I don't really understand how to configure them with crm. A group gives a group resource id which I can use as reference in the contraints. It makes the config simple. IP1-| IP2---> service IP3-| IP1 .. IP40 Advices welcome. Best regards Andreas -Ursprüngliche Nachricht- Von: Dejan Muhamedagic [mailto:deja...@fastmail.fm] Gesendet: Freitag, 7. Juni 2013 15:55 An: 'The Pacemaker cluster resource manager' Betreff: Re: [Pacemaker] Removing resource from group without disturbing remaining resources in group On Thu, Jun 06, 2013 at 06:15:26PM +0200, Andreas Mock wrote: > Hi Florian, > > thank you very much for that method description. > It seems that it does exactly what we want. By the way. > It's the same use case as yours. Many IP for which we > want a constraint handle (group). But wouldn't just a collocation constraint do if the order is not important? Thanks, Dejan > Thank you! > > Best regards > Andreas Mock > > > -Ursprüngliche Nachricht- > Von: Florian Crouzat [mailto:gen...@floriancrouzat.net] > Gesendet: Donnerstag, 6. Juni 2013 16:50 > An: pacemaker@oss.clusterlabs.org > Betreff: Re: [Pacemaker] Removing resource from group without disturbing > remaining resources in group > > Le 06/06/2013 16:35, Andreas Mock a écrit : > > Hi all, > > > > is there a way to remove a resource from a group without > > disturbing the other resources in the group. > > > > The following example: > > - G1 has R1 R2 R3 > > - All resources are started > > - Stopping R1 would cause a stop of R2 R3 > > - So, the idea was: > > * crm configure edit => remove R1 from the group while running > > * stop resource > > * delete resource > > > > BUT: At some point (which we couldn't find out at > > the moment) all remaining resources of the group are > > restarted. 
It seems that the change of the implicit > > dependency tree of the initial group forces a rebuild > > of that tree including a restart of that group. > > (Andrew: Is this assumption right?) > > > > So, is there are way to add/remove resources from > > group without disturbing the other resources. > > It's clear to me that the resources would restart > > when the node assignment after removing would change. > > > > Hints welcome. > > > > Approximative syntax, do not blame me ! > > * crm configure property maintenance-mode=true > * crm resource stop R1 # it won't stop as it's in maintenance-mode > * crm configure delete R1 > * crm configure show # very that all references to R1 are gone > * crm resource reprobe # the cluster double check the status of declared > resources and sees that everything is fine and R1 doesn't exists anymore > * crm_mon -Arf1 # double check that everything is "started (unmanaged)" > and R1 is gone > * crm_simulate -S -L -VVV # optional, to check what would happen when > leaving maintenance-mode > * crm configure property maintenance-mode=false > > If something goes wrong while in maintenance-mode, crm resource cleanup > foo might be handy. Nothing should move, start or stop until you leave > maintenance-mode anyway. I use this scenario very often, to add or > remove IPaddr2 resources to a group of 30+ IPaddr2. 
> > > -- > Cheers, > Florian Crouzat > > ___ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > > > ___ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] Removing resource from group without disturbing remaining resources in group
Hi Florian,

thank you very much for that method description. It seems that it does exactly what we want. By the way, it's the same use case as yours: many IPs for which we want a constraint handle (group).

Thank you!

Best regards
Andreas Mock

-Original Message-
From: Florian Crouzat [mailto:gen...@floriancrouzat.net]
Sent: Thursday, 6 June 2013 16:50
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Removing resource from group without disturbing remaining resources in group

On 06/06/2013 16:35, Andreas Mock wrote:
> Hi all,
>
> is there a way to remove a resource from a group without
> disturbing the other resources in the group?
>
> The following example:
> - G1 has R1 R2 R3
> - All resources are started
> - Stopping R1 would cause a stop of R2 R3
> - So, the idea was:
>   * crm configure edit => remove R1 from the group while running
>   * stop resource
>   * delete resource
>
> BUT: At some point (which we haven't been able to identify so far)
> all remaining resources of the group are
> restarted. It seems that changing the implicit
> dependency tree of the initial group forces a rebuild
> of that tree, including a restart of that group.
> (Andrew: Is this assumption right?)
>
> So, is there a way to add/remove resources from a
> group without disturbing the other resources?
> It's clear to me that the resources would restart
> if the node assignment changed after the removal.
>
> Hints welcome.

Approximate syntax, do not blame me!

* crm configure property maintenance-mode=true
* crm resource stop R1 # it won't stop as it's in maintenance-mode
* crm configure delete R1
* crm configure show # verify that all references to R1 are gone
* crm resource reprobe # the cluster double-checks the status of the declared resources and sees that everything is fine and R1 doesn't exist anymore
* crm_mon -Arf1 # double-check that everything is "started (unmanaged)" and R1 is gone
* crm_simulate -S -L -VVV # optional, to check what would happen when leaving maintenance-mode
* crm configure property maintenance-mode=false

If something goes wrong while in maintenance-mode, crm resource cleanup foo might be handy. Nothing should move, start or stop until you leave maintenance-mode anyway. I use this scenario very often to add or remove IPaddr2 resources in a group of 30+ IPaddr2.

--
Cheers,
Florian Crouzat
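The maintenance-mode procedure quoted in this message can be wrapped in a small script. This is only a sketch built from the commands as posted (which were themselves labelled approximate); the names `run`, `remove_from_group` and `DRY_RUN` are made up for illustration. It defaults to printing the commands instead of executing them, and it deliberately stops before leaving maintenance-mode so the output of `crm_mon` and `crm_simulate` can be inspected first.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints the commands; DRY_RUN=0 runs them.
DRY_RUN=${DRY_RUN:-1}

# Print or execute one command, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Remove resource $1 from its group while the cluster is in maintenance-mode.
# Aborts at the first failing step and leaves the cluster in maintenance-mode.
remove_from_group() {
    run crm configure property maintenance-mode=true || return 1
    run crm resource stop "$1" || return 1    # not really stopped: cluster is unmanaged
    run crm configure delete "$1" || return 1
    run crm configure show || return 1        # check that no reference to $1 is left
    run crm resource reprobe || return 1
    run crm_mon -Arf1 || return 1             # expect "started (unmanaged)" everywhere
    run crm_simulate -S -L -VVV || return 1   # preview what leaving maintenance-mode does
}
```

After reviewing the output of `remove_from_group R1`, leaving maintenance-mode is the separate final step from the list above: `crm configure property maintenance-mode=false`.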
Re: [Pacemaker] failed actions after resource creation
Hi Andreas,

just a comment, as I think I can guess where your misunderstanding comes from.

When services are clustered you often see a filesystem resource which is moved between the cluster nodes, and on top of that filesystem resource a service (call it S) which is also handled by the cluster (colocation, groups, etc.).

BUT: You have to be aware of one fact. Resource agents mostly rely on binaries belonging to the service S to do their job. So if the binaries are not present on every node, the monitor action of the resource agent fails there and the cluster does not behave the way you would like.

So, most of the time you have to design your stack of resources in such a way that the binaries of service S are present on every node in any case, and are exactly the same on every node.

I once wrote a resource agent which was clever enough to do a multiphase monitor action: it first checked whether the expected binaries were found and, if not, assumed that the service couldn't be running. In this special case we were able to move all of service S's binaries along with the filesystem resource. But this is uncommon and usually not what you want.

Best regards
Andreas Mock

From: andreas graeper [mailto:agrae...@googlemail.com]
Sent: Thursday, 6 June 2013 16:26
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] failed actions after resource creation

hi and thanks. (better sentences: i will give my best)

on the inactive node there is actually only /etc/init.d/nfs and neither nfs-common nor nfs-kernel-server. is monitor not only looking for the running service on the active node, but for the situation on the inactive node, too? then i would have expected that the missing nfs-kernel-server was reported, too.

i guess this can be handled only with an init-script 'nfs' (same name on both nodes) that starts/kills nfs-common/nfs-kernel-server? or is there another solution?

what does monitor do in the case of a resource managed by an lsb-script? does it call `service xxx status`? what does the monitor expect on a node where the service is running / not running?

thanks in advance
andreas

2013/6/6 Florian Crouzat
> On 06/06/2013 15:49, andreas graeper wrote:
>> p_nfscommon_monitor_0 (node=linag, call=189, rc=5, status=complete): not installed
>
> Sounds obvious: "not installed". Node "linag" is missing some daemons/scripts, probably nfs-related. Check your nfs packages and configuration on both nodes, node1 should be missing something.
>
>> what can i do ?
>
> Better sentences.
>
> --
> Cheers,
> Florian Crouzat
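The multiphase monitor described in the reply at the top of this message could look roughly like the sketch below. It is not the agent that was actually written; the function name and its two arguments (a binary path and a pid file) are hypothetical. The return values are the standard OCF codes: 0 for success and 7 for "not running", while 5 ("not installed") is what the failed probe quoted above reported.

```shell
#!/bin/sh
# Standard OCF return codes used by this sketch.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# two_phase_monitor BINARY PIDFILE
# Phase 1: if the service binary is not visible on this node (for example
#          because it lives on a filesystem resource mounted elsewhere),
#          report "not running" instead of "not installed" (rc=5).
# Phase 2: ordinary liveness check via the pid file.
two_phase_monitor() {
    bin=$1
    pidfile=$2

    if [ ! -x "$bin" ]; then
        return "$OCF_NOT_RUNNING"
    fi

    if [ -r "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        return "$OCF_SUCCESS"
    fi
    return "$OCF_NOT_RUNNING"
}
```

The trade-off is the one named in the reply: a genuinely broken installation is no longer reported as such, so this only makes sense when the binaries really do travel with the filesystem resource.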
[Pacemaker] Removing resource from group without disturbing remaining resources in group
Hi all,

is there a way to remove a resource from a group without disturbing the other resources in the group?

The following example:
- G1 has R1 R2 R3
- All resources are started
- Stopping R1 would cause a stop of R2 R3
- So, the idea was:
  * crm configure edit => remove R1 from the group while running
  * stop resource
  * delete resource

BUT: At some point (which we haven't been able to identify so far) all remaining resources of the group are restarted. It seems that changing the implicit dependency tree of the initial group forces a rebuild of that tree, including a restart of that group. (Andrew: Is this assumption right?)

So, is there a way to add/remove resources from a group without disturbing the other resources? It's clear to me that the resources would restart if the node assignment changed after the removal.

Hints welcome.

Best regards
Andreas Mock
Re: [Pacemaker] Release candidate: 1.1.10-rc3
Hi Andrew,

while waiting for the RHEL 6.x build of pacemaker 1.1.10, I want to ask whether there is something we can do to help find the memory leaks. If so, please explain the steps needed in detail. Currently there are two real clusters available for testing.

(Questions: Do you need logs? Debug logs? Some excerpt? Shall we look for certain patterns?)

Best regards
Andreas Mock

-Original Message-
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Wednesday, 5 June 2013 04:26
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Release candidate: 1.1.10-rc3

On 23/05/2013, at 12:33 PM, Andrew Beekhof wrote:

> Please keep the bug reports coming in. There is a good chance that
> this will be the final release candidate and 1.1.10 will be tagged on
> May 30th.

I am delaying rc4 until we can get definitive closure on the crmd memory leak(s). Valgrind has given it a clean bill of health, however the process still appears to be growing over time and at strange intervals, so it's not yet clear what is responsible.

Beyond fixes for memory leaks, rc4 will include a workaround for inconsistent tls handshake behavior between gnutls versions and some improvements to the way crm_resource can be used to move resources around. I hope to provide more detail soon.

-- Andrew
[Pacemaker] Need explanation for start stonith behaviour
Hi all,

I have a two-node cluster on a RHEL clone (6.4, cman, pacemaker) and I'm facing a startup behaviour I can't explain, so I hope that you can enlighten me.

- 2 nodes: N1 N2
- both nodes up
- everything is fine

Start:
- service pacemaker stop on N2
- all resources get migrated => OK
- all pacemaker and corosync related processes seem to be shut down correctly
- now service pacemaker stop on N1
- all resources seem to be stopped correctly
- all cluster stack processes seem to be stopped correctly.

Scenario 1: Let's start with the node which was stopped last.
- service pacemaker start on N1
- cluster stack gets started, we have to wait at the step "joining fence domain"
- after the timeout the node gets started
- resources get started on that node
- now service pacemaker start on N2
- cluster stack comes up
- resources are started as requested by the config
=> everything seems OK and straightforward

Scenario 2: Don't start with the node shut down last but with the node which was stopped first, therefore:
- service pacemaker start on N2
- cluster stack comes up seemingly the same way as in scenario 1. A little wait at the step "joining fence domain".
- And now the difference: Node N1 gets stonithed, which seems OK to me, as N2 wants to make sure that it is the one and only node in the cluster. (Is this interpretation right?)

Why is a stonith triggered in the one scenario but not in the other? Insights really appreciated. Is some knowledge about the last cluster state made persistent? Is it correct that node N2 is not stonithed in scenario 1?

Thank you in advance.

Best regards
Andreas Mock
Re: [Pacemaker] cman + corosync + pacemaker + fence_scsi
Hi Angel,

two hints from my side.

As you're working with Ubuntu, ask on this list which setup is or will be the best concerning corosync + pacemaker. I'm pretty sure (but I really don't know) that you'll get the advice to drop cman.

When you use cman + pacemaker, stonithing works as follows: use the pcmk-redirect method in cman, which causes cman to delegate stonith commands to pacemaker. In pacemaker you have to add the stonith agents which use your hardware. You have to enable stonithing in pacemaker with stonith-enabled="true".

Another issue with stonithing: in a two-node cluster you have to configure the stonith agents in such a way that the remaining part (whichever it is, mostly the faster one) is able to shoot the other node even when cluster communication is lost. When the stonith action is done over the same wire as your cluster communication, stonithing is meaningless.

Best regards
Andreas Mock

-Original Message-
From: Angel L. Mateo [mailto:ama...@um.es]
Sent: Wednesday, 24 April 2013 14:49
To: The Pacemaker cluster resource manager
Subject: [Pacemaker] cman + corosync + pacemaker + fence_scsi

Hello,

I'm trying to configure a 2 node cluster in ubuntu with cman + corosync + pacemaker (the use of cman is because it is recommended in the pacemaker quickstart). In order to solve the split brain in the 2 node cluster I'm using qdisk. For fencing, I'm trying to use fence_scsi, and this is the point where I'm having the problem.

I have attached my cluster.conf.

node myotis51
node myotis52
primitive cluster_ip ocf:heartbeat:IPaddr2 \
    params ip="155.54.211.167" \
    op monitor interval="30s"
property $id="cib-bootstrap-options" \
    dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
    cluster-infrastructure="cman" \
    stonith-enabled="false" \
    last-lrm-refresh="1366803979"

At the moment I'm trying with just an IP resource, but in the end I'll have LVM resources and a dovecot server running on top of them.
The problem I have is that whenever I interrupt network traffic between my nodes (to check if quorum and fencing is working) the IP resource is started in both nodes of the cluster. So it seems that node fencing configure at cluster.conf is not working for me. Then I have tried to configure as a stonith resource (since it is listed by sudo crm ra list stonith), so I have tried to include primitive stonith_fence_scsi stonith:redhat/fence_scsi The problem I'm having with this is that I don't know how to indicate params for the resource (I have tried params devices="...", params -d ..., but they are not accepted) and with this (default) configuration I get: pr 24 14:39:14 myotis51 lrmd: [6759]: debug: on_msg_perform_op: add an operation operation monitor[5] on stonith_fence_scsi for client 6763, its parameters: crm_feature_set=[3.0.5] CRM_meta_timeout=[2] to the operation list. Apr 24 14:39:14 myotis51 lrmd: [6759]: info: rsc:stonith_fence_scsi probe[5] (pid 10434) Apr 24 14:39:14 myotis51 lrmd: [10434]: ERROR: get_stonith_provider: No such device: redhat/fence_scsi Apr 24 14:39:14 myotis51 lrm-stonith: [10434]: ERROR: execra: No such legacy stonith device: redhat/fence_scsi Apr 24 14:39:14 myotis51 lrm-stonith: [10434]: debug: execra: stonith_fence_scsi_monitor returned -12 Apr 24 14:39:14 myotis51 lrmd: [6759]: WARN: Managed stonith_fence_scsi:monitor process 10434 exited with return code 7. 
Apr 24 14:39:14 myotis51 lrmd: [6759]: info: operation monitor[5] on stonith_fence_scsi for client 6763: pid 10434 exited with return code 7 Apr 24 14:39:14 myotis51 crmd: [6763]: debug: create_operation_update: do_update_resource: Updating resouce stonith_fence_scsi after complete monitor op (interval=0) Apr 24 14:39:14 myotis51 crmd: [6763]: info: process_lrm_event: LRM operation stonith_fence_scsi_monitor_0 (call=5, rc=7, cib-update=57, confirmed=true) not running I'm trying to use fence_scsi because I'm planning to use a shared storage (accesed via scsi fibre channel) and I don't want to use CLVM (because I need lvm snapshots, not supported by clvm), so I need a fencing device avoiding to concurrently use the same scsi devices in both nodes. Any idea on how to use fence_scsi? Or I could use any other fence/stonith device? Which one do you recommend? -- Angel L. Mateo Martínez Sección de Telemática Área de Tecnologías de la Información y las Comunicaciones Aplicadas (ATICA) http://www.um.es/atica Tfo: 868889150 Fax: 86337 ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] pacemaker monitoring user permision denied
Hi Andrew,

is 1.1.10-rc1 a working title, or can the package be found somewhere? I saw that on http://clusterlabs.org/rpm-next/rhel-6/x86_64/ there is a new 1.1.9 build. Is this a new snapshot build (e.g. with memory leak fixes)?

Best regards
Andreas Mock

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Tuesday, 23 April 2013 01:46
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] pacemaker monitoring user permision denied

On 23/04/2013, at 1:45 AM, Wolfgang Routschka wrote:
> Hi everybody,
>
> I want to monitor our pacemaker/cman cluster on Scientific Linux 6.4 (RHEL clone) with nagios.
>
> After reading the documentation at http://clusterlabs.org/doc/acls.html and
> configuring it, my nagios user isn't able to start crm_mon:
>
> "Attempting connection to the cluster...Could not establish cib_ro connection: Permission denied (13)"
>
> The user is in the haclient group:
>
> [nagios@xx ~]$ id
> uid=510(nagios) gid=310(nagios) Gruppen=310(nagios),498(haclient)

This is a known issue that has been fixed in 1.1.10-rc1

>
> I used Pacemaker 1.1.8-7.el6.x86_64
>
> My CIB schema is configured for pacemaker-1.2
>
> enable-acl is configured:
>
> crm configure show
>
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.8-7.el6-394e906" \
>     cluster-infrastructure="cman" \
>     no-quorum-policy="ignore" \
>     stonith-enabled="false" \
>     enable-acl="true"
>
> Greetings
Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker at runtime
Hi Andrew,

thank you for that hint.

Best regards
Andreas Mock

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Tuesday, 23 April 2013 01:45
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker at runtime

Depending on your version of pacemaker you can do...

    # Enable trace logging (if it isn't already)
    killall -USR1 process_name

    # Dump trace logging to disk
    killall -TRAP process_name

    # Find out what file it was dumped to
    grep blackbox /var/log/messages

    # Read it
    qb-blackbox /path/to/file

Subsequent calls to "killall -TRAP ..." will have only logs since the last dump.

On 23/04/2013, at 2:41 AM, Andreas Mock wrote:
> Hi all,
>
> is there a way to enable debug output on a cman, corosync, pacemaker
> stack without restarting the whole cman stuff?
>
> I've found for cluster.conf, assuming that this
> determines the value of the config-db-keys
> cluster.logging.debug=on
> logging.debug=on
>
> Is it enough to write new values to these keys?
> Or do I have to notify one or several processes to react to this change?
>
> Best regards
> Andreas Mock
Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker at runtime
Hi Michael, hi all others,

I have to admit that I was too stupid to interpret the man page correctly (yes, there is a distinction between reading and understanding). So, for the record, here is what I did:

* Edit cluster.conf and insert or change the entries. Increase the attribute config_version="XX" by one.
* Distribute cluster.conf via rsync or whatever to all nodes when not using ricci.
* Issue a cman_tool -r -S version

That's it. A big thank you to Michael.

Best regards
Andreas

From: Andreas Mock [mailto:andreas.m...@web.de]
Sent: Monday, 22 April 2013 19:39
To: 'The Pacemaker cluster resource manager'
Subject: Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker at runtime

Hi Michael,

this doesn't seem to have the desired effect. Please enlighten me on how to change cluster.conf and tell all participants to react to that change.

Best regards
Andreas Mock

From: Andreas Mock [mailto:andreas.m...@web.de]
Sent: Monday, 22 April 2013 19:11
To: 'The Pacemaker cluster resource manager'
Subject: Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker at runtime

Hi Michael,

thank you. I'll have a look at it.

Best regards
Andreas Mock

From: Michael Schwartzkopff [mailto:mi...@clusterbau.com]
Sent: Monday, 22 April 2013 18:51
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker at runtime

On Monday, 22 April 2013, 18:41:33, Andreas Mock wrote:
> Hi all,
>
> is there a way to enable debug output on a cman, corosync, pacemaker
> stack without restarting the whole cman stuff?
>
> I've found for cluster.conf, assuming that this
> determines the value of the config-db-keys
> cluster.logging.debug=on
> logging.debug=on
>
> Is it enough to write new values to these keys?
> Or do I have to notify one or several processes to react to this change?
>
> Best regards
> Andreas Mock

cman_tool -r -S

For the details see man cman_tool.

--
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München
Tel: (0163) 172 50 98
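The first step of the procedure described in this message (increase config_version by one) can be sketched in Python. This is only an illustration under assumptions: the function name bump_config_version and the sample XML are made up for this example, and distributing the file and calling cman_tool are still manual steps as described in the message.

```python
import xml.etree.ElementTree as ET


def bump_config_version(cluster_conf_xml):
    """Return the cluster.conf text with config_version increased by one."""
    root = ET.fromstring(cluster_conf_xml)
    if root.tag != "cluster":
        raise ValueError("expected a <cluster> root element")
    current = root.get("config_version")
    if current is None:
        raise ValueError("config_version attribute is missing")
    root.set("config_version", str(int(current) + 1))
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal cluster.conf, used only for illustration.
SAMPLE = '<cluster name="demo" config_version="41"><logging debug="on"/></cluster>'

if __name__ == "__main__":
    print(bump_config_version(SAMPLE))
```

Note that ElementTree does not preserve comments or the original whitespace layout, so for a hand-maintained cluster.conf editing the attribute by hand may still be preferable.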
Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker at runtime
Hi Michael,

this doesn't seem to have the desired effect. Please enlighten me on how to change cluster.conf and tell all participants to react to that change.

Best regards
Andreas Mock

From: Andreas Mock [mailto:andreas.m...@web.de]
Sent: Monday, 22 April 2013 19:11
To: 'The Pacemaker cluster resource manager'
Subject: Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker at runtime

Hi Michael,

thank you. I'll have a look at it.

Best regards
Andreas Mock

From: Michael Schwartzkopff [mailto:mi...@clusterbau.com]
Sent: Monday, 22 April 2013 18:51
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker at runtime

On Monday, 22 April 2013, 18:41:33, Andreas Mock wrote:
> Hi all,
>
> is there a way to enable debug output on a cman, corosync, pacemaker
> stack without restarting the whole cman stuff?
>
> I've found for cluster.conf, assuming that this
> determines the value of the config-db-keys
> cluster.logging.debug=on
> logging.debug=on
>
> Is it enough to write new values to these keys?
> Or do I have to notify one or several processes to react to this change?
>
> Best regards
> Andreas Mock

cman_tool -r -S

For the details see man cman_tool.

--
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München
Tel: (0163) 172 50 98
Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker at runtime
Hi Michael,

thank you. I'll have a look at it.

Best regards
Andreas Mock

From: Michael Schwartzkopff [mailto:mi...@clusterbau.com]
Sent: Monday, 22 April 2013 18:51
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker at runtime

On Monday, 22 April 2013, 18:41:33, Andreas Mock wrote:
> Hi all,
>
> is there a way to enable debug output on a cman, corosync, pacemaker
> stack without restarting the whole cman stuff?
>
> I've found for cluster.conf, assuming that this
> determines the value of the config-db-keys
> cluster.logging.debug=on
> logging.debug=on
>
> Is it enough to write new values to these keys?
> Or do I have to notify one or several processes to react to this change?
>
> Best regards
> Andreas Mock

cman_tool -r -S

For the details see man cman_tool.

--
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München
Tel: (0163) 172 50 98
[Pacemaker] Enabling debugging with cman, corosync, pacemaker at runtime
Hi all,

is there a way to enable debug output on a cman, corosync, pacemaker stack without restarting the whole cman stuff?

I've found for cluster.conf, assuming that this determines the value of the config-db-keys

    cluster.logging.debug=on
    logging.debug=on

Is it enough to write new values to these keys? Or do I have to notify one or several processes to react to this change?

Best regards
Andreas Mock
Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues
Hi Andrew,

is the bug fix in 1.1.9 for RHEL 6.4? Do you have an idea when 1.1.10 will be released?

Best regards
Andreas Mock

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Saturday, 20 April 2013 12:04
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

On 19/04/2013, at 11:28 AM, pavan tc wrote:
> Yes, but looking at the code it should be impossible.
> Would it be possible for you to add:
>
>     export PCMK_trace_functions=peer_update_callback
>
> to /etc/sysconfig/pacemaker and re-test (and send me the new logs - probably in /var/log/pacemaker.log)?
>
>
> Sorry about the delay.
>
> I have put these in place and am running tests now. The next time I hit this, I'll post the messages.

Another user hit the same issue and was able to reproduce. You can see the resolution at https://bugzilla.redhat.com/show_bug.cgi?id=951340
Re: [Pacemaker] crmsh: location preference for ms-resource
Hi all,

thanks to the search capabilities provided by gossamer-threads I could find a solution provided by 'andreas at hastexo'. Thanks to him:
http://www.gossamer-threads.com/lists/linuxha/pacemaker/80964?search_string=crm%20master%20location;#80964

---
location avoid_being_the_master ms_MySQL \
    rule $role=Master -1000: #uname eq my_node
location never_be_the_master ms_MySQL \
    rule $role=Master -inf: #uname eq my_node
---

Nice evening
Andreas Mock

-----Original Message-----
From: Andreas Mock [mailto:andreas.m...@web.de]
Sent: Thursday, 18 April 2013 17:01
To: 'The Pacemaker cluster resource manager'
Subject: [Pacemaker] crmsh: location preference for ms-resource

Hi all,

can someone tell me how I can configure in crmsh a node preference for a multistate resource in state promoted, so that the master starts preferably on a certain node?

Hints are very welcome. (I have to admit that I couldn't puzzle it out with the crm online help, I'm too stupid.)

Best regards
Andreas Mock
[Pacemaker] crmsh: location preference for ms-resource
Hi all,

can someone tell me how I can configure in crmsh a node preference for a multistate resource in state promoted, so that the master starts preferably on a certain node?

Hints are very welcome. (I have to admit that I couldn't puzzle it out with the crm online help, I'm too stupid.)

Best regards
Andreas Mock
Re: [Pacemaker] pcs equivalent of crm configure erase
Thank you for the links.

Best regards
Andreas Mock

-----Original Message-----
From: T. [mailto:nos...@godawa.de]
Sent: Wednesday, 17 April 2013 21:44
To: pacema...@clusterlabs.org
Subject: Re: [Pacemaker] pcs equivalent of crm configure erase

Hi,

> Can you please point me to a repository where I can find crmsh fitting
> to RHEL6.4 or clones?

I haven't looked whether there is a repo file, I just installed via RPM:

http://download.opensuse.org/repositories/network:/ha-clustering/CentOS_CentOS-6/x86_64/crmsh-1.2.5-55.3.x86_64.rpm
http://download.opensuse.org/repositories/network:/ha-clustering/CentOS_CentOS-6/x86_64/pssh-2.3.1-15.1.x86_64.rpm

--
To answer please replace "invalid" with "de"!

Chau y hasta luego,
Thorolf
Re: [Pacemaker] CentOS 6.4 - pacemaker 1.1.8 - heartbeat
Hi Thorolf,

ah, OK. You meant heartbeat 1. Yes, this is really pre-pacemaker time ;-)

Best regards
Andreas

-----Original Message-----
From: T. [mailto:nos...@godawa.de]
Sent: Wednesday, 17 April 2013 21:41
To: pacema...@clusterlabs.org
Subject: Re: [Pacemaker] CentOS 6.4 - pacemaker 1.1.8 - heartbeat

Hi,

> So, I don't understand why the usage of crm-shell in your case is more
> complicated?

because in the "past", with heartbeat (1), which I was used to, I only had to put my resources into a file and sync it to the other node. For me this was easier to understand, and I didn't have the config issues I have now with the crm shell (see my other post).

But the new HA stack is much more flexible and modern than the old one, which I had been using for the last 6 years or longer.

--
Chau y hasta luego,
Thorolf
Re: [Pacemaker] CentOS 6.4 - pacemaker 1.1.8 - heartbeat
Hi Thorolf,

both solutions, heartbeat + pacemaker and corosync + pacemaker, use pacemaker, which can be configured using the crm shell. So I don't understand why using the crm shell is more complicated in your case (besides the fact that you can only build a two-node cluster with heartbeat).

Best regards
Andreas Mock

-----Original Message-----
From: T. [mailto:nos...@godawa.de]
Sent: Wednesday, 17 April 2013 18:47
To: pacema...@clusterlabs.org
Subject: Re: [Pacemaker] CentOS 6.4 - pacemaker 1.1.8 - heartbeat

Hi,

> No one else using pacemaker and heartbeat on CentOS 6.4?

no, I switched to corosync/pacemaker, but it does not only have advantages. For me, the configuration is much more powerful, but also more complicated via the crm shell.

--
Chau y hasta luego,
Thorolf
Re: [Pacemaker] pcs equivalent of crm configure erase
Hi all,

thank you for your hints. Can you please point me to a repository where I can find a crmsh that fits RHEL 6.4 or its clones?

Best regards
Andreas Mock

-----Original Message-----
From: Vadym Chepkov [mailto:vchep...@gmail.com]
Sent: Wednesday, 17 April 2013 18:13
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] pcs equivalent of crm configure erase

On Apr 17, 2013, at 11:57 AM, T. wrote:
> Hi,
>
>> b) If I can't do it with pcs, is there a reliable and secure way to
>> do it with pacemaker low level tools?
> why not just install crmsh from a different repository?
>
> This is what I have done on CentOS 6.4.

My sentiments exactly. And "erase" is not the most important missing functionality. crm configure save and crm configure load (update | replace) are what made configurations easily manageable and trackable with version control software.

Cheers,
Vadym
Re: [Pacemaker] pcs: Return code handling not clean
Hi Chris,

I've just seen in the GitHub repo (which I found after posting here) that you made a fix. Thank you for the very fast reaction.

Best regards
Andreas

-----Original Message-----
From: Chris Feist [mailto:cfe...@redhat.com]
Sent: Wednesday, 17 April 2013 00:34
To: The Pacemaker cluster resource manager; Andreas Mock
Subject: Re: [Pacemaker] pcs: Return code handling not clean

On 04/16/13 06:46, Andreas Mock wrote:
> Hi all,
>
> as I don't really know where to address this issue, I'm posting it
> here: on the one hand as information for people scripting with
> the help of 'pcs', and on the other hand in the hope that a
> maintainer is listening and will have a look at this.
>
> Problem: When the cluster is down, 'pcs resource'
> shows an error message coming from a subprocess call of 'crm_resource
> -L' but exits with an exit code of 0. That's something which can be
> improved, especially as the Python code does have error handling in
> other places.
>
> So I guess it is a simple oversight.
>
> Look at the following piece of code in
> pcs/resource.py:
>
>     if len(argv) == 0:
>         args = ["crm_resource","-L"]
>         output,retval = utils.run(args)
>         preg = re.compile(r'.*(stonith:.*)')
>         for line in output.split('\n'):
>             if not preg.match(line) and line != "":
>                 print line
>         return
>
> retval is totally ignored, while it is handled in other places. As a
> result the script returns with status 0.

This is an oversight on my part, I've updated the code to check retval and return an error. Currently I'm not passing through the full error code (I'm only returning 0 on success and 1 on failure). However, if you think it would be useful to have this information I would be happy to look at it and see what I can do.

I'm planning on eventually having pcs interpret the crm_resource error code and provide a more user friendly output instead of just a return code.

Thanks,
Chris

>
> Interestingly the error handling of the utils.run call used all over
> the module is IMHO a little bit inconsistent.
> If I remember correctly Andrew made some efforts in the past to have a
> set of return codes coming from the base cibXXX and crm_XXX tools. (I
> really don't know how much they are differentiated.) Why not pass them
> through?
>
> Best regards
> Andreas Mock
Re: [Pacemaker] pcs equivalent of crm configure erase
Hi Chris,

I would like to see something that lets you start (only) your pacemaker configuration from scratch, in a way that you know nothing is left over (constraints, etc.).

Best regards
Andreas

-----Original Message-----
From: Chris Feist [mailto:cfe...@redhat.com]
Sent: Wednesday, 17 April 2013 00:23
To: The Pacemaker cluster resource manager
Cc: Andreas Mock
Subject: Re: [Pacemaker] pcs equivalent of crm configure erase

On 04/14/13 02:52, Andreas Mock wrote:
> Hi all,
>
> can someone tell me what the pcs equivalent to
>
> crm configure erase is?

From my understanding, 'crm configure erase' will remove everything from the configuration file except for the nodes.

Are you trying to clear your configuration out and start from scratch?

pcs has a destroy command (pcs cluster destroy), which will remove all pacemaker/corosync configuration and allow you to create your cluster from scratch. Is this what you're looking for?

Or do you need a specific command to keep the cluster running, but reset the cib to its defaults?

Thanks!
Chris

>
> Is there a pcs cheat sheet showing the common tasks?
>
> Or a documentation?
>
> Best regards
>
> Andreas
Re: [Pacemaker] pcs equivalent of crm configure erase
Hi Rastislav,

thank you for your hints. In this case, to rely only on pcs, I could probably use the following to get the list of resources:

    pcs resource show --all | perl -M5.010 -ane 'say $F[1] if $F[0] eq "Resource:"'

Best regards
Andreas Mock

-----Original Message-----
From: Rasto Levrinc [mailto:rasto.levr...@gmail.com]
Sent: Tuesday, 16 April 2013 10:45
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] pcs equivalent of crm configure erase

On Tue, Apr 16, 2013 at 9:38 AM, Andreas Mock wrote:
> Hi all,
>
> I try to bring that topic up once again because
> it's still unresolved for me:
>
> a) How can I do the equivalent of 'crm configure erase'
> in pcs? Is there a way?
>
> b) If I can't do it with pcs, is there a reliable
> and secure way to do it with pacemaker low level tools?

I don't think so. cibadmin has a drastic version of erase, but this is probably not what you want. If you don't want to use any higher level tools, the best way is probably to make a loop and use pcs to remove the resources, since it also removes the constraints; I'm not sure about other objects.

Something like:

    for r in `crm_resource -l`; do pcs resource delete $r; done

But test it first, I haven't used pcs myself yet.

Rasto

--
Dipl.-Ing. Rastislav Levrinc
rasto.levr...@gmail.com
Linux Cluster Management Console
http://lcmc.sf.net/
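For those scripting in Python rather than Perl, the filtering done by the Perl one-liner in this message could look roughly like the sketch below. The function name resource_names and the sample text are assumptions made for illustration; the real output of pcs resource show --all may differ between pcs versions, so check it against your installation first.

```python
def resource_names(pcs_output):
    """Collect the second field of every line whose first field is 'Resource:'."""
    names = []
    for line in pcs_output.splitlines():
        fields = line.split()  # whitespace split, like perl -a
        if len(fields) >= 2 and fields[0] == "Resource:":
            names.append(fields[1])
    return names


# Hypothetical sample output, for illustration only.
SAMPLE_OUTPUT = """\
 Resource: cluster_ip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.0.2.10
 Resource: web (class=ocf provider=heartbeat type=apache)
"""

if __name__ == "__main__":
    for name in resource_names(SAMPLE_OUTPUT):
        print(name)
```

Unlike the Perl version, this sketch skips a "Resource:" line that has no second field instead of printing an empty line.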
Re: [Pacemaker] Pacemaker configuration with different dependencies
Hi Ivor,

I don't know whether I understand you completely right: if you want independence of resources, don't put them into a group. Look at
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Pacemaker_Explained/ch10.html

A group is made to tie together several resources without declaring all the colocations and orderings necessary to get the desired behaviour.

Otherwise, name your resources and how they should be spread across your cluster. (Show the technical dependency.)

Best regards
Andreas

From: Ivor Prebeg [mailto:ivor.pre...@gmail.com]
Sent: Tuesday, 16 April 2013 13:53
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] Pacemaker configuration with different dependencies

Hi guys,

I need some help with my pacemaker configuration; it is all new to me and I can't find a solution...

I have a two-node HA environment with services that I want to be partially independent, in a pacemaker/heartbeat configuration.

There is an active/active SIP service with two floating IPs; it should just migrate the floating IP when one SIP instance dies.

There are also two active/active master/slave services, a Java container and an RDBMS with replication between them, which should also fall back when one dies.

What I can't figure out is how to configure those two to be independent (put an on-fail directive on the group). What I want is, e.g., in case my SIP service fails, the Java container stays active on that node, but the floating IP is moved to the other node.

Another thing is, in case one of the RDBMS instances fails, I want to put the whole service group on that node into standby, but leave the SIP service intact.

The whole node should go to standby (all services down) only when L3_ping to the gateway dies.

All suggestions and configuration examples are welcome.

Thanks in advance.

Ivor Prebeg
[Pacemaker] pcs: Return code handling not clean
Hi all,

as I don't really know where to address this issue, I'm posting it here: on the one hand as information for people scripting with the help of 'pcs', and on the other hand in the hope that a maintainer is listening and will have a look at this.

Problem: When the cluster is down, 'pcs resource' shows an error message coming from a subprocess call of 'crm_resource -L' but exits with an exit code of 0. That's something which can be improved, especially as the Python code does have error handling in other places.

So I guess it is a simple oversight.

Look at the following piece of code in pcs/resource.py:

    if len(argv) == 0:
        args = ["crm_resource","-L"]
        output,retval = utils.run(args)
        preg = re.compile(r'.*(stonith:.*)')
        for line in output.split('\n'):
            if not preg.match(line) and line != "":
                print line
        return

retval is totally ignored, while it is handled in other places. As a result the script returns with status 0.

Interestingly the error handling of the utils.run call used all over the module is IMHO a little bit inconsistent. If I remember correctly Andrew made some efforts in the past to have a set of return codes coming from the base cibXXX and crm_XXX tools. (I really don't know how much they are differentiated.) Why not pass them through?

Best regards
Andreas Mock
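To make the point about the ignored retval concrete, here is a minimal Python 3 sketch of what checking it could look like. It is not the actual pcs fix: the function list_resources and its run parameter are hypothetical stand-ins (run plays the role of utils.run and is assumed to return an (output, retval) pair), and mapping every failure to exit status 1 is just one possible choice.

```python
import re
import sys

STONITH_RE = re.compile(r".*(stonith:.*)")


def list_resources(run):
    """Print non-stonith resources and return an exit status.

    `run` is a hypothetical stand-in for utils.run: it takes an
    argument list and returns (output, retval).
    """
    output, retval = run(["crm_resource", "-L"])
    if retval != 0:
        # Report the tool's message on stderr and signal failure.
        sys.stderr.write(output)
        return 1
    for line in output.split("\n"):
        if line != "" and not STONITH_RE.match(line):
            print(line)
    return 0


if __name__ == "__main__":
    # Fake runner simulating a cluster that is down (illustration only).
    def cluster_down(args):
        return ("Error: cluster is not reachable\n", 107)

    sys.exit(list_resources(cluster_down))
```

Passing the runner in as a parameter keeps the sketch testable without a running cluster; returning retval itself instead of 1 would pass the underlying tool's code through.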
Re: [Pacemaker] pcs equivalent of crm configure erase
Hi all,

I'd like to bring this topic up once again because it's still unresolved for me:

a) How can I do the equivalent of 'crm configure erase' in pcs? Is there a way?
b) If I can't do it with pcs, is there a reliable and safe way to do it with the pacemaker low-level tools?

Thank you in advance.

Best regards
Andreas

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Monday, 15 April 2013 05:49
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] pcs equivalent of crm configure erase

On 14/04/2013, at 5:52 PM, Andreas Mock wrote:

> Hi all,
>
> can someone tell me what the pcs equivalent to
> crm configure erase is?
>
> Is there a pcs cheat sheet showing the common tasks?
> Or a documentation?

"pcs help" should be reasonably informative, but I don't see anything equivalent.

Chris?

> Best regards
> Andreas
Re: [Pacemaker] RHEL6.x dependency between 2-node-settings for cman and quorum settings in pacemaker
Hi Andrew,

that means (if I understand it right) that with these settings you get two different semantics for what the cluster knows about itself.

With a+c, as you recommend, the 2-node cluster does not get quorum when only one node survives, but ignores that information.

With b) the cluster does get quorum even when only one node is left. In this case I need not set c), as pacemaker believes it has quorum (told so by cman).

Is this right?

Best regards
Andreas

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Monday, 15 April 2013 05:58
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] RHEL6.x dependency between 2-node-settings for cman and quorum settings in pacemaker

On 12/04/2013, at 4:58 PM, Andreas Mock wrote:

> Hi all,
>
> another question came up while reading documentation concerning
> 2-node-cluster under RHEL6.x with CMAN and pacemaker.
>
> a) In the quick start guide one of the things you set is
> CMAN_QUORUM_TIMEOUT=0 in /etc/sysconfig/cman to get one node of the
> cluster up without waiting for quorum. (Correct me if my understanding
> is wrong)
>
> b) There is a special setting in cluster.conf expected_votes="1"
> which allows one node to gain quorum in
> a two node cluster (Please also correct me here if my understanding is
> wrong)
>
> c) And there is a pacemaker setting
> no-quorum-policy which is mostly set to 'ignore' in all startup
> tutorials.
>
> My question: I would like to understand how these settings influence
> each other and/or are dependent.

a) allows "service cman start" to complete (and therefore allows "service pacemaker start" to begin) before quorum has arrived.

b) is a possible alternative to a), but I've never tested it because it is superseded by c) and in fact makes c) meaningless, since the cluster always has quorum.

a+c is preferred for consistency with clusters of more than 2 nodes.

> As much insight as possible appreciated. ;-)
>
> Best regards
> Andreas
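The interplay of b) and c) discussed in this thread can be made concrete with a toy model. The Python sketch below is not how cman or pacemaker actually compute anything: it assumes a plain majority rule for quorum and a simplified reading of no-quorum-policy, all function names are invented for the illustration, and setting a) (which only affects waiting at start-up) is not modelled.

```python
def quorum_threshold(expected_votes):
    """Assumed majority rule: more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(votes_present, expected_votes):
    """True if the votes currently present reach the assumed threshold."""
    return votes_present >= quorum_threshold(expected_votes)


def pacemaker_runs_resources(quorate, no_quorum_policy="stop"):
    """Toy rule: run resources if quorate, or if loss of quorum is ignored."""
    return quorate or no_quorum_policy == "ignore"


# a+c: expected_votes stays at 2, one surviving node, no-quorum-policy=ignore.
# The partition is NOT quorate, but pacemaker ignores that and keeps running.
quorate_ac = is_quorate(votes_present=1, expected_votes=2)
runs_ac = pacemaker_runs_resources(quorate_ac, no_quorum_policy="ignore")

# b: expected_votes="1", one surviving node.
# The partition IS quorate, so the value of no-quorum-policy no longer matters.
quorate_b = is_quorate(votes_present=1, expected_votes=1)
runs_b = pacemaker_runs_resources(quorate_b, no_quorum_policy="stop")

print(quorate_ac, runs_ac, quorate_b, runs_b)
```

Under these assumptions both setups keep resources running on a single surviving node. They differ only in whether that node is told it has quorum, which matches the distinction drawn in the reply above.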
Re: [Pacemaker] pcs equivalent of crm configure erase
Hi Andrew,

the emphasis is on 'reasonably'... ;-)

I'll see whether someone else can give some hints.

Best regards
Andreas

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Monday, 15 April 2013 05:49
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] pcs equivalent of crm configure erase

On 14/04/2013, at 5:52 PM, Andreas Mock wrote:

> Hi all,
>
> can someone tell me what the pcs equivalent to crm configure erase is?
>
> Is there a pcs cheat sheet showing the common tasks?
> Or a documentation?

"pcs help" should be reasonably informative, but I don't see anything equivalent.

Chris?

> Best regards
> Andreas
Re: [Pacemaker] Disable startup fencing with cman
Hi Andrew,

thank you for your answers (to all of my questions).

My problem is that both nodes are down. Now I have to start one node without the other, and I know that the cluster is configured to use stonith. How do I change the meta attribute of the stonith device without first starting that one node, and therefore pacemaker, in order to make the change?

Best regards
Andreas

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Monday, 15 April 2013 02:09
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Disable startup fencing with cman

On 14/04/2013, at 6:47 PM, Andreas Mock wrote:

> Hi all,
>
> in a two node cluster (RHEL6.x, cman, pacemaker) when I startup the
> very first node, this node will try to fence the other node if it
> can't see it.
> This can be true in case of maintenance. How do I avoid this startup
> fencing temporarily when I know that the other node is down?

Set the target-role for your fencing device(s) to Stopped and use stonith_admin --confirm
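For people scripting this, the two steps named in the reply (set target-role to Stopped on the fencing devices, then confirm the peer as down) might be wrapped as below. This is a sketch under assumptions, not a tested procedure: the resource id `fence_node2` and the node name `node2` are placeholders, the function names are invented, and the commands presumably need pacemaker to be running on the local node, so the sketch does not answer the open question of how to make the change while both nodes are down.

```python
import subprocess


def startup_without_fencing_commands(fence_resource_ids, down_node):
    """Build the command lines for the two steps; nothing is executed here."""
    commands = []
    for rsc in fence_resource_ids:
        # Step 1: set the meta attribute target-role=Stopped on each fencing device.
        commands.append(["crm_resource", "--resource", rsc, "--meta",
                         "--set-parameter", "target-role",
                         "--parameter-value", "Stopped"])
    # Step 2: tell the cluster that the other node is known to be safely down.
    commands.append(["stonith_admin", "--confirm", down_node])
    return commands


def run_all(commands):
    """Run the commands in order, stopping at the first one that fails."""
    for cmd in commands:
        subprocess.check_call(cmd)


# Placeholder names, for illustration only.
cmds = startup_without_fencing_commands(["fence_node2"], "node2")
for cmd in cmds:
    print(" ".join(cmd))
```

Building the command lists separately from running them makes it possible to review what would be executed before calling `run_all(cmds)` on a real cluster.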
[Pacemaker] Disable startup fencing with cman
Hi all,

in a two-node cluster (RHEL6.x, cman, pacemaker), when I start up the very first node, this node will try to fence the other node if it can't see it. This can happen during maintenance. How do I temporarily avoid this startup fencing when I know that the other node is down?

Best regards
Andreas
[Pacemaker] pcs equivalent of crm configure erase
Hi all,

can someone tell me what the pcs equivalent of 'crm configure erase' is?

Is there a pcs cheat sheet showing the common tasks? Or any documentation?

Best regards
Andreas
[Pacemaker] RHEL6.x dependency between 2-node-settings for cman and quorum settings in pacemaker
Hi all,

another question came up while reading documentation concerning 2-node clusters under RHEL6.x with CMAN and pacemaker.

a) In the quick start guide one of the things you set is CMAN_QUORUM_TIMEOUT=0 in /etc/sysconfig/cman to get one node of the cluster up without waiting for quorum. (Correct me if my understanding is wrong.)

b) There is a special setting in cluster.conf which allows one node to gain quorum in a two-node cluster. (Please also correct me here if my understanding is wrong.)

c) And there is a pacemaker setting, no-quorum-policy, which is mostly set to 'ignore' in all startup tutorials.

My question: I would like to understand how these settings influence each other and/or depend on each other.

As much insight as possible is appreciated. ;-)

Best regards
Andreas
Re: [Pacemaker] RHEL6 and clones: CMAN needed anyway?
Hi Andrew,

once again thank you for the fast response. My English seems not to be good enough, so I would like to recap your answer in my own words to be sure what you meant.

a) CMAN will die. In the long term there will be corosync and pacemaker. That means option 3 of this document (http://theclusterguy.clusterlabs.org/post/34604901720/pacemaker-and-cluster-filesystems) is the target architecture.

b) As I haven't tested it yet, I assume there will be an ERROR message when starting CMAN in addition to corosync and pacemaker. Is that what you mean?

Best regards
Andreas

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Tuesday, 9 April 2013 12:48
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] RHEL6 and clones: CMAN needed anyway?

On 09/04/2013, at 8:07 PM, "Andreas Mock" wrote:

> Hi all,
>
> after reading several docs on clusterlabs.org and trying to
> understand how all pieces fit together, there is one question
> remaining (you know: I understate ;-)):
>
> If I don't want to use any cluster-FS do I really need CMAN
> on RHEL6.x and clones or is it enough to let corosync and
> pacemaker play together?

I wouldn't rely on other options continuing to be available in the long-term. There should already be a large ERROR to this effect when the plugin starts.

> Is fencing and fencing agents independent of CMAN?

Correct, in both cases the Pacemaker equivalents are used.

> Best regards
> Andreas
Re: [Pacemaker] Pacemaker 1.1.9 for RHEL 6.x and clones
Thank you.

Andreas

-----Original Message-----
From: Alexandr A. Alexandrov [mailto:shurr...@gmail.com]
Sent: Tuesday, 9 April 2013 10:26
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Pacemaker 1.1.9 for RHEL 6.x and clones

Hi Andreas!

For this purpose I put resources into the 'unmanaged' state with 'crm resource unmanage ' - and after that you can start/stop pacemaker/corosync without interrupting running resources.

On 09.04.2013 11:44, Andreas Mock wrote:

> What would be the right procedure to restart pacemaker
> freeing lost memory without interrupting cluster operation?
[Pacemaker] RHEL6 and clones: CMAN needed anyway?
Hi all,

after reading several docs on clusterlabs.org and trying to understand how all the pieces fit together, there is one question remaining (you know: I understate ;-)):

If I don't want to use any cluster FS, do I really need CMAN on RHEL6.x and clones, or is it enough to let corosync and pacemaker play together?

Are fencing and fencing agents independent of CMAN?

Best regards
Andreas