Re: [Pacemaker] don't want to restart clone resource
Hi Andrew,

The problem is not resolved, and I have updated the bugzilla:
http://bugs.clusterlabs.org/show_bug.cgi?id=5038
Appreciate your reply. :)

2012/3/28 Fanghao Sha:
> Hi Andrew,
> After your patch, I encountered a new problem,
> and I have reported it in bugzilla.
> http://bugs.clusterlabs.org/show_bug.cgi?id=5038

2012/2/13 Andrew Beekhof:
> On Wed, Feb 8, 2012 at 5:48 PM, Fanghao Sha wrote:
>> Hi Andrew,
>> Is crm_report included in pacemaker-1.0.12-1.el5.centos?
>> I couldn't find it.
>
> /headslap
>
> I added it to the source but neglected to actually install it.
> hb_report should be available, though.

2012/2/4 Andrew Beekhof:
> On Fri, Feb 3, 2012 at 9:35 PM, Fanghao Sha wrote:
>> Sorry, I don't know how to file a bug,
>
> See the links at the bottom of every mail on this list?
>
>> and I have only the "messages" file.
>
> man crm_report
>
>> I have tried to set clone-max=3, and after removing node-1, the clone
>> resource running on node-2 did not restart.
>> But when I added another node-3 to the cluster with "hb_addnode", the
>> clone resource running on node-2 became orphaned and restarted.
>>
>> In the attached "messages" file, I couldn't understand this line:
>> "find_clone: Internally renamed node-app-rsc:2 on node-2 to
>> node-app-rsc:3 (ORPHAN)"

2012/2/2 Andrew Beekhof:
> On Thu, Feb 2, 2012 at 4:57 AM, Lars Ellenberg wrote:
>> On Wed, Feb 01, 2012 at 03:43:55PM +0100, Andreas Kurz wrote:
>>> On 02/01/2012 10:39 AM, Fanghao Sha wrote:
>>>> Hi Lars,
>>>>
>>>> Yes, you are right. But how to prevent the "orphaned" resources
>>>> from stopping by default, please?
>>>
>>> crm configure property stop-orphan-resources=false
>>
>> Well, sure. But for "normal" orphans,
>> you actually want them to be stopped.
>>
>> No, pacemaker needs some additional smarts to recognize
>> that there actually are no orphans, maybe by first relabeling,
>> and only then checking for instance label > clone-max.
>
> Instance label doesn't come into the equation.
> It might look like it does on the outside, but it's more complicated
> than that.
>
>> Did you file a bugzilla?
>> Has that made progress?
>>
>> --
>> : Lars Ellenberg
>> : LINBIT | Your Way to High Availability
>> : DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] manually failing back resources when set sticky
On Mar 30, 2012, at 2:35 PM, Florian Haas wrote:
> On Fri, Mar 30, 2012 at 8:26 PM, Brian J. Murrell wrote:
>> The question is, what is the proper administrative command(s) to move
>> the resource back to its "primary" after I have manually determined
>> that that node is OK after coming back from a failure?
>
> crm configure rsc_defaults resource-stickiness=0
>
> ... and then when resources have moved back, set it to 1000 again.
> It's really that simple. :)

What if some resources are more sticky than others, and don't simply
inherit the default?
Re: [Pacemaker] manually failing back resources when set sticky
On 12-03-30 02:35 PM, Florian Haas wrote:
> crm configure rsc_defaults resource-stickiness=0
>
> ... and then when resources have moved back, set it to 1000 again.
> It's really that simple. :)

That sounds racy. I am changing a parameter which has the potential to
affect the stickiness of all resources for a (hopefully brief) period
of time. If there is some other fail{ure,over} transaction in play
while I do this, I might adversely affect my policy of
no-automatic-failback, mightn't I?

Since this suggestion is also non-atomic (meaning I set a constraint,
wait for the result of the change in allocation due to that setting,
and then "undo" it when the allocation change has completed), wouldn't
I be better off using "crm resource migrate FOO", monitoring for the
reallocation, and then removing the "cli-standby-FOO" constraint once
it has happened? Wouldn't this be just as non-atomic as your
suggestion, but be sure to only affect the one resource I am trying to
fail back?

Cheers,
b.
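A per-resource fail-back along the lines Brian describes could look like the
sketch below, using the crm shell's migrate/unmigrate pair rather than
touching cluster-wide stickiness. Resource FOO and node bar1 are the example
names from this thread; this is a sketch of the approach, not a command
sequence anyone in the thread actually posted.

```shell
# Push FOO back to its preferred node; this inserts a temporary
# cli-prefer-FOO (or cli-standby-FOO) location constraint.
crm resource migrate FOO bar1

# Check where FOO is currently running.
crm_resource --resource FOO --locate

# Once FOO is back on bar1, drop the temporary constraint so the
# normal stickiness/location scores apply again.
crm resource unmigrate FOO
```

This only ever manipulates one resource's constraints, so other resources
keep their no-automatic-failback behaviour throughout.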
Re: [Pacemaker] manually failing back resources when set sticky
On Fri, Mar 30, 2012 at 8:26 PM, Brian J. Murrell wrote:
> In my cluster configuration, each resource can be run on one of two
> nodes, and I designate a "primary" and a "secondary" using location
> constraints such as:
>
> location FOO-primary FOO 20: bar1
> location FOO-secondary FOO 10: bar2
>
> And I also set a default stickiness to prevent auto-fail-back (i.e. to
> prevent flapping):
>
> rsc_defaults $id="rsc-options" resource-stickiness="1000"
>
> This all works as I expect. Resources run where I expect them to while
> everything is operating normally, and when a node fails the resource
> migrates to the secondary and stays there even when the primary node
> comes back.
>
> The question is, what is the proper administrative command(s) to move
> the resource back to its "primary" after I have manually determined
> that that node is OK after coming back from a failure?
>
> I figure I could just create a new resource constraint, wait for the
> migration and then remove it, but I just wonder if there is a more
> atomic "move back to your preferred node" command I can issue.

crm configure rsc_defaults resource-stickiness=0

... and then when resources have moved back, set it to 1000 again.
It's really that simple. :)

Cheers,
Florian

--
Need help with High Availability? http://www.hastexo.com/now
[Pacemaker] manually failing back resources when set sticky
In my cluster configuration, each resource can be run on one of two
nodes, and I designate a "primary" and a "secondary" using location
constraints such as:

location FOO-primary FOO 20: bar1
location FOO-secondary FOO 10: bar2

And I also set a default stickiness to prevent auto-fail-back (i.e. to
prevent flapping):

rsc_defaults $id="rsc-options" resource-stickiness="1000"

This all works as I expect. Resources run where I expect them to while
everything is operating normally, and when a node fails the resource
migrates to the secondary and stays there even when the primary node
comes back.

The question is, what is the proper administrative command(s) to move
the resource back to its "primary" after I have manually determined
that that node is OK after coming back from a failure?

I figure I could just create a new resource constraint, wait for the
migration and then remove it, but I just wonder if there is a more
atomic "move back to your preferred node" command I can issue.

Cheers,
b.
Re: [Pacemaker] Nodes not rejoining cluster
On Fri, Mar 30, 2012 at 7:45 PM, Gregg Stock wrote:
> The full shutdown and restart fixed it.

Hrm. So it's transient after all. Andrew, think you nailed that one
with the commit I referred to upthread, or do you call heisenbug?

Cheers,
Florian

--
Need help with High Availability? http://www.hastexo.com/now
Re: [Pacemaker] Nodes not rejoining cluster
The full shutdown and restart fixed it. Thanks for your help.

On 3/30/2012 9:33 AM, Florian Haas wrote:
> So you've got both healthy rings, and all 5 nodes have 5 members in
> the membership list? Then this would make it a Pacemaker problem.
> [...]
> Hence, my question of whether the issue persists after a full cluster
> shutdown.
Re: [Pacemaker] Nodes not rejoining cluster
On Fri, Mar 30, 2012 at 6:09 PM, Gregg Stock wrote:
> That looks good. They were all the same and had the correct ip addresses.

So you've got both healthy rings, and all 5 nodes have 5 members in the
membership list? Then this would make it a Pacemaker problem. IIUC the
code causing Pacemaker to discard the update from a node that is "not
in our membership" has actually been removed in 1.1.7[1], so an upgrade
may not be a bad idea, but you'll probably have to wait for a few more
days until packages become available.

Still, out of curiosity, and since you're saying this is a test
cluster: what happens if you shut down corosync and Pacemaker on *all*
the nodes, and bring it back up? We've had a few people report these
"not in our membership" issues on the list before, and they seem to
appear in a very sporadic and transient fashion, so the root cause
(which may well be totally trivial) hasn't really been found out -- as
far as I can tell, at least. Hence, my question of whether the issue
persists after a full cluster shutdown.

Florian

[1] https://github.com/ClusterLabs/pacemaker/commit/03f6105592281901cc10550b8ad19af4beb5f72f
-- note Andrew will rightfully flame me to a crisp if I've
misinterpreted that commit, so caveat lector. :)

--
Need help with High Availability? http://www.hastexo.com/now
Re: [Pacemaker] Nodes not rejoining cluster
That looks good. They were all the same and had the correct ip addresses.

On 3/30/2012 9:01 AM, Florian Haas wrote:
> Can you check the health of the Corosync membership, as per this URL?
> http://www.hastexo.com/resources/hints-and-kinks/checking-corosync-cluster-membership
>
> Do _all_ nodes agree on the health of the rings, and on the cluster
> member list?
Re: [Pacemaker] Nodes not rejoining cluster
On Fri, Mar 30, 2012 at 5:38 PM, Gregg Stock wrote:
> I took the last 200 lines of each.

Can you check the health of the Corosync membership, as per this URL?
http://www.hastexo.com/resources/hints-and-kinks/checking-corosync-cluster-membership

Do _all_ nodes agree on the health of the rings, and on the cluster
member list?

Florian

--
Need help with High Availability? http://www.hastexo.com/now
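The check Florian links to boils down to verifying ring health
(corosync-cfgtool -s) and comparing the member list across all nodes. As a
rough illustration, the member list can be filtered and counted like this;
the corosync-objctl key name and the canned sample output are assumptions
based on typical corosync 1.x output, not text from this thread:

```shell
# Count "joined" members from corosync-objctl-style output.
# On a real node you would pipe the real dump into the same filter:
#   corosync-objctl runtime.totem.pg.mrp.srp.members | count_joined
count_joined() {
    grep -c 'status=joined'
}

# Canned, hypothetical three-node sample; node 3 has left the membership.
sample='runtime.totem.pg.mrp.srp.members.1.status=joined
runtime.totem.pg.mrp.srp.members.2.status=joined
runtime.totem.pg.mrp.srp.members.3.status=left'

joined=$(printf '%s\n' "$sample" | count_joined)
echo "joined=$joined"
```

Running the same count on every node and comparing the numbers is a quick
way to spot a node that disagrees about the membership.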
Re: [Pacemaker] configuring ocf:heartbeat:conntrackd
Hi,

On Thu, Mar 22, 2012 at 12:32:44PM +0100, Kevin COUSIN wrote:
> Hello,
>
> I am trying to use the ocf:heartbeat:conntrackd resource on a CentOS 6
> two-node cluster. I don't understand how the conntrackd resource
> works. I configured it as explained in the documentation, and started
> a conntrackd daemon with an LSB script. When I try a takeover, the
> resource kills the daemon on the nodes and doesn't restart it, and the
> resource fails.
>
> Here is my configuration:
>
> ms MS_CONNTRACKD SUIVI_CONNEXIONS \
>     meta notify="true" interleave="true"
> primitive SUIVI_CONNEXIONS ocf:heartbeat:conntrackd \
>     params conntrackd="/usr/sbin/conntrackd" \
>         config="/etc/conntrackd/conntrackd.conf" \
>     op monitor interval="20" role="Slave" timeout="20" \
>     op monitor interval="10" role="Master" timeout="20"

Did you check the logs? The answer should be there. If not, then the
conntrackd RA probably needs fixing.

Thanks,
Dejan

> Thanks for help
>
> Kevin C.
Re: [Pacemaker] Using shadow configurations noninteractively
On Wed, Mar 21, 2012 at 12:21:55PM -0400, Phillip Frost wrote:
> On Mar 19, 2012, at 4:30 PM, Florian Haas wrote:
>> On Mon, Mar 19, 2012 at 9:00 PM, Phil Frost wrote:
>>> On Mar 19, 2012, at 15:22, Florian Haas wrote:
>>>> On Mon, Mar 19, 2012 at 8:00 PM, Phil Frost wrote:
>>>>> Normally I'd expect some command-line option, but I can't find
>>>>> any. It does look like it sets the environment variable
>>>>> "CIB_shadow". Is that all there is to it? Is it safe to rely on
>>>>> that behavior?
>>>>
>>>> I've never tried this specific use case, so bear with me while I
>>>> go out on a limb, but the crm shell is fully scriptable. Thus you
>>>> *should* be able to generate a full-blown crm script, with
>>>> "cib foo" commands and whathaveyou, in a temporary file, and then
>>>> just do "crm < /path/to/temp/file". Does that work for you?
>>>
>>> I don't think so, because the crm shell, unlike cibadmin, has no
>>> idempotent method of configuration I've found.
>>
>> Huh? What's wrong with "crm configure load replace "?
>>
>> Anyhow, I think you haven't really stated what you are trying to
>> achieve, in detail. So: what is it that you want to do exactly?
>
> Sorry, I hadn't found that command yet. "crm configure load update "
> seems about what I need. So, when I tell puppet "there's this Xen
> domain called foo, and it can run on xen01 or xen02", it creates a
> file with a primitive and two location constraints. An example of one
> such file:
>
> 8<--
> primitive nagios.macprofessionals.lan ocf:heartbeat:Xen \
>     params \
>         xmfile="/etc/xen/nagios.macprofessionals.lan.cfg" \
>         name="nagios.macprofessionals.lan" \
>     op start interval="0" timeout="60" \
>     op stop interval="0" timeout="40" \
>     op migrate_from interval="0" timeout="120" \
>     op migrate_to interval="0" timeout="120" \
>     op monitor interval="10" timeout="30"
>
> location nagios.macprofessionals.lan-on-xenhost02.macprofessionals.lan
> nagios.macprofessionals.lan 100: xenhost02
> 8<--
>
> There are several such files created in /etc/xen/crm, one for each Xen
> domain puppet knows about. Then, I load them with this script:
>
> 8<--
> #!/bin/bash
>
> crmdir='/etc/xen/crm'
>
> function crm_input() {
>     echo "cib delete puppet"
>     echo "cib new puppet"
>
>     for f in "$crmdir"/*.crm; do
>         echo configure load update "$f"
>     done
> }
>
> crm_input | crm
> 8<--
>
> The end result here is to have, at any given time, a shadow
> configuration which represents what Puppet, based on what it already
> knows about the Xen domains, thinks the pacemaker configuration should
> be. If that differs from the live configuration, an admin receives an
> alert; he runs ptest and reviews it to make sure it isn't going to do
> anything horrible, and commits it. The higher-level goal is to avoid
> manually poking at the pacemaker configuration, because it's tedious,
> and people make more errors than well-written tools do with this sort
> of task.
>
> It seems to be working fairly well. Does this seem like a reasonable
> approach?

Yes. I guess that the answer to your question is "crm cib commit ".

Thanks,
Dejan
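The "alert the admin on drift" step Phillip describes could be implemented by
diffing the shadow CIB against the live one. The sketch below relies on the
CIB_shadow environment variable mentioned earlier in the thread and on the
crm_diff tool from the pacemaker package; it is an illustrative sketch, and
the temp-file paths are arbitrary:

```shell
# Dump the "puppet" shadow CIB and the live CIB, then diff them.
# crm_diff prints an XML patch of the differences; no output means
# the shadow matches the live configuration (no drift to report).
CIB_shadow=puppet cibadmin --query > /tmp/shadow.xml
cibadmin --query > /tmp/live.xml
crm_diff --original /tmp/live.xml --new /tmp/shadow.xml
```

A cron job could run this and mail the patch to the admin whenever the
output is non-empty, completing the workflow described above.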
[Pacemaker] Patrik Rapposch is out of the office
I will be out of the office from 30.03.2012, returning on 10.04.2012.

Please note that I am not available until 10.04.2012. In urgent cases,
please contact Gernot Pichler (gernot.pich...@knapp.com) or Manuel
Thaller (manuel.thal...@knapp.com).
Re: [Pacemaker] Pacemaker + Oracle
Hello Fernando,

I think it could be useful for others if you explain what the problem
was.

Thanks

On 30 March 2012 12:47, Ruwan Fernando wrote:
> I solved the issue by referring to the log file. Thanks for the help.
>
> On Thu, Mar 29, 2012 at 5:44 PM, emmanuel segura wrote:
>> cat /etc/oratab
>>
>> And maybe you can post your log :-)
>>
>> On 29 March 2012 13:53, Ruwan Fernando wrote:
>>> Hi,
>>> I'm working with a Pacemaker active/passive cluster and need to add
>>> Oracle as a resource to Pacemaker. My resource command is:
>>>
>>> crm configureprimitive Oracle ocf:heartbeat:oracle params sid=OracleDB
>>> op monitor inetrval=120s
>>>
>>> but it did not work for me.
>>>
>>> Can someone help out on this matter?
>>>
>>> Regards,
>>> Ruwan

--
this is my life and I live it as long as God wills
Re: [Pacemaker] Pacemaker + Oracle
I solved the issue by referring to the log file. Thanks for the help.

On Thu, Mar 29, 2012 at 5:44 PM, emmanuel segura wrote:
> cat /etc/oratab
>
> And maybe you can post your log :-)
>
> On 29 March 2012 13:53, Ruwan Fernando wrote:
>> Hi,
>> I'm working with a Pacemaker active/passive cluster and need to add
>> Oracle as a resource to Pacemaker. My resource command is:
>>
>> crm configureprimitive Oracle ocf:heartbeat:oracle params sid=OracleDB
>> op monitor inetrval=120s
>>
>> but it did not work for me.
>>
>> Can someone help out on this matter?
>>
>> Regards,
>> Ruwan
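For the record, the command as quoted contains what look like two plain
typos: a missing space in "configure primitive" and "inetrval" for
"interval". The thread never states what the actual fix was, so the
corrected form below is an assumption, not a confirmed resolution:

```shell
# Hypothetical corrected version of the quoted command; note the space
# in "configure primitive" and the spelling of "interval".
crm configure primitive Oracle ocf:heartbeat:oracle \
    params sid=OracleDB \
    op monitor interval=120s
```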
Re: [Pacemaker] Nodes will not promote DRBD resources to master on failover
On 03/28/2012 04:56 PM, Andrew Martin wrote:
> Hi Andreas,
>
> I disabled the DRBD init script and then restarted the slave node
> (node2). After it came back up, DRBD did not start:
>
> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): pending
> Online: [ node2 node1 ]
>
> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>     Masters: [ node1 ]
>     Stopped: [ p_drbd_vmstore:1 ]
> Master/Slave Set: ms_drbd_mount1 [p_drbd_tools]
>     Masters: [ node1 ]
>     Stopped: [ p_drbd_mount1:1 ]
> Master/Slave Set: ms_drbd_mount2 [p_drbdmount2]
>     Masters: [ node1 ]
>     Stopped: [ p_drbd_mount2:1 ]
> ...
>
> root@node2:~# service drbd status
> drbd not loaded

Yes, expected unless Pacemaker starts DRBD.

> Is there something else I need to change in the CIB to ensure that
> DRBD is started? All of my DRBD devices are configured like this:
>
> primitive p_drbd_mount2 ocf:linbit:drbd \
>     params drbd_resource="mount2" \
>     op monitor interval="15" role="Master" \
>     op monitor interval="30" role="Slave"
> ms ms_drbd_mount2 p_drbd_mount2 \
>     meta master-max="1" master-node-max="1" clone-max="2" \
>         clone-node-max="1" notify="true"

That should be enough ... unable to say more without seeing the
complete configuration ... too many fragments of information ;-)
Please provide (e.g. pastebin) your complete CIB (cibadmin -Q) when the
cluster is in that state ... or even better, create a crm_report
archive.

> Here is the output from the syslog (grep -i drbd /var/log/syslog):
>
> Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing
>   key=12:315:7:24416169-73ba-469b-a2e3-56a22b437cbc op=p_drbd_vmstore:1_monitor_0 )
> Mar 28 09:24:47 node2 lrmd: [3210]: info: rsc:p_drbd_vmstore:1 probe[2] (pid 3455)
> Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing
>   key=13:315:7:24416169-73ba-469b-a2e3-56a22b437cbc op=p_drbd_mount1:1_monitor_0 )
> Mar 28 09:24:48 node2 lrmd: [3210]: info: rsc:p_drbd_mount1:1 probe[3] (pid 3456)
> Mar 28 09:24:48 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing
>   key=14:315:7:24416169-73ba-469b-a2e3-56a22b437cbc op=p_drbd_mount2:1_monitor_0 )
> Mar 28 09:24:48 node2 lrmd: [3210]: info: rsc:p_drbd_mount2:1 probe[4] (pid 3457)
> Mar 28 09:24:48 node2 Filesystem[3458]: [3517]: WARNING: Couldn't find
>   device [/dev/drbd0]. Expected /dev/??? to exist
> Mar 28 09:24:48 node2 crm_attribute: [3563]: info: Invoked:
>   crm_attribute -N node2 -n master-p_drbd_mount2:1 -l reboot -D
> Mar 28 09:24:48 node2 crm_attribute: [3557]: info: Invoked:
>   crm_attribute -N node2 -n master-p_drbd_vmstore:1 -l reboot -D
> Mar 28 09:24:48 node2 crm_attribute: [3562]: info: Invoked:
>   crm_attribute -N node2 -n master-p_drbd_mount1:1 -l reboot -D
> Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[4] on
>   p_drbd_mount2:1 for client 3213: pid 3457 exited with return code 7
> Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[2] on
>   p_drbd_vmstore:1 for client 3213: pid 3455 exited with return code 7
> Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM
>   operation p_drbd_mount2:1_monitor_0 (call=4, rc=7, cib-update=10,
>   confirmed=true) not running
> Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[3] on
>   p_drbd_mount1:1 for client 3213: pid 3456 exited with return code 7
> Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM
>   operation p_drbd_vmstore:1_monitor_0 (call=2, rc=7, cib-update=11,
>   confirmed=true) not running
> Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM
>   operation p_drbd_mount1:1_monitor_0 (call=3, rc=7, cib-update=12,
>   confirmed=true) not running

No errors, just probing ... so for some reason Pacemaker does not like
to start it ... use crm_simulate to find out why, or provide the
information requested above.

Regards,
Andreas

--
Need help with Pacemaker? http://www.hastexo.com/now

> From: "Andreas Kurz"
> To: pacemaker@oss.clusterlabs.org
> Sent: Wednesday, March 28, 2012 9:03:06 AM
> Subject: Re: [Pacemaker] Nodes will not promote DRBD resources to
>   master on failover
>
> On 03/28/2012 03:47 PM, Andrew Martin wrote:
>> Hi Andreas,
>>
>>> hmm ... what is that fence-peer script doing? If you want to use
>>> resource-level fencing with the help of dopd, activate the
>>> drbd-peer-outdater script in the line above ... and double check if
>>> the path is correct
>>
>> fence-peer is just a wrapper for drbd-peer-outdater that does some
>> additional logging. In my testing dopd has been working well.
>
> I see
>
>> I am thinking of making the following changes to the CIB (as per the
>> official DRBD guide,
>> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html)
>> in order to add the DRBD lsb service and require that it start before
>> the ocf:linbit:drbd resources. Does this look correct?
>
> Wher
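Andreas's crm_simulate suggestion might look like the sketch below. The
option spellings are as in pacemaker 1.1-era tools and should be verified
against the installed version; the temp-file path is arbitrary:

```shell
# Ask the policy engine why the stopped instances are not being started.
# --live-check replays the current cluster state; --show-scores prints
# the allocation scores for each resource on each node.
crm_simulate --live-check --show-scores

# The same analysis against a saved CIB, e.g. one grabbed with cibadmin:
cibadmin -Q > /tmp/cib.xml
crm_simulate --xml-file /tmp/cib.xml --show-scores
```

Looking for the ms_drbd_* resources in the score output usually shows
which constraint or score is pinning an instance to "Stopped".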
Re: [Pacemaker] Pacemaker 1.1.7 now available
On Fri, Mar 30, 2012 at 10:37 AM, Andrew Beekhof wrote:
> I blogged about it, which automatically got sent to twitter, and I
> updated the IRC channel topic, but alas I forgot to mention it here
> :-)
>
> So in case you missed it, 1.1.7 is finally out.
> Special mention is due to David and Yan for the nifty features they've
> been writing lately.
> Thanks guys!

Quick question: the blog post doesn't mention libqb specifically, the
changelog says "core: *Support* libqb for logging" (as opposed to
"require"), but the RPM spec file introduces a hard BuildRequires on
libqb-devel. Is this a hard dependency? IOW, does libqb have to be
packaged on distros where it's not currently available, or can people
build without libqb support and still be able to use 1.1.7?

Cheers,
Florian

--
Need help with High Availability? http://www.hastexo.com/now
[Pacemaker] Pacemaker 1.1.7 now available
I blogged about it, which automatically got sent to twitter, and I
updated the IRC channel topic, but alas I forgot to mention it here :-)

So in case you missed it, 1.1.7 is finally out.

Special mention is due to David and Yan for the nifty features they've
been writing lately. Thanks guys!

The blog entry
(http://theclusterguy.clusterlabs.org/post/20110630492/pacemaker-1-1-7-now-available)
has more details while remaining readable. I'd encourage you to check
it out there :-)

-- Andrew
Re: [Pacemaker] OCF_RESKEY_CRM_meta_{ordered,notify,interleave}
On Fri, Mar 30, 2012 at 1:12 AM, Andrew Beekhof wrote: > Because it was felt that RAs shouldn't need to know. > Those options change pacemaker's behaviour, not the RAs. > > But subsequently, in lf#2391, you convinced us to add notify since it > allowed the drbd agent to error out if they were not turned on. Yes, and for ordered the motivation is exactly the same. Let me give a bit of background info. I'm currently working on an RA for GlusterFS volumes (the server-side stuff, everything client side is already covered in ocf:heartbeat:Filesystem). GlusterFS volumes are composed of "bricks", and for every brick there's a separate process to be managed on each cluster node. When these brick processes fail, GlusterFS has no built-in way to recover, and that's where Pacemaker can be helpful. Obviously, you would run that RA as a clone, on however many nodes constitute your GlusterFS storage cluster. Now, while brick daemons can be _monitored_ individually, they can only be _started_ as part of the volume, with the "gluster volume start" command. And if we "start" a volume simultaneously on multiple nodes, GlusterFS just produces an error on all but one of them, and that error is also a generic one and not discernible from other errors by exit code (yes, you may rant). So, whenever we need to start >1 clone instance, we run into this problem: 1. Check whether brick is already running. 2. No, it's not. Start volume (this leaves other bricks untouched, but fires up the brick daemons expected to run locally). 3. Grumble. A different node just did the same thing. 4. All but one fail on start. Yes, all this isn't necessarily wonderful design (the start volume command could block until volume operations have completed on other servers, or it could error out with a "try again" error, or it could sleep randomly before retrying, or something else), but as it happens configuring the clone as ordered makes all of this evaporate. 
And it would simply be nice to be able to check, during validate,
whether clone ordering is enabled.

> I'd need more information. The RA shouldn't need to care I would have
> thought. The ordering happens in the PE/crmd, the RA should just do
> what its told.

Quite frankly, I don't quite get this segregation between "meta
attributes we expect to be relevant to the RA" and "meta attributes the
RA shouldn't care about." Can't we just have a rule that _all_ meta
attributes, like parameters, are always available in the RA environment
with the OCF_RESKEY_CRM_meta_ prefix?

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
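To make the request concrete: if "ordered" were exported the way clone-max and notify already are, the validate-time check would be a few lines. A sketch, assuming an `OCF_RESKEY_CRM_meta_ordered` variable that Pacemaker does not currently export (that export is exactly what's being asked for here):

```shell
#!/bin/sh
# Sketch of a validate-time check, assuming Pacemaker exported "ordered"
# as OCF_RESKEY_CRM_meta_ordered (the point under discussion; clone-max
# and notify are already exported with this prefix).
OCF_SUCCESS=0
OCF_ERR_CONFIGURED=6

validate_clone_ordering() {
    if [ "${OCF_RESKEY_CRM_meta_clone_max:-1}" -gt 1 ] &&
       [ "${OCF_RESKEY_CRM_meta_ordered:-false}" != "true" ]; then
        echo "This RA must be run as an ordered clone (ordered=true)." >&2
        return $OCF_ERR_CONFIGURED
    fi
    return $OCF_SUCCESS
}
```

This mirrors what the drbd agent already does with the notify meta attribute: fail validate early with OCF_ERR_CONFIGURED rather than misbehave later.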
Re: [Pacemaker] Migration of "lower" resource causes dependent resources to restart
On Thu, Mar 29, 2012 at 8:35 AM, Andrew Beekhof wrote:
> On Thu, Mar 29, 2012 at 5:28 PM, Vladislav Bogdanov wrote:
>> Hi Andrew, all,
>>
>> Pacemaker restarts resources when a resource they depend on (ordering
>> only, no colocation) is migrated.
>>
>> I mean that when I do "crm resource migrate lustre", I get
>>
>> LogActions: Migrate lustre#011(Started lustre03-left -> lustre04-left)
>> LogActions: Restart mgs#011(Started lustre01-left)
>>
>> I only have one ordering constraint for these two resources:
>>
>> order mgs-after-lustre inf: lustre:start mgs:start
>>
>> This reminds me of what used to happen with reload in the past (the
>> dependent resource restarted when the "lower" resource was reloaded).
>>
>> Shouldn't this be changed? Migration usually means that service is not
>> interrupted...
>
> Is that strictly true? Always?

No. Few things are always true. :) However, see below.

> My understanding was although A thinks the migration happens
> instantaneously, it is in fact more likely to be pause+migrate+resume
> and during that time anyone trying to talk to A during that time is
> going to be disappointed.

I tend to be with Vladislav on this one. The thing most people would
expect from a "live migration" is that it's interruption-free. And what
allow-migrate was first implemented for (iirc), live migration of Xen
domains, does fulfill that expectation. The same is true for live
migrations in libvirt/KVM, and I think anyone would expect essentially
the same thing from checkpoint/restore migrations where they're
available.

So I guess it's reasonable to assume that if one resource migrates,
dependent resources need not be restarted. But since Pacemaker currently
does restart them, you might need to figure out a way to preserve the
existing behaviour for users who rely on it. Not sure if any do, though.

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now
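For reference, Vladislav's scenario reduces to a minimal sketch in crm shell syntax. The resource names and the ordering constraint come from his log above; the Filesystem parameters are invented placeholders, and the mgs primitive is assumed to exist already:

```shell
# Minimal sketch of the configuration under discussion (crm shell
# syntax; Filesystem params are placeholders, not Vladislav's values).
crm configure primitive lustre ocf:heartbeat:Filesystem \
    params device="/dev/mydev" directory="/mnt/lustre" fstype="lustre" \
    meta allow-migrate="true"
crm configure order mgs-after-lustre inf: lustre:start mgs:start

# Migrating the "lower" resource:
crm resource migrate lustre
# ...currently also produces "Restart mgs", even though only an
# ordering (and no colocation) constraint ties the two together.
```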
Re: [Pacemaker] [Openais] Help on mysql-proxy resource
Hi Carlos,

You'll have most luck with crm configuration questions on the Pacemaker
list (CC'd): pacemaker@oss.clusterlabs.org

I don't actually know anything about the mysql-proxy RA, but you might
have a typo.

On 03/30/2012 12:52 PM, Carlos xavier wrote:
> Hi.
>
> I have mysql-proxy running on my system and I want to aggregate it to
> the cluster configuration.
> When it is started by the system I got this as result of "ps auwwwx":
>
> root 29644 0.0 0.0 22844 844 ? S 22:37 0:00
> /usr/sbin/mysql-proxy --pid-file /var/run/mysql-proxy.pid --daemon
> --proxy-lua-script

Note this is --proxy-lua-script (singular)

> /usr/share/doc/packages/mysql-proxy/examples/tutorial-basic.lua
> --proxy-backend-addresses=10.10.10.5:3306 --proxy-address=172.31.0.192:3306
>
> So I created the following configuration at the CRM:
>
> primitive mysql-proxy ocf:heartbeat:mysql-proxy \
>     params binary="/usr/sbin/mysql-proxy" \
>     pidfile="/var/run/mysql-proxy.pid" \
>     proxy_backend_addresses="10.10.10.5:3306" \
>     proxy_address="172.31.0.191:3306" \
>     parameters="--proxy-lua-scripts
> /usr/share/doc/packages/mysql-proxy/examples/tutorial-basic.lua" \

This is --proxy-lua-scripts (plural). I'm guessing maybe that's the
problem.

HTH,
Tim

-- 
Tim Serong
Senior Clustering Engineer
SUSE
tser...@suse.com
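If the option name is indeed the culprit, the fix is just that one word. A sketch of the corrected primitive (untested; everything except the singular --proxy-lua-script is copied from Carlos's configuration):

```shell
# Corrected primitive: the option passed through "parameters" now matches
# what the running daemon actually accepts (--proxy-lua-script, singular).
crm configure primitive mysql-proxy ocf:heartbeat:mysql-proxy \
    params binary="/usr/sbin/mysql-proxy" \
        pidfile="/var/run/mysql-proxy.pid" \
        proxy_backend_addresses="10.10.10.5:3306" \
        proxy_address="172.31.0.191:3306" \
        parameters="--proxy-lua-script /usr/share/doc/packages/mysql-proxy/examples/tutorial-basic.lua"
```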