Re: [Pacemaker] don't want to restart clone resource

2012-03-30 Thread Fanghao Sha
Hi Andrew,
The problem is not resolved, and I have updated the bugzilla entry:
http://bugs.clusterlabs.org/show_bug.cgi?id=5038
Appreciate your reply. :)

2012/3/28 Fanghao Sha 

> Hi Andrew,
> After your patch, I encountered a new problem,
> and I have reported it in bugzilla.
> http://bugs.clusterlabs.org/show_bug.cgi?id=5038
>
>
> 2012/2/23 Fanghao Sha 
>
>> Hi Andrew,
>> Hi Lars,
>> I have reported it in bugzilla.
>> http://bugs.clusterlabs.org/show_bug.cgi?id=5038
>>
>>
>> 2012/2/13 Andrew Beekhof 
>>
>>> On Wed, Feb 8, 2012 at 5:48 PM, Fanghao Sha 
>>> wrote:
>>> > Hi Andrew,
>>> > Is crm_report included in pacemaker-1.0.12-1.el5.centos?
>>> > I couldn't find it.
>>>
>>> /headslap
>>>
>>> I added it to the source but neglected to actually install it.
>>>
hb_report should be available, though.
>>>
>>> >
>>> >
>>> > 2012/2/4 Andrew Beekhof 
>>> >>
>>> >> On Fri, Feb 3, 2012 at 9:35 PM, Fanghao Sha 
>>> wrote:
>>> >> > Sorry, I don't know how to file a bug,
>>> >>
>>> >> See the links at the bottom of every mail on this list?
>>> >>
>>> >> > and I have only the "messages" file.
>>> >>
>>> >> man crm_report
>>> >>
>>> >> >
>>> >> > I have tried to set clone-max=3, and after removing node-1, the clone
>>> >> > resource running on node-2 did not restart.
>>> >> > But when I added another node-3 to the cluster with "hb_addnode", the
>>> >> > clone resource running on node-2 became orphaned and restarted.
>>> >> >
>>> >> > As attached "messages" file,
>>> >> > I couldn't understand this line:
>>> >> > "find_clone: Internally renamed node-app-rsc:2 on node-2 to
>>> >> > node-app-rsc:3
>>> >> > (ORPHAN)".
>>> >> >
>>> >> > 2012/2/2 Andrew Beekhof 
>>> >> >>
>>> >> >> On Thu, Feb 2, 2012 at 4:57 AM, Lars Ellenberg
>>> >> >>  wrote:
>>> >> >> > On Wed, Feb 01, 2012 at 03:43:55PM +0100, Andreas Kurz wrote:
>>> >> >> >> Hello,
>>> >> >> >>
>>> >> >> >> On 02/01/2012 10:39 AM, Fanghao Sha wrote:
>>> >> >> >> > Hi Lars,
>>> >> >> >> >
>>> >> >> >> > Yes, you are right. But how do I prevent the "orphaned" resources
>>> >> >> >> > from being stopped by default, please?
>>> >> >> >>
>>> >> >> >> crm configure property stop-orphan-resources=false
>>> >> >> >
>>> >> >> > Well, sure. But for "normal" orphans,
>>> >> >> > you actually want them to be stopped.
>>> >> >> >
>>> >> >> > No, pacemaker needs some additional smarts to recognize
>>> >> >> > that there actually are no orphans, maybe by first relabeling,
>>> >> >> > and only then checking for instance label > clone-max.
>>> >> >>
>>> >> >> Instance label doesn't come into the equation.
>>> >> >> It might look like it does on the outside, but it's more complicated
>>> >> >> than that.
>>> >> >>
>>> >> >> >
>>> >> >> > Did you file a bugzilla?
>>> >> >> > Has that made progress?
>>> >> >> >
>>> >> >> >
>>> >> >> > --
>>> >> >> > : Lars Ellenberg
>>> >> >> > : LINBIT | Your Way to High Availability
>>> >> >> > : DRBD/HA support and consulting http://www.linbit.com
>>> >> >> >

Re: [Pacemaker] manually failing back resources when set sticky

2012-03-30 Thread Phillip Frost

On Mar 30, 2012, at 2:35 PM, Florian Haas wrote:

> On Fri, Mar 30, 2012 at 8:26 PM, Brian J. Murrell  
> wrote:
>> 
>> The question is, what is the proper administrative command(s) to move
>> the resource back to its "primary" after I have manually determined
>> that that node is OK after coming back from a failure?
> 
> crm configure rsc_defaults resource-stickiness=0
> 
> ... and then when resources have moved back, set it to 1000 again.
> It's really that simple. :)

What if some resources are more sticky than others, and don't simply inherit 
the default?
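
For those, one would presumably have to clear each resource's own
stickiness as well. An untested sketch, assuming crmsh's "resource meta"
subcommand and a resource FOO whose explicit stickiness was, say, 5000:

crm resource meta FOO set resource-stickiness 0
# ... wait for FOO to move back, then restore its previous value:
crm resource meta FOO set resource-stickiness 5000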



Re: [Pacemaker] manually failing back resources when set sticky

2012-03-30 Thread Brian J. Murrell
On 12-03-30 02:35 PM, Florian Haas wrote:
> 
> crm configure rsc_defaults resource-stickiness=0
> 
> ... and then when resources have moved back, set it to 1000 again.
> It's really that simple. :)

That sounds racy.  I am changing a parameter which has the potential to
affect the stickiness of all resources for a (hopefully brief) period of
time.  If there is some other fail{ure,over} transaction in play while I
do this I might adversely affect my policy of no-automatic-failback,
mightn't I?

Since this suggestion is also non-atomic, meaning I set a constraint,
wait for the result of the change in allocation due to that setting and
then "undo" it when the allocation change has completed, wouldn't I be
better off using "crm resource migrate FOO", monitoring for the
reallocation and then removing the "cli-standby-FOO" constraint when it
has completed?  Wouldn't this effect your suggestion in the same
non-atomic manner but be sure to affect only the one resource I am
trying to fail back?
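
Concretely, the sequence I have in mind is something like this (sketch):

crm resource migrate FOO            # adds the cli-standby-FOO constraint
# ... monitor until FOO has moved back to its primary ...
crm resource unmigrate FOO          # removes cli-standby-FOO again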

Cheers,
b.





Re: [Pacemaker] manually failing back resources when set sticky

2012-03-30 Thread Florian Haas
On Fri, Mar 30, 2012 at 8:26 PM, Brian J. Murrell  wrote:
> In my cluster configuration, each resource can be run on one of two nodes
> and I designate a "primary" and a "secondary" using location constraints
> such as:
>
> location FOO-primary FOO 20: bar1
> location FOO-secondary FOO 10: bar2
>
> And I also set a default stickiness to prevent auto-fail-back (i.e. to
> prevent flapping):
>
> rsc_defaults $id="rsc-options" resource-stickiness="1000"
>
> This all works as I expect.  Resources run where I expect them to while
> everything is operating normally and when a node fails the resource
> migrates to the secondary and stays there even when the primary node
> comes back.
>
> The question is, what is the proper administrative command(s) to move
> the resource back to its "primary" after I have manually determined
> that that node is OK after coming back from a failure?
>
> I figure I could just create a new resource constraint, wait for the
> migration and then remove it, but I just wonder if there is a more
> atomic "move back to your preferred node" command I can issue.

crm configure rsc_defaults resource-stickiness=0

... and then when resources have moved back, set it to 1000 again.
It's really that simple. :)

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now



[Pacemaker] manually failing back resources when set sticky

2012-03-30 Thread Brian J. Murrell
In my cluster configuration, each resource can be run on one of two nodes
and I designate a "primary" and a "secondary" using location constraints
such as:

location FOO-primary FOO 20: bar1
location FOO-secondary FOO 10: bar2

And I also set a default stickiness to prevent auto-fail-back (i.e. to
prevent flapping):

rsc_defaults $id="rsc-options" resource-stickiness="1000"

This all works as I expect.  Resources run where I expect them to while
everything is operating normally and when a node fails the resource
migrates to the secondary and stays there even when the primary node
comes back.

The question is, what is the proper administrative command(s) to move
the resource back to its "primary" after I have manually determined
that that node is OK after coming back from a failure?

I figure I could just create a new resource constraint, wait for the
migration and then remove it, but I just wonder if there is a more
atomic "move back to your preferred node" command I can issue.

Cheers,
b.





Re: [Pacemaker] Nodes not rejoining cluster

2012-03-30 Thread Florian Haas
On Fri, Mar 30, 2012 at 7:45 PM, Gregg Stock  wrote:
> The full shutdown and restart fixed it.

Hrm. So it's transient after all. Andrew, think you nailed that one
with the commit I referred to upthread, or do you call heisenbug?

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now



Re: [Pacemaker] Nodes not rejoining cluster

2012-03-30 Thread Gregg Stock

The full shutdown and restart fixed it.

Thanks for your help.

On 3/30/2012 9:33 AM, Florian Haas wrote:

> On Fri, Mar 30, 2012 at 6:09 PM, Gregg Stock  wrote:
>
>> That looks good. They were all the same and had the correct IP addresses.
>
> So you've got both healthy rings, and all 5 nodes have 5 members in
> the membership list?
>
> Then this would make it a Pacemaker problem. IIUC the code causing
> Pacemaker to discard the update from a node that is "not in our
> membership" has actually been removed from 1.1.7[1] so an upgrade may
> not be a bad idea, but you'll probably have to wait for a few more
> days until packages become available.
>
> Still, out of curiosity, and since you're saying this is a test
> cluster: what happens if you shut down corosync and Pacemaker on *all*
> the nodes, and bring it back up?
>
> We've had a few people report these "not in our membership" issues on
> the list before, and they seem to appear in a very sporadic and
> transient fashion, so the root cause (which may well be totally
> trivial) hasn't really been found out -- as far as I can tell, at
> least. Hence, my question of whether the issue persists after a full
> cluster shutdown.
>
> Florian
>
> [1]
> https://github.com/ClusterLabs/pacemaker/commit/03f6105592281901cc10550b8ad19af4beb5f72f
> -- note Andrew will rightfully flame me to a crisp if I've
> misinterpreted that commit, so caveat lector. :)





Re: [Pacemaker] Nodes not rejoining cluster

2012-03-30 Thread Florian Haas
On Fri, Mar 30, 2012 at 6:09 PM, Gregg Stock  wrote:
> That looks good. They were all the same and had the correct ip addresses.

So you've got both healthy rings, and all 5 nodes have 5 members in
the membership list?

Then this would make it a Pacemaker problem. IIUC the code causing
Pacemaker to discard the update from a node that is "not in our
membership" has actually been removed from 1.1.7[1] so an upgrade may
not be a bad idea, but you'll probably have to wait for a few more
days until packages become available.

Still, out of curiosity, and since you're saying this is a test
cluster: what happens if you shut down corosync and Pacemaker on *all*
the nodes, and bring it back up?

We've had a few people report these "not in our membership" issues on
the list before, and they seem to appear in a very sporadic and
transient fashion, so the root cause (which may well be totally
trivial) hasn't really been found out -- as far as I can tell, at
least. Hence, my question of whether the issue persists after a full
cluster shutdown.

Florian

[1] 
https://github.com/ClusterLabs/pacemaker/commit/03f6105592281901cc10550b8ad19af4beb5f72f
-- note Andrew will rightfully flame me to a crisp if I've
misinterpreted that commit, so caveat lector. :)

-- 
Need help with High Availability?
http://www.hastexo.com/now



Re: [Pacemaker] Nodes not rejoining cluster

2012-03-30 Thread Gregg Stock

That looks good. They were all the same and had the correct IP addresses.

On 3/30/2012 9:01 AM, Florian Haas wrote:

> On Fri, Mar 30, 2012 at 5:38 PM, Gregg Stock  wrote:
>
>> I took the last 200 lines of each.
>
> Can you check the health of the Corosync membership, as per this URL?
>
> http://www.hastexo.com/resources/hints-and-kinks/checking-corosync-cluster-membership
>
> Do _all_ nodes agree on the health of the rings, and on the cluster member list?
>
> Florian





Re: [Pacemaker] Nodes not rejoining cluster

2012-03-30 Thread Florian Haas
On Fri, Mar 30, 2012 at 5:38 PM, Gregg Stock  wrote:
> I took the last 200 lines of each.

Can you check the health of the Corosync membership, as per this URL?

http://www.hastexo.com/resources/hints-and-kinks/checking-corosync-cluster-membership

Do _all_ nodes agree on the health of the rings, and on the cluster member list?
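
For corosync 1.x, that check is roughly the following (run on every
node; the output should match everywhere):

corosync-cfgtool -s                                  # ring status
corosync-objctl runtime.totem.pg.mrp.srp.members     # membership list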

Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now



Re: [Pacemaker] configuring ocf:heartbeat:conntrackd

2012-03-30 Thread Dejan Muhamedagic
Hi,

On Thu, Mar 22, 2012 at 12:32:44PM +0100, Kevin COUSIN wrote:
> Hello,
> 
> I am trying to use the ocf:heartbeat:conntrackd resource on a CentOS 6
> two-node cluster. I don't understand how the conntrackd resource works. I
> configured it as explained in the documentation, and started a conntrackd
> daemon with an LSB script. When I try a takeover, the resource kills the
> daemon on the nodes and doesn't restart it, and the resource fails.
> 
> Here is my configuration :
> 
> ms MS_CONNTRACKD SUIVI_CONNEXIONS \
> meta notify="true" interleave="true"
> primitive SUIVI_CONNEXIONS ocf:heartbeat:conntrackd \
> params conntrackd="/usr/sbin/conntrackd" 
> config="/etc/conntrackd/conntrackd.conf" \
> op monitor interval="20" role="Slave" timeout="20" \
> op monitor interval="10" role="Master" timeout="20"
> 

Did you check the logs? The answer should be there. If not, then
the conntrackd RA probably needs fixing.
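
For example (the log location may vary by distribution):

grep -iE 'conntrackd|lrmd' /var/log/messages | tail -50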

Thanks,

Dejan

> Thanks for help 
> 
> 
> 
>Kevin C.
> 
> 



Re: [Pacemaker] Using shadow configurations noninteractively

2012-03-30 Thread Dejan Muhamedagic
On Wed, Mar 21, 2012 at 12:21:55PM -0400, Phillip Frost wrote:
> On Mar 19, 2012, at 4:30 PM, Florian Haas wrote:
> > On Mon, Mar 19, 2012 at 9:00 PM, Phil Frost  
> > wrote:
> >> On Mar 19, 2012, at 15:22 , Florian Haas wrote:
> >>> On Mon, Mar 19, 2012 at 8:00 PM, Phil Frost  
> >>> wrote:
>  
>  Normally I'd expect some command-line option, but I can't find any. It 
>  does look like it sets the environment variable "CIB_shadow". Is that 
>  all there is to it? Is it safe to rely on that behavior?
> >>> 
> >>> I've never tried this specific use case, so bear with me while I go
> >>> out on a limb, but the crm shell is fully scriptable. Thus you
> >>> *should* be able to generate a full-blown crm script, with "cib foo"
> >>> commands and whathaveyou, in a temporary file, and then just do "crm <
> >>> /path/to/temp/file". Does that work for you?
> >> 
> >> I don't think so, because the crm shell, unlike cibadmin, has no 
> >> idempotent method of configuration I've found.
> > 
> > Huh? What's wrong with "crm configure load replace <file>"?
> > 
> > Anyhow, I think you haven't really stated what you are trying to
> > achieve, in detail. So: what is it that you want to do exactly?
> 
> Sorry, I hadn't found that command yet. "crm configure load update <file>"
> seems about what I need. So, when I tell puppet "there's this Xen domain called
> foo, and it can run on xen01 or xen02", then it creates a file with a 
> primitive and two location constraints. An example of one such file:
> 
> 8<--
> primitive nagios.macprofessionals.lan ocf:heartbeat:Xen \
> params \
> xmfile="/etc/xen/nagios.macprofessionals.lan.cfg" \
> name="nagios.macprofessionals.lan" \
> op start interval="0" timeout="60" \
> op stop interval="0" timeout="40" \
> op migrate_from interval="0" timeout="120" \
> op migrate_to interval="0" timeout="120" \
> op monitor interval="10" timeout="30"
> 
> location nagios.macprofessionals.lan-on-xenhost02.macprofessionals.lan 
> nagios.macprofessionals.lan 100: xenhost02
> 8<--
> 
> There are several such files created in /etc/xen/crm, one for each Xen domain 
> puppet knows about. Then, I load them with this script:
> 
> 8<--
> #!/bin/bash
> 
> crmdir='/etc/xen/crm'
> 
> function crm_input() {
> echo "cib delete puppet"
> echo "cib new puppet"
> 
> for f in "$crmdir"/*.crm; do
> echo configure load update "$f"
> done
> }
> 
> crm_input | crm
> 8<--
> 
> The end result here is to have, at any given time, a shadow configuration 
> which represents what Puppet, based on what it already knows about the Xen 
> domains, thinks the pacemaker configuration should be. If that differs from 
> the live configuration, an admin receives an alert, he runs ptest and reviews 
> it to make sure it isn't going to do anything horrible, and commits it. The 
> higher level goal is to not be manually poking at the pacemaker 
> configuration, because it's tedious, and people make more errors than 
> well-written tools with this sort of task.
> 
> It seems to be working fairly well. Does this seem like a reasonable approach?

Yes.

I guess that the answer to your question is "crm cib commit <name>".
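
A minimal sketch of that final step, assuming the shadow CIB is named
"puppet" as in your script:

crm cib commit puppet    # push the reviewed shadow CIB to the live cluster
crm cib use              # no argument: switch the shell back to the live CIB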

Thanks,

Dejan






[Pacemaker] Patrik Rapposch is out of the office

2012-03-30 Thread Patrik . Rapposch

I will be out of the office from 30.03.2012 and will return on
10.04.2012.

Please note that I am not available until 10.04.2012. In urgent cases,
please contact Gernot Pichler (gernot.pich...@knapp.com) or Manuel Thaller
(manuel.thal...@knapp.com).




Re: [Pacemaker] Pacemaker + Oracle

2012-03-30 Thread emmanuel segura
Hello Fernando

I think it would be useful for others if you explained what the problem was.

Thanks

On 30 March 2012 at 12:47, Ruwan Fernando wrote:

> I solved the issue by referring to the log file. Thanks for the help.
>
>
> On Thu, Mar 29, 2012 at 5:44 PM, emmanuel segura wrote:
>
>> cat /etc/oratab
>>
>> And maybe you can post your log :-)
>>
>> On 29 March 2012 at 13:53, Ruwan Fernando wrote:
>>
>>> Hi,
>>> I'm working with a Pacemaker active/passive cluster and need to add Oracle
>>> as a resource. My resource command is
>>> crm configureprimitive Oracle ocf:heartbeat:oracle params sid=OracleDB
>>> op monitor inetrval=120s
>>> but it did not work for me.
>>>
>>> Can someone help out on this matter?
>>>
>>> Regards,
>>> Ruwan
>>>
>>
>>
>> --
>> this is my life and I live it as long as God wills
>>
>


-- 
this is my life and I live it as long as God wills


Re: [Pacemaker] Pacemaker + Oracle

2012-03-30 Thread Ruwan Fernando
I solved the issue by referring to the log file. Thanks for the help.
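
For the record, the command I originally posted contains two likely
culprits: "configureprimitive" is missing a space, and "inetrval" should
be "interval"; either would make crm reject it. A corrected sketch,
assuming the stock ocf:heartbeat:oracle agent:

crm configure primitive Oracle ocf:heartbeat:oracle \
    params sid="OracleDB" \
    op monitor interval="120s"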

On Thu, Mar 29, 2012 at 5:44 PM, emmanuel segura  wrote:

> cat /etc/oratab
>
> And maybe you can post your log :-)
>
> On 29 March 2012 at 13:53, Ruwan Fernando wrote:
>
>> Hi,
>> I'm working with a Pacemaker active/passive cluster and need to add Oracle
>> as a resource. My resource command is
>> crm configureprimitive Oracle ocf:heartbeat:oracle params sid=OracleDB op
>> monitor inetrval=120s
>> but it did not work for me.
>>
>> Can someone help out on this matter?
>>
>> Regards,
>> Ruwan
>>
>
>
> --
> this is my life and I live it as long as God wills
>


Re: [Pacemaker] Nodes will not promote DRBD resources to master on failover

2012-03-30 Thread Andreas Kurz
On 03/28/2012 04:56 PM, Andrew Martin wrote:
> Hi Andreas,
> 
> I disabled the DRBD init script and then restarted the slave node
> (node2). After it came back up, DRBD did not start:
> Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4): pending
> Online: [ node2 node1 ]
> 
>  Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]
>  Masters: [ node1 ]
>  Stopped: [ p_drbd_vmstore:1 ]
>  Master/Slave Set: ms_drbd_mount1 [p_drbd_tools]
>  Masters: [ node1 ]
>  Stopped: [ p_drbd_mount1:1 ]
>  Master/Slave Set: ms_drbd_mount2 [p_drbdmount2]
>  Masters: [ node1 ]
>  Stopped: [ p_drbd_mount2:1 ]
> ...
> 
> root@node2:~# service drbd status
> drbd not loaded

Yes, expected unless Pacemaker starts DRBD

> 
> Is there something else I need to change in the CIB to ensure that DRBD
> is started? All of my DRBD devices are configured like this:
> primitive p_drbd_mount2 ocf:linbit:drbd \
> params drbd_resource="mount2" \
> op monitor interval="15" role="Master" \
> op monitor interval="30" role="Slave"
> ms ms_drbd_mount2 p_drbd_mount2 \
> meta master-max="1" master-node-max="1" clone-max="2"
> clone-node-max="1" notify="true"

That should be enough ... unable to say more without seeing the complete
configuration ... too many fragments of information ;-)

Please provide (e.g. pastebin) your complete cib (cibadmin -Q) when
cluster is in that state ... or even better create a crm_report archive
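
For example, something like this, with the start time set to shortly
before the failed failover:

crm_report -f "2012-03-28 09:00" /tmp/drbd-failover-report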

> 
> Here is the output from the syslog (grep -i drbd /var/log/syslog):
> Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing
> key=12:315:7:24416169-73ba-469b-a2e3-56a22b437cbc
> op=p_drbd_vmstore:1_monitor_0 )
> Mar 28 09:24:47 node2 lrmd: [3210]: info: rsc:p_drbd_vmstore:1 probe[2]
> (pid 3455)
> Mar 28 09:24:47 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing
> key=13:315:7:24416169-73ba-469b-a2e3-56a22b437cbc
> op=p_drbd_mount1:1_monitor_0 )
> Mar 28 09:24:48 node2 lrmd: [3210]: info: rsc:p_drbd_mount1:1 probe[3]
> (pid 3456)
> Mar 28 09:24:48 node2 crmd: [3213]: info: do_lrm_rsc_op: Performing
> key=14:315:7:24416169-73ba-469b-a2e3-56a22b437cbc
> op=p_drbd_mount2:1_monitor_0 )
> Mar 28 09:24:48 node2 lrmd: [3210]: info: rsc:p_drbd_mount2:1 probe[4]
> (pid 3457)
> Mar 28 09:24:48 node2 Filesystem[3458]: [3517]: WARNING: Couldn't find
> device [/dev/drbd0]. Expected /dev/??? to exist
> Mar 28 09:24:48 node2 crm_attribute: [3563]: info: Invoked:
> crm_attribute -N node2 -n master-p_drbd_mount2:1 -l reboot -D
> Mar 28 09:24:48 node2 crm_attribute: [3557]: info: Invoked:
> crm_attribute -N node2 -n master-p_drbd_vmstore:1 -l reboot -D
> Mar 28 09:24:48 node2 crm_attribute: [3562]: info: Invoked:
> crm_attribute -N node2 -n master-p_drbd_mount1:1 -l reboot -D
> Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[4] on
> p_drbd_mount2:1 for client 3213: pid 3457 exited with return code 7
> Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[2] on
> p_drbd_vmstore:1 for client 3213: pid 3455 exited with return code 7
> Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM
> operation p_drbd_mount2:1_monitor_0 (call=4, rc=7, cib-update=10,
> confirmed=true) not running
> Mar 28 09:24:48 node2 lrmd: [3210]: info: operation monitor[3] on
> p_drbd_mount1:1 for client 3213: pid 3456 exited with return code 7
> Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM
> operation p_drbd_vmstore:1_monitor_0 (call=2, rc=7, cib-update=11,
> confirmed=true) not running
> Mar 28 09:24:48 node2 crmd: [3213]: info: process_lrm_event: LRM
> operation p_drbd_mount1:1_monitor_0 (call=3, rc=7, cib-update=12,
> confirmed=true) not running

No errors, just probing ... so for some reason Pacemaker chooses not to
start it ... use crm_simulate to find out why ... or provide the
information requested above.
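
A typical starting point (sketch):

crm_simulate -sL    # show allocation scores against the live CIB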

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> Thanks,
> 
> Andrew
> 
> 
> *From: *"Andreas Kurz" 
> *To: *pacemaker@oss.clusterlabs.org
> *Sent: *Wednesday, March 28, 2012 9:03:06 AM
> *Subject: *Re: [Pacemaker] Nodes will not promote DRBD resources to
> master on failover
> 
> On 03/28/2012 03:47 PM, Andrew Martin wrote:
>> Hi Andreas,
>>
>>> hmm ... what is that fence-peer script doing? If you want to use
>>> resource-level fencing with the help of dopd, activate the
>>> drbd-peer-outdater script in the line above ... and double check if the
>>> path is correct
>> fence-peer is just a wrapper for drbd-peer-outdater that does some
>> additional logging. In my testing dopd has been working well.
> 
> I see
> 
>>
 I am thinking of making the following changes to the CIB (as per the
 official DRBD
 guide
>>
> http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html) in
 order to add the DRBD lsb service and require that it start before the
 ocf:linbit:drbd resources. Does this look correct?
>>>
>>> Wher

Re: [Pacemaker] Pacemaker 1.1.7 now available

2012-03-30 Thread Florian Haas
On Fri, Mar 30, 2012 at 10:37 AM, Andrew Beekhof  wrote:
> I blogged about it, which automatically got sent to twitter, and I
> updated the IRC channel topic, but alas I forgot to mention it here
> :-)
>
> So in case you missed it, 1.1.7 is finally out.
> Special mention is due to David and Yan for the nifty features they've
> been writing lately.
> Thanks guys!

Quick question: the blog post doesn't mention libqb specifically, the
changelog says "core: *Support* libqb for logging" (as opposed to
"require") but the RPM spec file introduces a hard BuildRequires on
libqb-devel. Is this a hard dependency? IOW does libqb have to be
packaged on distros where it's not currently available, or can people
build without libqb support and still be able to use 1.1.7?

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now



[Pacemaker] Pacemaker 1.1.7 now available

2012-03-30 Thread Andrew Beekhof
I blogged about it, which automatically got sent to twitter, and I
updated the IRC channel topic, but alas I forgot to mention it here
:-)

So in case you missed it, 1.1.7 is finally out.
Special mention is due to David and Yan for the nifty features they've
been writing lately.
Thanks guys!

The blog entry 
(http://theclusterguy.clusterlabs.org/post/20110630492/pacemaker-1-1-7-now-available)
has more details while remaining readable.
I'd encourage you to check it out there :-)

-- Andrew



Re: [Pacemaker] OCF_RESKEY_CRM_meta_{ordered,notify,interleave}

2012-03-30 Thread Florian Haas
On Fri, Mar 30, 2012 at 1:12 AM, Andrew Beekhof  wrote:
> Because it was felt that RAs shouldn't need to know.
> Those options change pacemaker's behaviour, not the RAs.
>
> But subsequently, in lf#2391, you convinced us to add notify since it
> allowed the drbd agent to error out if they were not turned on.

Yes, and for ordered the motivation is exactly the same. Let me give a
bit of background info.

I'm currently working on an RA for GlusterFS volumes (the server-side
stuff, everything client side is already covered in
ocf:heartbeat:Filesystem). GlusterFS volumes are composed of "bricks",
and for every brick there's a separate process to be managed on each
cluster node. When these brick processes fail, GlusterFS has no
built-in way to recover, and that's where Pacemaker can be helpful.

Obviously, you would run that RA as a clone, on however many nodes
constitute your GlusterFS storage cluster.

Now, while brick daemons can be _monitored_ individually, they can
only be _started_ as part of the volume, with the "gluster volume
start" command. And if we "start" a volume simultaneously on multiple
nodes, GlusterFS just produces an error on all but one of them, and
that error is also a generic one and not discernible from other errors
by exit code (yes, you may rant).

So, whenever we need to start >1 clone instance, we run into this problem:

1. Check whether brick is already running.
2. No, it's not. Start volume (this leaves other bricks untouched, but
fires up the brick daemons expected to run locally).
3. Grumble. A different node just did the same thing.
4. All but one fail on start.

Yes, all this isn't necessarily wonderful design (the start volume
command could block until volume operations have completed on other
servers, or it could error out with a "try again" error, or it could
sleep randomly before retrying, or something else), but as it happens
configuring the clone as ordered makes all of this evaporate.

And it simply would be nice to be able to check whether clone ordering
is enabled, during validate.
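
In other words, if Pacemaker exported the attribute (which is exactly
what's being asked for here), the RA's validate action could contain a
check like this hypothetical sketch (assuming ocf-shellfuncs is sourced):

volume_validate() {
    # refuse to run unless the clone is configured as ordered
    if ! ocf_is_true "${OCF_RESKEY_CRM_meta_ordered}"; then
        ocf_log err "This RA must be run as an ordered clone (meta ordered=true)"
        return $OCF_ERR_CONFIGURED
    fi
    return $OCF_SUCCESS
}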

> I'd need more information.  The RA shouldn't need to care I would have
> thought. The ordering happens in the PE/crmd, the RA should just do
> what its told.

Quite frankly, I don't quite get this segregation of "meta attributes
we expect to be relevant to the RA" and "meta attributes the RA
shouldn't care about." Can't we just have a rule that _all_ meta
attributes, like parameters, are just always available in the RA
environment with the OCF_RESKEY_CRM_meta_ prefix?

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now



Re: [Pacemaker] Migration of "lower" resource causes dependent resources to restart

2012-03-30 Thread Florian Haas
On Thu, Mar 29, 2012 at 8:35 AM, Andrew Beekhof  wrote:
> On Thu, Mar 29, 2012 at 5:28 PM, Vladislav Bogdanov
>  wrote:
>> Hi Andrew, all,
>>
>> Pacemaker restarts resources when resource they depend on (ordering
>> only, no colocation) is migrated.
>>
>> I mean that when I do crm resource migrate lustre, I get
>>
>> LogActions: Migrate lustre#011(Started lustre03-left -> lustre04-left)
>> LogActions: Restart mgs#011(Started lustre01-left)
>>
>> I only have one ordering constraint for these two resources:
>>
>> order mgs-after-lustre inf: lustre:start mgs:start
>>
>> This reminds me of how reload behaved in the past (dependent resources
>> restarted when a "lower" resource was reloaded).
>>
>> Shouldn't this be changed? Migration usually means that service is not
>> interrupted...
>
> Is that strictly true?  Always?

No. Few things are always true. :) However, see below.

> My understanding was although A thinks the migration happens
> instantaneously, it is in fact more likely to be pause+migrate+resume,
> and anyone trying to talk to A during that window is going to be
> disappointed.

I tend to be with Vladislav on this one. The thing that most people
would expect from a "live migration" is that it's interruption free.
And what allow-migrate was first implemented for (iirc), live
migrations for Xen, does fulfill that expectation. Same thing is true
for live migrations in libvirt/KVM, and I think anyone would expect
essentially the same thing from checkpoint/restore migrations where
they're available.

So I guess it's reasonable to assume that if one resource migrates,
dependent resources need not be restarted. But since Pacemaker now
does restart them, you might need to figure out a way to preserve the
existing functionality for users who rely on that. Not sure if any do,
though.

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now



Re: [Pacemaker] [Openais] Help on mysql-proxy resource

2012-03-30 Thread Tim Serong
Hi Carlos,

You'll have most luck with crm configuration questions on the Pacemaker
list (CC'd):

  pacemaker@oss.clusterlabs.org

I don't actually know anything about the mysql-proxy RA, but you might
have a typo.

On 03/30/2012 12:52 PM, Carlos xavier wrote:
> Hi.
> 
> I have mysql-proxy running on my system and I want to add it to the
> cluster configuration.
> When it is started by the system, I get this as the result of ps auwwwx:
> 
> root 29644  0.0  0.0  22844   844 ?S22:37   0:00
> /usr/sbin/mysql-proxy --pid-file /var/run/mysql-proxy.pid --daemon
> --proxy-lua-script

Note this is --proxy-lua-script (singular)

> /usr/share/doc/packages/mysql-proxy/examples/tutorial-basic.lua
> --proxy-backend-addresses=10.10.10.5:3306 --proxy-address=172.31.0.192:3306
> 
> So I created the following configuration at the CRM:
> 
> primitive mysql-proxy ocf:heartbeat:mysql-proxy \
> params binary="/usr/sbin/mysql-proxy"
> pidfile="/var/run/mysql-proxy.pid" proxy_backend_addresses="10.10.10.5:3306"
> proxy_address="172.31.0.191:3306" parameters="--proxy-lua-scripts
> /usr/share/doc/packages/mysql-proxy/examples/tutorial-basic.lua" \

This is --proxy-lua-scripts (plural).  I'm guessing maybe that's the
problem.
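
If so, the fix would just be the singular flag, e.g. (sketch, otherwise
unchanged from your config):

primitive mysql-proxy ocf:heartbeat:mysql-proxy \
    params binary="/usr/sbin/mysql-proxy" \
        pidfile="/var/run/mysql-proxy.pid" \
        proxy_backend_addresses="10.10.10.5:3306" \
        proxy_address="172.31.0.191:3306" \
        parameters="--proxy-lua-script /usr/share/doc/packages/mysql-proxy/examples/tutorial-basic.lua"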

HTH,

Tim
-- 
Tim Serong
Senior Clustering Engineer
SUSE
tser...@suse.com
