from:"Andreas Mock"

Re: [Pacemaker] Some questions on the currenct state

2015-01-12 Thread Andreas Mock

Hi David,

thank you for your answers.

Best regards
Andreas Mock


-Original Message-
From: David Vossel [mailto:dvos...@redhat.com]
Sent: Monday, 12 January 2015 18:28
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Some questions on the currenct state



- Original Message -
> Hi Trevor,
> 
> thank you for answering so fast.
> 
> 2) Besides the fact that rpm packages are available do you know how to 
> make rpm packages from git repository?

./autogen.sh && ./configure && make rpm

That will generate rpms from the source tree.

> 4) Is RHEL 7.x using corosync 2.x and pacemaker plugin for cluster 
> membership?

No. RHEL 7.x uses corosync 2.x and corosync's new votequorum API.
The plugins are a thing of the past on RHEL 7.

> Best regards
> Andreas Mock
> 
> 
> > -Original Message-
> > From: Trevor Hemsley [mailto:thems...@voiceflex.com]
> > Sent: Monday, 12 January 2015 16:42
> > To: The Pacemaker cluster resource manager
> > Subject: Re: [Pacemaker] Some questions on the currenct state
> > 
> > On 12/01/15 15:09, Andreas Mock wrote:
> > > Hi all,
> > >
> > > almost always when I'm forced to do some major upgrades to our
> > > core machines in terms of hardware and/or software (OS) I'm forced
> > > to have a look at the current state of pacemaker based HA. Things
> > > are going on and things change. Projects converge and diverge,
> > > tool(s)/chains come and go and distributions' marketing strategies
> > > change. Therefore I want to ask the following questions in the hope
> > > that list members deeply involved can answer easily.
> > >
> > > 1) Are there pacemaker packages for RHEL 6.6 and clones?
> > > If so, where?
> > 
> > In the CentOS (etc) base/updates repos. For RHEL they're in the HA 
> > channel.
> > 
> > >
> > > 2) How can I create a pacemaker package 1.1.12 on my own from the 
> > > git sources?
> > It's already in base/updates.
> > 
> > >
> > > 3) How can I get the current versions of pcs and/or crmsh?
> > > Is pcs competitive to crmsh meanwhile?
> > pcs is in el6.6 and now includes pcsd. You can get crmsh from an 
> > opensuse build repo for el6.
> > >
> > > 4) Is the pacemaker HA solution of RHEL 7.x still bound to use of 
> > > cman?
> > No
> > >
> > > 5) Where can I find a current workable version of the agents for
> > > RHEL 6.6 (and clones) and RHEL 7.x?
> > Probably you want the resource-agents package.
> > 
> > T
> > 
> > ___
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org 
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > 
> > Project Home: http://www.clusterlabs.org Getting started: 
> > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> 
> 

Re: [Pacemaker] Some questions on the currenct state

2015-01-12 Thread Andreas Mock

Hi Trevor,

thank you for answering so fast.

2) Besides the fact that rpm packages are available, do
you know how to make rpm packages from the git repository?

4) Is RHEL 7.x using corosync 2.x and the pacemaker plugin
for cluster membership?

Best regards
Andreas Mock


> -Original Message-
> From: Trevor Hemsley [mailto:thems...@voiceflex.com]
> Sent: Monday, 12 January 2015 16:42
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Some questions on the currenct state
> 
> On 12/01/15 15:09, Andreas Mock wrote:
> > Hi all,
> >
> > almost always when I'm forced to do some major upgrades
> > to our core machines in terms of hardware and/or software (OS)
> > I'm forced to have a look at the current state of pacemaker
> > based HA. Things are going on and things change. Projects
> > converge and diverge, tool(s)/chains come and go and
> > distributions' marketing strategies change. Therefore I want
> > to ask the following questions in the hope that list members
> > deeply involved can answer easily.
> >
> > 1) Are there pacemaker packages for RHEL 6.6 and clones?
> > If so, where?
> 
> In the CentOS (etc) base/updates repos. For RHEL they're in the HA
> channel.
> 
> >
> > 2) How can I create a pacemaker package 1.1.12 on my own from
> > the git sources?
> It's already in base/updates.
> 
> >
> > 3) How can I get the current versions of pcs and/or crmsh?
> > Is pcs competitive to crmsh meanwhile?
> pcs is in el6.6 and now includes pcsd. You can get crmsh from an
> opensuse build repo for el6.
> >
> > 4) Is the pacemaker HA solution of RHEL 7.x still bound to use
> > of cman?
> No
> >
> > 5) Where can I find a current workable version of the agents
> > for RHEL 6.6 (and clones) and RHEL 7.x?
> Probably you want the resource-agents package.
> 
> T
> 

[Pacemaker] Some questions on the currenct state

2015-01-12 Thread Andreas Mock

Hi all,

almost always when I'm forced to do some major upgrades
to our core machines in terms of hardware and/or software (OS)
I'm forced to have a look at the current state of pacemaker
based HA. Things are going on and things change. Projects
converge and diverge, tool(s)/chains come and go and
distributions' marketing strategies change. Therefore I want
to ask the following questions in the hope that list members
deeply involved can answer easily.

1) Are there pacemaker packages for RHEL 6.6 and clones?
If so, where?

2) How can I create a pacemaker package 1.1.12 on my own from
the git sources?

3) How can I get the current versions of pcs and/or crmsh?
Is pcs competitive with crmsh by now?

4) Is the pacemaker HA solution of RHEL 7.x still bound to the use
of cman?

5) Where can I find a current workable version of the agents
for RHEL 6.6 (and clones) and RHEL 7.x?

It would be really nice if someone could give answers or
helpful pointers for answering the questions on my own.

Thank you all in advance.

Best regards
Andreas Mock



Re: [Pacemaker] Enabling pacemaker debug logging while running

2014-03-24 Thread Andreas Mock

Hi Andrew,

thank you for your answer. I found that blog entry before.

I'm pretty sure I'm too stupid to get my information out
of that blog entry.

You write there:
"[...] If the level of detail in the cluster log file is still insufficient,
or you simply wish to go blind, you can turn on debugging in Corosync/CMAN,
or set PCMK_debug in /etc/sysconfig/pacemaker.[...]".

I did enable the debug option in cman as I described in my
initial post. But it seemed that this option change was only
propagated to corosync but not to pacemaker (and resource
agents). Does this and the reference to PCMK_debug mean that
I can't enable debugging in pacemaker without restart?

Or is the "only" option the blackbox feature?

Best regards
Andreas Mock

-Original Message-
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Monday, 24 March 2014 00:36
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Enabling pacemaker debug logging while running

On 20 Mar 2014, at 11:24 pm, Andreas Mock  wrote:

> Hi all,
> 
> today I faced a problem which I couldn't solve by reading
> several man pages and other hints found on the web.
> 
> I have a clone of RHEL 6.5, a cman based cluster and
> pacemaker 1.1.10+. I was able to change the value
> debug="on" in cluster.conf as described in the man page.
> I was able to propagate this change with 'cman_tool -r -S version'.
> The result was that I could see debug messages from
> the corosync layer, but not from pacemaker and agents.
> 
> What do I have to do to enable debug logging of pacemaker at
> runtime? (And how can I switch it off afterwards?)

http://blog.clusterlabs.org/blog/2013/pacemaker-logging/

> 
> Best regards
> Andreas Mock
> 
> 
> 

[Pacemaker] Enabling pacemaker debug logging while running

2014-03-20 Thread Andreas Mock

Hi all,

today I faced a problem which I couldn't solve by reading
several man pages and other hints found on the web.

I have a clone of RHEL 6.5, a cman based cluster and
pacemaker 1.1.10+. I was able to change the value
debug="on" in cluster.conf as described in the man page.
I was able to propagate this change with 'cman_tool -r -S version'.
The result was that I could see debug messages from
the corosync layer, but not from pacemaker and agents.

What do I have to do to enable debug logging of pacemaker at
runtime? (And how can I switch it off afterwards?)

Best regards
Andreas Mock



[Pacemaker] Stoping clone on one node

2013-10-17 Thread Andreas Mock

Hi all,

probably a totally stupid question:

But how can I stop a clone resource on one
node? Is there a way with crm?

The only thing which comes to my mind is
creating a -inf location constraint temporarily.

Best regards
Andreas Mock



Re: [Pacemaker] Solving a resource allocation problem

2013-09-19 Thread Andreas Mock

Hi Lars,

that's why I wrote: The interested readers of this list
now know why I tried crm_simulate...  :-)

Thank you
Andreas Mock


-Original Message-
From: Lars Marowsky-Bree [mailto:l...@suse.com]
Sent: Thursday, 19 September 2013 12:18
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Solving a resource allocation problem

On 2013-09-19T12:12:31, Andreas Mock  wrote:

> For a solution where I like to push a certain resource
> to the new node (this service interruption doesn't
> hurt too much) while being sure that the other gets
> started on the newly upcoming node I have to balance
> the stickiness and negative constraint scores.

"negative" constraint scores are always absolute.

You can set stickiness per resource. So for the one that you want shifted,
just set it to zero, and to non-zero for the others.

Utilization can be used to perform the load balancing bit.

Is that not working?

> Therefore I would like to see the simulated scores.

That's something you've got a separate thread going for. ;-)


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


Re: [Pacemaker] Solving a resource allocation problem

2013-09-19 Thread Andreas Mock

Hi Lars,

no, you're not missing anything.
I just mixed up two acceptable solutions and
the way I asked for them.

So, for letting the resources stay where they are,
you're absolutely right.

For a solution where I like to push a certain resource
to the new node (this service interruption doesn't
hurt too much) while being sure that the other gets
started on the newly upcoming node I have to balance
the stickiness and negative constraint scores.
Therefore I would like to see the simulated scores.
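
The balancing act described above can be sketched with a toy score model. This is an editorial illustration in Python, not Pacemaker code. It assumes that staying on the current node earns the stickiness score plus the (negative) colocation score, that the rejoined node scores 0, and that scores combine with Pacemaker's documented INFINITY arithmetic (INFINITY is 1,000,000 and -INFINITY wins over +INFINITY). The function names are hypothetical.

```python
# Toy model only: Pacemaker's real allocator weighs many more inputs.
INFINITY = 1_000_000  # Pacemaker treats +/-1,000,000 as infinite


def add_scores(a: int, b: int) -> int:
    """Add two scores with Pacemaker's INFINITY rules (-INFINITY always wins)."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


def preferred_node(current: str, other: str,
                   stickiness: int, anti_colocation: int) -> str:
    """Pick a node for one resource block after the `other` node rejoins.

    Assumes both blocks currently run on `current`, so staying there earns
    the stickiness bonus but also pays the negative colocation score.
    """
    stay = add_scores(stickiness, anti_colocation)
    move = 0  # assumption: the rejoined node carries no other preference
    return other if move > stay else current


if __name__ == "__main__":
    for stickiness in (0, 50, 200, INFINITY):
        print(stickiness, preferred_node("node2", "node1", stickiness, -100))
```

In this model the block moves as long as the stickiness is smaller than the magnitude of the anti-colocation score and stays once it is larger. The real numbers still have to come from the score output of crm_simulate.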

Thank you for answering.

Best regards
Andreas Mock

-Original Message-
From: Lars Marowsky-Bree [mailto:l...@suse.com]
Sent: Thursday, 19 September 2013 11:08
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Solving a resource allocation problem

On 2013-09-19T10:20:07, Andreas Mock  wrote:

> Hi all,
> 
> I need a hint how to solve a resource allocation problem
> on a two node cluster (pcmk 1.1.11).
> 
> I have two resource blocks (some stacked resources, colocation inf)
> which shall run on separate nodes. I did this with a small negative
> colocation constraint. This works so far.
> 
> But now I want to achieve the following. When one node is
> brought down all resources are moved correctly. But when I
> bring that node up again, then all resources which were
> on that node are pushed back because of that negative colocation.
> 
> I would like the cluster to leave the resources on that one node
> and manually migrate (rebalance) the resources avoiding another
> interrupt of service. 

Right, so your anti-colocation constraint is not actually a hard
requirement, you want it to be optional - scatter resources if possible,
but don't restart resources for it.

You can do this using the utilization feature,
placement-strategy="balanced" and resource-stickiness=inf.

Or am I missing something still?

Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

[Pacemaker] Solving a resource allocation problem

2013-09-19 Thread Andreas Mock

Hi all,

I need a hint how to solve a resource allocation problem
on a two node cluster (pcmk 1.1.11).

I have two resource blocks (some stacked resources, colocation inf)
which shall run on separate nodes. I did this with a small negative
colocation constraint. This works so far.

But now I want to achieve the following. When one node is
brought down all resources are moved correctly. But when I
bring that node up again, then all resources which were
on that node are pushed back because of that negative colocation.

I would like the cluster to leave the resources on that one node
and manually migrate (rebalance) the resources, avoiding another
interruption of service.

I thought that resource stickiness is the right feature for that.
But as the resource blocks are a little complicated, the score
calculation is not straightforward.

How can I solve that problem?
Is resource stickiness applied even in a case where the
node is brought down manually?

(The interested readers now know why I want to simulate this issue.)

Best regards
Andreas Mock



Re: [Pacemaker] Howto test/simulate the reaction of the cluster to node up and down

2013-09-19 Thread Andreas Mock

Hi Lars, hi Andrew,

thank you for your answers.
But I'm still stuck.

When I have both nodes online and the resources
are spread over these nodes and I do a
crm_simulate -Ls -R -d node1
I see nicely what would happen to the cluster when the
node goes down: allocation scores and a transition summary
showing the movements of the resources.

But in the opposite case, where the node is down
(service pacemaker stop) and I want to simulate the node
coming online with
crm_simulate -Ls -R -u node1
I see the current cluster status and the scores (with the node
not being online => -INFINITY) but no transitions.

It looks like another state transition is missing and I only
see the result of one of the one or more steps involved.
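
For repeated experiments, the two invocations above can be wrapped in a small helper. This is a minimal Python sketch that assumes the flags exactly as they are used in this message (-Ls, -R, -d, -u). The function names are hypothetical, and the command is only executed where crm_simulate is installed.

```python
import shutil
import subprocess
from typing import List, Optional


def simulate_argv(node: str, event: str) -> List[str]:
    """Build the crm_simulate call from this message for a node going down or up."""
    flags = {"down": "-d", "up": "-u"}
    if event not in flags:
        raise ValueError(f"event must be 'down' or 'up', got {event!r}")
    # Flags are copied verbatim from the mail, not extended.
    return ["crm_simulate", "-Ls", "-R", flags[event], node]


def run_simulation(node: str, event: str) -> Optional[str]:
    """Run the simulation and return its output, or None if the tool is missing."""
    argv = simulate_argv(node, event)
    if shutil.which(argv[0]) is None:
        return None  # not on a cluster node, so there is nothing to run
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    print(" ".join(simulate_argv("node1", "up")))
```

Keeping the command construction separate from its execution makes it easy to log or compare the exact calls used for the "down" and the "up" case.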

Best regards
Andreas Mock

-Original Message-
From: Lars Marowsky-Bree [mailto:l...@suse.com]
Sent: Thursday, 19 September 2013 09:20
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Howto test/simulate the reaction of the cluster to
node up and down

On 2013-09-17T13:37:54, Andreas Mock  wrote:

> I have the problem that after a node rejoins the cluster some
> resources are moved back to that node.
> Now I want to see the calculated scores to see where I
> have to adjust the stickiness to get the behaviour I like.
> 
> I'm not sure how to use crm_simulate to get these values.
> When both nodes are online I can simulate a node down
> by crm_simulate -Ls -d .
> But how do I simulate the transition from a state where one
> node is down? When I bring down a node by 'service pacemaker stop'
> and try a crm_simulate -Ls -u  I don't see resource transitions.

crm cib cibstatus
crm(live)cib cibstatus# node hex-1 online
crm(live)cib cibstatus# simulate nograph scores

For more details, see "help simulate"

Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

Re: [Pacemaker] Howto test/simulate the reaction of the cluster to node up and down

2013-09-18 Thread Andreas Mock

Hello Andreas,

thank you for your reply.
I use 1.1.11-git.

What I did: I put one node down (service pacemaker stop) and then
executed a crm_simulate -Ls -u node and I only see the output
as said before. When I bring up the node in reality pacemaker
is moving resources to that node. The output of crm_simulate
doesn't reflect these operations.

I don't know whether I do something wrong or I'm hitting a bug.

Anyway, thank you.

Best regards
Andreas Mock


-Original Message-
From: Andreas Kurz [mailto:andr...@hastexo.com]
Sent: Wednesday, 18 September 2013 15:45
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Howto test/simulate the reaction of the cluster to
node up and down

On 2013-09-18 15:08, Andreas Mock wrote:
> Hi all,
> 
> really nobody here with deeper experience of crm_simulate?
> Or with a hint for good documentation?

What Pacemaker version are you using? I did a quick test here on older
1.1.6 and 1.1.7 clusters and they show a nice output on "crm_simulate
-Ls -u testnode" with transitions and scores.

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> Best regards
> Andreas Mock
> 
> 
> -Original Message-
> From: Andreas Mock [mailto:andreas.m...@web.de]
> Sent: Tuesday, 17 September 2013 13:38
> To: 'The Pacemaker cluster resource manager'
> Subject: [Pacemaker] Howto test/simulate the reaction of the cluster to
> node up and down
> 
> Hi all,
> 
> I have the problem that after a node rejoins the cluster some
> resources are moved back to that node.
> Now I want to see the calculated scores to see where I
> have to adjust the stickiness to get the behaviour I like.
> 
> I'm not sure how to use crm_simulate to get these values.
> When both nodes are online I can simulate a node down
> by crm_simulate -Ls -d .
> But how do I simulate the transition from a state where one
> node is down? When I bring down a node by 'service pacemaker stop'
> and try a crm_simulate -Ls -u  I don't see resource transitions.
> I only see:
> --8<
> Performing requested modifications
>  + Bringing node dis04 online
> --8<
> 
> Any hints appreciated.
> 
> Best regards
> Andreas Mock
> 
> 
> 






Re: [Pacemaker] Howto test/simulate the reaction of the cluster to node up and down

2013-09-18 Thread Andreas Mock

Hi all,

really nobody here with deeper experience of crm_simulate?
Or with a hint for good documentation?

Best regards
Andreas Mock


-Original Message-
From: Andreas Mock [mailto:andreas.m...@web.de]
Sent: Tuesday, 17 September 2013 13:38
To: 'The Pacemaker cluster resource manager'
Subject: [Pacemaker] Howto test/simulate the reaction of the cluster to node
up and down

Hi all,

I have the problem that after a node rejoins the cluster some
resources are moved back to that node.
Now I want to see the calculated scores to see where I
have to adjust the stickiness to get the behaviour I like.

I'm not sure how to use crm_simulate to get these values.
When both nodes are online I can simulate a node down
by crm_simulate -Ls -d .
But how do I simulate the transition from a state where one
node is down? When I bring down a node by 'service pacemaker stop'
and try a crm_simulate -Ls -u  I don't see resource transitions.
I only see:
--8<
Performing requested modifications
 + Bringing node dis04 online
--8<

Any hints appreciated.

Best regards
Andreas Mock



[Pacemaker] Howto test/simulate the reaction of the cluster to node up and down

2013-09-17 Thread Andreas Mock

Hi all,

I have the problem that after a node rejoins the cluster some
resources are moved back to that node.
Now I want to see the calculated scores to see where I
have to adjust the stickiness to get the behaviour I like.

I'm not sure how to use crm_simulate to get these values.
When both nodes are online I can simulate a node down
by crm_simulate -Ls -d .
But how do I simulate the transition from a state where one
node is down? When I bring down a node by 'service pacemaker stop'
and try a crm_simulate -Ls -u  I don't see resource transitions.
I only see:
--8<
Performing requested modifications
 + Bringing node dis04 online
--8<

Any hints appreciated.

Best regards
Andreas Mock



Re: [Pacemaker] Problems with fence_ipmilan

2013-09-17 Thread Andreas Mock

Hi Digimer,

your hint concerning acpid was very valuable.
I didn't know about that recommendation.
After disabling acpid I could stonith instantly
as I like to do.

The video has no context. It was meant to make
this dry stuff a little bit funny. IMHO worth looking
anyway.

Thank you!

Best regards
Andreas Mock

-Original Message-
From: Digimer [mailto:li...@alteeve.ca]
Sent: Tuesday, 17 September 2013 06:37
To: The Pacemaker cluster resource manager
Cc: Andreas Mock
Subject: Re: [Pacemaker] Problems with fence_ipmilan

On 16/09/13 16:53, Andreas Mock wrote:
> Hi all,
> 
> I'm using (want to use) RHEL 6.4 fence_ipmilan for our IBM x3650 M4 (IMM).
> My problem is the following. In contrast to the documented behaviour
> a 'chassis power off' or a 'chassis power reset' is doing a soft reset as if
> you have pressed the on-off-button of the server. That means the
> shutdown process is initiated.
> 
> As you can imagine this is like stonithing this way:
> http://www.youtube.com/watch?v=fVJiwuk75Ig#t=1m23s
> Especially when a SAN volume is blocking in 'D' state.
> 
> What I want is a hard reset. It seems that the only solution
> at the moment is to send a 'chassis power reset'.
> fence_ipmilan doesn't support that ipmi command at the
> moment.
> 
> Has anybody experience with similar (bad) behaviour and workarounds?
> 
> Best regards
> Andreas Mock

I can't watch the video (yay hotel internet \o/), so if there is context
there, I am missing it.

The FenceAgentAPI says that "reset" should be "off -> verify -> try on
but don't care if that fails". This is because "reset" doesn't have a
verifiable "off" state.
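
The sequence quoted from the FenceAgentAPI can be sketched as follows. This is an editorial illustration in Python, not code from any fence agent: power_off, is_off and power_on are hypothetical callables standing in for whatever the fence device offers.

```python
from typing import Callable


def fence_reset(power_off: Callable[[], None],
                is_off: Callable[[], bool],
                power_on: Callable[[], None]) -> bool:
    """Reset as 'off -> verify -> try on'; True means the target was verified off."""
    power_off()
    if not is_off():
        return False  # fencing failed: the node was never verifiably off
    try:
        power_on()
    except Exception:
        pass  # per the rule quoted above, a failed power-on does not fail the fence
    return True


if __name__ == "__main__":
    state = {"on": True}

    def off() -> None:
        state["on"] = False

    def on() -> None:
        state["on"] = True

    print(fence_reset(off, lambda: not state["on"], on))
```

The return value depends only on the verified "off" state, which is the part of the sequence the cluster has to be able to rely on.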

Next is that you probably have acpid enabled. Most (all?) systems will
instantly turn off if acpid is disabled. For this reason, Red Hat
actually recommends disabling acpid to help avoid this issue.

Third; With IPMI type fence devices, there is no way to prevent one
fence from starting after another one has started because the devices
are independent. So to help deal with this, it's a good idea to set a
'delay="15"' to one of the node's fence methods. This way, if there is a
break and both nodes try to fence the other, the node with the delay
will not be fenced immediately. Say you set the delay against node 1.
Then there is a break and both start a fence. Node 2 will see that Node
1 has a delay of fifteen seconds and pauses. Node 1 will see no delay
against node 2, so it fences immediately. Node 2 will be long dead
before its timer expires, so you avoid the dual fence. Had node 1
really crashed, node 2 would delay 15 seconds, then proceed with the fence.
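
The timing argument can be checked with a small simulation. This is a Python sketch under simplifying assumptions: fencing is instantaneous, a fence against a target fires after the delay configured against that target, and a node that has already been fenced never fires. fence_race and its parameters are hypothetical names.

```python
from typing import Dict, Iterable, Set


def fence_race(nodes: Iterable[str], attackers: Iterable[str],
               delay: Dict[str, int]) -> Set[str]:
    """Return the nodes that end up fenced when `attackers` fence all other nodes."""
    nodes = list(nodes)
    # One pending fence per (attacker, target) pair, due after the
    # delay configured against the target.
    pending = sorted((delay.get(target, 0), attacker, target)
                     for attacker in attackers
                     for target in nodes if target != attacker)
    fenced: Set[str] = set()
    i = 0
    while i < len(pending):
        now = pending[i][0]
        batch = [p for p in pending if p[0] == now]
        # Fences due at the same instant all land; attackers fenced at an
        # earlier instant never get to fire.
        fenced |= {target for _, attacker, target in batch
                   if attacker not in fenced}
        i += len(batch)
    return fenced


if __name__ == "__main__":
    both = ["node1", "node2"]
    print(fence_race(both, both, {}))                   # no delay configured
    print(fence_race(both, both, {"node1": 15}))        # delay against node 1
    print(fence_race(both, ["node2"], {"node1": 15}))   # node 1 really crashed
```

Without a delay both nodes fence each other. With the delay set against node 1, only node 2 is fenced. If node 1 has really crashed, node 2 still fences it once the 15 seconds have passed.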

digimer

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

[Pacemaker] Problems with fence_ipmilan

2013-09-16 Thread Andreas Mock

Hi all,

I'm using (want to use) RHEL 6.4 fence_ipmilan for our IBM x3650 M4 (IMM).
My problem is the following. In contrast to the documented behaviour
a 'chassis power off' or a 'chassis power reset' is doing a soft reset as if
you have pressed the on-off-button of the server. That means the
shutdown process is initiated.

As you can imagine this is like stonithing this way:
http://www.youtube.com/watch?v=fVJiwuk75Ig#t=1m23s
Especially when a SAN volume is blocking in 'D' state.

What I want is a hard reset. It seems that the only solution
at the moment is to send a 'chassis power reset'.
fence_ipmilan doesn't support that ipmi command at the
moment.

Has anybody experience with similar (bad) behaviour and workarounds?

Best regards
Andreas Mock

Re: [Pacemaker] CMAN nodes online

2013-09-16 Thread Andreas Mock

Hi,

tell us on which OS you want to install and run cman et al.

Show us what you've done so far (e.g. communication paths,
IP addresses).

Best regards
Andreas Mock

From: Gopalakrishnan N [mailto:gopalakrishnan...@gmail.com]
Sent: Monday, 16 September 2013 14:01
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] CMAN nodes online

Again, when I restarted pacemaker and cman, the nodes are not
online; back to square one.

node1 shows only node1 online, and node2 says node2 online. I don't know
what's happening in the background...

Any advice would be appreciated.

Thanks.

On Mon, Sep 16, 2013 at 6:47 PM, Gopalakrishnan N wrote:

Hi guys,

I got it, basically it took some time to propagate and now two nodes are
showing online...

Thanks.

On Mon, Sep 16, 2013 at 6:39 PM, Gopalakrishnan N wrote:

I have configured CMAN as per the link
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html#_configuring_cman
but when I type cman_tool nodes only one node is online even though
cluster.conf is propagated to the other node as well.

What could be the reason? On node1, cman_tool nodes shows only node1 online;
on node2 it shows only node2 online. How do I get both nodes online,
even though the CMAN service is running on both nodes?

Thanks in advance.

Regards,
Gopal

Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10

2013-09-16 Thread Andreas Mock

Hi Lars, hi all,

we took the time and tested drbd 8.4.4-rc in our problematic scenario.

We were able to reproduce the promote error regularly with drbd 8.4.3.
After installing 8.4.4-rc we were not able to get this error any more.

So, as far as the changes made to get around the known race condition
are concerned, 8.4.4-rc seems to work. We didn't look at other aspects
of the new version. If there is something else you would like us to test,
let us know.

Best regards
Andreas

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org
[mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Lars Ellenberg
Sent: Tuesday, 10 September 2013 14:10
To: linux...@lists.linux-ha.org; pacemaker@oss.clusterlabs.org
Subject: Re: [Linux-HA] [Pacemaker] Probably a regression of the linbit drbd
agent between pacemaker 1.1.8 and 1.1.10

On Mon, Sep 09, 2013 at 01:41:17PM +0200, Andreas Mock wrote:
> Hi Lars,
> 
> here also my official "Thank you very much" for looking
> at the problem.

> I've been looking forward to the official release
> of drbd 8.4.4.
> 
> Or do you need disoriented rc testers like me? ;-)

Why not?
That's what release candidates are intended for.
You'd only have to confirm that it works for you now.

Respectively, that it still does not,
in which case you better report that now
than after the release, right?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
___
Linux-HA mailing list
linux...@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10

2013-09-09 Thread Andreas Mock

Hi Lars,

here also my official "Thank you very much" for looking
at the problem.

Also thank you for writing a summary that, coming
from your insider's knowledge,
is much better than anything I could have written while trying to
understand all the details presented by you here and in our off-list
communication. Additionally, such a post is much
more valuable for a list archive when sent by a well-known
drbd, HA and pacemaker contributor like you.

I've been looking forward to the official release
of drbd 8.4.4.

Or do you need disoriented rc testers like me? ;-)

Best regards
Andreas Mock

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org
[mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Lars Ellenberg
Sent: Monday, 9 September 2013 12:21
To: linux...@lists.linux-ha.org; pacemaker@oss.clusterlabs.org
Subject: Re: [Linux-HA] [Pacemaker] Probably a regression of the linbit drbd
agent between pacemaker 1.1.8 and 1.1.10

On Mon, Sep 09, 2013 at 02:42:45PM +1000, Andrew Beekhof wrote:
> 
> On 06/09/2013, at 5:51 PM, Lars Ellenberg wrote:
> 
> > On Tue, Aug 27, 2013 at 06:51:45AM +0200, Andreas Mock wrote:
> >> Hi Andrew,
> >> 
> >> as this is a real showstopper at the moment I invested some other
> >> hours to be sure (as far as possible) not having made an error.
> >> 
> >> Some additions:
> >> 1) I mirrored the whole mini drbd config to another pacemaker cluster.
> >> Same result: pacemaker 1.1.8 works, pacemaker 1.1.10 not 
> >> 2) When I remove the target role Stopped from the drbd ms resource
> >> and insert the config snippet related to the drbd device via crm -f

> >> to a lean running pacemaker config (pacemaker cluster options, stonith
> >> resources),
> >> it seems to work. That means one of the nodes gets promoted.
> >> 
> >> Then after stopping 'crm resource stop ms_drbd_xxx' and starting again
> >> I see the same promotion error as described.
> >> 
> >> The drbd resource agent is using /usr/sbin/crm_master.
> >> Is there a possibility that feedback given through this client tool
> >> is changing the timing behaviour of pacemaker? Or the way
> >> transitions are scheduled?
> >> Any idea that may be related to a change in pacemaker?
> > 
> > I think that recent pacemaker allows for "start" and "promote" in the
> > same transition.
> 
> At least in the one case I saw logs of, this wasn't the case.
> The PE computed:
> 
> Current cluster status:
> Online: [ db05 db06 ]
> 
> r_stonith-db05(stonith:fence_imm):Started db06 
> r_stonith-db06(stonith:fence_imm):Started db05 
> Master/Slave Set: ms_drbd_fodb [r_drbd_fodb]
> Slaves: [ db05 db06 ]
> Master/Slave Set: ms_drbd_fodblog [r_drbd_fodblog]
> Slaves: [ db05 db06 ]
> 
> Transition Summary:
> * Promote r_drbd_fodb:0   (Slave -> Master db05)
> * Promote r_drbd_fodblog:0(Slave -> Master db05)
> 
> and it was the promotion of r_drbd_fodb:0 that failed.

Right.

Off-list communication revealed that
DRBD came up as "Consistent" only,
which is a normal and expected state,
when using resource level fencing.

The promotion attempt then raced with the connection handshake.
The DRBD fence-peer handler is run (because it's only Consistent,
not UpToDate) and returns successfully, but due to that race,
this result is ignored, DRBD stays "only Consistent", which
is not good enough to be promoted ("need access to UpToDate data").

Once the handshake is done, that also results in "access to good data",
which is why the next promotion attempt succeeds.

Something in the timing of pacemaker actions has changed
between the affected and unaffected versions.
Apparently before there was enough time to do the connection handshake
before the promote request was made.

This race is fixed with DRBD 8.3.16 and 8.4.4 (currently rc1)

You can avoid that race by not allowing Pacemaker to promote
if DRBD is only "Consistent".

Pacemaker will only attempt promotion,
if there is a positive master score for the resource.

The ocf:linbit:drbd RA hardcodes the master score for
"Consistent" to 5.
So you may edit the RA and instead remove the master score
for the "only Consistent".

(the fixed DRBD versions mentioned above also introduce a new
"adjust_master_score" parameter, which makes this configurable)

Or you can add a location constraint like this:
 location no-master-if-only-consistent ms_drbd_XY \
rule $role="Master" -10: defined #uname

where "defined #uname" is a funny way to express "true",
as in this constraint reduces the resulting master score by 10,
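
The arithmetic behind this constraint can be sketched as follows. This is only an illustration of the reasoning above, not code from Pacemaker or the DRBD agent: the score of 5 for "Consistent" and the adjustment of -10 come from the message, while the score used for "UpToDate" is an assumed placeholder and the function names are hypothetical.

```python
# Minimal model of "promotion needs a positive master score".
CONSISTENT_SCORE = 5        # hardcoded in the RA for "only Consistent" (from the message)
UPTODATE_SCORE = 10000      # ASSUMPTION: some large positive score for UpToDate data
CONSTRAINT_ADJUSTMENT = -10 # the "-10:" rule of the location constraint


def effective_master_score(ra_score, adjustment=CONSTRAINT_ADJUSTMENT):
    """Score the agent sets plus the score contributed by the constraint."""
    return ra_score + adjustment


def may_promote(score):
    """Promotion is only attempted for a positive master score."""
    return score > 0


if __name__ == "__main__":
    for label, ra_score in (("Consistent", CONSISTENT_SCORE),
                            ("UpToDate", UPTODATE_SCORE)):
        score = effective_master_score(ra_score)
        print(label, score, "promotable" if may_promote(score) else "not promotable")
```

With these numbers a node that is only "Consistent" ends up with a negative score and is not promoted, while a node with a large score for "UpToDate" stays promotable.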

Re: [Pacemaker] Resource ordering/colocating question (DRBD + LVM + FS)

2013-09-09 Thread Andreas Mock

Hi Heikki,

it has to be crm_simulate -L -s. Sorry for the wrong command line
parameters.

Best regards
Andreas

-Original Message-
From: Heikki Manninen [mailto:h...@iki.fi]
Sent: Monday, 9 September 2013 10:46
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Resource ordering/colocating question (DRBD + LVM +
FS)

Hello Andreas, thanks for your input, much appreciated.

On 5.9.2013, at 16.39, "Andreas Mock"  wrote:

> 1) The second output of crm_mon show a resource IP_database
> which is not shown in the initial crm_mon output and also
> not in the config. => Reduce your problem/config to the
> minimum being reproducible.

True. I edited out the resource from the e-mail that did not have anything
to do with the problem as such (works ok all the time). Just forgot to
remove it from the second copy-paste also. And yes, no more IP resource in
the configuration.

> 2) Enable logging and look out which node is the DC.
> There in the logs you find many many informations showing
> what is going on. Hint: Open a terminal session with an
> opened tail -f logfile. Watch it while inserting commands.
> You'll get used to it.

Seems that node #2 was the DC (also visible in the pcs status output). I
have looked at the logs all the time, I am just not yet too familiar with
the contents of pacemaker logging. Here's the thing that keeps repeating
every time those LVM and FS resources stay in stopped state:

Sep  3 20:01:23 pgdbsrv02 pengine[1667]:   notice: LogActions: Start LVM_vgdata01#011(pgdbsrv01.cl1.local - blocked)
Sep  3 20:01:23 pgdbsrv02 pengine[1667]:   notice: LogActions: Start FS_data01#011(pgdbsrv01.cl1.local - blocked)
Sep  3 20:01:23 pgdbsrv02 pengine[1667]:   notice: LogActions: Start LVM_vgdata02#011(pgdbsrv01.cl1.local - blocked)
Sep  3 20:01:23 pgdbsrv02 pengine[1667]:   notice: LogActions: Start FS_data02#011(pgdbsrv01.cl1.local - blocked)

So what does blocked mean here? Is it that node #1 in this case is in
need of fencing/stonithing and thus being blocked, or something else? (I have
a background in the RHCS/HACMP/LifeKeeper etc. world.) no-quorum-policy is
set to ignore.

> 3) The shown status of a drbd resource (crm_mon) doesn't show
> you all informations of the drbd devices. Have a look at
> drbd-overview on both nodes. (e.g. syncing status).

True, DRBD is working fine on these occasions. Connected, Synced etc.

> 4) This setup CRIES for stonithing. Even in a test environment.
> When stonith happens (this is what you see immediately) you
> know something went wrong. This is a good indicator for
> errors in agents or in the config. Believe me, as tedious stonithing
> is the valuable it is for getting hints for bad cluster state.
> On virtual machines stonithing is not as painful as on real
> servers.

Very much true. I have implemented some custom fencing/stonithing agents
before on physical and virtual cluster environments. Problem being here is
that I'm not aware of reasonably simple ways to implement stonith with
VMware Fusion that I'm bound to use for this test setup. Have to dig more
into this though. So fencing from cman cluster.conf is chained to pacemaker
fencing and pacemaker stonithing is disabled, no quorum policy is ignore.

> 5) Is the drbd fencing script enabled? If yes, in certain circumstances
> -INF rules are inserted to deny promoting of "wrong" nodes.
> You should grep for them 'cibadmin -Q | grep '

No, DRBD fencing is not enabled and split-brain recovery is done manually.

> 6) crm_simulate -L -v gives you an output of the scores of
> the resources on each node. I really don't know how to read it
> exactly (Is there a documentation of that anywhere?), but it
> gives you a hint where to look at, when resources don't start.
> Especially the aggregation of stickiness values in groups are
> sometimes misleading.

Could be that I have some different version maybe, because -v is unknown
option and:

# crm_simulate -L -V

Current cluster status:
Online: [ pgdbsrv01.cl1.local pgdbsrv02.cl1.local ]

Master/Slave Set: DRBD_ms_data01 [DRBD_data01]
Masters: [ pgdbsrv01.cl1.local ]
Slaves: [ pgdbsrv02.cl1.local ]
Master/Slave Set: DRBD_ms_data02 [DRBD_data02]
Masters: [ pgdbsrv01.cl1.local ]
Slaves: [ pgdbsrv02.cl1.local ]
Resource Group: GRP_data01
LVM_vgdata01(ocf::heartbeat:LVM):   Stopped
FS_data01   (ocf::heartbeat:Filesystem):Stopped
Resource Group: GRP_data02
LVM_vgdata02(ocf::heartbeat:LVM):   Stopped
FS_data02   (ocf::heartbeat:Filesystem):Stopped

Only shows that much.

Original problem description left quoted below.

-- 
Heikki M

> -Original Message-
> From: Heikki Manninen [mailto:h...@iki.fi]
> Sent: Thursday, 5 September 2013 14:08
> To: pacemaker@oss.clusterlabs.org
> Subject: [Pacemak

Re: [Pacemaker] Resource ordering/colocating question (DRBD + LVM + FS)

2013-09-05 Thread Andreas Mock

Hi Heikki,

just some comments to help you help yourself.

1) The second crm_mon output shows a resource IP_database
which is not shown in the initial crm_mon output and also
not in the config. => Reduce your problem/config to the
minimum that still reproduces it.

2) Enable logging and find out which node is the DC.
In the logs there you will find a lot of information showing
what is going on. Hint: Open a terminal session running
tail -f logfile. Watch it while entering commands.
You'll get used to it.

3) The status of a drbd resource shown by crm_mon doesn't give
you all the information about the drbd devices. Have a look at
drbd-overview on both nodes (e.g. syncing status).

4) This setup CRIES for stonithing. Even in a test environment.
When stonith happens (this is something you see immediately) you
know something went wrong. This is a good indicator for
errors in agents or in the config. Believe me, as tedious as stonithing
is, it is valuable for getting hints about a bad cluster state.
On virtual machines stonithing is not as painful as on real
servers.

5) Is the drbd fencing script enabled? If yes, in certain circumstances
-INF rules are inserted to deny promotion of "wrong" nodes.
You should grep for them: 'cibadmin -Q | grep '

6) crm_simulate -L -v gives you an output of the scores of
the resources on each node. I really don't know how to read it
exactly (is there documentation for that anywhere?), but it
gives you a hint where to look when resources don't start.
Especially the aggregation of stickiness values in groups is
sometimes misleading.

7) Sometimes the behaviour of pacemaker changes and it is possible
that you hit a bug. But this is hard to find out. Possibility:
Check a newer version.
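
As an alternative to grepping for point 5, the output of cibadmin -Q could be scanned for location constraints that carry a score of -INFINITY. The following is a sketch under assumptions: it expects the CIB XML as a string, the constraint and rule ids in the sample are made up for illustration, and the grep pattern from the message (which is cut off) is not reconstructed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical CIB fragment; ids and node names are invented for the example.
SAMPLE_CIB = """
<cib>
  <configuration>
    <constraints>
      <rsc_location id="example-fence-constraint" rsc="DRBD_ms_data01">
        <rule id="example-fence-rule" role="Master" score="-INFINITY">
          <expression id="example-expr" attribute="#uname"
                      operation="ne" value="pgdbsrv01.cl1.local"/>
        </rule>
      </rsc_location>
      <rsc_location id="example-prefer" rsc="GRP_data01"
                    node="pgdbsrv01.cl1.local" score="100"/>
    </constraints>
  </configuration>
</cib>
"""


def negative_infinity_locations(cib_xml):
    """Return (constraint id, resource, role) for every -INFINITY location rule."""
    root = ET.fromstring(cib_xml)
    hits = []
    for loc in root.iter("rsc_location"):
        # Score given directly on the constraint.
        if loc.get("score") == "-INFINITY":
            hits.append((loc.get("id"), loc.get("rsc"), None))
        # Score given on a rule inside the constraint.
        for rule in loc.iter("rule"):
            if rule.get("score") == "-INFINITY":
                hits.append((loc.get("id"), loc.get("rsc"), rule.get("role")))
    return hits


if __name__ == "__main__":
    for hit in negative_infinity_locations(SAMPLE_CIB):
        print(hit)
```

On a live cluster the XML would come from cibadmin -Q instead of the sample string.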

Hope this helps.

Best regards
Andreas Mock




-Original Message-
From: Heikki Manninen [mailto:h...@iki.fi]
Sent: Thursday, 5 September 2013 14:08
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] Resource ordering/colocating question (DRBD + LVM + FS)

Hello,

I'm having a bit of a problem understanding what's going on with my simple
two-node demo cluster here. My resources come up correctly after restarting
the whole cluster but the LVM and Filesystem resources fail to start after a
single node restart or standby/unstandby (after node comes back online - why
do they even stop/start after the second node comes back?).

OS: CentOS 6.4 (cman stack)
Pacemaker: pacemaker-1.1.8-7.el6.x86_64
DRBD: drbd84-utils-8.4.3-1.el6.elrepo.x86_64

Everything is configured using: pcs-0.9.26-10.el6_4.1.noarch

Two DRBD resources configured and working: data01 & data02
Two nodes: pgdbsrv01.cl1.local & pgdbsrv02.cl1.local

Configuration:

node pgdbsrv01.cl1.local
node pgdbsrv02.cl1.local
primitive DRBD_data01 ocf:linbit:drbd \
 params drbd_resource="data01" \
 op monitor interval="30s"
primitive DRBD_data02 ocf:linbit:drbd \
 params drbd_resource="data02" \
 op monitor interval="30s"
primitive FS_data01 ocf:heartbeat:Filesystem \
 params device="/dev/mapper/vgdata01-lvdata01" directory="/data01" fstype="ext4" \
 op monitor interval="30s"
primitive FS_data02 ocf:heartbeat:Filesystem \
 params device="/dev/mapper/vgdata02-lvdata02" directory="/data02" fstype="ext4" \
 op monitor interval="30s"
primitive LVM_vgdata01 ocf:heartbeat:LVM \
 params volgrpname="vgdata01" exclusive="true" \
 op monitor interval="30s"
primitive LVM_vgdata02 ocf:heartbeat:LVM \
 params volgrpname="vgdata02" exclusive="true" \
 op monitor interval="30s"
group GRP_data01 LVM_vgdata01 FS_data01
group GRP_data02 LVM_vgdata02 FS_data02
ms DRBD_ms_data01 DRBD_data01 \
 meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
ms DRBD_ms_data02 DRBD_data02 \
 meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation colocation-GRP_data01-DRBD_ms_data01-INFINITY inf: GRP_data01 DRBD_ms_data01:Master
colocation colocation-GRP_data02-DRBD_ms_data02-INFINITY inf: GRP_data02 DRBD_ms_data02:Master
order order-DRBD_data01-GRP_data01-mandatory : DRBD_data01:promote GRP_data01:start
order order-DRBD_data02-GRP_data02-mandatory : DRBD_data02:promote GRP_data02:start
property $id="cib-bootstrap-options" \
 dc-version="1.1.8-7.el6-394e906" \
 cluster-infrastructure="cman" \
 stonith-enabled="false" \
 no-quorum-policy="ignore" \
 migration-threshold="1"
rsc_defaults $id="rsc_defaults-options" \
 resource-stickiness="100"


1) After starting the cluster, everything runs happily:

Last updated: Tue

[Pacemaker] Howto recover from node state UNCLEAN (online)

2013-09-05 Thread Andreas Mock

Hi all,

is there a way to recover from node state UNCLEAN (online) without
rebooting?

Background: 
- RHEL6.4
- cman-cluster with pacemaker
- stonith enabled and working

- resource monitoring failed on node 1
  => stop of resource on node 1 failed
  => stonith of node 1 worked
- more or less in parallel, as the resource is a clone resource,
  resource monitoring failed on node 2
  => stop of resource on node 2 failed
  => stonith of node 2 failed, as the stonith resource agent on
     node 1 was unreachable because node 1 had just been stonithed

- An error message stated that stonithing was given up.
=> node 2 is in the state above

Interestingly: a "service stop pacemaker" doesn't work
as pacemaker seems to be blocked by this node state.

The questions:
1) How to recover from this state without rebooting?
2) Is self-stonithing allowed by now, so that
a self-stonithing device could be added to a fencing
topology?

Best regards
Andreas Mock

   



Re: [Pacemaker] why not updated http://clusterlabs.org/rpm-next/ ..... ?

2013-09-03 Thread Andreas Mock

Thank you. I'll have a look at it.

Best regards
Andreas Mock


-Original Message-
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Wednesday, 4 September 2013 07:05
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] why not updated http://clusterlabs.org/rpm-next/ ..... ?


On 04/09/2013, at 2:56 PM, "Andreas Mock"  wrote:

> Hi Andrew,
> 
> by now I know how to build it.
> So it really is doable for dummies like me.
> 
> Can you tell me how to build a certain git revision?
> 
> I found out, that 'make rpm' is building packages from the current git 
> head.
> 
> Can you also tell us how to set a certain rpm package name, so someone 
> can distinguish several git head builds?
> Like 1.1.11-a4fdre and 1.1.11-5fa45?

This will get you most of the way there:

   make TAG=a4fdre WITH="--with pre_release" rpm

> 
> Best regards
> Andreas Mock
> 
> 
> -Original Message-
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Wednesday, 4 September 2013 06:31
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] why not updated
> http://clusterlabs.org/rpm-next/ ..... ?
> 
> 
> On 23/08/2013, at 3:02 PM, Andreas Mock  wrote:
> 
>> Hi Andrew,
>> 
>> I can only talk for myself: Please, please provide rpm-Packages of 
>> pacemaker 1.1.10 + fitting for RHEL 6.x.
>> 
>> Is this feasible with not too much effort for you? *
> 
> I had hoped to get out of the packaging business by making it not too 
> much effort for anyone :-)
> 
> 1. Obtain RHEL6.x box
> 2. Install dependencies if you haven't already:
>   # sudo yum install -y yum-utils
>   # make rpm-dep
> 3. Build Pacemaker
>   # make release
> 
> Not good?
> 
>> 
>> Best regards
>> Andreas Mock
>> 
>> (* This is the question ;) )
>> 
>> -Original Message-
>> From: Andrew Beekhof [mailto:and...@beekhof.net]
>> Sent: Friday, 23 August 2013 06:27
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] why not updated
>> http://clusterlabs.org/rpm-next/ ..... ?
>> 
>> 
>> On 20/08/2013, at 8:49 PM, Andrey Groshev  wrote:
>> 
>>> Hello Andrew!
>>> Why not updated http://clusterlabs.org/rpm-next/* ?
>> 
>> No-one asked :)
>> 
>> 
> 
> 
> 




Re: [Pacemaker] why not updated http://clusterlabs.org/rpm-next/ ..... ?

2013-09-03 Thread Andreas Mock

Hi Andrew,

by now I know how to build it.
So it really is doable for dummies like me.

Can you tell me how to build a certain git revision?

I found out that 'make rpm' builds packages from
the current git head.

Can you also tell us how to set a certain rpm package name,
so that one can distinguish several git head builds?
Like 1.1.11-a4fdre and 1.1.11-5fa45?

Best regards
Andreas Mock


-Original Message-
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Wednesday, 4 September 2013 06:31
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] why not updated http://clusterlabs.org/rpm-next/ ..... ?


On 23/08/2013, at 3:02 PM, Andreas Mock  wrote:

> Hi Andrew,
> 
> I can only talk for myself: Please, please provide rpm-Packages of 
> pacemaker 1.1.10 + fitting for RHEL 6.x.
> 
> Is this feasible with not too much effort for you? *

I had hoped to get out of the packaging business by making it not too much
effort for anyone :-)

1. Obtain RHEL6.x box
2. Install dependencies if you haven't already:
   # sudo yum install -y yum-utils
   # make rpm-dep
3. Build Pacemaker
   # make release

Not good?

> 
> Best regards
> Andreas Mock
> 
> (* This is the question ;) )
> 
> -Original Message-
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Friday, 23 August 2013 06:27
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] why not updated
> http://clusterlabs.org/rpm-next/ ..... ?
> 
> 
> On 20/08/2013, at 8:49 PM, Andrey Groshev  wrote:
> 
>> Hello Andrew!
>> Why not updated http://clusterlabs.org/rpm-next/* ?
> 
> No-one asked :)
> 
> 




Re: [Pacemaker] [Problem]Two error information is displayed.

2013-08-29 Thread Andreas Mock

Hi Hideo san,

the two lines are meant to emphasize that you do not just have trouble
but real trouble...  ;-)

But seriously: I see this phenomenon, too.
(pacemaker 1.1.11-1.el6-4f672bc)

Best regards
Andreas Mock

-Original Message-
From: renayama19661...@ybb.ne.jp [mailto:renayama19661...@ybb.ne.jp]
Sent: Thursday, 29 August 2013 02:38
To: PaceMaker-ML
Subject: [Pacemaker] [Problem]Two error information is displayed.

Hi All,

Although the failure occurred only once, two error entries are displayed
in crm_mon.

-

[root@rh64-coro2 ~]# crm_mon -1 -Af
Last updated: Thu Aug 29 18:11:00 2013
Last change: Thu Aug 29 18:10:45 2013 via cibadmin on rh64-coro2
Stack: corosync
Current DC: NONE
1 Nodes configured
1 Resources configured


Online: [ rh64-coro2 ]


Node Attributes:
* Node rh64-coro2:

Migration summary:
* Node rh64-coro2:
   dummy: migration-threshold=1 fail-count=1 last-failure='Thu Aug 29 18:10:57 2013'

Failed actions:
    dummy_monitor_3000 on (null) 'not running' (7): call=11, status=complete, last-rc-change='Thu Aug 29 18:10:57 2013', queued=0ms, exec=0ms
    dummy_monitor_3000 on rh64-coro2 'not running' (7): call=11, status=complete, last-rc-change='Thu Aug 29 18:10:57 2013', queued=0ms, exec=0ms

-

There seems to be a problem with the way the error information is added
in the following code.

-
static void
unpack_rsc_op_failure(resource_t *rsc, node_t *node, int rc, xmlNode *xml_op,
                      enum action_fail_response *on_fail,
                      pe_working_set_t * data_set)
{
    int interval = 0;
    bool is_probe = FALSE;
    action_t *action = NULL;
(snip)
    if (rc != PCMK_OCF_NOT_INSTALLED
        || is_set(data_set->flags, pe_flag_symmetric_cluster)) {
        if ((node->details->shutdown == FALSE)
            || (node->details->online == TRUE)) {
            add_node_copy(data_set->failed, xml_op);
        }
    }

    crm_xml_add(xml_op, XML_ATTR_UNAME, node->details->uname);
    if ((node->details->shutdown == FALSE)
        || (node->details->online == TRUE)) {
        add_node_copy(data_set->failed, xml_op);
    }
(snip)
-
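
The effect of the two add_node_copy() calls in the quoted snippet can be modelled in a few lines. This is a simplified Python sketch, not Pacemaker code: it assumes that add_node_copy() stores a copy of the operation as it looks at call time, it leaves out the PCMK_OCF_NOT_INSTALLED / symmetric-cluster check, and the function and key names are hypothetical.

```python
import copy


def record_failure(failed, op, uname, shutdown=False, online=True):
    """Mimic the control flow of the quoted snippet on plain dicts."""
    # First copy: taken BEFORE the node name is attached to the operation,
    # so this entry has no node name.
    if (not shutdown) or online:
        failed.append(copy.deepcopy(op))

    op["uname"] = uname  # stands in for crm_xml_add(xml_op, XML_ATTR_UNAME, ...)

    # Second copy: same condition, now with the node name set.
    if (not shutdown) or online:
        failed.append(copy.deepcopy(op))
    return failed


if __name__ == "__main__":
    entries = record_failure([], {"id": "dummy_monitor_3000", "rc": 7}, "rh64-coro2")
    for entry in entries:
        print(entry["id"], "on", entry.get("uname"))
```

Under these assumptions one failure yields two entries, the first without a node name, which would match the "(null)" line in the crm_mon output above.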


Please revise the additional handling of error information.

Best Regards,
Hideo Yamauchi.





Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10

2013-08-27 Thread Andreas Mock

Hi Andrew,

thank you for still keeping an eye on that issue.
I'll do my best to provide the requested reports.

Best regards
Andreas Mock


-Original Message-
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Wednesday, 28 August 2013 00:12
To: General Linux-HA mailing list
Cc: 'The Pacemaker cluster resource manager'
Subject: Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd
agent between pacemaker 1.1.8 and 1.1.10


On 27/08/2013, at 2:51 PM, Andreas Mock  wrote:

> Hi Andrew,
> 
> as this is a real showstopper at the moment I invested some other 
> hours to be sure (as far as possible) not having made an error.
> 
> Some additions:
> 1) I mirrored the whole mini drbd config to another pacemaker cluster.
> Same result: pacemaker 1.1.8 works, pacemaker 1.1.10 not

The version of drbd is the same too?

> 2) When I remove the target role Stopped from the drbd ms resource and 
> insert the config snippet related to the drbd device via crm -f  
> to a lean running pacemaker config (pacemaker cluster options, stonith 
> resources), it seems to work. That means one of the nodes gets 
> promoted.
> 
> Then after stopping 'crm resource stop ms_drbd_xxx' and starting again 
> I see the same promotion error as described.
> 
> The drbd resource agent is using /usr/sbin/crm_master.
> Is there a possibility that feedback given through this client tool is 
> changing the timing behaviour of pacemaker? Or the way transitions are 
> scheduled?
> Any idea that may be related to a change in pacemaker?

# git diff --stat Pacemaker-1.1.8..Pacemaker-1.1.10 | tail -n 1
 1610 files changed, 109697 insertions(+), 62940 deletions(-)

Needle, meet haystack.
Particularly since I have no idea what that drbd error means.

If you want me to have a look, you'll need to create a crm_report archive of
"works" and "not works".
Logs aren't enough.

> 
> Best regards
> Andreas Mock
> 
> 
> -Original Message-
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Tuesday, 27 August 2013 05:02
> To: General Linux-HA mailing list
> Cc: pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] [Linux-HA] Probably a regression of the
> linbit drbd agent between pacemaker 1.1.8 and 1.1.10
> 
> 
> On 27/08/2013, at 3:31 AM, Andreas Mock  wrote:
> 
>> Hi all,
>> 
>> while the linbit drbd resource agent seems to work perfectly on 
>> pacemaker 1.1.8 (standard software repository) we have problems with 
>> the last release 1.1.10 and also with the newest head 1.1.11.xxx.
>> 
>> As using drbd is not so uncommon I really hope to find interested 
>> people helping me out. I can provide as much debug information as you 
>> want.
>> 
>> 
>> Environment:
>> RHEL 6.4 clone (Scientific Linux 6.4) cman based cluster.
>> DRBD 8.4.3 compiled from sources.
>> 64bit
>> 
>> - A drbd resource configured following the linbit documentation.
>> - Manual start and stop (up/down) and setting primary of drbd 
>> resource working smoothly.
>> - 2 nodes dis03-test/dis04-test
>> 
>> 
>> 
>> - Following simple config on pacemaker 1.1.8 configure
>>   property no-quorum-policy=stop
>>   property stonith-enabled=true
>>   rsc_defaults resource-stickiness=2
>>   primitive r_stonith-dis03-test stonith:fence_mock \
>>   meta resource-stickiness="INFINITY" target-role="Started" \
>>   op monitor interval="180" timeout="300" requires="nothing" \
>>   op start interval="0" timeout="300" \
>>   op stop interval="0" timeout="300" \
>>   params vmname=dis03-test pcmk_host_list="dis03-test"
>>   primitive r_stonith-dis04-test stonith:fence_mock \
>>   meta resource-stickiness="INFINITY" target-role="Started" \
>>   op monitor interval="180" timeout="300" requires="nothing" \
>>   op start interval="0" timeout="300" \
>>   op stop interval="0" timeout="300" \
>>   params vmname=dis04-test pcmk_host_list="dis04-test"
>>   location r_stonith-dis03_hates_dis03 r_stonith-dis03-test \
>>   rule $id="r_stonith-dis03_hates_dis03-test_rule" -inf: #uname 
>> eq dis03-test
>>   location r_stonith-dis04_hates_dis04 r_stonith-dis04-test \
>>   rule $id="r_stonith-dis04_hates_dis04-test_rule" -inf: #uname 
>> eq dis04-test
>>   primitive r_drbd_postfix ocf:linbit:drbd \
>>   params drbd_resource="postfix" dr

Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10

2013-08-26 Thread Andreas Mock

Hi Andrew,

as this is a real showstopper at the moment, I invested a few more
hours to make sure (as far as possible) that I had not made an error.

Some additions:
1) I mirrored the whole mini drbd config to another pacemaker cluster.
Same result: pacemaker 1.1.8 works, pacemaker 1.1.10 does not.
2) When I remove the target role Stopped from the drbd ms resource
and insert the config snippet related to the drbd device via crm -f 
to a lean running pacemaker config (pacemaker cluster options, stonith
resources),
it seems to work. That means one of the nodes gets promoted.

Then after stopping 'crm resource stop ms_drbd_xxx' and starting again
I see the same promotion error as described.

The drbd resource agent is using /usr/sbin/crm_master.
Is there a possibility that feedback given through this client tool
is changing the timing behaviour of pacemaker? Or the way
transitions are scheduled?
Any idea that may be related to a change in pacemaker?

Best regards
Andreas Mock


-Original Message-
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Tuesday, 27 August 2013 05:02
To: General Linux-HA mailing list
Cc: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd
agent between pacemaker 1.1.8 and 1.1.10


On 27/08/2013, at 3:31 AM, Andreas Mock  wrote:

> Hi all,
> 
> while the linbit drbd resource agent seems to work perfectly on 
> pacemaker 1.1.8 (standard software repository) we have problems with 
> the last release 1.1.10 and also with the newest head 1.1.11.xxx.
> 
> As using drbd is not so uncommon I really hope to find interested 
> people helping me out. I can provide as much debug information as you 
> want.
> 
> 
> Environment:
> RHEL 6.4 clone (Scientific Linux 6.4) cman based cluster.
> DRBD 8.4.3 compiled from sources.
> 64bit
> 
> - A drbd resource configured following the linbit documentation.
> - Manual start and stop (up/down) and setting primary of drbd resource 
> working smoothly.
> - 2 nodes dis03-test/dis04-test
> 
> 
> 
> - Following simple config on pacemaker 1.1.8
> configure
>     property no-quorum-policy=stop
>     property stonith-enabled=true
>     rsc_defaults resource-stickiness=2
>     primitive r_stonith-dis03-test stonith:fence_mock \
>         meta resource-stickiness="INFINITY" target-role="Started" \
>         op monitor interval="180" timeout="300" requires="nothing" \
>         op start interval="0" timeout="300" \
>         op stop interval="0" timeout="300" \
>         params vmname=dis03-test pcmk_host_list="dis03-test"
>     primitive r_stonith-dis04-test stonith:fence_mock \
>         meta resource-stickiness="INFINITY" target-role="Started" \
>         op monitor interval="180" timeout="300" requires="nothing" \
>         op start interval="0" timeout="300" \
>         op stop interval="0" timeout="300" \
>         params vmname=dis04-test pcmk_host_list="dis04-test"
>     location r_stonith-dis03_hates_dis03 r_stonith-dis03-test \
>         rule $id="r_stonith-dis03_hates_dis03-test_rule" -inf: #uname eq dis03-test
>     location r_stonith-dis04_hates_dis04 r_stonith-dis04-test \
>         rule $id="r_stonith-dis04_hates_dis04-test_rule" -inf: #uname eq dis04-test
>     primitive r_drbd_postfix ocf:linbit:drbd \
>         params drbd_resource="postfix" drbdconf="/usr/local/etc/drbd.conf" \
>         op monitor interval="15s" timeout="60s" role="Master" \
>         op monitor interval="45s" timeout="60s" role="Slave" \
>         op start timeout="240" \
>         op stop timeout="240" \
>         meta target-role="Stopped" migration-threshold="2"
>     ms ms_drbd_postfix r_drbd_postfix \
>         meta master-max="1" master-node-max="1" \
>         clone-max="2" clone-node-max="1" \
>         notify="true" \
>         meta target-role="Stopped"
> commit
> 
> - Pacemaker is started from scratch
> - The config above is applied with crm -f FILE, where FILE contains the
> above config snippet.
> 
> - After that crm_mon shows the following status
> --8<-
> Last updated: Mon Aug 26 18:42:47 2013 Last change: Mon Aug 26 
> 18:42:42 2013 via cibadmin on dis03-test
> Stack: cman
> Current DC: dis03-test - partition with quorum
> Version: 1.1.10-1.el6-9abe687
> 2 Nodes configured
> 4 Resources configured
> 
> 
> Online: [ dis03-test dis04-test ]
> 
> Full list o

Re: [Pacemaker] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10

2013-08-26 Thread Andreas Mock

Hi Matthew,

thank you for that hint. I'll recheck once again.
I'm pretty sure this is not the problem. But who knows... ;-)

Best regards
Andreas Mock

-----Original Message-----
From: Matthew O'Connor [mailto:m...@ecsorl.com]
Sent: Monday, 26 August 2013 21:12
To: The Pacemaker cluster resource manager
Cc: Andreas Mock; 'General Linux-HA mailing list'
Subject: Re: [Pacemaker] Probably a regression of the linbit drbd agent
between pacemaker 1.1.8 and 1.1.10

On 08/26/2013 01:31 PM, Andreas Mock wrote:
> cat /proc/drbd
> version: 8.4.3 (api:1/proto:86-101)
> GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@dis03-test,
> 2013-07-24 17:19:24
>
> on both nodes. The drbd resource was shutdown previously in a clean state,
> so that any node can be the primary.
>

Not sure if this will be helpful or not, but I ran into similar symptoms
when I manually upgraded to DRBD 8.4.3 from 8.3.11; it turned out my
resource agent script for drbd was not up-to-date* and had problems when
starting my drbd resources from a full stop.  I could manually start
them, and even take one node down and bring it back, but if the resource
was completely stopped then neither node was able to start the resource
back up.  Making sure I had the resource agent that ships with 8.4.3
fixed this for me.

* In my case it was an issue with bad install paths, which were my own
doing at the time.

-- Matthew

-- 
Thank you!
  Matthew O'Connor
  (GPG Key ID: 55F981C4)

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

[Pacemaker] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10

2013-08-26 Thread Andreas Mock

-

In the log of the drbd agent I can find the following
when the promoting request is handled on dis03-test

--8<-
++ drbdadm -c /usr/local/etc/drbd.conf primary postfix
0: State change failed: (-2) Need access to UpToDate data
Command 'drbdsetup primary 0' terminated with exit code 17
+ cmd_out=
+ ret=17
+ '[' 17 '!=' 0 ']'
+ ocf_log err 'postfix: Called drbdadm -c /usr/local/etc/drbd.conf primary
postfix'
+ '[' 2 -lt 2 ']'
+ __OCF_PRIO=err
+ shift
--8<-

While this works without problems on pacemaker 1.1.8, it doesn't work here.
The error message leads me to assume that there is a kind of
race condition where pacemaker fires the promotion too early.
Probably it has something to do with applying attributes from the
drbd resource agent.
But this is just a guess and I really don't know.

ONE ADDITIONAL piece of information: As soon as I do a
resource cleanup on the "defective" node, the master
is promoted as expected. That means that
   crm resource cleanup r_drbd_postfix dis03-test
results in the following:

--8<-
Last updated: Mon Aug 26 19:29:38 2013
Last change: Mon Aug 26 19:29:28 2013 via cibadmin on dis04-test
Stack: cman
Current DC: dis03-test - partition with quorum
Version: 1.1.10-1.el6-9abe687
2 Nodes configured
4 Resources configured


Online: [ dis03-test dis04-test ]

Full list of resources:

 r_stonith-dis03-test   (stonith:fence_mock):   Started dis04-test
 r_stonith-dis04-test   (stonith:fence_mock):   Started dis03-test
 Master/Slave Set: ms_drbd_postfix [r_drbd_postfix]
 Masters: [ dis03-test ]
 Slaves: [ dis04-test ]

Migration summary:
* Node dis03-test:
* Node dis04-test:
--8<-



I really hope I can get some attention, as pacemaker 1.1.10
is a milestone for Andrew and drbd from linbit is surely
a building block of many pacemaker-based clusters.

Cluster log of DC dis03-test at http://pastebin.com/2S9Y6V3P
DRBD agent log at http://pastebin.com/ceYNEAhH


So, any help welcome.

Best regards
Andreas Mock




Re: [Pacemaker] why not updated http://clusterlabs.org/rpm-next/ ..... ?

2013-08-22 Thread Andreas Mock

Hi Andrew,

I can only speak for myself: Please, please provide RPM packages
of pacemaker 1.1.10+ suitable for RHEL 6.x.

Is this feasible with not too much effort for you? *

Best regards
Andreas Mock

(* This is the question ;) )

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Friday, 23 August 2013 06:27
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] why not updated http://clusterlabs.org/rpm-next/ ..... ?


On 20/08/2013, at 8:49 PM, Andrey Groshev  wrote:

> Hello Andrew!
> Why not updated http://clusterlabs.org/rpm-next/* ?

No-one asked :)



[Pacemaker] Weird behaviour of crm_resource -N

2013-08-22 Thread Andreas Mock

Hi all,

I just wanted to clean up a failed stonith device with
crm_resource -C -r r_stonith -N node
but I made a mistake and used the wrong node name 'node'.
'node' does not exist in the cluster, but on the command line
I get
8<-
Cleaning up r_stonith on node
Waiting for 1 replies from the CRMd
8<-
until it times out after 60 seconds with
8<-
Cleaning up r_stonith on node
Waiting for 1 replies from the CRMdNo messages received in 60 seconds..
aborting.
8<-

Is there a reason why there is no check against the
cluster membership in advance so that crm_resource
could just say: No such node 'node' in cluster?
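Until such a check exists in crm_resource itself, it could be approximated on the caller's side. The following is only a sketch: is_known_node and safe_cleanup are hypothetical helper names, and it assumes that crm_node -l prints one node per line with the node name as a whitespace-separated field, which may differ between pacemaker versions.

```shell
# Hypothetical helper: succeed only if NAME equals a whole
# whitespace-separated field of the node list read from stdin.
# Caveat: any field counts, so a node literally named like an
# ID or state column value would also match.
is_known_node() {
    awk -v n="$1" '
        { for (i = 1; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }'
}

# Hypothetical wrapper around the cleanup call from the mail above.
# Assumption: "crm_node -l" lists the nodes known to the cluster.
safe_cleanup() {
    rsc=$1
    node=$2
    if crm_node -l | is_known_node "$node"; then
        crm_resource -C -r "$rsc" -N "$node"
    else
        echo "No such node '$node' in cluster" >&2
        return 1
    fi
}
```

With this, safe_cleanup r_stonith node would fail immediately instead of waiting 60 seconds for a reply that cannot arrive.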

(crm_resource from latest git (1.1.10+))

Best regards
Andreas Mock




Re: [Pacemaker] Compiling head of git clone --depth 0 git://github.com/ClusterLabs/pacemaker.git

2013-08-21 Thread Andreas Mock

Thank you!


-----Original Message-----
From: David Vossel [mailto:dvos...@redhat.com]
Sent: Wednesday, 21 August 2013 23:38
To: The Pacemaker cluster resource manager
Cc: pacema...@clusterlabs.org
Subject: Re: [Pacemaker] Compiling head of git clone --depth 0
git://github.com/ClusterLabs/pacemaker.git

- Original Message -
> From: "Andreas Mock" 
> To: "The Pacemaker cluster resource manager" 
> , pacema...@clusterlabs.org
> Sent: Wednesday, August 21, 2013 10:05:38 AM
> Subject: Re: [Pacemaker] Compiling head of git clone --depth  0   
> git://github.com/ClusterLabs/pacemaker.git
> 
> Hi all,
> 
> for the archive:
> It seems that I have found it.
> 
> A simple 'make rpm' does the job.
> 
> I had problems because of a make run with another tag before. So I 
> don't know how to clean up the whole build environment without 
> deleting the whole git repository.

git reset --hard
git clean -f -d -x
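
Put together with the 'make rpm' step quoted below, a clean rebuild could be scripted roughly like this. It is a sketch only: run and clean_rebuild_rpm are hypothetical helper names, and the DRY_RUN switch is an addition for safe previewing, not part of the pacemaker build system.

```shell
# Print each step, and execute it unless DRY_RUN=1 is set.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# Reset the working tree and rebuild the RPMs.
# WARNING: this discards all uncommitted changes and all untracked
# or ignored files in the current git checkout.
clean_rebuild_rpm() {
    run git reset --hard &&
    run git clean -f -d -x &&
    run make rpm
}
```

Calling DRY_RUN=1 clean_rebuild_rpm from the top of the checkout only lists the three commands; without DRY_RUN they are executed in order and the sequence stops at the first failure.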

> 
> Best regards
> Andreas Mock
> 
> 
> -----Original Message-----
> From: Andreas Mock [mailto:andreas.m...@web.de]
> Sent: Wednesday, 21 August 2013 16:34
> To: pacema...@clusterlabs.org
> Subject: [Pacemaker] Compiling head of git clone --depth 0
> git://github.com/ClusterLabs/pacemaker.git
> 
> Hi all,
> 
> can someone tell me how I can compile and build a rpm from the current 
> head of the git repository as git clone --depth 0 
> git://github.com/ClusterLabs/pacemaker.git ?
> 
> Best regards
> Andreas Mock

Re: [Pacemaker] Compiling head of git clone --depth 0 git://github.com/ClusterLabs/pacemaker.git

2013-08-21 Thread Andreas Mock

Hi all,

for the archive:
It seems that I have found it.

A simple 'make rpm' does the job.

I had problems because of a make run with
another tag before. So I don't know how
to clean up the whole build environment without
deleting the whole git repository.

Best regards
Andreas Mock


-----Original Message-----
From: Andreas Mock [mailto:andreas.m...@web.de]
Sent: Wednesday, 21 August 2013 16:34
To: pacema...@clusterlabs.org
Subject: [Pacemaker] Compiling head of git clone --depth 0
git://github.com/ClusterLabs/pacemaker.git

Hi all,

can someone tell me how I can compile and build a rpm
from the current head of the git repository as
git clone --depth 0 git://github.com/ClusterLabs/pacemaker.git ?

Best regards
Andreas Mock





[Pacemaker] Compiling head of git clone --depth 0 git://github.com/ClusterLabs/pacemaker.git

2013-08-21 Thread Andreas Mock

Hi all,

can someone tell me how I can compile and build a rpm
from the current head of the git repository as
git clone --depth 0 git://github.com/ClusterLabs/pacemaker.git ?

Best regards
Andreas Mock





Re: [Pacemaker] 2 node Cluster

2013-08-13 Thread Andreas Mock

Hi Christian,

please show us how all these services should depend on each other.
The dependency graph shown in the pacemaker docs is a nice way
to do it. Then people here can give advice.

Otherwise we can only guess what you want to achieve.

Best regards
Andreas Mock


From: Christian Gebler [mailto:geblerchrist...@googlemail.com]
Sent: Tuesday, 13 August 2013 11:57
To: The Pacemaker cluster resource manager
Subject: [Pacemaker] 2 node Cluster

 

Hi,

I am trying to set up a 2 node Pacemaker-Cluster with a few services (drbd,
psql, ip, tomcat, nginx).
All these services should run on one node, all the time, if one service is
down, everything must migrate to the other node.

So I created one colocation and one order, that works fine and all services
run and migrate as expected. 
But I have one problem...if I stop my Tomcat or Nginx (on the CRM CLI), the
database and the ip goes down too, but that should not happen. I have no
idea how to fix this problem, so I hope you can help me. Or is the only
solution to unmanage the Service at first? 

Thx!

Chris


Here is my Config:

http://goo.gl/FkeqlH


Re: [Pacemaker] New action for resource running in multiple nodes

2013-08-12 Thread Andreas Mock

Hi Adrián,

IMHO the effort would focus on the wrong issue.
Make your cluster network reliable. Besides the nodes, it is THE
building block of a cluster.
- Additional network cards
- Different vendors
- Bonding
- Different paths through switches

On a two-node cluster without the option to increase the number of
nodes, I almost always use a crossover cable for one of the interconnects.

Best regards
Andreas Mock

P.S. Your story sounds to me as if you also don't have stonith
enabled. That is another building block, IMHO.


From: Adrián López Tejedor [mailto:adrian...@gmail.com]
Sent: Monday, 12 August 2013 16:26
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] New action for resource running in multiple nodes

 

Hi!

 

In the environment where we use corosync/pacemaker, we have recently been
having some problems with the network used to maintain the cluster. These
short interruptions cause the passive node (we have a two-node active-passive
configuration with apache tomcat) to think it is alone, and start another
instance of tomcat.

A few seconds later, the cluster reconnects, and the resource is found active
on both nodes. The default behaviour (as seen in
http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-resource-options.html)
is to stop both, and start one of them.

 

For us, this implies that service is down everytime a short interruption in
the network occurs.

 

Maybe a new option for "multiple-active" like "stop_old" and/or "stop_new"
could be useful, stopping only the newest instance of the resource.

 

Thanks!

Adrián


Re: [Pacemaker] Problems with SBD fencing

2013-08-06 Thread Andreas Mock

Hi Dejan,

can you explain how the SBD agent works when this resource
is running on exactly the node that has to be stonithed?

Thank you in advance.

Best regards
Andreas Mock


-----Original Message-----
From: Dejan Muhamedagic [mailto:deja...@fastmail.fm]
Sent: Tuesday, 6 August 2013 11:15
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Problems with SBD fencing

Hi,

On Thu, Aug 01, 2013 at 07:58:55PM +0200, Jan Christian Kaldestad wrote:
> Thanks for the explanation. But I'm quite confused about the SBD stonith
> resource configuration, as the SBD fencing wiki clearly states:
> "The sbd agent does not need to and should not be cloned. If all of your
> nodes run SBD, as is most likely, not even a monitor action provides a real
> benefit, since the daemon would suicide the node if there was a problem. "
>
> and also this thread
> http://oss.clusterlabs.org/pipermail/pacemaker/2012-March/013507.html
> mentions that there should be only one SBD resource configured.
>
> Can someone please clarify? Should I configure 2 separate SBD resources,
> one for each cluster node?

No. One sbd resource is sufficient.

Thanks,

Dejan

> 
> -- 
> Best regards
> Jan
> 
> 
> On Thu, Aug 1, 2013 at 6:47 PM, Andreas Mock  wrote:
> 
> > Hi Jan,
> >
> > first of all, I don't know the SBD fencing infrastructure (I just read the
> > article you linked). But as far as I understand it, "normal" fencing
> > (initiated on behalf of pacemaker) is done in the following way.
> >
> > The SBD fencing resource (agent) writes a request for self-stonithing into
> > one or more SBD partitions, where the SBD daemon is listening and hopefully
> > reacts to it.
> > So I'm pretty sure (without knowing) that you have to configure the
> > stonith agent in a way that pacemaker knows how to talk to the stonith agent
> > to kill a certain cluster node.
> > The problem in your scenario: the agent which should be contacted
> > to stonith node2 is/was running on node2 and can't be reached anymore.
> >
> > Because of that, stonith agent configuration in a two-node cluster is
> > most often done the following way:
> > A stonith agent runs on every node. Each stonith agent is configured to
> > stonith the OTHER node. You have to be sure that this is technically
> > always possible.
> > This can be achieved with resource clones or, which is IMHO simpler in
> > a two-node environment, with two stonith resources and a negative colocation
> > constraint.
> >
> > As far as I know, there is also a self-stonith safety belt implemented,
> > so that a stonith agent on a node to be shot is never contacted.
> > (Do I remember correctly?)
> >
> > I'm sure this may solve your problem.
> >
> > Best regards
> > Andreas Mock
> >
> >
> > From: Jan Christian Kaldestad [mailto:janc...@gmail.com]
> > Sent: Thursday, 1 August 2013 15:46
> > To: pacemaker@oss.clusterlabs.org
> > Subject: [Pacemaker] Problems with SBD fencing
> >
> > Hi,
> >
> >
> > I am evaluating the SLES HA Extension 11 SP3 product. The cluster
> > consists of 2-nodes (active/passive), using SBD stonith resource on a
> > shared SAN disk. Configuration according to
> > http://www.linux-ha.org/wiki/SBD_Fencing
> >
> > The SBD daemon is running on both nodes, and the stonith resource (defined
> > as primitive) is running on one node only.
> > There is also a monitor operation for the stonith resource
> > (interval=36000, timeout=20)
> >
> > I am having some problems getting failover/fencing to work as expected in
> > the following scenario:
> > - Node 1 is running the resources that I created (except stonith)
> > - Node 2 is running the stonith resource
> > - Disconnect Node 2 from the network by bringing the interface down
> > - Node 2 status changes to UNCLEAN (offline), but the stonith resource
> > does not switch over to Node 1 and Node 2 does not reb

Re: [Pacemaker] Problems with SBD fencing

2013-08-01 Thread Andreas Mock

Hi Jan,

first of all, I don't know the SBD fencing infrastructure (I just read the
article you linked). But as far as I understand it, "normal" fencing
(initiated on behalf of pacemaker) is done in the following way.

The SBD fencing resource (agent) writes a request for self-stonithing into
one or more SBD partitions, where the SBD daemon is listening and hopefully
reacts to it.
So I'm pretty sure (without knowing) that you have to configure the
stonith agent in a way that pacemaker knows how to talk to the stonith agent
to kill a certain cluster node.
The problem in your scenario: the agent which should be contacted
to stonith node2 is/was running on node2 and can't be reached anymore.

Because of that, stonith agent configuration in a two-node cluster is
most often done the following way:
A stonith agent runs on every node. Each stonith agent is configured to
stonith the OTHER node. You have to be sure that this is technically
always possible.
This can be achieved with resource clones or, which is IMHO simpler in
a two-node environment, with two stonith resources and a negative colocation
constraint.
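
The generic per-node layout described above (not the single-resource SBD setup) might be written down roughly as follows. This is a sketch under assumptions: stonith_pair_config is a hypothetical helper, fence_example and its target parameter are placeholders for whatever fence agent is really in use, and the "negative constraint" is expressed here as a -inf location constraint per resource.

```shell
# Print a crm configuration with one fencing resource per node, each
# forbidden from running on the node it is meant to shoot.
# "fence_example" and "target" are placeholders, not a real agent.
stonith_pair_config() {
    for victim in "$1" "$2"; do
        cat <<EOF
primitive r_stonith-$victim stonith:fence_example \\
    params target="$victim" pcmk_host_list="$victim" \\
    op monitor interval="180" timeout="300"
location l_stonith-${victim}_not_on_$victim r_stonith-$victim -inf: $victim
EOF
    done
}
```

The output of stonith_pair_config node1 node2 could be saved to a file, reviewed, and then loaded with crm once the placeholder agent has been replaced.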

 

As far as I know, there is also a self-stonith safety belt implemented,
so that a stonith agent on a node to be shot is never contacted.
(Do I remember correctly?)

I'm sure this may solve your problem.

Best regards
Andreas Mock


From: Jan Christian Kaldestad [mailto:janc...@gmail.com]
Sent: Thursday, 1 August 2013 15:46
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] Problems with SBD fencing

 

Hi,


I am evaluating the SLES HA Extension 11 SP3 product. The cluster  consists
of 2-nodes (active/passive), using SBD stonith resource on a shared SAN
disk. Configuration according to http://www.linux-ha.org/wiki/SBD_Fencing

The SBD daemon is running on both nodes, and the stonith resource (defined
as primitive) is running on one node only.
There is also a monitor operation for the stonith resource (interval=36000,
timeout=20)

I am having some problems getting failover/fencing to work as expected in
the following scenario:
- Node 1 is running the resources that I created (except stonith)
- Node 2 is running the stonith resource
- Disconnect Node 2 from the network by bringing the interface down
- Node 2 status changes to UNCLEAN (offline), but the stonith resource does
not switch over to Node 1 and Node 2 does not reboot as I would expect.
- Checking the logs on Node 1, I notice the following:
 Aug  1 12:00:01 slesha1n1i-u pengine[8915]:  warning: pe_fence_node: Node
slesha1n2i-u will be fenced because the node is no longer part of the
cluster
 Aug  1 12:00:01 slesha1n1i-u pengine[8915]:  warning:
determine_online_status: Node slesha1n2i-u is unclean
 Aug  1 12:00:01 slesha1n1i-u pengine[8915]:  warning: custom_action: Action
stonith_sbd_stop_0 on slesha1n2i-u is unrunnable (offline)
 Aug  1 12:00:01 slesha1n1i-u pengine[8915]:  warning: stage6: Scheduling
Node slesha1n2i-u for STONITH
 Aug  1 12:00:01 slesha1n1i-u pengine[8915]:   notice: LogActions: Move
stonith_sbd   (Started slesha1n2i-u -> slesha1n1i-u)
 ...
 Aug  1 12:00:01 slesha1n1i-u crmd[8916]:   notice: te_fence_node: Executing
reboot fencing operation (24) on slesha1n2i-u (timeout=6)
 Aug  1 12:00:01 slesha1n1i-u stonith-ng[8912]:   notice: handle_request:
Client crmd.8916.3144546f wants to fence (reboot) 'slesha1n2i-u' with device
'(any)'
 Aug  1 12:00:01 slesha1n1i-u stonith-ng[8912]:   notice:
initiate_remote_stonith_op: Initiating remote operation reboot for
slesha1n2i-u: 8c00ff7b-2986-4b2a-8b4a-760e8346349b (0)
 Aug  1 12:00:01 slesha1n1i-u stonith-ng[8912]:error: remote_op_done:
Operation reboot of slesha1n2i-u by slesha1n1i-u for
crmd.8916@slesha1n1i-u.8c00ff7b: No route to host
 Aug  1 12:00:01 slesha1n1i-u crmd[8916]:   notice:
tengine_stonith_callback: Stonith operation
3/24:3:0:8a0f32b2-f91c-4cdf-9cee-1ba9b6e187ab: No route to host (-113)
 Aug  1 12:00:01 slesha1n1i-u crmd[8916]:   notice:
tengine_stonith_callback: Stonith operation 3 for slesha1n2i-u failed (No
route to host): aborting transition.
 Aug  1 12:00:01 slesha1n1i-u crmd[8916]:   notice: tengine_stonith_notify:
Peer slesha1n2i-u was not terminated (st_notify_fence) by slesha1n1i-u for
slesha1n1i-u: No route to host (ref=8c00ff7b-2986-4b2a-8b4a-760e8346349b) by
client crmd.8916
 Aug  1 12:00:01 slesha1n1i-u crmd[8916]:   notice: run_graph: Transition 3
(Complete=1, Pending=0, Fired=0, Skipped=5, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-15.bz2): Stopped
 Aug  1 12:00:01 slesha1n1i-u pengine[8915]:   notice: unpack_config: On
loss of CCM Quorum: Ignore
 Aug  1 12:00:01 slesha1n1i-u pengine[8915]:  warning: pe_fence_node: Node
slesha1n2i-u will be fenced because the node is no longer part of the
cluster
 Aug  1 12:00:01 slesha1n1i-u pengine[8915]:  warning:
determine_online_status: Node slesha1n2i-u is unclean
 Aug  1 12:00:0

Re: [Pacemaker] order required if group is present?

2013-07-25 Thread Andreas Mock

Hi Stefan,

a) Yes, the ordered behaviour is intentional.
b) In former versions you could change this behaviour with an attribute,
but this attribute is deprecated in newer versions of pacemaker.
c) The solution for starting resources in parallel is resource sets.
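
To illustrate c), an ordering with a resource set might look roughly like the sketch below, using the resource names from the question. The assumptions are loud ones: parallel_order_config is a hypothetical helper, the choice of p_vlan118 and p_vlan119 as the parallel pair is invented for the example, and the parenthesised set syntax is taken from the crm shell and may differ between crmsh versions. An order constraint alone does not keep the resources on the same node, so colocation would still have to be configured separately.

```shell
# Print a crm order constraint in which the two VLAN resources form an
# unordered set: both start after p_bond1 and before p_openvpn, but
# with no ordering between the two of them.
parallel_order_config() {
    cat <<'EOF'
order o_cluster1 inf: p_bond1 ( p_vlan118 p_vlan119 ) p_openvpn p_conntrackd
EOF
}
```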

 

Best regards
Andreas Mock

P.S.: Always state which versions of the cluster stack components
you are using. Behaviour has changed over time.


From: Bauer, Stefan (IZLBW Extern) [mailto:stefan.ba...@iz.bwl.de]
Sent: Thursday, 25 July 2013 12:53
To: Pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] order required if group is present?

 

Hi List,

 

i have 5 resources configured (p_bond1, p_conntrackd, p_vlan118,p_vlan119,
p_openvpn)

 

additionally I have put all of them in a group with:

 

group cluster1 p_bond1,p_vlan118,p_vlan119,p_openvpn,p_conntrackd

 

With this, crm starts the resources in the order in which the group is
defined (p_bond1, p_vlan118 and so on).

Is this expected behavior? If so, does the group already provide what
`order` was made for?

 

Thanks in advance

 

Stefan

 


Re: [Pacemaker] Simulating that a node is down.

2013-07-12 Thread Andreas Mock

Hi Jacobo,

1) corosync communicates through 2 ports; don't forget the second one.
2) IMHO, when you block both ports, it's like a classical split brain.
I've done this to test split-brain and, hopefully, fencing behaviour.
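
Blocking both ports could be sketched as below. The helper corosync_block_rules is hypothetical and only prints the iptables commands so they can be reviewed first. It assumes the usual corosync convention that the configured mcastport and mcastport - 1 are both used over UDP, with 5405 as the default; the real value should be checked in corosync.conf.

```shell
# Print iptables commands that drop corosync traffic in both directions
# on both UDP ports. Nothing is executed here.
# $1: mcastport from corosync.conf (assumed default: 5405)
corosync_block_rules() {
    mcastport=${1:-5405}
    for port in "$((mcastport - 1))" "$mcastport"; do
        for chain in INPUT OUTPUT; do
            echo "iptables -I $chain -p udp --dport $port -j DROP"
        done
    done
}
```

Piping the output to sh as root would apply the rules; the same lines with -D instead of -I remove them again after the test.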

Best regards
Andreas Mock


From: Jacobo García [mailto:jacobo.gar...@gmail.com]
Sent: Friday, 12 July 2013 11:04
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Simulating that a node is down.

 

Thanks Andreas for your kind answer, I'll add this to my test battery.

 

Also, my other question: is it a good idea to close the corosync port? Should
corosync behave in an expected way? I am getting odd behaviors with this one,
but I'm not sure where to put the blame.

 

Thanks in advance.




Jacobo García López de Araujo

http://thebourbaki.com | http://twitter.com/clapkent 

 

On Thu, Jul 11, 2013 at 8:39 PM, Andreas Mock  wrote:

Hi Jacobo,

one very interesting thing is missing.
Overload the node. Write a program/script which generates
many I/O operations and many flushes while requesting
more and more memory from the OS until swapping begins.
Ohhh, yes, swapping and I/O are nice…

…then you can put your monitor and stop action timeouts to the test…  ;-)

Best regards
Andreas Mock


From: Jacobo García [mailto:jacobo.gar...@gmail.com]
Sent: Thursday, 11 July 2013 19:14
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] Simulating that a node is down.

 

Hello,

 

I am looking for different ways of testing that a node is down. I am finding a 
strange behavior with one of them (closing with IPtables the UDP communication 
port). I would like to know if closing the port is a recommended way of 
achieving my testing purposes. 

 

Also I would like to know other ways of testing apart from the ones compiled in 
the list below:

 

1. Stopping corosync.

2. Shutting down the node.

3. Shutting down the eth0 interface.

4. Killing corosync process.

5. Closing the corosync communication port.

Thanks,




Jacobo García López de Araujo

 



Re: [Pacemaker] Simulating that a node is down.

2013-07-11 Thread Andreas Mock

Hi Jacobo,

one very interesting thing is missing.
Overload the node. Write a program/script which generates
many I/O operations and many flushes while requesting
more and more memory from the OS until swapping begins.
Ohhh, yes, swapping and I/O are nice…

…then you can put your monitor and stop action timeouts to the test…  ;-)

Best regards
Andreas Mock


From: Jacobo García [mailto:jacobo.gar...@gmail.com]
Sent: Thursday, 11 July 2013 19:14
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] Simulating that a node is down.

 

Hello,

 

I am looking for different ways of testing that a node is down. I am finding a 
strange behavior with one of them (closing with IPtables the UDP communication 
port). I would like to know if closing the port is a recommended way of 
achieving my testing purposes. 

 

Also I would like to know other ways of testing apart from the ones compiled in 
the list below:

 

1. Stopping corosync.

2. Shutting down the node.

3. Shutting down the eth0 interface.

4. Killing corosync process.

5. Closing the corosync communication port.

Thanks,




Jacobo García López de Araujo

 


Re: [Pacemaker] cib_process_diff: Failed application of an update diff

2013-07-10 Thread Andreas Mock

Hi Johan,

I often forget it myself, but even more so than years ago
you have to give detailed information about your stack:
- OS
- corosync version (or heartbeat)
- pacemaker version
- agent version
- etc.

Best regards
Andreas


-----Original Message-----
From: Johan Huysmans [mailto:johan.huysm...@inuits.be] 
Sent: Wednesday, 10 July 2013 15:17
To: The Pacemaker cluster resource manager
Subject: [Pacemaker] cib_process_diff: Failed application of an update diff

Hi All,

Every time a resource fails or recovers or any other action is performed 
I see the following messages in my log.
What could be the cause of this problem, and how can I see more information 
about this message (view the patch/diff which is failing)?



  stonith-ng[25994]:  warning: cib_process_diff: Diff 0.90.29 -> 0.90.30 from local not applied to 0.90.29: Failed application of an update diff
  stonith-ng[25994]:   notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)


thx!
Johan Huysmans


Re: [Pacemaker] Question concerning pacemaker-1-1-10-rc6

2013-07-08 Thread Andreas Mock

Hi Andrew,

would it be much work to also provide the release candidates as
complete packages on the mentioned site?
Or is that against some policy?

Anyway: The last version pacemaker-1.1.10-3.1736.37b9108.git.el6.x86_64.rpm
seems to work pretty well.

Best regards
Andreas



-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net] 
Sent: Tuesday, 9 July 2013 04:09
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Question concerning pacemaker-1-1-10-rc6


On 08/07/2013, at 5:57 PM, Andreas Mock  wrote:

> Hi Andrew,
> 
> I'm taking the builds from
> http://clusterlabs.org/rpm-test-next/rhel-6/x86_64/
> to avoid compiling on my own.
> Do these builds relate to the release candidates you're announcing?

Not at all, those are whatever I happen to be testing at the time.
They do include the git hash in the rpm names though.

> If yes, could you also announce the version strings?
> 
> Best regards
> Andreas Mock
> 
> 
> 
> 

Re: [Pacemaker] Best way to notify stonith action

2013-07-08 Thread Andreas Mock

Hi all,

thank you for your recommendations.
I had just hoped that there was something internal to pacemaker,
e.g. sending traps via SNMP or something like that.

Best regards
Andreas Mock

-----Original Message-----
From: Digimer [mailto:li...@alteeve.ca] 
Sent: Monday, 8 July 2013 16:01
To: The Pacemaker cluster resource manager
Cc: Andreas Mock
Subject: Re: [Pacemaker] Best way to notify stonith action

On 08/07/13 03:48, Andreas Mock wrote:
> Hi all,
>
> I'm just wondering what the best way is to
> let an admin know that the cluster (rest of
> a cluster) has stonithed some other nodes?
>
> What is the recommended way?
> (The fact that the machine rebooted or is
> halted is not the problem. I want to know
> that stonithing was done)
>
> Best regards
> Andreas Mock

Personally, I have a little monitoring script I wrote that watches the 
cluster resources, local hardware (via the IPMI BMC), UPSes and 
what-not. It loops every 30 seconds and sends an email if/when anything 
of note changes. A node being fenced certainly raises a flag and emails 
go out.

My script is principally for cman + rgmanager, but it should be easy to 
craft your own, too. I just read in the current state of things, compare 
against the values in the last scan, decide whether to send an email or 
not, copy the just-read values over to the last-scan values and delete 
the "new" values and go back to sleep for 30 seconds.
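
The loop described above can be sketched in Python roughly as follows. This is not the actual script: read_state and notify are hypothetical callables (for example, something that collects node and resource states into a dict, and something that sends the email), and the rounds and sleep parameters exist only so the loop can be exercised without waiting.

```python
import time


def changed_items(last_scan, current):
    """Return {name: (old, new)} for every value that differs between two scans."""
    names = set(last_scan) | set(current)
    return {
        name: (last_scan.get(name), current.get(name))
        for name in sorted(names)
        if last_scan.get(name) != current.get(name)
    }


def monitor(read_state, notify, interval=30, rounds=None, sleep=time.sleep):
    """Compare each scan with the previous one and notify on any change."""
    last_scan = read_state()
    done = 0
    while rounds is None or done < rounds:
        sleep(interval)                      # go back to sleep for 30 seconds
        current = read_state()               # read in the current state of things
        changes = changed_items(last_scan, current)
        if changes:
            notify(changes)                  # e.g. send an email
        last_scan = current                  # just-read values become the last scan
        done += 1
```

With rounds left at None the loop runs forever, which is what a monitoring daemon would want; a node going from "online" to "fenced" between two scans would show up as one entry in the dict passed to notify.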

hth

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


[Pacemaker] Question concerning pacemaker-1-1-10-rc6

2013-07-08 Thread Andreas Mock

Hi Andrew,

I'm taking the builds from 
http://clusterlabs.org/rpm-test-next/rhel-6/x86_64/
to avoid compiling on my own.
Do these builds relate to the release candidates you're announcing?
If yes, could you also announce the version strings?

Best regards
Andreas Mock





[Pacemaker] Best way to notify stonith action

2013-07-08 Thread Andreas Mock

Hi all,

I'm just wondering what the best way is to
let an admin know that the cluster (rest of
a cluster) has stonithed some other nodes?

What is the recommended way?
(The fact that the machine rebooted or is
halted is not the problem. I want to know
that stonithing was done)

Best regards
Andreas Mock




Re: [Pacemaker] Full API description for Fence Agent

2013-07-08 Thread Andreas Mock

Hi all,

after doing a bigger debugging session and reading the
documentation more than once, I got the fence agent to
work. In this case, the fence agent is a program which
can be used by cman/fenced (RHEL cluster) and by
pacemaker running in this environment as stonith device.

For completeness:
https://fedorahosted.org/cluster/wiki/FenceAgentAPI

My findings on getting it to work:
a) Additionally (this was also said somewhere else before),
the agent needs to implement the 'metadata' call, which
prints an XML document on STDOUT. I adapted mine from
other scripted fence agents, as I couldn't find a spec for
that XML. I also let this call return 0.

b) Contrary to the spec above, stonith_ng seems to
send the parameters 'nodename' and 'port'. As my
stonith agent doesn't need them, I threw an exception
when parsing these parameters, which led to an error in the
logs. => Now these parameters are accepted as valid even
when not used.
Someone should clarify how to react to these parameters
correctly when they are not used. See c)

c) One thing I missed when configuring the stonith
agent in pacemaker was the parameter 'pcmk_host_list'.
See
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_configuring_stonith.html
for details. Without it, pacemaker couldn't know
how to fence the node. This also showed up when
issuing the command 'stonith_admin -l ',
which had puzzled me before I solved the problem.
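
Findings a) and b) can be sketched in Python as follows. This is an illustration only, not my actual agent: it assumes the options arrive as 'key=value' lines on STDIN as described in the FenceAgentAPI document, the option names in USED are hypothetical, and the metadata string is a placeholder that would have to be replaced by XML adapted from an existing agent.

```python
import sys

# Options this hypothetical agent really uses.
USED = {"action", "ipaddr", "login", "passwd"}
# Sent by stonith_ng according to finding b); accepted but ignored.
TOLERATED = {"nodename", "port"}

# Placeholder only: the real XML should be adapted from the output of an
# existing fence agent called with '-o metadata'.
METADATA = "<!-- metadata XML adapted from an existing fence agent -->"


def parse_options(lines):
    """Parse 'key=value' lines; tolerate, but drop, unused known parameters."""
    opts = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or key not in USED | TOLERATED:
            raise ValueError("unknown option: %s" % line)
        if key in USED:
            opts[key] = value.strip()
    return opts


def run(lines, out=sys.stdout, err=sys.stderr):
    """Return the exit code for one invocation of the agent."""
    try:
        opts = parse_options(lines)
    except ValueError as exc:
        print(exc, file=err)           # errors go to STDERR
        return 1
    if opts.get("action") == "metadata":
        print(METADATA, file=out)      # everything else goes to STDOUT
        return 0                       # finding a): metadata returns 0
    print("action %r is not implemented in this sketch" % opts.get("action"), file=err)
    return 1
```

A real agent would end with something like sys.exit(run(sys.stdin)) and implement the remaining actions (on, off, reboot, status, ...) in place of the final error branch.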

Additions and corrections welcome for all
fence agent programmers.

Best regards
Andreas Mock

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net] 
Sent: Monday, 8 July 2013 05:27
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Full API description for Fence Agent

On 04/07/2013, at 9:52 PM, Andreas Mock  wrote:

> Hi Andrew,
> 
> is there some kind of agreement how to tag a message?
> Like (DEBUG/TRACE/ERROR/WARN)?

No.  But pacemaker obeys the general convention of "errors to stderr,
everything else to stdout".

> Is there a way message level filtering is done?

There is no filtering.

> 
> Best regards
> Andreas
> 
> 
> -----Original Message-----
> From: Andrew Beekhof [mailto:and...@beekhof.net] 
> Sent: Thursday, 4 July 2013 13:41
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Full API description for Fence Agent
> 
> 
> On 04/07/2013, at 7:24 PM, Andreas Mock  wrote:
> 
>> Hi digimer,
>> 
>> I would like to take you up on your offer and ask the following:
>> 
>> The API document says nothing about the correct way
>> of giving messages back to the stonith daemon.
>> So, what is the right way to write error/warn/info messages?
>> 
>> Looking at the scripted agents available I can find a nice
>> mixture of using STDERR and STDOUT.
>> What is the rule here?
>> Can you give insights into whether STDOUT/STDERR is captured by
>> the calling program and logged somewhere (and where)?
> 
> In the case of pacemaker, we capture and log both.
> 
>> 
>> By the way: How is it going with merging the stonith/fencing API? ;-)
>> 
>> Best regards
>> Andreas
>> 
>> -----Original Message-----
>> From: Digimer [mailto:li...@alteeve.ca] 
>> Sent: Tuesday, 11 June 2013 15:34
>> To: The Pacemaker cluster resource manager
>> Cc: Andreas Mock
>> Subject: Re: [Pacemaker] Full API description for Fence Agent
>> 
>> Hi Andreas,
>> 
>>  The metadata section of the document has not been added yet, but we 
>> are aware of it missing and are working to add it. The rest of the 
>> document is accurate though. If you build an agent to follow that API, 
>> it will work with red hat's cluster and pacemaker.
>> 
>>  In the meantime, it's not ideal, but if you call any other fence 
>> agent and pass '-o metadata', you will see the output that the cluster 
>> expects. It should be easy to adapt to your new agent.
>> 
>>  If you have any trouble, please don't hesitate to ask here and we 
>> will do our best to help.
>> 
>> digimer
>> 
>> On 06/11/2013 07:04 AM, Andreas Mock wrote:
>>> Hi all,
>>> 
>>> we need to implement a fence_agent (stonith agent) for
>>> cman/corosync/pacemaker (RHEL 6.x). I found the following documentation
>>> https://fedorahosted.org/cluster/wiki/FenceAgentAPI
>>> 
>>> But in this document the required metadata action is not
>>> described. Can anybody point me to a documentation which
>>> is complete?
>>> 
>>> Where is the schema of the xml returned by 'metadata'?
>>> 
>

Re: [Pacemaker] Another question about fencing/stonithing

2013-07-05 Thread Andreas Mock

Thank you for your hint.

There is a German saying which I'll try to translate:
"You can't see the forest for the trees."

So, I'll see.

Best regards
Andreas Mock


-----Original Message-----
From: Digimer [mailto:li...@alteeve.ca] 
Sent: Friday, 5 July 2013 17:22
To: Andreas Mock
Cc: 'The Pacemaker cluster resource manager'; 'Marek Grac'
Subject: Re: AW: [Pacemaker] Another question about fencing/stonithing

Andrew might know the trick. In theory, putting your agent into the 
/usr/sbin or /sbin directory (wherever the other agents are) should 
"just work". You're sure the exit codes are appropriate? I am sure they 
are, but just thinking out loud about too-obvious-to-see possible issues.

On 05/07/13 11:17, Andreas Mock wrote:
> Hi Digimer,
>
> sorry, I forgot to mention that I implemented the metadata call
> accordingly. But it may be the "registration" thing which
> is necessary to make it known to the stonith/fencing daemon.
>
> I don't know. I'm a little surprised that there is no
> pointer on how to do it.
>
> Thank you for your answer!
>
> Best regards
> Andreas Mock
>
>
> -----Original Message-----
> From: Digimer [mailto:li...@alteeve.ca]
> Sent: Friday, 5 July 2013 16:52
> To: The Pacemaker cluster resource manager
> Cc: Andreas Mock; Marek Grac
> Subject: Re: [Pacemaker] Another question about fencing/stonithing
>
> On 05/07/13 03:34, Andreas Mock wrote:
>> Hi all,
>>
>> I just wrote a stonith agent which IMHO implements the
>> API spec found at https://fedorahosted.org/cluster/wiki/FenceAgentAPI.
>>
>> But it seems it has a problem when used as pacemaker stonith device.
>>
>> What has to be done to have a stonith/fencing agent which implements
>> both roles? I'm pretty sure something is missing.
>> It's just a guess that it has something to do with listing "registered"
>> agents.
>>
>> What is a registered stonith agent and what is done while registering it?
>>
>> When I configure my own fencing agent as a pacemaker stonith device
>> and try to do a "stonith_admin --list=nodename" I get a "no such device"
>> error.
>>
>> Any pointer appreciated.
>>
>> Best regards
>> Andreas Mock
>
> The API doesn't (yet) cover the metadata action. The agents now have to
> print out XML validation of valid attributes and elements for your
> agent. If you call any existing fence_* agent with just -o metadata, you
> will see the format.
>
> I know rhcs can be forced to see the new agent by putting it in the same
> directory as the other agents and then running 'ccs_update_schema'. If
> pacemaker doesn't immediately see it, then there might be an equivalent
> command you can run.
>
> I will try to get the API updated. I'm not a cardinal source, but
> something is better than nothing. Marek (who I have cc'ed) is, so I can
> run the changes by him when done to ensure they're accurate.
>


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?



Re: [Pacemaker] Another question about fencing/stonithing

2013-07-05 Thread Andreas Mock

Hi Digimer,

sorry, I forgot to mention that I implemented the metadata call
accordingly. But it may be the "registration" thing which
is necessary to make it known to the stonith/fencing daemon.

I don't know. I'm a little surprised that there is no
pointer on how to do it.

Thank you for your answer!

Best regards
Andreas Mock

-----Original Message-----
From: Digimer [mailto:li...@alteeve.ca] 
Sent: Friday, 5 July 2013 16:52
To: The Pacemaker cluster resource manager
Cc: Andreas Mock; Marek Grac
Subject: Re: [Pacemaker] Another question about fencing/stonithing

On 05/07/13 03:34, Andreas Mock wrote:
> Hi all,
>
> I just wrote a stonith agent which IMHO implements the
> API spec found at https://fedorahosted.org/cluster/wiki/FenceAgentAPI.
>
> But it seems it has a problem when used as pacemaker stonith device.
>
> What has to be done to have a stonith/fencing agent which implements
> both roles? I'm pretty sure something is missing.
> It's just a guess that it has something to do with listing "registered"
> agents.
>
> What is a registered stonith agent and what is done while registering it?
>
> When I configure my own fencing agent as a pacemaker stonith device
> and try to do a "stonith_admin --list=nodename" I get a "no such device"
> error.
>
> Any pointer appreciated.
>
> Best regards
> Andreas Mock

The API doesn't (yet) cover the metadata action. The agents now have to 
print out XML validation of valid attributes and elements for your 
agent. If you call any existing fence_* agent with just -o metadata, you 
will see the format.

I know rhcs can be forced to see the new agent by putting it in the same 
directory as the other agents and then running 'ccs_update_schema'. If 
pacemaker doesn't immediately see it, then there might be an equivalent 
command you can run.

I will try to get the API updated. I'm not a cardinal source, but 
something is better than nothing. Marek (who I have cc'ed) is, so I can 
run the changes by him when done to ensure they're accurate.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


[Pacemaker] Another question about fencing/stonithing

2013-07-05 Thread Andreas Mock

Hi all,

I just wrote a stonith agent which IMHO implements the
API spec found at https://fedorahosted.org/cluster/wiki/FenceAgentAPI.

But it seems it has a problem when used as pacemaker stonith device.

What has to be done to have a stonith/fencing agent which implements
both roles? I'm pretty sure something is missing.
It's just a guess that it has something to do with listing "registered"
agents.

What is a registered stonith agent and what is done while registering it?

When I configure my own fencing agent as a pacemaker stonith device
and try to do a "stonith_admin --list=nodename" I get a "no such device"
error.

Any pointer appreciated.

Best regards
Andreas Mock




Re: [Pacemaker] Full API description for Fence Agent

2013-07-04 Thread Andreas Mock

Hi Digimer, hi all,

there is a little thing in the API doc which is also unclear to me.
It says:

"[...]
status - this is not implemented by most agents nor used by fenced at this
time. Return values:

0 if the fence device is reachable and the port is in the on state
1 if the fence device could not be contacted
2 if the fence device is reachable but is in the off state
[...]"

What is meant by return code 2? Does it mean that I could contact the fence
device and it says that the PORT is in the off state?
How should I understand the state "fence device in off state"?
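
One possible reading, sketched in Python: return code 2 refers to the port (outlet) being off, not to the fence device itself. This reading is an assumption on my part and not something the API document states; the function and parameter names are hypothetical.

```python
def status_exit_code(device_reachable, port_is_on):
    """Map the result of a status query to the exit codes quoted above."""
    if not device_reachable:
        return 1                     # fence device could not be contacted
    return 0 if port_is_on else 2    # reachable: 0 = port on, 2 = port off
```

Under this reading the "off state" in the text for code 2 is simply shorthand for "the port is in the off state".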

Best regards
Andreas Mock

-----Original Message-----
From: Andreas Mock [mailto:andreas.m...@web.de] 
Sent: Thursday, 4 July 2013 11:25
To: 'Digimer'; 'The Pacemaker cluster resource manager'
Subject: Re: [Pacemaker] Full API description for Fence Agent

Hi digimer,

I would like to take you up on your offer and ask the following:

The API document says nothing about the correct way
of giving messages back to the stonith daemon.
So, what is the right way to write error/warn/info messages?

Looking at the scripted agents available I can find a nice
mixture of using STDERR and STDOUT.
What is the rule here?
Can you give insights into whether STDOUT/STDERR is captured by
the calling program and logged somewhere (and where)?

By the way: How is it going with merging the stonith/fencing API? ;-)

Best regards
Andreas

-----Original Message-----
From: Digimer [mailto:li...@alteeve.ca] 
Sent: Tuesday, 11 June 2013 15:34
To: The Pacemaker cluster resource manager
Cc: Andreas Mock
Subject: Re: [Pacemaker] Full API description for Fence Agent

Hi Andreas,

   The metadata section of the document has not been added yet, but we 
are aware of it missing and are working to add it. The rest of the 
document is accurate though. If you build an agent to follow that API, 
it will work with red hat's cluster and pacemaker.

   In the meantime, it's not ideal, but if you call any other fence 
agent and pass '-o metadata', you will see the output that the cluster 
expects. It should be easy to adapt to your new agent.

   If you have any trouble, please don't hesitate to ask here and we 
will do our best to help.

digimer

On 06/11/2013 07:04 AM, Andreas Mock wrote:
> Hi all,
>
> we need to implement a fence_agent (stonith agent) for
> cman/corosync/pacemaker (RHEL 6.x). I found the following documentation
> https://fedorahosted.org/cluster/wiki/FenceAgentAPI
>
> But in this document the required metadata action is not
> described. Can anybody point me to a documentation which
> is complete?
>
> Where is the schema of the xml returned by 'metadata'?
>
> What has to be done that a fence_agent can also be used
> by pacemaker?
>
> What is the right return code of action 'metadata'?
>
> Is there some explanation how the stonith/fence parts
> play together?
>
> Best regards
> Andreas Mock
>
>
>
>
>
>

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


Re: [Pacemaker] Full API description for Fence Agent

2013-07-04 Thread Andreas Mock

Hi Andrew,

is there some kind of agreement how to tag a message?
Like (DEBUG/TRACE/ERROR/WARN)?
Is there a way message level filtering is done?

Best regards
Andreas


-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net] 
Sent: Thursday, 4 July 2013 13:41
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Full API description for Fence Agent


On 04/07/2013, at 7:24 PM, Andreas Mock  wrote:

> Hi digimer,
> 
> I would like to take you up on your offer and ask the following:
> 
> The API document says nothing about the correct way
> of giving messages back to the stonith daemon.
> So, what is the right way to write error/warn/info messages?
> 
> Looking at the scripted agents available I can find a nice
> mixture of using STDERR and STDOUT.
> What is the rule here?
> Can you give insights into whether STDOUT/STDERR is captured by
> the calling program and logged somewhere (and where)?

In the case of pacemaker, we capture and log both.

> 
> By the way: How is it going with merging the stonith/fencing API? ;-)
> 
> Best regards
> Andreas
> 
> -----Original Message-----
> From: Digimer [mailto:li...@alteeve.ca] 
> Sent: Tuesday, 11 June 2013 15:34
> To: The Pacemaker cluster resource manager
> Cc: Andreas Mock
> Subject: Re: [Pacemaker] Full API description for Fence Agent
> 
> Hi Andreas,
> 
>   The metadata section of the document has not been added yet, but we 
> are aware of it missing and are working to add it. The rest of the 
> document is accurate though. If you build an agent to follow that API, 
> it will work with red hat's cluster and pacemaker.
> 
>   In the meantime, it's not ideal, but if you call any other fence 
> agent and pass '-o metadata', you will see the output that the cluster 
> expects. It should be easy to adapt to your new agent.
> 
>   If you have any trouble, please don't hesitate to ask here and we 
> will do our best to help.
> 
> digimer
> 
> On 06/11/2013 07:04 AM, Andreas Mock wrote:
>> Hi all,
>> 
>> we need to implement a fence_agent (stonith agent) for
>> cman/corosync/pacemaker (RHEL 6.x). I found the following documentation
>> https://fedorahosted.org/cluster/wiki/FenceAgentAPI
>> 
>> But in this document the required metadata action is not
>> described. Can anybody point me to a documentation which
>> is complete?
>> 
>> Where is the schema of the xml returned by 'metadata'?
>> 
>> What has to be done that a fence_agent can also be used
>> by pacemaker?
>> 
>> What is the right return code of action 'metadata'?
>> 
>> Is there some explanation how the stonith/fence parts
>> play together?
>> 
>> Best regards
>> Andreas Mock
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> -- 
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without 
> access to education?
> 
> 

Re: [Pacemaker] Full API description for Fence Agent

2013-07-04 Thread Andreas Mock

Hi digimer,

I would like to take you up on your offer and ask the following:

The API document says nothing about the correct way
of giving messages back to the stonith daemon.
So, what is the right way to write error/warn/info messages?

Looking at the scripted agents available I can find a nice
mixture of using STDERR and STDOUT.
What is the rule here?
Can you give insights into whether STDOUT/STDERR is captured by
the calling program and logged somewhere (and where)?

By the way: How is it going with merging the stonith/fencing API? ;-)

Best regards
Andreas

-----Original Message-----
From: Digimer [mailto:li...@alteeve.ca] 
Sent: Tuesday, 11 June 2013 15:34
To: The Pacemaker cluster resource manager
Cc: Andreas Mock
Subject: Re: [Pacemaker] Full API description for Fence Agent

Hi Andreas,

   The metadata section of the document has not been added yet, but we 
are aware of it missing and are working to add it. The rest of the 
document is accurate though. If you build an agent to follow that API, 
it will work with red hat's cluster and pacemaker.

   In the meantime, it's not ideal, but if you call any other fence 
agent and pass '-o metadata', you will see the output that the cluster 
expects. It should be easy to adapt to your new agent.

   If you have any trouble, please don't hesitate to ask here and we 
will do our best to help.

digimer

On 06/11/2013 07:04 AM, Andreas Mock wrote:
> Hi all,
>
> we need to implement a fence_agent (stonith agent) for
> cman/corosync/pacemaker (RHEL 6.x). I found the following documentation
> https://fedorahosted.org/cluster/wiki/FenceAgentAPI
>
> But in this document the required metadata action is not
> described. Can anybody point me to a documentation which
> is complete?
>
> Where is the schema of the xml returned by 'metadata'?
>
> What has to be done that a fence_agent can also be used
> by pacemaker?
>
> What is the right return code of action 'metadata'?
>
> Is there some explanation how the stonith/fence parts
> play together?
>
> Best regards
> Andreas Mock
>
>
>
>
>
>

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


Re: [Pacemaker] Question to fencing/stonithing

2013-07-01 Thread Andreas Mock

Hi Leon,

thank you for the pointer to the manuals. I have read them already.

My 2-node cluster does not seem to fence the other node
at startup, and I do not have an explanation. That's the reason
I asked (after reading the docs).

- CMAN_QUORUM_TIMEOUT=0
As the inline doc says:
# CMAN_QUORUM_TIMEOUT -- amount of time to wait for a quorate cluster on
# startup quorum is needed by many other applications, so we may as
# well wait here.  If CMAN_QUORUM_TIMEOUT is zero, quorum will
# be ignored.

=> quorum is ignored => the fence domain is created and enabled when the
first node joins (isn't it?).

- as man fenced says:
When the fence domain is first created in the cluster (by the first node to
join it) and subsequently enabled (by the cluster gaining quorum) any nodes
listed in cluster.conf that are not presently members of the corosync
cluster are fenced.

- so, does ignoring quorum mean "you don't have quorum but
it doesn't matter", or does it mean that the first node does get quorum
even if it is the one and only node?

So, my questions, more precisely:
Why does startup fencing not happen in my 2-node cluster?
Is there a way to get this behaviour?

Best regards
Andreas Mock

-----Original Message-----
From: Leon Fauster [mailto:leonfaus...@googlemail.com] 
Sent: Monday, 1 July 2013 19:45
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Question to fencing/stonithing

On 01.07.2013 at 14:28, Andreas Mock wrote:
> Hi all,
> 
> just want to get clear about startup fencing.
> 
> Scenario: RHEL 6.4, cman, 2-node-cluster, pacemaker, fence via 
> pcmk-redirect. pacemaker stonith enabled, no-quorum-policy=ignore, 
> CMAN_QUORUM_TIMEOUT=0
> 
> 
> When should a startup fencing operation occur?
> I thought a freshly starting node not seeing the other members in a 
> timeout interval will try to stonith the other node to make sure that 
> this one doesn't run resources. Is this true?
> Where is the config variable for that timeout?
> 
> Can someone shed light on that, please?



/etc/sysconfig/cman

man fenced

--
LF

 


[Pacemaker] Question to fencing/stonithing

2013-07-01 Thread Andreas Mock

Hi all,

just want to get clear about startup fencing.

Scenario: RHEL 6.4, cman, 2-node-cluster, pacemaker,
fence via pcmk-redirect. pacemaker stonith enabled,
no-quorum-policy=ignore, CMAN_QUORUM_TIMEOUT=0


When should a startup fencing operation occur?
I thought a freshly starting node not seeing the
other members within a timeout interval will try to
stonith the other node to make sure that that one
doesn't run resources. Is this true?
Where is the config variable for that timeout?

Can someone shed light on that, please?

Best regards
Andreas Mock




Re: [Pacemaker] known problem with corosync 1.4.1 on centos64 ?

2013-06-21 Thread Andreas Mock

Hi Andreas,

my two cents to your questions:

a) If you want to learn the most, take any distro, compile the components
from source and then use them. => Most learned.

b) I don't know how others think about it, but I use a cluster to try to
increase uptime. If I know that a distro's component is buggy and causes
failures during the first steps with a more or less standard config
(corosync/pacemaker/drbd + some service), I have two choices when I have
to stick to a distro's repos:

1) Take the next distro step, 6.4 in your case. But it can have bugs too.

2) Ask why it is important to stick to the distro's repos with a certain
software stack.

In your case I don't know why it is "allowed" to build drbd from source but
not "allowed" to build the cluster stack from source. Especially since
getting your feet wet with corosync/pacemaker and all the rest, i.e.
understanding, configuring and maintaining a cluster, is much more effort
than building the stack from source.

My policy is also to stay as close as possible to the distro's repos. But
when I need a newer or more stable version of a piece of software, I have
to use it.

Best regards

Andreas

From: andreas graeper [mailto:agrae...@googlemail.com] 
Sent: Friday, 21 June 2013 15:00
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] known problem with corosync 1.4.1 on centos64 ?

hi,

> old version : 

i have to maintain a centos63 system where, except for drbd (built from
source), only the standard repos are used.

for testing i installed the newest centos64, but ...

there is no chance to get rid of that centos63, but for learning/testing,
what are the best distros ? not in general, but for use with
drbd+corosync+pacemaker. 

2013/6/21 Lars Marowsky-Bree 

On 2013-06-21T10:56:29, andreas graeper  wrote:

> hi,
> whenever i remove or add resources, corosync starts to eat up all cpu.
> drbd 8.4.1 (built from source)
> corosync 1.4.1

yes, corosync 1.4.1 had one such error, I recall. If you're building
from source, why are you sticking to such an old version?

Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


Re: [Pacemaker] uname eq node-name

2013-06-12 Thread Andreas Mock

Hi Andrew,

can you tell me what the attribute #uname is holding?
Is it the node-name or the 'uname -n' of the node?
(I just read
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Pacemaker_Explained/index.html#_which_resource_instance_is_promoted)

Is there an attribute like '#node' or '#nodename'?

Best regards
Andreas Mock



-Original Message-
From: Andrew Beekhof [mailto:and...@beekhof.net] 
Sent: Wednesday, 12 June 2013 06:45
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] uname eq node-name


On 12/06/2013, at 2:40 PM, "Andreas Mock"  wrote:

> Hi Andrew,
> 
> thank you for that information. You know, often one answer is followed 
> by many other questions. The same here:
> 
> Is there a tool, where a script is able to determine the node name 
> based on the uname?
> For a script it is easy to find the nodename (uname -n) it is running 
> on. But what has to be done when the script needs to know the 
> node-name it is running on?

crm_node -n is a good place to start, but requires a running cluster.
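
For scripts, that advice can be wrapped in a small helper. The following is a minimal sketch, not a definitive implementation: `get_node_name` is a hypothetical function name, and the fallback to `uname -n` is an assumption that only gives the right answer on clusters where node-name and `uname -n` are still identical.

```shell
#!/bin/sh
# get_node_name (hypothetical helper): print the name the cluster uses for
# this node. Uses `crm_node -n` when the tool is installed and answers;
# otherwise falls back to `uname -n`, which is only correct on clusters
# where node-name == uname -n.
get_node_name() {
    if command -v crm_node >/dev/null 2>&1; then
        if name=$(crm_node -n 2>/dev/null) && [ -n "$name" ]; then
            printf '%s\n' "$name"
            return 0
        fi
    fi
    uname -n
}
```

A script would call it as `NODE=$(get_node_name)`. Whether `crm_node -n` fails cleanly when the cluster is down may depend on the Pacemaker version, so the fallback path should be tested on the target system.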

> 
> Best regards
> Andreas Mock
> 
> 
> -Original Message-
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Wednesday, 12 June 2013 00:27
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] uname eq node-name
> 
> 
> On 11/06/2013, at 2:33 AM, Andreas Mock  wrote:
> 
>> Hi all,
>> 
>> I couldn't find a definitive source stating that a 
>> corosync/pacemaker/cman cluster must follow the
>> rule: uname -n == node-name (== DNS-name of communication-IP)
> 
> In older versions this is true (an artefact of our heartbeat heritage).
> However we have been chipping away at that in 1.1.9 and I am currently 
> running corosync 2.x with pacemaker 1.1.10-rc4 and node-name != uname -n
> 
>> 
>> Can someone give a hint for related documentation?
>> 
>> The question arises when you want to configure a cman based cluster
>> (cluster.conf) having a uname -n equal to the DNS-name of the 
>> external IP address but want to route the cluster communication over 
>> the internal IP address (cluster interconnect).
>> I couldn't find a solution that doesn't use the DNS-names of the 
>> internal IP addresses as node-names.
>> 
>> Hints and rules welcome!
>> 
>> Best regards
>> Andreas Mock
>> 
>> 
>> 

Re: [Pacemaker] clusterlabs.org down?

2013-06-12 Thread Andreas Mock

Hi Digimer,

oh...sorry...just stonithed the server while
trying to reverse engineer the fence api... 

;)

Best regards
Andreas Mock



-Original Message-
From: Digimer [mailto:li...@alteeve.ca] 
Sent: Wednesday, 12 June 2013 16:45
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] clusterlabs.org down?

On 06/12/2013 10:41 AM, David Vossel wrote:
> - Original Message -
>> From: "Michael Schwartzkopff" 
>> To: pacemaker@oss.clusterlabs.org
>> Sent: Wednesday, June 12, 2013 9:21:08 AM
>> Subject: [Pacemaker] clusterlabs.org down?
>
> yep, it is down for me as well.
>
> -- Vossel

Down here, too.

We should invent a technology that helps keep services available. Highly 
available, if you will.

;)

I'll show myself to the door...

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


Re: [Pacemaker] uname eq node-name

2013-06-11 Thread Andreas Mock

Hi Andrew,

thank you for that information. You know, often one answer
is followed by many other questions. The same here:

Is there a tool, where a script is able to determine the
node name based on the uname? 
For a script it is easy to find the nodename (uname -n)
it is running on. But what has to be done when the
script needs to know the node-name it is running on?

Best regards
Andreas Mock


-Original Message-
From: Andrew Beekhof [mailto:and...@beekhof.net] 
Sent: Wednesday, 12 June 2013 00:27
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] uname eq node-name


On 11/06/2013, at 2:33 AM, Andreas Mock  wrote:

> Hi all,
> 
> I couldn't find a definitive source stating that a 
> corosync/pacemaker/cman cluster must follow the
> rule: uname -n == node-name (== DNS-name of communication-IP)

In older versions this is true (an artefact of our heartbeat heritage).
However we have been chipping away at that in 1.1.9 and I am currently
running corosync 2.x with pacemaker 1.1.10-rc4 and node-name != uname -n

> 
> Can someone give a hint for related documentation?
> 
> The question arises when you want to configure a cman based cluster 
> (cluster.conf) having a uname -n equal to the DNS-name of the external 
> IP address but want to route the cluster communication over the 
> internal IP address (cluster interconnect).
> I couldn't find a solution that doesn't use the DNS-names of the 
> internal IP addresses as node-names.
> 
> Hints and rules welcome!
> 
> Best regards
> Andreas Mock
> 
> 
> 

[Pacemaker] Full API description for Fence Agent

2013-06-11 Thread Andreas Mock

Hi all,

we need to implement a fence_agent (stonith agent) for
cman/corosync/pacemaker (RHEL 6.x). I found the following documentation
https://fedorahosted.org/cluster/wiki/FenceAgentAPI

But the required metadata action is not described in this
document. Can anybody point me to documentation that
is complete?

Where is the schema of the XML returned by 'metadata'?

What has to be done so that a fence_agent can also be used
by pacemaker?

What is the right return code for the 'metadata' action?

Is there some explanation of how the stonith/fence parts
play together?

Best regards
Andreas Mock






Re: [Pacemaker] What kind of cluster stack at opensuse-repositories

2013-06-10 Thread Andreas Mock

Hi Lars,

thank you for answering. Could you tell me whether the stack
is like Option1 or Option3 of this article
http://blog.clusterlabs.org/blog/2012/pacemaker-and-cluster-filesystems/

If it's Option1 when do you think SuSE switches to Option3?

Best regards
Andreas Mock

-Original Message-
From: Lars Marowsky-Bree [mailto:l...@suse.com] 
Sent: Monday, 10 June 2013 19:49
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] What kind of cluster stack at opensuse-repositories

On 2013-06-10T19:25:38, Andreas Mock  wrote:

> Am I right that these are packages for a RHEL 6.x system but in a 
> corosync-pacemaker fashion, as SuSE has used it for years now?

Yes.

Those packages are scheduled for an update to latest upstream versions as
soon as we wrap up our current project, but we'll not have cman-based
packages available there, I'm pretty sure.

Of course, OBS can build them if someone else maintains them ;-) No policy
against it, just not our primary task.

Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
HRB 21284 (AG Nürnberg) "Experience is the name everyone gives to their
mistakes." -- Oscar Wilde


[Pacemaker] What kind of cluster stack at opensuse-repositories

2013-06-10 Thread Andreas Mock

Hi all,

I want to make sure that I understand this correctly:

What do I find at 
http://download.opensuse.org/repositories/network:/ha-clustering/RedHat_RHEL-6/x86_64/

Am I right that I can't use this repository as a source for a
more up-to-date replacement for the RHEL 6.x packages because
these packages are NOT built for a cman-corosync-pacemaker cluster?

Am I right that these are packages for a RHEL 6.x system but in a
corosync-pacemaker fashion, as SuSE has used it for years now?

Please help to sort this out.

Best regards
Andreas Mock




[Pacemaker] Differences in man pages

2013-06-10 Thread Andreas Mock

Hi all, hi Andrew,

while I had your package set (pacemaker et al.) installed from 
http://clusterlabs.org/rpm-test-next/rhel-6/x86_64/
to (hopefully) help with debugging and testing, I noticed
the following.

The man page of 'crm_resource' doesn't mention
some parameters (like -P) which do work and are
documented by the corresponding man page of the official
RHEL package pacemaker-cli 1.1.8.

Is there a reason that documentation was dropped?

Best regards
Andreas Mock




[Pacemaker] uname eq node-name

2013-06-10 Thread Andreas Mock

Hi all,

I couldn't find a definitive source stating that
a corosync/pacemaker/cman cluster must follow the
rule: uname -n == node-name (== DNS-name of communication-IP)

Can someone give a hint for related documentation?

The question arises when you want to configure a
cman based cluster (cluster.conf) having a uname -n
equal to the DNS-name of the external IP address but
want to route the cluster communication over
the internal IP address (cluster interconnect).
I couldn't find a solution that doesn't use the
DNS-names of the internal IP addresses as node-names.

Hints and rules welcome!

Best regards
Andreas Mock




Re: [Pacemaker] The main road of the cluster stack evolution

2013-06-10 Thread Andreas Mock

Hi Ivan,

my advice: Look at 
http://blog.clusterlabs.org/blog/2013/pacemaker-on-rhel6-dot-4/
and at the other blog entries there. It gives some good insight.

Best regards
Andreas Mock


-Original Message-
From: Халезов Иван [mailto:i.khale...@rts.ru] 
Sent: Monday, 10 June 2013 17:26
To: The Pacemaker cluster resource manager
Subject: [Pacemaker] The main road of the cluster stack evolution

Hello everyone!

I would like to ask a few questions about the main road of the cluster 
stack evolution.

1) The RedHat company is planning to drop corosync support and wants to 
switch to CMAN. ( 
http://www.gossamer-threads.com/lists/linuxha/pacemaker/84662 )

What do you think the main trend is? What is the most popular and better 
supported solution? Corosync, CMAN or something else?
What cluster engine is better for Pacemaker at the moment? And what 
could be the best solution in 2-3 years?

2) What is the best tool for cluster management: crm, pcs or something 
else?

Redhat is switching to pcs and dropping crm, but SUSE prefers the crmsh tool.

Why? Which tool would you advise using?

3) What version of pacemaker should I prefer for using on RedHat 6.3 (or 
6.4) ?

The version from the vendor (Pacemaker 1.1.7 for RedHat 6.3 and 
Pacemaker 1.1.8 for RedHat 6.4) or the upstream version from Github?

I usually prefer software versions coming from the distribution, because 
I hope they are well-tested and supported by the vendor.
But, as far as I know, Pacemaker is a technology preview in RedHat 6, so they 
are not responsible for its stability.
Also, either way, I have to rebuild the Redhat src.rpm package (to 
add corosync 2.3 support to pacemaker).


With best regards,
Ivan Khalezov


Re: [Pacemaker] Removing resource from group without disturbing remaining resources in group

2013-06-07 Thread Andreas Mock

Hi Dejan,

thanks for answering. I'll have a look at it and
will see whether sets fit our needs better.

Have a nice weekend.

Best regards
Andreas Mock


-Original Message-
From: Dejan Muhamedagic [mailto:deja...@fastmail.fm] 
Sent: Friday, 7 June 2013 17:28
To: 'The Pacemaker cluster resource manager'
Subject: Re: [Pacemaker] Removing resource from group without disturbing
remaining resources in group

Hi,

On Fri, Jun 07, 2013 at 04:45:34PM +0200, Andreas Mock wrote:
> Hi Dejan,
> 
> we need colocation and order constraints.

If you need both, then groups are fine.

> IMHO it's a use case for sets, but I have to admit
> that I don't really understand how to configure them
> with crm. A group gives a group resource id which I
> can use as a reference in the constraints. It makes the
> config simple.

If the configuration is simple, then you're OK. If the resource
sets wouldn't make the configuration any better/simpler, best not
to use them. Otherwise, the syntax is simple and should be
available with the help (crm configure help order/colocation).
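
For the IP1..IPn -> service layout quoted below, a resource-set version might look like the following. This is only a sketch under assumptions: the constraint ids `o_ips_before_service` and `c_service_with_ips` and the helper `write_ip_sets` are made up for illustration, and the parenthesised set notation (assumed here to mark a set whose members are not ordered among themselves) should be checked against `crm configure help order` and `crm configure help colocation` for the installed crmsh version. The direction of the dependency in a set-based colocation is easy to get wrong, so it is worth checking the result with crm_simulate before relying on it. Nothing is loaded into the cluster here; the snippet only writes the two constraints to a file for review.

```shell
#!/bin/sh
# write_ip_sets FILE (hypothetical helper): write two crm shell constraints
# that use resource sets instead of a group. IP1..IP3 and "service" are the
# resource names from the diagram in this thread; the constraint ids are
# invented for this example.
write_ip_sets() {
    cat > "$1" <<'EOF'
order o_ips_before_service inf: ( IP1 IP2 IP3 ) service
colocation c_service_with_ips inf: service ( IP1 IP2 IP3 )
EOF
}

# Example (not executed here): review the file first, then load it.
#   write_ip_sets /tmp/ip-sets.crm
#   crm configure load update /tmp/ip-sets.crm
```

Compared with a group, this keeps a single pair of constraints to maintain, but there is no group id to reference elsewhere, which is the trade-off discussed above.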

Thanks,

Dejan

> 
> IP1-|
> IP2---> service
> IP3-|
> 
> IP1 .. IP40
> 
> Advice welcome.
> 
> Best regards
> Andreas
> 
> 
> 
> -Original Message-
> From: Dejan Muhamedagic [mailto:deja...@fastmail.fm] 
> Sent: Friday, 7 June 2013 15:55
> To: 'The Pacemaker cluster resource manager'
> Subject: Re: [Pacemaker] Removing resource from group without disturbing
> remaining resources in group
> 
> On Thu, Jun 06, 2013 at 06:15:26PM +0200, Andreas Mock wrote:
> > Hi Florian,
> > 
> > thank you very much for that method description.
> > It seems that it does exactly what we want. By the way,
> > it's the same use case as yours: many IPs for which we
> > want a constraint handle (group).
> 
> But wouldn't just a colocation constraint do if the order is not
> important?
> 
> Thanks,
> 
> Dejan
> 
> > Thank you!
> > 
> > Best regards
> > Andreas Mock
> > 
> > 
> > -Original Message-
> > From: Florian Crouzat [mailto:gen...@floriancrouzat.net] 
> > Sent: Thursday, 6 June 2013 16:50
> > To: pacemaker@oss.clusterlabs.org
> > Subject: Re: [Pacemaker] Removing resource from group without disturbing
> > remaining resources in group
> > 
> > On 06/06/2013 16:35, Andreas Mock wrote:
> > > Hi all,
> > >
> > > is there a way to remove a resource from a group without
> > > disturbing the other resources in the group.
> > >
> > > The following example:
> > > - G1 has R1 R2 R3
> > > - All resources are started
> > > - Stopping R1 would cause a stop of R2 R3
> > > - So, the idea was:
> > > * crm configure edit => remove R1 from the group while running
> > > * stop resource
> > > * delete resource
> > >
> > > BUT: At some point (which we couldn't find out at
> > > the moment) all remaining resources of the group are
> > > restarted. It seems that the change of the implicit
> > > dependency tree of the initial group forces a rebuild
> > > of that tree including a restart of that group.
> > > (Andrew: Is this assumption right?)
> > >
> > > So, is there a way to add/remove resources from a
> > > group without disturbing the other resources?
> > > It's clear to me that the resources would restart
> > > when the node assignment after removing would change.
> > >
> > > Hints welcome.
> > >
> > 
> > Approximate syntax, do not blame me!
> > 
> > * crm configure property maintenance-mode=true
> > * crm resource stop R1 # it won't stop as it's in maintenance-mode
> > * crm configure delete R1
> > * crm configure show # verify that all references to R1 are gone
> > * crm resource reprobe # the cluster double-checks the status of declared
> > resources and sees that everything is fine and R1 doesn't exist anymore
> > * crm_mon -Arf1 # double check that everything is "started (unmanaged)" 
> > and R1 is gone
> > * crm_simulate -S -L -VVV # optional, to check what would happen when 
> > leaving maintenance-mode
> > * crm configure property maintenance-mode=false
> > 
> > If something goes wrong while in maintenance-mode, crm resource cleanup 
> > foo might be handy. Nothing should move, start or stop until you leave 
> > maintenance-mode anyway. I use this scenario very often, to add or 
> > remove IPaddr2 resources to a group of 30+ IPaddr2.
>

Re: [Pacemaker] Removing resource from group without disturbing remaining resources in group

2013-06-07 Thread Andreas Mock

Hi Dejan,

we need colocation and order constraints.
IMHO it's a use case for sets, but I have to admit
that I don't really understand how to configure them
with crm. A group gives a group resource id which I
can use as a reference in the constraints. It makes the
config simple.


IP1-|
IP2---> service
IP3-|

IP1 .. IP40

Advice welcome.

Best regards
Andreas



-Original Message-
From: Dejan Muhamedagic [mailto:deja...@fastmail.fm] 
Sent: Friday, 7 June 2013 15:55
To: 'The Pacemaker cluster resource manager'
Subject: Re: [Pacemaker] Removing resource from group without disturbing
remaining resources in group

On Thu, Jun 06, 2013 at 06:15:26PM +0200, Andreas Mock wrote:
> Hi Florian,
> 
> thank you very much for that method description.
> It seems that it does exactly what we want. By the way,
> it's the same use case as yours: many IPs for which we
> want a constraint handle (group).

But wouldn't just a colocation constraint do if the order is not
important?

Thanks,

Dejan

> Thank you!
> 
> Best regards
> Andreas Mock
> 
> 
> -Original Message-
> From: Florian Crouzat [mailto:gen...@floriancrouzat.net] 
> Sent: Thursday, 6 June 2013 16:50
> To: pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] Removing resource from group without disturbing
> remaining resources in group
> 
> On 06/06/2013 16:35, Andreas Mock wrote:
> > Hi all,
> >
> > is there a way to remove a resource from a group without
> > disturbing the other resources in the group.
> >
> > The following example:
> > - G1 has R1 R2 R3
> > - All resources are started
> > - Stopping R1 would cause a stop of R2 R3
> > - So, the idea was:
> > * crm configure edit => remove R1 from the group while running
> > * stop resource
> > * delete resource
> >
> > BUT: At some point (which we couldn't find out at
> > the moment) all remaining resources of the group are
> > restarted. It seems that the change of the implicit
> > dependency tree of the initial group forces a rebuild
> > of that tree including a restart of that group.
> > (Andrew: Is this assumption right?)
> >
> > So, is there a way to add/remove resources from a
> > group without disturbing the other resources?
> > It's clear to me that the resources would restart
> > when the node assignment after removing would change.
> >
> > Hints welcome.
> >
> 
> Approximate syntax, do not blame me!
> 
> * crm configure property maintenance-mode=true
> * crm resource stop R1 # it won't stop as it's in maintenance-mode
> * crm configure delete R1
> * crm configure show # verify that all references to R1 are gone
> * crm resource reprobe # the cluster double-checks the status of declared 
> resources and sees that everything is fine and R1 doesn't exist anymore
> * crm_mon -Arf1 # double check that everything is "started (unmanaged)" 
> and R1 is gone
> * crm_simulate -S -L -VVV # optional, to check what would happen when 
> leaving maintenance-mode
> * crm configure property maintenance-mode=false
> 
> If something goes wrong while in maintenance-mode, crm resource cleanup 
> foo might be handy. Nothing should move, start or stop until you leave 
> maintenance-mode anyway. I use this scenario very often, to add or 
> remove IPaddr2 resources to a group of 30+ IPaddr2.
> 
> 
> -- 
> Cheers,
> Florian Crouzat
> 

Re: [Pacemaker] Removing resource from group without disturbing remaining resources in group

2013-06-06 Thread Andreas Mock

Hi Florian,

thank you very much for that method description.
It seems that it does exactly what we want. By the way,
it's the same use case as yours: many IPs for which we
want a constraint handle (group).

Thank you!

Best regards
Andreas Mock


-Original Message-
From: Florian Crouzat [mailto:gen...@floriancrouzat.net] 
Sent: Thursday, 6 June 2013 16:50
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Removing resource from group without disturbing
remaining resources in group

On 06/06/2013 16:35, Andreas Mock wrote:
> Hi all,
>
> is there a way to remove a resource from a group without
> disturbing the other resources in the group.
>
> The following example:
> - G1 has R1 R2 R3
> - All resources are started
> - Stopping R1 would cause a stop of R2 R3
> - So, the idea was:
> * crm configure edit => remove R1 from the group while running
> * stop resource
> * delete resource
>
> BUT: At some point (which we couldn't find out at
> the moment) all remaining resources of the group are
> restarted. It seems that the change of the implicit
> dependency tree of the initial group forces a rebuild
> of that tree including a restart of that group.
> (Andrew: Is this assumption right?)
>
> So, is there a way to add/remove resources from a
> group without disturbing the other resources?
> It's clear to me that the resources would restart
> when the node assignment after removing would change.
>
> Hints welcome.
>

Approximate syntax, do not blame me!

* crm configure property maintenance-mode=true
* crm resource stop R1 # it won't stop as it's in maintenance-mode
* crm configure delete R1
* crm configure show # verify that all references to R1 are gone
* crm resource reprobe # the cluster double-checks the status of declared 
resources and sees that everything is fine and R1 doesn't exist anymore
* crm_mon -Arf1 # double check that everything is "started (unmanaged)" 
and R1 is gone
* crm_simulate -S -L -VVV # optional, to check what would happen when 
leaving maintenance-mode
* crm configure property maintenance-mode=false

If something goes wrong while in maintenance-mode, crm resource cleanup 
foo might be handy. Nothing should move, start or stop until you leave 
maintenance-mode anyway. I use this scenario very often, to add or 
remove IPaddr2 resources to a group of 30+ IPaddr2.
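
The steps above can be collected into a small script. This is a sketch under assumptions, not a tested procedure: the function names `run`, `prepare_removal` and `leave_maintenance` and the `DRY_RUN` switch are invented for illustration, and the crm commands are taken as written in the list above, which is itself marked as approximate. Leaving maintenance-mode is kept as a separate step so that the crm_mon and crm_simulate output can be reviewed first.

```shell
#!/bin/sh
# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# prepare_removal RESOURCE: delete RESOURCE from the configuration while the
# cluster is in maintenance-mode, then show the state for manual review.
# If a step fails, the function returns early and the cluster stays in
# maintenance-mode until leave_maintenance is called.
prepare_removal() {
    rsc=$1
    run crm configure property maintenance-mode=true || return 1
    # Only recorded in the configuration; nothing stops in maintenance-mode.
    run crm resource stop "$rsc" || return 1
    run crm configure delete "$rsc" || return 1
    # Verify by eye that all references to the resource are gone.
    run crm configure show || return 1
    run crm resource reprobe || return 1
    # Everything should be reported as "started (unmanaged)".
    run crm_mon -Arf1 || return 1
    # Shows what would happen when leaving maintenance-mode.
    run crm_simulate -S -L -VVV
}

# leave_maintenance: call only after reviewing the output above.
leave_maintenance() {
    run crm configure property maintenance-mode=false
}
```

Setting `DRY_RUN=1` before calling `prepare_removal R1` prints the seven commands without touching the cluster, which is a cheap way to check the sequence before using it for real.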


-- 
Cheers,
Florian Crouzat


Re: [Pacemaker] failed actions after resource creation

2013-06-06 Thread Andreas Mock

Hi Andreas,

just a comment, as I think I can guess where your misunderstanding
comes from.

When services are clustered you often see a filesystem resource which
is moved between the cluster nodes, and on top of that filesystem
resource a service (call it S) which is also handled by the cluster
(colocation, groups, etc.).

BUT: You have to be aware of one fact. The resource agents mostly
rely on some binaries belonging to the service S to do their job. So if
the binaries are not on every node, the monitor action of the resource
agent fails and the behaviour of the cluster is not what you want.

So, most of the time you have to design your stack of resources in
such a way that the binaries of the service S are on every node in any
case and are exactly the same on every node.

I once wrote a resource agent which was clever enough to do a
multiphase monitor action: it first checked whether the expected
binaries were present and, if not, assumed that the service could not
be running. In this special case we were able to move all of service
S's binaries along with the filesystem resource. But this is uncommon
and mostly not what you want.
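
A multiphase monitor of that kind could look roughly like the following. This is a sketch under assumptions, not the agent I wrote back then: the function names are made up for illustration, and only the numeric return codes are the standard OCF ones.

```python
import os
import shutil

# Standard OCF resource agent return codes.
OCF_SUCCESS = 0
OCF_NOT_INSTALLED = 5  # what a plain agent reports when binaries are missing
OCF_NOT_RUNNING = 7


def binaries_present(binaries):
    """Return True if every required binary exists and is executable here."""
    for name in binaries:
        if os.path.isabs(name):
            ok = os.path.isfile(name) and os.access(name, os.X_OK)
        else:
            ok = shutil.which(name) is not None
        if not ok:
            return False
    return True


def monitor(binaries, service_is_running):
    """Two-phase monitor: check the binaries first, then the service itself."""
    # Phase 1: without its binaries the service cannot be running on this
    # node, so report "not running" rather than OCF_NOT_INSTALLED.
    if not binaries_present(binaries):
        return OCF_NOT_RUNNING
    # Phase 2: the usual health check, supplied by the caller.
    return OCF_SUCCESS if service_is_running() else OCF_NOT_RUNNING
```

Returning "not running" in phase 1 is what lets the probe on the passive node succeed while the binaries live on the filesystem resource mounted elsewhere.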

 

Best regards
Andreas Mock

 

 

 

From: andreas graeper [mailto:agrae...@googlemail.com]
Sent: Thursday, 6 June 2013 16:26
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] failed actions after resource creation

 

hi and thanks.

(better sentences: i will do my best)

on the inactive node there is actually only /etc/init.d/nfs and neither
nfs-common nor nfs-kernel-server.

is monitor looking not only for the running service on the active node,
but also at the situation on the inactive node?

if so, i would have expected the missing nfs-kernel-server to be
reported, too.

i guess this can only be handled with an init script 'nfs' (same name on
both nodes) that starts/kills nfs-common/nfs-kernel-server?

or is there another solution?

what does monitor do in the case of a resource managed by an lsb script?
is it calling `service xxx status`?
what does the monitor expect on a node where the service is running / not
running?

thanks in advance
andreas

 

2013/6/6 Florian Crouzat 

On 06/06/2013 15:49, andreas graeper wrote:

 

 p_nfscommon_monitor_0 (node=linag, call=189, rc=5,
status=complete): not installed

 

Sounds obvious: "not installed". Node "linag" is missing some
daemons/scripts , probably nfs-related. Check your nfs packages and
configuration on both nodes, node1 should be missing something.

 

what can i do ?

 

Better sentences.

-- 
Cheers,
Florian Crouzat


[Pacemaker] Removing resource from group without disturbing remaining resources in group

2013-06-06 Thread Andreas Mock

Hi all,

is there a way to remove a resource from a group without
disturbing the other resources in the group?

The following example:
- G1 has R1 R2 R3
- All resources are started
- Stopping R1 would cause a stop of R2 R3
- So, the idea was:
* crm configure edit => remove R1 from the group while running
* stop resource
* delete resource

BUT: At some point (which we couldn't find out at
the moment) all remaining resources of the group are
restarted. It seems that the change of the implicit
dependency tree of the initial group forces a rebuild
of that tree including a restart of that group.
(Andrew: Is this assumption right?)

So, is there a way to add/remove resources to/from a
group without disturbing the other resources?
It's clear to me that the resources would restart
if the node assignment changed after the removal.

Hints welcome.

Best regards
Andreas Mock





Re: [Pacemaker] Release candidate: 1.1.10-rc3

2013-06-05 Thread Andreas Mock

Hi Andrew,

while waiting for the RHEL 6.x build of pacemaker 1.1.10 I want to ask
whether something can be done to help find the memory leaks.
If so, then please explain the steps needed in detail. Currently there
are two real clusters available for testing.

(Questions: Do you need logs? Debug-Log? Some excerpt? Shall we look
for certain patterns?)

Best regards
Andreas Mock

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Wednesday, 5 June 2013 04:26
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Release candidate: 1.1.10-rc3

On 23/05/2013, at 12:33 PM, Andrew Beekhof  wrote:

> Please keep the bug reports coming in.  There is a good chance that
> this will be the final release candidate and 1.1.10 will be tagged on
> May 30th.

I am delaying rc4 until we can get definitive closure on the crmd memory
leak(s).
Valgrind has given it a clean bill of health, however the process still
appears to be growing over time and at strange intervals, so it's not yet
clear what is responsible.

Beyond fixes for memory leaks, rc4 will include a workaround for
inconsistent tls handshake behavior between gnutls versions and some
improvements to the way crm_resource can be used to move resources around.
I hope to provide more detail soon.

-- Andrew

[Pacemaker] Need explanation for start stonith behaviour

2013-05-28 Thread Andreas Mock

Hi all,

I have a two-node cluster on a RHEL clone (6.4, cman, pacemaker)
and I'm facing a startup behaviour I can't explain, so I
hope that you can enlighten me.

- 2 nodes: N1 N2
- both nodes up
- everything is fine

Start:
- service pacemaker stop on N2
- all resources get migrated => OK
- all pacemaker and corosync related processes seem to be
shutdown correctly
- now service pacemaker stop on N1
- all resources seem to be stopped correctly
- all cluster stack processes seem to be stopped
correctly.

Scenario 1:
Let's start with the node which was stopped last.
- service pacemaker start on N1
- cluster stack gets started, we have to wait at the step
"joining fence domain"
- after timeout node gets started
- resources get started on that node
- now service pacemaker start  on N2
- cluster stack does come up
- resources started as requested by config
=> everything seems ok and straight forward

Scenario 2:
Don't start with the last node shut down but with
the node which was stopped first, therefore:
- service pacemaker start on N2
- cluster stack comes up seemingly the same way
as in scenario 1. A little wait at the step "joining fence domain".

- And now the difference: Node N1 gets stonithed, which seems
OK to me, as N2 wants to make sure that it is the one and only
node in the cluster. (Is this interpretation right?)

Why is a stonith triggered in one scenario but not in the
other? Insights are really appreciated. Is some knowledge
about the last cluster state made persistent?
Is it correct that node N2 is not stonithed in scenario 1?

Thank you in advance.

Best regards
Andreas Mock





Re: [Pacemaker] cman + corosync + pacemaker + fence_scsi

2013-04-24 Thread Andreas Mock

Hi Angel,

two hints from my side. As you're working with Ubuntu,
ask on this list which setup is or will be the best
for corosync + pacemaker. I'm pretty sure
(but I really don't know) that you'll get the advice
to drop cman.

When you use cman + pacemaker, stonithing works
as follows. Use pcmk-redirect in cman, which
makes cman delegate stonith requests to
pacemaker. In pacemaker you have to add the
stonith agents which match your hardware, and you
have to enable stonithing in pacemaker with
stonith-enabled="true".

Another issue with stonithing: in a two-node cluster
you have to configure the stonith agents in such a
way that the surviving node (whichever it is,
mostly the faster one) is able to shoot the other
node even when cluster communication is lost.
When the stonith action goes over the same
wire as your cluster communication, stonithing
is meaningless.

Best regards
Andreas Mock


-----Original Message-----
From: Angel L. Mateo [mailto:ama...@um.es]
Sent: Wednesday, 24 April 2013 14:49
To: The Pacemaker cluster resource manager
Subject: [Pacemaker] cman + corosync + pacemaker + fence_scsi

Hello,

I'm trying to configure a 2 node cluster in ubuntu with cman + corosync
+ pacemaker (the use of cman is because it is recommended in the pacemaker
quickstart). In order to solve the split brain in the 2 node cluster I'm
using qdisk. For fencing, I'm trying to use fence_scsi, and at this point
I'm having the problem. I have attached my cluster.conf.

xml 
node myotis51
node myotis52
primitive cluster_ip ocf:heartbeat:IPaddr2 \
params ip="155.54.211.167" \
op monitor interval="30s"
property $id="cib-bootstrap-options" \
dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
cluster-infrastructure="cman" \
stonith-enabled="false" \
last-lrm-refresh="1366803979"

At this moment I'm trying just with an IP resource, but in the end I'll
have LVM resources and a dovecot server running on top of them.

The problem I have is that whenever I interrupt network traffic between
my nodes (to check whether quorum and fencing are working) the IP resource
is started on both nodes of the cluster.

So it seems that the node fencing configured in cluster.conf is not working
for me. Then I tried to configure it as a stonith resource (since it
is listed by sudo crm ra list stonith), so I tried to include

primitive stonith_fence_scsi stonith:redhat/fence_scsi

The problem I'm having with this is that I don't know how to specify
params for the resource (I have tried params devices="...", params -d
..., but they are not accepted) and with this (default) configuration I get:

Apr 24 14:39:14 myotis51 lrmd: [6759]: debug: on_msg_perform_op: add an 
operation operation monitor[5] on stonith_fence_scsi for client 6763, 
its parameters: crm_feature_set=[3.0.5] CRM_meta_timeout=[2]  to the 
operation list.
Apr 24 14:39:14 myotis51 lrmd: [6759]: info: rsc:stonith_fence_scsi 
probe[5] (pid 10434)
Apr 24 14:39:14 myotis51 lrmd: [10434]: ERROR: get_stonith_provider: No 
such device: redhat/fence_scsi
Apr 24 14:39:14 myotis51 lrm-stonith: [10434]: ERROR: execra: No such 
legacy stonith device: redhat/fence_scsi
Apr 24 14:39:14 myotis51 lrm-stonith: [10434]: debug: execra: 
stonith_fence_scsi_monitor returned -12
Apr 24 14:39:14 myotis51 lrmd: [6759]: WARN: Managed 
stonith_fence_scsi:monitor process 10434 exited with return code 7.
Apr 24 14:39:14 myotis51 lrmd: [6759]: info: operation monitor[5] on 
stonith_fence_scsi for client 6763: pid 10434 exited with return code 7
Apr 24 14:39:14 myotis51 crmd: [6763]: debug: create_operation_update: 
do_update_resource: Updating resouce stonith_fence_scsi after complete 
monitor op (interval=0)
Apr 24 14:39:14 myotis51 crmd: [6763]: info: process_lrm_event: LRM 
operation stonith_fence_scsi_monitor_0 (call=5, rc=7, cib-update=57, 
confirmed=true) not running

I'm trying to use fence_scsi because I'm planning to use shared
storage (accessed via SCSI Fibre Channel) and I don't want to use CLVM
(because I need LVM snapshots, which clvm does not support), so I need a
fencing device that prevents both nodes from using the same SCSI devices
concurrently.

Any idea on how to use fence_scsi? Or could I use some other
fence/stonith device? Which one do you recommend?

-- 
Angel L. Mateo Martínez
Sección de Telemática
Área de Tecnologías de la Información
y las Comunicaciones Aplicadas (ATICA)
http://www.um.es/atica
Tfo: 868889150
Fax: 86337



Re: [Pacemaker] pacemaker monitoring user permision denied

2013-04-22 Thread Andreas Mock

Hi Andrew,

is 1.1.10-rc1 a working title or can the package be found somewhere?

I saw that on http://clusterlabs.org/rpm-next/rhel-6/x86_64/
there is a new 1.1.9 build. 
Is this a new snapshot build (e.g. one containing the memory leak fixes)?

Best regards
Andreas Mock


-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Tuesday, 23 April 2013 01:46
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] pacemaker monitoring user permision denied


On 23/04/2013, at 1:45 AM, Wolfgang Routschka
 wrote:

> Hi everbody,
>  
> I want to monitor our pacemaker/cman cluster on Scientific Linux 6.4 (a
> RHEL clone) with nagios.
>  
> After reading the documentation at http://clusterlabs.org/doc/acls.html and 
> configuring it, my nagios user isn't able to start crm_mon:
>  
> "Attempting connection to the cluster...Could not establish cib_ro
> connection: Permission denied (13)"
>  
> User is in haclient group
>  
> [nagios@xx ~]$ id
> uid=510(nagios) gid=310(nagios) Gruppen=310(nagios),498(haclient)

This is a known issue that has been fixed in 1.1.10-rc1

>  
> I used Pacemaker 1.1.8-7.el6.x86_64
>  
> My CIB schema is configured for pacemaker-1.2
>  
>   
> enable acl is configured
>  
> crm configure show
>  
> property $id="cib-bootstrap-options" \
>   dc-version="1.1.8-7.el6-394e906" \
>   cluster-infrastructure="cman" \
> no-quorum-policy="ignore" \
> stonith-enabled="false" \
> enable-acl="true"
>  
> Greetings
>  

Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker at runtime

2013-04-22 Thread Andreas Mock

Hi Andrew,

thank you for that hint.

Best regards
Andreas Mock


-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Tuesday, 23 April 2013 01:45
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker
at runtime

Depending on your version of pacemaker you can do...

# Enable trace logging (if it isn't already)
killall -USR1 process_name

# Dump trace logging to disk
killall -TRAP process_name

# Find out what file it was dumped to
grep blackbox /var/log/messages

# Read it
qb-blackbox /path/to/file


Subsequent calls to "killall -TRAP ..." will have only logs since the last
dump.
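
If the lookup step is to be scripted, the grep can be done in a few lines of Python. This is just a sketch of `grep blackbox /var/log/messages`; blackbox_lines is a hypothetical helper name, and nothing is assumed about the format of the log line beyond it containing the word "blackbox".

```python
def blackbox_lines(log_path="/var/log/messages"):
    """Return all lines of log_path that mention 'blackbox'."""
    # errors="replace" keeps the scan going if the log holds odd bytes.
    with open(log_path, errors="replace") as log:
        return [line.rstrip("\n") for line in log if "blackbox" in line]


if __name__ == "__main__":
    # Print the candidates; the dump file name is read off by eye
    # and then passed to qb-blackbox.
    for line in blackbox_lines():
        print(line)
```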

On 23/04/2013, at 2:41 AM, Andreas Mock  wrote:

> Hi all,
> 
> is there a way to enable debug output on a cman, corosync, pacemaker 
> stack without restarting the whole cman stuff.
> 
> I've found  for cluster.conf, assuming that this 
> determines the value of the config-db-keys cluster.logging.debug=on 
> logging.debug=on
> 
> Is it enough to write new values to these keys?
> Or do I have to notify one or several processes to react on this change?
> 
> Best regards
> Andreas Mock
> 
> 
> 

Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker at runtime

2013-04-22 Thread Andreas Mock

Hi Michael, hi all others,

I have to admit that I was too stupid to interpret the man page
correctly (yes yes, there is a distinction between reading and
understanding). So, for the record, here is what I've done:

* edit cluster.conf, insert or change entries, and increment the
attribute config_version="XX" by one
* distribute cluster.conf via rsync or whatever to all nodes
when not using ricci
* issue a cman_tool -r -S version
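
The first step could be scripted roughly like this. It is a minimal sketch, assuming cluster.conf has the usual root element carrying the config_version attribute; bump_config_version is a hypothetical helper and not part of cman. Distributing the file and calling cman_tool still have to be done as described above.

```python
import xml.etree.ElementTree as ET


def bump_config_version(path):
    """Increment config_version on the root element and save the file.

    Returns the new version number.
    """
    tree = ET.parse(path)
    root = tree.getroot()
    new_version = int(root.get("config_version")) + 1
    root.set("config_version", str(new_version))
    # Note: the default parser drops XML comments, so they are lost on write.
    tree.write(path, encoding="UTF-8", xml_declaration=True)
    return new_version


if __name__ == "__main__":
    # Work on a copy first, then move it to /etc/cluster/cluster.conf.
    print(bump_config_version("cluster.conf"))
```

Editing the entries by hand and letting the script do only the version bump avoids forgetting the increment.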

 

That's it. A big thank you to Michael.

Best regards
Andreas

 

 

From: Andreas Mock [mailto:andreas.m...@web.de]
Sent: Monday, 22 April 2013 19:39
To: 'The Pacemaker cluster resource manager'
Subject: Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker
at runtime

 

Hi Michael,

this doesn't seem to have the desired effect.

Please enlighten me how to change the cluster.conf and
tell all participants to react to that change.

Best regards
Andreas Mock

 

 

From: Andreas Mock [mailto:andreas.m...@web.de]
Sent: Monday, 22 April 2013 19:11
To: 'The Pacemaker cluster resource manager'
Subject: Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker
at runtime

 

Hi Michael,

 

thank you. I'll have a look at it.

 

Best regards

Andreas Mock

 

 

From: Michael Schwartzkopff [mailto:mi...@clusterbau.com]
Sent: Monday, 22 April 2013 18:51
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker
at runtime

On Monday, 22 April 2013, 18:41:33, Andreas Mock wrote:

> Hi all,

> 

> is there a way to enable debug output on a cman, corosync, pacemaker

> stack without restarting the whole cman stuff.

> 

> I've found  for cluster.conf, assuming that this

> determines the value of the config-db-keys

> cluster.logging.debug=on

> logging.debug=on

> 

> Is it enough to write new values to these keys?

> Or do I have to notify one or several processes to react on this change?

> 

> Best regards

> Andreas Mock

 

cman_tool -r -S

 

For the details see man cman_tool.

 

-- 

Dr. Michael Schwartzkopff

Guardinistr. 63

81375 München

 

Tel: (0163) 172 50 98


Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker at runtime

2013-04-22 Thread Andreas Mock

Hi Michael,

this doesn't seem to have the desired effect.

Please enlighten me how to change the cluster.conf and
tell all participants to react to that change.

Best regards
Andreas Mock

 

 

From: Andreas Mock [mailto:andreas.m...@web.de]
Sent: Monday, 22 April 2013 19:11
To: 'The Pacemaker cluster resource manager'
Subject: Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker
at runtime

 

Hi Michael,

 

thank you. I'll have a look at it.

 

Best regards

Andreas Mock

 

 

From: Michael Schwartzkopff [mailto:mi...@clusterbau.com]
Sent: Monday, 22 April 2013 18:51
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker
at runtime

On Monday, 22 April 2013, 18:41:33, Andreas Mock wrote:

> Hi all,

> 

> is there a way to enable debug output on a cman, corosync, pacemaker

> stack without restarting the whole cman stuff.

> 

> I've found  for cluster.conf, assuming that this

> determines the value of the config-db-keys

> cluster.logging.debug=on

> logging.debug=on

> 

> Is it enough to write new values to these keys?

> Or do I have to notify one or several processes to react on this change?

> 

> Best regards

> Andreas Mock

 

cman_tool -r -S

 

For the details see man cman_tool.

 

-- 

Dr. Michael Schwartzkopff

Guardinistr. 63

81375 München

 

Tel: (0163) 172 50 98


Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker at runtime

2013-04-22 Thread Andreas Mock

Hi Michael,

 

thank you. I'll have a look at it.

 

Best regards

Andreas Mock

 

 

From: Michael Schwartzkopff [mailto:mi...@clusterbau.com]
Sent: Monday, 22 April 2013 18:51
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Enabling debugging with cman, corosync, pacemaker
at runtime

On Monday, 22 April 2013, 18:41:33, Andreas Mock wrote:

> Hi all,

> 

> is there a way to enable debug output on a cman, corosync, pacemaker

> stack without restarting the whole cman stuff.

> 

> I've found  for cluster.conf, assuming that this

> determines the value of the config-db-keys

> cluster.logging.debug=on

> logging.debug=on

> 

> Is it enough to write new values to these keys?

> Or do I have to notify one or several processes to react on this change?

> 

> Best regards

> Andreas Mock

 

cman_tool -r -S

 

For the details see man cman_tool.

 

-- 

Dr. Michael Schwartzkopff

Guardinistr. 63

81375 München

 

Tel: (0163) 172 50 98


[Pacemaker] Enabling debugging with cman, corosync, pacemaker at runtime

2013-04-22 Thread Andreas Mock

Hi all,

is there a way to enable debug output on a cman, corosync, pacemaker
stack without restarting the whole cman stuff.

I've found  for cluster.conf, assuming that this
determines the value of the config-db-keys
 cluster.logging.debug=on
 logging.debug=on

Is it enough to write new values to these keys?
Or do I have to notify one or several processes to react on this change?

Best regards
Andreas Mock




Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-20 Thread Andreas Mock

Hi Andrew,

is the bug fix in 1.1.9 for RHEL 6.4?
Do you have an idea when 1.1.10 will be released?

Best regards
Andreas Mock


-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Saturday, 20 April 2013 12:04
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion
issues


On 19/04/2013, at 11:28 AM, pavan tc  wrote:

> Yes, but looking at the code it should be impossible.
> Would it be possible for you to add:
> 
> export PCMK_trace_functions=peer_update_callback
> 
> to /etc/sysconfig/pacemaker and re-test (and send me the new logs -
probably in /var/log/pacemaker.log)?
> 
> 
> Sorry about the delay.
> 
> I have put these in place and am running tests now. The next time I hit
this, I'll post the messages.

Another user hit the same issue and was able to reproduce.
You can see the resolution at
https://bugzilla.redhat.com/show_bug.cgi?id=951340



Re: [Pacemaker] crmsh: location preference for ms-resource

2013-04-18 Thread Andreas Mock

Hi all,

thanks to the search capabilities provided by  gossamer-threads I
could find a solution provided by 'andreas at hastexo'. Thanks to him:

http://www.gossamer-threads.com/lists/linuxha/pacemaker/80964?search_string=crm%20master%20location;#80964

---
location avoid_being_the_master ms_MySQL \
rule $role=Master -1000: #uname eq my_node
location never_be_the_master ms_MySQL \
rule $role=Master -inf: #uname eq my_node
---

Nice evening
Andreas Mock


-----Original Message-----
From: Andreas Mock [mailto:andreas.m...@web.de]
Sent: Thursday, 18 April 2013 17:01
To: 'The Pacemaker cluster resource manager'
Subject: [Pacemaker] crmsh: location preference for ms-resource

Hi all,

can someone tell me how I can configure in crmsh a node preference for a
multistate resource in the promoted state, so that the master starts
preferably on a certain node?

Hints are very welcome.

(I have to admit that I couldn't puzzle it out with the crm online help,
I'm too stupid)

Best regards
Andreas Mock




[Pacemaker] crmsh: location preference for ms-resource

2013-04-18 Thread Andreas Mock

Hi all,

can someone tell me how I can configure in crmsh
a node preference for a multistate resource
in the promoted state, so that the master starts
preferably on a certain node?

Hints are very welcome.

(I have to admit that I couldn't puzzle it out
with the crm online help, I'm too stupid)

Best regards
Andreas Mock




Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-17 Thread Andreas Mock

Thank you for the links.

Best regards
Andreas Mock


-----Original Message-----
From: T. [mailto:nos...@godawa.de]
Sent: Wednesday, 17 April 2013 21:44
To: pacema...@clusterlabs.org
Subject: Re: [Pacemaker] pcs equivalent of crm configure erase

Hi,

> Can you please point me to a repository where I can find crmsh fitting 
> to RHEL6.4 or clones?
haven't looked if there is a repo-file, I just installed via RPM:

http://download.opensuse.org/repositories/network:/ha-clustering/CentOS_CentOS-6/x86_64/crmsh-1.2.5-55.3.x86_64.rpm

http://download.opensuse.org/repositories/network:/ha-clustering/CentOS_CentOS-6/x86_64/pssh-2.3.1-15.1.x86_64.rpm


--
To answer, please replace "invalid" with "de"!


Chau y hasta luego,

Thorolf



Re: [Pacemaker] CentOS 6.4 - pacemaker 1.1.8 - heartbeat

2013-04-17 Thread Andreas Mock

Hi Thorolf,

ah, ok. You meant heartbeat 1. Yes, that is really from pre-pacemaker times ;-)

Best regards
Andreas


-----Original Message-----
From: T. [mailto:nos...@godawa.de]
Sent: Wednesday, 17 April 2013 21:41
To: pacema...@clusterlabs.org
Subject: Re: [Pacemaker] CentOS 6.4 - pacemaker 1.1.8 - heartbeat

Hi,

> So, I don't understand why the usage of crm-shell in your case is more 
> complicated?
because in the "past", with heartbeat (1), which I was used to, I only had
to put my resources into a file and sync it to the other node.

For me this was easier to understand and I didn't have the config issues I
have now with the crm shell (see my other post).

But the new HA stack is much more flexible and modern than the old one I
had been using for the last 6 years or longer.
-- 

Chau y hasta luego,

Thorolf



Re: [Pacemaker] CentOS 6.4 - pacemaker 1.1.8 - heartbeat

2013-04-17 Thread Andreas Mock

Hi Thorolf,

both solutions heartbeat + pacemaker and corosync + pacemaker 
use pacemaker which can be configured using crm-shell.
So, I don't understand why the usage of crm-shell in your
case is more complicated? (besides the fact that you can only
make a two node cluster with heartbeat).


Best regards
Andreas Mock


-----Original Message-----
From: T. [mailto:nos...@godawa.de]
Sent: Wednesday, 17 April 2013 18:47
To: pacema...@clusterlabs.org
Subject: Re: [Pacemaker] CentOS 6.4 - pacemaker 1.1.8 - heartbeat

Hi,

> No one else using pacemaker and heartbeat on CentOS 6.4?
no, I switched to corosync/pacemaker, but it has not only advantages.

For me, the configuration is much more powerful, but also more complicated
via the crm-shell.
-- 

Chau y hasta luego,

Thorolf



Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-17 Thread Andreas Mock

Hi all,

thank you for your hints. 

Can you please point me to a repository where I can find
crmsh fitting to RHEL6.4 or clones?

Best regards
Andreas Mock



-----Original Message-----
From: Vadym Chepkov [mailto:vchep...@gmail.com]
Sent: Wednesday, 17 April 2013 18:13
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] pcs equivalent of crm configure erase


On Apr 17, 2013, at 11:57 AM, T. wrote:

> Hi,
> 
>> b) If I can't do it with pcs, is there a reliable and secure way to 
>> do it with pacemaker low level tools?
> why not just installing the crmsh from a different repository?
> 
> This is what I have done on CentOS 6.4.

My sentiments exactly. And "erase" is not the most important missing
functionality.
crm configure save and crm configure load (update | replace) are what made
configurations easily manageable and trackable with version control
software.

Cheers,
Vadym





Re: [Pacemaker] pcs: Return code handling not clean

2013-04-16 Thread Andreas Mock

Hi Chris,

just seen in the github repo - which I found after posting here -
that you made a fix.

Thank you for the very fast reaction.

Best regards
Andreas

-Original Message-
From: Chris Feist [mailto:cfe...@redhat.com]
Sent: Wednesday, 17 April 2013 00:34
To: The Pacemaker cluster resource manager; Andreas Mock
Subject: Re: [Pacemaker] pcs: Return code handling not clean

On 04/16/13 06:46, Andreas Mock wrote:
> Hi all,
>
> as I don't really know where to address this issue, I am posting it 
> here. On the one hand as information for people scripting with 
> the help of 'pcs', and on the other hand in the hope that a 
> maintainer is listening and will have a look at this.
>
> Problem: When the cluster is down, 'pcs resource'
> shows an error message coming from a subprocess call of 'crm_resource 
> -L' but exits with an error code of 0. That's something which can be 
> improved, especially since the Python code does have error handling in 
> other places.
>
> So I guess it is a simple oversight.
>
> Look at the following piece of code in
> pcs/resource.py (lines 915-922):
>
>     if len(argv) == 0:
>         args = ["crm_resource","-L"]
>         output,retval = utils.run(args)
>         preg = re.compile(r'.*(stonith:.*)')
>         for line in output.split('\n'):
>             if not preg.match(line) and line != "":
>                 print line
>         return
>
> retval is totally ignored, while being handled in other places. That 
> leads to the fact that the script returns with status 0.

This is an oversight on my part, I've updated the code to check retval and
return an error.  Currently I'm not passing through the full error code (I'm
only returning 0 on success and 1 on failure).  However, if you think it
would be useful to have this information I would be happy to look at it and
see what I can do.  I'm planning on eventually having pcs interpret the
crm_resource error code and provide a more user friendly output instead of
just a return code.
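
A minimal sketch of the kind of fix described above, written in Python 3 rather than the Python 2 of the quoted snippet. The names `list_resources` and the injectable `run` callable are hypothetical and are not the actual pcs code; the only assumption is that `run` returns an `(output, retval)` pair in the same way `utils.run` does in the quoted snippet.

```python
import re
import sys

# Matches the stonith lines that the quoted snippet filters out.
STONITH_RE = re.compile(r'.*(stonith:.*)')


def list_resources(run, out=sys.stdout, err=sys.stderr):
    """Print non-stonith resource lines; return 0 on success, 1 on failure.

    `run` is a hypothetical callable returning (output, retval), standing in
    for a call such as utils.run(["crm_resource", "-L"]).
    """
    output, retval = run(["crm_resource", "-L"])
    if retval != 0:
        # Pass the tool's message on and signal failure to the caller.
        err.write(output)
        return 1
    for line in output.split('\n'):
        if line != "" and not STONITH_RE.match(line):
            out.write(line + '\n')
    return 0
```

A caller could then finish with something like `sys.exit(list_resources(utils.run))`, so that shell scripts can rely on the exit status.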

Thanks,
Chris

>
> Interestingly, the error handling of the utils.run call used all over 
> the module is IMHO a little bit inconsistent.
> If I remember correctly, Andrew made some efforts in the past to have a 
> set of return codes coming from the base cibXXX and crm_XXX tools. (I 
> really don't know how much they are differentiated.) Why not pass them 
> through?
>
> Best regards
> Andreas Mock
>
>
>
>

Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-16 Thread Andreas Mock

Hi Chris,

I would like to see something that lets you start your
pacemaker configuration (only) from scratch,
in such a way that you know nothing is left (constraints, etc.).

Best regards
Andreas

-Original Message-
From: Chris Feist [mailto:cfe...@redhat.com]
Sent: Wednesday, 17 April 2013 00:23
To: The Pacemaker cluster resource manager
Cc: Andreas Mock
Subject: Re: [Pacemaker] pcs equivalent of crm configure erase

On 04/14/13 02:52, Andreas Mock wrote:
> Hi all,
>
> can someone tell me what the pcs equivalent to
>
> crm configure erase is?

From my understanding, 'crm configure erase' will remove everything from
the configuration file except for the nodes.

Are you trying to clear your configuration out and start from scratch?

pcs has a destroy command (pcs cluster destroy), which will remove all
pacemaker/corosync configuration and allow you to create your cluster from
scratch.  Is this what you're looking for?

Or do you need a specific command to keep the cluster running, but reset the
cib to its defaults?

Thanks!
Chris

>
> Is there a pcs cheat sheet showing the common tasks?
>
> Or a documentation?
>
> Best regards
>
> Andreas
>
>
>

Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-16 Thread Andreas Mock

Hi Rastislav,

thank you for your hints.

In this case, to rely only on pcs, I could
probably use the following to get the list
of resources:

pcs resource show --all | perl -M5.010 -ane 'say $F[1] if $F[0] eq "Resource:"'
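
For readers who prefer Python to the perl one-liner, the same field filtering can be sketched as below. The function name `resource_ids` is hypothetical, and the exact layout of the `pcs resource show --all` output is an assumption taken only from the one-liner itself (lines whose first field is `Resource:` and whose second field is the id).

```python
def resource_ids(text):
    """Return the second whitespace-separated field of every line whose
    first field is exactly "Resource:".

    Lines that have no second field are skipped.
    """
    ids = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "Resource:":
            ids.append(fields[1])
    return ids
```

The text passed in would be the captured output of the pcs command, for example read from standard input.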

Best regards
Andreas Mock



-Original Message-
From: Rasto Levrinc [mailto:rasto.levr...@gmail.com]
Sent: Tuesday, 16 April 2013 10:45
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] pcs equivalent of crm configure erase

On Tue, Apr 16, 2013 at 9:38 AM, Andreas Mock  wrote:
> Hi all,
>
> I try to bring that topic up once again because
> it's still unresolved for me:
>
> a) How can I do the equivalent of 'crm configure erase'
> in pcs? Is there a way?
>
> b) If I can't do it with pcs, is there a reliable
> and secure way to do it with pacemaker low level tools?

I don't think so. cibadmin has a drastic version of erase, but this is
probably not what you want. If you don't want to use any higher level
tools, the best way is probably to make a loop and use pcs to remove the
resources, since it also removes the constraints; I'm not sure about other
objects.

something like:

for r in `crm_resource -l`; do pcs resource delete $r; done

But test it first, I haven't used pcs myself yet.
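
The same loop can be sketched in Python with a dry-run mode, so the commands can be inspected before anything is deleted. The function names and the injectable `run` callable (taking an argv list and returning `(output, retval)`) are hypothetical; only `crm_resource -l` and `pcs resource delete` come from the shell loop above, and the caveat to test it first applies here as well.

```python
def delete_commands(resource_names):
    """Build one 'pcs resource delete <name>' argv list per resource name."""
    return [["pcs", "resource", "delete", name] for name in resource_names]


def erase_resources(run, dry_run=True):
    """List resources with 'crm_resource -l' and delete each one via pcs.

    `run` is a hypothetical callable taking an argv list and returning
    (output, retval). With dry_run=True nothing is deleted; the commands
    that would be executed are only returned.
    """
    output, retval = run(["crm_resource", "-l"])
    if retval != 0:
        raise RuntimeError("crm_resource -l failed: " + output.strip())
    commands = delete_commands(output.split())
    if not dry_run:
        for argv in commands:
            out, rc = run(argv)
            if rc != 0:
                raise RuntimeError(" ".join(argv) + " failed: " + out.strip())
    return commands
```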

Rasto

-- 
Dipl.-Ing. Rastislav Levrinc
rasto.levr...@gmail.com
Linux Cluster Management Console
http://lcmc.sf.net/


Re: [Pacemaker] Pacemaker configuration with different dependencies

2013-04-16 Thread Andreas Mock

Hi Ivor,

I don't know whether I understand you completely right:
If you want independence of resources, don't put them into a group.

Look at
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Pacemaker_Explained/ch10.html

A group is made to tie together several resources without
declaring all necessary colocations and orderings to get
a desired behaviour.

Otherwise, name your resources and how they should be spread across
your cluster. (Show the technical dependency.)

Best regards
Andreas

 

 

From: Ivor Prebeg [mailto:ivor.pre...@gmail.com]
Sent: Tuesday, 16 April 2013 13:53
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] Pacemaker configuration with different dependencies

 

Hi guys,

I need some help with pacemaker configuration; it is all new to me and I
can't find a solution...

I have a two-node HA environment with services that I want to be partially
independent, in a pacemaker/heartbeat configuration.

There is an active/active sip service with two floating IPs; it should just
migrate the floating IP when one sip instance dies.

There are also two active/active master/slave services with a java container
and an rdbms with replication between them, which should also fall back when
one dies.

What I can't figure out is how to configure those two to be independent (put
an on-fail directive on the group). What I want is, e.g., in case my sip
service fails, for the java container to stay active on that node, but the
floating IP to be moved to the other node.

Another thing is, in case one of the rdbms instances fails, I want to put the
whole service group on that node to standby, but leave the sip service intact.

The whole node should go to standby (all services down) only when L3_ping to
the gateway dies.

 

All suggestions and configuration examples are welcome.

Thanks in advance.

 

Ivor Prebeg

 

 


[Pacemaker] pcs: Return code handling not clean

2013-04-16 Thread Andreas Mock

Hi all,

as I don't really know where to address this
issue, I am posting it here. On the one hand
as information for people scripting with the
help of 'pcs', and on the other hand in
the hope that a maintainer is listening
and will have a look at this.

Problem: When the cluster is down, 'pcs resource'
shows an error message coming from a subprocess
call of 'crm_resource -L' but exits with an
error code of 0. That's something which can
be improved, especially since the Python code
does have error handling in other places.

So I guess it is a simple oversight.

Look at the following piece of code in
pcs/resource.py (lines 915-922):

    if len(argv) == 0:
        args = ["crm_resource","-L"]
        output,retval = utils.run(args)
        preg = re.compile(r'.*(stonith:.*)')
        for line in output.split('\n'):
            if not preg.match(line) and line != "":
                print line
        return

retval is totally ignored, while being handled in
other places. That leads to the fact that the script
returns with status 0.

Interestingly, the error handling of the utils.run call
used all over the module is IMHO a little bit inconsistent.
If I remember correctly, Andrew made some efforts in the
past to have a set of return codes coming from the
base cibXXX and crm_XXX tools. (I really don't know
how much they are differentiated.) Why not pass them
through?

Best regards
Andreas Mock





Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-16 Thread Andreas Mock

Hi all,

I try to bring that topic up once again because
it's still unresolved for me:

a) How can I do the equivalent of 'crm configure erase'
in pcs? Is there a way?

b) If I can't do it with pcs, is there a reliable
and secure way to do it with pacemaker low level tools?

Thank you in advance.

Best regards
Andreas


-Original Message-
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Monday, 15 April 2013 05:49
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] pcs equivalent of crm configure erase


On 14/04/2013, at 5:52 PM, Andreas Mock  wrote:

> Hi all,
>  
> can someone tell me what the pcs equivalent to
> crm configure erase is?
>  
> Is there a pcs cheat sheet showing the common tasks?
> Or a documentation?

"pcs help" should be reasonably informative, but I don't see anything
equivalent
Chris?

>  
> Best regards
> Andreas
>  

Re: [Pacemaker] RHEL6.x dependency between 2-node-settings for cman and quorum settings in pacemaker

2013-04-15 Thread Andreas Mock

Hi Andrew,

that means (if I understand it right) that with these settings
you get two different semantics about what the cluster knows
about itself.

With the settings a+c as recommended by you, the 2-node cluster
does not get quorum when only one node survives, but ignores
that information.

With the setting b), the cluster does get quorum even when
only one node is left. In this case I need not set c), as pacemaker
believes it has quorum (told by cman).

Is this right?

Best regards
Andreas

-Original Message-
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Monday, 15 April 2013 05:58
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] RHEL6.x dependency between 2-node-settings for cman
and quorum settings in pacemaker

On 12/04/2013, at 4:58 PM, Andreas Mock  wrote:

> Hi all,
> 
> another question came up while reading documentation concerning 
> 2-node clusters under RHEL6.x with CMAN and pacemaker.
> 
> a) In the quick start guide one of the things you set is
> CMAN_QUORUM_TIMEOUT=0 in /etc/sysconfig/cman to get one node of the 
> cluster up without waiting for quorum. (Correct me if my understanding 
> is wrong)
> 
> b) There is a special setting in cluster.conf
>   expected_votes="1"
> which allows one node to gain quorum in a two node cluster (Please
> also correct me here if my understanding is wrong)
> 
> c) And there is a pacemaker setting
> no-quorum-policy which is mostly set to 'ignore' in all startup 
> tutorials.
> 
> My question: I would like to understand how these settings influence 
> each other and/or are dependent.

a) allows "service cman start" to complete (and therefore allows "service
pacemaker start" to begin) before quorum has arrived.
b) is a possible alternative to a) but I've never tested it because it is
superseded by c) and in fact makes c) meaningless since the cluster always
has quorum.

a+c is preferred for consistency with clusters of more than 2 nodes.
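
To make the interaction of the three settings concrete, here is a deliberately simplified Python model. It is not how cman or pacemaker are implemented: the function names are hypothetical, the majority rule is the usual "more than half of the expected votes" assumption, and only the difference between no-quorum-policy 'ignore' and any other policy is modeled.

```python
def cman_quorate(live_votes, expected_votes, two_node=False):
    """Simplified quorum rule: quorate when the live votes reach
    expected_votes // 2 + 1. In the two-node special case (setting b),
    a single vote is enough."""
    if two_node:
        return live_votes >= 1
    return live_votes >= expected_votes // 2 + 1


def pacemaker_acts_as_if_quorate(quorate, no_quorum_policy="stop"):
    """Pacemaker carries on normally if the partition is quorate, or if
    no-quorum-policy is 'ignore' (setting c). Other policies are not
    modeled here."""
    return quorate or no_quorum_policy == "ignore"


# One surviving node out of two.
# a+c: cman reports no quorum, but pacemaker is told to ignore that.
a_c = pacemaker_acts_as_if_quorate(cman_quorate(1, 2), "ignore")
# b: cman itself reports quorum, so the policy no longer matters.
b = pacemaker_acts_as_if_quorate(cman_quorate(1, 1, two_node=True), "stop")
```

In this model both variants keep the surviving node working; the difference is only in whether the cluster believes it has quorum, which is why c) becomes meaningless under b).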

> 
> As much insight as possible is appreciated. ;-)
> 
> Best regards
> Andreas
> 
> 
> 

Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-14 Thread Andreas Mock

Hi Andrew,

the emphasis lies on 'reasonably'...  ;-)

I'll see whether someone can show hints.

Best regards
Andreas


-Original Message-
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Monday, 15 April 2013 05:49
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] pcs equivalent of crm configure erase


On 14/04/2013, at 5:52 PM, Andreas Mock  wrote:

> Hi all,
>  
> can someone tell me what the pcs equivalent to crm configure erase is?
>  
> Is there a pcs cheat sheet showing the common tasks?
> Or a documentation?

"pcs help" should be reasonably informative, but I don't see anything
equivalent Chris?

>  
> Best regards
> Andreas
>  

Re: [Pacemaker] Disable startup fencing with cman

2013-04-14 Thread Andreas Mock

Hi Andrew,

thank you for your answers (to all of my questions).

My problem is that I have both nodes down. Now I have to
start one node without the other, and I know that
the cluster is configured to use stonith. How do I change
the meta attribute of the stonith device without
first starting that node, and therefore pacemaker,
to make the mentioned change?

Best regards
Andreas


-Original Message-
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Monday, 15 April 2013 02:09
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Disable startup fencing with cman


On 14/04/2013, at 6:47 PM, Andreas Mock  wrote:

> Hi all,
>  
> in a two node cluster (RHEL6.x, cman, pacemaker) when I startup the 
> very first node, this node will try to fence the other node if it 
> can't see it.
> This can be true in case of maintenance. How do I avoid this startup 
> fencing temporarily when I know that the other node is down?

Set the target-role for your fencing device(s) to Stopped and use
stonith_admin --confirm

[Pacemaker] Disable startup fencing with cman

2013-04-14 Thread Andreas Mock

Hi all,

 

in a two node cluster (RHEL6.x, cman, pacemaker),
when I start up the very first node,
this node will try to fence the other node if it can't see it.
This can be true in case of maintenance. How do I avoid
this startup fencing temporarily when I know that the
other node is down?

 

Best regards

Andreas

 


[Pacemaker] pcs equivalent of crm configure erase

2013-04-14 Thread Andreas Mock

Hi all,

 

can someone tell me what the pcs equivalent to
crm configure erase is?

Is there a pcs cheat sheet showing the common tasks?
Or any documentation?

 

Best regards

Andreas

 


[Pacemaker] RHEL6.x dependency between 2-node-settings for cman and quorum settings in pacemaker

2013-04-12 Thread Andreas Mock

Hi all,

another question came up while reading documentation concerning
2-node clusters under RHEL6.x with CMAN and pacemaker.

a) In the quick start guide one of the things you set is
CMAN_QUORUM_TIMEOUT=0 in /etc/sysconfig/cman to get one
node of the cluster up without waiting for quorum. (Correct
me if my understanding is wrong)

b) There is a special setting in cluster.conf
  
  
which allows one node to gain quorum in a two node cluster
(Please also correct me here if my understanding is wrong)

c) And there is a pacemaker setting
no-quorum-policy which is mostly set to 'ignore' in all startup
tutorials.

My question: I would like to understand how these settings
influence each other and/or are dependent.

As much insight as possible is appreciated. ;-)

Best regards
Andreas




Re: [Pacemaker] RHEL6 and clones: CMAN needed anyway?

2013-04-09 Thread Andreas Mock

Hi Andrew,

once again thank you for the fast response.

My English seems to be not good enough. Therefore I
would like to recap your answer in my own words to be sure
what you meant.

a) CMAN will die. In the long term there will be
corosync and pacemaker. That means option 3 of
this document
(http://theclusterguy.clusterlabs.org/post/34604901720/pacemaker-and-cluster-filesystems)
is the target architecture.

b) As I haven't tested it yet, I assume there will be
an ERROR message when starting CMAN in addition to
corosync and pacemaker. Is that what you mean?

Best regards
Andreas


-Original Message-
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Tuesday, 9 April 2013 12:48
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] RHEL6 and clones: CMAN needed anyway?


On 09/04/2013, at 8:07 PM, "Andreas Mock"  wrote:

> Hi all,
> 
> after reading several docs on clusterlabs.org and trying to
> understand how all pieces fit together, there is one question
> remaining (you know: I understate ;-)):
> 
> If I don't want to use any cluster-FS do I really need CMAN
> on RHEL6.x and clones or is it enough to let corosync and
> pacemaker play together?

I wouldn't rely on other options continuing to be available in the
long-term.
There should already be a large ERROR to this effect when the plugin starts.

> 
> Is fencing and fencing agents independent of CMAN?

Correct, in both cases the Pacemaker equivalents are used

> 
> Best regards
> Andreas
> 
> 
> 

Re: [Pacemaker] Pacemaker 1.1.9 for RHEL 6.x and clones

2013-04-09 Thread Andreas Mock

Thank you.

Andreas

-Original Message-
From: Alexandr A. Alexandrov [mailto:shurr...@gmail.com]
Sent: Tuesday, 9 April 2013 10:26
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Pacemaker 1.1.9 for RHEL 6.x and clones

Hi Andreas!

For this purpose I put resources into 'unmanaged' state with 'crm 
resource unmanage ' - and after that you can start/stop 
pacemaker/corosync without interrupting running resources.

09.04.2013 11:44, Andreas Mock wrote:
> What would be the right procedure to restart pacemaker
> freeing lost memory without interrupting cluster operation?



[Pacemaker] RHEL6 and clones: CMAN needed anyway?

2013-04-09 Thread Andreas Mock

Hi all,

after reading several docs on clusterlabs.org and trying to
understand how all pieces fit together, there is one question
remaining (you know: I understate ;-)):

If I don't want to use any cluster-FS do I really need CMAN
on RHEL6.x and clones or is it enough to let corosync and
pacemaker play together?

Is fencing and fencing agents independent of CMAN?

Best regards
Andreas



