Re: [Pacemaker] Anticolocation problem

2012-02-02 Thread Dimokritos Stamatakis
thanks for your answer,

with cibadmin I get:

Signon to CIB failed: connection failed
Init failed, could not perform requested operations

My /var/lib/heartbeat/crm directory is empty; is that OK?
Also, should I check some groups? In /etc/group I can see: haclient:x:105
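
For comparison, I'd expect the hacluster user to exist and to have haclient as
its group, roughly like this (the numeric IDs below are only an example):

$ id hacluster
uid=106(hacluster) gid=105(haclient) groups=105(haclient)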

Many thanks,
Dimos.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Anticolocation problem

2012-02-02 Thread Andrew Beekhof
The latest code appears to behave OK, so perhaps the problem has since been fixed.
Can you send me the output from cibadmin -Ql when the cluster is in
this state so I can confirm?

On Mon, Jan 30, 2012 at 11:38 PM, agutxi Agustin  wrote:
> Hi guys,
> I'm trying to set up some anticolocation rules, but I'm seeing some
> strange behaviour and not getting the desired effect, so I wonder if
> I'm missing something or there is really some problem with my
> settings. If you could lend me a hand, that would be great.
>
> The scenario: 3 Dummy resources running based on utilization (1 core
> for each resource running) on 2 nodes, each with 2 cores capacity.
> Plus anticolocation rules: no two resources can run on the same node (I
> know that in this case I could enforce this with utilization alone, but this
> is just a test case reduced from a bigger scenario where I detected the problem).
> Configuration:
> ___
> crm(live)# configure show
> node vmHost1 \
>utilization cores="2"
> node vmHost2 \
>utilization cores="2"
> primitive DummyVM1 ocf:pacemaker:Dummy \
>op monitor interval="60s" timeout="60s" \
>op start on-fail="restart" interval="0" \
>op stop on-fail="ignore" interval="0" \
>utilization cores="1" \
>meta is-managed="true" migration-threshold="2" target-role="Started"
> primitive DummyVM2 ocf:pacemaker:Dummy \
>op monitor interval="60s" timeout="60s" \
>op start on-fail="restart" interval="0" \
>op stop on-fail="ignore" interval="0" \
>utilization cores="1" \
>meta is-managed="true" migration-threshold="2" target-role="Started"
> primitive DummyVM3 ocf:pacemaker:Dummy \
>op monitor interval="60s" timeout="60s" \
>op start on-fail="restart" interval="0" \
>op stop on-fail="ignore" interval="0" \
>utilization cores="1" \
>meta is-managed="true" migration-threshold="2" target-role="Stopped"
> colocation antidummy12 -INF: DummyVM1 DummyVM2
> colocation antidummy13 -INF: DummyVM1 DummyVM3
> colocation antidummy23 -INF: DummyVM2 DummyVM3
> property $id="cib-bootstrap-options" \
>dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>cluster-infrastructure="openais" \
>expected-quorum-votes="2" \
>stonith-enabled="false" \
>stop-all-resources="false" \
>placement-strategy="utilization" \
>no-quorum-policy="ignore" \
>cluster-infrastructure="openais" \
>stop-orphan-resources="true" \
>stop-orphan-actions="true" \
>symmetric-cluster="true" \
>last-lrm-refresh="1326975274"
> rsc_defaults $id="rsc-options" \
>resource-stickiness="INFINITY"
> ___
>
> Looking around for information on symmetric anti-colocation, I found a
> message where Andrew Beekhof stated:
>
> colocation X-Y -2: X Y
> colocation Y-X -2: Y X
>
>>>> the second one is implied by the first and is therefore redundant

>>> If only that were true!
>>>
>>
>> It is. I know exactly how my code works in this regard.
>> More than likely a score of -2 is simply too low to have any effect.
>
> so I was expecting my resources to prevent another resource from
> running on the same node.
>
> Test: I start 2 resources, DummyVM1 & DummyVM2: they correctly start
> on vmHost1 and vmHost2, as expected (I don't care about location)
> ___
> crm(live)# status
> 
> Last updated: Mon Jan 30 13:33:19 2012
> Last change: Mon Jan 30 13:30:21 2012 via cibadmin on vmHost1
> Current DC: vmHost2 - partition with quorum
> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
> 2 Nodes configured, 2 expected votes
> 3 Resources configured.
> 
>
> Online: [ vmHost1 vmHost2 ]
>
>  DummyVM1   (ocf::pacemaker:Dummy): Started vmHost1
>  DummyVM2   (ocf::pacemaker:Dummy): Started vmHost2
> ___
>
> Then, I start the DummyVM3 resource:
>
> ___
> crm(live)# resource start DummyVM3
> crm(live)# status
> 
> Last updated: Mon Jan 30 13:33:52 2012
> Last change: Mon Jan 30 13:33:50 2012 via cibadmin on vmHost1
> Current DC: vmHost2 - partition with quorum
> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
> 2 Nodes configured, 2 expected votes
> 3 Resources configured.
> 
>
> Online: [ vmHost1 vmHost2 ]
>
>  DummyVM1   (ocf::pacemaker:Dummy): Started vmHost1
>  DummyVM2   (ocf::pacemaker:Dummy): Started vmHost2
>  DummyVM3   (ocf::pacemaker:Dummy): Started vmHost1
> ___
>
> and immediately DummyVM3 is started on vmHost1, though from my
> underst

[Pacemaker] pacemaker with heartbeat problem

2012-02-02 Thread Dimokritos Stamatakis
Hello,
I just installed heartbeat and pacemaker via apt-get in a Debian VM, as described
here:
http://www.clusterlabs.org/wiki/Debian_Lenny_HowTo
and then I set up the /etc/ha.d/ha.cf file and /etc/ha.d/authkeys as they
describe.

So I am able to start heartbeat, and I can see that it is running fine.
But I have a problem with crm.
When I try crm_mon -1 to check the connection, it says "Connection to cluster
failed: connection failed".
All the crm commands fail with a connection failure message.

My /etc/ha.d/ha.cf file is:

autojoin none
bcast eth0
warntime 3
deadtime 6
initdead 60
keepalive 1
node debian-node1
node debian-node2
node debian-node3
crm respawn


and the authkeys is:
auth 1
1 sha1 43e9e99f10efc4f9f5d45eb20feb85e6

and it's the same on all nodes.
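
For reference, one common way to generate such a key is something like the
following sketch (any shared secret works; the file must be identical on every
node and readable only by root):

KEY=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | awk '{print $1}')
printf 'auth 1\n1 sha1 %s\n' "$KEY" > /etc/ha.d/authkeys
chmod 600 /etc/ha.d/authkeys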

What else shall I check?

Many thanks,
Dimos.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] migrate resource for specified time: problem

2012-02-02 Thread Andrew Beekhof
On Tue, Jan 31, 2012 at 8:42 PM,   wrote:
> Hi All,
>   I am facing an issue with resource migration for a specified time.
>   If the current system date is the end of the month, then migration for 1440
> minutes (one day) has a problem.
>
>   I am doing these operations:
>   1. date +%MMdd -s "20100131"
>   2. /usr/sbin/crm_resource -M -r group1 -u PT1440M -H nodeName
>
>   Output:
>   migration will take effect untill: 2011-13-29 18:33:47Z

WTF.

What version is this?  Can you create a bug for this please?

>
>   Here the command output contains an invalid date, and a migration constraint
> with that invalid date is added to cib.xml.
>
>   The resource is also not migrated to the specified node.
>
> Regards
> Manish
>
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Need help with resolving very long election cycle

2012-02-02 Thread Andrew Beekhof
On Fri, Feb 3, 2012 at 9:31 AM, Andrew Beekhof  wrote:
> On Thu, Feb 2, 2012 at 9:55 PM, Shyam  wrote:
>> Hi Andrew,
>>
>> Here are more logs covering a larger period that show multiple occurrences of
>> this election cycle. Please note that in the case below I had set dc-deadtime to
>> 5 secs, and the I_DC_TIMEOUT pops up every 5 secs. I raised dc-deadtime to
>> 10 secs and the long election cycle problem disappeared. It no longer happens.
>> I suspect that before a single election cycle completes, the next
>> I_DC_TIMEOUT kicks in. Could this be the reason?
>
> Yes.  The question is why the cycle is taking so long :-/

Could you reproduce with debug on please?
It would be nice to know what the cluster is doing for the 4 seconds
between these two messages:

Jan 17 12:00:04 vsa-003ca-vc-0 crmd: [1120]: WARN:
start_subsystem: Client pengine already running as pid 4243
Jan 17 12:00:08 vsa-003ca-vc-0 crmd: [1120]: info: do_dc_takeover:
Taking over DC status for this partition

What version of pacemaker is this btw?
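
If it helps, on the heartbeat stack one way to get more verbose logs is the
debug directive in ha.cf, roughly like this (a sketch; it assumes this cluster
runs on heartbeat and that heartbeat is restarted afterwards):

# /etc/ha.d/ha.cf
debug 1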

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Need help with resolving very long election cycle

2012-02-02 Thread Andrew Beekhof
On Thu, Feb 2, 2012 at 9:55 PM, Shyam  wrote:
> Hi Andrew,
>
> Here are more logs covering a larger period that show multiple occurrences of
> this election cycle. Please note that in the case below I had set dc-deadtime to
> 5 secs, and the I_DC_TIMEOUT pops up every 5 secs. I raised dc-deadtime to
> 10 secs and the long election cycle problem disappeared. It no longer happens.
> I suspect that before a single election cycle completes, the next
> I_DC_TIMEOUT kicks in. Could this be the reason?

Yes.  The question is why the cycle is taking so long :-/

>
> On node 1:
> Jan 17 12:00:02 vsa-003ca-vc-0 crmd: [1120]: info:
> do_election_count_vote: Election 3 (owner:
> 0970-0970-0001-2b91-0001) pass: vote from vsa-003ca-vc-1
> (Age)
> Jan 17 12:00:02 vsa-003ca-vc-0 crmd: [1120]: info: do_state_transition:
> State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION
> cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> Jan 17 12:00:02 vsa-003ca-vc-0 crmd: [1120]: WARN: do_log: FSA: Input
> I_JOIN_OFFER from route_message() received in state S_ELECTION
> Jan 17 12:00:02 vsa-003ca-vc-0 crmd: [1120]: WARN: do_log: FSA: Input
> I_JOIN_OFFER from route_message() received in state S_ELECTION
> Jan 17 12:00:03 vsa-003ca-vc-0 crmd: [1120]: info:
> do_election_count_vote: Election 4 (owner:
> 0970-0970-0001-2b91-0001) pass: vote from vsa-003ca-vc-1
> (Age)
> Jan 17 12:00:03 vsa-003ca-vc-0 crmd: [1120]: info:
> do_election_count_vote: Election 5 (owner:
> 0970-0970-0001-2b91-0001) pass: vote from vsa-003ca-vc-1
> (Age)
> Jan 17 12:00:04 vsa-003ca-vc-0 crmd: [1120]: info: do_state_transition:
> State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
> cause=C_FSA_INTERNAL origin=do_election_check ]
> Jan 17 12:00:04 vsa-003ca-vc-0 crmd: [1120]: info: start_subsystem:
> Starting sub-system "pengine"
> Jan 17 12:00:04 vsa-003ca-vc-0 crmd: [1120]: WARN: start_subsystem:
> Client pengine already running as pid 4243
> Jan 17 12:00:08 vsa-003ca-vc-0 crmd: [1120]: info: do_dc_takeover:
> Taking over DC status for this partition
> Jan 17 12:00:08 vsa-003ca-vc-0 cib: [1116]: info: cib_process_readwrite:
> We are now in R/O mode
> Jan 17 12:00:08 vsa-003ca-vc-0 cib: [1116]: info: cib_process_request:
> Operation complete: op cib_slave_all for section 'all'
> (origin=local/crmd/108, version=1.8.26): ok (rc=0)
> Jan 17 12:00:08 vsa-003ca-vc-0 cib: [1116]: info: cib_process_readwrite:
> We are now in R/W mode
> Jan 17 12:00:08 vsa-003ca-vc-0 cib: [1116]: info: cib_process_request:
> Operation complete: op cib_master for section 'all' (origin=local/crmd/109,
> version=1.8.26): ok (rc=0)
> Jan 17 12:00:08 vsa-003ca-vc-0 cib: [1116]: info: cib_process_request:
> Operation complete: op cib_modify for section cib (origin=local/crmd/110,
> version=1.8.26): ok (rc=0)
> Jan 17 12:00:08 vsa-003ca-vc-0 cib: [1116]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_config
> (origin=local/crmd/112, version=1.8.26): ok (rc=0)
> Jan 17 12:00:08 vsa-003ca-vc-0 crmd: [1120]: info: do_dc_join_offer_all:
> join-5: Waiting on 2 outstanding join acks
> Jan 17 12:00:08 vsa-003ca-vc-0 cib: [1116]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_config
> (origin=local/crmd/114, version=1.8.26): ok (rc=0)
> Jan 17 12:00:08 vsa-003ca-vc-0 crmd: [1120]: info:
> config_query_callback: Checking for expired actions every 90ms
> Jan 17 12:00:09 vsa-003ca-vc-0 crmd: [1120]: info:
> do_election_count_vote: Election 6 (owner:
> 0970-0970-0001-2b91-0001) pass: vote from vsa-003ca-vc-1
> (Age)
> Jan 17 12:00:09 vsa-003ca-vc-0 crmd: [1120]: info: update_dc: Set DC to
> vsa-003ca-vc-0 (3.0.1)
> Jan 17 12:00:09 vsa-003ca-vc-0 crmd: [1120]: info: do_state_transition:
> State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION
> cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> Jan 17 12:00:09 vsa-003ca-vc-0 crmd: [1120]: info: update_dc: Unset DC
> vsa-003ca-vc-0
> Jan 17 12:00:09 vsa-003ca-vc-0 crmd: [1120]: info:
> do_election_count_vote: Election 7 (owner:
> 0970-0970-0001-2b91-0001) pass: vote from vsa-003ca-vc-1
> (Age)
> Jan 17 12:00:09 vsa-003ca-vc-0 crmd: [1120]: WARN: do_log: FSA: Input
> I_JOIN_REQUEST from route_message() received in state S_ELECTION
> Jan 17 12:00:10 vsa-003ca-vc-0 crmd: [1120]: info: do_state_transition:
> State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
> cause=C_FSA_INTERNAL origin=do_election_check ]
> Jan 17 12:00:10 vsa-003ca-vc-0 crmd: [1120]: info: start_subsystem:
> Starting sub-system "pengine"
> Jan 17 12:00:10 vsa-003ca-vc-0 crmd: [1120]: WARN: start_subsystem:
> Client pengine already running as pid 4243
> Jan 17 12:00:14 vsa-003ca-vc-0 crmd: [1120]: info: do_dc_takeover:
> Taking over DC status for this partition
> Jan 17 12:00:14 v

Re: [Pacemaker] Adding 100 Resources Locks Cluster for Several Minutes

2012-02-02 Thread Andrew Beekhof
On Mon, Jan 30, 2012 at 11:41 AM, Gruen, Wolfgang  wrote:
>
>
> *** Adding 100 Resources Locks Cluster for Several Minutes
>
> Adding 100 resources to the cluster causes the cib process to jump to 100%
> when viewed with the "top" command (all nodes), and the cluster becomes
> unresponsive to commands like "crm status" or "cibadmin -Q" for several
> minutes.

The cluster is working as hard as it can to clear the thousands of CIB
updates that result from adding that many resources.

Operations = R*N + 2*R, R=#resources, N=#nodes

For 300 resources, 15 nodes and your measurement of 17 minutes, that's
about 0.2s per operation.
Which isn't /horrible/ given the amount of work involved in each
operation.  No doubt we can do better.
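
As a rough sketch of that arithmetic, using the figures from this thread:

R=300; N=15                            # resources, nodes
OPS=$(( R * N + 2 * R ))               # R*N + 2*R = 5100 CIB operations
echo "scale=2; (17 * 60) / $OPS" | bc  # 1020s / 5100 ops, roughly 0.2s each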

Have you tried tuning the batch-limit parameter?
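
For example, with the crm shell (the value below is only an illustration, not a
recommendation):

# throttle how many operations the cluster fires off in parallel
crm configure property batch-limit=10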

Were there any messages from the CIB about failure to apply an update diff?
If so, you might be affected by:

https://github.com/ClusterLabs/pacemaker/commit/10e9e579ab032bde3938d7f3e13c414e297ba3e9

> cibadmin -R --scope resources -x rsrc100.xml
> The following listing shows that all the resources were allocated to node
> 11 (no other nodes received resources even though they were online), and
> every entry listed an error approximately 10 minutes after the resources
> were added to the cluster.
>
> [root@pcs_linuxha_11 ~]# crm status
>
> 
>
> Last updated: Fri Jan 27 19:21:12 2012
>
> Last change: Fri Jan 27 19:14:35 2012 via cibadmin on pcs_linuxha_1
>
> Stack: openais
>
> Current DC: pcs_linuxha_1 - partition with quorum
>
> Version: 1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558
>
> 15 Nodes configured, 15 expected votes
>
> 100 Resources configured.
>
> 
>
>
>
> Online: [ pcs_linuxha_1 pcs_linuxha_2 pcs_linuxha_3 pcs_linuxha_4
> pcs_linuxha_5 pcs_linuxha_6 pcs_linuxha_7 pcs_linuxha_8 pcs_linuxha_9
> pcs_linuxha_10 pcs_linuxha_11 pcs_linuxha_12 pcs_linuxha_13 pcs_linuxha_14
> pcs_linuxha_15 ]
>
>
>
>  pcs_resource_1 (ocf::idirect:ppct):   Started pcs_linuxha_11
>
>  pcs_resource_2 (ocf::idirect:ppct):   Started pcs_linuxha_11
>
> ...
>
>  pcs_resource_100  (ocf::idirect:ppct):   Started pcs_linuxha_11
>
>
>
> Failed actions:
>
>     pcs_resource_1_monitor_0 (node=pcs_linuxha_11, call=-1, rc=1,
> status=Timed Out): unknown error
>
>     pcs_resource_2_monitor_0 (node=pcs_linuxha_11, call=-1, rc=1,
> status=Timed Out): unknown error
>
> ...
>
>     pcs_resource_100_monitor_0 (node=pcs_linuxha_11, call=-1, rc=1,
> status=Timed Out): unknown error
>
> [root@pcs_linuxha_11 ~]#
>
>
>
> Update: Adding an additional 300 resources caused the cib process to go to
> 100% CPU utilization for approximately 17 minutes, and caused the designated
> controller (DC) to switch from node 1 to node 5. Many errors were logged at
> the 17-minute point in the output of crm status, although this time the load
> was split across the cluster instead of all landing on node 11 as with the
> first 100 resources.
>
>
>
>
>
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Stopping 300 Resources Causes Node To Go Offline

2012-02-02 Thread Andrew Beekhof
On Mon, Jan 30, 2012 at 11:40 AM, Gruen, Wolfgang  wrote:
> We are running a cluster with 15 nodes and 300 resources.
>
>
>
> *** Stopping 300 Resources Causes Node 2 To Go Offline
>
> Used the command cibadmin --replace --scope resources --xml-text
> ""
> The result was that all running resources stopped, but node 2 went offline.
>
> [root@pcs_linuxha_2 ~]# crm status
>
>
>
> Connection to cluster failed: connection failed
>
> [root@pcs_linuxha_2 ~]# /etc/init.d/pacemaker status
>
> pacemakerd dead but pid file exists
>
> [root@pcs_linuxha_2 ~]# /etc/init.d/corosync

Nowhere near enough information, sorry.
We need a crm_report tarball to be able to comment further.
Perhaps open a bug (bugs.clusterlabs.org) and attach it there for analysis.
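
Something along these lines should capture the relevant window (the times below
are placeholders; adjust them to when the resources were actually stopped):

# collects logs, the CIB and PE inputs from all nodes into a tarball
crm_report -f "2012-01-30 10:00" -t "2012-01-30 12:00" /tmp/stop-300-resources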

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How to start resources in a Resource Group in parallel

2012-02-02 Thread Lars Ellenberg
On Thu, Feb 02, 2012 at 08:28:16PM +1100, Andrew Beekhof wrote:
> On Tue, Jan 31, 2012 at 9:52 PM, Dejan Muhamedagic  
> wrote:
> > Hi,
> >
> > On Tue, Jan 31, 2012 at 10:29:14AM +, Kashif Jawed Siddiqui wrote:
> >> Hi Andrew,
> >>
> >>           It is the LRMD_MAX_CHILDREN limit which by default is 4.
> >>
> >>           I see in forums that this parameter is tunable by adding 
> >> /etc/sysconfig/pacemaker
> >> with the following line as content
> >>    LRMD_MAX_CHILDREN=8
> >>
> >>           But the above works only for Heartbeat. How do we do it for
> >> Corosync?
> >>
> >>           can you suggest?
> >
> > It is not heartbeat or corosync specific, but depends on support
> > in the init script (/etc/init.d/corosync). The init script should
> > read the sysconfig file and then invoke lrmadmin to set the max
> > children parameter.
> 
> Just a reminder, but systemd unit files cannot do this.
> SLES won't be affected for a while, but openSUSE users will presumably
> start complaining soon.
> 
> I recommend:
> 
> diff -r 0285b706fcde lrm/lrmd/lrmd.c
> --- a/lrm/lrmd/lrmd.c Tue Sep 28 19:10:38 2010 +0200
> +++ b/lrm/lrmd/lrmd.c Thu Feb 02 20:27:33 2012 +1100
> @@ -832,6 +832,13 @@ main(int argc, char ** argv)
>   init_stop(PID_FILE);
>   }
> 
> +if(getenv("LRMD_MAX_CHILDREN")) {
> +int tmp = atoi(getenv("LRMD_MAX_CHILDREN"));
> +if(tmp > 4) {
> +max_child_count = tmp;
> +}
> +}
> +
>   return init_start();
>  }

Yes, please...

and of course we have to remember to not only set, but also export
LRMD_MAX_CHILDREN from wherever lrmd will be started from.
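
In other words, something along these lines in whatever environment file gets
sourced before lrmd starts (a sketch; the path follows the earlier posts):

# /etc/sysconfig/pacemaker
LRMD_MAX_CHILDREN=8
export LRMD_MAX_CHILDREN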

Lars

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] crm_simulate crashes on centos 6.2

2012-02-02 Thread Besse Mickael
There is no more error with the fix.

Thanks

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Thursday, 2 February 2012 11:44
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] crm_simulate crashes on centos 6.2

On Thu, Feb 2, 2012 at 9:13 PM, Rasto Levrinc  wrote:
> On Tue, Jan 31, 2012 at 9:03 PM, Andrew Beekhof  wrote:
>> On Wed, Feb 1, 2012 at 7:00 AM, Andrew Beekhof  wrote:
>>> Not hugely useful without the debugging symbols I'm afraid.
>>
>> Although I can reproduce here. So I'll get that fixed.
>
> Could you still reproduce it after this bug fix or it was before?
>
> https://github.com/ClusterLabs/pacemaker/commit/7da9e833b63d83c32852154481572f816754c114
>

Before.

> Rasto
>
>>
>>>
>>> On Wed, Feb 1, 2012 at 2:57 AM, Besse Mickael
>>>  wrote:
 Hello

 In fact I'm not using crm_simulate directly; I'm using the Linux Cluster
 Management Console, which calls crm_simulate.
 LCMC uses the command: /usr/sbin/crm_simulate -S - -L

 Here is the core file opened with gdb:


 gdb /usr/sbin/crm_simulate /tmp/core-crm_simulate-6-0-0-20826-1328021079
>
> ...
>
> --
> Dipl.-Ing. Rastislav Levrinc
> rasto.levr...@gmail.com
> Linux Cluster Management Console
> http://lcmc.sf.net/
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] nfsserver in Fedora16

2012-02-02 Thread emmanuel segura
Try looking at the nfs_shared_infodir parameter.
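
For illustration, a minimal nfsserver primitive using that parameter might look
like this (the directory, IP and interval below are placeholders only):

primitive p_nfsserver ocf:heartbeat:nfsserver \
        params nfs_shared_infodir="/srv/nfs/nfsinfo" nfs_ip="192.168.100.200" \
        op monitor interval="30s"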


2012/2/2 Michael Schwartzkopff 

> > On 02/02/2012 08:16 AM, Vogelsang, Andreas wrote:
> > > So I think it isn't a good idea to wait until that is done. I must finish
> > > a project by the 30th of March. My project is an HA NFSv4 cluster. Does
> > > anybody know if I can use CentOS for that? Or have they also changed to
> > > systemd?
> >
> > Centos 6.x works fine, no systemd ...
>
> Does the kernel release the locks after failover correctly? Or does the
> client
> have to wait for the 90 seconds?
>
> --
> Dr. Michael Schwartzkopff
> Guardinistr. 63
> 81375 München
>
> Tel: (0163) 172 50 98
> Fax: (089) 620 304 13
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>


-- 
this is my life and I live it as long as God wills
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] failure-timeout Problem

2012-02-02 Thread Andrew Beekhof
On Tue, Jan 31, 2012 at 8:56 PM, Vogt Josef  wrote:
> Hi all
>
> I'm struggling with the use of failure-timeout. Although the failcount has
> expired, it is not reset. This is what the logs show:
> Failcount for C_STONITH_VMWARE on id17 has expired (limit was 60s)
>
> With pacemaker > 1.0 the failcounts should be reset when they have expired. 
> What am I missing?
>
> Here is the relevant part of my config:

Can we see some logs too?
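
As a quick check, crm_mon can show the current fail counts directly, e.g.:

crm_mon -1 -f    # -f / --failcounts prints per-resource fail counts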

>
> property $id="cib-bootstrap-options" \
>        dc-version="1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558" \
>        cluster-infrastructure="openais" \
>        no-quorum-policy="ignore" \
>        stonith-enabled="true" \
>        dc-deadtime="20s" \
>        cluster-recheck-interval="60s" \
>        expected-quorum-votes="2"
> rsc_defaults $id="rsc-options" \
>        resource-stickiness="100" \
>        failure-timeout="60s"
>
>
> Thanks!
> Josef
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] nfsserver in Fedora16

2012-02-02 Thread Michael Schwartzkopff
> On 02/02/2012 08:16 AM, Vogelsang, Andreas wrote:
> > So I think it isn't a good idea to wait until that is done. I must finish a
> > project by the 30th of March. My project is an HA NFSv4 cluster. Does
> > anybody know if I can use CentOS for that? Or have they also changed to
> > systemd?
> 
> Centos 6.x works fine, no systemd ...

Does the kernel release the locks after failover correctly? Or does the client 
have to wait for the 90 seconds?

-- 
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98
Fax: (089) 620 304 13


signature.asc
Description: This is a digitally signed message part.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Need help with resolving very long election cycle

2012-02-02 Thread Shyam
Hi Andreas,

Yes, this is only for testing. The specific test was not two VMs running on
the same host. We have two physical servers, each running a VM, and the VMs run
pacemaker/heartbeat. We reboot both physical servers (to simulate a
power failure) and after that watch both VMs negotiate.

--Shyam

On Thu, Feb 2, 2012 at 3:38 PM, Andreas Kurz  wrote:

> On 02/02/2012 04:45 AM, Shyam wrote:
> > Hi Andreas,
> >
> > Thanks for your reply.
> >
> > We are using pacemaker in VM environment & was primarily checking how it
> > behaves when two nodes hosting the clustered VM's reboot. It apparently
> > took a very long time doing the elections.
>
> Ok, but this is only for testing? For a production system the VMs
> running a cluster should not run on the same host as this would be a SPOF.
>
> >
> > I realized that we were using dc-deadtime at 5sec. After bumping this up
> > to 10sec, this long election cycle problem disappeared.
>
> ... interesting
>
> Regards,
> Andreas
>
> --
> Need help with Pacemaker?
> http://www.hastexo.com/now
>
> >
> > --Shyam
> >
> > On Thu, Feb 2, 2012 at 3:59 AM, Andreas Kurz  > > wrote:
> >
> > On 01/27/2012 12:21 PM, Shyam wrote:
> > > Folks,
> > >
> > > We are constantly running into a long election cycle where in a
> 2-node
> > > cluster when both of them are simultaneously rebooted, they take a
> > long
> > > time running through election loop.
> >
> > why do you want to reboot them simultaneously? ... stop them one
> after
> > another and this will work fine.
> >
> > If you want to avoid time consuming resource movement use cluster
> > property stop-all-resources prior to the serialized shutdown.
> >
> > Regards,
> > Andreas
> >
> > --
> > Need help with Pacemaker?
> > http://www.hastexo.com/now
> >
> > >
> > > On one node pacemaker loops like:
> > > Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info:
> do_dc_takeover:
> > > Taking over DC status for this partition
> > > Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info:
> > > cib_process_readwrite: We are now in R/O mode
> > > Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info:
> > > cib_process_request: Operation complete: op cib_slave_all for
> section
> > > 'all' (origin=local/crmd/222, version=1.1.1): ok (rc=0)
> > > Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info:
> > > cib_process_readwrite: We are now in R/W mode
> > > Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info:
> > > cib_process_request: Operation complete: op cib_master for section
> > 'all'
> > > (origin=local/crmd/223, version=1.1.1): ok (rc=0)
> > > Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info:
> > > cib_process_request: Operation complete: op cib_modify for section
> cib
> > > (origin=local/crmd/224, version=1.1.1): ok (rc=0)
> > > Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info:
> > > cib_process_request: Operation complete: op cib_modify for section
> > > crm_config (origin=local/crmd/226, version=1.1.1): ok (rc=0)
> > > Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info:
> > > do_dc_join_offer_all: join-25: Waiting on 2 outstanding join acks
> > > Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info:
> > > cib_process_request: Operation complete: op cib_modify for section
> > > crm_config (origin=local/crmd/228, version=1.1.1): ok (rc=0)
> > > Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info:
> > > config_query_callback: Checking for expired actions every 90ms
> > > Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info:
> > > do_election_count_vote: Election 50 (owner:
> > > 0156-0156--2b91-) pass: vote from
> > vsa-009c-vc-0
> > > (Age)
> > > Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info: update_dc:
> > Set DC
> > > to vsa-009c-vc-1 (3.0.1)
> > > Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info:
> > > do_state_transition: State transition S_INTEGRATION -> S_ELECTION [
> > > input=I_ELECTION cause=C_FSA_INTERNAL
> origin=do_election_count_vote ]
> > > Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info: update_dc:
> Unset
> > > DC vsa-009c-vc-1
> > > Jan 26 22:03:21 vsa-009c-vc-1 crmd: [1134]: info:
> > > do_election_count_vote: Election 51 (owner:
> > > 0156-0156--2b91-) pass: vote from
> > vsa-009c-vc-0
> > > (Age)
> > > Jan 26 22:03:21 vsa-009c-vc-1 crmd: [1134]: WARN: do_log: FSA:
> > Input
> > > I_JOIN_REQUEST from route_message() received in state S_ELECTION
> > > Jan 26 22:03:22 vsa-009c-vc-1 crmd: [1134]: info:
> > > do_state_transition: State transition S_ELECTION -> S_INTEGRATION [
> > > input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
> > > Jan 26 22:03:22 vsa-009c-vc-1 crmd: [1134]: info:
> start_subsystem:
> > > Sta

Re: [Pacemaker] Need help with resolving very long election cycle

2012-02-02 Thread Shyam
Hi Andrew,

Here are more logs covering a larger period that show multiple occurrences of
this election cycle. Please note that in the case below I had set dc-deadtime to
5 secs, and the I_DC_TIMEOUT pops up every 5 secs. I raised dc-deadtime to
10 secs and the long election cycle problem disappeared. It no longer happens.
I suspect that before a single election cycle completes, the next
I_DC_TIMEOUT kicks in. Could this be the reason?
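
For reference, a dc-deadtime change like that can be made with, e.g.:

crm configure property dc-deadtime=10s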

On node 1:
Jan 17 12:00:02 vsa-003ca-vc-0 crmd: [1120]: info:
do_election_count_vote: Election 3 (owner:
0970-0970-0001-2b91-0001) pass: vote from vsa-003ca-vc-1
(Age)
Jan 17 12:00:02 vsa-003ca-vc-0 crmd: [1120]: info: do_state_transition:
State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION
cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Jan 17 12:00:02 vsa-003ca-vc-0 crmd: [1120]: WARN: do_log: FSA: Input
I_JOIN_OFFER from route_message() received in state S_ELECTION
Jan 17 12:00:02 vsa-003ca-vc-0 crmd: [1120]: WARN: do_log: FSA: Input
I_JOIN_OFFER from route_message() received in state S_ELECTION
Jan 17 12:00:03 vsa-003ca-vc-0 crmd: [1120]: info:
do_election_count_vote: Election 4 (owner:
0970-0970-0001-2b91-0001) pass: vote from vsa-003ca-vc-1
(Age)
Jan 17 12:00:03 vsa-003ca-vc-0 crmd: [1120]: info:
do_election_count_vote: Election 5 (owner:
0970-0970-0001-2b91-0001) pass: vote from vsa-003ca-vc-1
(Age)
Jan 17 12:00:04 vsa-003ca-vc-0 crmd: [1120]: info: do_state_transition:
State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Jan 17 12:00:04 vsa-003ca-vc-0 crmd: [1120]: info: start_subsystem:
Starting sub-system "pengine"
Jan 17 12:00:04 vsa-003ca-vc-0 crmd: [1120]: WARN: start_subsystem:
Client pengine already running as pid 4243
Jan 17 12:00:08 vsa-003ca-vc-0 crmd: [1120]: info: do_dc_takeover:
Taking over DC status for this partition
Jan 17 12:00:08 vsa-003ca-vc-0 cib: [1116]: info:
cib_process_readwrite: We are now in R/O mode
Jan 17 12:00:08 vsa-003ca-vc-0 cib: [1116]: info: cib_process_request:
Operation complete: op cib_slave_all for section 'all'
(origin=local/crmd/108, version=1.8.26): ok (rc=0)
Jan 17 12:00:08 vsa-003ca-vc-0 cib: [1116]: info:
cib_process_readwrite: We are now in R/W mode
Jan 17 12:00:08 vsa-003ca-vc-0 cib: [1116]: info: cib_process_request:
Operation complete: op cib_master for section 'all' (origin=local/crmd/109,
version=1.8.26): ok (rc=0)
Jan 17 12:00:08 vsa-003ca-vc-0 cib: [1116]: info: cib_process_request:
Operation complete: op cib_modify for section cib (origin=local/crmd/110,
version=1.8.26): ok (rc=0)
Jan 17 12:00:08 vsa-003ca-vc-0 cib: [1116]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/112, version=1.8.26): ok (rc=0)
Jan 17 12:00:08 vsa-003ca-vc-0 crmd: [1120]: info:
do_dc_join_offer_all: join-5: Waiting on 2 outstanding join acks
Jan 17 12:00:08 vsa-003ca-vc-0 cib: [1116]: info: cib_process_request:
Operation complete: op cib_modify for section crm_config
(origin=local/crmd/114, version=1.8.26): ok (rc=0)
Jan 17 12:00:08 vsa-003ca-vc-0 crmd: [1120]: info:
config_query_callback: Checking for expired actions every 90ms
Jan 17 12:00:09 vsa-003ca-vc-0 crmd: [1120]: info:
do_election_count_vote: Election 6 (owner:
0970-0970-0001-2b91-0001) pass: vote from vsa-003ca-vc-1
(Age)
Jan 17 12:00:09 vsa-003ca-vc-0 crmd: [1120]: info: update_dc: Set DC to
vsa-003ca-vc-0 (3.0.1)
Jan 17 12:00:09 vsa-003ca-vc-0 crmd: [1120]: info: do_state_transition:
State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION
cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Jan 17 12:00:09 vsa-003ca-vc-0 crmd: [1120]: info: update_dc: Unset DC
vsa-003ca-vc-0
Jan 17 12:00:09 vsa-003ca-vc-0 crmd: [1120]: info:
do_election_count_vote: Election 7 (owner:
0970-0970-0001-2b91-0001) pass: vote from vsa-003ca-vc-1
(Age)
Jan 17 12:00:09 vsa-003ca-vc-0 crmd: [1120]: WARN: do_log: FSA: Input
I_JOIN_REQUEST from route_message() received in state S_ELECTION
Jan 17 12:00:10 vsa-003ca-vc-0 crmd: [1120]: info: do_state_transition:
State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Jan 17 12:00:10 vsa-003ca-vc-0 crmd: [1120]: info: start_subsystem:
Starting sub-system "pengine"
Jan 17 12:00:10 vsa-003ca-vc-0 crmd: [1120]: WARN: start_subsystem:
Client pengine already running as pid 4243
Jan 17 12:00:14 vsa-003ca-vc-0 crmd: [1120]: info: do_dc_takeover:
Taking over DC status for this partition
Jan 17 12:00:14 vsa-003ca-vc-0 cib: [1116]: info:
cib_process_readwrite: We are now in R/O mode
Jan 17 12:00:14 vsa-003ca-vc-0 cib: [1116]: info: cib_process_request:
Operation complete: op cib_slave_all for section 'all'
(origin=local/crmd/117, version=1.8.26): ok (rc=0)
Jan 17 12:00:14 

Re: [Pacemaker] MySQL-Cluster with Pacemaker

2012-02-02 Thread Stallmann, Andreas
> If I understand you correctly, you'd like to have a multi-master MySQL setup
> with circular replication?
No, not really. :-) I just want to avoid the overhead of DRBD for replication. 
We have to get faster than that. :-)

As I already wrote. I'm evaluating two different approaches:

- MySQL replication, with master/slave status and a shared writer IP controlled
by Pacemaker (as described by Yves Trudeau)
- MySQL NDB Cluster, with Pacemaker controlling a shared IP and maybe the MySQL
Cluster Manager (as described by Yves too and, with an additional
load balancer, by Falko Timme).

You'll find the links in my first post. What I was asking for is the opinion 
and experience the audience of this mailing list has with the three solutions 
mentioned above.

Cheers,

Andreas


-
CONET Solutions GmbH, Theodor-Heuss-Allee 19, 53773 Hennef.
Registergericht/Registration Court: Amtsgericht Siegburg (HRB Nr. 9136)
Geschäftsführer/Managing Directors: Anke Höfer
-


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] crm_simulate crashes on centos 6.2

2012-02-02 Thread Andrew Beekhof
On Thu, Feb 2, 2012 at 9:13 PM, Rasto Levrinc  wrote:
> On Tue, Jan 31, 2012 at 9:03 PM, Andrew Beekhof  wrote:
>> On Wed, Feb 1, 2012 at 7:00 AM, Andrew Beekhof  wrote:
>>> Not hugely useful without the debugging symbols I'm afraid.
>>
>> Although I can reproduce here. So I'll get that fixed.
>
> Could you still reproduce it after this bug fix or it was before?
>
> https://github.com/ClusterLabs/pacemaker/commit/7da9e833b63d83c32852154481572f816754c114
>

Before.

> Rasto
>
>>
>>>
>>> On Wed, Feb 1, 2012 at 2:57 AM, Besse Mickael
>>>  wrote:
 Hello

 In fact I'm not using crm_simulate directly; I'm using the Linux Cluster
 Management Console, which calls crm_simulate.
 LCMC uses the command: /usr/sbin/crm_simulate -S - -L

 Here is the core file opened with gdb:


 gdb /usr/sbin/crm_simulate /tmp/core-crm_simulate-6-0-0-20826-1328021079
>
> ...
>
> --
> Dipl.-Ing. Rastislav Levrinc
> rasto.levr...@gmail.com
> Linux Cluster Management Console
> http://lcmc.sf.net/
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] nfsserver in Fedora16

2012-02-02 Thread Andreas Kurz
On 02/02/2012 08:16 AM, Vogelsang, Andreas wrote:
> So I think it isn't a good idea to wait until that is done. I must finish a
> project by the 30th of March. My project is an HA NFSv4 cluster.
> Does anybody know if I can use CentOS for that? Or have they also changed to
> systemd?

Centos 6.x works fine, no systemd ...

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> Best regards,
> Andreas Vogelsang
> 
> -----Original Message-----
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Thursday, 2 February 2012 02:40
> To: mi...@schwartzkopff.org; The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] nfsserver in Fedora16
> 
> On Thu, Feb 2, 2012 at 9:22 AM, Michael Schwartzkopff
>  wrote:
>>> On 01/22/2012 04:56 PM, Michael Schwartzkopff wrote:
 Hi,

 as far as I can see Fedora16 has switched to systemd to start services.
 Thus also all init scripts /etc/init.d/nfs... disappeared.

 Both ocf:heartbeat:nfsserver and ocf:redhat:nfsserver.sh seem to rely on
 that init script and fail on start if the script is not available. The
 radhat RA says:

 Jan 22 16:49:59 node1 lrmd: [2021]: info: RA output:
 (resNFS:probe:stderr) /usr/lib/ocf/resource.d//redhat/nfsserver.sh: line
 198: /etc/init.d/nfs: No such file or directory

 Any idea how to set up a HA-NFSServer with Fedora16? Or how to utilize
 the systemd scritps for the pacemaker cluster?
>>>
>>> you can use ocf:heartbeat:exportfs RA and let systemd start and respawn
>>> nfs services.
>>>
>>> Regards,
>>> Andreas
>>
>> Thanks. But it seems to be as I thought: there is no way to use systemd
>> scripts in pacemaker clusters.
> 
> Yet
> 
>>
>> Greetings,
>>
>> --
>> Dr. Michael Schwartzkopff
>> Guardinistr. 63
>> 81375 München
>>
>> Tel: (0163) 172 50 98
>> Fax: (089) 620 304 13
>>
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org




signature.asc
Description: OpenPGP digital signature
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] MySQL-Cluster with Pacemaker

2012-02-02 Thread Andreas Kurz
On 02/02/2012 10:42 AM, Stallmann, Andreas wrote:
> Well, yes, I read the presentation a while ago, but it didn't present me with 
> any new information.
> 
> What Florian does is replicate a MySQL-Master over two nodes with DRBD and 
> have several MySQL-Slaves replicate their data from the current Master via 
> the built-in mechanism.
> 
> That's firstly quite nice, if you have a scenario, where there is the need 
> for multiple nodes to be read from and where one single node for writing is 
> sufficient.
> 
> Secondly, this scenario works only with 2+(n>1) nodes, whereas we have (and 
> want) a classical 2 node scenario.

If I understand you correctly, you'd like to have a multi-master MySQL
setup with circular replication?

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/services/remote

> 
> Thirdly (and that's the main issue with MySQL on DRBD), the failover process
> still takes quite long, because of the number of steps that have to happen: stop
> mysql on node 1, unmount the filesystem, switch the DRBD master/slave roles, mount
> the filesystem on node 2, start mysql (and perhaps repair tables).
> 
> Still, thanks for the suggestion.
> 
> Cheers,
> 
> Andreas
> 
> 
> 
> -
> CONET Solutions GmbH, Theodor-Heuss-Allee 19, 53773 Hennef.
> Registergericht/Registration Court: Amtsgericht Siegburg (HRB Nr. 9136)
> Geschäftsführer/Managing Directors: Anke Höfer
> -
> 
> -----Original Message-----
> From: Andreas Kurz [mailto:andr...@hastexo.com]
> Sent: Wednesday, 1 February 2012 23:55
> To: pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] MySQL-Cluster with Pacemaker
> 
> Hello,
> 
> On 01/30/2012 10:46 AM, Stallmann, Andreas wrote:
>> Hi!
>>
>>
>>
>> I'm on the lookout for alternatives to our current MySQL "cluster",
>> which is an Active/Standby solution with MySQL on DRBD.
>>
>> With increasing customer demand for a faster failover, we want an
>> Active/Passive or even an Active/Active cluster.
> 
> Already had a look at this presentation my colleague Florian held at Percona 
> Live UK 2011 in London?
> 
> http://goo.gl/5mDFR
> 
> Regards,
> Andreas
> 
> --
> Need help with Pacemaker?
> http://www.hastexo.com/now
> 
>>
>>
>>
>> Currently we run a Tomcat application, which works in
>> Active/Passive-Mode. The applications on the active node communicate
>> it's status via a MySQL database to the passive node. In our current
>> setup, both nodes run the tomcat application but only one holds the
>> database (due to the setup with DRBD). For faster failovers, we'd
>> rather want the database to be active on both nodes. It's not
>> necessary, that it can written on both nodes, but read access would be
>> desirable. The thought is, that switching the "master status" of a
>> database might be quicker than switching DRBD's master status,
>> unmounting and mounting the file system and stopping and starting the 
>> database.
>>
>>
>>
>> A MySQL/Pacemaker cluster with replication as described  in
>> http://www.mysqlperformanceblog.com/2011/11/29/percona-replication-man
>> ager-a-solution-for-mysql-high-availability-with-replication-using-pac
>> emaker/  thus looked very promising, but it seems to be not yet mature
>> enough for a production environment. Please do correct men if I'm
>> wrong there, I'm really interested in your experience with this
>> solution in a real world scenario.
>>
>>
>>
>> Are there perhaps other howtos describing Pacemaker and MySQL replication?
>>
>>
>>
>> The second idea was using the native MySQL NDB clustering with
>> Pacemaker.
>> http://www.mysqlperformanceblog.com/2010/05/19/pacemaker-please-meet-n
>> db-cluster-or-using-pacemakerheartbeat-to-start-a-ndb-cluster/
>> (from 2010, uses heartbeat and pacemaker) and
>> http://www.howtoforge.com/loadbalanced_mysql_cluster_debian (from
>> 2008, uses pure heartbeat). Are there any more recent "howtos" on
>> pacemaker and MySQL NDB 7.x describe this. Can you provide me with
>> your opinions and field reports on these setups?
>>
>>
>>
>> Looking forward to your upcoming mails,
>>
>>
>>
>> Andreas
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> 
>> CONET Solutions GmbH, Theodor-Heuss-Allee 19, 53773 Hennef.
>> Registergericht/Registration Court: Amtsgericht Siegburg (HRB Nr.
>> 9136) Geschäftsführer/Managing Director: Anke Höfer
>>
>>  
>>
>>
>>
>>
>>
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
> 
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clus

Re: [Pacemaker] crm_simulate crashes on centos 6.2

2012-02-02 Thread Rasto Levrinc
On Tue, Jan 31, 2012 at 9:03 PM, Andrew Beekhof  wrote:
> On Wed, Feb 1, 2012 at 7:00 AM, Andrew Beekhof  wrote:
>> Not hugely useful without the debugging symbols I'm afraid.
>
> Although I can reproduce here. So I'll get that fixed.

Could you still reproduce it after this bug fix or it was before?

https://github.com/ClusterLabs/pacemaker/commit/7da9e833b63d83c32852154481572f816754c114

Rasto

>
>>
>> On Wed, Feb 1, 2012 at 2:57 AM, Besse Mickael
>>  wrote:
>>> Hello
>>>
>>> In fact I'm not using crm_simulate directly; I'm using the Linux Cluster
>>> Management Console, which calls crm_simulate.
>>> LCMC uses the command: /usr/sbin/crm_simulate -S - -L
>>>
>>> Here is the core file opened with gdb:
>>>
>>>
>>> gdb /usr/sbin/crm_simulate /tmp/core-crm_simulate-6-0-0-20826-1328021079

...

-- 
Dipl.-Ing. Rastislav Levrinc
rasto.levr...@gmail.com
Linux Cluster Management Console
http://lcmc.sf.net/

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Need help with resolving very long election cycle

2012-02-02 Thread Andreas Kurz
On 02/02/2012 04:45 AM, Shyam wrote:
> Hi Andreas,
> 
> Thanks for your reply.
> 
> We are using pacemaker in VM environment & was primarily checking how it
> behaves when two nodes hosting the clustered VM's reboot. It apparently
> took a very long time doing the elections.

Ok, but this is only for testing? For a production system the VMs
running a cluster should not run on the same host as this would be a SPOF.

> 
> I realized that we were using dc-deadtime at 5sec. After bumping this up
> to 10sec, this long election cycle problem disappeared.

... interesting

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> --Shyam
> 
> On Thu, Feb 2, 2012 at 3:59 AM, Andreas Kurz  > wrote:
> 
> On 01/27/2012 12:21 PM, Shyam wrote:
> > Folks,
> >
> > We are constantly running into a long election cycle where in a 2-node
> > cluster when both of them are simultaneously rebooted, they take a
> long
> > time running through election loop.
> 
> why do you want to reboot them simultaneously? ... stop them one after
> another and this will work fine.
> 
> If you want to avoid time consuming resource movement use cluster
> property stop-all-resources prior to the serialized shutdown.
> 
> Regards,
> Andreas
> 
> --
> Need help with Pacemaker?
> http://www.hastexo.com/now
> 
> >
> > On one node pacemaker loops like:
> > Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info: do_dc_takeover:
> > Taking over DC status for this partition
> > Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info:
> > cib_process_readwrite: We are now in R/O mode
> > Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info:
> > cib_process_request: Operation complete: op cib_slave_all for section
> > 'all' (origin=local/crmd/222, version=1.1.1): ok (rc=0)
> > Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info:
> > cib_process_readwrite: We are now in R/W mode
> > Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info:
> > cib_process_request: Operation complete: op cib_master for section
> 'all'
> > (origin=local/crmd/223, version=1.1.1): ok (rc=0)
> > Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info:
> > cib_process_request: Operation complete: op cib_modify for section cib
> > (origin=local/crmd/224, version=1.1.1): ok (rc=0)
> > Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info:
> > cib_process_request: Operation complete: op cib_modify for section
> > crm_config (origin=local/crmd/226, version=1.1.1): ok (rc=0)
> > Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info:
> > do_dc_join_offer_all: join-25: Waiting on 2 outstanding join acks
> > Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info:
> > cib_process_request: Operation complete: op cib_modify for section
> > crm_config (origin=local/crmd/228, version=1.1.1): ok (rc=0)
> > Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info:
> > config_query_callback: Checking for expired actions every 90ms
> > Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info:
> > do_election_count_vote: Election 50 (owner:
> > 0156-0156--2b91-) pass: vote from
> vsa-009c-vc-0
> > (Age)
> > Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info: update_dc:
> Set DC
> > to vsa-009c-vc-1 (3.0.1)
> > Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info:
> > do_state_transition: State transition S_INTEGRATION -> S_ELECTION [
> > input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> > Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info: update_dc: Unset
> > DC vsa-009c-vc-1
> > Jan 26 22:03:21 vsa-009c-vc-1 crmd: [1134]: info:
> > do_election_count_vote: Election 51 (owner:
> > 0156-0156--2b91-) pass: vote from
> vsa-009c-vc-0
> > (Age)
> > Jan 26 22:03:21 vsa-009c-vc-1 crmd: [1134]: WARN: do_log: FSA:
> Input
> > I_JOIN_REQUEST from route_message() received in state S_ELECTION
> > Jan 26 22:03:22 vsa-009c-vc-1 crmd: [1134]: info:
> > do_state_transition: State transition S_ELECTION -> S_INTEGRATION [
> > input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
> > Jan 26 22:03:22 vsa-009c-vc-1 crmd: [1134]: info: start_subsystem:
> > Starting sub-system "pengine"
> > Jan 26 22:03:22 vsa-009c-vc-1 crmd: [1134]: WARN: start_subsystem:
> > Client pengine already running as pid 1234
> > Jan 26 22:03:26 vsa-009c-vc-1 crmd: [1134]: info: do_dc_takeover:
> > Taking over DC status for this partition
> > Jan 26 22:03:26 vsa-009c-vc-1 cib: [1130]: info:
> > cib_process_readwrite: We are now in R/O mode
> > Jan 26 22:03:26 vsa-009c-vc-1 cib: [1130]: info:
> > cib_process_request: Operation complete: op cib_slave_all for section
> > 'all' (origin=l

Re: [Pacemaker] don't want to restart clone resource

2012-02-02 Thread Andrew Beekhof
On Thu, Feb 2, 2012 at 4:57 AM, Lars Ellenberg
 wrote:
> On Wed, Feb 01, 2012 at 03:43:55PM +0100, Andreas Kurz wrote:
>> Hello,
>>
>> On 02/01/2012 10:39 AM, Fanghao Sha wrote:
>> > Hi Lars,
>> >
>> > Yes, you are right. But how to prevent the "orphaned" resources from
>> > stopping by default, please?
>>
>> crm configure property stop-orphan-resources=false
>
> Well, sure. But for "normal" orphans,
> you actually want them to be stopped.
>
> No, pacemaker needs some additional smarts to recognize
> that there actually are no orphans, maybe by first relabelling,
> and only then checking for instance label > clone-max.

Instance label doesn't come into the equation.
It might look like it does on the outside, but its more complicated than that.

>
> Did you file a bugzilla?
> Has that made progress?
>
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Need help with resolving very long election cycle

2012-02-02 Thread Andrew Beekhof
They both think they should be the DC.
But the log fragments don't extend back far enough to say why.

On Fri, Jan 27, 2012 at 10:21 PM, Shyam  wrote:
> Folks,
>
> We are constantly running into a long election cycle where in a 2-node
> cluster when both of them are simultaneously rebooted, they take a long time
> running through election loop.
>
> On one node pacemaker loops like:
> Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info: do_dc_takeover: Taking
> over DC status for this partition
> Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info: cib_process_readwrite:
> We are now in R/O mode
> Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_slave_all for section 'all'
> (origin=local/crmd/222, version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info: cib_process_readwrite:
> We are now in R/W mode
> Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_master for section 'all' (origin=local/crmd/223,
> version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_modify for section cib (origin=local/crmd/224,
> version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_config
> (origin=local/crmd/226, version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info: do_dc_join_offer_all:
> join-25: Waiting on 2 outstanding join acks
> Jan 26 22:03:20 vsa-009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_config
> (origin=local/crmd/228, version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info: config_query_callback:
> Checking for expired actions every 90ms
> Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info:
> do_election_count_vote: Election 50 (owner:
> 0156-0156--2b91-) pass: vote from vsa-009c-vc-0
> (Age)
> Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info: update_dc: Set DC to
> vsa-009c-vc-1 (3.0.1)
> Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info: do_state_transition:
> State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION
> cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> Jan 26 22:03:20 vsa-009c-vc-1 crmd: [1134]: info: update_dc: Unset DC
> vsa-009c-vc-1
> Jan 26 22:03:21 vsa-009c-vc-1 crmd: [1134]: info:
> do_election_count_vote: Election 51 (owner:
> 0156-0156--2b91-) pass: vote from vsa-009c-vc-0
> (Age)
> Jan 26 22:03:21 vsa-009c-vc-1 crmd: [1134]: WARN: do_log: FSA: Input
> I_JOIN_REQUEST from route_message() received in state S_ELECTION
> Jan 26 22:03:22 vsa-009c-vc-1 crmd: [1134]: info: do_state_transition:
> State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
> cause=C_FSA_INTERNAL origin=do_election_check ]
> Jan 26 22:03:22 vsa-009c-vc-1 crmd: [1134]: info: start_subsystem:
> Starting sub-system "pengine"
> Jan 26 22:03:22 vsa-009c-vc-1 crmd: [1134]: WARN: start_subsystem:
> Client pengine already running as pid 1234
> Jan 26 22:03:26 vsa-009c-vc-1 crmd: [1134]: info: do_dc_takeover: Taking
> over DC status for this partition
> Jan 26 22:03:26 vsa-009c-vc-1 cib: [1130]: info: cib_process_readwrite:
> We are now in R/O mode
> Jan 26 22:03:26 vsa-009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_slave_all for section 'all'
> (origin=local/crmd/231, version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-009c-vc-1 cib: [1130]: info: cib_process_readwrite:
> We are now in R/W mode
> Jan 26 22:03:26 vsa-009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_master for section 'all' (origin=local/crmd/232,
> version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_modify for section cib (origin=local/crmd/233,
> version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_config
> (origin=local/crmd/235, version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-009c-vc-1 crmd: [1134]: info: do_dc_join_offer_all:
> join-26: Waiting on 2 outstanding join acks
> Jan 26 22:03:26 vsa-009c-vc-1 cib: [1130]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_config
> (origin=local/crmd/237, version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-009c-vc-1 crmd: [1134]: info: config_query_callback:
> Checking for expired actions every 90ms
> Jan 26 22:03:26 vsa-009c-vc-1 crmd: [1134]: info:
> do_election_count_vote: Election 52 (owner:
> 0156-0156--2b91-) pass: vote from vsa-009c-vc-0
> (Age)
> Jan 26 22:03:26 vsa-009c-vc-1 crmd: [1134]: info: update_dc: Set DC to
> vsa-009c-vc-1 (3.0.1)
> Jan 26 22:03:2

Re: [Pacemaker] MySQL-Cluster with Pacemaker

2012-02-02 Thread Stallmann, Andreas
Well, yes, I read the presentation a while ago, but it didn't present me with 
any new information.

What Florian does is replicate a MySQL master across two nodes with DRBD and have
several MySQL slaves replicate their data from the current master via the
built-in replication mechanism.

Firstly, that's quite nice if you have a scenario where multiple nodes need to
be read from and a single node for writing is sufficient.

Secondly, this scenario only works with 2+(n>1) nodes, whereas we have (and
want) a classical 2-node scenario.

Thirdly (and that's the main issue with MySQL on DRBD), the failover process
still takes quite a long time because of the number of steps involved: stop
MySQL on node 1, unmount the filesystem, switch the DRBD master/slave roles,
mount the filesystem on node 2 and start MySQL (and perhaps repair tables) --
roughly the chain sketched below.
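
To make that chain explicit, here is a crm shell sketch of the kind of setup I
mean. Resource names and parameters are simplified for illustration and are not
taken from our production configuration:

primitive p_drbd_mysql ocf:linbit:drbd \
        params drbd_resource="mysql" \
        op monitor interval="31s" role="Slave" \
        op monitor interval="29s" role="Master"
ms ms_drbd_mysql p_drbd_mysql \
        meta master-max="1" clone-max="2" notify="true"
primitive p_fs_mysql ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/var/lib/mysql" fstype="ext3"
primitive p_mysql ocf:heartbeat:mysql \
        op monitor interval="30s" timeout="30s"
group g_mysql p_fs_mysql p_mysql
colocation c_mysql_on_drbd inf: g_mysql ms_drbd_mysql:Master
order o_drbd_before_mysql inf: ms_drbd_mysql:promote g_mysql:start

Every failover has to walk the whole promote -> mount -> start chain, and that
is where the time goes.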

Still, thanks for the suggestion.

Cheers,

Andreas



-
CONET Solutions GmbH, Theodor-Heuss-Allee 19, 53773 Hennef.
Registergericht/Registration Court: Amtsgericht Siegburg (HRB Nr. 9136)
Geschäftsführer/Managing Directors: Anke Höfer
-

-Original Message-
From: Andreas Kurz [mailto:andr...@hastexo.com]
Sent: Wednesday, 1 February 2012 23:55
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] MySQL-Cluster with Pacemaker

Hello,

On 01/30/2012 10:46 AM, Stallmann, Andreas wrote:
> Hi!
>
>
>
> I'm on the lookout for alternatives to our current MySQL "cluster",
> which is an Active/Standby solution with MySQL on DRBD.
>
> With increasing customer demand for a faster failover, we want an
> Active/Passive or even an Active/Active cluster.

Already had a look at this presentation my colleague Florian held at Percona 
Live UK 2011 in London?

http://goo.gl/5mDFR

Regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now

>
>
>
> Currently we run a Tomcat application, which works in
> Active/Passive mode. The application on the active node communicates
> its status via a MySQL database to the passive node. In our current
> setup, both nodes run the Tomcat application but only one holds the
> database (due to the setup with DRBD). For faster failovers, we'd
> rather have the database active on both nodes. It's not
> necessary that it can be written to on both nodes, but read access
> would be desirable. The thought is that switching the "master status"
> of a database might be quicker than switching DRBD's master status,
> unmounting and mounting the file system and stopping and starting the
> database.
>
>
>
> A MySQL/Pacemaker cluster with replication as described in
> http://www.mysqlperformanceblog.com/2011/11/29/percona-replication-manager-a-solution-for-mysql-high-availability-with-replication-using-pacemaker/
> thus looked very promising, but it does not yet seem mature
> enough for a production environment. Please do correct me if I'm
> wrong there; I'm really interested in your experience with this
> solution in a real-world scenario.
>
>
>
> Are there perhaps other howtos describing Pacemaker and MySQL replication?
>
>
>
> The second idea was using the native MySQL NDB clustering with
> Pacemaker. I found
> http://www.mysqlperformanceblog.com/2010/05/19/pacemaker-please-meet-ndb-cluster-or-using-pacemakerheartbeat-to-start-a-ndb-cluster/
> (from 2010, uses heartbeat and pacemaker) and
> http://www.howtoforge.com/loadbalanced_mysql_cluster_debian (from
> 2008, uses pure heartbeat). Are there any more recent "howtos"
> describing Pacemaker and MySQL NDB 7.x? Can you provide me with
> your opinions and field reports on these setups?
>
>
>
> Looking forward to your upcoming mails,
>
>
>
> Andreas
>
>
>
>
>
>
>
>
>
>
>
>
>
> 
> CONET Solutions GmbH, Theodor-Heuss-Allee 19, 53773 Hennef.
> Registergericht/Registration Court: Amtsgericht Siegburg (HRB Nr.
> 9136) Geschäftsführer/Managing Director: Anke Höfer
>
>  
>
>
>
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How to start resources in a Resource Group in parallel

2012-02-02 Thread Andrew Beekhof
On Tue, Jan 31, 2012 at 9:52 PM, Dejan Muhamedagic  wrote:
> Hi,
>
> On Tue, Jan 31, 2012 at 10:29:14AM +, Kashif Jawed Siddiqui wrote:
>> Hi Andrew,
>>
>>           It is the LRMD_MAX_CHILDREN limit which by default is 4.
>>
>>           I see in forums that this parameter is tunable by creating
>> /etc/sysconfig/pacemaker with the following line as its content:
>>    LRMD_MAX_CHILDREN=8
>>
>>           But the above works only for Heartbeat. How do we do it for
>> Corosync?
>>
>>           Can you suggest something?
>
> It is not heartbeat or corosync specific, but depends on support
> in the init script (/etc/init.d/corosync). The init script should
> read the sysconfig file and then invoke lrmadmin to set the max
> children parameter.

Just a reminder, but systemd unit files cannot do this.
SLES won't be affected for a while, but openSUSE users will presumably
start complaining soon.

I recommend:

diff -r 0285b706fcde lrm/lrmd/lrmd.c
--- a/lrm/lrmd/lrmd.c   Tue Sep 28 19:10:38 2010 +0200
+++ b/lrm/lrmd/lrmd.c   Thu Feb 02 20:27:33 2012 +1100
@@ -832,6 +832,13 @@ main(int argc, char ** argv)
         init_stop(PID_FILE);
     }
 
+    if(getenv("LRMD_MAX_CHILDREN")) {
+        int tmp = atoi(getenv("LRMD_MAX_CHILDREN"));
+        if(tmp > 4) {
+            max_child_count = tmp;
+        }
+    }
+
     return init_start();
 }
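
For completeness, a rough sketch of how the tunable could then be consumed.
The lrmadmin syntax and the unit name are from memory, so treat this as a
starting point and double-check it against lrmadmin(8) and your own unit files:

# /etc/sysconfig/pacemaker
LRMD_MAX_CHILDREN=8

# init-script route (roughly what the heartbeat init script does): source the
# file and push the value into the running lrmd
[ -f /etc/sysconfig/pacemaker ] && . /etc/sysconfig/pacemaker
if [ -n "$LRMD_MAX_CHILDREN" ]; then
    lrmadmin -p max-children "$LRMD_MAX_CHILDREN"
fi

# systemd route, with the patch above applied: export the file into lrmd's
# environment via the [Service] section of whichever unit starts lrmd
EnvironmentFile=-/etc/sysconfig/pacemaker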



>
> Thanks,
>
> Dejan
>
>> Regards
>> KASHIF
>>
>> 
>> From: Andrew Beekhof [and...@beekhof.net]
>> Sent: Tuesday, January 31, 2012 2:34 PM
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] How to start resources in a Resource Group in  
>> parallel
>>
>> On Tue, Jan 31, 2012 at 6:46 PM,   wrote:
>> > Hi All,
>> >  I am using pacemaker 1.0.11 + corosync 1.4.2 for a 2-node cluster.
>> >  I have configured one group, which has 12 resources.
>> >  The ordered parameter is set to false, but all 12 resources are not
>> > started in parallel.
>> >  Only four resources start in parallel at a time.
>>
>> This is probably the lrmd limit.
>>
>> >  Resources 1,2,3,4 start, then 5,6,7,8, then 9,10,11,12.
>> >  If there is a 5-second sleep in the resource agent's start method,
>> >  then all the resources take 15 seconds to start.
>> >
>> >  Please can you suggest how I can solve this?
>> >
>> >  It seems only four threads start at a time for a group's resources.
>> >
>> > Regards
>> > Manish
>> >
>> > On Fri, December 2, 2011 4:23 pm, Andreas Kurz wrote:
>> >> Hello Kashif,
>> >>
>> >>
>> >> On 12/02/2011 06:04 AM, Kashif Jawed Siddiqui wrote:
>> >>
>> >>> Hi All,
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> I am using pacemaker 1.0.11 + corosync 1.4.2 for a 2 node cluster.
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> The old cib.xml for a Heartbeat-based cluster had an option
>> >>> "ordered=true|false" for the "group" tag, which supported starting
>> >>> resources in series or in parallel.
>> >>
>> >> this meta attribute is still available ... when in crm shell add:
>> >>
>> >> meta ordered=false
>> >>
>> >> to your group resource ... or you could also use a colocation set.
>> >>
>> >> Regards,
>> >> Andreas
>> >>
>> >>
>> >> --
>> >> Need help with Pacemaker?
>> >> http://www.hastexo.com/now
>> >>
>> >>
>> >>>
>> >>>
>> >>>
>> >>> By default, resources in Resource Group start in serial order for
>> >>> Pacemaker based clusters.
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> Is there a way to start multiple resources in parallel?
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> Regards
>> >>>
>> >>>
>> >>> KASHIF
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> ___
>> >>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> >>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >>>
>> >>>
>> >>> Project Home: http://www.clusterlabs.org
>> >>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> >>>  Bugs: http://bugs.clusterlabs.org
>> >>>
>> >>
>> >>
>> >> ___
>> >> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >>
>> >>
>> >> Project Home: http://www.clusterlabs.org
>> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> >> Bugs: http://bugs.clusterlabs.org
>> >>
>> >>
>> >
>> >
>> >
>> > ___
>> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
>>
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>> ___

Re: [Pacemaker] nfsserver in Fedora16

2012-02-02 Thread Michael Schwartzkopff
> So I think it isn't a good idea to wait until it is done. I must finish a
> project by the 30th of March. My project is an HA NFSv4 cluster. Does
> anybody know if I can use CentOS for that? Or have they also changed to
> systemd?
> 
> Best regards,
> Andreas Vogelsang

Hi,

while trying to set up an HA NFSv4 cluster over the last year, I found some problems:

1) The cluster has to release the existing locks after a failover. This is 
only possible in recent kernels. I do not know if this is possible in CentOS.

2) Only a recent exportfs resource agent gives you the possibility to release
the locks (see the sketch after this list). Perhaps you would have to compile
your own cluster-agents package.

3) Fedora 16 (which comes with a recent kernel) moved from the SysV init system
to systemd. Pacemaker cannot handle systemd services yet, so you cannot start
the NFS server under cluster control. You can work around this by letting
systemd respawn the NFS server itself (see the sketch after this list).

4) Debian-based distributions seem to have problems with the locks, but I do
not know the details.

5) You could try openSUSE.
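
For illustration only, roughly what I mean by 2) and 3). The parameter names
and the unit name are from memory, so verify them against your exportfs agent
and your Fedora 16 units before relying on this:

# 2) exportfs resource that releases locks on stop (crm shell)
primitive p_exportfs_data ocf:heartbeat:exportfs \
        params directory="/srv/nfs/data" fsid="1" \
               clientspec="10.0.0.0/24" options="rw,no_root_squash" \
               unlock_on_stop="1" \
        op monitor interval="30s"

# 3) systemd respawn workaround: copy the NFS server unit to
#    /etc/systemd/system (which overrides /lib/systemd/system) and let systemd
#    restart it itself, outside of cluster control; in [Service] add:
Restart=always
RestartSec=5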

Please mail me about your progress. I am interested.

Greetings,

-- 
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98
Fax: (089) 620 304 13


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org