Re: [Pacemaker] Unable to configure Pacemaker with cibadmin

2011-08-03 Thread Andrew Beekhof
On Tue, Aug 2, 2011 at 11:12 AM, Kelly Wong  wrote:
> It waits for a pretty long time, at least 30 seconds.  The weird thing is if
> I use cibadmin --replace --xml-file cib.xml, it’s able to update the cib, but
> if I try to replace a particular scope, it fails.

That's really weird.
Can you grab me the logs from the DC node from about the time you ran
the cibadmin command?

>
> Kelly Wong
> TXBU - Cisco Systems
>
>
> On 7/31/11 7:00 PM, "Andrew Beekhof"  wrote:
>
> On Sat, Jul 23, 2011 at 11:57 AM, Kelly Wong  wrote:
>> Hello,
>>
>> I am trying to update the configuration of my cluster through the cibadmin
>> command, but the command always fails:
>>
>> cibadmin --replace --scope resources --xml-file r.xml
>> Call cib_replace failed (-41): Remote node did not respond
>
> That error is triggered by a timeout.  How long does the command wait
> before returning this error?
>
>> 
>>
>> I was able to replace the initial blank configuration, but updating it
>> doesn’t seem to work.  The cluster is functioning and running some of the
>> resources.  Some of them are down, but I don’t think that should make a
>> difference:
>>
>> 
>> Last updated: Fri Jul 22 18:33:03 2011
>> Stack: openais
>> Current DC: poc-tst-rh4 - partition with quorum
>> Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
>> 2 Nodes configured, 2 expected votes
>> 3 Resources configured.
>> 
>>
>> Online: [ poc-tst-rh4 poc-tst-rh4-2 ]
>>
>>  Resource Group: mysql
>>  fs_mysql    (ocf::heartbeat:Filesystem):    Started poc-tst-rh4
>>  mysqld    (ocf::heartbeat:mysql):    Stopped
>>  Master/Slave Set: ms_drbd_mysql
>>  Masters: [ poc-tst-rh4 ]
>>  Slaves: [ poc-tst-rh4-2 ]
>>  Clone Set: pingclone
>>  Started: [ poc-tst-rh4-2 poc-tst-rh4 ]
>>
>> Failed actions:
>> mysqld_start_0 (node=poc-tst-rh4, call=26, rc=5, status=complete): not
>> installed
>> fs_mysql_start_0 (node=poc-tst-rh4-2, call=31, rc=5, status=complete):
>> not installed
>>
>> If I try to use the crm command line, it rejects any configuration changes
>> I make:
>> crm configure edit
>> ERROR: could not replace mysql
>> INFO: offending xml: 
>> 
>> > value="Started"/>
>> 
>> > type="Filesystem">
>> 
>> > value="/dev/drbd0"/>
>> > name="directory" value="/var/lib/mysql/"/>
>> > value="ext3"/>
>> 
>> 
>> > timeout="60"/>
>> > timeout="60"/>
>> 
>> 
>> 
>> 
>> > value="/usr/bin/mysqld_safe"/>
>> > value="/var/lib/mysql/mysql.pid"/>
>> 
>> 
>> > timeout="60"/>
>> > timeout="240"/>
>> > timeout="240"/>
>> 
>> 
>> 
>>
>>
>> What could be causing the configuration to fail?
>>
>> Thank you for any assistance,
>> Kelly Wong
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs:
>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>
>>

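The question of how long the command waits, and the request for DC logs from the same moment, can both be answered by wrapping the failing call in a small timing helper. The following is only a sketch: the function name timed is made up for illustration and is not part of Pacemaker, and the timestamp layout is an assumption chosen to resemble syslog-style log lines.

```bash
# Hypothetical helper (not part of Pacemaker): run any command and report
# when it started, its exit status, and how many seconds it took.
timed() {
    local stamp start end rc
    stamp=$(date '+%b %d %H:%M:%S')   # assumed syslog-like timestamp layout
    start=$(date +%s)
    rc=0
    "$@" || rc=$?                     # keep the wrapped command's exit status
    end=$(date +%s)
    printf 'started=%s rc=%d elapsed=%ds\n' "$stamp" "$rc" "$((end - start))"
    return "$rc"
}

# Usage on a cluster node, with the command from the report:
#   timed cibadmin --replace --scope resources --xml-file r.xml
```

The printed start time gives a window to search for in the logs on the DC node, and the elapsed value shows how long cibadmin waited before reporting the timeout.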


Re: [Pacemaker] Resources are not restarted on definition change after f59d7460bdde (devel)

2011-08-03 Thread Andrew Beekhof
On Wed, Aug 3, 2011 at 7:35 PM, Vladislav Bogdanov  wrote:
> 01.08.2011 02:05, Andrew Beekhof wrote:
>> On Wed, Jul 27, 2011 at 11:46 AM, Andrew Beekhof  wrote:
>>> On Fri, Jul 1, 2011 at 4:59 PM, Andrew Beekhof  wrote:
>>>> Hmm.  Interesting. I will investigate.
>>>
>>> This is an unfortunate side-effect of my history compression patch.
>>
>> Actually I'm mistaken on this.  There should be enough information in
>> the CIB to handle definition changes properly.
>> Could you reproduce and include a hb_report please?
>
> Just returned from vacation.
>
> Does 885007a1795e address this issue?
>

Quite possibly.



Re: [Pacemaker] Live demo of Pacemaker Cloud on Fedora: Friday August 5th at 8am PST

2011-08-03 Thread Bob Schatz
Steven,

Are you planning on recording/taping it if I want to watch it later?

Thanks,

Bob



From: Steven Dake
To: pcmk-cl...@oss.clusterlabs.org
Cc: aeolus-de...@lists.fedorahosted.org; Fedora Cloud SIG; "open...@lists.linux-foundation.org"; The Pacemaker cluster resource manager
Sent: Wednesday, August 3, 2011 9:42 AM
Subject: [Pacemaker] Live demo of Pacemaker Cloud on Fedora: Friday August 5th at 8am PST

Extending a general invitation to the high availability communities and
other cloud community contributors to participate in a live demo I am
giving on Friday August 5th 8am PST (GMT-7).  Demo portion of session is
15 minutes and will be provided first followed by more details of our
approach to high availability.

I will use elluminate to show the demo on my desktop machine.  To make
elluminate work, you will need icedtea-web installed on your system
which is not typically installed by default.

You will also need a conference # and bridge code.  Please contact me
offlist with your location and I'll provide you with a hopefully toll
free conference # and bridge code.

Elluminate link:
https://sas.elluminate.com/m.jnlp?sid=819&password=M.13AB020AEBE358D265FD925A07335F

Bridge Code:  Please contact me off list with your location and I'll
respond back with dial-in information.



[Pacemaker] Talk about linux clusters

2011-08-03 Thread Michael Schwartzkopff
Hi,

I have the pleasure of delivering a talk about Linux clusters tomorrow in Berlin.
The talk will be in German.

For details please see:

http://www.guug.de/lokal/berlin/

Please feel free to attend if you have time.

Greetings,

-- 
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98




[Pacemaker] Live demo of Pacemaker Cloud on Fedora: Friday August 5th at 8am PST

2011-08-03 Thread Steven Dake
Extending a general invitation to the high availability communities and
other cloud community contributors to participate in a live demo I am
giving on Friday August 5th 8am PST (GMT-7).  Demo portion of session is
15 minutes and will be provided first followed by more details of our
approach to high availability.

I will use elluminate to show the demo on my desktop machine.  To make
elluminate work, you will need icedtea-web installed on your system
which is not typically installed by default.

You will also need a conference # and bridge code.  Please contact me
offlist with your location and I'll provide you with a hopefully toll
free conference # and bridge code.

Elluminate link:
https://sas.elluminate.com/m.jnlp?sid=819&password=M.13AB020AEBE358D265FD925A07335F

Bridge Code:  Please contact me off list with your location and I'll
respond back with dial-in information.



Re: [Pacemaker] Reload action and stop/start sequence questions

2011-08-03 Thread Vladislav Bogdanov
27.07.2011 05:25, Andrew Beekhof wrote:
...
>> * Dependent resources should not be stopped/started for 'reload' action.
>> Of course they are restarted if reload fails and stop/start is executed
>> then. (I see that they are restarted now for reload of a resource they
>> depend on, is it a bug?)
> 
> More like a limitation.  Which is a roundabout way of saying "really
> hard to fix bug".
> You're welcome to create a BZ for it though, maybe one day I'll figure
> out how to resolve it.
> 
>> * (wish) Resources should be migrated out of node (if they support live
>> migration) for stop/start sequence of resource they depend on.
> 
> Migration can only occur if a resource is at the bottom (excluding any
> clones) of the resource stack.
> In order to migrate, any colocation dependencies need to be running at
> _both_ the old and the new locations.
> 
> This can only be true for resources that depend on clones.

Yep, I actually had clones in mind.

> 
>> * (wish) Redefinition of clones should be handled in a way which allows
>> dependent live-migratable resources to survive (if reload action for
>> clone instance either is not supported or fails).
> 
> This doesn't make sense.
> If the definition of one clone changes, then they all change and there
> is nowhere for dependent resources to migrate to.

Yes, I understand your point. That's why I marked this as a wish. It
would be a killer feature - serialization of clone instances restarts.

> 
>> That is: dependent
>> resources which support live migration are first tried to migrate out of
>> one node, and are stopped if migration fails. Then clone instance is
>> restarted on that node. Then the same procedure applies to next cluster
>> node so resources may return back to a first node.
>>
>> If above (at least first three points) is right, then is it possible to
>> get a set of previous instance parameters the same way new configuration
>> is passed (env vars), or RA should save that information itself in advance?
>>
>> Best,
>> Vladislav
>>


Re: [Pacemaker] Resources are not restarted on definition change after f59d7460bdde (devel)

2011-08-03 Thread Vladislav Bogdanov
01.08.2011 02:05, Andrew Beekhof wrote:
> On Wed, Jul 27, 2011 at 11:46 AM, Andrew Beekhof  wrote:
>> On Fri, Jul 1, 2011 at 4:59 PM, Andrew Beekhof  wrote:
>>> Hmm.  Interesting. I will investigate.
>>
>> This is an unfortunate side-effect of my history compression patch.
> 
> Actually I'm mistaken on this.  There should be enough information in
> the CIB to handle definition changes properly.
> Could you reproduce and include a hb_report please?

Just returned from vacation.

Does 885007a1795e address this issue?



Re: [Pacemaker] Backup ring is marked faulty

2011-08-03 Thread Tegtmeier.Martin
Hello,

we have exactly the same issue! Same version of corosync (1.3.1), also running 
on SuSE Linux Enterprise Server 11 SP1 with HAE.

Aug 01 15:45:18 corosync [TOTEM ] Received ringid(172.20.16.2:308) seq 6a
Aug 01 15:45:18 corosync [TOTEM ] Received ringid(172.20.16.2:308) seq 63
Aug 01 15:45:18 corosync [TOTEM ] releasing messages up to and including 60
Aug 01 15:45:18 corosync [TOTEM ] releasing messages up to and including 6d
Aug 01 15:45:18 corosync [TOTEM ] Marking seqid 162 ringid 1 interface 10.2.2.6 FAULTY - administrative intervention required.

rksaph06:/var/log/cluster # corosync-cfgtool -s
Printing ring status.
Local node ID 101717164
RING ID 0
        id      = 172.20.16.6
        status  = ring 0 active with no faults
RING ID 1
        id      = 10.2.2.6
        status  = Marking seqid 162 ringid 1 interface 10.2.2.6 FAULTY - administrative intervention required.



rrp_mode is set to "passive"
Ring 0 (172.20.16.0) supports 1 Gbit and ring 1 (10.2.2.0) supports 100 Mbit.
There was no other network traffic on ring 1 - only corosync (!)

After re-activating both rings with "corosync-cfgtool -r", the problem is
reproducible by simply connecting a crm_gui and hitting "refresh" inside the
GUI 3-5 times. After that, ring 1 (10.2.2.0) will be marked as "faulty" again.

Thanks and best regards,
  -Martin Tegtmeier
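
When reproducing this repeatedly, it may help to check the ring status mechanically rather than by eye. The sketch below assumes the "corosync-cfgtool -s" output has the layout shown in the message above (a "RING ID n" line followed by "id = ..." and "status = ..." lines); the function name faulty_rings is made up for illustration.

```bash
# Hypothetical helper: read "corosync-cfgtool -s" output on stdin and print
# "<ring number> <interface address>" for every ring whose status is FAULTY.
faulty_rings() {
    awk '
        $1 == "RING" && $2 == "ID"     { ring = $3 }    # remember current ring
        $1 == "id"                     { addr = $NF }   # its interface address
        $1 == "status" && /FAULTY/     { print ring, addr }
    '
}

# Usage on a cluster node:
#   corosync-cfgtool -s | faulty_rings
# For the status output quoted above this prints: 1 10.2.2.6
# The rings can then be re-enabled with "corosync-cfgtool -r", as described.
```

An empty result means no ring is currently marked faulty, so the check can be run after each "refresh" in the GUI to see exactly when ring 1 drops out.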




-Original Message-
From: Sebastian Kaps [mailto:sebastian.k...@imail.de]
Sent: Wed 03.08.2011 08:53
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Backup ring is marked faulty
 
 Hi Steven!

 On Tue, 02 Aug 2011 17:45:46 -0700, Steven Dake wrote:
> Which version of corosync?

 # corosync -v
 Corosync Cluster Engine, version '1.3.1'
 Copyright (c) 2006-2009 Red Hat, Inc.

 It's the version that comes with SLES11-SP1-HA.

-- 
 Sebastian



Re: [Pacemaker] Backup ring is marked faulty

2011-08-03 Thread Sebastian Kaps

Hi Steven!

On Tue, 02 Aug 2011 17:45:46 -0700, Steven Dake wrote:
> Which version of corosync?

# corosync -v
Corosync Cluster Engine, version '1.3.1'
Copyright (c) 2006-2009 Red Hat, Inc.

It's the version that comes with SLES11-SP1-HA.

--
Sebastian
