Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-05-10 Thread Fabio M. Di Nitto
On 5/10/2013 1:57 AM, Andrew Beekhof wrote:
 
 On 10/05/2013, at 6:05 AM, Rainer Brestan rainer.bres...@gmx.net wrote:
 
 Hi Andrew,
 yes, this clarifies a lot.
 Seems that it is really time to throw away the plugin.
 The CMAN solution won't be able (at least from the documentation) to attach
 new nodes without reconfiguring and restarting CMAN on the existing nodes
 
 That doesn't sound right to me.
 CC'ing Fabio who should know more (or who does)
 

In cman you can add and remove nodes without restarting. You do need to
change the configuration, though.

short version to add a node:

- edit cluster.conf to add the node (remember to bump config_version)
- either copy it across all nodes (including the new one)
  or use ccs_sync/ricci
- issue cman_tool version -r (-S if you did copy manually) to reload
  configuration without restart
- start cman on new node
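
The edit in the first step can be sketched in Python. This is only an illustration, assuming the usual cluster.conf layout (a `cluster` element carrying `config_version`, with `clusternode` entries under `clusternodes`). The node names, node ids and the helper `add_node` are made up for the example; distributing the file and running cman_tool version -r stay manual steps.

```python
import xml.etree.ElementTree as ET

def add_node(conf_xml: str, name: str, nodeid: int) -> str:
    """Return cluster.conf text with one more <clusternode> and a bumped config_version."""
    root = ET.fromstring(conf_xml)
    nodes = root.find("clusternodes")
    if any(n.get("name") == name for n in nodes.findall("clusternode")):
        raise ValueError(f"node {name} is already in the configuration")
    # Add the new member (hypothetical name and id supplied by the caller).
    ET.SubElement(nodes, "clusternode", {"name": name, "nodeid": str(nodeid)})
    # Bump config_version, as the steps above require.
    root.set("config_version", str(int(root.get("config_version")) + 1))
    return ET.tostring(root, encoding="unicode")

# Hypothetical three-node configuration, trimmed to the parts that matter here.
SAMPLE = """<cluster name="demo" config_version="4">
  <clusternodes>
    <clusternode name="node1" nodeid="1"/>
    <clusternode name="node2" nodeid="2"/>
    <clusternode name="node3" nodeid="3"/>
  </clusternodes>
</cluster>"""

updated = add_node(SAMPLE, "node4", 4)
print(updated)
```

The sample deliberately goes from three to four nodes, so it stays clear of the 2-node boundary.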

short version to remove a node:

- stop cman on the node
- edit cluster.conf to drop the node
- propagate cluster.conf
- cman_tool version -r

note that if you are moving from 2 to 3+ nodes or from 3+ to 2 nodes, you
_must_ stop the cluster first. This is because some internal corosync
defaults are different and cannot be changed at runtime yet.

Also, when removing nodes, you have to ensure that you do not remove too
many nodes at the same time or you can lose quorum.
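
The two caveats above can be turned into a small pre-flight check. This is a sketch under simplifying assumptions, not how cman itself decides: it assumes one vote per node and a simple-majority quorum, and the function names are made up for the example.

```python
def quorum(votes: int) -> int:
    """Simple majority, assuming every node has exactly one vote."""
    return votes // 2 + 1

def check_resize(current: int, target: int) -> list:
    """Return warnings for changing the node count from current to target."""
    warnings = []
    # Moving between 2 and 3+ nodes changes corosync defaults that
    # cannot be changed at runtime, so the cluster has to be stopped.
    if (current == 2 and target >= 3) or (current >= 3 and target == 2):
        warnings.append("crossing the 2-node boundary: stop the cluster first")
    # While shrinking, the remaining nodes still have to reach the
    # quorum computed for the old node count.
    if target < current and target < quorum(current):
        warnings.append("removing too many nodes at once: quorum would be lost")
    return warnings

print(check_resize(5, 4))
print(check_resize(5, 2))
```

Removing nodes one at a time and reloading the configuration after each step keeps the second warning from triggering.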

Fabio

 .
 The alternative is corosync 2.x.
 
 Not on RHEL6 - unless you're building things yourself of course.
 
 ClusterLabs has quite a long list of corosync versions from branches 2.0, 2.1,
 2.2 and 2.3.
 Besides the currently reported issue with version 2.3, which version does
 ClusterLabs use for its regression tests?
 I found a note somewhere mentioning 2.1.x, is this true?
 
 According to rpm, I've been using:
 
  Source RPM  : corosync-2.3.0-1.1.2c22.el7.src.rpm
 and
  Source RPM  : corosync-2.3.0-1.fc18.src.rpm
 
 
 
 Rainer
  
 Sent: Thursday, 09 May 2013 at 04:31
 From: Andrew Beekhof and...@beekhof.net
 To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

 On 08/05/2013, at 4:53 PM, Andrew Beekhof and...@beekhof.net wrote:


 On 08/05/2013, at 4:08 PM, Andrew Beekhof and...@beekhof.net wrote:


 On 03/05/2013, at 8:46 PM, Rainer Brestan rainer.bres...@gmx.net wrote:

 Now i have all the logs for some combinations.

 Corosync: 1.4.1-7 for all the tests on all nodes
 Base is always fresh installation of each node with all packages equal 
 except pacemaker version.
 int2node1 node id: 1743917066
 int2node2 node id: 1777471498

 In each ZIP file log from both nodes and the status output of crm_mon and 
 cibadmin -Q is included.

 1.) 1.1.8-4 attaches to running 1.1.7-6 cluster
 https://www.dropbox.com/s/06oyrle4ny47uv9/attach_1.1.8-4_to_1.1.7-6.zip
 Result: join outstanding

 2.) 1.1.9-2 attaches to running 1.1.7-6 cluster
 https://www.dropbox.com/s/fv5kcm2yb5jz56z/attach_1.1.9-2_to_1.1.7-6.zip
 Result: join outstanding

 Neither side is seeing anything from the other, which is very unexpected.
 I notice you're using the plugin... which acts as a message router.

 So I suspect something in there has changed (though I'm at a loss to say 
 what) and that cman based clusters are unaffected.

 Confirmed, cman clusters are unaffected.
 I'm yet to work out what changed in the plugin.

 I worked it out...

 The Red Hat changelog for 1.1.8-2 originally contained

 +- Cman is the only supported membership & quorum provider, do not ship the
 corosync plugin

 When this decision was reversed (when I realised no-one was seeing the ERROR 
 logs indicating it was going away), I neglected to re-instate the following 
 distro specific patch (which avoided conflicts between the ID used by CMAN 
 and Pacemaker):

 diff --git a/configure.ac b/configure.ac
 index a3784d5..dafa9e2 100644
 --- a/configure.ac
 +++ b/configure.ac
 @@ -1133,7 +1133,7 @@ AC_MSG_CHECKING(for native corosync)
 COROSYNC_LIBS=
 CS_USES_LIBQB=0

 -PCMK_SERVICE_ID=9
 +PCMK_SERVICE_ID=10
 LCRSODIR=$libdir

 if test $SUPPORT_CS = no; then


 So Pacemaker < 6.4 is talking on slot 10, while Pacemaker == 6.4 is using
 slot 9.
 This is why the two versions cannot see each other :-(
 I'm very sorry.


 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org

Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-05-09 Thread Rainer Brestan

Hi Andrew,

yes, this clarifies a lot.

Seems that it is really time to throw away the plugin.

The CMAN solution won't be able (at least from the documentation) to attach new nodes without reconfiguring and restarting CMAN on the existing nodes.

The alternative is corosync 2.x.

ClusterLabs has quite a long list of corosync versions from branches 2.0, 2.1, 2.2 and 2.3.

Besides the currently reported issue with version 2.3, which version does ClusterLabs use for its regression tests?

I found a note somewhere mentioning 2.1.x, is this true?

Rainer



Sent: Thursday, 09 May 2013 at 04:31
From: Andrew Beekhof and...@beekhof.net
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?


On 08/05/2013, at 4:53 PM, Andrew Beekhof and...@beekhof.net wrote:


 On 08/05/2013, at 4:08 PM, Andrew Beekhof and...@beekhof.net wrote:


 On 03/05/2013, at 8:46 PM, Rainer Brestan rainer.bres...@gmx.net wrote:

 Now i have all the logs for some combinations.

 Corosync: 1.4.1-7 for all the tests on all nodes
 Base is always fresh installation of each node with all packages equal except pacemaker version.
 int2node1 node id: 1743917066
 int2node2 node id: 1777471498

 In each ZIP file log from both nodes and the status output of crm_mon and cibadmin -Q is included.

 1.) 1.1.8-4 attaches to running 1.1.7-6 cluster
 https://www.dropbox.com/s/06oyrle4ny47uv9/attach_1.1.8-4_to_1.1.7-6.zip
 Result: join outstanding

 2.) 1.1.9-2 attaches to running 1.1.7-6 cluster
 https://www.dropbox.com/s/fv5kcm2yb5jz56z/attach_1.1.9-2_to_1.1.7-6.zip
 Result: join outstanding

 Neither side is seeing anything from the other, which is very unexpected.
 I notice you're using the plugin... which acts as a message router.

 So I suspect something in there has changed (though I'm at a loss to say what) and that cman based clusters are unaffected.

 Confirmed, cman clusters are unaffected.
 I'm yet to work out what changed in the plugin.

I worked it out...

The Red Hat changelog for 1.1.8-2 originally contained

+- Cman is the only supported membership & quorum provider, do not ship the corosync plugin

When this decision was reversed (when I realised no-one was seeing the ERROR logs indicating it was going away), I neglected to re-instate the following distro specific patch (which avoided conflicts between the ID used by CMAN and Pacemaker):

diff --git a/configure.ac b/configure.ac
index a3784d5..dafa9e2 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1133,7 +1133,7 @@ AC_MSG_CHECKING(for native corosync)
 COROSYNC_LIBS=
 CS_USES_LIBQB=0
 
-PCMK_SERVICE_ID=9
+PCMK_SERVICE_ID=10
 LCRSODIR=$libdir
 
 if test $SUPPORT_CS = no; then


So Pacemaker < 6.4 is talking on slot 10, while Pacemaker == 6.4 is using slot 9.
This is why the two versions cannot see each other :-(
I'm very sorry.



Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-05-09 Thread Andrew Beekhof

On 10/05/2013, at 6:05 AM, Rainer Brestan rainer.bres...@gmx.net wrote:

 Hi Andrew,
 yes, this clarifies a lot.
 Seems that it is really time to throw away the plugin.
 The CMAN solution won't be able (at least from the documentation) to attach
 new nodes without reconfiguring and restarting CMAN on the existing nodes

That doesn't sound right to me.
CC'ing Fabio who should know more (or who does)

 .
 The alternative is corosync 2.x.

Not on RHEL6 - unless you're building things yourself of course.

 ClusterLabs has quite a long list of corosync versions from branches 2.0, 2.1,
 2.2 and 2.3.
 Besides the currently reported issue with version 2.3, which version does
 ClusterLabs use for its regression tests?
 I found a note somewhere mentioning 2.1.x, is this true?

According to rpm, I've been using:

 Source RPM  : corosync-2.3.0-1.1.2c22.el7.src.rpm
and
 Source RPM  : corosync-2.3.0-1.fc18.src.rpm



 Rainer
  
 Sent: Thursday, 09 May 2013 at 04:31
 From: Andrew Beekhof and...@beekhof.net
 To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
 
 On 08/05/2013, at 4:53 PM, Andrew Beekhof and...@beekhof.net wrote:
 
 
  On 08/05/2013, at 4:08 PM, Andrew Beekhof and...@beekhof.net wrote:
 
 
  On 03/05/2013, at 8:46 PM, Rainer Brestan rainer.bres...@gmx.net wrote:
 
  Now i have all the logs for some combinations.
 
  Corosync: 1.4.1-7 for all the tests on all nodes
  Base is always fresh installation of each node with all packages equal 
  except pacemaker version.
  int2node1 node id: 1743917066
  int2node2 node id: 1777471498
 
  In each ZIP file log from both nodes and the status output of crm_mon and 
  cibadmin -Q is included.
 
  1.) 1.1.8-4 attaches to running 1.1.7-6 cluster
  https://www.dropbox.com/s/06oyrle4ny47uv9/attach_1.1.8-4_to_1.1.7-6.zip
  Result: join outstanding
 
  2.) 1.1.9-2 attaches to running 1.1.7-6 cluster
  https://www.dropbox.com/s/fv5kcm2yb5jz56z/attach_1.1.9-2_to_1.1.7-6.zip
  Result: join outstanding
 
  Neither side is seeing anything from the other, which is very unexpected.
  I notice you're using the plugin... which acts as a message router.
 
  So I suspect something in there has changed (though I'm at a loss to say 
  what) and that cman based clusters are unaffected.
 
  Confirmed, cman clusters are unaffected.
  I'm yet to work out what changed in the plugin.
 
 I worked it out...
 
 The Red Hat changelog for 1.1.8-2 originally contained
 
 +- Cman is the only supported membership & quorum provider, do not ship the
 corosync plugin
 
 When this decision was reversed (when I realised no-one was seeing the ERROR 
 logs indicating it was going away), I neglected to re-instate the following 
 distro specific patch (which avoided conflicts between the ID used by CMAN 
 and Pacemaker):
 
 diff --git a/configure.ac b/configure.ac
 index a3784d5..dafa9e2 100644
 --- a/configure.ac
 +++ b/configure.ac
 @@ -1133,7 +1133,7 @@ AC_MSG_CHECKING(for native corosync)
 COROSYNC_LIBS=
 CS_USES_LIBQB=0
 
 -PCMK_SERVICE_ID=9
 +PCMK_SERVICE_ID=10
 LCRSODIR=$libdir
 
 if test $SUPPORT_CS = no; then
 
 
 So Pacemaker < 6.4 is talking on slot 10, while Pacemaker == 6.4 is using
 slot 9.
 This is why the two versions cannot see each other :-(
 I'm very sorry.
 
 

Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-05-08 Thread Andrew Beekhof

On 03/05/2013, at 8:46 PM, Rainer Brestan rainer.bres...@gmx.net wrote:

 Now i have all the logs for some combinations.
  
 Corosync: 1.4.1-7 for all the tests on all nodes
 Base is always fresh installation of each node with all packages equal except 
 pacemaker version.
 int2node1 node id: 1743917066
 int2node2 node id: 1777471498
  
 In each ZIP file log from both nodes and the status output of crm_mon and 
 cibadmin -Q is  included.
  
 1.) 1.1.8-4 attaches to running 1.1.7-6 cluster
 https://www.dropbox.com/s/06oyrle4ny47uv9/attach_1.1.8-4_to_1.1.7-6.zip
 Result: join outstanding
  
 2.) 1.1.9-2 attaches to running 1.1.7-6 cluster
 https://www.dropbox.com/s/fv5kcm2yb5jz56z/attach_1.1.9-2_to_1.1.7-6.zip
 Result: join outstanding

Neither side is seeing anything from the other, which is very unexpected.
I notice you're using the plugin... which acts as a message router.

So I suspect something in there has changed (though I'm at a loss to say what) 
and that cman based clusters are unaffected.
Since you've already gone to a lot of effort, I've spent the afternoon putting 
1.1.7 onto one of my nodes to do some testing with.

I'll let you know what I discover.

  
 3.) 1.1.9-2 attaches to running 1.1.8-4 cluster
 https://www.dropbox.com/s/y9o4yo8g8ahwjga/attach_1.1.9-2_to_1.1.8-4.zip
 Result: join successful
  
 Rainer
 Sent: Friday, 03 May 2013 at 01:30
 From: Andrew Beekhof and...@beekhof.net
 To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
 
 On 03/05/2013, at 4:46 AM, Rainer Brestan rainer.bres...@gmx.net wrote:
 
  Hi Lars,
  i have tried 1.1.9-2 from download area at clusterlabs for RHEL6 with 
  corosync 1.4.1-17, also running with 1.1.7-6 at the other node.
  I have to go deeper in details later on (with logs), but the first try was 
  worse than 1.1.8-7.
  When the node with 1.1.9-2 joins the cluster, it could not even decode the 
  ais_message to get the node name of the node running 1.1.7-6.
 
 Logs?
 
  It states a new node has joined with the correct node id, but as name it 
  could only decode (null) as node name.
 
  Just as first impression.
 
  Rainer
 
  Sent: Tuesday, 30 April 2013 at 17:16
  From: Lars Marowsky-Bree l...@suse.com
  To: pacemaker@oss.clusterlabs.org
  Subject: Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
  On 2013-04-24T11:44:57, Rainer Brestan rainer.bres...@gmx.net wrote:
 
   Current DC: int2node2 - partition WITHOUT quorum
   Version: 1.1.8-7.el6-394e906
 
  This may not be the answer you want, since it is fairly unspecific. But
  I think we noticed something similar when we pulled in 1.1.8, I don't
  recall the bug number, but I *think* it worked out with a later git
  version.
 
  Can you try a newer build than 1.1.8?
 
 
  Regards,
  Lars
 
  --
  Architect Storage/HA
  SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, 
  HRB 21284 (AG Nürnberg)
  Experience is the name everyone gives to their mistakes. -- Oscar Wilde
 
 

Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-05-08 Thread Andrew Beekhof

On 08/05/2013, at 4:08 PM, Andrew Beekhof and...@beekhof.net wrote:

 
 On 03/05/2013, at 8:46 PM, Rainer Brestan rainer.bres...@gmx.net wrote:
 
 Now i have all the logs for some combinations.
 
 Corosync: 1.4.1-7 for all the tests on all nodes
 Base is always fresh installation of each node with all packages equal 
 except pacemaker version.
 int2node1 node id: 1743917066
 int2node2 node id: 1777471498
 
 In each ZIP file log from both nodes and the status output of crm_mon and 
 cibadmin -Q is  included.
 
 1.) 1.1.8-4 attaches to running 1.1.7-6 cluster
 https://www.dropbox.com/s/06oyrle4ny47uv9/attach_1.1.8-4_to_1.1.7-6.zip
 Result: join outstanding
 
 2.) 1.1.9-2 attaches to running 1.1.7-6 cluster
 https://www.dropbox.com/s/fv5kcm2yb5jz56z/attach_1.1.9-2_to_1.1.7-6.zip
 Result: join outstanding
 
 Neither side is seeing anything from the other, which is very unexpected.
 I notice you're using the plugin... which acts as a message router.
 
 So I suspect something in there has changed (though I'm at a loss to say 
 what) and that cman based clusters are unaffected.

Confirmed, cman clusters are unaffected.
I'm yet to work out what changed in the plugin.

 Since you've already gone to a lot of effort, I've spent the afternoon 
 putting 1.1.7 onto one of my nodes to do some testing with.
 
 I'll let you know what I discover.
 
 
 3.) 1.1.9-2 attaches to running 1.1.8-4 cluster
 https://www.dropbox.com/s/y9o4yo8g8ahwjga/attach_1.1.9-2_to_1.1.8-4.zip
 Result: join successful
 
 Rainer
 Sent: Friday, 03 May 2013 at 01:30
 From: Andrew Beekhof and...@beekhof.net
 To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
 
 On 03/05/2013, at 4:46 AM, Rainer Brestan rainer.bres...@gmx.net wrote:
 
 Hi Lars,
 i have tried 1.1.9-2 from download area at clusterlabs for RHEL6 with 
 corosync 1.4.1-17, also running with 1.1.7-6 at the other node.
 I have to go deeper in details later on (with logs), but the first try was 
 worse than 1.1.8-7.
 When the node with 1.1.9-2 joins the cluster, it could not even decode the 
 ais_message to get the node name of the node running 1.1.7-6.
 
 Logs?
 
 It states a new node has joined with the correct node id, but as name it 
 could only decode (null) as node name.
 
 Just as first impression.
 
 Rainer
 
 Sent: Tuesday, 30 April 2013 at 17:16
 From: Lars Marowsky-Bree l...@suse.com
 To: pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
 On 2013-04-24T11:44:57, Rainer Brestan rainer.bres...@gmx.net wrote:
 
 Current DC: int2node2 - partition WITHOUT quorum
 Version: 1.1.8-7.el6-394e906
 
 This may not be the answer you want, since it is fairly unspecific. But
 I think we noticed something similar when we pulled in 1.1.8, I don't
 recall the bug number, but I *think* it worked out with a later git
 version.
 
 Can you try a newer build than 1.1.8?
 
 
 Regards,
 Lars
 
 --
 Architect Storage/HA
 SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, 
 HRB 21284 (AG Nürnberg)
 Experience is the name everyone gives to their mistakes. -- Oscar Wilde
 
 

Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-05-08 Thread Andrew Beekhof

On 08/05/2013, at 4:53 PM, Andrew Beekhof and...@beekhof.net wrote:

 
 On 08/05/2013, at 4:08 PM, Andrew Beekhof and...@beekhof.net wrote:
 
 
 On 03/05/2013, at 8:46 PM, Rainer Brestan rainer.bres...@gmx.net wrote:
 
 Now i have all the logs for some combinations.
 
 Corosync: 1.4.1-7 for all the tests on all nodes
 Base is always fresh installation of each node with all packages equal 
 except pacemaker version.
 int2node1 node id: 1743917066
 int2node2 node id: 1777471498
 
 In each ZIP file log from both nodes and the status output of crm_mon and 
 cibadmin -Q is  included.
 
 1.) 1.1.8-4 attaches to running 1.1.7-6 cluster
 https://www.dropbox.com/s/06oyrle4ny47uv9/attach_1.1.8-4_to_1.1.7-6.zip
 Result: join outstanding
 
 2.) 1.1.9-2 attaches to running 1.1.7-6 cluster
 https://www.dropbox.com/s/fv5kcm2yb5jz56z/attach_1.1.9-2_to_1.1.7-6.zip
 Result: join outstanding
 
 Neither side is seeing anything from the other, which is very unexpected.
 I notice you're using the plugin... which acts as a message router.
 
 So I suspect something in there has changed (though I'm at a loss to say 
 what) and that cman based clusters are unaffected.
 
 Confirmed, cman clusters are unaffected.
 I'm yet to work out what changed in the plugin.

I worked it out...

The Red Hat changelog for 1.1.8-2 originally contained

+- Cman is the only supported membership & quorum provider, do not ship the 
corosync plugin

When this decision was reversed (when I realised no-one was seeing the ERROR 
logs indicating it was going away), I neglected to re-instate the following 
distro specific patch (which avoided conflicts between the ID used by CMAN and 
Pacemaker):

diff --git a/configure.ac b/configure.ac
index a3784d5..dafa9e2 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1133,7 +1133,7 @@ AC_MSG_CHECKING(for native corosync)
 COROSYNC_LIBS=
 CS_USES_LIBQB=0
 
-PCMK_SERVICE_ID=9
+PCMK_SERVICE_ID=10
 LCRSODIR=$libdir
 
 if test $SUPPORT_CS = no; then


So Pacemaker < 6.4 is talking on slot 10, while Pacemaker == 6.4 is using slot 
9.
This is why the two versions cannot see each other :-(
I'm very sorry.
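
To make the effect concrete, here is a toy model of the mismatch in Python. It is not Pacemaker or plugin code. The version-to-ID mapping is inferred from the explanation above and from the three join tests reported earlier in the thread (1.1.7-6 built with the distro patch, 1.1.8-4 and 1.1.9-2 without it), and the assumption that plugin traffic only reaches peers registered under the same service ID is a simplification.

```python
# Assumed mapping: builds carrying the distro patch use service ID 10,
# builds without it use the upstream value 9.
SERVICE_ID = {"1.1.7-6": 10, "1.1.8-4": 9, "1.1.9-2": 9}

def can_join(joining: str, running: str) -> bool:
    """Peers only see each other if both sides use the same service ID."""
    return SERVICE_ID[joining] == SERVICE_ID[running]

# The three combinations that were tested.
tests = [("1.1.8-4", "1.1.7-6"), ("1.1.9-2", "1.1.7-6"), ("1.1.9-2", "1.1.8-4")]
for joining, running in tests:
    result = "join successful" if can_join(joining, running) else "join outstanding"
    print(f"{joining} attaches to running {running}: {result}")
```

Under these assumptions the model reproduces the reported results: only the 1.1.9-2 to 1.1.8-4 combination joins.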



Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-05-03 Thread Rainer Brestan

Now I have all the logs for some combinations.



Corosync: 1.4.1-7 for all the tests on all nodes

The base is always a fresh installation of each node, with all packages equal except the pacemaker version.

int2node1 node id: 1743917066
int2node2 node id: 1777471498




Each ZIP file includes the logs from both nodes and the status output of crm_mon and cibadmin -Q.



1.) 1.1.8-4 attaches to running 1.1.7-6 cluster

https://www.dropbox.com/s/06oyrle4ny47uv9/attach_1.1.8-4_to_1.1.7-6.zip

Result: join outstanding



2.) 1.1.9-2 attaches to running 1.1.7-6 cluster

https://www.dropbox.com/s/fv5kcm2yb5jz56z/attach_1.1.9-2_to_1.1.7-6.zip

Result: join outstanding



3.) 1.1.9-2 attaches to running 1.1.8-4 cluster

https://www.dropbox.com/s/y9o4yo8g8ahwjga/attach_1.1.9-2_to_1.1.8-4.zip

Result: join successful



Rainer


Sent: Friday, 03 May 2013 at 01:30
From: Andrew Beekhof and...@beekhof.net
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?


On 03/05/2013, at 4:46 AM, Rainer Brestan rainer.bres...@gmx.net wrote:

 Hi Lars,
 i have tried 1.1.9-2 from download area at clusterlabs for RHEL6 with corosync 1.4.1-17, also running with 1.1.7-6 at the other node.
 I have to go deeper in details later on (with logs), but the first try was worse than 1.1.8-7.
 When the node with 1.1.9-2 joins the cluster, it could not even decode the ais_message to get the node name of the node running 1.1.7-6.

Logs?

 It states a new node has joined with the correct node id, but as name it could only decode (null) as node name.

 Just as first impression.

 Rainer

 Sent: Tuesday, 30 April 2013 at 17:16
 From: Lars Marowsky-Bree l...@suse.com
 To: pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
 On 2013-04-24T11:44:57, Rainer Brestan rainer.bres...@gmx.net wrote:

  Current DC: int2node2 - partition WITHOUT quorum
  Version: 1.1.8-7.el6-394e906

 This may not be the answer you want, since it is fairly unspecific. But
 I think we noticed something similar when we pulled in 1.1.8, I don't
 recall the bug number, but I *think* it worked out with a later git
 version.

 Can you try a newer build than 1.1.8?


 Regards,
 Lars

 --
 Architect Storage/HA
 SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
 Experience is the name everyone gives to their mistakes. -- Oscar Wilde



Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-05-02 Thread Rainer Brestan

Hi Lars,

I have tried 1.1.9-2 from the download area at clusterlabs for RHEL6 with corosync 1.4.1-17, also running with 1.1.7-6 at the other node.

I have to go deeper in details later on (with logs), but the first try was worse than 1.1.8-7.

When the node with 1.1.9-2 joins the cluster, it could not even decode the ais_message to get the node name of the node running 1.1.7-6.

It states a new node has joined with the correct node id, but as name it could only decode (null) as node name.



Just as first impression.



Rainer




Sent: Tuesday, 30 April 2013 at 17:16
From: Lars Marowsky-Bree l...@suse.com
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

On 2013-04-24T11:44:57, Rainer Brestan rainer.bres...@gmx.net wrote:

 Current DC: int2node2 - partition WITHOUT quorum
 Version: 1.1.8-7.el6-394e906

This may not be the answer you want, since it is fairly unspecific. But
I think we noticed something similar when we pulled in 1.1.8, I don't
recall the bug number, but I *think* it worked out with a later git
version.

Can you try a newer build than 1.1.8?


Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde



Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-05-02 Thread Andrew Beekhof

On 03/05/2013, at 4:46 AM, Rainer Brestan rainer.bres...@gmx.net wrote:

 Hi Lars,
 i have tried 1.1.9-2 from download area at clusterlabs for RHEL6 with 
 corosync 1.4.1-17, also running with 1.1.7-6 at the other node.
 I have to go deeper in details later on (with logs), but the first try was 
 worse than 1.1.8-7.
 When the node with 1.1.9-2 joins the cluster, it could not even decode the 
 ais_message to get the node name of the node running 1.1.7-6.

Logs?

 It states a new node has joined with the correct node id, but as name it 
 could only decode (null) as node name.
  
 Just as first impression.
  
 Rainer
  
 Sent: Tuesday, 30 April 2013 at 17:16
 From: Lars Marowsky-Bree l...@suse.com
 To: pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
 On 2013-04-24T11:44:57, Rainer Brestan rainer.bres...@gmx.net wrote:
 
  Current DC: int2node2 - partition WITHOUT quorum
  Version: 1.1.8-7.el6-394e906
 
 This may not be the answer you want, since it is fairly unspecific. But
 I think we noticed something similar when we pulled in 1.1.8, I don't
 recall the bug number, but I *think* it worked out with a later git
 version.
 
 Can you try a newer build than 1.1.8?
 
 
 Regards,
 Lars
 
 --
 Architect Storage/HA
 SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, 
 HRB 21284 (AG Nürnberg)
 Experience is the name everyone gives to their mistakes. -- Oscar Wilde
 
 

Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-04-30 Thread Lars Marowsky-Bree
On 2013-04-24T11:44:57, Rainer Brestan rainer.bres...@gmx.net wrote:

 Current DC: int2node2 - partition WITHOUT quorum
 Version: 1.1.8-7.el6-394e906

This may not be the answer you want, since it is fairly unspecific. But
I think we noticed something similar when we pulled in 1.1.8, I don't
recall the bug number, but I *think* it worked out with a later git
version.

Can you try a newer build than 1.1.8?


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde




Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-04-29 Thread Andrew Beekhof

On 24/04/2013, at 7:44 PM, Rainer Brestan rainer.bres...@gmx.net wrote:

 Pacemaker log of int2node2 with trace setting.
 https://www.dropbox.com/s/04ciy2g6dfbauxy/pacemaker.log?n=165978094
 On int2node1 (1.1.7) the trace setting did not create the pacemaker.log file.
  

Ah, yes, 1.1.7 wasn't so smart yet.
Can you make sure there is a logfile specified in corosync.conf?
Looking at the node2 logs was useful (nothing is arriving from node1) but I 
really need to see node1's logs.
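
Whether a logfile is configured can be checked by looking at the logging section of corosync.conf. Below is a minimal Python sketch, assuming the usual `to_logfile` and `logfile` directives of that section; the sample configuration and the log path in it are made up for illustration and are not taken from either node in this thread.

```python
def logging_settings(conf_text):
    """Collect "key: value" pairs found directly inside the logging { } block."""
    settings = {}
    depth = 0
    in_logging = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            depth += 1
            if depth == 1 and line[:-1].strip() == "logging":
                in_logging = True
            continue
        if line == "}":
            if depth == 1:
                in_logging = False
            depth -= 1
            continue
        # Only keys at the top level of the logging block are collected;
        # nested blocks such as logger_subsys are skipped.
        if in_logging and depth == 1 and ":" in line:
            key, value = line.split(":", 1)
            settings[key.strip()] = value.strip()
    return settings


def logfile_configured(conf_text):
    """True if logging to a file is enabled and a logfile path is set."""
    s = logging_settings(conf_text)
    return s.get("to_logfile") == "yes" and bool(s.get("logfile"))


# Hypothetical sample; the path is an assumption, not from this thread.
SAMPLE = """\
totem {
    version: 2
}
logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}
"""

if __name__ == "__main__":
    print(logfile_configured(SAMPLE))
```

On a real node the text would be read from /etc/corosync/corosync.conf instead of the sample string.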


Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-04-24 Thread Rainer Brestan

I ran this test because I had the same problem.



Origin:

One node cluster, node int2node1 running with IP address 10.16.242.231, quorum ignore, DC int2node1




[root@int2node1 sysconfig]# crm_mon -1

Last updated: Wed Apr 24 09:49:32 2013
Last change: Wed Apr 24 09:44:55 2013 via crm_resource on int2node1
Stack: openais
Current DC: int2node1 - partition WITHOUT quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
1 Nodes configured, 2 expected votes
1 Resources configured.


Online: [ int2node1 ]

Clone Set: cloneSysInfo [resSysInfo]
 Started: [ int2node1 ]



Next step:

Node int2node2 with IP address 10.16.242.233 joins the cluster.



Result:




[root@int2node1 sysconfig]# crm_mon -1

Last updated: Wed Apr 24 10:14:18 2013
Last change: Wed Apr 24 10:05:20 2013 via crmd on int2node1
Stack: openais
Current DC: int2node1 - partition WITHOUT quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
2 Nodes configured, 2 expected votes
1 Resources configured.


Online: [ int2node1 ]
OFFLINE: [ int2node2 ]

Clone Set: cloneSysInfo [resSysInfo]
 Started: [ int2node1 ]




[root@int2node1 sysconfig]# corosync-objctl | grep member
runtime.totem.pg.mrp.srp.members.1743917066.ip=r(0) ip(10.16.242.231)
runtime.totem.pg.mrp.srp.members.1743917066.join_count=1
runtime.totem.pg.mrp.srp.members.1743917066.status=joined
runtime.totem.pg.mrp.srp.members.1777471498.ip=r(0) ip(10.16.242.233)
runtime.totem.pg.mrp.srp.members.1777471498.join_count=1
runtime.totem.pg.mrp.srp.members.1777471498.status=joined




[root@int2node1 sysconfig]# crm_node -l
1743917066 int2node1 member




[root@int2node2 ~]# crm_mon -1
Last updated: Wed Apr 24 11:27:39 2013
Last change: Wed Apr 24 10:07:45 2013 via crm_resource on int2node2
Stack: classic openais (with plugin)
Current DC: int2node2 - partition WITHOUT quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
1 Resources configured.


Online: [ int2node2 ]
OFFLINE: [ int2node1 ]

Clone Set: cloneSysInfo [resSysInfo]
 Started: [ int2node2 ]




[root@int2node2 ~]# corosync-objctl | grep member
runtime.totem.pg.mrp.srp.members.1743917066.ip=r(0) ip(10.16.242.231)
runtime.totem.pg.mrp.srp.members.1743917066.join_count=1
runtime.totem.pg.mrp.srp.members.1743917066.status=joined
runtime.totem.pg.mrp.srp.members.1777471498.ip=r(0) ip(10.16.242.233)
runtime.totem.pg.mrp.srp.members.1777471498.join_count=1
runtime.totem.pg.mrp.srp.members.1777471498.status=joined




[root@int2node2 ~]# crm_node -l
1777471498 int2node2 member
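
The mismatch in the output above (corosync reports both node IDs as joined, while crm_node -l on each node lists only the node itself) can be checked mechanically. Below is a minimal Python sketch that parses the two outputs as pasted here; the function names are made up for illustration, and the sample strings are copied from the int2node2 output.

```python
import re

# Sample text copied from the corosync-objctl and crm_node output of int2node2.
OBJCTL = """\
runtime.totem.pg.mrp.srp.members.1743917066.ip=r(0) ip(10.16.242.231)
runtime.totem.pg.mrp.srp.members.1743917066.join_count=1
runtime.totem.pg.mrp.srp.members.1743917066.status=joined
runtime.totem.pg.mrp.srp.members.1777471498.ip=r(0) ip(10.16.242.233)
runtime.totem.pg.mrp.srp.members.1777471498.join_count=1
runtime.totem.pg.mrp.srp.members.1777471498.status=joined
"""
CRM_NODE = "1777471498 int2node2 member\n"

MEMBER_RE = re.compile(
    r"^runtime\.totem\.pg\.mrp\.srp\.members\.(\d+)\.status=(\w+)$"
)


def corosync_joined(text):
    """Return the set of node IDs that corosync reports as 'joined'."""
    ids = set()
    for line in text.splitlines():
        m = MEMBER_RE.match(line.strip())
        if m and m.group(2) == "joined":
            ids.add(int(m.group(1)))
    return ids


def pacemaker_members(text):
    """Return the set of node IDs that crm_node -l lists as 'member'."""
    ids = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2] == "member":
            ids.add(int(parts[0]))
    return ids


def missing_from_pacemaker(objctl_text, crm_node_text):
    """Node IDs joined at the corosync level but unknown to pacemaker."""
    return sorted(corosync_joined(objctl_text) - pacemaker_members(crm_node_text))


if __name__ == "__main__":
    print(missing_from_pacemaker(OBJCTL, CRM_NODE))
```

For the int2node2 output this reports node ID 1743917066 (int2node1) as known to corosync but missing from the pacemaker membership.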



Pacemaker log of int2node2 with trace setting:

https://www.dropbox.com/s/04ciy2g6dfbauxy/pacemaker.log?n=165978094

On int2node1 (1.1.7) the trace setting did not create the pacemaker.log file.

Below is the excerpt of the CIB with node information from int2node2.

[root@int2node2 ~]# cibadmin -Q
<cib epoch="17" num_updates="51" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.7" update-origin="int2node2" update-client="crm_resource" cib-last-written="Wed Apr 24 10:07:45 2013" have-quorum="0" dc-uuid="int2node2">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        ...
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="int2node2" uname="int2node2"/>
      <node id="int2node1" uname="int2node1"/>
    </nodes>
    <resources>
      ...
    </resources>
    <rsc_defaults>
      ...
    </rsc_defaults>
  </configuration>
  <status>
    <node_state id="int2node2" uname="int2node2" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
      <transient_attributes id="int2node2">
        <instance_attributes id="status-int2node2">
          ...
        </instance_attributes>
      </transient_attributes>
      <lrm id="int2node2">
        <lrm_resources>
          ...
        </lrm_resources>
      </lrm>
    </node_state>
    <node_state id="int2node1" uname="int2node1" in_ccm="true" crmd="online" join="down" crm-debug-origin="do_state_transition"/>
  </status>
</cib>


On int2node1 the node state in the CIB is different.


<status>
  <node_state id="int2node1" uname="int2node1" ha="active" in_ccm="true" crmd="online" join="member" expected="member" crm-debug-origin="do_state_transition" shutdown="0">
    <transient_attributes id="int2node1">
    </transient_attributes>
    <lrm id="int2node1">
      <lrm_resources>
        ...
      </lrm_resources>
    </lrm>
  </node_state>
  <node_state id="int2node2" uname="int2node2" crmd="online" crm-debug-origin="do_state_transition" ha="active" in_ccm="true" join="pending"/>
</status>
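
The two status excerpts above disagree about the join state of both nodes. Below is a minimal Python sketch that makes the difference explicit; the attribute values are copied by hand from the two excerpts, and the names FIRST_EXCERPT, SECOND_EXCERPT and attribute_disagreements are made up for illustration.

```python
# node_state attributes copied from the two CIB excerpts above.
FIRST_EXCERPT = {
    "int2node2": {"in_ccm": "true", "crmd": "online",
                  "join": "member", "expected": "member"},
    "int2node1": {"in_ccm": "true", "crmd": "online", "join": "down"},
}
SECOND_EXCERPT = {
    "int2node1": {"ha": "active", "in_ccm": "true", "crmd": "online",
                  "join": "member", "expected": "member"},
    "int2node2": {"ha": "active", "in_ccm": "true", "crmd": "online",
                  "join": "pending"},
}


def attribute_disagreements(view_a, view_b, keys=("join",)):
    """Map (node, attribute) to the pair of values on which the views differ."""
    diffs = {}
    for node in sorted(set(view_a) | set(view_b)):
        for key in keys:
            a = view_a.get(node, {}).get(key)
            b = view_b.get(node, {}).get(key)
            if a != b:
                diffs[(node, key)] = (a, b)
    return diffs


if __name__ == "__main__":
    for (node, key), (a, b) in attribute_disagreements(
            FIRST_EXCERPT, SECOND_EXCERPT).items():
        print(f"{node}: {key} is {a!r} in the first excerpt, {b!r} in the second")
```

Both views agree on in_ccm and crmd, so the membership layer looks consistent; only the join state differs, which matches the join offer never being answered.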






Rainer


Sent: Wednesday, 17 April 2013 at 07:32
From: Andrew Beekhof and...@beekhof.net
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?


On 15/04/2013, at 7:08 PM, Pavlos Parissis pavlos.paris...@gmail.com wrote:

 Hoi,

 I upgraded 1st node and here are the logs
 https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.debuglog
 https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.debuglog

 Enabling tracing on the mentioned functions didn't give any more information, at least not to me.

10:22:08 pacemakerd[53588]: notice: 

Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-04-16 Thread Andrew Beekhof

On 15/04/2013, at 7:08 PM, Pavlos Parissis pavlos.paris...@gmail.com wrote:

 Hoi,
 
 I upgraded 1st node and here are the logs
 https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.debuglog
 https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.debuglog
 
 Enabling tracing on the mentioned functions didn't give any more 
 information, at least not to me.

10:22:08 pacemakerd[53588]:   notice: crm_add_logfile: Additional logging 
available in /var/log/pacemaker.log

That's the file(s) we need :)

 
 Cheers,
 Pavlos
 
 
 On 15 April 2013 01:42, Andrew Beekhof and...@beekhof.net wrote:
 
 On 15/04/2013, at 7:31 AM, Pavlos Parissis pavlos.paris...@gmail.com wrote:
 
  On 12/04/2013 09:37 μμ, Pavlos Parissis wrote:
  Hoi,
 
  As I wrote to another post[1] I failed to upgrade to 1.1.8 for a 2 node
  cluster.
 
  Before the upgrade process both nodes are using CentOS 6.3, corosync
  1.4.1-7 and pacemaker-1.1.7.
 
  I followed the rolling upgrade process, so I stopped pacemaker and then
  corosync on node1 and upgraded to CentOS 6.4. The OS upgrade upgrades
  also pacemaker to 1.1.8-7 and corosync to 1.4.1-15.
  The upgrade of rpms went smoothly as I knew about the crmsh issue so I
  made sure I had crmsh rpm on my repos.
 
  Corosync started without any problems and both nodes could see each
  other[2]. But for some reason node2 failed to receive a reply on join
  offer from node1 and node1 never joined the cluster. Node1 formed a new
  cluster as it never got a reply from node2, so I ended up with a
  split-brain situation.
 
  Logs of node1 can be found here
  https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.log
  and of node2 here
  https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.log
 
 
  Doing a Disconnect & Reattach upgrade of both nodes at the same time
  brings me a working 1.1.8 cluster. Any attempt to make a 1.1.8 node
  join a cluster with a 1.1.7 node failed.
 
 There wasn't enough detail in the logs to suggest a solution, but if you add 
 the following to /etc/sysconfig/pacemaker and re-test, it might shed some 
 additional light on the problem.
 
 export PCMK_trace_functions=ais_dispatch_message
 
 Certainly there was no intention to make them incompatible.
 
 
  Cheers,
  Pavlos
 
 


Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-04-15 Thread Pavlos Parissis
Hoi,

I upgraded 1st node and here are the logs
https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.debuglog
https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.debuglog

Enabling tracing on the mentioned functions didn't give any more
information, at least not to me.

Cheers,
Pavlos


On 15 April 2013 01:42, Andrew Beekhof and...@beekhof.net wrote:


 On 15/04/2013, at 7:31 AM, Pavlos Parissis pavlos.paris...@gmail.com
 wrote:

  On 12/04/2013 09:37 μμ, Pavlos Parissis wrote:
  Hoi,
 
  As I wrote to another post[1] I failed to upgrade to 1.1.8 for a 2 node
  cluster.
 
  Before the upgrade process both nodes are using CentOS 6.3, corosync
  1.4.1-7 and pacemaker-1.1.7.
 
  I followed the rolling upgrade process, so I stopped pacemaker and then
  corosync on node1 and upgraded to CentOS 6.4. The OS upgrade upgrades
  also pacemaker to 1.1.8-7 and corosync to 1.4.1-15.
  The upgrade of rpms went smoothly as I knew about the crmsh issue so I
  made sure I had crmsh rpm on my repos.
 
  Corosync started without any problems and both nodes could see each
  other[2]. But for some reason node2 failed to receive a reply on join
  offer from node1 and node1 never joined the cluster. Node1 formed a new
  cluster as it never got a reply from node2, so I ended up with a
  split-brain situation.
 
  Logs of node1 can be found here
  https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.log
  and of node2 here
  https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.log
 
 
  Doing a Disconnect & Reattach upgrade of both nodes at the same time
  brings me a working 1.1.8 cluster. Any attempt to make a 1.1.8 node
  join a cluster with a 1.1.7 node failed.

 There wasn't enough detail in the logs to suggest a solution, but if you
 add the following to /etc/sysconfig/pacemaker and re-test, it might shed
 some additional light on the problem.

 export PCMK_trace_functions=ais_dispatch_message

 Certainly there was no intention to make them incompatible.

 
  Cheers,
  Pavlos
 
 


Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-04-14 Thread Pavlos Parissis
On 12/04/2013 09:37 μμ, Pavlos Parissis wrote:
 Hoi,
 
 As I wrote to another post[1] I failed to upgrade to 1.1.8 for a 2 node
 cluster.
 
 Before the upgrade process both nodes are using CentOS 6.3, corosync
 1.4.1-7 and pacemaker-1.1.7.
 
 I followed the rolling upgrade process, so I stopped pacemaker and then
 corosync on node1 and upgraded to CentOS 6.4. The OS upgrade upgrades
 also pacemaker to 1.1.8-7 and corosync to 1.4.1-15.
 The upgrade of rpms went smoothly as I knew about the crmsh issue so I
 made sure I had crmsh rpm on my repos.
 
 Corosync started without any problems and both nodes could see each
 other[2]. But for some reason node2 failed to receive a reply on join
 offer from node1 and node1 never joined the cluster. Node1 formed a new
 cluster as it never got a reply from node2, so I ended up with a
 split-brain situation.
 
 Logs of node1 can be found here
 https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.log
 and of node2 here
 https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.log


Doing a Disconnect & Reattach upgrade of both nodes at the same time
brings me a working 1.1.8 cluster. Any attempt to make a 1.1.8 node
join a cluster with a 1.1.7 node failed.

Cheers,
Pavlos






Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-04-14 Thread Andrew Beekhof

On 15/04/2013, at 7:31 AM, Pavlos Parissis pavlos.paris...@gmail.com wrote:

 On 12/04/2013 09:37 μμ, Pavlos Parissis wrote:
 Hoi,
 
 As I wrote to another post[1] I failed to upgrade to 1.1.8 for a 2 node
 cluster.
 
 Before the upgrade process both nodes are using CentOS 6.3, corosync
 1.4.1-7 and pacemaker-1.1.7.
 
 I followed the rolling upgrade process, so I stopped pacemaker and then
 corosync on node1 and upgraded to CentOS 6.4. The OS upgrade upgrades
 also pacemaker to 1.1.8-7 and corosync to 1.4.1-15.
 The upgrade of rpms went smoothly as I knew about the crmsh issue so I
 made sure I had crmsh rpm on my repos.
 
 Corosync started without any problems and both nodes could see each
 other[2]. But for some reason node2 failed to receive a reply on join
 offer from node1 and node1 never joined the cluster. Node1 formed a new
 cluster as it never got a reply from node2, so I ended up with a
 split-brain situation.
 
 Logs of node1 can be found here
 https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.log
 and of node2 here
 https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.log
 
 
 Doing a Disconnect & Reattach upgrade of both nodes at the same time
 brings me a working 1.1.8 cluster. Any attempt to make a 1.1.8 node
 join a cluster with a 1.1.7 node failed.

There wasn't enough detail in the logs to suggest a solution, but if you add 
the following to /etc/sysconfig/pacemaker and re-test, it might shed some 
additional light on the problem.

export PCMK_trace_functions=ais_dispatch_message

Certainly there was no intention to make them incompatible.
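
For anyone scripting the change to /etc/sysconfig/pacemaker across several nodes, below is a minimal Python sketch that adds the export line without duplicating it on repeated runs. The helper name ensure_export is made up for illustration; the sketch works on the file content as a string and leaves reading and writing the file (as root) and restarting pacemaker for the re-test to the caller.

```python
def ensure_export(text, name, value):
    """Return text containing exactly one active 'export NAME=VALUE' line.

    Existing active assignments of NAME (with or without 'export') are
    replaced; commented-out lines are left untouched.
    """
    wanted = f"export {name}={value}"
    out = []
    found = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(f"export {name}=") or stripped.startswith(f"{name}="):
            if not found:
                out.append(wanted)
                found = True
            continue  # drop any further duplicates
        out.append(line)
    if not found:
        out.append(wanted)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical existing file content, for illustration only.
    before = "# pacemaker settings\nexport PCMK_trace_functions=old_value\n"
    after = ensure_export(before, "PCMK_trace_functions", "ais_dispatch_message")
    print(after, end="")
```

Running the function on its own output returns the same text, so it is safe to apply more than once.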

 
 Cheers,
 Pavlos
 
 