Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
On 5/10/2013 1:57 AM, Andrew Beekhof wrote:
> On 10/05/2013, at 6:05 AM, Rainer Brestan <rainer.bres...@gmx.net> wrote:
>> Hi Andrew,
>> yes, this clarifies a lot. Seems that it is really time to throw away the plugin.
>> The CMAN solution won't be able (at least from the documentation) to attach new nodes without reconfiguring and restarting CMAN on the existing nodes.
>
> That doesn't sound right to me. CC'ing Fabio who should know more (or who does).

In cman you can add and remove nodes without restarting. You do need to change the configuration, though.

Short version to add a node:
- edit cluster.conf to add the node (remember to bump config_version)
- either copy it across all nodes (including the new one) or use ccs_sync/ricci
- issue cman_tool version -r (-S if you copied it manually) to reload the configuration without a restart
- start cman on the new node

Short version to remove a node:
- stop cman on the node
- edit cluster.conf to drop the node
- propagate cluster.conf
- cman_tool version -r

Note that if you are moving from 2 to 3+ nodes or from 3+ to 2 nodes, you _must_ stop the cluster first. This is because some internal corosync defaults are different and cannot be changed at runtime yet.

Also, when removing nodes, you have to ensure that you do not remove too many nodes at the same time, or you can lose quorum.

Fabio

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
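Fabio's add-node procedure begins with an edit to cluster.conf. The Python sketch below covers only that edit. It assumes the RHEL6 cluster.conf layout: a `<cluster config_version=...>` root with `<clusternodes>/<clusternode name=... nodeid=...>` children. The function name `add_node` and the sample node names are hypothetical.

The sketch appends the node, bumps `config_version`, and flags the 2-to-3 node transition, where the cluster must be stopped first. Copying the file, running ccs_sync/ricci, calling `cman_tool version -r` and starting cman all remain manual steps. It also does not touch any other settings a 2-to-3 node change may require.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def add_node(conf_xml: str, name: str, nodeid: int) -> Tuple[str, bool]:
    """Hypothetical helper for the cluster.conf edit in the add-node steps.

    Appends a <clusternode> entry, bumps config_version and reports whether the
    change crosses the 2-node -> 3-node boundary, which (per the thread) needs
    a full cluster stop instead of a runtime reload. It does not copy the file,
    run ccs_sync/ricci or call cman_tool.
    """
    root = ET.fromstring(conf_xml)
    nodes = root.find("clusternodes")
    if nodes is None:
        raise ValueError("cluster.conf has no <clusternodes> section")

    existing = nodes.findall("clusternode")
    if any(n.get("name") == name for n in existing):
        raise ValueError(f"node {name!r} is already configured")
    if any(n.get("nodeid") == str(nodeid) for n in existing):
        raise ValueError(f"nodeid {nodeid} is already in use")

    ET.SubElement(nodes, "clusternode", name=name, nodeid=str(nodeid))

    # "Remember to bump config_version": the reload should see a newer version.
    root.set("config_version", str(int(root.get("config_version", "0")) + 1))

    # Going from 2 to 3 nodes changes corosync defaults that cannot be
    # changed at runtime, so the whole cluster must be stopped first.
    needs_full_stop = len(existing) == 2
    return ET.tostring(root, encoding="unicode"), needs_full_stop
```

The helper refuses duplicate names and node IDs rather than silently overwriting an entry. The caller then decides whether a runtime reload is enough or a full stop is required.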
Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
Hi Andrew,

yes, this clarifies a lot. Seems that it is really time to throw away the plugin.

The CMAN solution won't be able (at least from the documentation) to attach new nodes without reconfiguring and restarting CMAN on the existing nodes.

The alternative is corosync 2.x. ClusterLabs has quite a long list of corosync versions from the 2.0, 2.1, 2.2 and 2.3 branches. Besides the currently reported issue with version 2.3, which version does ClusterLabs use for its regression tests? I found a note somewhere about 2.1.x; is this true?

Rainer

Sent: Thursday, 09 May 2013 at 04:31
From: Andrew Beekhof <and...@beekhof.net>
To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
Subject: Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

> So Pacemaker < 6.4 is talking on slot 10, while Pacemaker >= 6.4 is using slot 9.
> This is why the two versions cannot see each other :-(
Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
On 10/05/2013, at 6:05 AM, Rainer Brestan <rainer.bres...@gmx.net> wrote:

> Hi Andrew,
> yes, this clarifies a lot. Seems that it is really time to throw away the plugin.
> The CMAN solution won't be able (at least from the documentation) to attach new nodes without reconfiguring and restarting CMAN on the existing nodes.

That doesn't sound right to me. CC'ing Fabio who should know more (or who does).

> The alternative is corosync 2.x.

Not on RHEL6 - unless you're building things yourself of course.

> ClusterLabs has quite a long list of corosync versions from the 2.0, 2.1, 2.2 and 2.3 branches.
> Besides the currently reported issue with version 2.3, which version does ClusterLabs use for its regression tests? I found a note somewhere about 2.1.x; is this true?

According to rpm, I've been using:

Source RPM : corosync-2.3.0-1.1.2c22.el7.src.rpm

and

Source RPM : corosync-2.3.0-1.fc18.src.rpm
Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
On 03/05/2013, at 8:46 PM, Rainer Brestan <rainer.bres...@gmx.net> wrote:

> 1.) 1.1.8-4 attaches to running 1.1.7-6 cluster
> Result: join outstanding
> 2.) 1.1.9-2 attaches to running 1.1.7-6 cluster
> Result: join outstanding

Neither side is seeing anything from the other, which is very unexpected.
I notice you're using the plugin... which acts as a message router.
So I suspect something in there has changed (though I'm at a loss to say what) and that cman based clusters are unaffected.

Since you've already gone to a lot of effort, I've spent the afternoon putting 1.1.7 onto one of my nodes to do some testing with.
I'll let you know what I discover.

> 3.) 1.1.9-2 attaches to running 1.1.8-4 cluster
> Result: join successful
Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
On 08/05/2013, at 4:08 PM, Andrew Beekhof <and...@beekhof.net> wrote:

> Neither side is seeing anything from the other, which is very unexpected.
> I notice you're using the plugin... which acts as a message router.
> So I suspect something in there has changed (though I'm at a loss to say what) and that cman based clusters are unaffected.

Confirmed, cman clusters are unaffected.
I'm yet to work out what changed in the plugin.
Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
On 08/05/2013, at 4:53 PM, Andrew Beekhof <and...@beekhof.net> wrote:

> Confirmed, cman clusters are unaffected.
> I'm yet to work out what changed in the plugin.

I worked it out...

The Red Hat changelog for 1.1.8-2 originally contained

+- Cman is the only supported membership quorum provider, do not ship the corosync plugin

When this decision was reversed (when I realised no-one was seeing the ERROR logs indicating it was going away), I neglected to re-instate the following distro-specific patch (which avoided conflicts between the ID used by CMAN and Pacemaker):

    diff --git a/configure.ac b/configure.ac
    index a3784d5..dafa9e2 100644
    --- a/configure.ac
    +++ b/configure.ac
    @@ -1133,7 +1133,7 @@ AC_MSG_CHECKING(for native corosync)
     COROSYNC_LIBS=
     CS_USES_LIBQB=0

    -PCMK_SERVICE_ID=9
    +PCMK_SERVICE_ID=10
     LCRSODIR=$libdir

     if test $SUPPORT_CS = no; then

So Pacemaker < 6.4 is talking on slot 10, while Pacemaker >= 6.4 is using slot 9.
This is why the two versions cannot see each other :-(
I'm very sorry.
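A toy model makes the failure mode easier to picture. It assumes the plugin hands each message only to handlers registered on the same service slot. Under that assumption, two Pacemaker builds registered on different slots never exchange anything, even though corosync itself lists both nodes as joined.

This is only an illustration: `ToyRouter` and `peers_seen` are hypothetical names, not corosync's API. The slot values follow the diff above, with the distro patch setting 10 and unpatched builds using 9. Assigning 10 to the 1.1.7-6 node and 9 to the newer node is an assumption consistent with the test results in the thread.

```python
from collections import defaultdict
from typing import Dict, List


class ToyRouter:
    """Hypothetical stand-in for a message router keyed by service ID."""

    def __init__(self) -> None:
        # service_id -> names of nodes whose Pacemaker registered on that slot
        self.listeners: Dict[int, List[str]] = defaultdict(list)

    def register(self, node: str, service_id: int) -> None:
        self.listeners[service_id].append(node)

    def broadcast(self, sender: str, service_id: int) -> List[str]:
        """Return the nodes (other than the sender) that receive the message."""
        return [n for n in self.listeners[service_id] if n != sender]


def peers_seen(slots: Dict[str, int]) -> Dict[str, List[str]]:
    """For each node, list the peers that would receive its messages."""
    router = ToyRouter()
    for node, sid in slots.items():
        router.register(node, sid)
    return {node: router.broadcast(node, sid) for node, sid in slots.items()}
```

With a patched 1.1.7-6 node on slot 10 and an unpatched node on slot 9, `peers_seen({"int2node1": 10, "int2node2": 9})` gives each node an empty peer list. This matches the "join outstanding" results. When both nodes use the same slot, each sees the other.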
Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
Now I have all the logs for some combinations.

Corosync: 1.4.1-7 for all the tests on all nodes.
Base is always a fresh installation of each node, with all packages equal except the pacemaker version.
int2node1 node id: 1743917066
int2node2 node id: 1777471498

Each ZIP file includes the logs from both nodes and the status output of crm_mon and cibadmin -Q.

1.) 1.1.8-4 attaches to running 1.1.7-6 cluster
https://www.dropbox.com/s/06oyrle4ny47uv9/attach_1.1.8-4_to_1.1.7-6.zip
Result: join outstanding

2.) 1.1.9-2 attaches to running 1.1.7-6 cluster
https://www.dropbox.com/s/fv5kcm2yb5jz56z/attach_1.1.9-2_to_1.1.7-6.zip
Result: join outstanding

3.) 1.1.9-2 attaches to running 1.1.8-4 cluster
https://www.dropbox.com/s/y9o4yo8g8ahwjga/attach_1.1.9-2_to_1.1.8-4.zip
Result: join successful

Rainer

Sent: Friday, 03 May 2013 at 01:30
From: Andrew Beekhof <and...@beekhof.net>
To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
Subject: Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

> Logs?
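The node IDs Rainer lists are not arbitrary. His original report gives the addresses 10.16.242.231 (int2node1) and 10.16.242.233 (int2node2). The IDs match a value built from those address bytes, read as a little-endian 32-bit integer (as on x86), with the top bit cleared.

That is consistent with corosync 1.x auto-generating node IDs from the IPv4 ring address with the `clear_node_high_bit` totem option enabled. The option setting is an assumption, and `nodeid_from_ipv4` is a hypothetical name. The sketch only checks the arithmetic.

```python
import ipaddress


def nodeid_from_ipv4(addr: str, clear_high_bit: bool = True) -> int:
    """Derive a node ID from an IPv4 address (assumed corosync 1.x behaviour).

    The four address bytes are read as a little-endian 32-bit integer, which
    is what copying an in_addr into an int yields on x86. Optionally clear
    bit 31 so the ID fits a positive signed 32-bit integer.
    """
    packed = ipaddress.IPv4Address(addr).packed  # 4 bytes, network byte order
    nodeid = int.from_bytes(packed, "little")
    if clear_high_bit:
        nodeid &= 0x7FFFFFFF
    return nodeid
```

For example, `nodeid_from_ipv4("10.16.242.231")` reproduces 1743917066, the ID corosync-objctl reports for int2node1.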
Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
Hi Lars,

I have tried 1.1.9-2 from the download area at clusterlabs for RHEL6 with corosync 1.4.1-17, with 1.1.7-6 running on the other node.
I will have to go deeper into the details later on (with logs), but the first try was worse than with 1.1.8-7.
When the node with 1.1.9-2 joins the cluster, it could not even decode the ais_message to get the node name of the node running 1.1.7-6.
It states that a new node has joined with the correct node id, but it could only decode (null) as the node name.

Just as a first impression.

Rainer

Sent: Tuesday, 30 April 2013 at 17:16
From: Lars Marowsky-Bree <l...@suse.com>
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

> Can you try a newer build than 1.1.8?
Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
On 03/05/2013, at 4:46 AM, Rainer Brestan <rainer.bres...@gmx.net> wrote:

> Hi Lars,
> I have tried 1.1.9-2 from the download area at clusterlabs for RHEL6 with corosync 1.4.1-17, with 1.1.7-6 running on the other node.
> I will have to go deeper into the details later on (with logs), but the first try was worse than with 1.1.8-7.
> When the node with 1.1.9-2 joins the cluster, it could not even decode the ais_message to get the node name of the node running 1.1.7-6.

Logs?

> It states that a new node has joined with the correct node id, but it could only decode (null) as the node name.
Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
On 2013-04-24T11:44:57, Rainer Brestan <rainer.bres...@gmx.net> wrote:

> Current DC: int2node2 - partition WITHOUT quorum
> Version: 1.1.8-7.el6-394e906

This may not be the answer you want, since it is fairly unspecific. But I think we noticed something similar when we pulled in 1.1.8; I don't recall the bug number, but I *think* it worked out with a later git version.

Can you try a newer build than 1.1.8?

Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde
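The "partition WITHOUT quorum" status quoted here is a matter of vote counting. Rainer's original test shows the same status with one node online and "2 expected votes". The same counting also underlies Fabio's warning about removing too many nodes at once.

The Python sketch below assumes one vote per node and a simple-majority rule: a partition is quorate when its votes exceed half of `expected_votes`. It deliberately ignores special cases such as cman's two-node mode, quorum disks and the "quorum ignore" policy Rainer's cluster uses. The function names are hypothetical.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def has_quorum(votes: int, expected_votes: int) -> bool:
    """True when the partition's votes form a strict majority."""
    return votes >= quorum_threshold(expected_votes)


def max_safe_removals(online_nodes: int, expected_votes: int) -> int:
    """How many one-vote nodes can leave at once without losing quorum,
    assuming expected_votes is not lowered first."""
    return max(0, online_nodes - quorum_threshold(expected_votes))
```

With one node up and two expected votes, `has_quorum(1, 2)` is False, matching the crm_mon output. In a five-node cluster, at most two nodes can be taken out together before the remainder drops below three votes.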
Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
On 24/04/2013, at 7:44 PM, Rainer Brestan <rainer.bres...@gmx.net> wrote:

> Pacemaker log of int2node2 with trace setting.
> https://www.dropbox.com/s/04ciy2g6dfbauxy/pacemaker.log?n=165978094
> On int2node1 (1.1.7) the trace setting did not create the pacemaker.log file.

Ah, yes, 1.1.7 wasn't so smart yet.
Can you make sure there is a logfile specified in corosync.conf?

Looking at the node2 logs was useful (nothing is arriving from node1), but I really need to see node1's logs.
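On the plugin stack, Andrew's request comes down to checking the `logging { }` section of corosync.conf. The Python sketch below assumes the corosync 1.x option names `to_logfile` and `logfile`. The parser is a deliberately small stand-in, not corosync's own: it strips `#` comments, tracks brace depth, and ignores nested blocks such as `logger_subsys { }`. The function names are hypothetical.

```python
from typing import Dict, Optional


def logging_options(conf_text: str) -> Dict[str, str]:
    """Collect the top-level 'key: value' pairs of the logging { } section."""
    opts: Dict[str, str] = {}
    depth = 0
    in_logging = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            if depth == 0 and line[:-1].strip() == "logging":
                in_logging = True
            depth += 1
            continue
        if line == "}":
            depth -= 1
            if depth == 0:
                in_logging = False
            continue
        # Only direct children of logging { }, not nested subsections.
        if in_logging and depth == 1 and ":" in line:
            key, value = line.split(":", 1)
            opts[key.strip()] = value.strip()
    return opts


def logfile_path(conf_text: str) -> Optional[str]:
    """Return the configured logfile if corosync is told to write one."""
    opts = logging_options(conf_text)
    if opts.get("to_logfile") == "yes" and opts.get("logfile"):
        return opts["logfile"]
    return None
```

A `None` result means no logfile would be written. In that case, adding `to_logfile: yes` and a `logfile:` path to the logging section is what Andrew is asking for.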
Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
I tried to run this test myself, because I had the same problem.

Starting point: a one-node cluster, node int2node1 running with IP address 10.16.242.231, quorum ignored, DC int2node1.

[root@int2node1 sysconfig]# crm_mon -1
Last updated: Wed Apr 24 09:49:32 2013
Last change: Wed Apr 24 09:44:55 2013 via crm_resource on int2node1
Stack: openais
Current DC: int2node1 - partition WITHOUT quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
1 Nodes configured, 2 expected votes
1 Resources configured.

Online: [ int2node1 ]

 Clone Set: cloneSysInfo [resSysInfo]
     Started: [ int2node1 ]

Next step: node int2node2 with IP address 10.16.242.233 joins the cluster. Result:

[root@int2node1 sysconfig]# crm_mon -1
Last updated: Wed Apr 24 10:14:18 2013
Last change: Wed Apr 24 10:05:20 2013 via crmd on int2node1
Stack: openais
Current DC: int2node1 - partition WITHOUT quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
2 Nodes configured, 2 expected votes
1 Resources configured.

Online: [ int2node1 ]
OFFLINE: [ int2node2 ]

 Clone Set: cloneSysInfo [resSysInfo]
     Started: [ int2node1 ]

[root@int2node1 sysconfig]# corosync-objctl | grep member
runtime.totem.pg.mrp.srp.members.1743917066.ip=r(0) ip(10.16.242.231)
runtime.totem.pg.mrp.srp.members.1743917066.join_count=1
runtime.totem.pg.mrp.srp.members.1743917066.status=joined
runtime.totem.pg.mrp.srp.members.1777471498.ip=r(0) ip(10.16.242.233)
runtime.totem.pg.mrp.srp.members.1777471498.join_count=1
runtime.totem.pg.mrp.srp.members.1777471498.status=joined
[root@int2node1 sysconfig]# crm_node -l
1743917066 int2node1 member

[root@int2node2 ~]# crm_mon -1
Last updated: Wed Apr 24 11:27:39 2013
Last change: Wed Apr 24 10:07:45 2013 via crm_resource on int2node2
Stack: classic openais (with plugin)
Current DC: int2node2 - partition WITHOUT quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
1 Resources configured.

Online: [ int2node2 ]
OFFLINE: [ int2node1 ]

 Clone Set: cloneSysInfo [resSysInfo]
     Started: [ int2node2 ]

[root@int2node2 ~]# corosync-objctl | grep member
runtime.totem.pg.mrp.srp.members.1743917066.ip=r(0) ip(10.16.242.231)
runtime.totem.pg.mrp.srp.members.1743917066.join_count=1
runtime.totem.pg.mrp.srp.members.1743917066.status=joined
runtime.totem.pg.mrp.srp.members.1777471498.ip=r(0) ip(10.16.242.233)
runtime.totem.pg.mrp.srp.members.1777471498.join_count=1
runtime.totem.pg.mrp.srp.members.1777471498.status=joined
[root@int2node2 ~]# crm_node -l
1777471498 int2node2 member

Pacemaker log of int2node2 with trace setting: https://www.dropbox.com/s/04ciy2g6dfbauxy/pacemaker.log?n=165978094
On int2node1 (1.1.7), the trace setting did not create the pacemaker.log file.

Below is the excerpt of the CIB with node information from int2node2.

[root@int2node2 ~]# cibadmin -Q
<cib epoch="17" num_updates="51" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.7" update-origin="int2node2" update-client="crm_resource" cib-last-written="Wed Apr 24 10:07:45 2013" have-quorum="0" dc-uuid="int2node2">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        ...
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="int2node2" uname="int2node2"/>
      <node id="int2node1" uname="int2node1"/>
    </nodes>
    <resources>
      ...
    </resources>
    <rsc_defaults>
      ...
    </rsc_defaults>
  </configuration>
  <status>
    <node_state id="int2node2" uname="int2node2" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
      <transient_attributes id="int2node2">
        <instance_attributes id="status-int2node2">
          ...
        </instance_attributes>
      </transient_attributes>
      <lrm id="int2node2">
        <lrm_resources>
          ...
        </lrm_resources>
      </lrm>
    </node_state>
    <node_state id="int2node1" uname="int2node1" in_ccm="true" crmd="online" join="down" crm-debug-origin="do_state_transition"/>
  </status>
</cib>

On int2node1, the node state in the CIB is different:

<status>
  <node_state id="int2node1" uname="int2node1" ha="active" in_ccm="true" crmd="online" join="member" expected="member" crm-debug-origin="do_state_transition" shutdown="0">
    <transient_attributes id="int2node1">
    </transient_attributes>
    <lrm id="int2node1">
      <lrm_resources>
        ...
      </lrm_resources>
    </lrm>
  </node_state>
  <node_state id="int2node2" uname="int2node2" crmd="online" crm-debug-origin="do_state_transition" ha="active" in_ccm="true" join="pending"/>
</status>

Rainer

Sent: Wednesday, 17 April 2013 at 07:32
From: Andrew Beekhof and...@beekhof.net
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

On 15/04/2013, at 7:08 PM, Pavlos Parissis pavlos.paris...@gmail.com wrote: Hi, I upgraded the 1st node and here are the logs https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.debuglog https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.debuglog Enabling tracing on the mentioned functions didn't give me, at least, any more information. 10:22:08 pacemakerd[53588]: notice:
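In both captures, corosync reports both node IDs as `joined`, while `crm_node -l` on each side lists only the local node. A useful check is therefore whether every ID that corosync calls joined also appears in Pacemaker's own node list. The sketch below does that comparison in Python over text captured from `corosync-objctl | grep member` and `crm_node -l`. It assumes exactly the line formats shown in this thread, and all function names are hypothetical. The IDs are also consistent with corosync 1.x deriving the node ID from the ring0 IPv4 address, read as a little-endian 32-bit integer with the high bit cleared (corosync 1.x has a `clear_node_high_bit` totem option for this). `nodeid_from_ipv4` encodes that inference from the numbers; it is not confirmed configuration of this cluster.

```python
import re
import socket
import struct

# Matches e.g. "runtime.totem.pg.mrp.srp.members.1743917066.status=joined"
MEMBER_RE = re.compile(r"runtime\.totem\.pg\.mrp\.srp\.members\.(\d+)\.(ip|status)=(.*)")
RING0_IP_RE = re.compile(r"ip\(([^)]+)\)")


def parse_objctl_members(text):
    """Map nodeid -> {"ip": ..., "status": ...} from corosync-objctl output."""
    members = {}
    for line in text.splitlines():
        m = MEMBER_RE.match(line.strip())
        if not m:
            continue  # skips join_count and unrelated keys
        nodeid, key, value = int(m.group(1)), m.group(2), m.group(3)
        if key == "ip":
            # "r(0) ip(10.16.242.231)" -> "10.16.242.231" (first ring only)
            ip = RING0_IP_RE.search(value)
            value = ip.group(1) if ip else value
        members.setdefault(nodeid, {})[key] = value
    return members


def parse_crm_node_list(text):
    """Map nodeid -> uname from lines like "1743917066 int2node1 member"."""
    nodes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit():
            nodes[int(fields[0])] = fields[1]
    return nodes


def nodeid_from_ipv4(ip):
    """Assumed derivation: IPv4 bytes read as little-endian uint32, high bit cleared."""
    raw = struct.unpack("<I", socket.inet_aton(ip))[0]
    return raw & 0x7FFFFFFF


def joined_but_unknown_to_pacemaker(objctl_text, crm_node_text):
    """Node IDs that corosync reports as joined but crm_node -l does not list."""
    members = parse_objctl_members(objctl_text)
    joined = {nid for nid, info in members.items() if info.get("status") == "joined"}
    return sorted(joined - set(parse_crm_node_list(crm_node_text)))


if __name__ == "__main__":
    # Sample taken from int2node1's output in this thread.
    objctl = """\
runtime.totem.pg.mrp.srp.members.1743917066.ip=r(0) ip(10.16.242.231)
runtime.totem.pg.mrp.srp.members.1743917066.join_count=1
runtime.totem.pg.mrp.srp.members.1743917066.status=joined
runtime.totem.pg.mrp.srp.members.1777471498.ip=r(0) ip(10.16.242.233)
runtime.totem.pg.mrp.srp.members.1777471498.join_count=1
runtime.totem.pg.mrp.srp.members.1777471498.status=joined
"""
    crm_node = "1743917066 int2node1 member\n"
    print(joined_but_unknown_to_pacemaker(objctl, crm_node))  # [1777471498]
    for nid, info in parse_objctl_members(objctl).items():
        print(nid, info["ip"], nodeid_from_ipv4(info["ip"]) == nid)
```

On int2node1's capture, this reports int2node2's ID (1777471498) as joined in corosync but unknown to Pacemaker. int2node2's capture gives the mirror image, which places the mismatch above corosync's membership layer.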
Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
On 15/04/2013, at 7:08 PM, Pavlos Parissis pavlos.paris...@gmail.com wrote: Hi, I upgraded the 1st node and here are the logs https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.debuglog https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.debuglog Enabling tracing on the mentioned functions didn't give me, at least, any more information. 10:22:08 pacemakerd[53588]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log That's the file(s) we need :) Cheers, Pavlos On 15 April 2013 01:42, Andrew Beekhof and...@beekhof.net wrote: On 15/04/2013, at 7:31 AM, Pavlos Parissis pavlos.paris...@gmail.com wrote: On 12/04/2013 09:37 PM, Pavlos Parissis wrote: Hi, As I wrote in another post[1], I failed to upgrade a 2-node cluster to 1.1.8. Before the upgrade, both nodes were running CentOS 6.3, corosync 1.4.1-7 and pacemaker 1.1.7. I followed the rolling upgrade process, so I stopped pacemaker and then corosync on node1 and upgraded it to CentOS 6.4. The OS upgrade also upgrades pacemaker to 1.1.8-7 and corosync to 1.4.1-15. The RPM upgrade went smoothly; I knew about the crmsh issue, so I made sure the crmsh RPM was in my repos. Corosync started without any problems and both nodes could see each other[2]. But for some reason node2 never received a reply from node1 to its join offer, and node1 never joined the cluster. Node1 formed a new cluster as it never got a reply from node2, so I ended up with a split-brain situation. Logs of node1 can be found here https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.log and of node2 here https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.log Doing a Disconnect and Reattach upgrade of both nodes at the same time gives me a working 1.1.8 cluster. Every attempt to make a 1.1.8 node join a cluster running 1.1.7 failed.
There wasn't enough detail in the logs to suggest a solution, but if you add the following to /etc/sysconfig/pacemaker and re-test, it might shed some additional light on the problem. export PCMK_trace_functions=ais_dispatch_message Certainly there was no intention to make them incompatible. Cheers, Pavlos
Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
Hi, I upgraded the 1st node and here are the logs https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.debuglog https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.debuglog Enabling tracing on the mentioned functions didn't give me, at least, any more information. Cheers, Pavlos On 15 April 2013 01:42, Andrew Beekhof and...@beekhof.net wrote: On 15/04/2013, at 7:31 AM, Pavlos Parissis pavlos.paris...@gmail.com wrote: On 12/04/2013 09:37 PM, Pavlos Parissis wrote: Hi, As I wrote in another post[1], I failed to upgrade a 2-node cluster to 1.1.8. Before the upgrade, both nodes were running CentOS 6.3, corosync 1.4.1-7 and pacemaker 1.1.7. I followed the rolling upgrade process, so I stopped pacemaker and then corosync on node1 and upgraded it to CentOS 6.4. The OS upgrade also upgrades pacemaker to 1.1.8-7 and corosync to 1.4.1-15. The RPM upgrade went smoothly; I knew about the crmsh issue, so I made sure the crmsh RPM was in my repos. Corosync started without any problems and both nodes could see each other[2]. But for some reason node2 never received a reply from node1 to its join offer, and node1 never joined the cluster. Node1 formed a new cluster as it never got a reply from node2, so I ended up with a split-brain situation. Logs of node1 can be found here https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.log and of node2 here https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.log Doing a Disconnect and Reattach upgrade of both nodes at the same time gives me a working 1.1.8 cluster. Every attempt to make a 1.1.8 node join a cluster running 1.1.7 failed. There wasn't enough detail in the logs to suggest a solution, but if you add the following to /etc/sysconfig/pacemaker and re-test, it might shed some additional light on the problem. export PCMK_trace_functions=ais_dispatch_message Certainly there was no intention to make them incompatible.
Cheers, Pavlos
Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
On 12/04/2013 09:37 PM, Pavlos Parissis wrote: Hi, As I wrote in another post[1], I failed to upgrade a 2-node cluster to 1.1.8. Before the upgrade, both nodes were running CentOS 6.3, corosync 1.4.1-7 and pacemaker 1.1.7. I followed the rolling upgrade process, so I stopped pacemaker and then corosync on node1 and upgraded it to CentOS 6.4. The OS upgrade also upgrades pacemaker to 1.1.8-7 and corosync to 1.4.1-15. The RPM upgrade went smoothly; I knew about the crmsh issue, so I made sure the crmsh RPM was in my repos. Corosync started without any problems and both nodes could see each other[2]. But for some reason node2 never received a reply from node1 to its join offer, and node1 never joined the cluster. Node1 formed a new cluster as it never got a reply from node2, so I ended up with a split-brain situation. Logs of node1 can be found here https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.log and of node2 here https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.log Doing a Disconnect and Reattach upgrade of both nodes at the same time gives me a working 1.1.8 cluster. Every attempt to make a 1.1.8 node join a cluster running 1.1.7 failed. Cheers, Pavlos
Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?
On 15/04/2013, at 7:31 AM, Pavlos Parissis pavlos.paris...@gmail.com wrote: On 12/04/2013 09:37 PM, Pavlos Parissis wrote: Hi, As I wrote in another post[1], I failed to upgrade a 2-node cluster to 1.1.8. Before the upgrade, both nodes were running CentOS 6.3, corosync 1.4.1-7 and pacemaker 1.1.7. I followed the rolling upgrade process, so I stopped pacemaker and then corosync on node1 and upgraded it to CentOS 6.4. The OS upgrade also upgrades pacemaker to 1.1.8-7 and corosync to 1.4.1-15. The RPM upgrade went smoothly; I knew about the crmsh issue, so I made sure the crmsh RPM was in my repos. Corosync started without any problems and both nodes could see each other[2]. But for some reason node2 never received a reply from node1 to its join offer, and node1 never joined the cluster. Node1 formed a new cluster as it never got a reply from node2, so I ended up with a split-brain situation. Logs of node1 can be found here https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.log and of node2 here https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.log Doing a Disconnect and Reattach upgrade of both nodes at the same time gives me a working 1.1.8 cluster. Every attempt to make a 1.1.8 node join a cluster running 1.1.7 failed. There wasn't enough detail in the logs to suggest a solution, but if you add the following to /etc/sysconfig/pacemaker and re-test, it might shed some additional light on the problem. export PCMK_trace_functions=ais_dispatch_message Certainly there was no intention to make them incompatible.
Cheers, Pavlos