I ran the following test because I had the same problem.
 
Starting point:
A one-node cluster: node int2node1 running with IP address 10.16.242.231, no-quorum-policy=ignore, DC int2node1
 
[root@int2node1 sysconfig]# crm_mon -1
============
Last updated: Wed Apr 24 09:49:32 2013
Last change: Wed Apr 24 09:44:55 2013 via crm_resource on int2node1
Stack: openais
Current DC: int2node1 - partition WITHOUT quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
1 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ int2node1 ]
 Clone Set: cloneSysInfo [resSysInfo]
     Started: [ int2node1 ]
 
Next step:
Node int2node2 with IP address 10.16.242.233 joins the cluster.
 
Result:
 
[root@int2node1 sysconfig]# crm_mon -1
============
Last updated: Wed Apr 24 10:14:18 2013
Last change: Wed Apr 24 10:05:20 2013 via crmd on int2node1
Stack: openais
Current DC: int2node1 - partition WITHOUT quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ int2node1 ]
OFFLINE: [ int2node2 ]
 Clone Set: cloneSysInfo [resSysInfo]
     Started: [ int2node1 ]
 
[root@int2node1 sysconfig]# corosync-objctl | grep member
runtime.totem.pg.mrp.srp.members.1743917066.ip=r(0) ip(10.16.242.231)
runtime.totem.pg.mrp.srp.members.1743917066.join_count=1
runtime.totem.pg.mrp.srp.members.1743917066.status=joined
runtime.totem.pg.mrp.srp.members.1777471498.ip=r(0) ip(10.16.242.233)
runtime.totem.pg.mrp.srp.members.1777471498.join_count=1
runtime.totem.pg.mrp.srp.members.1777471498.status=joined
 
[root@int2node1 sysconfig]# crm_node -l
1743917066 int2node1 member
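The inconsistency on int2node1 is visible by diffing the two outputs above: corosync reports two joined members, while crm_node -l reports only one. A small illustrative sketch (not from the thread) of that comparison:

```python
# Illustrative sketch: diff corosync's membership view against crm_node's,
# using the two outputs captured on int2node1 above.
corosync_out = """\
runtime.totem.pg.mrp.srp.members.1743917066.ip=r(0) ip(10.16.242.231)
runtime.totem.pg.mrp.srp.members.1743917066.status=joined
runtime.totem.pg.mrp.srp.members.1777471498.ip=r(0) ip(10.16.242.233)
runtime.totem.pg.mrp.srp.members.1777471498.status=joined
"""
crm_node_out = "1743917066 int2node1 member\n"

# The key path is runtime.totem.pg.mrp.srp.members.<nodeid>.<field>,
# so the node id sits at index 6 when splitting on dots.
corosync_ids = {line.split(".")[6] for line in corosync_out.splitlines()}
crm_ids = {line.split()[0] for line in crm_node_out.splitlines()}

print(sorted(corosync_ids - crm_ids))  # node ids corosync sees but the CRM does not
```

Here the difference is int2node2's id (1777471498): membership is fine at the corosync layer, but the CRM layer never admits the peer.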
 
[root@int2node2 ~]# crm_mon -1
Last updated: Wed Apr 24 11:27:39 2013
Last change: Wed Apr 24 10:07:45 2013 via crm_resource on int2node2
Stack: classic openais (with plugin)
Current DC: int2node2 - partition WITHOUT quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
1 Resources configured.

Online: [ int2node2 ]
OFFLINE: [ int2node1 ]
 Clone Set: cloneSysInfo [resSysInfo]
     Started: [ int2node2 ]
 
[root@int2node2 ~]# corosync-objctl | grep member
runtime.totem.pg.mrp.srp.members.1743917066.ip=r(0) ip(10.16.242.231)
runtime.totem.pg.mrp.srp.members.1743917066.join_count=1
runtime.totem.pg.mrp.srp.members.1743917066.status=joined
runtime.totem.pg.mrp.srp.members.1777471498.ip=r(0) ip(10.16.242.233)
runtime.totem.pg.mrp.srp.members.1777471498.join_count=1
runtime.totem.pg.mrp.srp.members.1777471498.status=joined
 
[root@int2node2 ~]# crm_node -l
1777471498 int2node2 member
 
Pacemaker log of int2node2 with the trace setting enabled:
https://www.dropbox.com/s/04ciy2g6dfbauxy/pacemaker.log?n=165978094
On int2node1 (1.1.7) the trace setting did not create a pacemaker.log file.
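For anyone re-testing: the trace setting in question is the one Andrew suggests further down the thread, placed in /etc/sysconfig/pacemaker. A minimal sketch (the PCMK_logfile line is my assumption of the usual 1.1.8 default, not something confirmed in this thread):

```shell
# /etc/sysconfig/pacemaker
# Trace the message-dispatch function Andrew named further down the thread.
export PCMK_trace_functions=ais_dispatch_message
# Where the detail log normally lands on 1.1.8 (assumed default; on 1.1.7
# the file was reportedly never created at all).
export PCMK_logfile=/var/log/pacemaker.log
```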
 
Below is an excerpt of the CIB with the node information, taken from int2node2.
[root@int2node2 ~]# cibadmin -Q
<cib epoch="17" num_updates="51" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.7" update-origin="int2node2" update-client="crm_resource" cib-last-written="Wed Apr 24 10:07:45 2013" have-quorum="0" dc-uuid="int2node2">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
      ...
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="int2node2" uname="int2node2"/>
      <node id="int2node1" uname="int2node1"/>
    </nodes>
    <resources>
    ...
    </resources>
    <rsc_defaults>
    ...
    </rsc_defaults>
  </configuration>
  <status>
    <node_state id="int2node2" uname="int2node2" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
      <transient_attributes id="int2node2">
        <instance_attributes id="status-int2node2">
        ...
        </instance_attributes>
      </transient_attributes>
      <lrm id="int2node2">
        <lrm_resources>
        ...
        </lrm_resources>
      </lrm>
    </node_state>
    <node_state id="int2node1" uname="int2node1" in_ccm="true" crmd="online" join="down" crm-debug-origin="do_state_transition"/>
  </status>
</cib>
 
On int2node1 (still on 1.1.7) the node state in the CIB is different:
  <status>
    <node_state id="int2node1" uname="int2node1" ha="active" in_ccm="true" crmd="online" join="member" expected="member" crm-debug-origin="do_state_transition" shutdown="0">
      <transient_attributes id="int2node1">
      </transient_attributes>
      <lrm id="int2node1">
        <lrm_resources>
        ...
        </lrm_resources>
      </lrm>
    </node_state>
    <node_state id="int2node2" uname="int2node2" crmd="online" crm-debug-origin="do_state_transition" ha="active" in_ccm="true" join="pending"/>
  </status>
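The mismatch between the two status sections can be checked mechanically. A small illustrative sketch (not part of the original report) that pulls each node's join attribute out of trimmed copies of the two excerpts above:

```python
# Illustrative: extract join states from trimmed copies of the two
# CIB <status> excerpts quoted above.
import xml.etree.ElementTree as ET

status_on_int2node2 = """<status>
  <node_state id="int2node2" uname="int2node2" in_ccm="true" crmd="online" join="member" expected="member"/>
  <node_state id="int2node1" uname="int2node1" in_ccm="true" crmd="online" join="down"/>
</status>"""

status_on_int2node1 = """<status>
  <node_state id="int2node1" uname="int2node1" ha="active" in_ccm="true" crmd="online" join="member" expected="member"/>
  <node_state id="int2node2" uname="int2node2" ha="active" in_ccm="true" crmd="online" join="pending"/>
</status>"""

def join_states(status_xml):
    """Map uname -> join attribute for every node_state element."""
    return {ns.get("uname"): ns.get("join")
            for ns in ET.fromstring(status_xml).iter("node_state")}

print(join_states(status_on_int2node2))  # {'int2node2': 'member', 'int2node1': 'down'}
print(join_states(status_on_int2node1))  # {'int2node1': 'member', 'int2node2': 'pending'}
```

Each side considers itself a full member but never completes the peer's join (down on the 1.1.8 node, pending on the 1.1.7 node), which matches the split-brain symptom described earlier in the thread.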
 
Rainer
Sent: Wednesday, 17 April 2013, 07:32
From: "Andrew Beekhof" <and...@beekhof.net>
To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
Subject: Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

On 15/04/2013, at 7:08 PM, Pavlos Parissis <pavlos.paris...@gmail.com> wrote:

> Hoi,
>
> I upgraded 1st node and here are the logs
> https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.debuglog
> https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.debuglog
>
> Enabling tracing on the mentioned functions didn't give at least to me any more information.

10:22:08 pacemakerd[53588]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log

That's the file(s) we need :)

>
> Cheers,
> Pavlos
>
>
> On 15 April 2013 01:42, Andrew Beekhof <and...@beekhof.net> wrote:
>
> On 15/04/2013, at 7:31 AM, Pavlos Parissis <pavlos.paris...@gmail.com> wrote:
>
> > On 12/04/2013 09:37 PM, Pavlos Parissis wrote:
> >> Hoi,
> >>
> >> As I wrote to another post[1] I failed to upgrade to 1.1.8 for a 2 node
> >> cluster.
> >>
> >> Before the upgrade process both nodes are using CentOS 6.3, corosync
> >> 1.4.1-7 and pacemaker-1.1.7.
> >>
> >> I followed the rolling upgrade process, so I stopped pacemaker and then
> >> corosync on node1 and upgraded to CentOS 6.4. The OS upgrade upgrades
> >> also pacemaker to 1.1.8-7 and corosync to 1.4.1-15.
> >> The upgrade of rpms went smoothly as I knew about the crmsh issue so I
> >> made sure I had crmsh rpm on my repos.
> >>
> >> Corosync started without any problems and both nodes could see each
> >> other[2]. But for some reason node2 never received a reply from node1
> >> to the join offer, and node1 never joined the cluster. Node1, having
> >> received no reply from node2, formed a new cluster of its own, so I
> >> ended up in a split-brain situation.
> >>
> >> Logs of node1 can be found here
> >> https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.log
> >> and of node2 here
> >> https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.log
> >>
> >
> > Doing a Disconnect & Reattach upgrade of both nodes at the same time
> > gave me a working 1.1.8 cluster. Every attempt to make a 1.1.8 node
> > join a cluster with a 1.1.7 node failed.
>
> There wasn't enough detail in the logs to suggest a solution, but if you add the following to /etc/sysconfig/pacemaker and re-test, it might shed some additional light on the problem.
>
> export PCMK_trace_functions=ais_dispatch_message
>
> Certainly there was no intention to make them incompatible.
>
> >
> > Cheers,
> > Pavlos
> >
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
>
>


