[Pacemaker] Cluster Refuses to Stop/Shutdown

2009-09-24 Thread Remi Broemeling

I posted this to the OpenAIS Mailing List
(open...@lists.linux-foundation.org) yesterday, but haven't received a
response, and upon further reflection I think that maybe I chose the
wrong list to post it to.  That list seems to be far less about user
support and far more about developer communication.  Therefore I'm
re-trying here, as the archives show this list to be somewhat more
user-focused.

The problem is that corosync is refusing to shut down in response to a
QUIT signal.  Given the cluster below (output of crm_mon):


Last updated: Wed Sep 23 15:56:24 2009
Stack: openais
Current DC: boot1 - partition with quorum
Version: 1.0.5-3840e6b5a305ccb803d29b468556739e75532d56
2 Nodes configured, 2 expected votes
0 Resources configured.


Online: [ boot1 boot2 ]

If I go onto the host 'boot2' and issue the command "killall -QUIT
corosync", the anticipated result would be that boot2 goes offline
(out of the cluster) and all of the cluster processes
(corosync/stonithd/cib/lrmd/attrd/pengine/crmd) shut down.
However, this is not occurring, and I don't really have any idea why.
After logging into boot2 and issuing the command "killall -QUIT
corosync", the result is a split-brain:

From boot1's viewpoint:

Last updated: Wed Sep 23 15:58:27 2009
Stack: openais
Current DC: boot1 - partition WITHOUT quorum
Version: 1.0.5-3840e6b5a305ccb803d29b468556739e75532d56
2 Nodes configured, 2 expected votes
0 Resources configured.


Online: [ boot1 ]
OFFLINE: [ boot2 ]

From boot2's viewpoint:

Last updated: Wed Sep 23 15:58:35 2009
Stack: openais
Current DC: boot1 - partition with quorum
Version: 1.0.5-3840e6b5a305ccb803d29b468556739e75532d56
2 Nodes configured, 2 expected votes
0 Resources configured.


Online: [ boot1 boot2 ]

At this point the status quo holds until ANOTHER QUIT signal is sent
to corosync (i.e. the command "killall -QUIT corosync" is executed on
boot2 again).  Then boot2 shuts down properly and everything appears
to be kosher.  Basically, what I expect to happen after a single QUIT
signal instead takes two QUIT signals to occur, and that summarizes my
question: why does it take two QUIT signals to force corosync to
actually shut down?  Is that desired behavior?  From everything online
that I have read it seems very strange, and it makes me think that I
have a problem in my configuration(s), but I've no idea what that
would be, even after a day of playing with things and investigating.
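
For clarity, the full sequence looks like this (a sketch of the
commands described above; crm_mon -1 gives a one-shot status view):

    # On boot2: first QUIT signal. Expected: corosync and the Pacemaker
    # daemons shut down. Actual: the split-brain shown above.
    killall -QUIT corosync

    # On boot1 and boot2: compare one-shot cluster status.
    crm_mon -1

    # On boot2: second QUIT signal, after which boot2 finally shuts
    # down cleanly.
    killall -QUIT corosync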

I would be very grateful for any guidance that could be provided, as at
the moment I seem to be at an impasse.

Log files, with debugging set to 'on', can be found at the following
pastebin locations:
    After first QUIT signal issued on boot2:
        boot1:/var/log/syslog: http://pastebin.com/m7f9a61fd
        boot2:/var/log/syslog: http://pastebin.com/d26fdfee
    After second QUIT signal issued on boot2:
        boot1:/var/log/syslog: http://pastebin.com/m755fb989
        boot2:/var/log/syslog: http://pastebin.com/m22dcef45

OS, Software Packages, and Versions:
    * two nodes, each running Ubuntu Hardy Heron LTS
    * ubuntu-ha packages, as downloaded from
      http://ppa.launchpad.net/ubuntu-ha-maintainers/ppa/ubuntu/:
        * pacemaker-openais package version 1.0.5+hg20090813-0ubuntu2~hardy1
        * openais package version 1.0.0-3ubuntu1~hardy1
        * corosync package version 1.0.0-4ubuntu1~hardy2
        * heartbeat-common package version 2.99.2+sles11r9-5ubuntu1~hardy1

Network Setup:
    * boot1
        * eth0 is 192.168.10.192
        * eth1 is 172.16.1.1
    * boot2
        * eth0 is 192.168.10.193
        * eth1 is 172.16.1.2
    * boot1:eth0 and boot2:eth0 both connect to the same switch.
    * boot1:eth1 and boot2:eth1 are connected directly to each other
      via a cross-over cable.
    * no firewalls are involved, and tcpdump shows the multicast and
      UDP traffic flowing correctly over these links (see the sketch
      below).
    * I attempted a broadcast (rather than multicast) configuration,
      to see if that would fix the problem.  It did not.
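
A check along these lines will show that traffic (a sketch, assuming
ring 0 runs over eth1 with the multicast address and port from the
corosync.conf below):

    # Watch corosync ring 0 multicast traffic on the cross-over link.
    tcpdump -ni eth1 host 239.42.0.1 and udp port 5505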

`crm configure show` output:
    node boot1
    node boot2
    property $id="cib-bootstrap-options" \
        dc-version="1.0.5-3840e6b5a305ccb803d29b468556739e75532d56" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
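
For reference, properties like these are normally set via the crm
shell (a sketch, assuming the Pacemaker 1.0 crm syntax):

    # Two-node cluster: disable STONITH and ignore loss of quorum.
    crm configure property stonith-enabled=false
    crm configure property no-quorum-policy=ignore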

Contents of /etc/corosync/corosync.conf:
    # Please read the corosync.conf.5 manual page
    compatibility: whitetank

    totem {
        clear_node_high_bit: yes
        version: 2
        secauth: on
        threads: 1
        heartbeat_failures_allowed: 3
        interface {
            ringnumber: 0
            bindnetaddr: 172.16.1.0
            mcastaddr: 239.42.0.1
            mcastport: 5505
        }
        interface {
            ringnumber: 1
            bindnetaddr: 192.168.10.0

Re: [Pacemaker] Cluster Refuses to Stop/Shutdown

2009-09-24 Thread Remi Broemeling

I've spent all day working on this, even going so far as to completely
build my own set of packages from the Debian-available ones (which
appear to be different from the Ubuntu-available ones).  It didn't have
any effect on the issue at all: the cluster still freaks out and
becomes split-brain after a single SIGQUIT.

The Debian packages that also demonstrate this behavior were the
following versions:
    cluster-glue_1.0+hg20090915-1~bpo50+1_i386.deb
    corosync_1.0.0-5~bpo50+1_i386.deb
    libcorosync4_1.0.0-5~bpo50+1_i386.deb
    libopenais3_1.0.0-4~bpo50+1_i386.deb
    openais_1.0.0-4~bpo50+1_i386.deb
    pacemaker-openais_1.0.5+hg20090915-1~bpo50+1_i386.deb

These packages were re-built (under Ubuntu Hardy Heron LTS) from the
*.diff.gz, *.dsc, and *.orig.tar.gz files available at
http://people.debian.org/~madkiss/ha-corosync, and as I said the
symptoms remain exactly the same, both under the configuration that I
list below and under the sample configuration that came with these
packages (the rebuild is sketched below).  I also attempted the same
with a single IP address resource associated with the cluster, just to
be sure it wasn't an edge case for a cluster with no resources, but
again that had no effect.
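
For reference, such a rebuild follows the usual Debian source-package
workflow (a sketch, using the corosync package as an example; the
unpack directory is assumed from the version string):

    # Unpack the source package (*.dsc plus *.diff.gz and *.orig.tar.gz
    # in the same directory), then build unsigned binary packages.
    dpkg-source -x corosync_1.0.0-5~bpo50+1.dsc
    cd corosync-1.0.0
    dpkg-buildpackage -b -us -uc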

Basically I'm still exactly at the point that I was at yesterday
morning at about 0900.

Remi Broemeling wrote:
> [...]
Re: [Pacemaker] Cluster Refuses to Stop/Shutdown

2009-09-24 Thread Steven Dake
Remi,

Likely a defect.  We will have to look into it.  Please file a bug as
per instructions on the corosync wiki at www.corosync.org.

On Thu, 2009-09-24 at 16:47 -0600, Remi Broemeling wrote:
> [...]

Re: [Pacemaker] Cluster Refuses to Stop/Shutdown

2009-09-24 Thread Remi Broemeling

OK, thanks for the note, Steven.  I've filed the bug; it is #525589.

Steven Dake wrote:
> Likely a defect.  We will have to look into it.  Please file a bug as
> per instructions on the corosync wiki at www.corosync.org.
>
> On Thu, 2009-09-24 at 16:47 -0600, Remi Broemeling wrote:
> > [...]

Re: [Pacemaker] Cluster Refuses to Stop/Shutdown

2009-09-26 Thread Ante Karamatić

Steven Dake wrote:
> Likely a defect.  We will have to look into it.  Please file a bug as
> per instructions on the corosync wiki at www.corosync.org.

FWIW:

Cluster with three nodes, pace-1, pace-2 and pace-3.

pace-1:

Sep 26 10:33:40 pace-1 crmd: [2716]: info: do_shutdown_req: Sending shutdown request to DC: pace-2

pace-2:

Sep 26 10:33:44 pace-2 corosync[531]:   [pcmk  ] info: update_member: Node 1984866496/pace-1 is now: lost
Sep 26 10:33:44 pace-2 corosync[531]:   [pcmk  ] info: send_member_notification: Sending membership update 280 to 2 children
Sep 26 10:33:44 pace-2 crmd: [601]: info: ais_status_callback: status: pace-1 is now lost (was member)
Sep 26 10:33:44 pace-2 crmd: [601]: WARN: match_down_event: No match for shutdown action on pace-1


So, pace-2 doesn't understand the 'shutdown action'.  Is pace-1
waiting for an 'ACK' or something from pace-2 before stopping lrmd
and/or crmd?
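
A quick way to pull the relevant handshake messages out of the logs on
both nodes (a sketch, matching the function names in the excerpts
above):

    # Trace the shutdown handshake: the request from the departing node
    # and the DC's attempt to match the resulting down event.
    grep -E 'do_shutdown_req|match_down_event|update_member' /var/log/syslog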




Re: [Pacemaker] Cluster Refuses to Stop/Shutdown

2009-10-06 Thread Andrew Beekhof
I could re-paste the whole thing, but it's easier to just throw up the link:

http://theclusterguy.clusterlabs.org/post/205886990/advisory-dont-use-pacemaker-on-corosync-yet

On Thu, Sep 24, 2009 at 4:56 PM, Remi Broemeling wrote:
> [...]