Re: [Linux-HA] New user can't get cman to recognize other systems

Digimer Tue, 21 Oct 2014 15:16:47 -0700

Blocked for me, too. Possible to clone it, minus the client data?


On 21/10/14 06:14 PM, jayknowsu...@gmail.com wrote:

Sure! But I can't seem to get Red Hat to let me see the bug, even though I have
an account.

Sent from my iPad

On Oct 21, 2014, at 5:51 PM, Andrew Beekhof <and...@beekhof.net> wrote:

On 22 Oct 2014, at 7:36 am, jayknowsu...@gmail.com wrote:

Yep, my network engineer and I found that the multicast packets were being 
blocked by the underlying hypervisor for the VM systems.


Yeah, that'll happen :-(
I believe it's fixed in newer kernels, but for a while there multicast would
appear to work and then stop for no good reason.
Putting the device into promiscuous mode seemed to help, IIRC.

This is the bug I knew it as: 
https://bugzilla.redhat.com/show_bug.cgi?id=1090670

At first we thought it was just iptables on the servers, but I was certain I
had actually turned that off. The issue has been bumped up to the operations
team for a fix, but since I've gotten it to work with unicast, there's
no pressure.

Sent from my iPad

On Oct 21, 2014, at 3:15 PM, Digimer <li...@alteeve.ca> wrote:

Glad you sorted it out!

So then, it was almost certainly a multicast issue. I would still strongly 
recommend trying to source and fix the problem, and reverting to mcast if you 
can. More efficient. :)

digimer

On 21/10/14 02:59 PM, John Scalia wrote:
Ok, got it working after a little more effort, and the cluster is now
properly reporting.

On Tue, Oct 21, 2014 at 1:34 PM, John Scalia <jayknowsu...@gmail.com> wrote:

So, I set transport="udpu" in the cluster.conf file, and it now looks
like this:

<cluster config_version="11" name="pgdb_cluster" transport="udpu">

<fence_daemon/>
<clusternodes>
   <clusternode name="csgha1" nodeid="1">
     <fence>
       <method name="pcmk-redirect">
         <device name="pcmk" port="csgha1"/>
       </method>
     </fence>
   </clusternode>
   <clusternode name="csgha2" nodeid="2">
     <fence>
       <method name="pcmk-redirect">
         <device name="pcmk" port="csgha2"/>
       </method>
     </fence>
   </clusternode>
   <clusternode name="csgha3" nodeid="3">
     <fence>
       <method name="pcmk-redirect">
         <device name="pcmk" port="csgha3"/>
       </method>
     </fence>
   </clusternode>
</clusternodes>
<cman/>
<fencedevices>
   <fencedevice agent="fence_pcmk" name="pcmk"/>
</fencedevices>
<rm>
   <failoverdomains/>
   <resources/>
</rm>
</cluster>

But, after restarting the cluster I don't see any difference. Did I do
something wrong?
--
Jay

On Tue, Oct 21, 2014 at 12:25 PM, Digimer <li...@alteeve.ca> wrote:

No, you don't need to specify anything in cluster.conf for unicast to
work. Corosync will divine the IPs by resolving the node names to IPs. If
you set multicast and don't want to use the auto-selected mcast IP, then
you can specify the mcast IP group to use via <multicast... />.
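
A minimal sketch of that multicast override, assuming the <multicast> element is nested inside <cman> as the cman man page describes. The group address below is a made-up example, not one taken from this cluster:

```xml
<cman>
  <!-- hypothetical multicast group; substitute an address your network team has approved -->
  <multicast addr="239.192.100.1"/>
</cman>
```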

digimer

On 21/10/14 12:22 PM, John Scalia wrote:

OK, looking at the cman man page on this system, I see the line saying
"the corosync.conf file is not used." So, I'm guessing I need to set a
unicast address somewhere in the cluster.conf file, but the man page
only mentions the <multicast addr="..."/> parameter. What can I use to
set this to a unicast address for ports 5404 and 5405? I'm assuming I
can't just put a unicast address for the multicast parameter, and the
man page for cluster.conf wasn't much help either.

We're still working on having the security team permit these 3 systems
to use multicast.

On 10/21/2014 11:51 AM, Digimer wrote:

Keep us posted. :)

On 21/10/14 08:40 AM, John Scalia wrote:

I've been checking hostname resolution this morning, and all the systems
are listed in each /etc/hosts file (no DNS in this environment), and
ping works on every system both to itself and all the other systems. At
least it's working on the 10.10.1.0/24 network.

I ran tcpdump trying to see what traffic is on port 5405 on each system,
and I'm only seeing outbound on each, even though netstat shows each is
listening on the multicast address. My suspicion is that the router is
eating the multicast broadcasts, so I may try the unicast address
instead, but I'm waiting on one of our network engineers to see if my
suspicion is correct about the router. He volunteered to help late
yesterday.

On 10/20/2014 4:34 PM, Digimer wrote:

It looks sane on the surface. The 'gethostip' tool comes from the
'syslinux' package, and it's really handy! The '-d' says to give the
IP in dotted-decimal notation only.

What I was trying to see was whether the 'uname -n' resolved to the IP
on the same network card as the other nodes. This is how corosync
decides which interface to send cluster traffic onto. I suspect you
might have a general network issue, possibly related to multicast.
(Some switches and some hypervisor virtual networks don't play nice
with corosync).

Have you tried unicast? If not, try setting the transport attribute on the
<cman ../> element, as in <cman transport="udpu" ... />. Do note that unicast
isn't as efficient as multicast, so though it might work, I'd
personally treat it as a debug tool to isolate the source of the
problem.
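
As a rough sketch of that change, assuming the cluster.conf posted in this thread: the transport attribute goes on the <cman> element itself rather than on <cluster>. The other sections are left out for brevity, and config_version is bumped to 12 only on the assumption that the edit is being rolled out as a new config revision:

```xml
<cluster config_version="12" name="pgdb_cluster">
  <!-- fence_daemon, clusternodes, fencedevices and rm stay as posted; omitted here -->
  <!-- udpu = UDP unicast; node addresses come from resolving the clusternode names -->
  <cman transport="udpu"/>
</cluster>
```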

cheers

digimer

PS - Can you share your pacemaker configuration?

On 20/10/14 03:40 PM, John Scalia wrote:

Sure, and thanks for helping.

Here's the /etc/cluster/cluster.conf file, and it is identical on all three
systems:

<cluster config_version="11" name="pgdb_cluster">
  <fence_daemon/>
  <clusternodes>
    <clusternode name="csgha1" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="csgha1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="csgha2" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="csgha2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="csgha3" nodeid="3">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="csgha3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman/>
  <fencedevices>
    <fencedevice agent="fence_pcmk" name="pcmk"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>

uname -n reports "csgha1" on that system, "csgha2" on its system, and
"csgha3" on the last system.
I don't seem to have gethostip on any of these systems, so I don't know if
the next section helps or not.
"ifconfig -a" reports:
  csgha1: eth0 = 172.17.1.21
          eth1 = 10.10.1.128
  csgha2: eth0 = 10.10.1.129
          eth1 = 172.17.1.3
  csgha3: eth0 = 172.17.1.23
          eth1 = 10.10.1.130
Yeah, I know the swapped order on csgha2 looks a little weird, but it was the
way our automated VM control did the interfaces.
The /etc/hosts file on each system only has the 10.10.1.0/24 address for
each system in it.
iptables is not running on these systems.

Let me know if you need more information, and I very much appreciate your
assistance.
--
Jay

On Mon, Oct 20, 2014 at 3:18 PM, Digimer <li...@alteeve.ca> wrote:

On 20/10/14 02:50 PM, John Scalia wrote:


Hi all,


I'm trying to build my first ever HA cluster and I'm using 3 VMs running
CentOS 6.5. I followed the instructions to the letter at:

http://clusterlabs.org/quickstart-redhat.html

and everything appears to start normally, but if I run "cman_tool nodes -a",
I only see:

Node  Sts   Inc   Joined               Name
   1   M     64   2014-10-20 14:00:00  csgha1
       Addresses: 10.10.1.128
   2   X      0                        csgha2
   3   X      0                        csgha3

On the other systems, the output is the same except for which system is
shown as joined. Each shows just itself as belonging to the cluster.
Also, "pcs status" reflects similarly, with non-self systems showing
offline. I've checked "netstat -an" and see each machine listening on
ports 5404 and 5405. And the logs are rather involved, but I'm not
seeing errors in them.

Any ideas for where to look for what's causing them to not communicate?
--
Jay

Can you share your cluster.conf file please? Also, for each node:

* uname -n
* gethostip -d $(uname -n)
* ifconfig | grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }'
* iptables-save | grep -i multi

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
