Ok, got it working after a little more effort, and the cluster is now properly reporting.
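For anyone who hits the same symptom, membership is easy to re-check on each node with something like the following (the commands come from earlier in this thread; the description of the result is the expectation, not a verbatim capture):

    cman_tool nodes -a
    pcs status

With everything joined, all three nodes (csgha1, csgha2, csgha3) show status "M" in the cman_tool output, each with its 10.10.1.0/24 address listed, and "pcs status" no longer reports the other nodes as offline.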
On Tue, Oct 21, 2014 at 1:34 PM, John Scalia <jayknowsu...@gmail.com> wrote:
> So, I set transport="udpu" in the cluster.conf file, and it now looks
> like this:
>
> <cluster config_version="11" name="pgdb_cluster" transport="udpu">
>   <fence_daemon/>
>   <clusternodes>
>     <clusternode name="csgha1" nodeid="1">
>       <fence>
>         <method name="pcmk-redirect">
>           <device name="pcmk" port="csgha1"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="csgha2" nodeid="2">
>       <fence>
>         <method name="pcmk-redirect">
>           <device name="pcmk" port="csgha2"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="csgha3" nodeid="3">
>       <fence>
>         <method name="pcmk-redirect">
>           <device name="pcmk" port="csgha3"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <cman/>
>   <fencedevices>
>     <fencedevice agent="fence_pcmk" name="pcmk"/>
>   </fencedevices>
>   <rm>
>     <failoverdomains/>
>     <resources/>
>   </rm>
> </cluster>
>
> But, after restarting the cluster I don't see any difference. Did I do
> something wrong?
> --
> Jay
>
> On Tue, Oct 21, 2014 at 12:25 PM, Digimer <li...@alteeve.ca> wrote:
>
>> No, you don't need to specify anything in cluster.conf for unicast to
>> work. Corosync will divine the IPs by resolving the node names to IPs.
>> If you set multicast and don't want to use the auto-selected mcast IP,
>> then you can specify the mcast IP group to use via <multicast... />.
>>
>> digimer
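For reference, the two cluster.conf approaches mentioned in this thread would presumably look something like the fragments below; the 239.192.122.11 group is only an example value, and the exact placement should be checked against cluster.conf(5):

    <!-- unicast transport, as suggested further down in the thread -->
    <cman transport="udpu"/>

    <!-- or: keep multicast, but pin the group address -->
    <cman>
        <multicast addr="239.192.122.11"/>
    </cman>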
>> On 21/10/14 12:22 PM, John Scalia wrote:
>>
>>> OK, looking at the cman man page on this system, I see the line saying
>>> "the corosync.conf file is not used." So I'm guessing I need to set a
>>> unicast address somewhere in the cluster.conf file, but the man page
>>> only mentions the <multicast addr="..."/> parameter. What can I use to
>>> set this to a unicast address for ports 5404 and 5405? I'm assuming I
>>> can't just put a unicast address in the multicast parameter, and the
>>> man page for cluster.conf wasn't much help either.
>>>
>>> We're still working on having the security team permit these 3 systems
>>> to use multicast.
>>>
>>> On 10/21/2014 11:51 AM, Digimer wrote:
>>>
>>>> Keep us posted. :)
>>>>
>>>> On 21/10/14 08:40 AM, John Scalia wrote:
>>>>
>>>>> I've been checking hostname resolution this morning, and all the
>>>>> systems are listed in each /etc/hosts file (no DNS in this
>>>>> environment), and ping works on every system, both to itself and to
>>>>> all the other systems. At least it's working on the 10.10.1.0/24
>>>>> network.
>>>>>
>>>>> I ran tcpdump trying to see what traffic is on port 5405 on each
>>>>> system, and I'm only seeing outbound traffic on each, even though
>>>>> netstat shows each is listening on the multicast address. My
>>>>> suspicion is that the router is eating the multicast traffic, so I
>>>>> may try the unicast transport instead, but I'm waiting on one of our
>>>>> network engineers to confirm whether my suspicion about the router is
>>>>> correct. He volunteered to help late yesterday.
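A capture along these lines on each node should make it obvious whether the multicast traffic ever arrives; note that the interface name varies per node (eth1 here is just an example, and csgha2 apparently carries its 10.10.1.x address on eth0, per the ifconfig output further down):

    # corosync/cman traffic on both totem ports
    tcpdump -n -i eth1 udp port 5404 or udp port 5405

    # restrict the capture to multicast destinations only
    tcpdump -n -i eth1 'udp and dst net 224.0.0.0/4'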
>>>>> On 10/20/2014 4:34 PM, Digimer wrote:
>>>>>
>>>>>> It looks sane on the surface. The 'gethostip' tool comes from the
>>>>>> 'syslinux' package, and it's really handy! The '-d' says to give the
>>>>>> IP in dotted-decimal notation only.
>>>>>>
>>>>>> What I was trying to see was whether the 'uname -n' resolved to the
>>>>>> IP on the same network card as the other nodes. This is how corosync
>>>>>> decides which interface to send cluster traffic onto. I suspect you
>>>>>> might have a general network issue, possibly related to multicast.
>>>>>> (Some switches and some hypervisor virtual networks don't play nice
>>>>>> with corosync.)
>>>>>>
>>>>>> Have you tried unicast? If not, try setting the <cman ../> element
>>>>>> to have the <cman transport="udpu" ... /> attribute. Do note that
>>>>>> unicast isn't as efficient as multicast, so though it might work,
>>>>>> I'd personally treat it as a debug tool to isolate the source of the
>>>>>> problem.
>>>>>>
>>>>>> cheers
>>>>>>
>>>>>> digimer
>>>>>>
>>>>>> PS - Can you share your pacemaker configuration?
>>>>>>
>>>>>> On 20/10/14 03:40 PM, John Scalia wrote:
>>>>>>
>>>>>>> Sure, and thanks for helping.
>>>>>>>
>>>>>>> Here's the /etc/cluster/cluster.conf file, and it is identical on
>>>>>>> all three systems:
>>>>>>>
>>>>>>> <cluster config_version="11" name="pgdb_cluster">
>>>>>>>   <fence_daemon/>
>>>>>>>   <clusternodes>
>>>>>>>     <clusternode name="csgha1" nodeid="1">
>>>>>>>       <fence>
>>>>>>>         <method name="pcmk-redirect">
>>>>>>>           <device name="pcmk" port="csgha1"/>
>>>>>>>         </method>
>>>>>>>       </fence>
>>>>>>>     </clusternode>
>>>>>>>     <clusternode name="csgha2" nodeid="2">
>>>>>>>       <fence>
>>>>>>>         <method name="pcmk-redirect">
>>>>>>>           <device name="pcmk" port="csgha2"/>
>>>>>>>         </method>
>>>>>>>       </fence>
>>>>>>>     </clusternode>
>>>>>>>     <clusternode name="csgha3" nodeid="3">
>>>>>>>       <fence>
>>>>>>>         <method name="pcmk-redirect">
>>>>>>>           <device name="pcmk" port="csgha3"/>
>>>>>>>         </method>
>>>>>>>       </fence>
>>>>>>>     </clusternode>
>>>>>>>   </clusternodes>
>>>>>>>   <cman/>
>>>>>>>   <fencedevices>
>>>>>>>     <fencedevice agent="fence_pcmk" name="pcmk"/>
>>>>>>>   </fencedevices>
>>>>>>>   <rm>
>>>>>>>     <failoverdomains/>
>>>>>>>     <resources/>
>>>>>>>   </rm>
>>>>>>> </cluster>
>>>>>>>
>>>>>>> uname -n reports "csgha1" on that system, "csgha2" on its system,
>>>>>>> and "csgha3" on the last system.
>>>>>>>
>>>>>>> I don't seem to have gethostip on any of these systems, so I don't
>>>>>>> know if the next section helps or not.
>>>>>>>
>>>>>>> "ifconfig -a" reports:
>>>>>>>     csgha1: eth0 = 172.17.1.21
>>>>>>>             eth1 = 10.10.1.128
>>>>>>>     csgha2: eth0 = 10.10.1.129
>>>>>>>             eth1 = 172.17.1.3
>>>>>>>     csgha3: eth0 = 172.17.1.23
>>>>>>>             eth1 = 10.10.1.130
>>>>>>> (Yeah, I know csgha2 looks a little weird, but it's the way our
>>>>>>> automated VM control set up the interfaces.)
>>>>>>>
>>>>>>> The /etc/hosts file on each system only has the 10.10.1.0/24
>>>>>>> address for each system in it.
>>>>>>>
>>>>>>> iptables is not running on these systems.
>>>>>>>
>>>>>>> Let me know if you need more information, and I very much
>>>>>>> appreciate your assistance.
>>>>>>> --
>>>>>>> Jay
>>>>>>>
>>>>>>> On Mon, Oct 20, 2014 at 3:18 PM, Digimer <li...@alteeve.ca> wrote:
>>>>>>>
>>>>>>>> On 20/10/14 02:50 PM, John Scalia wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I'm trying to build my first ever HA cluster, and I'm using 3 VMs
>>>>>>>>> running CentOS 6.5. I followed the instructions to the letter at:
>>>>>>>>>
>>>>>>>>> http://clusterlabs.org/quickstart-redhat.html
>>>>>>>>>
>>>>>>>>> and everything appears to start normally, but if I run "cman_tool
>>>>>>>>> nodes -a", I only see:
>>>>>>>>>
>>>>>>>>> Node  Sts   Inc   Joined               Name
>>>>>>>>>    1   M     64   2014-10-20 14:00:00  csgha1
>>>>>>>>>        Addresses: 10.10.1.128
>>>>>>>>>    2   X      0                        csgha2
>>>>>>>>>    3   X      0                        csgha3
>>>>>>>>>
>>>>>>>>> On the other systems, the output is the same except for which
>>>>>>>>> system is shown as joined. Each shows just itself as belonging to
>>>>>>>>> the cluster.
>>>>>>>>>
>>>>>>>>> Also, "pcs status" reflects this similarly, with the non-self
>>>>>>>>> systems showing offline. I've checked "netstat -an" and see each
>>>>>>>>> machine listening on ports 5404 and 5405. The logs are rather
>>>>>>>>> involved, but I'm not seeing errors in them.
>>>>>>>>>
>>>>>>>>> Any ideas for where to look for what's causing them to not
>>>>>>>>> communicate?
>>>>>>>>> --
>>>>>>>>> Jay
>>>>>>>>
>>>>>>>> Can you share your cluster.conf file please? Also, for each node:
>>>>>>>>
>>>>>>>> * uname -n
>>>>>>>> * gethostip -d $(uname -n)
>>>>>>>> * ifconfig | grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr |
>>>>>>>>   awk '{ print $1 }'
>>>>>>>> * iptables-save | grep -i multi
>>>>>>>>
>>>>>>>> --
>>>>>>>> Digimer
>>>>>>>> Papers and Projects: https://alteeve.ca/w/
>>>>>>>> What if the cure for cancer is trapped in the mind of a person
>>>>>>>> without access to education?
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person without
>> access to education?

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems