Ok, got it working after a little more effort, and the cluster is now properly reporting.
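For anyone who hits the same symptom, membership is easy to re-check on each node with something like the following (the commands come from earlier in this thread; the description of the result is the expectation, not a verbatim capture):

    cman_tool nodes -a
    pcs status

With everything joined, all three nodes (csgha1, csgha2, csgha3) show status "M" in the cman_tool output, each with its 10.10.1.0/24 address listed, and "pcs status" no longer reports the other nodes as offline.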
On Tue, Oct 21, 2014 at 1:34 PM, John Scalia <jayknowsu...@gmail.com> wrote:
> So, I set transport="udpu" in the cluster.conf file, and it now looks
> like this:
>
> <cluster config_version="11" name="pgdb_cluster" transport="udpu">
>   <fence_daemon/>
>   <clusternodes>
>     <clusternode name="csgha1" nodeid="1">
>       <fence>
>         <method name="pcmk-redirect">
>           <device name="pcmk" port="csgha1"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="csgha2" nodeid="2">
>       <fence>
>         <method name="pcmk-redirect">
>           <device name="pcmk" port="csgha2"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="csgha3" nodeid="3">
>       <fence>
>         <method name="pcmk-redirect">
>           <device name="pcmk" port="csgha3"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <cman/>
>   <fencedevices>
>     <fencedevice agent="fence_pcmk" name="pcmk"/>
>   </fencedevices>
>   <rm>
>     <failoverdomains/>
>     <resources/>
>   </rm>
> </cluster>
>
> But, after restarting the cluster I don't see any difference. Did I do
> something wrong?
> --
> Jay
>
> On Tue, Oct 21, 2014 at 12:25 PM, Digimer <li...@alteeve.ca> wrote:
>
>> No, you don't need to specify anything in cluster.conf for unicast to
>> work. Corosync will divine the IPs by resolving the node names to IPs.
>> If you set multicast and don't want to use the auto-selected mcast IP,
>> then you can specify the mcast IP group to use via <multicast... />.
>>
>> digimer
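For reference, the two cluster.conf approaches mentioned in this thread would presumably look something like the fragments below; the 239.192.122.11 group is only an example value, and the exact placement should be checked against cluster.conf(5):

    <!-- unicast transport, as suggested further down in the thread -->
    <cman transport="udpu"/>

    <!-- or: keep multicast, but pin the group address -->
    <cman>
        <multicast addr="239.192.122.11"/>
    </cman>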
>> On 21/10/14 12:22 PM, John Scalia wrote:
>>
>>> OK, looking at the cman man page on this system, I see the line saying
>>> "the corosync.conf file is not used." So I'm guessing I need to set a
>>> unicast address somewhere in the cluster.conf file, but the man page
>>> only mentions the <multicast addr="..."/> parameter. What can I use to
>>> set this to a unicast address for ports 5404 and 5405? I'm assuming I
>>> can't just put a unicast address in the multicast parameter, and the
>>> man page for cluster.conf wasn't much help either.
>>>
>>> We're still working on having the security team permit these 3 systems
>>> to use multicast.
>>>
>>> On 10/21/2014 11:51 AM, Digimer wrote:
>>>
>>>> Keep us posted. :)
>>>>
>>>> On 21/10/14 08:40 AM, John Scalia wrote:
>>>>
>>>>> I've been checking hostname resolution this morning, and all the
>>>>> systems are listed in each /etc/hosts file (no DNS in this
>>>>> environment), and ping works on every system, both to itself and to
>>>>> all the other systems. At least it's working on the 10.10.1.0/24
>>>>> network.
>>>>>
>>>>> I ran tcpdump trying to see what traffic is on port 5405 on each
>>>>> system, and I'm only seeing outbound traffic on each, even though
>>>>> netstat shows each is listening on the multicast address. My
>>>>> suspicion is that the router is eating the multicast traffic, so I
>>>>> may try the unicast transport instead, but I'm waiting on one of our
>>>>> network engineers to confirm whether my suspicion about the router is
>>>>> correct. He volunteered to help late yesterday.
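A capture along these lines on each node should make it obvious whether the multicast traffic ever arrives; note that the interface name varies per node (eth1 here is just an example, and csgha2 apparently carries its 10.10.1.x address on eth0, per the ifconfig output further down):

    # corosync/cman traffic on both totem ports
    tcpdump -n -i eth1 udp port 5404 or udp port 5405

    # restrict the capture to multicast destinations only
    tcpdump -n -i eth1 'udp and dst net 224.0.0.0/4'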
>>>>> On 10/20/2014 4:34 PM, Digimer wrote:
>>>>>
>>>>>> It looks sane on the surface. The 'gethostip' tool comes from the
>>>>>> 'syslinux' package, and it's really handy! The '-d' says to give the
>>>>>> IP in dotted-decimal notation only.
>>>>>>
>>>>>> What I was trying to see was whether the 'uname -n' resolved to the
>>>>>> IP on the same network card as the other nodes. This is how corosync
>>>>>> decides which interface to send cluster traffic onto. I suspect you
>>>>>> might have a general network issue, possibly related to multicast.
>>>>>> (Some switches and some hypervisor virtual networks don't play nice
>>>>>> with corosync.)
>>>>>>
>>>>>> Have you tried unicast? If not, try setting the <cman ../> element
>>>>>> to have the <cman transport="udpu" ... /> attribute. Do note that
>>>>>> unicast isn't as efficient as multicast, so though it might work,
>>>>>> I'd personally treat it as a debug tool to isolate the source of the
>>>>>> problem.
>>>>>>
>>>>>> cheers
>>>>>>
>>>>>> digimer
>>>>>>
>>>>>> PS - Can you share your pacemaker configuration?
>>>>>>
>>>>>> On 20/10/14 03:40 PM, John Scalia wrote:
>>>>>>
>>>>>>> Sure, and thanks for helping.
>>>>>>>
>>>>>>> Here's the /etc/cluster/cluster.conf file, and it is identical on
>>>>>>> all three systems:
>>>>>>>
>>>>>>> <cluster config_version="11" name="pgdb_cluster">
>>>>>>>   <fence_daemon/>
>>>>>>>   <clusternodes>
>>>>>>>     <clusternode name="csgha1" nodeid="1">
>>>>>>>       <fence>
>>>>>>>         <method name="pcmk-redirect">
>>>>>>>           <device name="pcmk" port="csgha1"/>
>>>>>>>         </method>
>>>>>>>       </fence>
>>>>>>>     </clusternode>
>>>>>>>     <clusternode name="csgha2" nodeid="2">
>>>>>>>       <fence>
>>>>>>>         <method name="pcmk-redirect">
>>>>>>>           <device name="pcmk" port="csgha2"/>
>>>>>>>         </method>
>>>>>>>       </fence>
>>>>>>>     </clusternode>
>>>>>>>     <clusternode name="csgha3" nodeid="3">
>>>>>>>       <fence>
>>>>>>>         <method name="pcmk-redirect">
>>>>>>>           <device name="pcmk" port="csgha3"/>
>>>>>>>         </method>
>>>>>>>       </fence>
>>>>>>>     </clusternode>
>>>>>>>   </clusternodes>
>>>>>>>   <cman/>
>>>>>>>   <fencedevices>
>>>>>>>     <fencedevice agent="fence_pcmk" name="pcmk"/>
>>>>>>>   </fencedevices>
>>>>>>>   <rm>
>>>>>>>     <failoverdomains/>
>>>>>>>     <resources/>
>>>>>>>   </rm>
>>>>>>> </cluster>
>>>>>>>
>>>>>>> uname -n reports "csgha1" on that system, "csgha2" on its system,
>>>>>>> and "csgha3" on the last system.
>>>>>>>
>>>>>>> I don't seem to have gethostip on any of these systems, so I don't
>>>>>>> know if the next section helps or not.
>>>>>>>
>>>>>>> "ifconfig -a" reports:
>>>>>>>     csgha1: eth0 = 172.17.1.21
>>>>>>>             eth1 = 10.10.1.128
>>>>>>>     csgha2: eth0 = 10.10.1.129
>>>>>>>             eth1 = 172.17.1.3
>>>>>>>     csgha3: eth0 = 172.17.1.23
>>>>>>>             eth1 = 10.10.1.130
>>>>>>> (Yeah, I know csgha2 looks a little weird, but it's the way our
>>>>>>> automated VM control set up the interfaces.)
>>>>>>>
>>>>>>> The /etc/hosts file on each system only has the 10.10.1.0/24
>>>>>>> address for each system in it.
>>>>>>>
>>>>>>> iptables is not running on these systems.
>>>>>>>
>>>>>>> Let me know if you need more information, and I very much
>>>>>>> appreciate your assistance.
>>>>>>> --
>>>>>>> Jay
>>>>>>>
>>>>>>> On Mon, Oct 20, 2014 at 3:18 PM, Digimer <li...@alteeve.ca> wrote:
>>>>>>>
>>>>>>>> On 20/10/14 02:50 PM, John Scalia wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I'm trying to build my first ever HA cluster, and I'm using 3 VMs
>>>>>>>>> running CentOS 6.5. I followed the instructions to the letter at:
>>>>>>>>>
>>>>>>>>> http://clusterlabs.org/quickstart-redhat.html
>>>>>>>>>
>>>>>>>>> and everything appears to start normally, but if I run "cman_tool
>>>>>>>>> nodes -a", I only see:
>>>>>>>>>
>>>>>>>>> Node  Sts   Inc   Joined               Name
>>>>>>>>>    1   M     64   2014-10-20 14:00:00  csgha1
>>>>>>>>>        Addresses: 10.10.1.128
>>>>>>>>>    2   X      0                        csgha2
>>>>>>>>>    3   X      0                        csgha3
>>>>>>>>>
>>>>>>>>> On the other systems, the output is the same except for which
>>>>>>>>> system is shown as joined. Each shows just itself as belonging to
>>>>>>>>> the cluster.
>>>>>>>>>
>>>>>>>>> Also, "pcs status" reflects this similarly, with the non-self
>>>>>>>>> systems showing offline. I've checked "netstat -an" and see each
>>>>>>>>> machine listening on ports 5404 and 5405. The logs are rather
>>>>>>>>> involved, but I'm not seeing errors in them.
>>>>>>>>>
>>>>>>>>> Any ideas for where to look for what's causing them to not
>>>>>>>>> communicate?
>>>>>>>>> --
>>>>>>>>> Jay
>>>>>>>>
>>>>>>>> Can you share your cluster.conf file please? Also, for each node:
>>>>>>>>
>>>>>>>> * uname -n
>>>>>>>> * gethostip -d $(uname -n)
>>>>>>>> * ifconfig | grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr |
>>>>>>>>   awk '{ print $1 }'
>>>>>>>> * iptables-save | grep -i multi
>>>>>>>>
>>>>>>>> --
>>>>>>>> Digimer
>>>>>>>> Papers and Projects: https://alteeve.ca/w/
>>>>>>>> What if the cure for cancer is trapped in the mind of a person
>>>>>>>> without access to education?
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person without
>> access to education?

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems