Re: [Linux-HA] New user can't get cman to recognize other systems

John Scalia Tue, 21 Oct 2014 10:35:29 -0700

So, I set transport="udpu" in the cluster.conf file, and it now looks
like this:


<cluster config_version="11" name="pgdb_cluster" transport="udpu">
  <fence_daemon/>
  <clusternodes>
    <clusternode name="csgha1" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="csgha1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="csgha2" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="csgha2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="csgha3" nodeid="3">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="csgha3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman/>
  <fencedevices>
    <fencedevice agent="fence_pcmk" name="pcmk"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>

But, after restarting the cluster I don't see any difference. Did I do
something wrong?
--
Jay

On Tue, Oct 21, 2014 at 12:25 PM, Digimer <li...@alteeve.ca> wrote:

> No, you don't need to specify anything in cluster.conf for unicast to
> work. Corosync will divine the IPs by resolving the node names to IPs. If
> you set multicast and don't want to use the auto-selected mcast IP, then
> you can specify the mcast IP group to use via <multicast... />.
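>
> For illustration only, a rough sketch of where that element would sit,
> assuming multicast transport. The address below is a made-up placeholder,
> not a recommendation:
>
> ```xml
> <!-- Hypothetical fragment of /etc/cluster/cluster.conf; it would take the
>      place of the empty <cman/> element. The addr value is a placeholder
>      only; use a group your network team has approved. -->
> <cman>
>   <multicast addr="239.192.0.1"/>
> </cman>
> ```
>
> If you leave <multicast> out, the auto-selected group is used instead.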
>
> digimer
>
>
> On 21/10/14 12:22 PM, John Scalia wrote:
>
>> OK, looking at the cman man page on this system, I see the line saying
>> "the corosync.conf file is not used." So, I'm guessing I need to set a
>> unicast address somewhere in the cluster.conf file, but the man page
>> only mentions the <multicast addr="..."/> parameter. What can I use to
>> set this to a unicast address for ports 5404 and 5405? I'm assuming I
>> can't just put a unicast address for the multicast parameter, and the
>> man page for cluster.conf wasn't much help either.
>>
>> We're still working on having the security team permit these 3 systems
>> to use multicast.
>>
>> On 10/21/2014 11:51 AM, Digimer wrote:
>>
>>> Keep us posted. :)
>>>
>>> On 21/10/14 08:40 AM, John Scalia wrote:
>>>
>>>> I've been checking hostname resolution this morning, and all the systems
>>>> are listed in each /etc/hosts file (No DNS in this environment.) and
>>>> ping works on every system both to itself and all the other systems. At
>>>> least it's working on the 10.10.1.0/24 network.
>>>>
>>>> I ran tcpdump trying to see what traffic is on port 5405 on each system,
>>>> and I'm only seeing outbound on each, even though netstat shows each is
>>>> listening on the multicast address. My suspicion is that the router is
>>>> eating the multicast broadcasts, so I may try the unicast address
>>>> instead, but I'm waiting on one of our network engineers to see if my
>>>> suspicion is correct about the router. He volunteered to help late
>>>> yesterday.
>>>>
>>>> On 10/20/2014 4:34 PM, Digimer wrote:
>>>>
>>>>> It looks sane on the surface. The 'gethostip' tool comes from the
>>>>> 'syslinux' package, and it's really handy! The '-d' says to give the
>>>>> IP in dotted-decimal notation only.
>>>>>
>>>>> What I was trying to see was whether the 'uname -n' resolved to the IP
>>>>> on the same network card as the other nodes. This is how corosync
>>>>> decides which interface to send cluster traffic onto. I suspect you
>>>>> might have a general network issue, possibly related to multicast.
>>>>> (Some switches and some hypervisor virtual networks don't play nice
>>>>> with corosync).
>>>>>
>>>>> Have you tried unicast? If not, try setting the <cman ../> element to
>>>>> have the <cman transport="udpu" ... /> attribute. Do note that unicast
>>>>> isn't as efficient as multicast, so though it might work, I'd
>>>>> personally treat it as a debug tool to isolate the source of the
>>>>> problem.
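>>>>>
>>>>> A minimal sketch of what I mean, assuming the rest of your cluster.conf
>>>>> stays exactly as you posted it and only the <cman/> line changes:
>>>>>
>>>>> ```xml
>>>>> <!-- Takes the place of the empty <cman/> element; the transport
>>>>>      attribute goes on <cman> itself. -->
>>>>> <cman transport="udpu"/>
>>>>> ```
>>>>>
>>>>> As far as I know it's also customary to increment config_version
>>>>> whenever the file changes, and to keep the file identical on all nodes.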
>>>>>
>>>>> cheers
>>>>>
>>>>> digimer
>>>>>
>>>>> PS - Can you share your pacemaker configuration?
>>>>>
>>>>> On 20/10/14 03:40 PM, John Scalia wrote:
>>>>>
>>>>>> Sure, and thanks for helping.
>>>>>>
>>>>>> Here's the /etc/cluster/cluster.conf file and it is identical on all three
>>>>>> systems:
>>>>>>
>>>>>> <cluster config_version="11" name="pgdb_cluster">
>>>>>>    <fence_daemon/>
>>>>>>    <clusternodes>
>>>>>>      <clusternode name="csgha1" nodeid="1">
>>>>>>        <fence>
>>>>>>          <method name="pcmk-redirect">
>>>>>>            <device name="pcmk" port="csgha1"/>
>>>>>>          </method>
>>>>>>        </fence>
>>>>>>      </clusternode>
>>>>>>      <clusternode name="csgha2" nodeid="2">
>>>>>>        <fence>
>>>>>>          <method name="pcmk-redirect">
>>>>>>            <device name="pcmk" port="csgha2"/>
>>>>>>          </method>
>>>>>>        </fence>
>>>>>>      </clusternode>
>>>>>>      <clusternode name="csgha3" nodeid="3">
>>>>>>        <fence>
>>>>>>          <method name="pcmk-redirect">
>>>>>>            <device name="pcmk" port="csgha3"/>
>>>>>>          </method>
>>>>>>        </fence>
>>>>>>      </clusternode>
>>>>>>    </clusternodes>
>>>>>>    <cman/>
>>>>>>    <fencedevices>
>>>>>>      <fencedevice agent="fence_pcmk" name="pcmk"/>
>>>>>>    </fencedevices>
>>>>>>    <rm>
>>>>>>      <failoverdomains/>
>>>>>>      <resources/>
>>>>>>    </rm>
>>>>>> </cluster>
>>>>>>
>>>>>> uname -n reports "csgha1" on that system, "csgha2" on its system, and
>>>>>> "csgha3" on the last system.
>>>>>> I don't seem to have gethostip on any of these systems, so I don't know if
>>>>>> the next section helps or not.
>>>>>> "ifconfig -a" reports:
>>>>>>   csgha1: eth0 = 172.17.1.21
>>>>>>           eth1 = 10.10.1.128
>>>>>>   csgha2: eth0 = 10.10.1.129
>>>>>>           eth1 = 172.17.1.3
>>>>>>   csgha3: eth0 = 172.17.1.23
>>>>>>           eth1 = 10.10.1.130
>>>>>> Yeah, I know the csgha2 entry looks a little weird, but it was the way our
>>>>>> automated VM control did the interfaces.
>>>>>> The /etc/hosts file on each system only has the 10.10.1.0/24 address for
>>>>>> each system in it.
>>>>>> iptables is not running on these systems.
>>>>>>
>>>>>> Let me know if you need more information, and I very much appreciate your
>>>>>> assistance.
>>>>>> --
>>>>>> Jay
>>>>>>
>>>>>> On Mon, Oct 20, 2014 at 3:18 PM, Digimer <li...@alteeve.ca> wrote:
>>>>>>
>>>>>>  On 20/10/14 02:50 PM, John Scalia wrote:
>>>>>>>
>>>>>>>  Hi all,
>>>>>>>>
>>>>>>>> I'm trying to build my first ever HA cluster and I'm using 3 VMs running
>>>>>>>> CentOS 6.5. I followed the instructions to the letter at:
>>>>>>>>
>>>>>>>> http://clusterlabs.org/quickstart-redhat.html
>>>>>>>>
>>>>>>>> and everything appears to start normally, but if I run "cman_tool nodes
>>>>>>>> -a", I only see:
>>>>>>>>
>>>>>>>> Node  Sts   Inc   Joined               Name
>>>>>>>>    1   M     64   2014-10-20 14:00:00  csgha1
>>>>>>>>        Addresses: 10.10.1.128
>>>>>>>>    2   X      0                        csgha2
>>>>>>>>    3   X      0                        csgha3
>>>>>>>>
>>>>>>>> On the other systems, the output is the same except for which system is
>>>>>>>> shown as joined. Each shows just itself as belonging to the cluster.
>>>>>>>> Also, "pcs status" reflects similarly with non-self systems showing
>>>>>>>> offline. I've checked "netstat -an" and see each machine listening on
>>>>>>>> ports 5404 and 5405. And the logs are rather involved, but I'm not
>>>>>>>> seeing errors in them.
>>>>>>>>
>>>>>>>> Any ideas for where to look for what's causing them to not communicate?
>>>>>>>> --
>>>>>>>> Jay
>>>>>>>>
>>>>>>>>
>>>>>>> Can you share your cluster.conf file please? Also, for each node:
>>>>>>>
>>>>>>> * uname -n
>>>>>>> * gethostip -d $(uname -n)
>>>>>>> * ifconfig | grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }'
>>>>>>> * iptables-save | grep -i multi
>>>>>>>
>>>>>>> --
>>>>>>> Digimer
>>>>>>> Papers and Projects: https://alteeve.ca/w/
>>>>>>> What if the cure for cancer is trapped in the mind of a person without
>>>>>>> access to education?
>>>>>>> _______________________________________________
>>>>>>> Linux-HA mailing list
>>>>>>> Linux-HA@lists.linux-ha.org
>>>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>>> See also: http://linux-ha.org/ReportingProblems
