Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-21 Thread Andrew Beekhof

> On 22 Oct 2014, at 9:16 am, Digimer  wrote:
> 
> Blocked for me, too. Possible to clone - client data?

Needless paranoia more likely.

This is the original fedora bug (nothing marked private):
   https://bugzilla.redhat.com/show_bug.cgi?id=880035

and the kbase:
   https://access.redhat.com/solutions/784373


> 
> On 21/10/14 06:14 PM, jayknowsu...@gmail.com wrote:
>> Sure! But i can't seem to get Redhat to let me see the bug, even though I 
>> have an account.
>> 
>> Sent from my iPad
>> 
>>> On Oct 21, 2014, at 5:51 PM, Andrew Beekhof  wrote:
>>> 
>>> 
>>>> On 22 Oct 2014, at 7:36 am, jayknowsu...@gmail.com wrote:
>>>> 
>>>> Yep, my network engineer and I found that the multicast packets were being 
>>>> blocked by the underlying hypervisor for the VM systems.
>>> 
>>> Yeah, that'll happen :-(
>>> I believe it's fixed in newer kernels, but for a while there multicast would 
>>> appear to work and then stop for no good reason.
>>> Putting the device into promiscuous mode seemed to help IIRC.
>>> 
>>> This is the bug I knew it as: 
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1090670
>>> 
>>> 
>>> 
>>>> At first we thought it was just iptables on the servers, but I was certain 
>>>> I had actually turned that off. The issue has been bumped up to the 
>>>> operations team for fixing, but since I've gotten it to work with 
>>>> unicast, there's no pressure.
>>>> 
>>>> Sent from my iPad
 
> On Oct 21, 2014, at 3:15 PM, Digimer  wrote:
> 
> Glad you sorted it out!
> 
> So then, it was almost certainly a multicast issue. I would still 
> strongly recommend trying to source and fix the problem, and reverting to 
> mcast if you can. More efficient. :)
> 
> digimer
> 
>> On 21/10/14 02:59 PM, John Scalia wrote:
>> Ok, got it working after a little more effort, and the cluster is now
>> properly reporting.
>> 
>>> On Tue, Oct 21, 2014 at 1:34 PM, John Scalia  
>>> wrote:
>>> 
>>> So, I set "transport="udpi"' in the cluster.conf file, and it now looks
>>> like this:
>>> 
>>> 
>>> 
>>> 
>>> 
>>>   
>>> 
>>>   
>>> 
>>>   
>>> 
>>>   
>>>   
>>> 
>>>   
>>> 
>>>   
>>> 
>>>   
>>>   
>>> 
>>>   
>>> 
>>>   
>>> 
>>>   
>>> 
>>> 
>>> 
>>>   
>>> 
>>> 
>>>   
>>>   
>>> 
>>> 
>>> 
>>> But, after restarting the cluster I don't see any difference. Did I do
>>> something wrong?
>>> --
>>> Jay
>>> 
 On Tue, Oct 21, 2014 at 12:25 PM, Digimer  wrote:
 
 No, you don't need to specify anything in cluster.conf for unicast to
 work. Corosync will divine the IPs by resolving the node names to IPs. 
 If
 you set multicast and don't want to use the auto-selected mcast IP, 
 then
 you can specify the mcast IP group to use via .
 
 digimer
 
 
> On 21/10/14 12:22 PM, John Scalia wrote:
> 
> OK, looking at the cman man page on this system, I see the line saying
> "the corosync.conf file is not used." So, I'm guessing I need to set a
> unicast address somewhere in the cluster.conf file, but the man page
> only mentions the  parameter. What can I use to
> set this to a unicast address for ports 5404 and 5405? I'm assuming I
> can't just put a unicast address for the multicast parameter, and the
> man page for cluster.conf wasn't much help either.
> 
> We're still working on having the security team permit these 3 systems
> to use multicast.
> 
>> On 10/21/2014 11:51 AM, Digimer wrote:
>> 
>> Keep us posted. :)
>> 
>>> On 21/10/14 08:40 AM, John Scalia wrote:
>>> 
>>> I've been check hostname resolution this morning, and all the 
>>> systems
>>> are listed in each /etc/hosts file (No DNS in this environment.) and
>>> ping works on every system both to itself and all the other 
>>> systems. At
>>> least it's working on the 10.10.1.0/24 network.
>>> 
>>> I ran tcpdump trying to see what traffic is on port 5405 on each
>>> system,
>>> and I'm only seeing outbound on each, even though netstat shows 
>>> each is
>>> listening on the multicast address. My suspicion is that the router 
>>> is
>>> eating the multicast broadcasts, so I may try the unicast address
>>> instead, but I'm waiting on one of our network engineers to see if 
>>> my
>>> suspicion is correct about the router. He volunteered to help late
>>> yesterday.
>>> 
 On 10/20/2014 4:34 PM, Digimer wrote:
 
 It looks sane on the su

Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-21 Thread Digimer

Blocked for me, too. Possible to clone - client data?

On 21/10/14 06:14 PM, jayknowsu...@gmail.com wrote:

Sure! But I can't seem to get Red Hat to let me see the bug, even though I have 
an account.

Sent from my iPad


On Oct 21, 2014, at 5:51 PM, Andrew Beekhof  wrote:



On 22 Oct 2014, at 7:36 am, jayknowsu...@gmail.com wrote:

Yep, my network engineer and I found that the multicast packets were being 
blocked by the underlying hypervisor for the VM systems.


Yeah, that'll happen :-(
I believe it's fixed in newer kernels, but for a while there multicast would 
appear to work and then stop for no good reason.
Putting the device into promiscuous mode seemed to help IIRC.

This is the bug I knew it as: 
https://bugzilla.redhat.com/show_bug.cgi?id=1090670




At first we thought it was just iptables on the servers, but I was certain I 
had actually turned that off. The issue has been bumped up to the operations 
team for fixing, but since I've gotten it to work with unicast, there's 
no pressure.

Sent from my iPad


On Oct 21, 2014, at 3:15 PM, Digimer  wrote:

Glad you sorted it out!

So then, it was almost certainly a multicast issue. I would still strongly 
recommend trying to source and fix the problem, and reverting to mcast if you 
can. More efficient. :)

digimer


On 21/10/14 02:59 PM, John Scalia wrote:
Ok, got it working after a little more effort, and the cluster is now
properly reporting.


On Tue, Oct 21, 2014 at 1:34 PM, John Scalia  wrote:

So, I set transport="udpi" in the cluster.conf file, and it now looks
like this:





   
 
   
 
   
 
   
   
 
   
 
   
 
   
   
 
   
 
   
 
   



   


   
   



But, after restarting the cluster I don't see any difference. Did I do
something wrong?
--
Jay


On Tue, Oct 21, 2014 at 12:25 PM, Digimer  wrote:

No, you don't need to specify anything in cluster.conf for unicast to
work. Corosync will divine the IPs by resolving the node names to IPs. If
you set multicast and don't want to use the auto-selected mcast IP, then
you can specify the mcast IP group to use via <multicast addr="..."/>.

digimer



On 21/10/14 12:22 PM, John Scalia wrote:

OK, looking at the cman man page on this system, I see the line saying
"the corosync.conf file is not used." So, I'm guessing I need to set a
unicast address somewhere in the cluster.conf file, but the man page
only mentions the  parameter. What can I use to
set this to a unicast address for ports 5404 and 5405? I'm assuming I
can't just put a unicast address for the multicast parameter, and the
man page for cluster.conf wasn't much help either.

We're still working on having the security team permit these 3 systems
to use multicast.


On 10/21/2014 11:51 AM, Digimer wrote:

Keep us posted. :)


On 21/10/14 08:40 AM, John Scalia wrote:

I've been checking hostname resolution this morning, and all the systems
are listed in each /etc/hosts file (No DNS in this environment.) and
ping works on every system both to itself and all the other systems. At
least it's working on the 10.10.1.0/24 network.

I ran tcpdump trying to see what traffic is on port 5405 on each
system,
and I'm only seeing outbound on each, even though netstat shows each is
listening on the multicast address. My suspicion is that the router is
eating the multicast broadcasts, so I may try the unicast address
instead, but I'm waiting on one of our network engineers to see if my
suspicion is correct about the router. He volunteered to help late
yesterday.
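
One way to test this suspicion without involving corosync is a tiny multicast sender and listener run on two of the nodes. What follows is only a minimal sketch and was not part of the original exchange: the group 239.255.42.42 and port 15405 are arbitrary test values (assumptions, picked so they do not collide with corosync on 5404/5405), and the function names are made up for illustration.

```python
import socket

TEST_GROUP = "239.255.42.42"  # assumption: arbitrary administratively scoped group
TEST_PORT = 15405             # assumption: arbitrary free UDP port


def membership_request(group, iface_ip="0.0.0.0"):
    """Pack the 8-byte ip_mreq structure used by IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def listen(group=TEST_GROUP, port=TEST_PORT, iface_ip="0.0.0.0", timeout=10.0):
    """Join the group and return the sender's IP, or None if nothing arrives."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group, iface_ip))
        sock.settimeout(timeout)
        _data, sender = sock.recvfrom(1024)
        return sender[0]
    except socket.timeout:
        return None
    finally:
        sock.close()


def send(group=TEST_GROUP, port=TEST_PORT, payload=b"mcast-test", ttl=1):
    """Send one datagram to the group; TTL 1 keeps it on the local segment."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

Run listen() on one node and send() on another within the timeout. If the listener returns None while ordinary ping between the same two hosts works, something in between (switch, router, or hypervisor network) is probably dropping multicast.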


On 10/20/2014 4:34 PM, Digimer wrote:

It looks sane on the surface. The 'gethostip' tool comes from the
'syslinux' package, and it's really handy! The '-d' says to give the
IP in dotted-decimal notation only.

What I was trying to see was whether the 'uname -n' resolved to the IP
on the same network card as the other nodes. This is how corosync
decides which interface to send cluster traffic onto. I suspect you
might have a general network issue, possibly related to multicast.
(Some switches and some hypervisor virtual networks don't play nice
with corosync).
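
Where gethostip is not installed, the same check can be approximated in a few lines of Python. This is a hedged sketch rather than anything posted in the thread: the function name and the stub resolver are hypothetical, while the 10.10.1.0/24 network and the csgha* addresses are the ones reported elsewhere in this thread.

```python
import ipaddress
import socket


def resolves_into_network(hostname, network, resolver=socket.gethostbyname):
    """Return (ip, ok): the IPv4 address `hostname` resolves to, and whether
    that address lies inside `network` (a CIDR string such as "10.10.1.0/24")."""
    ip = ipaddress.ip_address(resolver(hostname))
    return str(ip), ip in ipaddress.ip_network(network)


# Stub resolver standing in for /etc/hosts so the example runs anywhere.
# On a real node, drop the `resolver` argument to use the system resolver.
HOSTS = {"csgha1": "10.10.1.128", "csgha2": "10.10.1.129", "csgha3": "10.10.1.130"}

for node in sorted(HOSTS):
    ip, ok = resolves_into_network(node, "10.10.1.0/24", resolver=HOSTS.__getitem__)
    print(f"{node} -> {ip} ({'on' if ok else 'NOT on'} the cluster network)")
```

On each node the name to test is the one `uname -n` prints (socket.gethostname() normally returns the same string). If it resolves to an address outside the cluster network, cluster traffic may be going out of the wrong interface.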

Have you tried unicast? If not, try setting the <cman> element to
have the transport="udpu" attribute. Do note that unicast
isn't as efficient as multicast, so though it might work, I'd
personally treat it as a debug tool to isolate the source of the
problem.
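
For cman-based clusters, UDP unicast is, as far as I know, selected with transport="udpu" on the <cman> element of cluster.conf. The sketch below makes that change with Python's standard XML library. It is an illustration only: the sample configuration is invented (the real one did not survive in this archive), so the cluster name, config_version, and node IDs are assumptions, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample; only the node names come from the thread.
SAMPLE_CONF = """<cluster name="example" config_version="1">
  <cman/>
  <clusternodes>
    <clusternode name="csgha1" nodeid="1"/>
    <clusternode name="csgha2" nodeid="2"/>
    <clusternode name="csgha3" nodeid="3"/>
  </clusternodes>
</cluster>"""


def enable_unicast(conf_xml):
    """Return cluster.conf text with <cman transport="udpu"> set and
    config_version incremented by one."""
    root = ET.fromstring(conf_xml)
    cman = root.find("cman")
    if cman is None:
        # No <cman> element yet: add one under <cluster>.
        cman = ET.SubElement(root, "cman")
    cman.set("transport", "udpu")
    # Bump the version so the nodes can tell the configuration changed.
    version = int(root.get("config_version", "0"))
    root.set("config_version", str(version + 1))
    return ET.tostring(root, encoding="unicode")


print(enable_unicast(SAMPLE_CONF))
```

The edited file has to be identical on every node, and the cluster stack restarted, before the transport change can take effect.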

cheers

digimer

PS - Can you share your pacemaker configuration?


On 20/10/14 03:40 PM, John Scalia wrote:

Sure, and thanks for helping.

Here's the /etc/cluster/cluster.conf file and it is identical on all
three
systems:


  
  

  

  

  


  

  

  


  

  

  

  
  
  

  
  


  


uname -n reports "csgha1" on that system, "csgha2" on its system, and
"csgha3" on the last system.
I don't seem to have gethostip on any of these systems, so I don't
know if
the next section helps or not.
"ifconfig -a" repor

Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-21 Thread jayknowsunix
Sure! But I can't seem to get Red Hat to let me see the bug, even though I have 
an account.

Sent from my iPad

> On Oct 21, 2014, at 5:51 PM, Andrew Beekhof  wrote:
> 
> 
>> On 22 Oct 2014, at 7:36 am, jayknowsu...@gmail.com wrote:
>> 
>> Yep, my network engineer and I found that the multicast packets were being 
>> blocked by the underlying hypervisor for the VM systems.
> 
> Yeah, that'll happen :-(
> I believe its fixed in newer kernels, but for a while there multicast would 
> appear to work and then stop for no good reason.
> Putting the device into promiscuous mode seemed to help IIRC.
> 
> This is the bug I knew it as: 
> https://bugzilla.redhat.com/show_bug.cgi?id=1090670
> 
> 
> 
>> At first we thought it was just iptables on the servers, but i was certain I 
>> had actually turned that off. The issue has been bumped up to the operations 
>> team for a fixing this, but since I've gotten it to work with unicast, 
>> there's no pressure
>> 
>> Sent from my iPad
>> 
>>> On Oct 21, 2014, at 3:15 PM, Digimer  wrote:
>>> 
>>> Glad you sorted it out!
>>> 
>>> So then, it was almost certainly a multicast issue. I would still strongly 
>>> recommend trying to source and fix the problem, and reverting to mcast if 
>>> you can. More efficient. :)
>>> 
>>> digimer
>>> 
 On 21/10/14 02:59 PM, John Scalia wrote:
 Ok, got it working after a little more effort, and the cluster is now
 properly reporting.
 
> On Tue, Oct 21, 2014 at 1:34 PM, John Scalia  
> wrote:
> 
> So, I set "transport="udpi"' in the cluster.conf file, and it now looks
> like this:
> 
> 
> 
> 
> 
>   
> 
>   
> 
>   
> 
>   
>   
> 
>   
> 
>   
> 
>   
>   
> 
>   
> 
>   
> 
>   
> 
> 
> 
>   
> 
> 
>   
>   
> 
> 
> 
> But, after restarting the cluster I don't see any difference. Did I do
> something wrong?
> --
> Jay
> 
>> On Tue, Oct 21, 2014 at 12:25 PM, Digimer  wrote:
>> 
>> No, you don't need to specify anything in cluster.conf for unicast to
>> work. Corosync will divine the IPs by resolving the node names to IPs. If
>> you set multicast and don't want to use the auto-selected mcast IP, then
>> you can specify the mcast IP group to use via .
>> 
>> digimer
>> 
>> 
>>> On 21/10/14 12:22 PM, John Scalia wrote:
>>> 
>>> OK, looking at the cman man page on this system, I see the line saying
>>> "the corosync.conf file is not used." So, I'm guessing I need to set a
>>> unicast address somewhere in the cluster.conf file, but the man page
>>> only mentions the  parameter. What can I use to
>>> set this to a unicast address for ports 5404 and 5405? I'm assuming I
>>> can't just put a unicast address for the multicast parameter, and the
>>> man page for cluster.conf wasn't much help either.
>>> 
>>> We're still working on having the security team permit these 3 systems
>>> to use multicast.
>>> 
 On 10/21/2014 11:51 AM, Digimer wrote:
 
 Keep us posted. :)
 
> On 21/10/14 08:40 AM, John Scalia wrote:
> 
> I've been check hostname resolution this morning, and all the systems
> are listed in each /etc/hosts file (No DNS in this environment.) and
> ping works on every system both to itself and all the other systems. 
> At
> least it's working on the 10.10.1.0/24 network.
> 
> I ran tcpdump trying to see what traffic is on port 5405 on each
> system,
> and I'm only seeing outbound on each, even though netstat shows each 
> is
> listening on the multicast address. My suspicion is that the router is
> eating the multicast broadcasts, so I may try the unicast address
> instead, but I'm waiting on one of our network engineers to see if my
> suspicion is correct about the router. He volunteered to help late
> yesterday.
> 
>> On 10/20/2014 4:34 PM, Digimer wrote:
>> 
>> It looks sane on the surface. The 'gethostip' tool comes from the
>> 'syslinux' package, and it's really handy! The '-d' says to give the
>> IP in dotted-decimanl notation only.
>> 
>> What I was trying to see was whether the 'uname -n' resolved to the 
>> IP
>> on the same network card as the other nodes. This is how corosync
>> decides which interface to send cluster traffic onto. I suspect you
>> might have a general network issue, possibly related to multicast.
>> (Some switches and some hypervisor virtual networks don't play nice
>> with corosync).
>> 
>> Have you tried unicast? If not, try setting the  element to
>> hav

Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-21 Thread Andrew Beekhof

> On 22 Oct 2014, at 7:36 am, jayknowsu...@gmail.com wrote:
> 
> Yep, my network engineer and I found that the multicast packets were being 
> blocked by the underlying hypervisor for the VM systems.

Yeah, that'll happen :-(
I believe it's fixed in newer kernels, but for a while there multicast would 
appear to work and then stop for no good reason.
Putting the device into promiscuous mode seemed to help IIRC.

This is the bug I knew it as: 
https://bugzilla.redhat.com/show_bug.cgi?id=1090670
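
To see whether an interface is currently in promiscuous mode, the flags the Linux kernel exposes under /sys can be inspected. This is a small sketch under assumptions: the helper names are made up, and the interface name passed in (for example eth1) depends on the system.

```python
IFF_PROMISC = 0x100  # interface flag bit, as defined in <linux/if.h>


def has_promisc_flag(flags_text):
    """True if the hex flags string (e.g. "0x1103") has IFF_PROMISC set."""
    return bool(int(flags_text.strip(), 16) & IFF_PROMISC)


def is_promiscuous(dev):
    """Read /sys/class/net/<dev>/flags and test the promiscuous bit."""
    with open(f"/sys/class/net/{dev}/flags") as fh:
        return has_promisc_flag(fh.read())
```

Promiscuous mode itself can be switched on with `ip link set dev eth1 promisc on` (eth1 being an assumed interface name). Treat that as a workaround to try, not a confirmed fix.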



> At first we thought it was just iptables on the servers, but i was certain I 
> had actually turned that off. The issue has been bumped up to the operations 
> team for a fixing this, but since I've gotten it to work with unicast, 
> there's no pressure
> 
> Sent from my iPad
> 
>> On Oct 21, 2014, at 3:15 PM, Digimer  wrote:
>> 
>> Glad you sorted it out!
>> 
>> So then, it was almost certainly a multicast issue. I would still strongly 
>> recommend trying to source and fix the problem, and reverting to mcast if 
>> you can. More efficient. :)
>> 
>> digimer
>> 
>>> On 21/10/14 02:59 PM, John Scalia wrote:
>>> Ok, got it working after a little more effort, and the cluster is now
>>> properly reporting.
>>> 
 On Tue, Oct 21, 2014 at 1:34 PM, John Scalia  
 wrote:
 
 So, I set "transport="udpi"' in the cluster.conf file, and it now looks
 like this:
 
 
 
  
  

  

  

  


  

  

  


  

  

  

  
  
  

  
  


  
 
 
 But, after restarting the cluster I don't see any difference. Did I do
 something wrong?
 --
 Jay
 
> On Tue, Oct 21, 2014 at 12:25 PM, Digimer  wrote:
> 
> No, you don't need to specify anything in cluster.conf for unicast to
> work. Corosync will divine the IPs by resolving the node names to IPs. If
> you set multicast and don't want to use the auto-selected mcast IP, then
> you can specify the mcast IP group to use via .
> 
> digimer
> 
> 
>> On 21/10/14 12:22 PM, John Scalia wrote:
>> 
>> OK, looking at the cman man page on this system, I see the line saying
>> "the corosync.conf file is not used." So, I'm guessing I need to set a
>> unicast address somewhere in the cluster.conf file, but the man page
>> only mentions the  parameter. What can I use to
>> set this to a unicast address for ports 5404 and 5405? I'm assuming I
>> can't just put a unicast address for the multicast parameter, and the
>> man page for cluster.conf wasn't much help either.
>> 
>> We're still working on having the security team permit these 3 systems
>> to use multicast.
>> 
>>> On 10/21/2014 11:51 AM, Digimer wrote:
>>> 
>>> Keep us posted. :)
>>> 
 On 21/10/14 08:40 AM, John Scalia wrote:
 
 I've been check hostname resolution this morning, and all the systems
 are listed in each /etc/hosts file (No DNS in this environment.) and
 ping works on every system both to itself and all the other systems. At
 least it's working on the 10.10.1.0/24 network.
 
 I ran tcpdump trying to see what traffic is on port 5405 on each
 system,
 and I'm only seeing outbound on each, even though netstat shows each is
 listening on the multicast address. My suspicion is that the router is
 eating the multicast broadcasts, so I may try the unicast address
 instead, but I'm waiting on one of our network engineers to see if my
 suspicion is correct about the router. He volunteered to help late
 yesterday.
 
> On 10/20/2014 4:34 PM, Digimer wrote:
> 
> It looks sane on the surface. The 'gethostip' tool comes from the
> 'syslinux' package, and it's really handy! The '-d' says to give the
> IP in dotted-decimanl notation only.
> 
> What I was trying to see was whether the 'uname -n' resolved to the IP
> on the same network card as the other nodes. This is how corosync
> decides which interface to send cluster traffic onto. I suspect you
> might have a general network issue, possibly related to multicast.
> (Some switches and some hypervisor virtual networks don't play nice
> with corosync).
> 
> Have you tried unicast? If not, try setting the  element to
> have the  attribute. Do note that unicast
> isn't as efficient as multicast, so thought it might work, I'd
> personally treat it as a debug tool to isolate the source of the
> problem.
> 
> cheers
> 
> digimer
> 
> PS - Can you share your pacemaker configu

Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-21 Thread jayknowsunix
Yep, my network engineer and I found that the multicast packets were being 
blocked by the underlying hypervisor for the VM systems. At first we thought it 
was just iptables on the servers, but I was certain I had actually turned that 
off. The issue has been bumped up to the operations team for fixing, but 
since I've gotten it to work with unicast, there's no pressure.

Sent from my iPad

> On Oct 21, 2014, at 3:15 PM, Digimer  wrote:
> 
> Glad you sorted it out!
> 
> So then, it was almost certainly a multicast issue. I would still strongly 
> recommend trying to source and fix the problem, and reverting to mcast if you 
> can. More efficient. :)
> 
> digimer
> 
>> On 21/10/14 02:59 PM, John Scalia wrote:
>> Ok, got it working after a little more effort, and the cluster is now
>> properly reporting.
>> 
>>> On Tue, Oct 21, 2014 at 1:34 PM, John Scalia  wrote:
>>> 
>>> So, I set "transport="udpi"' in the cluster.conf file, and it now looks
>>> like this:
>>> 
>>> 
>>> 
>>>   
>>>   
>>> 
>>>   
>>> 
>>>   
>>> 
>>>   
>>> 
>>> 
>>>   
>>> 
>>>   
>>> 
>>>   
>>> 
>>> 
>>>   
>>> 
>>>   
>>> 
>>>   
>>> 
>>>   
>>>   
>>>   
>>> 
>>>   
>>>   
>>> 
>>> 
>>>   
>>> 
>>> 
>>> But, after restarting the cluster I don't see any difference. Did I do
>>> something wrong?
>>> --
>>> Jay
>>> 
 On Tue, Oct 21, 2014 at 12:25 PM, Digimer  wrote:
 
 No, you don't need to specify anything in cluster.conf for unicast to
 work. Corosync will divine the IPs by resolving the node names to IPs. If
 you set multicast and don't want to use the auto-selected mcast IP, then
 you can specify the mcast IP group to use via .
 
 digimer
 
 
> On 21/10/14 12:22 PM, John Scalia wrote:
> 
> OK, looking at the cman man page on this system, I see the line saying
> "the corosync.conf file is not used." So, I'm guessing I need to set a
> unicast address somewhere in the cluster.conf file, but the man page
> only mentions the  parameter. What can I use to
> set this to a unicast address for ports 5404 and 5405? I'm assuming I
> can't just put a unicast address for the multicast parameter, and the
> man page for cluster.conf wasn't much help either.
> 
> We're still working on having the security team permit these 3 systems
> to use multicast.
> 
>> On 10/21/2014 11:51 AM, Digimer wrote:
>> 
>> Keep us posted. :)
>> 
>>> On 21/10/14 08:40 AM, John Scalia wrote:
>>> 
>>> I've been check hostname resolution this morning, and all the systems
>>> are listed in each /etc/hosts file (No DNS in this environment.) and
>>> ping works on every system both to itself and all the other systems. At
>>> least it's working on the 10.10.1.0/24 network.
>>> 
>>> I ran tcpdump trying to see what traffic is on port 5405 on each
>>> system,
>>> and I'm only seeing outbound on each, even though netstat shows each is
>>> listening on the multicast address. My suspicion is that the router is
>>> eating the multicast broadcasts, so I may try the unicast address
>>> instead, but I'm waiting on one of our network engineers to see if my
>>> suspicion is correct about the router. He volunteered to help late
>>> yesterday.
>>> 
 On 10/20/2014 4:34 PM, Digimer wrote:
 
 It looks sane on the surface. The 'gethostip' tool comes from the
 'syslinux' package, and it's really handy! The '-d' says to give the
 IP in dotted-decimanl notation only.
 
 What I was trying to see was whether the 'uname -n' resolved to the IP
 on the same network card as the other nodes. This is how corosync
 decides which interface to send cluster traffic onto. I suspect you
 might have a general network issue, possibly related to multicast.
 (Some switches and some hypervisor virtual networks don't play nice
 with corosync).
 
 Have you tried unicast? If not, try setting the  element to
 have the  attribute. Do note that unicast
 isn't as efficient as multicast, so thought it might work, I'd
 personally treat it as a debug tool to isolate the source of the
 problem.
 
 cheers
 
 digimer
 
 PS - Can you share your pacemaker configuration?
 
> On 20/10/14 03:40 PM, John Scalia wrote:
> 
> Sure, and thanks for helping.
> 
> Here's the /etc/cluster/cluster.conf file and it is identical on all
> three
> systems:
> 
> 
>
>
>  
>
>  
>
>  
>
>  
>  
>
>  
>

Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-21 Thread Digimer

Glad you sorted it out!

So then, it was almost certainly a multicast issue. I would still 
strongly recommend trying to source and fix the problem, and reverting 
to mcast if you can. More efficient. :)


digimer

On 21/10/14 02:59 PM, John Scalia wrote:

Ok, got it working after a little more effort, and the cluster is now
properly reporting.

On Tue, Oct 21, 2014 at 1:34 PM, John Scalia  wrote:


So, I set transport="udpi" in the cluster.conf file, and it now looks
like this:



   
   
 
   
 
   
 
   
 
 
   
 
   
 
   
 
 
   
 
   
 
   
 
   
   
   
 
   
   
 
 
   


But, after restarting the cluster I don't see any difference. Did I do
something wrong?
--
Jay

On Tue, Oct 21, 2014 at 12:25 PM, Digimer  wrote:


No, you don't need to specify anything in cluster.conf for unicast to
work. Corosync will divine the IPs by resolving the node names to IPs. If
you set multicast and don't want to use the auto-selected mcast IP, then
you can specify the mcast IP group to use via .

digimer


On 21/10/14 12:22 PM, John Scalia wrote:


OK, looking at the cman man page on this system, I see the line saying
"the corosync.conf file is not used." So, I'm guessing I need to set a
unicast address somewhere in the cluster.conf file, but the man page
only mentions the  parameter. What can I use to
set this to a unicast address for ports 5404 and 5405? I'm assuming I
can't just put a unicast address for the multicast parameter, and the
man page for cluster.conf wasn't much help either.

We're still working on having the security team permit these 3 systems
to use multicast.

On 10/21/2014 11:51 AM, Digimer wrote:


Keep us posted. :)

On 21/10/14 08:40 AM, John Scalia wrote:


I've been checking hostname resolution this morning, and all the systems
are listed in each /etc/hosts file (No DNS in this environment.) and
ping works on every system both to itself and all the other systems. At
least it's working on the 10.10.1.0/24 network.

I ran tcpdump trying to see what traffic is on port 5405 on each
system,
and I'm only seeing outbound on each, even though netstat shows each is
listening on the multicast address. My suspicion is that the router is
eating the multicast broadcasts, so I may try the unicast address
instead, but I'm waiting on one of our network engineers to see if my
suspicion is correct about the router. He volunteered to help late
yesterday.

On 10/20/2014 4:34 PM, Digimer wrote:


It looks sane on the surface. The 'gethostip' tool comes from the
'syslinux' package, and it's really handy! The '-d' says to give the
IP in dotted-decimal notation only.

What I was trying to see was whether the 'uname -n' resolved to the IP
on the same network card as the other nodes. This is how corosync
decides which interface to send cluster traffic onto. I suspect you
might have a general network issue, possibly related to multicast.
(Some switches and some hypervisor virtual networks don't play nice
with corosync).

Have you tried unicast? If not, try setting the <cman> element to
have the transport="udpu" attribute. Do note that unicast
isn't as efficient as multicast, so though it might work, I'd
personally treat it as a debug tool to isolate the source of the
problem.

cheers

digimer

PS - Can you share your pacemaker configuration?

On 20/10/14 03:40 PM, John Scalia wrote:


Sure, and thanks for helping.

Here's the /etc/cluster/cluster.conf file and it is identical on all
three
systems:




  

  

  

  
  

  

  

  
  

  

  

  



  


  
  



uname -n reports "csgha1" on the first system, "csgha2" on the second, and
"csgha3" on the last system.
I don't seem to have gethostip on any of these systems, so I don't know if
the next section helps or not.
"ifconfig -a" reports:
  csgha1: eth0 = 172.17.1.21
          eth1 = 10.10.1.128
  csgha2: eth0 = 10.10.1.129
          eth1 = 172.17.1.3
  csgha3: eth0 = 172.17.1.23
          eth1 = 10.10.1.130
Yeah, I know csgha2 looks a little weird, but it was the way our automated VM
control did the interfaces.
The /etc/hosts file on each system only has the 10.10.1.0/24 address for
each system in it.
iptables is not running on these systems.

Let me know if you need more information, and I very much appreciate
your
assistance.
--
Jay

On Mon, Oct 20, 2014 at 3:18 PM, Digimer  wrote:

  On 20/10/14 02:50 PM, John Scalia wrote:


  Hi all,


I'm trying to build my first ever HA cluster and I'm using 3 VMs
running
CentOS 6.5. I followed the instructions to the letter at:

http://clusterlabs.org/quickstart-redhat.html

Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-21 Thread John Scalia
Ok, got it working after a little more effort, and the cluster is now
properly reporting.

On Tue, Oct 21, 2014 at 1:34 PM, John Scalia  wrote:

> So, I set "transport="udpi"' in the cluster.conf file, and it now looks
> like this:
>
> 
>
>   
>   
> 
>   
> 
>   
> 
>   
> 
> 
>   
> 
>   
> 
>   
> 
> 
>   
> 
>   
> 
>   
> 
>   
>   
>   
> 
>   
>   
> 
> 
>   
> 
>
> But, after restarting the cluster I don't see any difference. Did I do
> something wrong?
> --
> Jay
>
> On Tue, Oct 21, 2014 at 12:25 PM, Digimer  wrote:
>
>> No, you don't need to specify anything in cluster.conf for unicast to
>> work. Corosync will divine the IPs by resolving the node names to IPs. If
>> you set multicast and don't want to use the auto-selected mcast IP, then
>> you can specify the mcast IP group to use via .
>>
>> digimer
>>
>>
>> On 21/10/14 12:22 PM, John Scalia wrote:
>>
>>> OK, looking at the cman man page on this system, I see the line saying
>>> "the corosync.conf file is not used." So, I'm guessing I need to set a
>>> unicast address somewhere in the cluster.conf file, but the man page
>>> only mentions the  parameter. What can I use to
>>> set this to a unicast address for ports 5404 and 5405? I'm assuming I
>>> can't just put a unicast address for the multicast parameter, and the
>>> man page for cluster.conf wasn't much help either.
>>>
>>> We're still working on having the security team permit these 3 systems
>>> to use multicast.
>>>
>>> On 10/21/2014 11:51 AM, Digimer wrote:
>>>
 Keep us posted. :)

 On 21/10/14 08:40 AM, John Scalia wrote:

> I've been check hostname resolution this morning, and all the systems
> are listed in each /etc/hosts file (No DNS in this environment.) and
> ping works on every system both to itself and all the other systems. At
> least it's working on the 10.10.1.0/24 network.
>
> I ran tcpdump trying to see what traffic is on port 5405 on each
> system,
> and I'm only seeing outbound on each, even though netstat shows each is
> listening on the multicast address. My suspicion is that the router is
> eating the multicast broadcasts, so I may try the unicast address
> instead, but I'm waiting on one of our network engineers to see if my
> suspicion is correct about the router. He volunteered to help late
> yesterday.
>
> On 10/20/2014 4:34 PM, Digimer wrote:
>
>> It looks sane on the surface. The 'gethostip' tool comes from the
>> 'syslinux' package, and it's really handy! The '-d' says to give the
>> IP in dotted-decimanl notation only.
>>
>> What I was trying to see was whether the 'uname -n' resolved to the IP
>> on the same network card as the other nodes. This is how corosync
>> decides which interface to send cluster traffic onto. I suspect you
>> might have a general network issue, possibly related to multicast.
>> (Some switches and some hypervisor virtual networks don't play nice
>> with corosync).
>>
>> Have you tried unicast? If not, try setting the  element to
>> have the  attribute. Do note that unicast
>> isn't as efficient as multicast, so thought it might work, I'd
>> personally treat it as a debug tool to isolate the source of the
>> problem.
>>
>> cheers
>>
>> digimer
>>
>> PS - Can you share your pacemaker configuration?
>>
>> On 20/10/14 03:40 PM, John Scalia wrote:
>>
>>> Sure, and thanks for helping.
>>>
>>> Here's the /etc/cluster/cluster.conf file and it is identical on all
>>> three
>>> systems:
>>>
>>> 
>>>
>>>
>>>  
>>>
>>>  
>>>
>>>  
>>>
>>>  
>>>  
>>>
>>>  
>>>
>>>  
>>>
>>>  
>>>  
>>>
>>>  
>>>
>>>  
>>>
>>>  
>>>
>>>
>>>
>>>  
>>>
>>>
>>>  
>>>  
>>>
>>> 
>>>
>>> uname -n reports "csgha1" on that system, "csgha2" on its system, and
>>> "csgha3" on the last system.
>>> I don't seem to have gethostip on any of these systems, so I don't
>>> know if
>>> the next section helps or not.
>>> "ifconfig -a" reports csgha1: eth0 = 172.17.1.21
>>>   eth1 = 10.10.1.128
>>>  csgha2: eth0 = 10.10.1.129
>>> Yeah, I know this looks a little weird, but it was the way our
>>> automated VM
>>> control did the interfaces
>>>   eth1 = 172.,17.1.3
>>>  csgha3: eth0 = 172.17.1.23
>>>  

Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-21 Thread John Scalia
So, I set transport="udpi" in the cluster.conf file, and it now looks
like this:


  
  

  

  

  


  

  

  


  

  

  

  
  
  

  
  


  


But, after restarting the cluster I don't see any difference. Did I do
something wrong?
--
Jay

On Tue, Oct 21, 2014 at 12:25 PM, Digimer  wrote:

> No, you don't need to specify anything in cluster.conf for unicast to
> work. Corosync will divine the IPs by resolving the node names to IPs. If
> you set multicast and don't want to use the auto-selected mcast IP, then
> you can specify the mcast IP group to use via .
>
> digimer
>
>
> On 21/10/14 12:22 PM, John Scalia wrote:
>
>> OK, looking at the cman man page on this system, I see the line saying
>> "the corosync.conf file is not used." So, I'm guessing I need to set a
>> unicast address somewhere in the cluster.conf file, but the man page
>> only mentions the <multicast addr="..."/> parameter. What can I use to
>> set this to a unicast address for ports 5404 and 5405? I'm assuming I
>> can't just put a unicast address for the multicast parameter, and the
>> man page for cluster.conf wasn't much help either.
>>
>> We're still working on having the security team permit these 3 systems
>> to use multicast.
>>
>> On 10/21/2014 11:51 AM, Digimer wrote:
>>
>>> Keep us posted. :)
>>>
>>> On 21/10/14 08:40 AM, John Scalia wrote:
>>>
 I've been checking hostname resolution this morning, and all the systems
 are listed in each /etc/hosts file (No DNS in this environment.) and
 ping works on every system both to itself and all the other systems. At
 least it's working on the 10.10.1.0/24 network.

 I ran tcpdump trying to see what traffic is on port 5405 on each system,
 and I'm only seeing outbound on each, even though netstat shows each is
 listening on the multicast address. My suspicion is that the router is
 eating the multicast broadcasts, so I may try the unicast address
 instead, but I'm waiting on one of our network engineers to see if my
 suspicion is correct about the router. He volunteered to help late
 yesterday.

 On 10/20/2014 4:34 PM, Digimer wrote:

> It looks sane on the surface. The 'gethostip' tool comes from the
> 'syslinux' package, and it's really handy! The '-d' says to give the
> IP in dotted-decimal notation only.
>
> What I was trying to see was whether the 'uname -n' resolved to the IP
> on the same network card as the other nodes. This is how corosync
> decides which interface to send cluster traffic onto. I suspect you
> might have a general network issue, possibly related to multicast.
> (Some switches and some hypervisor virtual networks don't play nice
> with corosync).
>
> Have you tried unicast? If not, try setting the <cman> element to
> have the transport="udpu" attribute. Do note that unicast
> isn't as efficient as multicast, so though it might work, I'd
> personally treat it as a debug tool to isolate the source of the
> problem.
>
> cheers
>
> digimer
>
> PS - Can you share your pacemaker configuration?
>
> On 20/10/14 03:40 PM, John Scalia wrote:
>
>> Sure, and thanks for helping.
>>
>> Here's the /etc/cluster/cluster.conf file and it is identical on all
>> three
>> systems:
>>
>> [cluster.conf XML not preserved in the archive]
>>
>> uname -n reports "csgha1" on that system, "csgha2" on its system, and
>> "csgha3" on the last system.
>> I don't seem to have gethostip on any of these systems, so I don't
>> know if the next section helps or not.
>> "ifconfig -a" reports:
>>  csgha1: eth0 = 172.17.1.21
>>          eth1 = 10.10.1.128
>>  csgha2: eth0 = 10.10.1.129
>>          eth1 = 172.17.1.3
>> Yeah, I know this looks a little weird, but it was the way our
>> automated VM control did the interfaces.
>>  csgha3: eth0 = 172.17.1.23
>>          eth1 = 10.10.1.130
>> The /etc/hosts file on each system only has the 10.10.1.0/24
>> address for each system in it.
>> iptables is not running on these systems.
>>
>> Let me know if you need more information, and I very much appreciate
>> your
>> assistance.
>> --
>> Jay
>>
>> On Mon, Oct 20, 201

Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-21 Thread Digimer
No, you don't need to specify anything in cluster.conf for unicast to 
work. Corosync will divine the IPs by resolving the node names to IPs. 
If you set multicast and don't want to use the auto-selected mcast IP, 
then you can specify the mcast IP group to use via <multicast addr="..."/>.
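
Since the addresses come from name resolution, it can help to confirm what each node name actually resolves to on every node. What follows is only a minimal sketch, assuming POSIX sh and awk, node names csgha1 through csgha3 as reported in this thread, and resolution through /etc/hosts (no DNS). The helper name lookup_host is hypothetical and is not part of cman or corosync.

```shell
# Hypothetical helper: print the first address that a hosts-format file
# maps to a node name. $1 = node name, $2 = file (default /etc/hosts).
lookup_host() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # drop comments
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}
```

Running something like `for n in csgha1 csgha2 csgha3; do echo "$n -> $(lookup_host "$n")"; done` on every node should give the same address for each name everywhere. `getent hosts csgha1` does the equivalent lookup through the system resolver.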


digimer

On 21/10/14 12:22 PM, John Scalia wrote:

OK, looking at the cman man page on this system, I see the line saying
"the corosync.conf file is not used." So, I'm guessing I need to set a
unicast address somewhere in the cluster.conf file, but the man page
only mentions the <multicast addr="..."/> parameter. What can I use to
set this to a unicast address for ports 5404 and 5405? I'm assuming I
can't just put a unicast address for the multicast parameter, and the
man page for cluster.conf wasn't much help either.

We're still working on having the security team permit these 3 systems
to use multicast.

On 10/21/2014 11:51 AM, Digimer wrote:

Keep us posted. :)

On 21/10/14 08:40 AM, John Scalia wrote:

I've been checking hostname resolution this morning, and all the systems
are listed in each /etc/hosts file (No DNS in this environment.) and
ping works on every system both to itself and all the other systems. At
least it's working on the 10.10.1.0/24 network.

I ran tcpdump trying to see what traffic is on port 5405 on each system,
and I'm only seeing outbound on each, even though netstat shows each is
listening on the multicast address. My suspicion is that the router is
eating the multicast broadcasts, so I may try the unicast address
instead, but I'm waiting on one of our network engineers to see if my
suspicion is correct about the router. He volunteered to help late
yesterday.

On 10/20/2014 4:34 PM, Digimer wrote:

It looks sane on the surface. The 'gethostip' tool comes from the
'syslinux' package, and it's really handy! The '-d' says to give the
IP in dotted-decimal notation only.

What I was trying to see was whether the 'uname -n' resolved to the IP
on the same network card as the other nodes. This is how corosync
decides which interface to send cluster traffic onto. I suspect you
might have a general network issue, possibly related to multicast.
(Some switches and some hypervisor virtual networks don't play nice
with corosync).

Have you tried unicast? If not, try setting the <cman> element to
have the transport="udpu" attribute. Do note that unicast
isn't as efficient as multicast, so though it might work, I'd
personally treat it as a debug tool to isolate the source of the
problem.

cheers

digimer

PS - Can you share your pacemaker configuration?

On 20/10/14 03:40 PM, John Scalia wrote:

Sure, and thanks for helping.

Here's the /etc/cluster/cluster.conf file and it is identical on all
three
systems:

[cluster.conf XML not preserved in the archive]

uname -n reports "csgha1" on that system, "csgha2" on its system, and
"csgha3" on the last system.
I don't seem to have gethostip on any of these systems, so I don't
know if the next section helps or not.
"ifconfig -a" reports:
 csgha1: eth0 = 172.17.1.21
         eth1 = 10.10.1.128
 csgha2: eth0 = 10.10.1.129
         eth1 = 172.17.1.3
Yeah, I know this looks a little weird, but it was the way our
automated VM control did the interfaces.
 csgha3: eth0 = 172.17.1.23
         eth1 = 10.10.1.130
The /etc/hosts file on each system only has the 10.10.1.0/24
address for each system in it.
iptables is not running on these systems.

Let me know if you need more information, and I very much appreciate
your
assistance.
--
Jay

On Mon, Oct 20, 2014 at 3:18 PM, Digimer  wrote:


On 20/10/14 02:50 PM, John Scalia wrote:


Hi all,

I'm trying to build my first ever HA cluster and I'm using 3 VMs running
CentOS 6.5. I followed the instructions to the letter at:

http://clusterlabs.org/quickstart-redhat.html

and everything appears to start normally, but if I run "cman_tool nodes
-a", I only see:

Node  Sts   Inc   Joined               Name
   1   M     64   2014-10-20 14:00:00  csgha1
       Addresses: 10.10.1.128
   2   X      0                        csgha2
   3   X      0                        csgha3

In the other systems, the output is the same except for which system is
shown as joined. Each shows just itself as belonging to the cluster.
Also, "pcs status" reflects similarly with non-self systems showing
offline. I've checked "netstat -an" and see each machine listening on
ports 5404 and 5405. And the logs are rather involved, but I'm not
seeing errors in them.

Any ideas for where to look for what's causing them to not communicate?
--
Jay



Can you share your cluster.conf file please? Also, for each node:

* uname -n
* gethostip -d $(uname -n)
* ifconfig |grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr |

Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-21 Thread John Scalia
OK, looking at the cman man page on this system, I see the line saying "the corosync.conf file is not used." So, I'm guessing I need to set a unicast address somewhere in the 
cluster.conf file, but the man page only mentions the <multicast addr="..."/> parameter. What can I use to set this to a unicast address for ports 5404 and 5405? I'm assuming I 
can't just put a unicast address for the multicast parameter, and the man page for cluster.conf wasn't much help either.


We're still working on having the security team permit these 3 systems to use 
multicast.

On 10/21/2014 11:51 AM, Digimer wrote:

Keep us posted. :)

On 21/10/14 08:40 AM, John Scalia wrote:

I've been checking hostname resolution this morning, and all the systems
are listed in each /etc/hosts file (No DNS in this environment.) and
ping works on every system both to itself and all the other systems. At
least it's working on the 10.10.1.0/24 network.

I ran tcpdump trying to see what traffic is on port 5405 on each system,
and I'm only seeing outbound on each, even though netstat shows each is
listening on the multicast address. My suspicion is that the router is
eating the multicast broadcasts, so I may try the unicast address
instead, but I'm waiting on one of our network engineers to see if my
suspicion is correct about the router. He volunteered to help late
yesterday.

On 10/20/2014 4:34 PM, Digimer wrote:

It looks sane on the surface. The 'gethostip' tool comes from the
'syslinux' package, and it's really handy! The '-d' says to give the
IP in dotted-decimal notation only.

What I was trying to see was whether the 'uname -n' resolved to the IP
on the same network card as the other nodes. This is how corosync
decides which interface to send cluster traffic onto. I suspect you
might have a general network issue, possibly related to multicast.
(Some switches and some hypervisor virtual networks don't play nice
with corosync).

Have you tried unicast? If not, try setting the <cman> element to
have the transport="udpu" attribute. Do note that unicast
isn't as efficient as multicast, so though it might work, I'd
personally treat it as a debug tool to isolate the source of the problem.

cheers

digimer

PS - Can you share your pacemaker configuration?

On 20/10/14 03:40 PM, John Scalia wrote:

Sure, and thanks for helping.

Here's the /etc/cluster/cluster.conf file and it is identical on all
three
systems:

[cluster.conf XML not preserved in the archive]

uname -n reports "csgha1" on that system, "csgha2" on its system, and
"csgha3" on the last system.
I don't seem to have gethostip on any of these systems, so I don't
know if the next section helps or not.
"ifconfig -a" reports:
 csgha1: eth0 = 172.17.1.21
         eth1 = 10.10.1.128
 csgha2: eth0 = 10.10.1.129
         eth1 = 172.17.1.3
Yeah, I know this looks a little weird, but it was the way our
automated VM control did the interfaces.
 csgha3: eth0 = 172.17.1.23
         eth1 = 10.10.1.130
The /etc/hosts file on each system only has the 10.10.1.0/24 address for
each system in it.
iptables is not running on these systems.

Let me know if you need more information, and I very much appreciate
your
assistance.
--
Jay

On Mon, Oct 20, 2014 at 3:18 PM, Digimer  wrote:


On 20/10/14 02:50 PM, John Scalia wrote:


Hi all,

I'm trying to build my first ever HA cluster and I'm using 3 VMs running
CentOS 6.5. I followed the instructions to the letter at:

http://clusterlabs.org/quickstart-redhat.html

and everything appears to start normally, but if I run "cman_tool nodes
-a", I only see:

Node  Sts   Inc   Joined               Name
   1   M     64   2014-10-20 14:00:00  csgha1
       Addresses: 10.10.1.128
   2   X      0                        csgha2
   3   X      0                        csgha3

In the other systems, the output is the same except for which system is
shown as joined. Each shows just itself as belonging to the cluster.
Also, "pcs status" reflects similarly with non-self systems showing
offline. I've checked "netstat -an" and see each machine listening on
ports 5404 and 5405. And the logs are rather involved, but I'm not
seeing errors in them.

Any ideas for where to look for what's causing them to not communicate?
--
Jay



Can you share your cluster.conf file please? Also, for each node:

* uname -n
* gethostip -d $(uname -n)
* ifconfig | grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }'
* iptables-save | grep -i multi

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-h

Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-21 Thread Digimer

Keep us posted. :)

On 21/10/14 08:40 AM, John Scalia wrote:

I've been checking hostname resolution this morning, and all the systems
are listed in each /etc/hosts file (No DNS in this environment.) and
ping works on every system both to itself and all the other systems. At
least it's working on the 10.10.1.0/24 network.

I ran tcpdump trying to see what traffic is on port 5405 on each system,
and I'm only seeing outbound on each, even though netstat shows each is
listening on the multicast address. My suspicion is that the router is
eating the multicast broadcasts, so I may try the unicast address
instead, but I'm waiting on one of our network engineers to see if my
suspicion is correct about the router. He volunteered to help late
yesterday.

On 10/20/2014 4:34 PM, Digimer wrote:

It looks sane on the surface. The 'gethostip' tool comes from the
'syslinux' package, and it's really handy! The '-d' says to give the
IP in dotted-decimal notation only.

What I was trying to see was whether the 'uname -n' resolved to the IP
on the same network card as the other nodes. This is how corosync
decides which interface to send cluster traffic onto. I suspect you
might have a general network issue, possibly related to multicast.
(Some switches and some hypervisor virtual networks don't play nice
with corosync).

Have you tried unicast? If not, try setting the <cman> element to
have the transport="udpu" attribute. Do note that unicast
isn't as efficient as multicast, so though it might work, I'd
personally treat it as a debug tool to isolate the source of the problem.

cheers

digimer

PS - Can you share your pacemaker configuration?

On 20/10/14 03:40 PM, John Scalia wrote:

Sure, and thanks for helping.

Here's the /etc/cluster/cluster.conf file and it is identical on all
three
systems:

[cluster.conf XML not preserved in the archive]

uname -n reports "csgha1" on that system, "csgha2" on its system, and
"csgha3" on the last system.
I don't seem to have gethostip on any of these systems, so I don't
know if the next section helps or not.
"ifconfig -a" reports:
 csgha1: eth0 = 172.17.1.21
         eth1 = 10.10.1.128
 csgha2: eth0 = 10.10.1.129
         eth1 = 172.17.1.3
Yeah, I know this looks a little weird, but it was the way our
automated VM control did the interfaces.
 csgha3: eth0 = 172.17.1.23
         eth1 = 10.10.1.130
The /etc/hosts file on each system only has the 10.10.1.0/24 address for
each system in it.
iptables is not running on these systems.

Let me know if you need more information, and I very much appreciate
your
assistance.
--
Jay

On Mon, Oct 20, 2014 at 3:18 PM, Digimer  wrote:


On 20/10/14 02:50 PM, John Scalia wrote:


Hi all,

I'm trying to build my first ever HA cluster and I'm using 3 VMs running
CentOS 6.5. I followed the instructions to the letter at:

http://clusterlabs.org/quickstart-redhat.html

and everything appears to start normally, but if I run "cman_tool nodes
-a", I only see:

Node  Sts   Inc   Joined               Name
   1   M     64   2014-10-20 14:00:00  csgha1
       Addresses: 10.10.1.128
   2   X      0                        csgha2
   3   X      0                        csgha3

In the other systems, the output is the same except for which system is
shown as joined. Each shows just itself as belonging to the cluster.
Also, "pcs status" reflects similarly with non-self systems showing
offline. I've checked "netstat -an" and see each machine listening on
ports 5404 and 5405. And the logs are rather involved, but I'm not
seeing errors in them.

Any ideas for where to look for what's causing them to not communicate?
--
Jay



Can you share your cluster.conf file please? Also, for each node:

* uname -n
* gethostip -d $(uname -n)
* ifconfig | grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }'
* iptables-save | grep -i multi

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems





--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in t

Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-21 Thread John Scalia
I've been checking hostname resolution this morning, and all the systems are listed in each /etc/hosts file (No DNS in this environment.) and ping works on every system both to itself 
and all the other systems. At least it's working on the 10.10.1.0/24 network.


I ran tcpdump trying to see what traffic is on port 5405 on each system, and I'm only seeing outbound on each, even though netstat shows each is listening on the multicast address. 
My suspicion is that the router is eating the multicast broadcasts, so I may try the unicast address instead, but I'm waiting on one of our network engineers to see if my suspicion 
is correct about the router. He volunteered to help late yesterday.
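
For reference, a capture along the lines described above could look roughly like this. It is a minimal sketch, assuming POSIX sh, that corosync uses UDP ports 5404 and 5405 as mentioned in this thread, and that the cluster network sits on eth1 (as on csgha1). The helper name corosync_capture_cmd is hypothetical; it only builds the command and does not run it, since capturing normally needs root.

```shell
# Hypothetical helper: build (but do not run) a tcpdump command that
# watches corosync's UDP ports on one interface. $1 = interface name.
corosync_capture_cmd() {
    echo "tcpdump -n -i $1 udp port 5404 or udp port 5405"
}
```

Running the printed command as root on each node while the cluster is up should show packets from the other nodes' addresses if multicast is getting through. Seeing only the local node's own outbound packets matches the symptom described here.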


On 10/20/2014 4:34 PM, Digimer wrote:

It looks sane on the surface. The 'gethostip' tool comes from the 'syslinux' 
package, and it's really handy! The '-d' says to give the IP in dotted-decimal 
notation only.

What I was trying to see was whether the 'uname -n' resolved to the IP on the same network card as the other nodes. This is how corosync decides which interface to send cluster 
traffic onto. I suspect you might have a general network issue, possibly related to multicast. (Some switches and some hypervisor virtual networks don't play nice with corosync).


Have you tried unicast? If not, try setting the <cman> element to have the transport="udpu" attribute. Do note that unicast isn't as efficient as multicast, so 
though it might work, I'd personally treat it as a debug tool to isolate the source of the problem.


cheers

digimer

PS - Can you share your pacemaker configuration?

On 20/10/14 03:40 PM, John Scalia wrote:

Sure, and thanks for helping.

Here's the /etc/cluster/cluster.conf file and it is identical on all three
systems:

[cluster.conf XML not preserved in the archive]

uname -n reports "csgha1" on that system, "csgha2" on its system, and
"csgha3" on the last system.
I don't seem to have gethostip on any of these systems, so I don't know if
the next section helps or not.
"ifconfig -a" reports:
 csgha1: eth0 = 172.17.1.21
         eth1 = 10.10.1.128
 csgha2: eth0 = 10.10.1.129
         eth1 = 172.17.1.3
Yeah, I know this looks a little weird, but it was the way our automated VM
control did the interfaces.
 csgha3: eth0 = 172.17.1.23
         eth1 = 10.10.1.130
The /etc/hosts file on each system only has the 10.10.1.0/24 address for
each system in it.
iptables is not running on these systems.

Let me know if you need more information, and I very much appreciate your
assistance.
--
Jay

On Mon, Oct 20, 2014 at 3:18 PM, Digimer  wrote:


On 20/10/14 02:50 PM, John Scalia wrote:


Hi all,

I'm trying to build my first ever HA cluster and I'm using 3 VMs running
CentOS 6.5. I followed the instructions to the letter at:

http://clusterlabs.org/quickstart-redhat.html

and everything appears to start normally, but if I run "cman_tool nodes
-a", I only see:

Node  Sts   Inc   Joined               Name
   1   M     64   2014-10-20 14:00:00  csgha1
       Addresses: 10.10.1.128
   2   X      0                        csgha2
   3   X      0                        csgha3

In the other systems, the output is the same except for which system is
shown as joined. Each shows just itself as belonging to the cluster.
Also, "pcs status" reflects similarly with non-self systems showing
offline. I've checked "netstat -an" and see each machine listening on
ports 5404 and 5405. And the logs are rather involved, but I'm not
seeing errors in them.

Any ideas for where to look for what's causing them to not communicate?
--
Jay



Can you share your cluster.conf file please? Also, for each node:

* uname -n
* gethostip -d $(uname -n)
* ifconfig | grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }'
* iptables-save | grep -i multi

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems




Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-20 Thread jayknowsunix
OK, got it.

Sent from my iPad

> On Oct 20, 2014, at 10:10 PM, Andrew Beekhof  wrote:
> 
> 
>> On 21 Oct 2014, at 7:17 am, John Scalia  wrote:
>> 
>> Thanks, but on CentOS are you saying to use "pcs cluster start" rather than 
>> using "service cman start" and "service pacemaker start"? I was just going 
>> by the tutorial, which doesn't mention this.
> 
> 'service pacemaker start' and 'pcs cluster start' are pretty much equivalent.
> Both will start cman if it's not running already.
> 
>> 
>>> On 10/20/2014 3:44 PM, Maciej Rostański wrote:
>>> Hello,
>>> 
>>> In my experience such problems were the effect of my mistakes, such as not
>>> having all hosts in /etc/hosts file. Check this, please, I know it sounds
>>> simple.
>>> 
>>> Also, commands:
>>> pcs cluster setup --name clustername node1 node2 node3
>>> pcs cluster enable
>>> pcs cluster start
>>> 
>>> are much more pleasant to run than ccs method you use, and they work on
>>> Centos6.5
>>> 
>>> Regards,
>>> Maciej
>>> 
>>> 
>>> 
>>> 2014-10-20 20:50 GMT+02:00 John Scalia :
>>> 
 Hi all,
 
 I'm trying to build my first ever HA cluster and I'm using 3 VMs running
 CentOS 6.5. I followed the instructions to the letter at:
 
 http://clusterlabs.org/quickstart-redhat.html
 
 and everything appears to start normally, but if I run "cman_tool nodes
 -a", I only see:
 
 Node  Sts   Inc   Joined               Name
    1   M     64   2014-10-20 14:00:00  csgha1
        Addresses: 10.10.1.128
    2   X      0                        csgha2
    3   X      0                        csgha3
 
 In the other systems, the output is the same except for which system is
 shown as joined. Each shows just itself as belonging to the cluster. Also,
 "pcs status" reflects similarly with non-self systems showing offline. I've
 checked "netstat -an" and see each machine listening on ports 5404 and
 5405. And the logs are rather involved, but I'm not seeing errors in them.
 
 Any ideas for where to look for what's causing them to not communicate?
 --
 Jay
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
>>> 
>>> 
>> 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-20 Thread Andrew Beekhof

> On 21 Oct 2014, at 7:17 am, John Scalia  wrote:
> 
> Thanks, but on CentOS are you saying to use "pcs cluster start" rather than 
> using "service cman start" and "service pacemaker start"? I was just going by 
> the tutorial, which doesn't mention this.

'service pacemaker start' and 'pcs cluster start' are pretty much equivalent.
Both will start cman if it's not running already.

> 
> On 10/20/2014 3:44 PM, Maciej Rostański wrote:
>> Hello,
>> 
>> In my experience such problems were the effect of my mistakes, such as not
>> having all hosts in /etc/hosts file. Check this, please, I know it sounds
>> simple.
>> 
>> Also, commands:
>> pcs cluster setup --name clustername node1 node2 node3
>> pcs cluster enable
>> pcs cluster start
>> 
>> are much more pleasant to run than ccs method you use, and they work on
>> Centos6.5
>> 
>> Regards,
>> Maciej
>> 
>> 
>> 
>> 2014-10-20 20:50 GMT+02:00 John Scalia :
>> 
>>> Hi all,
>>> 
>>> I'm trying to build my first ever HA cluster and I'm using 3 VMs running
>>> CentOS 6.5. I followed the instructions to the letter at:
>>> 
>>> http://clusterlabs.org/quickstart-redhat.html
>>> 
>>> and everything appears to start normally, but if I run "cman_tool nodes
>>> -a", I only see:
>>> 
>>> Node  Sts   Inc   Joined               Name
>>>    1   M     64   2014-10-20 14:00:00  csgha1
>>>        Addresses: 10.10.1.128
>>>    2   X      0                        csgha2
>>>    3   X      0                        csgha3
>>> 
>>> In the other systems, the output is the same except for which system is
>>> shown as joined. Each shows just itself as belonging to the cluster. Also,
>>> "pcs status" reflects similarly with non-self systems showing offline. I've
>>> checked "netstat -an" and see each machine listening on ports 5404 and
>>> 5405. And the logs are rather involved, but I'm not seeing errors in them.
>>> 
>>> Any ideas for where to look for what's causing them to not communicate?
>>> --
>>> Jay
>>> ___
>>> Linux-HA mailing list
>>> Linux-HA@lists.linux-ha.org
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> See also: http://linux-ha.org/ReportingProblems
>>> 
>> 
>> 
> 

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-20 Thread Maciej Rostański
Well, 6.4 and 6.5 (which I like a lot) are in a specific situation: there
is no more crm, only pcs and ccs, but the stack still uses cman (which is
now being replaced by corosync 2.0). So the documentation found on various
sites is rarely applicable...

2014-10-20 22:17 GMT+02:00 John Scalia :

> Thanks, but on CentOS are you saying to use "pcs cluster start" rather
> than using "service cman start" and "service pacemaker start"? I was just
> going by the tutorial, which doesn't mention this.
>
>
> On 10/20/2014 3:44 PM, Maciej Rostański wrote:
>
>> Hello,
>>
>> In my experience such problems were the effect of my mistakes, such as not
>> having all hosts in /etc/hosts file. Check this, please, I know it sounds
>> simple.
>>
>> Also, commands:
>> pcs cluster setup --name clustername node1 node2 node3
>> pcs cluster enable
>> pcs cluster start
>>
>> are much more pleasant to run than ccs method you use, and they work on
>> Centos6.5
>>
>> Regards,
>> Maciej
>>
>>
>>
>> 2014-10-20 20:50 GMT+02:00 John Scalia :
>>
>>  Hi all,
>>>
>>> I'm trying to build my first ever HA cluster and I'm using 3 VMs running
>>> CentOS 6.5. I followed the instructions to the letter at:
>>>
>>> http://clusterlabs.org/quickstart-redhat.html
>>>
>>> and everything appears to start normally, but if I run "cman_tool nodes
>>> -a", I only see:
>>>
>>> Node  Sts   Inc   Joined               Name
>>>    1   M     64   2014-10-20 14:00:00  csgha1
>>>        Addresses: 10.10.1.128
>>>    2   X      0                        csgha2
>>>    3   X      0                        csgha3
>>>
>>> In the other systems, the output is the same except for which system is
>>> shown as joined. Each shows just itself as belonging to the cluster.
>>> Also, "pcs status" reflects similarly with non-self systems showing
>>> offline. I've checked "netstat -an" and see each machine listening on
>>> ports 5404 and 5405. And the logs are rather involved, but I'm not
>>> seeing errors in them.
>>>
>>> Any ideas for where to look for what's causing them to not communicate?
>>> --
>>> Jay
>>> ___
>>> Linux-HA mailing list
>>> Linux-HA@lists.linux-ha.org
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> See also: http://linux-ha.org/ReportingProblems
>>>
>>>
>>
>>
>



-- 
Maciej Rostanski
mrostan...@gmail.com
http://mrdean.wordpress.com
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-20 Thread Digimer
It looks sane on the surface. The 'gethostip' tool comes from the 
'syslinux' package, and it's really handy! The '-d' says to give the IP 
in dotted-decimal notation only.


What I was trying to see was whether the 'uname -n' resolved to the IP 
on the same network card as the other nodes. This is how corosync 
decides which interface to send cluster traffic onto. I suspect you 
might have a general network issue, possibly related to multicast. (Some 
switches and some hypervisor virtual networks don't play nice with 
corosync).
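
The check described above can be sketched roughly as follows. This is only a minimal sketch, assuming POSIX sh, a /24 cluster network, and the addresses quoted in this thread. The names ip_to_int and same_subnet are hypothetical helpers, not part of corosync or cman.

```shell
# Hypothetical helpers, not part of corosync or cman.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print "yes" if addresses $1 and $2 share a network of prefix length $3.
same_subnet() {
    mask=$(( (4294967295 << (32 - $3)) & 4294967295 ))
    net_a=$(( $(ip_to_int "$1") & $mask ))
    net_b=$(( $(ip_to_int "$2") & $mask ))
    if [ "$net_a" -eq "$net_b" ]; then echo yes; else echo no; fi
}
```

On a node without gethostip, something like `same_subnet "$(getent hosts "$(uname -n)" | awk '{ print $1 }')" 10.10.1.129 24` would report whether the local node name resolves to an address on the same /24 as csgha2.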


Have you tried unicast? If not, try setting the <cman> element to 
have the transport="udpu" attribute. Do note that unicast 
isn't as efficient as multicast, so though it might work, I'd 
personally treat it as a debug tool to isolate the source of the problem.


cheers

digimer

PS - Can you share your pacemaker configuration?

On 20/10/14 03:40 PM, John Scalia wrote:

Sure, and thanks for helping.

Here's the /etc/cluster/cluster.conf file and it is identical on all three
systems:

[cluster.conf XML not preserved in the archive]

uname -n reports "csgha1" on that system, "csgha2" on its system, and
"csgha3" on the last system.
I don't seem to have gethostip on any of these systems, so I don't know if
the next section helps or not.
"ifconfig -a" reports:
 csgha1: eth0 = 172.17.1.21
         eth1 = 10.10.1.128
 csgha2: eth0 = 10.10.1.129
         eth1 = 172.17.1.3
Yeah, I know this looks a little weird, but it was the way our automated VM
control did the interfaces.
 csgha3: eth0 = 172.17.1.23
         eth1 = 10.10.1.130
The /etc/hosts file on each system only has the 10.10.1.0/24 address for
each system in it.
iptables is not running on these systems.

Let me know if you need more information, and I very much appreciate your
assistance.
--
Jay

On Mon, Oct 20, 2014 at 3:18 PM, Digimer  wrote:


On 20/10/14 02:50 PM, John Scalia wrote:


Hi all,

I'm trying to build my first ever HA cluster and I'm using 3 VMs running
CentOS 6.5. I followed the instructions to the letter at:

http://clusterlabs.org/quickstart-redhat.html

and everything appears to start normally, but if I run "cman_tool nodes
-a", I only see:

Node  Sts   Inc   Joined               Name
   1   M     64   2014-10-20 14:00:00  csgha1
       Addresses: 10.10.1.128
   2   X      0                        csgha2
   3   X      0                        csgha3

In the other systems, the output is the same except for which system is
shown as joined. Each shows just itself as belonging to the cluster.
Also, "pcs status" reflects similarly with non-self systems showing
offline. I've checked "netstat -an" and see each machine listening on
ports 5404 and 5405. And the logs are rather involved, but I'm not
seeing errors in them.

Any ideas for where to look for what's causing them to not communicate?
--
Jay



Can you share your cluster.conf file please? Also, for each node:

* uname -n
* gethostip -d $(uname -n)
* ifconfig | grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }'
* iptables-save | grep -i multi

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems






--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?



Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-20 Thread John Scalia
Thanks, but on CentOS are you saying to use "pcs cluster start" rather than using "service cman start" and "service pacemaker start"? I was just going by the tutorial, which doesn't mention this.


On 10/20/2014 3:44 PM, Maciej Rostański wrote:

Hello,

In my experience such problems were the effect of my mistakes, such as not
having all hosts in /etc/hosts file. Check this, please, I know it sounds
simple.

Also, commands:
pcs cluster setup --name clustername node1 node2 node3
pcs cluster enable
pcs cluster start

are much more pleasant to run than the ccs method you use, and they work on
CentOS 6.5.

Regards,
Maciej



2014-10-20 20:50 GMT+02:00 John Scalia :


Hi all,

I'm trying to build my first ever HA cluster and I'm using 3 VMs running
CentOS 6.5. I followed the instructions to the letter at:

http://clusterlabs.org/quickstart-redhat.html

and everything appears to start normally, but if I run "cman_tool nodes
-a", I only see:

Node  Sts   Inc   Joined               Name
   1   M     64   2014-10-20 14:00:00  csgha1
       Addresses: 10.10.1.128
   2   X      0                        csgha2
   3   X      0                        csgha3

On the other systems, the output is the same except for which system is
shown as joined. Each shows just itself as belonging to the cluster. Also,
"pcs status" reflects similarly with non-self systems showing offline. I've
checked "netstat -an" and see each machine listening on ports 5404 and
5405. And the logs are rather involved, but I'm not seeing errors in them.

Any ideas for where to look for what's causing them to not communicate?
--
Jay







Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-20 Thread Maciej Rostański
Hello,

In my experience such problems were the effect of my mistakes, such as not
having all hosts in /etc/hosts file. Check this, please, I know it sounds
simple.
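
A quick way to run that check on each node might look like the sketch below. The helper name check_hosts is hypothetical, and it only looks at a hosts-format file; it does not consult DNS or any other name service.

```shell
# Hypothetical helper: report node names missing from a hosts file.
# Usage: check_hosts /etc/hosts csgha1 csgha2 csgha3
check_hosts() {
    hosts_file="$1"; shift
    missing=0
    for node in "$@"; do
        # Look for the name in any column after the address, ignoring comments.
        if ! awk -v n="$node" '
            /^[ \t]*#/ { next }
            { for (i = 2; i <= NF; i++) { if ($i ~ /^#/) break; if ($i == n) found = 1 } }
            END { exit !found }' "$hosts_file"; then
            echo "missing: $node"
            missing=1
        fi
    done
    return "$missing"
}
```

It prints one "missing: NAME" line per absent node and returns non-zero if any name was not found, so it can be used in a script before starting the cluster.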

Also, commands:
pcs cluster setup --name clustername node1 node2 node3
pcs cluster enable
pcs cluster start

are much more pleasant to run than the ccs method you use, and they work on
CentOS 6.5.

Regards,
Maciej



2014-10-20 20:50 GMT+02:00 John Scalia :

> Hi all,
>
> I'm trying to build my first ever HA cluster and I'm using 3 VMs running
> CentOS 6.5. I followed the instructions to the letter at:
>
> http://clusterlabs.org/quickstart-redhat.html
>
> and everything appears to start normally, but if I run "cman_tool nodes
> -a", I only see:
>
> Node  Sts   Inc   Joined               Name
>    1   M     64   2014-10-20 14:00:00  csgha1
>        Addresses: 10.10.1.128
>    2   X      0                        csgha2
>    3   X      0                        csgha3
>
> On the other systems, the output is the same except for which system is
> shown as joined. Each shows just itself as belonging to the cluster. Also,
> "pcs status" reflects similarly with non-self systems showing offline. I've
> checked "netstat -an" and see each machine listening on ports 5404 and
> 5405. And the logs are rather involved, but I'm not seeing errors in them.
>
> Any ideas for where to look for what's causing them to not communicate?
> --
> Jay



-- 
Maciej Rostanski
mrostan...@gmail.com
http://mrdean.wordpress.com


Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-20 Thread John Scalia
Sure, and thanks for helping.

Here's the /etc/cluster/cluster.conf file and it is identical on all three
systems:

[cluster.conf XML not preserved in this archive]


uname -n reports "csgha1" on that system, "csgha2" on the second system, and
"csgha3" on the last system.
I don't seem to have gethostip on any of these systems, so I don't know if
the next section helps or not.
"ifconfig -a" reports:
  csgha1: eth0 = 172.17.1.21
          eth1 = 10.10.1.128
  csgha2: eth0 = 10.10.1.129
          eth1 = 172.17.1.3
  csgha3: eth0 = 172.17.1.23
          eth1 = 10.10.1.130
Yeah, I know csgha2 looks a little weird, but it was the way our automated VM
control did the interfaces.
The /etc/hosts file on each system only has the 10.10.1.0/24 address for
each system in it.
iptables is not running on these systems.

Let me know if you need more information, and I very much appreciate your
assistance.
--
Jay

On Mon, Oct 20, 2014 at 3:18 PM, Digimer  wrote:

> On 20/10/14 02:50 PM, John Scalia wrote:
>
>> Hi all,
>>
>> I'm trying to build my first ever HA cluster and I'm using 3 VMs running
>> CentOS 6.5. I followed the instructions to the letter at:
>>
>> http://clusterlabs.org/quickstart-redhat.html
>>
>> and everything appears to start normally, but if I run "cman_tool nodes
>> -a", I only see:
>>
>> Node  Sts   Inc   Joined               Name
>>    1   M     64   2014-10-20 14:00:00  csgha1
>>        Addresses: 10.10.1.128
>>    2   X      0                        csgha2
>>    3   X      0                        csgha3
>>
>> On the other systems, the output is the same except for which system is
>> shown as joined. Each shows just itself as belonging to the cluster.
>> Also, "pcs status" reflects similarly with non-self systems showing
>> offline. I've checked "netstat -an" and see each machine listening on
>> ports 5404 and 5405. And the logs are rather involved, but I'm not
>> seeing errors in them.
>>
>> Any ideas for where to look for what's causing them to not communicate?
>> --
>> Jay
>>
>
> Can you share your cluster.conf file please? Also, for each node:
>
> * uname -n
> * gethostip -d $(uname -n)
> * ifconfig | grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }'
> * iptables-save | grep -i multi
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?


Re: [Linux-HA] New user can't get cman to recognize other systems

2014-10-20 Thread Digimer

On 20/10/14 02:50 PM, John Scalia wrote:

Hi all,

I'm trying to build my first ever HA cluster and I'm using 3 VMs running
CentOS 6.5. I followed the instructions to the letter at:

http://clusterlabs.org/quickstart-redhat.html

and everything appears to start normally, but if I run "cman_tool nodes
-a", I only see:

Node  Sts   Inc   Joined               Name
   1   M     64   2014-10-20 14:00:00  csgha1
       Addresses: 10.10.1.128
   2   X      0                        csgha2
   3   X      0                        csgha3

On the other systems, the output is the same except for which system is
shown as joined. Each shows just itself as belonging to the cluster.
Also, "pcs status" reflects similarly with non-self systems showing
offline. I've checked "netstat -an" and see each machine listening on
ports 5404 and 5405. And the logs are rather involved, but I'm not
seeing errors in them.

Any ideas for where to look for what's causing them to not communicate?
--
Jay


Can you share your cluster.conf file please? Also, for each node:

* uname -n
* gethostip -d $(uname -n)
* ifconfig | grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }'
* iptables-save | grep -i multi
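
If gethostip is not installed, a rough equivalent of the two middle commands can probably be put together from getent and ip. That both are present on a stock CentOS 6 install is an assumption, and the helper name iface_for_ip is made up for illustration. The sketch reads "ip -o -4 addr show" output on stdin and prints the interface holding a given address:

```shell
# Hypothetical helper: given `ip -o -4 addr show` output on stdin,
# print the name of the interface that carries the address in $1.
iface_for_ip() {
    # Field 4 is ADDRESS/PREFIX, field 2 is the interface name.
    awk -v ip="$1" '{ split($4, a, "/"); if (a[1] == ip) print $2 }'
}

# Example use on a node (resolves the node name first, like gethostip -d):
#   addr=$(getent hosts "$(uname -n)" | awk '{ print $1; exit }')
#   ip -o -4 addr show | iface_for_ip "$addr"
```

If the node name resolves to an address that no local interface carries, nothing is printed, which is itself a useful hint that name resolution and the cluster network disagree.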

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?
