Re: [Linux-HA] New user can't get cman to recognize other systems
> On 22 Oct 2014, at 9:16 am, Digimer wrote: > > Blocked for me, too. Possible to clone - client data? Needless paranoia more likely. This is the original fedora bug (nothing marked private): https://bugzilla.redhat.com/show_bug.cgi?id=880035 and the kbase: https://access.redhat.com/solutions/784373 > > On 21/10/14 06:14 PM, jayknowsu...@gmail.com wrote: >> Sure! But i can't seem to get Redhat to let me see the bug, even though I >> have an account. >> >> Sent from my iPad >> >>> On Oct 21, 2014, at 5:51 PM, Andrew Beekhof wrote: >>> >>> On 22 Oct 2014, at 7:36 am, jayknowsu...@gmail.com wrote: Yep, my network engineer and I found that the multicast packets were being blocked by the underlying hypervisor for the VM systems. >>> >>> Yeah, that'll happen :-( >>> I believe its fixed in newer kernels, but for a while there multicast would >>> appear to work and then stop for no good reason. >>> Putting the device into promiscuous mode seemed to help IIRC. >>> >>> This is the bug I knew it as: >>> https://bugzilla.redhat.com/show_bug.cgi?id=1090670 >>> >>> >>> At first we thought it was just iptables on the servers, but i was certain I had actually turned that off. The issue has been bumped up to the operations team for a fixing this, but since I've gotten it to work with unicast, there's no pressure Sent from my iPad > On Oct 21, 2014, at 3:15 PM, Digimer wrote: > > Glad you sorted it out! > > So then, it was almost certainly a multicast issue. I would still > strongly recommend trying to source and fix the problem, and reverting to > mcast if you can. More efficient. :) > > digimer > >> On 21/10/14 02:59 PM, John Scalia wrote: >> Ok, got it working after a little more effort, and the cluster is now >> properly reporting. 
>> >>> On Tue, Oct 21, 2014 at 1:34 PM, John Scalia >>> wrote: >>> >>> So, I set "transport="udpi"' in the cluster.conf file, and it now looks >>> like this: >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> But, after restarting the cluster I don't see any difference. Did I do >>> something wrong? >>> -- >>> Jay >>> On Tue, Oct 21, 2014 at 12:25 PM, Digimer wrote: No, you don't need to specify anything in cluster.conf for unicast to work. Corosync will divine the IPs by resolving the node names to IPs. If you set multicast and don't want to use the auto-selected mcast IP, then you can specify the mcast IP group to use via . digimer > On 21/10/14 12:22 PM, John Scalia wrote: > > OK, looking at the cman man page on this system, I see the line saying > "the corosync.conf file is not used." So, I'm guessing I need to set a > unicast address somewhere in the cluster.conf file, but the man page > only mentions the parameter. What can I use to > set this to a unicast address for ports 5404 and 5405? I'm assuming I > can't just put a unicast address for the multicast parameter, and the > man page for cluster.conf wasn't much help either. > > We're still working on having the security team permit these 3 systems > to use multicast. > >> On 10/21/2014 11:51 AM, Digimer wrote: >> >> Keep us posted. :) >> >>> On 21/10/14 08:40 AM, John Scalia wrote: >>> >>> I've been check hostname resolution this morning, and all the >>> systems >>> are listed in each /etc/hosts file (No DNS in this environment.) and >>> ping works on every system both to itself and all the other >>> systems. At >>> least it's working on the 10.10.1.0/24 network. >>> >>> I ran tcpdump trying to see what traffic is on port 5405 on each >>> system, >>> and I'm only seeing outbound on each, even though netstat shows >>> each is >>> listening on the multicast address. 
My suspicion is that the router >>> is >>> eating the multicast broadcasts, so I may try the unicast address >>> instead, but I'm waiting on one of our network engineers to see if >>> my >>> suspicion is correct about the router. He volunteered to help late >>> yesterday. >>> On 10/20/2014 4:34 PM, Digimer wrote: It looks sane on the su
Re: [Linux-HA] New user can't get cman to recognize other systems
Blocked for me, too. Possible to clone - client data?

On 21/10/14 06:14 PM, jayknowsu...@gmail.com wrote:
Sure! But I can't seem to get Red Hat to let me see the bug, even though I have an account.

Sent from my iPad

On Oct 21, 2014, at 5:51 PM, Andrew Beekhof wrote:
On 22 Oct 2014, at 7:36 am, jayknowsu...@gmail.com wrote:
Yep, my network engineer and I found that the multicast packets were being blocked by the underlying hypervisor for the VM systems.

Yeah, that'll happen :-( I believe it's fixed in newer kernels, but for a while there multicast would appear to work and then stop for no good reason. Putting the device into promiscuous mode seemed to help IIRC.

This is the bug I knew it as: https://bugzilla.redhat.com/show_bug.cgi?id=1090670

At first we thought it was just iptables on the servers, but I was certain I had actually turned that off. The issue has been bumped up to the operations team for fixing, but since I've gotten it to work with unicast, there's no pressure.

Sent from my iPad

On Oct 21, 2014, at 3:15 PM, Digimer wrote:
Glad you sorted it out!

So then, it was almost certainly a multicast issue. I would still strongly recommend trying to source and fix the problem, and reverting to mcast if you can. More efficient. :)

digimer

On 21/10/14 02:59 PM, John Scalia wrote:
Ok, got it working after a little more effort, and the cluster is now properly reporting.

On Tue, Oct 21, 2014 at 1:34 PM, John Scalia wrote:
So, I set 'transport="udpi"' in the cluster.conf file, and it now looks like this:

But, after restarting the cluster I don't see any difference. Did I do something wrong?
--
Jay

On Tue, Oct 21, 2014 at 12:25 PM, Digimer wrote:
No, you don't need to specify anything in cluster.conf for unicast to work. Corosync will divine the IPs by resolving the node names to IPs. If you set multicast and don't want to use the auto-selected mcast IP, then you can specify the mcast IP group to use via <multicast addr="..."/>.

digimer

On 21/10/14 12:22 PM, John Scalia wrote:
OK, looking at the cman man page on this system, I see the line saying "the corosync.conf file is not used." So, I'm guessing I need to set a unicast address somewhere in the cluster.conf file, but the man page only mentions the <multicast addr="..."/> parameter. What can I use to set this to a unicast address for ports 5404 and 5405? I'm assuming I can't just put a unicast address for the multicast parameter, and the man page for cluster.conf wasn't much help either.

We're still working on having the security team permit these 3 systems to use multicast.

On 10/21/2014 11:51 AM, Digimer wrote:
Keep us posted. :)

On 21/10/14 08:40 AM, John Scalia wrote:
I've been checking hostname resolution this morning, and all the systems are listed in each /etc/hosts file (No DNS in this environment.) and ping works on every system both to itself and all the other systems. At least it's working on the 10.10.1.0/24 network.

I ran tcpdump trying to see what traffic is on port 5405 on each system, and I'm only seeing outbound on each, even though netstat shows each is listening on the multicast address. My suspicion is that the router is eating the multicast broadcasts, so I may try the unicast address instead, but I'm waiting on one of our network engineers to see if my suspicion is correct about the router. He volunteered to help late yesterday.

On 10/20/2014 4:34 PM, Digimer wrote:
It looks sane on the surface. The 'gethostip' tool comes from the 'syslinux' package, and it's really handy! The '-d' says to give the IP in dotted-decimal notation only.

What I was trying to see was whether the 'uname -n' resolved to the IP on the same network card as the other nodes. This is how corosync decides which interface to send cluster traffic onto. I suspect you might have a general network issue, possibly related to multicast. (Some switches and some hypervisor virtual networks don't play nice with corosync).

Have you tried unicast? If not, try setting the <cman> element to have the transport="udpu" attribute. Do note that unicast isn't as efficient as multicast, so though it might work, I'd personally treat it as a debug tool to isolate the source of the problem.

cheers

digimer

PS - Can you share your pacemaker configuration?

On 20/10/14 03:40 PM, John Scalia wrote:
Sure, and thanks for helping.

Here's the /etc/cluster/cluster.conf file and it is identical on all three systems:

uname -n reports "csgha1" on that system, "csgha2" on its system, and "csgha3" on the last system. I don't seem to have gethostip on any of these systems, so I don't know if the next section helps or not. "ifconfig -a" repor
Re: [Linux-HA] New user can't get cman to recognize other systems
Sure! But i can't seem to get Redhat to let me see the bug, even though I have an account. Sent from my iPad > On Oct 21, 2014, at 5:51 PM, Andrew Beekhof wrote: > > >> On 22 Oct 2014, at 7:36 am, jayknowsu...@gmail.com wrote: >> >> Yep, my network engineer and I found that the multicast packets were being >> blocked by the underlying hypervisor for the VM systems. > > Yeah, that'll happen :-( > I believe its fixed in newer kernels, but for a while there multicast would > appear to work and then stop for no good reason. > Putting the device into promiscuous mode seemed to help IIRC. > > This is the bug I knew it as: > https://bugzilla.redhat.com/show_bug.cgi?id=1090670 > > > >> At first we thought it was just iptables on the servers, but i was certain I >> had actually turned that off. The issue has been bumped up to the operations >> team for a fixing this, but since I've gotten it to work with unicast, >> there's no pressure >> >> Sent from my iPad >> >>> On Oct 21, 2014, at 3:15 PM, Digimer wrote: >>> >>> Glad you sorted it out! >>> >>> So then, it was almost certainly a multicast issue. I would still strongly >>> recommend trying to source and fix the problem, and reverting to mcast if >>> you can. More efficient. :) >>> >>> digimer >>> On 21/10/14 02:59 PM, John Scalia wrote: Ok, got it working after a little more effort, and the cluster is now properly reporting. > On Tue, Oct 21, 2014 at 1:34 PM, John Scalia > wrote: > > So, I set "transport="udpi"' in the cluster.conf file, and it now looks > like this: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > But, after restarting the cluster I don't see any difference. Did I do > something wrong? > -- > Jay > >> On Tue, Oct 21, 2014 at 12:25 PM, Digimer wrote: >> >> No, you don't need to specify anything in cluster.conf for unicast to >> work. Corosync will divine the IPs by resolving the node names to IPs. 
If >> you set multicast and don't want to use the auto-selected mcast IP, then >> you can specify the mcast IP group to use via . >> >> digimer >> >> >>> On 21/10/14 12:22 PM, John Scalia wrote: >>> >>> OK, looking at the cman man page on this system, I see the line saying >>> "the corosync.conf file is not used." So, I'm guessing I need to set a >>> unicast address somewhere in the cluster.conf file, but the man page >>> only mentions the parameter. What can I use to >>> set this to a unicast address for ports 5404 and 5405? I'm assuming I >>> can't just put a unicast address for the multicast parameter, and the >>> man page for cluster.conf wasn't much help either. >>> >>> We're still working on having the security team permit these 3 systems >>> to use multicast. >>> On 10/21/2014 11:51 AM, Digimer wrote: Keep us posted. :) > On 21/10/14 08:40 AM, John Scalia wrote: > > I've been check hostname resolution this morning, and all the systems > are listed in each /etc/hosts file (No DNS in this environment.) and > ping works on every system both to itself and all the other systems. > At > least it's working on the 10.10.1.0/24 network. > > I ran tcpdump trying to see what traffic is on port 5405 on each > system, > and I'm only seeing outbound on each, even though netstat shows each > is > listening on the multicast address. My suspicion is that the router is > eating the multicast broadcasts, so I may try the unicast address > instead, but I'm waiting on one of our network engineers to see if my > suspicion is correct about the router. He volunteered to help late > yesterday. > >> On 10/20/2014 4:34 PM, Digimer wrote: >> >> It looks sane on the surface. The 'gethostip' tool comes from the >> 'syslinux' package, and it's really handy! The '-d' says to give the >> IP in dotted-decimanl notation only. >> >> What I was trying to see was whether the 'uname -n' resolved to the >> IP >> on the same network card as the other nodes. 
This is how corosync >> decides which interface to send cluster traffic onto. I suspect you >> might have a general network issue, possibly related to multicast. >> (Some switches and some hypervisor virtual networks don't play nice >> with corosync). >> >> Have you tried unicast? If not, try setting the element to >> hav
Re: [Linux-HA] New user can't get cman to recognize other systems
> On 22 Oct 2014, at 7:36 am, jayknowsu...@gmail.com wrote: > > Yep, my network engineer and I found that the multicast packets were being > blocked by the underlying hypervisor for the VM systems. Yeah, that'll happen :-( I believe its fixed in newer kernels, but for a while there multicast would appear to work and then stop for no good reason. Putting the device into promiscuous mode seemed to help IIRC. This is the bug I knew it as: https://bugzilla.redhat.com/show_bug.cgi?id=1090670 > At first we thought it was just iptables on the servers, but i was certain I > had actually turned that off. The issue has been bumped up to the operations > team for a fixing this, but since I've gotten it to work with unicast, > there's no pressure > > Sent from my iPad > >> On Oct 21, 2014, at 3:15 PM, Digimer wrote: >> >> Glad you sorted it out! >> >> So then, it was almost certainly a multicast issue. I would still strongly >> recommend trying to source and fix the problem, and reverting to mcast if >> you can. More efficient. :) >> >> digimer >> >>> On 21/10/14 02:59 PM, John Scalia wrote: >>> Ok, got it working after a little more effort, and the cluster is now >>> properly reporting. >>> On Tue, Oct 21, 2014 at 1:34 PM, John Scalia wrote: So, I set "transport="udpi"' in the cluster.conf file, and it now looks like this: But, after restarting the cluster I don't see any difference. Did I do something wrong? -- Jay > On Tue, Oct 21, 2014 at 12:25 PM, Digimer wrote: > > No, you don't need to specify anything in cluster.conf for unicast to > work. Corosync will divine the IPs by resolving the node names to IPs. If > you set multicast and don't want to use the auto-selected mcast IP, then > you can specify the mcast IP group to use via . > > digimer > > >> On 21/10/14 12:22 PM, John Scalia wrote: >> >> OK, looking at the cman man page on this system, I see the line saying >> "the corosync.conf file is not used." 
So, I'm guessing I need to set a >> unicast address somewhere in the cluster.conf file, but the man page >> only mentions the parameter. What can I use to >> set this to a unicast address for ports 5404 and 5405? I'm assuming I >> can't just put a unicast address for the multicast parameter, and the >> man page for cluster.conf wasn't much help either. >> >> We're still working on having the security team permit these 3 systems >> to use multicast. >> >>> On 10/21/2014 11:51 AM, Digimer wrote: >>> >>> Keep us posted. :) >>> On 21/10/14 08:40 AM, John Scalia wrote: I've been check hostname resolution this morning, and all the systems are listed in each /etc/hosts file (No DNS in this environment.) and ping works on every system both to itself and all the other systems. At least it's working on the 10.10.1.0/24 network. I ran tcpdump trying to see what traffic is on port 5405 on each system, and I'm only seeing outbound on each, even though netstat shows each is listening on the multicast address. My suspicion is that the router is eating the multicast broadcasts, so I may try the unicast address instead, but I'm waiting on one of our network engineers to see if my suspicion is correct about the router. He volunteered to help late yesterday. > On 10/20/2014 4:34 PM, Digimer wrote: > > It looks sane on the surface. The 'gethostip' tool comes from the > 'syslinux' package, and it's really handy! The '-d' says to give the > IP in dotted-decimanl notation only. > > What I was trying to see was whether the 'uname -n' resolved to the IP > on the same network card as the other nodes. This is how corosync > decides which interface to send cluster traffic onto. I suspect you > might have a general network issue, possibly related to multicast. > (Some switches and some hypervisor virtual networks don't play nice > with corosync). > > Have you tried unicast? If not, try setting the element to > have the attribute. 
Do note that unicast > isn't as efficient as multicast, so though it might work, I'd > personally treat it as a debug tool to isolate the source of the > problem. > > cheers > > digimer > > PS - Can you share your pacemaker configu
Re: [Linux-HA] New user can't get cman to recognize other systems
Yep, my network engineer and I found that the multicast packets were being blocked by the underlying hypervisor for the VM systems. At first we thought it was just iptables on the servers, but i was certain I had actually turned that off. The issue has been bumped up to the operations team for a fixing this, but since I've gotten it to work with unicast, there's no pressure Sent from my iPad > On Oct 21, 2014, at 3:15 PM, Digimer wrote: > > Glad you sorted it out! > > So then, it was almost certainly a multicast issue. I would still strongly > recommend trying to source and fix the problem, and reverting to mcast if you > can. More efficient. :) > > digimer > >> On 21/10/14 02:59 PM, John Scalia wrote: >> Ok, got it working after a little more effort, and the cluster is now >> properly reporting. >> >>> On Tue, Oct 21, 2014 at 1:34 PM, John Scalia wrote: >>> >>> So, I set "transport="udpi"' in the cluster.conf file, and it now looks >>> like this: >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> But, after restarting the cluster I don't see any difference. Did I do >>> something wrong? >>> -- >>> Jay >>> On Tue, Oct 21, 2014 at 12:25 PM, Digimer wrote: No, you don't need to specify anything in cluster.conf for unicast to work. Corosync will divine the IPs by resolving the node names to IPs. If you set multicast and don't want to use the auto-selected mcast IP, then you can specify the mcast IP group to use via . digimer > On 21/10/14 12:22 PM, John Scalia wrote: > > OK, looking at the cman man page on this system, I see the line saying > "the corosync.conf file is not used." So, I'm guessing I need to set a > unicast address somewhere in the cluster.conf file, but the man page > only mentions the parameter. What can I use to > set this to a unicast address for ports 5404 and 5405? 
I'm assuming I > can't just put a unicast address for the multicast parameter, and the > man page for cluster.conf wasn't much help either. > > We're still working on having the security team permit these 3 systems > to use multicast. > >> On 10/21/2014 11:51 AM, Digimer wrote: >> >> Keep us posted. :) >> >>> On 21/10/14 08:40 AM, John Scalia wrote: >>> >>> I've been check hostname resolution this morning, and all the systems >>> are listed in each /etc/hosts file (No DNS in this environment.) and >>> ping works on every system both to itself and all the other systems. At >>> least it's working on the 10.10.1.0/24 network. >>> >>> I ran tcpdump trying to see what traffic is on port 5405 on each >>> system, >>> and I'm only seeing outbound on each, even though netstat shows each is >>> listening on the multicast address. My suspicion is that the router is >>> eating the multicast broadcasts, so I may try the unicast address >>> instead, but I'm waiting on one of our network engineers to see if my >>> suspicion is correct about the router. He volunteered to help late >>> yesterday. >>> On 10/20/2014 4:34 PM, Digimer wrote: It looks sane on the surface. The 'gethostip' tool comes from the 'syslinux' package, and it's really handy! The '-d' says to give the IP in dotted-decimanl notation only. What I was trying to see was whether the 'uname -n' resolved to the IP on the same network card as the other nodes. This is how corosync decides which interface to send cluster traffic onto. I suspect you might have a general network issue, possibly related to multicast. (Some switches and some hypervisor virtual networks don't play nice with corosync). Have you tried unicast? If not, try setting the element to have the attribute. Do note that unicast isn't as efficient as multicast, so thought it might work, I'd personally treat it as a debug tool to isolate the source of the problem. cheers digimer PS - Can you share your pacemaker configuration? 
> On 20/10/14 03:40 PM, John Scalia wrote: > > Sure, and thanks for helping. > > Here's the /etc/cluster/cluster.conf file and it is identical on all > three > systems: > > > > > > > > > > > > > > >
Re: [Linux-HA] New user can't get cman to recognize other systems
Glad you sorted it out!

So then, it was almost certainly a multicast issue. I would still strongly recommend trying to source and fix the problem, and reverting to mcast if you can. More efficient. :)

digimer

On 21/10/14 02:59 PM, John Scalia wrote:
Ok, got it working after a little more effort, and the cluster is now properly reporting.

On Tue, Oct 21, 2014 at 1:34 PM, John Scalia wrote:
So, I set 'transport="udpi"' in the cluster.conf file, and it now looks like this:

But, after restarting the cluster I don't see any difference. Did I do something wrong?
--
Jay

On Tue, Oct 21, 2014 at 12:25 PM, Digimer wrote:
No, you don't need to specify anything in cluster.conf for unicast to work. Corosync will divine the IPs by resolving the node names to IPs. If you set multicast and don't want to use the auto-selected mcast IP, then you can specify the mcast IP group to use via <multicast addr="..."/>.

digimer

On 21/10/14 12:22 PM, John Scalia wrote:
OK, looking at the cman man page on this system, I see the line saying "the corosync.conf file is not used." So, I'm guessing I need to set a unicast address somewhere in the cluster.conf file, but the man page only mentions the <multicast addr="..."/> parameter. What can I use to set this to a unicast address for ports 5404 and 5405? I'm assuming I can't just put a unicast address for the multicast parameter, and the man page for cluster.conf wasn't much help either.

We're still working on having the security team permit these 3 systems to use multicast.

On 10/21/2014 11:51 AM, Digimer wrote:
Keep us posted. :)

On 21/10/14 08:40 AM, John Scalia wrote:
I've been checking hostname resolution this morning, and all the systems are listed in each /etc/hosts file (No DNS in this environment.) and ping works on every system both to itself and all the other systems. At least it's working on the 10.10.1.0/24 network.

I ran tcpdump trying to see what traffic is on port 5405 on each system, and I'm only seeing outbound on each, even though netstat shows each is listening on the multicast address. My suspicion is that the router is eating the multicast broadcasts, so I may try the unicast address instead, but I'm waiting on one of our network engineers to see if my suspicion is correct about the router. He volunteered to help late yesterday.

On 10/20/2014 4:34 PM, Digimer wrote:
It looks sane on the surface. The 'gethostip' tool comes from the 'syslinux' package, and it's really handy! The '-d' says to give the IP in dotted-decimal notation only.

What I was trying to see was whether the 'uname -n' resolved to the IP on the same network card as the other nodes. This is how corosync decides which interface to send cluster traffic onto. I suspect you might have a general network issue, possibly related to multicast. (Some switches and some hypervisor virtual networks don't play nice with corosync).

Have you tried unicast? If not, try setting the <cman> element to have the transport="udpu" attribute. Do note that unicast isn't as efficient as multicast, so though it might work, I'd personally treat it as a debug tool to isolate the source of the problem.

cheers

digimer

PS - Can you share your pacemaker configuration?

On 20/10/14 03:40 PM, John Scalia wrote:
Sure, and thanks for helping.

Here's the /etc/cluster/cluster.conf file and it is identical on all three systems:

uname -n reports "csgha1" on that system, "csgha2" on its system, and "csgha3" on the last system. I don't seem to have gethostip on any of these systems, so I don't know if the next section helps or not.

"ifconfig -a" reports
csgha1: eth0 = 172.17.1.21
eth1 = 10.10.1.128
csgha2: eth0 = 10.10.1.129
(Yeah, I know this looks a little weird, but it was the way our automated VM control did the interfaces.)
eth1 = 172.17.1.3
csgha3: eth0 = 172.17.1.23
eth1 = 10.10.1.130

The /etc/hosts file on each system only has the 10.10.1.0/24 address for each system in it. iptables is not running on these systems.

Let me know if you need more information, and I very much appreciate your assistance.
--
Jay

On Mon, Oct 20, 2014 at 3:18 PM, Digimer wrote:
On 20/10/14 02:50 PM, John Scalia wrote:
Hi all, I'm trying to build my first ever HA cluster and I'm using 3 VMs running CentOS 6.5. I followed the instructions to the letter at: http://clusterlabs.org/quickstart-redhat.html
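The tcpdump observation quoted above (outbound traffic only, nothing arriving from the other nodes) can be cross-checked without corosync in the picture. What follows is only a rough sketch, assuming Python 3 is available on the nodes. The group address 239.192.0.1 and port 5499 are made-up test values chosen to stay clear of the cluster's own 5404/5405, not anything taken from this thread. Run it with "recv" on one node and "send" on another; if the receiver never prints anything, multicast is probably being dropped somewhere between the two.

```python
import socket
import sys

GROUP = "239.192.0.1"  # hypothetical test group, NOT the cluster's real one
PORT = 5499            # hypothetical port, chosen to avoid corosync's 5404/5405


def membership_request(group: str, iface_ip: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq value: group address, then local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def receive(iface_ip: str = "0.0.0.0") -> None:
    """Join the test group and print every datagram that arrives."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(GROUP, iface_ip))
    while True:
        data, sender = sock.recvfrom(1024)
        print(f"received {data!r} from {sender[0]}")


def send(message: bytes, iface_ip: str = "0.0.0.0") -> None:
    """Send one datagram to the test group; TTL 1 keeps it on the local subnet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    if iface_ip != "0.0.0.0":
        # Pin the outgoing interface, e.g. the node's 10.10.1.x address.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                        socket.inet_aton(iface_ip))
    sock.sendto(message, (GROUP, PORT))
    sock.close()


if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else ""
    iface = sys.argv[2] if len(sys.argv) > 2 else "0.0.0.0"
    if mode == "recv":
        receive(iface)
    elif mode == "send":
        send(b"multicast probe", iface)
    else:
        print("usage: mcast_probe.py recv|send [local-interface-ip]")
```

On hosts with two interfaces, as here, passing the node's 10.10.1.x address as the second argument keeps the probe on the same network the cluster is meant to use.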
Re: [Linux-HA] New user can't get cman to recognize other systems
Ok, got it working after a little more effort, and the cluster is now properly reporting. On Tue, Oct 21, 2014 at 1:34 PM, John Scalia wrote: > So, I set "transport="udpi"' in the cluster.conf file, and it now looks > like this: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > But, after restarting the cluster I don't see any difference. Did I do > something wrong? > -- > Jay > > On Tue, Oct 21, 2014 at 12:25 PM, Digimer wrote: > >> No, you don't need to specify anything in cluster.conf for unicast to >> work. Corosync will divine the IPs by resolving the node names to IPs. If >> you set multicast and don't want to use the auto-selected mcast IP, then >> you can specify the mcast IP group to use via . >> >> digimer >> >> >> On 21/10/14 12:22 PM, John Scalia wrote: >> >>> OK, looking at the cman man page on this system, I see the line saying >>> "the corosync.conf file is not used." So, I'm guessing I need to set a >>> unicast address somewhere in the cluster.conf file, but the man page >>> only mentions the parameter. What can I use to >>> set this to a unicast address for ports 5404 and 5405? I'm assuming I >>> can't just put a unicast address for the multicast parameter, and the >>> man page for cluster.conf wasn't much help either. >>> >>> We're still working on having the security team permit these 3 systems >>> to use multicast. >>> >>> On 10/21/2014 11:51 AM, Digimer wrote: >>> Keep us posted. :) On 21/10/14 08:40 AM, John Scalia wrote: > I've been check hostname resolution this morning, and all the systems > are listed in each /etc/hosts file (No DNS in this environment.) and > ping works on every system both to itself and all the other systems. At > least it's working on the 10.10.1.0/24 network. > > I ran tcpdump trying to see what traffic is on port 5405 on each > system, > and I'm only seeing outbound on each, even though netstat shows each is > listening on the multicast address. 
My suspicion is that the router is > eating the multicast broadcasts, so I may try the unicast address > instead, but I'm waiting on one of our network engineers to see if my > suspicion is correct about the router. He volunteered to help late > yesterday. > > On 10/20/2014 4:34 PM, Digimer wrote: > >> It looks sane on the surface. The 'gethostip' tool comes from the >> 'syslinux' package, and it's really handy! The '-d' says to give the >> IP in dotted-decimanl notation only. >> >> What I was trying to see was whether the 'uname -n' resolved to the IP >> on the same network card as the other nodes. This is how corosync >> decides which interface to send cluster traffic onto. I suspect you >> might have a general network issue, possibly related to multicast. >> (Some switches and some hypervisor virtual networks don't play nice >> with corosync). >> >> Have you tried unicast? If not, try setting the element to >> have the attribute. Do note that unicast >> isn't as efficient as multicast, so thought it might work, I'd >> personally treat it as a debug tool to isolate the source of the >> problem. >> >> cheers >> >> digimer >> >> PS - Can you share your pacemaker configuration? >> >> On 20/10/14 03:40 PM, John Scalia wrote: >> >>> Sure, and thanks for helping. >>> >>> Here's the /etc/cluster/cluster.conf file and it is identical on all >>> three >>> systems: >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> uname -n reports "csgha1" on that system, "csgha2" on its system, and >>> "csgha3" on the last system. >>> I don't seem to have gethostip on any of these systems, so I don't >>> know if >>> the next section helps or not. 
>>> "ifconfig -a" reports csgha1: eth0 = 172.17.1.21 >>> eth1 = 10.10.1.128 >>> csgha2: eth0 = 10.10.1.129 >>> Yeah, I know this looks a little weird, but it was the way our >>> automated VM >>> control did the interfaces >>> eth1 = 172.17.1.3 >>> csgha3: eth0 = 172.17.1.23 >>>
Re: [Linux-HA] New user can't get cman to recognize other systems
So, I set "transport="udpi"' in the cluster.conf file, and it now looks like this: But, after restarting the cluster I don't see any difference. Did I do something wrong? -- Jay On Tue, Oct 21, 2014 at 12:25 PM, Digimer wrote: > No, you don't need to specify anything in cluster.conf for unicast to > work. Corosync will divine the IPs by resolving the node names to IPs. If > you set multicast and don't want to use the auto-selected mcast IP, then > you can specify the mcast IP group to use via . > > digimer > > > On 21/10/14 12:22 PM, John Scalia wrote: > >> OK, looking at the cman man page on this system, I see the line saying >> "the corosync.conf file is not used." So, I'm guessing I need to set a >> unicast address somewhere in the cluster.conf file, but the man page >> only mentions the parameter. What can I use to >> set this to a unicast address for ports 5404 and 5405? I'm assuming I >> can't just put a unicast address for the multicast parameter, and the >> man page for cluster.conf wasn't much help either. >> >> We're still working on having the security team permit these 3 systems >> to use multicast. >> >> On 10/21/2014 11:51 AM, Digimer wrote: >> >>> Keep us posted. :) >>> >>> On 21/10/14 08:40 AM, John Scalia wrote: >>> I've been check hostname resolution this morning, and all the systems are listed in each /etc/hosts file (No DNS in this environment.) and ping works on every system both to itself and all the other systems. At least it's working on the 10.10.1.0/24 network. I ran tcpdump trying to see what traffic is on port 5405 on each system, and I'm only seeing outbound on each, even though netstat shows each is listening on the multicast address. My suspicion is that the router is eating the multicast broadcasts, so I may try the unicast address instead, but I'm waiting on one of our network engineers to see if my suspicion is correct about the router. He volunteered to help late yesterday. 
On 10/20/2014 4:34 PM, Digimer wrote: > It looks sane on the surface. The 'gethostip' tool comes from the > 'syslinux' package, and it's really handy! The '-d' says to give the > IP in dotted-decimanl notation only. > > What I was trying to see was whether the 'uname -n' resolved to the IP > on the same network card as the other nodes. This is how corosync > decides which interface to send cluster traffic onto. I suspect you > might have a general network issue, possibly related to multicast. > (Some switches and some hypervisor virtual networks don't play nice > with corosync). > > Have you tried unicast? If not, try setting the element to > have the attribute. Do note that unicast > isn't as efficient as multicast, so thought it might work, I'd > personally treat it as a debug tool to isolate the source of the > problem. > > cheers > > digimer > > PS - Can you share your pacemaker configuration? > > On 20/10/14 03:40 PM, John Scalia wrote: > >> Sure, and thanks for helping. >> >> Here's the /etc/cluster/cluster.conf file and it is identical on all >> three >> systems: >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> uname -n reports "csgha1" on that system, "csgha2" on its system, and >> "csgha3" on the last system. >> I don't seem to have gethostip on any of these systems, so I don't >> know if >> the next section helps or not. >> "ifconfig -a" reports csgha1: eth0 = 172.17.1.21 >> eth1 = 10.10.1.128 >> csgha2: eth0 = 10.10.1.129 >> Yeah, I know this looks a little weird, but it was the way our >> automated VM >> control did the interfaces >> eth1 = 172.,17.1.3 >> csgha3: eth0 = 172.17.1.23 >> eth1 = 10.10.1.130 >> The /etc/hosts file on each system only has the 10.10.1.0/24 >> address for >> each system in in it. >> iptables is not running on these systems. >> >> Let me know if you need more information, and I very much appreciate >> your >> assistance. >> -- >> Jay >> >> On Mon, Oct 20, 201
Re: [Linux-HA] New user can't get cman to recognize other systems
No, you don't need to specify anything in cluster.conf for unicast to work. Corosync will divine the IPs by resolving the node names to IPs. If you set multicast and don't want to use the auto-selected mcast IP, then you can specify the mcast IP group to use via . digimer On 21/10/14 12:22 PM, John Scalia wrote: OK, looking at the cman man page on this system, I see the line saying "the corosync.conf file is not used." So, I'm guessing I need to set a unicast address somewhere in the cluster.conf file, but the man page only mentions the parameter. What can I use to set this to a unicast address for ports 5404 and 5405? I'm assuming I can't just put a unicast address for the multicast parameter, and the man page for cluster.conf wasn't much help either. We're still working on having the security team permit these 3 systems to use multicast. On 10/21/2014 11:51 AM, Digimer wrote: Keep us posted. :) On 21/10/14 08:40 AM, John Scalia wrote: I've been check hostname resolution this morning, and all the systems are listed in each /etc/hosts file (No DNS in this environment.) and ping works on every system both to itself and all the other systems. At least it's working on the 10.10.1.0/24 network. I ran tcpdump trying to see what traffic is on port 5405 on each system, and I'm only seeing outbound on each, even though netstat shows each is listening on the multicast address. My suspicion is that the router is eating the multicast broadcasts, so I may try the unicast address instead, but I'm waiting on one of our network engineers to see if my suspicion is correct about the router. He volunteered to help late yesterday. On 10/20/2014 4:34 PM, Digimer wrote: It looks sane on the surface. The 'gethostip' tool comes from the 'syslinux' package, and it's really handy! The '-d' says to give the IP in dotted-decimanl notation only. What I was trying to see was whether the 'uname -n' resolved to the IP on the same network card as the other nodes. 
This is how corosync decides which interface to send cluster traffic onto. I suspect you might have a general network issue, possibly related to multicast. (Some switches and some hypervisor virtual networks don't play nice with corosync). Have you tried unicast? If not, try setting the element to have the attribute. Do note that unicast isn't as efficient as multicast, so thought it might work, I'd personally treat it as a debug tool to isolate the source of the problem. cheers digimer PS - Can you share your pacemaker configuration? On 20/10/14 03:40 PM, John Scalia wrote: Sure, and thanks for helping. Here's the /etc/cluster/cluster.conf file and it is identical on all three systems: uname -n reports "csgha1" on that system, "csgha2" on its system, and "csgha3" on the last system. I don't seem to have gethostip on any of these systems, so I don't know if the next section helps or not. "ifconfig -a" reports csgha1: eth0 = 172.17.1.21 eth1 = 10.10.1.128 csgha2: eth0 = 10.10.1.129 Yeah, I know this looks a little weird, but it was the way our automated VM control did the interfaces eth1 = 172.,17.1.3 csgha3: eth0 = 172.17.1.23 eth1 = 10.10.1.130 The /etc/hosts file on each system only has the 10.10.1.0/24 address for each system in in it. iptables is not running on these systems. Let me know if you need more information, and I very much appreciate your assistance. -- Jay On Mon, Oct 20, 2014 at 3:18 PM, Digimer wrote: On 20/10/14 02:50 PM, John Scalia wrote: Hi all, I'm trying to build my first ever HA cluster and I'm using 3 VMs running CentOS 6.5. I followed the instructions to the letter at: http://clusterlabs.org/quickstart-redhat.html and everything appears to start normally, but if I run "cman_tool nodes -a", I only see: Node StsInc Joined Name 1 M 64 2014-10--20 14:00:00 csgha1 Addresses: 10.10.1.128 2 X 0 csgha2 3 X 0 csgha3 In the other systems, the output is the same except for which system is shown as joined. 
Each shows just itself as belonging to the cluster. Also, "pcs status" reflects similarly with non-self systems showing offline. I've checked "netstat -an" and see each machine listening on ports 5405 and 5405. And the logs are rather involved, but I'm not seeing errors in it. Any ideas for where to look for what's causing them to not communicate? -- Jay Can you share your cluster.conf file please? Also, for each node: * uname -n * gethostip -d $(uname -n) * ifconfig |grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr |
Re: [Linux-HA] New user can't get cman to recognize other systems
OK, looking at the cman man page on this system, I see the line saying "the corosync.conf file is not used." So, I'm guessing I need to set a unicast address somewhere in the cluster.conf file, but the man page only mentions the parameter. What can I use to set this to a unicast address for ports 5404 and 5405? I'm assuming I can't just put a unicast address for the multicast parameter, and the man page for cluster.conf wasn't much help either. We're still working on having the security team permit these 3 systems to use multicast. On 10/21/2014 11:51 AM, Digimer wrote: Keep us posted. :) On 21/10/14 08:40 AM, John Scalia wrote: I've been check hostname resolution this morning, and all the systems are listed in each /etc/hosts file (No DNS in this environment.) and ping works on every system both to itself and all the other systems. At least it's working on the 10.10.1.0/24 network. I ran tcpdump trying to see what traffic is on port 5405 on each system, and I'm only seeing outbound on each, even though netstat shows each is listening on the multicast address. My suspicion is that the router is eating the multicast broadcasts, so I may try the unicast address instead, but I'm waiting on one of our network engineers to see if my suspicion is correct about the router. He volunteered to help late yesterday. On 10/20/2014 4:34 PM, Digimer wrote: It looks sane on the surface. The 'gethostip' tool comes from the 'syslinux' package, and it's really handy! The '-d' says to give the IP in dotted-decimanl notation only. What I was trying to see was whether the 'uname -n' resolved to the IP on the same network card as the other nodes. This is how corosync decides which interface to send cluster traffic onto. I suspect you might have a general network issue, possibly related to multicast. (Some switches and some hypervisor virtual networks don't play nice with corosync). Have you tried unicast? If not, try setting the element to have the attribute. 
Do note that unicast isn't as efficient as multicast, so thought it might work, I'd personally treat it as a debug tool to isolate the source of the problem. cheers digimer PS - Can you share your pacemaker configuration? On 20/10/14 03:40 PM, John Scalia wrote: Sure, and thanks for helping. Here's the /etc/cluster/cluster.conf file and it is identical on all three systems: uname -n reports "csgha1" on that system, "csgha2" on its system, and "csgha3" on the last system. I don't seem to have gethostip on any of these systems, so I don't know if the next section helps or not. "ifconfig -a" reports csgha1: eth0 = 172.17.1.21 eth1 = 10.10.1.128 csgha2: eth0 = 10.10.1.129 Yeah, I know this looks a little weird, but it was the way our automated VM control did the interfaces eth1 = 172.,17.1.3 csgha3: eth0 = 172.17.1.23 eth1 = 10.10.1.130 The /etc/hosts file on each system only has the 10.10.1.0/24 address for each system in in it. iptables is not running on these systems. Let me know if you need more information, and I very much appreciate your assistance. -- Jay On Mon, Oct 20, 2014 at 3:18 PM, Digimer wrote: On 20/10/14 02:50 PM, John Scalia wrote: Hi all, I'm trying to build my first ever HA cluster and I'm using 3 VMs running CentOS 6.5. I followed the instructions to the letter at: http://clusterlabs.org/quickstart-redhat.html and everything appears to start normally, but if I run "cman_tool nodes -a", I only see: Node StsInc Joined Name 1 M 64 2014-10--20 14:00:00 csgha1 Addresses: 10.10.1.128 2 X 0 csgha2 3 X 0 csgha3 In the other systems, the output is the same except for which system is shown as joined. Each shows just itself as belonging to the cluster. Also, "pcs status" reflects similarly with non-self systems showing offline. I've checked "netstat -an" and see each machine listening on ports 5405 and 5405. And the logs are rather involved, but I'm not seeing errors in it. Any ideas for where to look for what's causing them to not communicate? 
-- Jay Can you share your cluster.conf file please? Also, for each node: * uname -n * gethostip -d $(uname -n) * ifconfig |grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }' * iptables-save | grep -i multi -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-h
Re: [Linux-HA] New user can't get cman to recognize other systems
Keep us posted. :) On 21/10/14 08:40 AM, John Scalia wrote: I've been check hostname resolution this morning, and all the systems are listed in each /etc/hosts file (No DNS in this environment.) and ping works on every system both to itself and all the other systems. At least it's working on the 10.10.1.0/24 network. I ran tcpdump trying to see what traffic is on port 5405 on each system, and I'm only seeing outbound on each, even though netstat shows each is listening on the multicast address. My suspicion is that the router is eating the multicast broadcasts, so I may try the unicast address instead, but I'm waiting on one of our network engineers to see if my suspicion is correct about the router. He volunteered to help late yesterday. On 10/20/2014 4:34 PM, Digimer wrote: It looks sane on the surface. The 'gethostip' tool comes from the 'syslinux' package, and it's really handy! The '-d' says to give the IP in dotted-decimanl notation only. What I was trying to see was whether the 'uname -n' resolved to the IP on the same network card as the other nodes. This is how corosync decides which interface to send cluster traffic onto. I suspect you might have a general network issue, possibly related to multicast. (Some switches and some hypervisor virtual networks don't play nice with corosync). Have you tried unicast? If not, try setting the element to have the attribute. Do note that unicast isn't as efficient as multicast, so thought it might work, I'd personally treat it as a debug tool to isolate the source of the problem. cheers digimer PS - Can you share your pacemaker configuration? On 20/10/14 03:40 PM, John Scalia wrote: Sure, and thanks for helping. Here's the /etc/cluster/cluster.conf file and it is identical on all three systems: uname -n reports "csgha1" on that system, "csgha2" on its system, and "csgha3" on the last system. I don't seem to have gethostip on any of these systems, so I don't know if the next section helps or not. 
"ifconfig -a" reports csgha1: eth0 = 172.17.1.21 eth1 = 10.10.1.128 csgha2: eth0 = 10.10.1.129 Yeah, I know this looks a little weird, but it was the way our automated VM control did the interfaces eth1 = 172.,17.1.3 csgha3: eth0 = 172.17.1.23 eth1 = 10.10.1.130 The /etc/hosts file on each system only has the 10.10.1.0/24 address for each system in in it. iptables is not running on these systems. Let me know if you need more information, and I very much appreciate your assistance. -- Jay On Mon, Oct 20, 2014 at 3:18 PM, Digimer wrote: On 20/10/14 02:50 PM, John Scalia wrote: Hi all, I'm trying to build my first ever HA cluster and I'm using 3 VMs running CentOS 6.5. I followed the instructions to the letter at: http://clusterlabs.org/quickstart-redhat.html and everything appears to start normally, but if I run "cman_tool nodes -a", I only see: Node StsInc Joined Name 1 M 64 2014-10--20 14:00:00 csgha1 Addresses: 10.10.1.128 2 X 0 csgha2 3 X 0 csgha3 In the other systems, the output is the same except for which system is shown as joined. Each shows just itself as belonging to the cluster. Also, "pcs status" reflects similarly with non-self systems showing offline. I've checked "netstat -an" and see each machine listening on ports 5405 and 5405. And the logs are rather involved, but I'm not seeing errors in it. Any ideas for where to look for what's causing them to not communicate? -- Jay Can you share your cluster.conf file please? Also, for each node: * uname -n * gethostip -d $(uname -n) * ifconfig |grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }' * iptables-save | grep -i multi -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in t
Re: [Linux-HA] New user can't get cman to recognize other systems
I've been checking hostname resolution this morning, and all the systems are listed in each /etc/hosts file (no DNS in this environment), and ping works on every system, both to itself and to all the other systems. At least it's working on the 10.10.1.0/24 network.

I ran tcpdump to see what traffic is on port 5405 on each system, and I'm only seeing outbound traffic on each, even though netstat shows each is listening on the multicast address. My suspicion is that the router is eating the multicast traffic, so I may try unicast instead, but I'm waiting on one of our network engineers to confirm whether my suspicion about the router is correct. He volunteered to help late yesterday.

On 10/20/2014 4:34 PM, Digimer wrote:

It looks sane on the surface. The 'gethostip' tool comes from the 'syslinux' package, and it's really handy! The '-d' says to give the IP in dotted-decimal notation only.

What I was trying to see was whether the 'uname -n' resolved to the IP on the same network card as the other nodes. This is how corosync decides which interface to send cluster traffic onto. I suspect you might have a general network issue, possibly related to multicast. (Some switches and some hypervisor virtual networks don't play nice with corosync.)

Have you tried unicast? If not, try setting the <cman> element to have the transport="udpu" attribute. Do note that unicast isn't as efficient as multicast, so though it might work, I'd personally treat it as a debug tool to isolate the source of the problem.

cheers

digimer

PS - Can you share your pacemaker configuration?

On 20/10/14 03:40 PM, John Scalia wrote:

Sure, and thanks for helping.

Here's the /etc/cluster/cluster.conf file and it is identical on all three systems:

uname -n reports "csgha1" on that system, "csgha2" on its system, and "csgha3" on the last system. I don't seem to have gethostip on any of these systems, so I don't know if the next section helps or not.
"ifconfig -a" reports csgha1: eth0 = 172.17.1.21 eth1 = 10.10.1.128 csgha2: eth0 = 10.10.1.129 Yeah, I know this looks a little weird, but it was the way our automated VM control did the interfaces eth1 = 172.,17.1.3 csgha3: eth0 = 172.17.1.23 eth1 = 10.10.1.130 The /etc/hosts file on each system only has the 10.10.1.0/24 address for each system in in it. iptables is not running on these systems. Let me know if you need more information, and I very much appreciate your assistance. -- Jay On Mon, Oct 20, 2014 at 3:18 PM, Digimer wrote: On 20/10/14 02:50 PM, John Scalia wrote: Hi all, I'm trying to build my first ever HA cluster and I'm using 3 VMs running CentOS 6.5. I followed the instructions to the letter at: http://clusterlabs.org/quickstart-redhat.html and everything appears to start normally, but if I run "cman_tool nodes -a", I only see: Node StsInc Joined Name 1 M 64 2014-10--20 14:00:00 csgha1 Addresses: 10.10.1.128 2 X 0 csgha2 3 X 0 csgha3 In the other systems, the output is the same except for which system is shown as joined. Each shows just itself as belonging to the cluster. Also, "pcs status" reflects similarly with non-self systems showing offline. I've checked "netstat -an" and see each machine listening on ports 5405 and 5405. And the logs are rather involved, but I'm not seeing errors in it. Any ideas for where to look for what's causing them to not communicate? -- Jay Can you share your cluster.conf file please? Also, for each node: * uname -n * gethostip -d $(uname -n) * ifconfig |grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }' * iptables-save | grep -i multi -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
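The "only seeing outbound" observation from tcpdump is easier to compare across nodes if the capture is reduced to a count of packets per source address. This is only a sketch: count_sources is a made-up helper name, it assumes the usual one-line-per-packet text that `tcpdump -n` prints for IPv4, and the interface name in the usage line is an assumption taken from the ifconfig listing in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read `tcpdump -n` text on stdin and print
# "<count> <source-ip>" for every IPv4 source address seen.
count_sources() {
    awk '$2 == "IP" && $4 == ">" {
             src = $3
             sub(/\.[0-9]+$/, "", src)   # drop the trailing ".port"
             seen[src]++
         }
         END { for (s in seen) print seen[s], s }' | sort -k2
}
```

Something like `tcpdump -n -c 50 -i eth1 udp port 5405 | count_sources` (as root) stops after 50 packets and prints the summary. If the only address listed is the node's own, nothing from the peers is arriving on that interface.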
Re: [Linux-HA] New user can't get cman to recognize other systems
OK, got it. Sent from my iPad > On Oct 20, 2014, at 10:10 PM, Andrew Beekhof wrote: > > >> On 21 Oct 2014, at 7:17 am, John Scalia wrote: >> >> Thanks, but on centOS are you saying to use "pcs cluster start" rather than >> using "service cman start" and "service pacemaker start"? I was just going >> by the tutorial, which doesn't mention this. > > 'service pacemaker start' and 'pcs cluster start' are pretty much equivalent. > both will start cman if its not running already > >> >>> On 10/20/2014 3:44 PM, Maciej Rostański wrote: >>> Hello, >>> >>> In my experience such problems were the effect of my mistakes, such as not >>> having all hosts in /etc/hosts file. Check this, please, I know it sounds >>> simple. >>> >>> Also, commands: >>> pcs cluster setup --name clustername node1 node2 node3 >>> pcs cluster enable >>> pcs cluster start >>> >>> are much more pleasant to run than ccs method you use, and they work on >>> Centos6.5 >>> >>> Regards, >>> Maciej >>> >>> >>> >>> 2014-10-20 20:50 GMT+02:00 John Scalia : >>> Hi all, I'm trying to build my first ever HA cluster and I'm using 3 VMs running CentOS 6.5. I followed the instructions to the letter at: http://clusterlabs.org/quickstart-redhat.html and everything appears to start normally, but if I run "cman_tool nodes -a", I only see: Node StsInc Joined Name 1 M 64 2014-10--20 14:00:00 csgha1 Addresses: 10.10.1.128 2 X 0 csgha2 3 X 0 csgha3 In the other systems, the output is the same except for which system is shown as joined. Each shows just itself as belonging to the cluster. Also, "pcs status" reflects similarly with non-self systems showing offline. I've checked "netstat -an" and see each machine listening on ports 5405 and 5405. And the logs are rather involved, but I'm not seeing errors in it. Any ideas for where to look for what's causing them to not communicate? 
-- Jay
Re: [Linux-HA] New user can't get cman to recognize other systems
> On 21 Oct 2014, at 7:17 am, John Scalia wrote: > > Thanks, but on centOS are you saying to use "pcs cluster start" rather than > using "service cman start" and "service pacemaker start"? I was just going by > the tutorial, which doesn't mention this. 'service pacemaker start' and 'pcs cluster start' are pretty much equivalent. both will start cman if its not running already > > On 10/20/2014 3:44 PM, Maciej Rostański wrote: >> Hello, >> >> In my experience such problems were the effect of my mistakes, such as not >> having all hosts in /etc/hosts file. Check this, please, I know it sounds >> simple. >> >> Also, commands: >> pcs cluster setup --name clustername node1 node2 node3 >> pcs cluster enable >> pcs cluster start >> >> are much more pleasant to run than ccs method you use, and they work on >> Centos6.5 >> >> Regards, >> Maciej >> >> >> >> 2014-10-20 20:50 GMT+02:00 John Scalia : >> >>> Hi all, >>> >>> I'm trying to build my first ever HA cluster and I'm using 3 VMs running >>> CentOS 6.5. I followed the instructions to the letter at: >>> >>> http://clusterlabs.org/quickstart-redhat.html >>> >>> and everything appears to start normally, but if I run "cman_tool nodes >>> -a", I only see: >>> >>> Node StsInc Joined Name >>> 1 M 64 2014-10--20 14:00:00 csgha1 >>> Addresses: 10.10.1.128 >>> 2 X 0 csgha2 >>> 3 X 0 csgha3 >>> >>> In the other systems, the output is the same except for which system is >>> shown as joined. Each shows just itself as belonging to the cluster. Also, >>> "pcs status" reflects similarly with non-self systems showing offline. I've >>> checked "netstat -an" and see each machine listening on ports 5405 and >>> 5405. And the logs are rather involved, but I'm not seeing errors in it. >>> >>> Any ideas for where to look for what's causing them to not communicate? 
>>> -- >>> Jay >>> ___ >>> Linux-HA mailing list >>> Linux-HA@lists.linux-ha.org >>> http://lists.linux-ha.org/mailman/listinfo/linux-ha >>> See also: http://linux-ha.org/ReportingProblems >>> >> >> > > ___ > Linux-HA mailing list > Linux-HA@lists.linux-ha.org > http://lists.linux-ha.org/mailman/listinfo/linux-ha > See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] New user can't get cman to recognize other systems
Well, with 6.4 and 6.5 (which I like a lot) there is this specific situation - no more crm, only pcs and ccs, but on the other hand, stack with cman (which is being replaced by corosync 2.0 now). So the documentation found on various sites is rarely handy... 2014-10-20 22:17 GMT+02:00 John Scalia : > Thanks, but on centOS are you saying to use "pcs cluster start" rather > than using "service cman start" and "service pacemaker start"? I was just > going by the tutorial, which doesn't mention this. > > > On 10/20/2014 3:44 PM, Maciej Rostański wrote: > >> Hello, >> >> In my experience such problems were the effect of my mistakes, such as not >> having all hosts in /etc/hosts file. Check this, please, I know it sounds >> simple. >> >> Also, commands: >> pcs cluster setup --name clustername node1 node2 node3 >> pcs cluster enable >> pcs cluster start >> >> are much more pleasant to run than ccs method you use, and they work on >> Centos6.5 >> >> Regards, >> Maciej >> >> >> >> 2014-10-20 20:50 GMT+02:00 John Scalia : >> >> Hi all, >>> >>> I'm trying to build my first ever HA cluster and I'm using 3 VMs running >>> CentOS 6.5. I followed the instructions to the letter at: >>> >>> http://clusterlabs.org/quickstart-redhat.html >>> >>> and everything appears to start normally, but if I run "cman_tool nodes >>> -a", I only see: >>> >>> Node StsInc Joined Name >>> 1 M 64 2014-10--20 14:00:00 csgha1 >>> Addresses: 10.10.1.128 >>> 2 X 0 >>> csgha2 >>> 3 X 0 >>> csgha3 >>> >>> In the other systems, the output is the same except for which system is >>> shown as joined. Each shows just itself as belonging to the cluster. >>> Also, >>> "pcs status" reflects similarly with non-self systems showing offline. >>> I've >>> checked "netstat -an" and see each machine listening on ports 5405 and >>> 5405. And the logs are rather involved, but I'm not seeing errors in it. >>> >>> Any ideas for where to look for what's causing them to not communicate? 
>>> -- >>> Jay >>> ___ >>> Linux-HA mailing list >>> Linux-HA@lists.linux-ha.org >>> http://lists.linux-ha.org/mailman/listinfo/linux-ha >>> See also: http://linux-ha.org/ReportingProblems >>> >>> >> >> > ___ > Linux-HA mailing list > Linux-HA@lists.linux-ha.org > http://lists.linux-ha.org/mailman/listinfo/linux-ha > See also: http://linux-ha.org/ReportingProblems > -- Maciej Rostanski mrostan...@gmail.com http://mrdean.wordpress.com ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] New user can't get cman to recognize other systems
It looks sane on the surface. The 'gethostip' tool comes from the 'syslinux' package, and it's really handy! The '-d' says to give the IP in dotted-decimal notation only.

What I was trying to see was whether the 'uname -n' resolved to the IP on the same network card as the other nodes. This is how corosync decides which interface to send cluster traffic onto. I suspect you might have a general network issue, possibly related to multicast. (Some switches and some hypervisor virtual networks don't play nice with corosync.)

Have you tried unicast? If not, try setting the <cman> element to have the transport="udpu" attribute. Do note that unicast isn't as efficient as multicast, so though it might work, I'd personally treat it as a debug tool to isolate the source of the problem.

cheers

digimer

PS - Can you share your pacemaker configuration?

On 20/10/14 03:40 PM, John Scalia wrote:

Sure, and thanks for helping.

Here's the /etc/cluster/cluster.conf file and it is identical on all three systems:

uname -n reports "csgha1" on that system, "csgha2" on its system, and "csgha3" on the last system. I don't seem to have gethostip on any of these systems, so I don't know if the next section helps or not.

"ifconfig -a" reports:
csgha1: eth0 = 172.17.1.21
        eth1 = 10.10.1.128
csgha2: eth0 = 10.10.1.129
        (Yeah, I know this looks a little weird, but it was the way our automated VM control did the interfaces.)
        eth1 = 172.17.1.3
csgha3: eth0 = 172.17.1.23
        eth1 = 10.10.1.130

The /etc/hosts file on each system only has the 10.10.1.0/24 address for each system in it. iptables is not running on these systems.

Let me know if you need more information, and I very much appreciate your assistance.
--
Jay

On Mon, Oct 20, 2014 at 3:18 PM, Digimer wrote:

On 20/10/14 02:50 PM, John Scalia wrote:

Hi all,

I'm trying to build my first ever HA cluster and I'm using 3 VMs running CentOS 6.5. I followed the instructions to the letter at:

http://clusterlabs.org/quickstart-redhat.html

and everything appears to start normally, but if I run "cman_tool nodes -a", I only see:

Node  Sts   Inc   Joined               Name
   1   M     64   2014-10-20 14:00:00  csgha1
       Addresses: 10.10.1.128
   2   X      0                        csgha2
   3   X      0                        csgha3

In the other systems, the output is the same except for which system is shown as joined. Each shows just itself as belonging to the cluster. Also, "pcs status" reflects similarly with non-self systems showing offline. I've checked "netstat -an" and see each machine listening on ports 5404 and 5405. And the logs are rather involved, but I'm not seeing errors in them.

Any ideas for where to look for what's causing them to not communicate?
--
Jay

Can you share your cluster.conf file please? Also, for each node:

* uname -n
* gethostip -d $(uname -n)
* ifconfig |grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }'
* iptables-save | grep -i multi

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
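Since gethostip turned out not to be installed on these systems, the same question (which interface carries the address that 'uname -n' resolves to) can be sketched with tools that are normally present: `getent hosts` for the lookup and `ip -o -4 addr show` for the address list. The helper names iface_for_ip and cluster_iface are made up for illustration, and this is a rough equivalent of the pipeline above, not a replacement endorsed anywhere in the thread.

```shell
#!/bin/sh
# Hypothetical helper: read `ip -o -4 addr show` text on stdin and print the
# name of the interface that carries the IPv4 address given as $1.
iface_for_ip() {
    awk -v ip="$1" '$3 == "inet" {
             split($4, a, "/")           # $4 looks like 10.10.1.128/24
             if (a[1] == ip) { print $2; exit }
         }'
}

# Hypothetical wrapper: resolve this node's name through the normal resolver
# (so /etc/hosts is honoured) and report the matching interface.
cluster_iface() {
    ip_addr=$(getent hosts "$(uname -n)" | awk '{ print $1; exit }')
    [ -n "$ip_addr" ] || return 1
    ip -o -4 addr show | iface_for_ip "$ip_addr"
}
```

Calling `cluster_iface` on each node should print the interface on the shared 10.10.1.0/24 network. An empty result would suggest the node name resolves to an address that is not configured locally.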
Re: [Linux-HA] New user can't get cman to recognize other systems
Thanks, but on centOS are you saying to use "pcs cluster start" rather than using "service cman start" and "service pacemaker start"? I was just going by the tutorial, which doesn't mention this. On 10/20/2014 3:44 PM, Maciej Rostański wrote: Hello, In my experience such problems were the effect of my mistakes, such as not having all hosts in /etc/hosts file. Check this, please, I know it sounds simple. Also, commands: pcs cluster setup --name clustername node1 node2 node3 pcs cluster enable pcs cluster start are much more pleasant to run than ccs method you use, and they work on Centos6.5 Regards, Maciej 2014-10-20 20:50 GMT+02:00 John Scalia : Hi all, I'm trying to build my first ever HA cluster and I'm using 3 VMs running CentOS 6.5. I followed the instructions to the letter at: http://clusterlabs.org/quickstart-redhat.html and everything appears to start normally, but if I run "cman_tool nodes -a", I only see: Node StsInc Joined Name 1 M 64 2014-10--20 14:00:00 csgha1 Addresses: 10.10.1.128 2 X 0 csgha2 3 X 0 csgha3 In the other systems, the output is the same except for which system is shown as joined. Each shows just itself as belonging to the cluster. Also, "pcs status" reflects similarly with non-self systems showing offline. I've checked "netstat -an" and see each machine listening on ports 5405 and 5405. And the logs are rather involved, but I'm not seeing errors in it. Any ideas for where to look for what's causing them to not communicate? -- Jay ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] New user can't get cman to recognize other systems
Hello, In my experience such problems were the effect of my mistakes, such as not having all hosts in /etc/hosts file. Check this, please, I know it sounds simple. Also, commands: pcs cluster setup --name clustername node1 node2 node3 pcs cluster enable pcs cluster start are much more pleasant to run than ccs method you use, and they work on Centos6.5 Regards, Maciej 2014-10-20 20:50 GMT+02:00 John Scalia : > Hi all, > > I'm trying to build my first ever HA cluster and I'm using 3 VMs running > CentOS 6.5. I followed the instructions to the letter at: > > http://clusterlabs.org/quickstart-redhat.html > > and everything appears to start normally, but if I run "cman_tool nodes > -a", I only see: > > Node StsInc Joined Name > 1 M 64 2014-10--20 14:00:00 csgha1 > Addresses: 10.10.1.128 > 2 X 0 csgha2 > 3 X 0 csgha3 > > In the other systems, the output is the same except for which system is > shown as joined. Each shows just itself as belonging to the cluster. Also, > "pcs status" reflects similarly with non-self systems showing offline. I've > checked "netstat -an" and see each machine listening on ports 5405 and > 5405. And the logs are rather involved, but I'm not seeing errors in it. > > Any ideas for where to look for what's causing them to not communicate? > -- > Jay > ___ > Linux-HA mailing list > Linux-HA@lists.linux-ha.org > http://lists.linux-ha.org/mailman/listinfo/linux-ha > See also: http://linux-ha.org/ReportingProblems > -- Maciej Rostanski mrostan...@gmail.com http://mrdean.wordpress.com ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
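The /etc/hosts check suggested above is easy to script so it can be repeated on every node. A minimal sketch, assuming a standard hosts-format file; missing_hosts is a made-up helper name, and the node names in the usage line are the ones from this thread.

```shell
#!/bin/sh
# Hypothetical helper: missing_hosts FILE NAME...
# Prints every NAME that has no entry in the hosts-format FILE and
# returns non-zero if any name is missing.
missing_hosts() {
    hosts_file=$1
    shift
    rc=0
    for name in "$@"; do
        if ! awk -v n="$name" '
                 { sub(/#.*/, "") }      # ignore comments
                 { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                 END { exit !found }' "$hosts_file"; then
            echo "$name"
            rc=1
        fi
    done
    return $rc
}
```

For example, `missing_hosts /etc/hosts csgha1 csgha2 csgha3` prints nothing and returns 0 when all three names are present.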
Re: [Linux-HA] New user can't get cman to recognize other systems
Sure, and thanks for helping. Here's the /etc/cluster/cluster.conf file, and it is identical on all three systems:

uname -n reports "csgha1" on that system, "csgha2" on its system, and "csgha3" on the last system. I don't seem to have gethostip on any of these systems, so I don't know if the next section helps or not.

"ifconfig -a" reports:

csgha1: eth0 = 172.17.1.21
        eth1 = 10.10.1.128
csgha2: eth0 = 10.10.1.129
        eth1 = 172.17.1.3
        (Yeah, I know this looks a little weird, but it was the way our automated VM control did the interfaces.)
csgha3: eth0 = 172.17.1.23
        eth1 = 10.10.1.130

The /etc/hosts file on each system only has the 10.10.1.0/24 address for each system in it. iptables is not running on these systems.

Let me know if you need more information, and I very much appreciate your assistance.
--
Jay

On Mon, Oct 20, 2014 at 3:18 PM, Digimer wrote:
> On 20/10/14 02:50 PM, John Scalia wrote:
>
>> Hi all,
>>
>> I'm trying to build my first ever HA cluster and I'm using 3 VMs running
>> CentOS 6.5. I followed the instructions to the letter at:
>>
>> http://clusterlabs.org/quickstart-redhat.html
>>
>> and everything appears to start normally, but if I run "cman_tool nodes
>> -a", I only see:
>>
>> Node  Sts   Inc   Joined               Name
>>    1   M     64   2014-10-20 14:00:00  csgha1
>>        Addresses: 10.10.1.128
>>    2   X      0                        csgha2
>>    3   X      0                        csgha3
>>
>> On the other systems, the output is the same except for which system is
>> shown as joined. Each shows just itself as belonging to the cluster.
>> Also, "pcs status" reflects similarly with non-self systems showing
>> offline. I've checked "netstat -an" and see each machine listening on
>> ports 5404 and 5405. And the logs are rather involved, but I'm not
>> seeing errors in them.
>>
>> Any ideas for where to look for what's causing them to not communicate?
>> --
>> Jay
>>
>
> Can you share your cluster.conf file please? Also, for each node:
>
> * uname -n
> * gethostip -d $(uname -n)
> * ifconfig | grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }'
> * iptables-save | grep -i multi
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
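Where gethostip is not installed, as on the systems described above, a rough stand-in for `gethostip -d $(uname -n)` can be sketched with Python's standard `socket` module. This is only an illustration under the assumption that Python 3 is available on the nodes; the function names are hypothetical.

```python
import socket


def resolve_ipv4(name):
    """Rough stand-in for `gethostip -d NAME`: return the IPv4 address as text."""
    return socket.gethostbyname(name)


def local_node_address():
    """Resolve this machine's own host name, like `gethostip -d $(uname -n)`."""
    return resolve_ipv4(socket.gethostname())
```

With /etc/hosts as described in this message, `local_node_address()` on csgha1 would be expected to return 10.10.1.128. A result on the 172.17.1.x network, or a loopback address, would suggest the node name resolves to an interface the cluster is not meant to use.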
Re: [Linux-HA] New user can't get cman to recognize other systems
On 20/10/14 02:50 PM, John Scalia wrote:
> Hi all,
>
> I'm trying to build my first ever HA cluster and I'm using 3 VMs running
> CentOS 6.5. I followed the instructions to the letter at:
>
> http://clusterlabs.org/quickstart-redhat.html
>
> and everything appears to start normally, but if I run "cman_tool nodes
> -a", I only see:
>
> Node  Sts   Inc   Joined               Name
>    1   M     64   2014-10-20 14:00:00  csgha1
>        Addresses: 10.10.1.128
>    2   X      0                        csgha2
>    3   X      0                        csgha3
>
> On the other systems, the output is the same except for which system is
> shown as joined. Each shows just itself as belonging to the cluster.
> Also, "pcs status" reflects similarly with non-self systems showing
> offline. I've checked "netstat -an" and see each machine listening on
> ports 5404 and 5405. And the logs are rather involved, but I'm not
> seeing errors in them.
>
> Any ideas for where to look for what's causing them to not communicate?
> --
> Jay

Can you share your cluster.conf file please? Also, for each node:

* uname -n
* gethostip -d $(uname -n)
* ifconfig | grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }'
* iptables-save | grep -i multi

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
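To compare the "cman_tool nodes" output across the three nodes more easily, the text can be parsed with a few lines of Python. This is a sketch, assuming the column layout shown in the message above, with the node name as the last field, and assuming that "M" marks a joined member and "X" a node that is not a member. The function names are hypothetical.

```python
def parse_cman_nodes(output):
    """Parse `cman_tool nodes` text into (node_id, status, name) tuples."""
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the header, "Addresses:" lines, and anything else
        # that does not start with a numeric node ID.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        nodes.append((int(fields[0]), fields[1], fields[-1]))
    return nodes


def non_members(output):
    """Return the names of nodes whose status is not 'M'."""
    return [name for _, status, name in parse_cman_nodes(output) if status != "M"]


# Output as reported on csgha1 in this thread.
SAMPLE = """\
Node  Sts   Inc   Joined               Name
   1   M     64   2014-10-20 14:00:00  csgha1
       Addresses: 10.10.1.128
   2   X      0                        csgha2
   3   X      0                        csgha3
"""

print(non_members(SAMPLE))
```

For the sample above this lists csgha2 and csgha3, matching the symptom described: each node sees only itself as a member.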