Re: [ClusterLabs] Odd clvmd error - clvmd: Unable to create DLM lockspace for CLVM: Address already in use
On 25/09/15 00:09, Digimer wrote:
> I had a RHEL 6.7, cman + rgmanager cluster that I've built many times
> before. Oddly, I just hit this error:
>
> [root@node2 ~]# /etc/init.d/clvmd start
> Starting clvmd: clvmd could not connect to cluster manager
> Consult syslog for more information
>
> syslog:
>
> Sep 24 23:00:30 node2 kernel: dlm: Using SCTP for communications
> Sep 24 23:00:30 node2 clvmd: Unable to create DLM lockspace for CLVM:
> Address already in use
> Sep 24 23:00:30 node2 kernel: dlm: Can't bind to port 21064 addr number 1

This seems to be the key to it. I can't imagine what else would be using
port 21064 (apart from DLM using TCP as well as SCTP, but I don't think
that's possible!)

Have a look in netstat and see what else is using that port.

It could be that the socket was in use and is taking a while to shut
down, so it might go away on its own too.
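Something like the following should show it, if anything is there (a rough
sketch; 21064 is the default DLM port, plain netstat generally won't list
SCTP sockets, and the /proc/net/sctp files only exist while the sctp module
is loaded):

    netstat -anp | grep 21064     # TCP/UDP sockets on the DLM port, with owning PID
    lsof -i :21064                # the same check via lsof
    cat /proc/net/sctp/eps        # SCTP listening endpoints
    cat /proc/net/sctp/assocs     # established SCTP associations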
Chrissie

> Sep 24 23:00:30 node2 kernel: dlm: cannot start dlm lowcomms -98
>
> There are no iptables rules:
>
> [root@node2 ~]# iptables-save
>
> And there are no DLM lockspaces, either:
>
> [root@node2 ~]# dlm_tool ls
> [root@node2 ~]#
>
> I tried withdrawing the node from the cluster entirely, then started cman
> alone and tried to start clvmd, same issue.
>
> Pinging between the two nodes seems OK:
>
> [root@node1 ~]# uname -n
> node1.ccrs.bcn
> [root@node1 ~]# ping -c 2 node1.ccrs.bcn
> PING node1.bcn (10.20.10.1) 56(84) bytes of data.
> 64 bytes from node1.bcn (10.20.10.1): icmp_seq=1 ttl=64 time=0.015 ms
> 64 bytes from node1.bcn (10.20.10.1): icmp_seq=2 ttl=64 time=0.017 ms
>
> --- node1.bcn ping statistics ---
> 2 packets transmitted, 2 received, 0% packet loss, time 1000ms
> rtt min/avg/max/mdev = 0.015/0.016/0.017/0.001 ms
>
> [root@node2 ~]# uname -n
> node2.ccrs.bcn
> [root@node2 ~]# ping -c 2 node1.ccrs.bcn
> PING node1.bcn (10.20.10.1) 56(84) bytes of data.
> 64 bytes from node1.bcn (10.20.10.1): icmp_seq=1 ttl=64 time=0.079 ms
> 64 bytes from node1.bcn (10.20.10.1): icmp_seq=2 ttl=64 time=0.076 ms
>
> --- node1.bcn ping statistics ---
> 2 packets transmitted, 2 received, 0% packet loss, time 999ms
> rtt min/avg/max/mdev = 0.076/0.077/0.079/0.008 ms
>
> I have RRP configured and pings work on the second network, too:
>
> [root@node1 ~]# corosync-objctl |grep ring -A 5
> totem.interface.ringnumber=0
> totem.interface.bindnetaddr=10.20.10.1
> totem.interface.mcastaddr=239.192.100.163
> totem.interface.mcastport=5405
> totem.interface.member.memberaddr=node1.ccrs.bcn
> totem.interface.member.memberaddr=node2.ccrs.bcn
> totem.interface.ringnumber=1
> totem.interface.bindnetaddr=10.10.10.1
> totem.interface.mcastaddr=239.192.100.164
> totem.interface.mcastport=5405
> totem.interface.member.memberaddr=node1.sn
> totem.interface.member.memberaddr=node2.sn
>
> [root@node1 ~]# ping -c 2 node2.sn
> PING node2.sn (10.10.10.2) 56(84) bytes of data.
> 64 bytes from node2.sn (10.10.10.2): icmp_seq=1 ttl=64 time=0.111 ms
> 64 bytes from node2.sn (10.10.10.2): icmp_seq=2 ttl=64 time=0.120 ms
>
> --- node2.sn ping statistics ---
> 2 packets transmitted, 2 received, 0% packet loss, time 999ms
> rtt min/avg/max/mdev = 0.111/0.115/0.120/0.011 ms
>
> [root@node2 ~]# ping -c 2 node1.sn
> PING node1.sn (10.10.10.1) 56(84) bytes of data.
> 64 bytes from node1.sn (10.10.10.1): icmp_seq=1 ttl=64 time=0.079 ms
> 64 bytes from node1.sn (10.10.10.1): icmp_seq=2 ttl=64 time=0.171 ms
>
> --- node1.sn ping statistics ---
> 2 packets transmitted, 2 received, 0% packet loss, time 1000ms
> rtt min/avg/max/mdev = 0.079/0.125/0.171/0.046 ms
>
> Here is the cluster.conf:
>
> [root@node1 ~]# cat /etc/cluster/cluster.conf
> [The cluster.conf XML was mangled by the list archive (the tags were
> stripped). The surviving fragments show IPMI fence devices at
> 10.250.199.15 (with delay="15") and 10.250.199.17, login "admin", plus
> PDU fence devices pdu1A, pdu1B, pdu2A and pdu2B, all with action="reboot".]
Re: [ClusterLabs] Odd clvmd error - clvmd: Unable to create DLM lockspace for CLVM: Address already in use
On 25/09/15 03:44 AM, Christine Caulfield wrote:
> On 25/09/15 00:09, Digimer wrote:
>> I had a RHEL 6.7, cman + rgmanager cluster that I've built many times
>> before. Oddly, I just hit this error:
>>
>> [root@node2 ~]# /etc/init.d/clvmd start
>> Starting clvmd: clvmd could not connect to cluster manager
>> Consult syslog for more information
>>
>> syslog:
>>
>> Sep 24 23:00:30 node2 kernel: dlm: Using SCTP for communications
>> Sep 24 23:00:30 node2 clvmd: Unable to create DLM lockspace for CLVM:
>> Address already in use
>> Sep 24 23:00:30 node2 kernel: dlm: Can't bind to port 21064 addr number 1
>
> This seems to be the key to it. I can't imagine what else would be using
> port 21064 (apart from DLM using TCP as well as SCTP, but I don't think
> that's possible!)
>
> Have a look in netstat and see what else is using that port.
>
> It could be that the socket was in use and is taking a while to shut
> down, so it might go away on its own too.
>
> Chrissie

netstat and lsof showed nothing using it.

Looking at the logs of our installer, it had manually started drbd + clvmd +
gfs2 just fine. Then it asked rgmanager to start, which tried to tear down
the already-running storage services before (re)starting them. It looks like
the UpToDate DRBD node was told to stop before the Inconsistent node, and
that stop failed because the Inconsistent node still needed its UpToDate peer
(gfs2 was still mounted, so clvmd held the drbd device open). I'm not clear
on what happened next, but things went sideways.

So I manually stopped everything, including clvmd and cman, and it still
threw that error. Eventually I rebooted both nodes and it went back to
working. Odd.
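For reference, the per-node teardown order I was aiming for is simply the
reverse of the start order; roughly the following, assuming the stock RHEL 6
init scripts (and, per the above, the Inconsistent DRBD node should be torn
down before its UpToDate peer while a resync is still running):

    /etc/init.d/rgmanager stop   # stop the managed services first
    /etc/init.d/gfs2 stop        # unmount the gfs2 filesystem(s)
    /etc/init.d/clvmd stop       # release the clustered LVs and the CLVM DLM lockspace
    /etc/init.d/drbd stop        # nothing should hold the drbd device open any more
    /etc/init.d/cman stop        # finally leave the cluster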
>> Sep 24 23:00:30 node2 kernel: dlm: cannot start dlm lowcomms -98
>>
>> There are no iptables rules:
>>
>> [root@node2 ~]# iptables-save
>>
>> And there are no DLM lockspaces, either:
>>
>> [root@node2 ~]# dlm_tool ls
>> [root@node2 ~]#
>>
>> I tried withdrawing the node from the cluster entirely, then started cman
>> alone and tried to start clvmd, same issue.
>>
>> Pinging between the two nodes seems OK:
>>
>> [root@node1 ~]# uname -n
>> node1.ccrs.bcn
>> [root@node1 ~]# ping -c 2 node1.ccrs.bcn
>> PING node1.bcn (10.20.10.1) 56(84) bytes of data.
>> 64 bytes from node1.bcn (10.20.10.1): icmp_seq=1 ttl=64 time=0.015 ms
>> 64 bytes from node1.bcn (10.20.10.1): icmp_seq=2 ttl=64 time=0.017 ms
>>
>> --- node1.bcn ping statistics ---
>> 2 packets transmitted, 2 received, 0% packet loss, time 1000ms
>> rtt min/avg/max/mdev = 0.015/0.016/0.017/0.001 ms
>>
>> [root@node2 ~]# uname -n
>> node2.ccrs.bcn
>> [root@node2 ~]# ping -c 2 node1.ccrs.bcn
>> PING node1.bcn (10.20.10.1) 56(84) bytes of data.
>> 64 bytes from node1.bcn (10.20.10.1): icmp_seq=1 ttl=64 time=0.079 ms
>> 64 bytes from node1.bcn (10.20.10.1): icmp_seq=2 ttl=64 time=0.076 ms
>>
>> --- node1.bcn ping statistics ---
>> 2 packets transmitted, 2 received, 0% packet loss, time 999ms
>> rtt min/avg/max/mdev = 0.076/0.077/0.079/0.008 ms
>>
>> I have RRP configured and pings work on the second network, too:
>>
>> [root@node1 ~]# corosync-objctl |grep ring -A 5
>> totem.interface.ringnumber=0
>> totem.interface.bindnetaddr=10.20.10.1
>> totem.interface.mcastaddr=239.192.100.163
>> totem.interface.mcastport=5405
>> totem.interface.member.memberaddr=node1.ccrs.bcn
>> totem.interface.member.memberaddr=node2.ccrs.bcn
>> totem.interface.ringnumber=1
>> totem.interface.bindnetaddr=10.10.10.1
>> totem.interface.mcastaddr=239.192.100.164
>> totem.interface.mcastport=5405
>> totem.interface.member.memberaddr=node1.sn
>> totem.interface.member.memberaddr=node2.sn
>>
>> [root@node1 ~]# ping -c 2 node2.sn
>> PING node2.sn (10.10.10.2) 56(84) bytes of data.
>> 64 bytes from node2.sn (10.10.10.2): icmp_seq=1 ttl=64 time=0.111 ms
>> 64 bytes from node2.sn (10.10.10.2): icmp_seq=2 ttl=64 time=0.120 ms
>>
>> --- node2.sn ping statistics ---
>> 2 packets transmitted, 2 received, 0% packet loss, time 999ms
>> rtt min/avg/max/mdev = 0.111/0.115/0.120/0.011 ms
>>
>> [root@node2 ~]# ping -c 2 node1.sn
>> PING node1.sn (10.10.10.1) 56(84) bytes of data.
>> 64 bytes from node1.sn (10.10.10.1): icmp_seq=1 ttl=64 time=0.079 ms
>> 64 bytes from node1.sn (10.10.10.1): icmp_seq=2 ttl=64 time=0.171 ms
>>
>> --- node1.sn ping statistics ---
>> 2 packets transmitted, 2 received, 0% packet loss, time 1000ms
>> rtt min/avg/max/mdev = 0.079/0.125/0.171/0.046 ms
>>
>> Here is the cluster.conf:
>>
>> [root@node1 ~]# cat /etc/cluster/cluster.conf
>> [The quoted cluster.conf XML was mangled by the list archive and the
>> quote is cut off here.]
[ClusterLabs] Odd clvmd error - clvmd: Unable to create DLM lockspace for CLVM: Address already in use
I had a RHEL 6.7, cman + rgmanager cluster that I've built many times before.
Oddly, I just hit this error:

[root@node2 ~]# /etc/init.d/clvmd start
Starting clvmd: clvmd could not connect to cluster manager
Consult syslog for more information

syslog:

Sep 24 23:00:30 node2 kernel: dlm: Using SCTP for communications
Sep 24 23:00:30 node2 clvmd: Unable to create DLM lockspace for CLVM: Address already in use
Sep 24 23:00:30 node2 kernel: dlm: Can't bind to port 21064 addr number 1
Sep 24 23:00:30 node2 kernel: dlm: cannot start dlm lowcomms -98

There are no iptables rules:

[root@node2 ~]# iptables-save

And there are no DLM lockspaces, either:

[root@node2 ~]# dlm_tool ls
[root@node2 ~]#

I tried withdrawing the node from the cluster entirely, then started cman
alone and tried to start clvmd; same issue.

Pinging between the two nodes seems OK:

[root@node1 ~]# uname -n
node1.ccrs.bcn
[root@node1 ~]# ping -c 2 node1.ccrs.bcn
PING node1.bcn (10.20.10.1) 56(84) bytes of data.
64 bytes from node1.bcn (10.20.10.1): icmp_seq=1 ttl=64 time=0.015 ms
64 bytes from node1.bcn (10.20.10.1): icmp_seq=2 ttl=64 time=0.017 ms

--- node1.bcn ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.015/0.016/0.017/0.001 ms

[root@node2 ~]# uname -n
node2.ccrs.bcn
[root@node2 ~]# ping -c 2 node1.ccrs.bcn
PING node1.bcn (10.20.10.1) 56(84) bytes of data.
64 bytes from node1.bcn (10.20.10.1): icmp_seq=1 ttl=64 time=0.079 ms
64 bytes from node1.bcn (10.20.10.1): icmp_seq=2 ttl=64 time=0.076 ms

--- node1.bcn ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.076/0.077/0.079/0.008 ms

I have RRP configured and pings work on the second network, too:

[root@node1 ~]# corosync-objctl |grep ring -A 5
totem.interface.ringnumber=0
totem.interface.bindnetaddr=10.20.10.1
totem.interface.mcastaddr=239.192.100.163
totem.interface.mcastport=5405
totem.interface.member.memberaddr=node1.ccrs.bcn
totem.interface.member.memberaddr=node2.ccrs.bcn
totem.interface.ringnumber=1
totem.interface.bindnetaddr=10.10.10.1
totem.interface.mcastaddr=239.192.100.164
totem.interface.mcastport=5405
totem.interface.member.memberaddr=node1.sn
totem.interface.member.memberaddr=node2.sn

[root@node1 ~]# ping -c 2 node2.sn
PING node2.sn (10.10.10.2) 56(84) bytes of data.
64 bytes from node2.sn (10.10.10.2): icmp_seq=1 ttl=64 time=0.111 ms
64 bytes from node2.sn (10.10.10.2): icmp_seq=2 ttl=64 time=0.120 ms

--- node2.sn ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.111/0.115/0.120/0.011 ms

[root@node2 ~]# ping -c 2 node1.sn
PING node1.sn (10.10.10.1) 56(84) bytes of data.
64 bytes from node1.sn (10.10.10.1): icmp_seq=1 ttl=64 time=0.079 ms
64 bytes from node1.sn (10.10.10.1): icmp_seq=2 ttl=64 time=0.171 ms

--- node1.sn ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.079/0.125/0.171/0.046 ms

Here is the cluster.conf:

[root@node1 ~]# cat /etc/cluster/cluster.conf
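The XML itself did not survive the list archive. For orientation only, a
minimal two-node cman cluster.conf with IPMI plus switched-PDU fencing looks
roughly like the sketch below; every name, address and credential in it is a
placeholder, not a value from this cluster:

    <?xml version="1.0"?>
    <!-- Generic illustrative skeleton; all names, addresses and credentials are placeholders. -->
    <cluster name="example" config_version="1">
      <cman expected_votes="1" two_node="1"/>
      <clusternodes>
        <clusternode name="node1.example" nodeid="1">
          <fence>
            <method name="ipmi">
              <device name="ipmi_n1" action="reboot" delay="15"/>
            </method>
            <method name="pdu">
              <device name="pdu1" port="1" action="reboot"/>
              <device name="pdu2" port="1" action="reboot"/>
            </method>
          </fence>
        </clusternode>
        <clusternode name="node2.example" nodeid="2">
          <fence>
            <method name="ipmi">
              <device name="ipmi_n2" action="reboot"/>
            </method>
            <method name="pdu">
              <device name="pdu1" port="2" action="reboot"/>
              <device name="pdu2" port="2" action="reboot"/>
            </method>
          </fence>
        </clusternode>
      </clusternodes>
      <fencedevices>
        <fencedevice name="ipmi_n1" agent="fence_ipmilan" ipaddr="192.0.2.11" login="admin" passwd="secret"/>
        <fencedevice name="ipmi_n2" agent="fence_ipmilan" ipaddr="192.0.2.12" login="admin" passwd="secret"/>
        <fencedevice name="pdu1" agent="fence_apc_snmp" ipaddr="192.0.2.21"/>
        <fencedevice name="pdu2" agent="fence_apc_snmp" ipaddr="192.0.2.22"/>
      </fencedevices>
      <fence_daemon post_join_delay="30"/>
    </cluster>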