Re: [ClusterLabs] Odd clvmd error - clvmd: Unable to create DLM lockspace for CLVM: Address already in use

2015-09-25 Thread Christine Caulfield
On 25/09/15 00:09, Digimer wrote:
> I have a RHEL 6.7 cman + rgmanager cluster of a type I've built many times
> before. Oddly, I just hit this error:
> 
> 
> [root@node2 ~]# /etc/init.d/clvmd start
> Starting clvmd: clvmd could not connect to cluster manager
> Consult syslog for more information
> 
> 
> syslog:
> 
> Sep 24 23:00:30 node2 kernel: dlm: Using SCTP for communications
> Sep 24 23:00:30 node2 clvmd: Unable to create DLM lockspace for CLVM:
> Address already in use
> Sep 24 23:00:30 node2 kernel: dlm: Can't bind to port 21064 addr number 1

This seems to be the key to it. I can't imagine what else would be using
port 21064 (apart from DLM using TCP as well as SCTP, but I don't think
that's possible!)

Have a look in netstat and see what else is using that port.
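
For example, something along these lines should turn up any other holder of
the port (the 21064 is taken from the log above; note that plain netstat may
not list SCTP sockets, which is what the /proc/net/sctp files are for, and
those only exist once the sctp module is loaded):

  netstat -anp | grep 21064      # TCP/UDP sockets bound to the DLM port
  lsof -i :21064                 # any process with that port open
  cat /proc/net/sctp/eps         # SCTP endpoints the kernel currently has
  cat /proc/net/sctp/assocs      # established SCTP associations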

It could be that the socket was in use and is taking a while to shut
down so it might go away on its own too.

Chrissie



> Sep 24 23:00:30 node2 kernel: dlm: cannot start dlm lowcomms -98
> 
> 
> There are no iptables rules:
> 
> 
> [root@node2 ~]# iptables-save
> 
> 
> And there are no DLM lockspaces, either:
> 
> 
> [root@node2 ~]# dlm_tool ls
> [root@node2 ~]#
> 
> 
> I tried withdrawing the node from the cluster entirely, then started cman
> alone and tried to start clvmd; same issue.
> 
> Pinging between the two nodes seems OK:
> 
> 
> [root@node1 ~]# uname -n
> node1.ccrs.bcn
> [root@node1 ~]# ping -c 2 node1.ccrs.bcn
> PING node1.bcn (10.20.10.1) 56(84) bytes of data.
> 64 bytes from node1.bcn (10.20.10.1): icmp_seq=1 ttl=64 time=0.015 ms
> 64 bytes from node1.bcn (10.20.10.1): icmp_seq=2 ttl=64 time=0.017 ms
> 
> --- node1.bcn ping statistics ---
> 2 packets transmitted, 2 received, 0% packet loss, time 1000ms
> rtt min/avg/max/mdev = 0.015/0.016/0.017/0.001 ms
> 
> [root@node2 ~]# uname -n
> node2.ccrs.bcn
> [root@node2 ~]# ping -c 2 node1.ccrs.bcn
> PING node1.bcn (10.20.10.1) 56(84) bytes of data.
> 64 bytes from node1.bcn (10.20.10.1): icmp_seq=1 ttl=64 time=0.079 ms
> 64 bytes from node1.bcn (10.20.10.1): icmp_seq=2 ttl=64 time=0.076 ms
> 
> --- node1.bcn ping statistics ---
> 2 packets transmitted, 2 received, 0% packet loss, time 999ms
> rtt min/avg/max/mdev = 0.076/0.077/0.079/0.008 ms
> 
> 
> I have RRP configured and pings work on the second network, too:
> 
> 
> [root@node1 ~]# corosync-objctl |grep ring -A 5
> totem.interface.ringnumber=0
> totem.interface.bindnetaddr=10.20.10.1
> totem.interface.mcastaddr=239.192.100.163
> totem.interface.mcastport=5405
> totem.interface.member.memberaddr=node1.ccrs.bcn
> totem.interface.member.memberaddr=node2.ccrs.bcn
> totem.interface.ringnumber=1
> totem.interface.bindnetaddr=10.10.10.1
> totem.interface.mcastaddr=239.192.100.164
> totem.interface.mcastport=5405
> totem.interface.member.memberaddr=node1.sn
> totem.interface.member.memberaddr=node2.sn
> 
> [root@node1 ~]# ping -c 2 node2.sn
> PING node2.sn (10.10.10.2) 56(84) bytes of data.
> 64 bytes from node2.sn (10.10.10.2): icmp_seq=1 ttl=64 time=0.111 ms
> 64 bytes from node2.sn (10.10.10.2): icmp_seq=2 ttl=64 time=0.120 ms
> 
> --- node2.sn ping statistics ---
> 2 packets transmitted, 2 received, 0% packet loss, time 999ms
> rtt min/avg/max/mdev = 0.111/0.115/0.120/0.011 ms
> 
> [root@node2 ~]# ping -c 2 node1.sn
> PING node1.sn (10.10.10.1) 56(84) bytes of data.
> 64 bytes from node1.sn (10.10.10.1): icmp_seq=1 ttl=64 time=0.079 ms
> 64 bytes from node1.sn (10.10.10.1): icmp_seq=2 ttl=64 time=0.171 ms
> 
> --- node1.sn ping statistics ---
> 2 packets transmitted, 2 received, 0% packet loss, time 1000ms
> rtt min/avg/max/mdev = 0.079/0.125/0.171/0.046 ms
> 
> 
> Here is the cluster.conf:
> 
> 
> [root@node1 ~]# cat /etc/cluster/cluster.conf
> 
> [The cluster.conf XML did not survive the archive's HTML rendering; only
> attribute fragments remain. Recoverable pieces: fence device entries at
> ipaddr="10.250.199.15" (login="admin" passwd="secret" delay="15"
> action="reboot") and ipaddr="10.250.199.17" (login="admin" passwd="secret"
> action="reboot"), further method entries with action="reboot", and PDU
> fence entries at ipaddr="pdu1A", "pdu1B", "pdu2A" and "pdu2B".]

Re: [ClusterLabs] Odd clvmd error - clvmd: Unable to create DLM lockspace for CLVM: Address already in use

2015-09-25 Thread Digimer
On 25/09/15 03:44 AM, Christine Caulfield wrote:
> On 25/09/15 00:09, Digimer wrote:
>> I have a RHEL 6.7 cman + rgmanager cluster of a type I've built many times
>> before. Oddly, I just hit this error:
>>
>> 
>> [root@node2 ~]# /etc/init.d/clvmd start
>> Starting clvmd: clvmd could not connect to cluster manager
>> Consult syslog for more information
>> 
>>
>> syslog:
>> 
>> Sep 24 23:00:30 node2 kernel: dlm: Using SCTP for communications
>> Sep 24 23:00:30 node2 clvmd: Unable to create DLM lockspace for CLVM:
>> Address already in use
>> Sep 24 23:00:30 node2 kernel: dlm: Can't bind to port 21064 addr number 1
> 
> This seems to be the key to it. I can't imagine what else would be using
> port 21064 (apart from DLM using TCP as well as SCTP, but I don't think
> that's possible!)
> 
> Have a look in netstat and see what else is using that port.
> 
> It could be that the socket was in use and is taking a while to shut
> down so it might go away on its own too.
> 
> Chrissie

netstat and lsof showed nothing using it. Looking at the logs (of our
installer), it had manually started drbd + clvmd + gfs2 just fine. Then it
asked rgmanager to start, which tried to tear down the already-running
storage services before (re)starting them. It looks like the (drbd) UpToDate
node was told to stop before the Inconsistent node, and that stop failed
because the Inconsistent node still needed the UpToDate one (gfs2 was still
mounted, so clvmd held the drbd device open). I'm not clear on what happened
next, but things went sideways.
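
For anyone retracing this, a quick way to see that dependency is to check the
DRBD disk states and what is still sitting on top of the device. The resource
name "r0", the mount point "/shared" and the device "drbd0" below are just
placeholders, not taken from this cluster:

  cat /proc/drbd                    # Primary/Secondary roles, UpToDate/Inconsistent disk states
  drbdadm dstate r0                 # local/peer disk state of one resource
  fuser -vm /shared                 # what still has the gfs2 mount busy
  ls /sys/block/drbd0/holders/      # kernel devices (e.g. clvmd-managed LVs) stacked on drbd0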

So I manually stopped everything, including clvmd/cman, and it still
threw that error. Eventually I rebooted both nodes and it went back to
working.

Odd.
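
A hedged guess, since lowcomms lives in the dlm kernel module: once everything
using it really was stopped, unloading the module (and sctp) might have cleared
the stuck binding without a full reboot, e.g.

  modprobe -r dlm     # only works once nothing is using the module
  modprobe -r sctp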


[ClusterLabs] Odd clvmd error - clvmd: Unable to create DLM lockspace for CLVM: Address already in use

2015-09-24 Thread Digimer
I have a RHEL 6.7 cman + rgmanager cluster of a type I've built many times
before. Oddly, I just hit this error:


[root@node2 ~]# /etc/init.d/clvmd start
Starting clvmd: clvmd could not connect to cluster manager
Consult syslog for more information


syslog:

Sep 24 23:00:30 node2 kernel: dlm: Using SCTP for communications
Sep 24 23:00:30 node2 clvmd: Unable to create DLM lockspace for CLVM:
Address already in use
Sep 24 23:00:30 node2 kernel: dlm: Can't bind to port 21064 addr number 1
Sep 24 23:00:30 node2 kernel: dlm: cannot start dlm lowcomms -98
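
As a side note, the -98 in that last line is the kernel returning -EADDRINUSE,
i.e. the same "Address already in use" that clvmd reported; a quick way to
translate such errno values:

  python -c 'import errno, os; print(errno.errorcode[98], os.strerror(98))'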


There are no iptables rules:


[root@node2 ~]# iptables-save


And there are no DLM lockspaces, either:


[root@node2 ~]# dlm_tool ls
[root@node2 ~]#
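
One more place worth a look when the tools say there are no lockspaces is the
DLM configfs tree that dlm_controld maintains (paths below assume configfs is
mounted at the usual /sys/kernel/config):

  ls /sys/kernel/config/dlm/cluster/spaces/   # lockspaces the kernel still knows about
  ls /sys/kernel/config/dlm/cluster/comms/    # per-node comms (address) entries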


I tried withdrawing the node from the cluster entirely, then started cman
alone and tried to start clvmd; same issue.

Pinging between the two nodes seems OK:


[root@node1 ~]# uname -n
node1.ccrs.bcn
[root@node1 ~]# ping -c 2 node1.ccrs.bcn
PING node1.bcn (10.20.10.1) 56(84) bytes of data.
64 bytes from node1.bcn (10.20.10.1): icmp_seq=1 ttl=64 time=0.015 ms
64 bytes from node1.bcn (10.20.10.1): icmp_seq=2 ttl=64 time=0.017 ms

--- node1.bcn ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.015/0.016/0.017/0.001 ms

[root@node2 ~]# uname -n
node2.ccrs.bcn
[root@node2 ~]# ping -c 2 node1.ccrs.bcn
PING node1.bcn (10.20.10.1) 56(84) bytes of data.
64 bytes from node1.bcn (10.20.10.1): icmp_seq=1 ttl=64 time=0.079 ms
64 bytes from node1.bcn (10.20.10.1): icmp_seq=2 ttl=64 time=0.076 ms

--- node1.bcn ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.076/0.077/0.079/0.008 ms


I have RRP configured and pings work on the second network, too:


[root@node1 ~]# corosync-objctl |grep ring -A 5
totem.interface.ringnumber=0
totem.interface.bindnetaddr=10.20.10.1
totem.interface.mcastaddr=239.192.100.163
totem.interface.mcastport=5405
totem.interface.member.memberaddr=node1.ccrs.bcn
totem.interface.member.memberaddr=node2.ccrs.bcn
totem.interface.ringnumber=1
totem.interface.bindnetaddr=10.10.10.1
totem.interface.mcastaddr=239.192.100.164
totem.interface.mcastport=5405
totem.interface.member.memberaddr=node1.sn
totem.interface.member.memberaddr=node2.sn

[root@node1 ~]# ping -c 2 node2.sn
PING node2.sn (10.10.10.2) 56(84) bytes of data.
64 bytes from node2.sn (10.10.10.2): icmp_seq=1 ttl=64 time=0.111 ms
64 bytes from node2.sn (10.10.10.2): icmp_seq=2 ttl=64 time=0.120 ms

--- node2.sn ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.111/0.115/0.120/0.011 ms

[root@node2 ~]# ping -c 2 node1.sn
PING node1.sn (10.10.10.1) 56(84) bytes of data.
64 bytes from node1.sn (10.10.10.1): icmp_seq=1 ttl=64 time=0.079 ms
64 bytes from node1.sn (10.10.10.1): icmp_seq=2 ttl=64 time=0.171 ms

--- node1.sn ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.079/0.125/0.171/0.046 ms
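
Ring health can also be checked directly rather than inferred from ping; both
of these ship with the RHEL 6 cman/corosync 1.x stack:

  corosync-cfgtool -s    # status of each configured ring on this node
  cman_tool status       # cman membership / quorum summary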


Here is the cluster.conf:


[root@node1 ~]# cat /etc/cluster/cluster.conf