Hi,
I have a 2-node cluster set up and am trying to get GFS2 working on top of an iSCSI
volume. Each node is a Xen virtual machine.
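The cluster configuration itself is minimal: aside from names, /etc/cluster/cluster.conf
is the stock two-node setup with manual fencing, roughly along these lines (paraphrased
rather than pasted verbatim; the cluster name is just a placeholder):
<?xml version="1.0"?>
<cluster name="testcluster" config_version="1">
  <!-- placeholder cluster name; two_node/expected_votes let a 2-node cluster stay quorate -->
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="172.16.50.32" nodeid="1">
      <fence><method name="1"><device name="manual" nodename="172.16.50.32"/></method></fence>
    </clusternode>
    <clusternode name="172.16.50.33" nodeid="2">
      <fence><method name="1"><device name="manual" nodename="172.16.50.33"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_manual" name="manual"/>
  </fencedevices>
</cluster>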
I am currently unable to get clvmd working on the 2nd node. It starts fine on
the 1st node:
[root@vm1 ~]# service clvmd start
Starting clvmd: [ OK ]
Activating VGs: Logging initialised at Wed Mar 2 15:25:07 2011
Set umask to 0077
Finding all volume groups
Finding volume group "PcbiHomesVG"
Activated 1 logical volumes in volume group PcbiHomesVG
1 logical volume(s) in volume group "PcbiHomesVG" now active
Finding volume group "VolGroup00"
2 logical volume(s) in volume group "VolGroup00" already active
2 existing logical volume(s) in volume group "VolGroup00" monitored
Activated 2 logical volumes in volume group VolGroup00
2 logical volume(s) in volume group "VolGroup00" now active
Wiping internal VG cache
[root@vm1 ~]# vgs
Logging initialised at Wed Mar 2 15:25:12 2011
Set umask to 0077
Finding all volume groups
Finding volume group "PcbiHomesVG"
Finding volume group "VolGroup00"
  VG          #PV #LV #SN Attr   VSize VFree
  PcbiHomesVG   1   1   0 wz--nc 1.17T    0
  VolGroup00    1   2   0 wz--n- 4.66G    0
Wiping internal VG cache
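For what it's worth, the trailing "c" in the wz--nc attrs above means PcbiHomesVG is
marked as a clustered VG, and I believe clustered locking (locking_type = 3) is enabled
in lvm.conf on both nodes (set with lvmconf --enable-cluster). I can re-check and post
the output if useful:
[root@vm1 ~]# grep -E '^[[:space:]]*locking_type' /etc/lvm/lvm.conf
[root@vm2 ~]# grep -E '^[[:space:]]*locking_type' /etc/lvm/lvm.conf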
But when I try to start clvmd on the 2nd node, it hangs:
[root@vm2 ~]# service clvmd start
Starting clvmd: [ OK ]
...hangs...
I see the following in vm2:/var/log/messages:
Mar 2 15:59:02 vm2 clvmd[2283]: Cluster LVM daemon started - connected to CMAN
Mar 2 16:01:36 vm2 kernel: INFO: task clvmd:2302 blocked for more than 120 seconds.
Mar 2 16:01:36 vm2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 2 16:01:36 vm2 kernel: clvmd D 0022a86125f49a6a 0 2302 1 2299 (NOTLB)
Mar 2 16:01:36 vm2 kernel: ffff880030cb7db8 0000000000000282 0000000000000000 0000000000000000
Mar 2 16:01:36 vm2 kernel: 0000000000000008 ffff880033e327e0 ffff880000033080 000000000001c2b2
Mar 2 16:01:36 vm2 kernel: ffff880033e329c8 ffffffff8029c48f
Mar 2 16:01:36 vm2 kernel: Call Trace:
Mar 2 16:01:36 vm2 kernel: [<ffffffff8029c48f>] autoremove_wake_function+0x0/0x2e
Mar 2 16:01:36 vm2 kernel: [<ffffffff802644cb>] __down_read+0x82/0x9a
Mar 2 16:01:36 vm2 kernel: [<ffffffff884f646d>] :dlm:dlm_user_request+0x2d/0x174
Mar 2 16:01:36 vm2 kernel: [<ffffffff8022d08d>] mntput_no_expire+0x19/0x89
Mar 2 16:01:36 vm2 kernel: [<ffffffff8041716d>] sys_sendto+0x14a/0x164
Mar 2 16:01:36 vm2 kernel: [<ffffffff884fd61f>] :dlm:device_write+0x2f5/0x5e5
Mar 2 16:01:36 vm2 kernel: [<ffffffff80217379>] vfs_write+0xce/0x174
Mar 2 16:01:36 vm2 kernel: [<ffffffff80217bb1>] sys_write+0x45/0x6e
Mar 2 16:01:36 vm2 kernel: [<ffffffff802602f9>] tracesys+0xab/0xb6
[...]
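The trace makes it look like clvmd's write to the DLM device never completes. If it
would help, I can post the fence/DLM group state from both nodes, e.g.:
[root@vm1 ~]# cman_tool services
[root@vm1 ~]# group_tool ls
[root@vm2 ~]# cman_tool services
[root@vm2 ~]# group_tool ls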
I also noticed that the clvmd init script is stuck waiting on a "vgscan" process that
never finishes:
1 1655 1655 1655 ? -1 Ss 0 0:00 /usr/sbin/sshd
1655 1801 1801 1801 ? -1 Ss 0 0:00 \_ sshd: root@pts/0
1801 1803 1803 1803 pts/0 2187 Ss 0 0:00 | \_ -bash
1803 2187 2187 1803 pts/0 2187 S+ 0 0:00 | \_ /bin/sh /sbin/service clvmd start
2187 2192 2187 1803 pts/0 2187 S+ 0 0:00 | \_ /bin/bash /etc/init.d/clvmd start
2192 2215 2187 1803 pts/0 2187 S+ 0 0:00 | \_ /usr/sbin/vgscan
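To take the init script out of the picture, I also plan to try the activation step by
hand on vm2 (roughly what the script does after starting the daemon):
[root@vm2 ~]# vgscan
[root@vm2 ~]# vgchange -ay PcbiHomesVG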
Before clvmd is started, cman is already running and both nodes are cluster members:
[root@vm1 ~]# cman_tool nodes
Node Sts Inc Joined Name
1 M 544456 2011-03-02 15:24:31 172.16.50.32
2 M 544468 2011-03-02 15:52:29 172.16.50.33
Note that I'm using manual fencing in this configuration.
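As far as I can tell no fence operation is pending (both nodes show as members above),
but my understanding is that with fence_manual a pending fence has to be acknowledged
by hand before DLM recovery can finish, e.g.:
[root@vm1 ~]# fence_ack_manual -n 172.16.50.33   # node name as listed by cman_tool nodes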
Both nodes are running CentOS 5.5:
# uname -a
Linux vm2.pcbi.upenn.edu 2.6.18-194.32.1.el5xen #1 SMP Wed Jan 5 18:44:24 EST
2011 x86_64 x86_64 x86_64 GNU/Linux
These package versions were installed on each node:
cman-2.0.115-34.el5_5.4
cman-devel-2.0.115-34.el5_5.4
gfs2-utils-0.1.62-20.el5
lvm2-2.02.56-8.el5_5.6
lvm2-cluster-2.02.56-7.el5_5.4
rgmanager-2.0.52-6.el5.centos.8
system-config-cluster-1.0.57-3.el5_5.1
iptables is turned off on each node.
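Unless someone spots something obvious, my next step is to kill the hung startup on vm2
and run clvmd by hand with debug output to see where it stops; if I'm reading clvmd(8)
correctly, something like:
[root@vm2 ~]# clvmd -d    # -d should enable debug logging per the man page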
Does anyone know why clvmd hangs on the 2nd node?
Best,
--
Valeriu Mutu