[Linux-cluster] Linux clustering (one-node), GFS, iSCSI, clvmd (lock problem)

Paul Risenhoover Mon, 15 Oct 2007 21:53:07 -0700

Hi All,

I am new to this mailing list, but I've got a locking problem with Linux clustering and iSCSI that plagues me. It's a pretty serious issue: every time I reboot my server, it fails to mount my primary iSCSI device out of the box, and I have to perform several manual steps to get it operational again.


Here is some configuration information:

Linux flax.xxx.com 2.6.9-55.0.9.ELsmp #1 SMP Thu Sep 27 18:27:41 EDT 2007 i686 i686 i386 GNU/Linux


[EMAIL PROTECTED] ~]# clvmd -V
Cluster LVM daemon version: 2.02.21-RHEL4 (2007-04-17)
Protocol version:           0.2.1

dmesg (excerpted)
iscsi-sfnet: Loading iscsi_sfnet version 4:0.1.11-3
iscsi-sfnet: Control device major number 254
iscsi-sfnet:host3: Session established
scsi3 : SFNet iSCSI driver
 Vendor: Promise   Model: VTrak M500i       Rev: 2211
 Type:   Direct-Access                      ANSI SCSI revision: 04
sdh : very big device. try to use READ CAPACITY(16).
SCSI device sdh: 5859373056 512-byte hdwr sectors (2999999 MB)
SCSI device sdh: drive cache: write back
sdh : very big device. try to use READ CAPACITY(16).
SCSI device sdh: 5859373056 512-byte hdwr sectors (2999999 MB)
SCSI device sdh: drive cache: write back
sdh: unknown partition table

[EMAIL PROTECTED] ~]# clustat
Member Status: Quorate

 Member Name                              Status
 ------ ----                              ------
 flax                                     Online, Local, rgmanager

YES, THIS IS A ONE-NODE CLUSTER (Which, I suspect, might be the problem)

SYMPTOM:

When the server comes up, the clustered logical volume that is on the iSCSI device is labeled "inactive" when I do an "lvscan":

[EMAIL PROTECTED] ~]# lvscan
 inactive            '/dev/nasvg_00/lvol0' [5.46 TB] inherit
 ACTIVE            '/dev/lgevg_00/lvol0' [3.55 TB] inherit
 ACTIVE            '/dev/noraidvg_01/lvol0' [546.92 GB] inherit
 ACTIVE            '/dev/VolGroup00/LogVol00' [134.47 GB] inherit
 ACTIVE            '/dev/VolGroup00/LogVol01' [1.94 GB] inherit
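
To spot this after a reboot without reading the whole listing, a small filter over the lvscan output may help. This is only a sketch: the function name list_inactive_lvs is made up for illustration, and it assumes the lvscan output format shown above (state first, device path in single quotes).

```shell
#!/bin/sh
# Sketch: print the device path of every logical volume that lvscan
# reports as "inactive". Reads lvscan output on stdin.
# list_inactive_lvs is a hypothetical helper name, not an LVM command.
list_inactive_lvs() {
    # Split each line on single quotes: field 1 is the state column,
    # field 2 is the device path.
    awk -F"'" '$1 ~ /^[ \t]*inactive/ { print $2 }'
}

# Usage on a live system (needs root):
#   lvscan | list_inactive_lvs
```

The function only reads text, so it can be tried on saved lvscan output before being run against a live system.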

The interesting thing is that the lgevg_00 and noraidvg_01 volumes are also clustered, but they are direct-attached SCSI (i.e., not iSCSI).


The volume group that the logical volume is a member of shows clean:
[EMAIL PROTECTED] ~]# vgscan
 Reading all physical volumes.  This may take a while...
 Found volume group "nasvg_00" using metadata type lvm2
 Found volume group "lgevg_00" using metadata type lvm2
 Found volume group "noraidvg_01" using metadata type lvm2
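
To confirm which of these volume groups actually carry the clustered flag, the vg_attr field reported by vgs can be inspected; per the vgs man page, its sixth character is "c" for a clustered volume group. The following is a sketch under that assumption, and the function name list_clustered_vgs is made up for illustration.

```shell
#!/bin/sh
# Sketch: print the names of volume groups whose vg_attr string has
# "c" (clustered) in the sixth position.
# Expects "name attr" pairs on stdin, one volume group per line.
# list_clustered_vgs is a hypothetical helper name, not an LVM command.
list_clustered_vgs() {
    awk 'substr($2, 6, 1) == "c" { print $1 }'
}

# Usage on a live system (needs root):
#   vgs --noheadings -o vg_name,vg_attr | list_clustered_vgs
```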

So, in order to fix this, I execute the following:

[EMAIL PROTECTED] ~]# lvchange -a y /dev/nasvg_00/lvol0

Error locking on node flax: Volume group for uuid not found: oNhRO1WqNJp3BZxxrlMT16dwpwcRiIQPejnrEUbQ3HMJ6BjHef1hKAsoA6Sl9ISS


This also shows up in my syslog, as such:

Oct 13 11:27:40 flax vgchange: Error locking on node flax: Volume group for uuid not found: oNhRO1WqNJp3BZxxrlMT16dwpwcRiIQPejnrEUbQ3HMJ6BjHef1hKAsoA6Sl9ISS
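
The 64-character string in that message is hard to compare against anything by eye. As far as I can tell (this is an assumption, not something stated in the error itself), it is the 32-character volume group UUID followed by the 32-character logical volume UUID, with the dashes stripped. A sketch that splits it and re-inserts dashes in LVM's usual 6-4-4-4-4-4-6 grouping, so the halves can be compared with the output of "vgs -o vg_name,vg_uuid" and "lvs -o lv_name,lv_uuid"; both function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: split the 64-character id from the clvmd error into two
# 32-character halves and print each in dashed 6-4-4-4-4-4-6 form.
# ASSUMPTION: first half = VG UUID, second half = LV UUID.
# format_lvm_uuid and split_lock_uuid are hypothetical helper names.

format_lvm_uuid() {
    # Insert dashes into a 32-character id; other input passes through unchanged.
    printf '%s\n' "$1" | sed 's/^\(.\{6\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{6\}\)$/\1-\2-\3-\4-\5-\6-\7/'
}

split_lock_uuid() {
    vg_half=$(printf '%s\n' "$1" | cut -c1-32)
    lv_half=$(printf '%s\n' "$1" | cut -c33-64)
    echo "vg_uuid=$(format_lvm_uuid "$vg_half")"
    echo "lv_uuid=$(format_lvm_uuid "$lv_half")"
}

# Usage:
#   split_lock_uuid oNhRO1WqNJp3BZxxrlMT16dwpwcRiIQPejnrEUbQ3HMJ6BjHef1hKAsoA6Sl9ISS
```

If the first half matches the UUID that vgs reports for nasvg_00, then the metadata on disk is fine and it is only clvmd's view that is missing the volume group.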


RESOLUTION:

It took me a very long time to figure this out, but since it happens to me every time I reboot my server, somebody's bound to run into this again sometime soon (and it will probably be me).


Here's how I resolved it:

I edited the /etc/lvm/lvm.conf file as such:

was:
   # Type of locking to use. Defaults to local file-based locking (1).
   # Turn locking off by setting to 0 (dangerous: risks metadata corruption
   # if LVM2 commands get run concurrently).
   # Type 2 uses the external shared library locking_library.
   # Type 3 uses built-in clustered locking.
   #locking_type = 1
   locking_type = 3

changed to:

(snip)
   # Type 3 uses built-in clustered locking.
   #locking_type = 1
   locking_type = 2

Then, restart clvmd as such:
[EMAIL PROTECTED] ~]# service clvmd restart

Then:
[EMAIL PROTECTED] ~]# lvchange -a y /dev/nasvg_00/lvol0
[EMAIL PROTECTED] ~]#

(see, no error!)
[EMAIL PROTECTED] ~]# lvscan
 ACTIVE            '/dev/nasvg_00/lvol0' [5.46 TB] inherit
 ACTIVE            '/dev/lgevg_00/lvol0' [3.55 TB] inherit
 ACTIVE            '/dev/noraidvg_01/lvol0' [546.92 GB] inherit
 ACTIVE            '/dev/VolGroup00/LogVol00' [134.47 GB] inherit
 ACTIVE            '/dev/VolGroup00/LogVol01' [1.94 GB] inherit

(it's active!)

Then, go back and modify /etc/lvm/lvm.conf to restore the original locking_type of 3.

Then, restart clvmd.
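
Since I have to repeat this on every reboot, the steps above could be wrapped in a script. What follows is a rough sketch rather than a tested fix: the function names and the .bak backup file name are made up for illustration, and it simply replays the same sequence (switch locking_type to 2, restart clvmd, activate, switch back to 3, restart clvmd). It assumes lvm.conf has exactly one uncommented locking_type line.

```shell
#!/bin/sh
# Sketch of the manual workaround as shell functions.
# set_locking_type and reactivate_lv are hypothetical helper names.

# Rewrite the uncommented "locking_type = N" line in an lvm.conf-style file.
# Commented lines such as "#locking_type = 1" are left alone.
set_locking_type() {
    conf=$1
    value=$2
    tmp=$(mktemp) || return 1
    sed "s/^\([[:space:]]*\)locking_type[[:space:]]*=[[:space:]]*[0-9][0-9]*/\1locking_type = $value/" "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"    # overwrite in place so ownership and mode are kept
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Replay the manual steps for one logical volume. Must be run as root.
reactivate_lv() {
    conf=$1
    lv=$2
    cp -p "$conf" "$conf.bak" || return 1   # backup name is an assumption
    set_locking_type "$conf" 2 || return 1
    service clvmd restart
    lvchange -a y "$lv"
    lv_rc=$?
    # Always restore clustered locking, even if activation failed.
    set_locking_type "$conf" 3
    service clvmd restart
    return $lv_rc
}

# Usage (as root):
#   reactivate_lv /etc/lvm/lvm.conf /dev/nasvg_00/lvol0
```

Restoring locking_type to 3 regardless of whether lvchange succeeded keeps the box from being left in the temporary configuration if something goes wrong halfway through.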

THOUGHTS:

I admit I don't know much about clustering, but from the evidence I see, the problem appears to be isolated to clvmd and iSCSI, if only because my direct-attached clustered volumes don't exhibit the symptoms.

I'll make another leap here and guess that it's probably isolated to single-node clusters, since I'd imagine that most people who use clustering are using it as it was intended to be used (i.e., with multiple machines).


--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
