In 1.6, mkfs.lustre checks whether any hostname you give it resolves to 127.0.0.1 and complains if so. The lustre_setup.sh script also does some network sanity checking on all the hosts in the cluster (that remote hosts are pingable, that each remote host resolves its own name to its actual IP address, and that the remote host is in the same subnet).
We could go further in this direction with additional checks for LNET sanity (there is at least one non-loopback NID, and any NID specified in the CSV is in the NID list), although there is a small chicken-and-egg problem, since lustre_setup.sh is what adds the LNET lines to modprobe.conf.

Currently, mkfs.lustre is independent of having any Lustre modules loaded (except ldiskfs), and adding LNET sanity checks to it would break that independence, so I would mildly favor an external script.

Further, we could have Lustre itself return an error if it is ever given a NID of 127.0.0.1.
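
As a rough illustration of the external-script option, the LNET checks mentioned above (at least one non-loopback NID, the expected NID present in the list, and no loopback NID accepted) might look something like the sketch below. The function names and messages are hypothetical, and the sketch assumes the NID list arrives one NID per line on stdin, as `lctl list_nids` prints it; it is not taken from lustre_setup.sh or mkfs.lustre.

```shell
#!/bin/sh
# Sketch only: function names and messages are hypothetical.

# is_loopback_nid NID
# Succeeds if the address part of NID is in 127.0.0.0/8 or the
# network part is LNET's loopback network "lo" (for example 0@lo).
is_loopback_nid() {
    case "$1" in
        127.*@*|*@lo) return 0 ;;
        *) return 1 ;;
    esac
}

# check_nids EXPECTED_NID
# Reads one NID per line on stdin and prints any problems found.
# Returns 0 if the list looks sane, 1 otherwise.
check_nids() {
    expected=$1
    have_real=no
    have_expected=no
    while read -r nid; do
        [ -n "$nid" ] || continue
        if ! is_loopback_nid "$nid"; then
            have_real=yes
        fi
        if [ "$nid" = "$expected" ]; then
            have_expected=yes
        fi
    done
    status=0
    if is_loopback_nid "$expected"; then
        echo "error: $expected is a loopback NID"
        status=1
    fi
    if [ "$have_real" = no ]; then
        echo "error: no non-loopback NID configured"
        status=1
    fi
    if [ "$have_expected" = no ]; then
        echo "error: $expected is not in the NID list"
        status=1
    fi
    return $status
}
```

With the LNET modules loaded, this could be run as `lctl list_nids | check_nids 192.168.1.10@tcp`, where the NID is a made-up example. Because it needs `lctl`, it fits the external-script approach rather than mkfs.lustre.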

Peter J Braam wrote:
Nathan,

This looks like the beginning of a very useful sanity checking script, which we could perhaps include either as a separate utility or, even better, integrate with the Lustre mount. Koala may have similar thoughts about this kind of utility?

- Peter -



On Dec 10, 2006, at 10:44 AM, Nathaniel Rutman wrote:

# ifconfig eth0 | grep inet
# hostname
# grep `hostname` /etc/hosts

must all match, and not be 127.0.0.1
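
The /etc/hosts part of this check can be scripted. The following is a minimal sketch, assuming a standard hosts file layout (address first, then names, with `#` comments); the function name `hosts_maps_to_loopback` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical.

# hosts_maps_to_loopback HOSTS_FILE NAME
# Succeeds (returns 0) if NAME is listed on a 127.x.x.x line of
# HOSTS_FILE, which is the misconfiguration to look for.
hosts_maps_to_loopback() {
    awk -v name="$2" '
        { sub(/#.*/, "") }              # strip trailing comments
        $1 ~ /^127\./ {                 # only loopback address lines
            for (i = 2; i <= NF; i++)
                if ($i == name) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}
```

For example, `hosts_maps_to_loopback /etc/hosts "$(hostname)" && echo "hostname maps to loopback"` prints the warning only when the node's own name is attached to a loopback address.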

also,
# lctl network up
# lctl list_nids
to make sure /etc/modprobe.conf is good.

if it still doesn't start, post the rest of the dmesg.


Pappas, Bill wrote:
Hello...
I don't quite understand.
I will tell you this:
The first line of /etc/hosts shows:
127.0.0.1           localhost localhost.localdomain vfscbe1-stage


Should I remove the hostname vfscbe1-stage?

With the /etc/hosts as is, this is what happens...

When I run llmount.sh as instructed in the manual, /var/log/messages shows
the following after the output from llmount.sh:

The tail end of the llmount.sh output shows:
.
.
.
MDS mount options: errors=remount-ro,user_xattr,acl,
.
.
.

The tail end of /var/log/messages shows:
.
.
.
Dec  9 10:07:43 vfscbe1-stage kernel: LDISKFS-fs: mounted filesystem
with ordered data mode.
Dec  9 10:07:43 vfscbe1-stage kernel: Lustre: Changing connection for
OSC_vfscbe1-stage_OST_vfscbe1-stage_mds1 to
vfscbe1-stage_UUID/[EMAIL PROTECTED]
Dec  9 10:07:43 vfscbe1-stage kernel: Lustre: Changing connection for
OSC_vfscbe1-stage_OST_vfscbe1-stage_mds1 to
vfscbe1-stage_UUID/[EMAIL PROTECTED]
.
.
.


If I change /etc/hosts to:
127.0.0.1               localhost localhost.localdomain

Then the llmount.sh output is the same and /var/log/messages shows:
.
.
.
Dec  9 10:13:55 vfscbe1-stage kernel: Lustre: Changing connection for
OSC_vfscbe1-stage_OST_vfscbe1-stage_mds1 to
vfscbe1-stage_UUID/[EMAIL PROTECTED]
Dec  9 10:13:55 vfscbe1-stage kernel: Lustre: Changing connection for
OSC_vfscbe1-stage_OST_vfscbe1-stage_mds1 to
vfscbe1-stage_UUID/[EMAIL PROTECTED]
.
.
.

In either case, df does not show a Lustre fs mounted under /mnt/luster.
Changing the hosts file just seems to change the hostname.

What would you guess I am doing wrong?

I appreciate your help.

Thanks,
Bill Pappas - System Integration Engineer - SAN
St. Jude Children's Research Hospital
332 North Lauderdale
Memphis, TN 38105
Danny Thomas Tower - Room D1010
Mail Stop 312
901-495-4549

-----Original Message-----
From: Andreas Dilger [mailto:[EMAIL PROTECTED]]
Sent: Friday, December 08, 2006 7:02 PM
To: Pappas, Bill
Cc: [email protected]
Subject: Re: [Lustre-devel] Validating Lustre 1.4.7 Install on RHEL 4 AS U4

On Dec 08, 2006  15:52 -0600, Pappas, Bill wrote:

Everything seems fine, but the test mount fs does not mount:

loading module: ost srcdir None devdir ost
loading module: ldiskfs srcdir None devdir ldiskfs
loading module: fsfilt_ldiskfs srcdir None devdir lvfs
loading module: obdfilter srcdir None devdir obdfilter
loading module: mdc srcdir None devdir mdc
loading module: lov srcdir None devdir lov
loading module: mds srcdir None devdir mds
loading module: llite srcdir None devdir llite
OSD: OST_vfscbe1-stage OST_vfscbe1-stage_UUID obdfilter
/tmp/ost1-vfscbe1-stage 400000 ldiskfs no 0 256
OST mount options: errors=remount-ro
MDSDEV: mds1 mds1_UUID /tmp/mds1-vfscbe1-stage ldiskfs no
recording clients for filesystem: FS_fsname_UUID
Recording log mds1 on mds1
LOV: lov_mds1 76557_lov_mds1_2968db066d mds1_UUID 1 1048576 0 0
[u'OST_vfscbe1-stage_UUID'] mds1
OSC: OSC_vfscbe1-stage_OST_vfscbe1-stage_mds1
76557_lov_mds1_2968db066d OST_vfscbe1-stage_UUID
End recording log mds1 on mds1
Recording log vfscbe1-stage on mds1
Recording log client on mds1
MDSDEV: mds1 mds1_UUID /tmp/mds1-vfscbe1-stage ldiskfs 400000 no
MDS mount options: errors=remount-ro,user_xattr,acl,

The test stops at "MDS mount options: errors=remount-ro,user_xattr,acl,".

df does not show /mnt/luster as a mount point.


dmesg is your friend.

First guess is hostname maps to localhost (127.0.0.1) in /etc/hosts.


Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



_______________________________________________
Lustre-devel mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-devel


