Re: [Lustre-discuss] Nodes claim error with files, then say everything is fine.

2008-08-15 Thread Chris Worley
On Wed, Aug 6, 2008 at 1:25 PM, Brian J. Murrell [EMAIL PROTECTED] wrote:
 On Wed, 2008-08-06 at 12:11 -0600, Chris Worley wrote:
 On Wed, Aug 6, 2008 at 11:26 AM, Brian J. Murrell [EMAIL PROTECTED] wrote:
 
  +rpctrace debug is probably the way to go to see what the clients
  are(n't) doing in terms of keeping the MDS aware of its existence.
 

 The log from the client perspective starts with a lot of packet mismatches:

 0100:0002:3:1217629906.902955:0:5592:0:(events.c:116:reply_in_callback())
 early reply sized 176, expect 128
 0100:0002:3:1217633409.730495:0:5590:0:(events.c:116:reply_in_callback())
 early reply sized 240, expect 128

 Given the amount of debug provided, this looks like it could be bug
 16534 which is on my plate to test the solution for and land.


In adding info to bugzilla, you clarified that this is actually bug #16237.

This bug is fixed in Lustre 1.6.6... but 1.6.6 isn't released yet.

Is there a work-around for the 1.6.5.1 client (short of rebooting)?

Thanks,

Chris
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lustre + debian

2008-08-15 Thread Guy Coates
I've just gone through the exercise of recompiling the lenny packages on etch;
it worked like a charm.


This is the procedure I used. Hope it helps.



*Pre-requisites

Any Debian machine can be used to build the packages; it does not have to be a
Lustre client or server.

*Install build essentials

Install the packages required to build debs (build-essential, module-assistant,
etc.).

*Get Source

Ensure sources.list contains the following lines:

deb http://debian.internal.sanger.ac.uk/debian/ etch main non-free contrib
deb-src http://debian.internal.sanger.ac.uk/debian/ lenny main contrib non-free

*Download the source with:

#aptitude update
#apt-get source linux-image-2.6.18-X-686
#apt-get source lustre

This will unpack two directories, one with the lustre source and one with the
kernel source.

*Build lustre userspace

Change to the lustre directory.

#cd lustre-X.X.X
#dpkg-buildpackage -r fakeroot

If the build fails with automake errors you will need to install a later
automake version. (debian/etch provides several to choose from.)

This will build the following packages:

lustre-utils#Userspace lustre util
lustre-dev  #Development headers
lustre-source   #Source for the kernel module
lustre-tests#Test suite
linux-patch-lustre  #Patch for the linux kernel.

Install the lustre-source and linux-patch-lustre packages on the build machine.
These packages contain the patches to the kernel source tree that are used in
the next step of the build.

#dpkg -i linux-patch-lustre_XXX.deb
#dpkg -i lustre-source_XXX.deb



*Build lustre patched kernel

#cd linux-2.6-

We need to grab the .config file for the debian kernel. We should be able to
generate the config from the source package, but I'm not sure how. The easiest
way to get the correct config is to copy /boot/config-X.X.X from a machine
already running the debian kernel.

#cp /boot/config-2.6.XXX-686 .config

Check the kernel config works: (This might generate 1 or 2 minor questions. Just
hit m or y.)

#make oldconfig

We can now build the kernel.

#export CONCURRENCY_LEVEL=3
#make-kpkg clean
#make-kpkg --rootcmd fakeroot --added-patches=lustre \
--initrd --append-to-version -lustre-1.6.5.1 --revision=mmdd  kernel_image

(You might be asked about extra scsi statistics options; selecting Y is probably
a good idea)

You should now have a kernel-image deb.

*Build lustre kernel modules

The lustre kernel modules can now be built.

#module-assistant -u /your/working/directory -k /path/to/the/kernel/linux-2.6-X.X \
build lustre

After the build has finished you should now have a lustre-modules.deb

*Install

To install lustre on a client or server machine, simply install the packages you
have created:

linux-image-2.6.XX-lustre-X.X.X._XX.deb
lustre-modules-2.6.XX-lustre-X.X.X._XX.deb
lustre-utils_X.X.deb
liblustre-X-X.deb
lustre-dev-X.X.deb

The test suite is optional. For configuration of networks and timeout options in
/etc/modprobe.d/lustre see the lustre manual.
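
As an illustrative example (the interface name is an assumption, not something
from this procedure), a minimal /etc/modprobe.d/lustre might contain:

```
# Tell LNET which network/interface to use; eth0 is a placeholder.
options lnet networks=tcp0(eth0)
```

See the Lustre manual for the full set of network and timeout options.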

*Extras not currently packaged

Lustre uses a special version of e2fsprogs. This allows you to specify the disk
RAID geometry at filesystem creation time to optimise performance, and it adds
extra options to support lfsck, the Lustre filesystem consistency checker.
Debian upstream have said they will package this in the future; in the
meantime, you will have to build it yourself. Note that the modified tools are
only required on OST and MDS machines.
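
To give a concrete (purely illustrative) sense of what this enables: the RAID
geometry can be passed through to mke2fs at format time. The numbers, device
and names below are made up for the example, not a recommendation:

```shell
# 128kB RAID chunk / 4kB blocks = stride of 32 blocks;
# stripe-width = stride * number of data disks (8 assumed here).
# Device path, fsname and MGS NID are placeholders.
mkfs.lustre --ost --fsname=testfs --mgsnode=mgs@tcp0 \
    --mkfsoptions="-E stride=32,stripe-width=256" /dev/sdX1
```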


*Get the e2fsprogs source

Get the latest sun patch tarball:

http://downloads.lustre.org/public/tools/e2fsprogs/latest/

eg e2fsprogs-1.40.11-sun1-patches.tar.gz

You will also need the upstream source (in this case e2fsprogs 1.40.11). This
can be found at:

http://downloads.lustre.org/public/tools/e2fsprogs/upstream/



*Patch the source

tar -xvf e2fsprogs-1.40.11.tar.gz
tar -xvf e2fsprogs-1.40.11-sun1-patches.tar.gz

Patch the source with quilt.

#cd e2fsprogs-1.40.11
#ln -s -f ../patches .
#ln -s -f ../patches/series .
#quilt push -av


*Build the source

Note that the ext2fs-dev, libsqlite3-dev, sqlite3 and libdb4.3-dev headers and
libraries must be installed before building. (libdb4.4 does not work, although
the code compiles OK!?)

#./configure --with-lustre=/path/to/lustre/source
#make
#make install




-- 
Dr. Guy Coates,  Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 x 6925
Fax: +44 (0)1223 496802


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


Re: [Lustre-discuss] Startup Sequence 1.6.5.1

2008-08-15 Thread Mag Gam
The way we do it is, we don't even keep the startup sequence in
/etc/fstab. We have a separate script we run to mount the Lustre
filesystems, and we run it at the very end of startup.
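
For what it's worth, a minimal sketch of such a script (the hostname, fsname
and mount point are placeholders, not Mag's actual setup):

```shell
#!/bin/sh
# Run after networking and all local filesystems are up.
modprobe lustre || exit 1
# Mount the Lustre filesystem last; mgs@tcp0:/testfs is a placeholder.
mount -t lustre mgs@tcp0:/testfs /mnt/lustre
```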



On Fri, Aug 15, 2008 at 4:03 AM, Jakob Goldbach [EMAIL PROTECTED] wrote:
 On Thu, 2008-08-14 at 17:49 -0500, Mike Feuerstein wrote:
 Is there a recommended startup mounting sequence for 1.6.5.1?

 We are internally debating

 MGS lun - MDS luns - OSTs

 vs.

 MGS lun - OSTs - MDS luns


 Both will work, but the latter is quieter in dmesg, as the MDS is a
 client of the OSTs.

 /Jakob



[Lustre-discuss] LNET packets

2008-08-15 Thread Mag Gam
I am doing a case study at my university and I am trying to analyze
packets for LNET. I want to compare this with other network-based
filesystems, such as NFS and SMB. I plan to use tcpdump to capture
LNET packets, but I am not sure which port I should listen on.
Does anyone know what port Lustre runs on?

TIA
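
(For the archives: LNET's TCP transport, socklnd, listens on port 988 by
default, so a capture along these lines should work. The interface name is an
assumption:)

```shell
# Capture LNET-over-TCP traffic; socklnd uses TCP port 988 by default.
tcpdump -i eth0 -s 0 -w lnet.pcap 'tcp port 988'
```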


Re: [Lustre-discuss] lustre + debian

2008-08-15 Thread Robert LeBlanc
You should be able to use official Debian mirrors; that is all that we used.
You don't have to build your own packages from scratch either, as the
packages are already in Lenny. Just apt-get the packages that he built:

lustre-utils#Userspace lustre util
lustre-dev  #Development headers
lustre-source   #Source for the kernel module
lustre-tests#Test suite
linux-patch-lustre  #Patch for the linux kernel.

Also grab the kernel source, then skip down to where he builds his kernel.

Robert


On 8/15/08 10:38 AM, Troy Benjegerdes [EMAIL PROTECTED] wrote:

 I'm about to try this, and figured it would be worth documenting on the
 wiki..
 
 http://wiki.lustre.org/index.php?title=Debian_Install
 
 so far the only issue is debian.internal.sanger.ac.uk is not visible to
 us outsiders ;) 
 


-- 
Robert LeBlanc
Life Sciences Computer Support
Brigham Young University
[EMAIL PROTECTED]
(801)422-1882



Re: [Lustre-discuss] Re-using OST and MDT names

2008-08-15 Thread megan
Replying to myself here: I did manage to get my disk mounted to perform
a new benchmark test. I could not find a way around the error

  mount.lustre: mount /dev/sdg1 at /srv/lustre/mds/crew5-MDT failed:
  Address already in use; The target service's index is already in use.
  (/dev/sdg1)

Even rebooting both the OSS and MDS did not help.

So, being as this was just a test of hw with a different stripesize
setting on an LSI ELP RAID card (128kB in place of default 64kB),
I re-created both the OST and the MDT using a new, unique fsname and
all of the same hardware.

This worked like a charm.
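
The recreate step was essentially the following (fsname, device paths and the
MGS NID are illustrative; --reformat is what allows overwriting the old
labels):

```shell
# Reformat with a fresh fsname so the old, in-use service indices are
# not requested again. Devices and NIDs below are placeholders.
mkfs.lustre --reformat --fsname=crew6 --mdt --mgsnode=mds1@o2ib /dev/sdg1   # on the MDS
mkfs.lustre --reformat --fsname=crew6 --ost --mgsnode=mds1@o2ib /dev/sdf1   # on the OSS
```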

Someday I will have to figure out what to do with the cast-off
MDT names, which I apparently may no longer use...

Any comment, observations, suggestions appreciated.

Later,
megan


On Aug 14, 12:55 pm, Ms. Megan Larko [EMAIL PROTECTED] wrote:
 Hello,

 As part of my continuing effort to benchmark Lustre and ascertain what may be
 best-suited for our needs here, I have re-created at the LSI ELP
 card level some of my arrays from my earlier benchmark posts.  The
 card is now sending /dev/sdf 998999Mb with 128kB stripesize and
 /dev/sdg 6992995 Mb with 128kB stripesize to my OSS.  The sdf and sdg
 formatted fine with Lustre and mounted without issue on the OSS.
 Recycling the MGS MDT's seems to have been a problem.  When I tried to
 mount the MDT on the MGS after mounting the new OST's, the mounts
 completed without error, but the bonnie benchmark test as run before
 would hang every time.

 Sample of errors in MGS file /var/log/messages:
 Aug 13 12:39:30 mds1 kernel: Lustre: crew5-OST0001-osc: Connection to service crew5-OST0001 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
 Aug 13 12:39:30 mds1 kernel: LustreError: 167-0: This client was evicted by crew5-OST0001; in progress operations using this service will fail.
 Aug 13 12:39:30 mds1 kernel: Lustre: crew5-OST0001-osc: Connection restored to service crew5-OST0001 using nid [EMAIL PROTECTED]
 Aug 13 12:39:30 mds1 kernel: Lustre: MDS crew5-MDT: crew5-OST0001_UUID now active, resetting orphans
 Aug 13 12:42:42 mds1 kernel: Lustre: 3406:0:(ldlm_lib.c:519:target_handle_reconnect()) crew5-MDT: 50b043bb-0e8c-7a5b-b0fe-6bdb67d21e0b reconnecting
 Aug 13 12:42:42 mds1 kernel: Lustre: 3406:0:(ldlm_lib.c:519:target_handle_reconnect()) Skipped 24 previous similar messages
 Aug 13 12:42:42 mds1 kernel: Lustre: 3406:0:(ldlm_lib.c:747:target_handle_connect()) crew5-MDT: refuse reconnection from [EMAIL PROTECTED]@o2ib to 0x81006994d000; still busy with 2 active RPCs
 Aug 13 12:42:42 mds1 kernel: Lustre: 3406:0:(ldlm_lib.c:747:target_handle_connect()) Skipped 24 previous similar messages
 Aug 13 12:42:42 mds1 kernel: LustreError: 3406:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-16) [EMAIL PROTECTED] x600107/t0 o38-[EMAIL PROTECTED]:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
 Aug 13 12:42:42 mds1 kernel: LustreError: 3406:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped 24 previous similar messages
 Aug 13 12:43:40 mds1 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -19
 Aug 13 12:43:40 mds1 kernel: LustreError: Skipped 7 previous similar messages
 Aug 13 12:47:50 mds1 kernel: Lustre: crew5-OST0001-osc: Connection to service crew5-OST0001 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.

 Sample of errors on OSS file /var/log/messages:
 Aug 13 12:39:30 oss4 kernel: Lustre: crew5-OST0001: received MDS
 connection from [EMAIL PROTECTED]
 Aug 13 12:43:57 oss4 kernel: Lustre: crew5-OST0001: haven't heard from
 client crew5-mdtlov_UUID (at [EMAIL PROTECTED]) in 267 seconds. I think
 it's dead, and I am evicting it.
 Aug 13 12:46:27 oss4 kernel: LustreError: 137-5: UUID
 'crew5-OST_UUID' is not available  for connect (no target)
 Aug 13 12:46:27 oss4 kernel: LustreError: Skipped 51 previous similar messages
 Aug 13 12:46:27 oss4 kernel: LustreError:
 4151:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error
 (-19) [EMAIL PROTECTED] x600171/t0 o8-?@?:-1 lens 240/0 ref 0
 fl Interpret:/0/0 rc -19/0
 Aug 13 12:46:27 oss4 kernel: LustreError:
 4151:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped 52 previous
 similar messages
 Aug 13 12:47:50 oss4 kernel: Lustre: crew5-OST0001: received MDS
 connection from [EMAIL PROTECTED]

 In lctl all pings were successful.  Additionally files on Lustre disks
 on our live system using the same MGS were all fine; no errors in
 logfile.

 I thought that maybe changing the disk kB size and reformatting the
 OST without reformatting the MDT was a problem.  So I unmounted OST
 and 

Re: [Lustre-discuss] mounting an OST from another node attached to a fibre channel switch

2008-08-15 Thread Ron
Which section of which manual, please?  I.e. are you talking about:
Part No. 820-3681-10
Lustre manual version: Lustre_1.6_man_v1.10
December 2007

4.2.3.2 Running the Writeconf Command
and/or 4.2.3.3 Changing a Server NID

Our MGS is not changing nodes, but the OSTs are.
Is there really just one simple MGS-only operation?
Those sections do not quite fit (to my way of thinking, which is
in the process of being adjusted :}).

Thanks,
Ron

On Aug 14, 4:33 pm, Klaus Steden [EMAIL PROTECTED] wrote:
 Yes. There is an entry in the manual on this topic. You'll have to stop
 Lustre and update the MGS configuration, but it's a pretty quick operation.

 cheers,
 Klaus

 On 8/14/08 2:29 PM, Ron [EMAIL PROTECTED]did etch on stone tablets:

  Hi,
  We have set up a couple of test systems where the OSTs are mounted on
  the same node as the MDT.   The OSTs are LUNS on a SATA Beast
  controller accessible from multiple systems attached to a fibre
  channel switch. We would like to umount an OST from the MDT system
  and mount it on another system. We've tried doing this and even though
  there is network traffic between the new system and the mds system,
  the mds system seems to be ignoring the OST mount.  Can an OST's
  OSS change?
  Thanks,
  Ron


Re: [Lustre-discuss] lustre + debian

2008-08-15 Thread David Brown
On Fri, Aug 15, 2008 at 9:48 AM, Robert LeBlanc [EMAIL PROTECTED] wrote:
 You should be able to use official Debian mirrors; that is all that we used.
 You don't have to build your own packages from scratch either, as the
 packages are already in Lenny. Just apt-get the packages that he built:

 lustre-utils#Userspace lustre util
 lustre-dev  #Development headers
 lustre-source   #Source for the kernel module
 lustre-tests#Test suite
 linux-patch-lustre  #Patch for the linux kernel.

 Also grab the kernel source, then skip down to where he builds his kernel.

 Robert

Just thought I'd mention the package repository that I maintain as
well for debian/ubuntu lustre packages

http://www.pdsi-scidac.org/repository/debian
http://www.pdsi-scidac.org/repository/ubuntu

I rebuild the latest packages from debian and build them for all
distributions (except for ubuntu/intrepid and debian/experimental) of
debian and ubuntu.

If you don't want to build everything on your own, that's what mine is
there for.

Thanks,
- David Brown

 On 8/15/08 10:38 AM, Troy Benjegerdes [EMAIL PROTECTED] wrote:

 I'm about to try this, and figured it would be worth documenting on the
 wiki..

 http://wiki.lustre.org/index.php?title=Debian_Install

 so far the only issue is debian.internal.sanger.ac.uk is not visible to
 us outsiders ;)



 --
 Robert LeBlanc
 Life Sciences Computer Support
 Brigham Young University
 [EMAIL PROTECTED]
 (801)422-1882



Re: [Lustre-discuss] mounting an OST from another node attached to a fibre channel switch

2008-08-15 Thread Klaus Steden

Yes.

The MGS contains all the configuration information for the file system it
serves, including the locations and network paths for all the MDS and OSS
nodes within the file system.

What you want to do is tell the MGS that your OSS(es) have been moved to new
addresses and have it update its records. At that point, your file system
will be usable in the new configuration.
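
In 1.6 that update is the writeconf procedure: with the file system stopped,
regenerate the configuration logs so each target re-registers with its current
NID at next mount. A sketch (device paths are placeholders):

```shell
# Run with Lustre stopped. Device paths below are placeholders.
tunefs.lustre --writeconf /dev/mdtdev   # on the MGS/MDT node first
tunefs.lustre --writeconf /dev/ostdev   # then on each OSS, for every OST
# Remount in order: MGS/MDT first, then the OSTs, then clients.
```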

hth,
Klaus

On 8/15/08 2:18 PM, Ron [EMAIL PROTECTED]did etch on stone tablets:

 Which section of which manual? Please.  I.e are you talking about:
 Part No. 820-3681-10
 Lustre manual version: Lustre_1.6_man_v1.10
 December 2007
 
 4.2.3.2 Running the Writeconf Command
 and/or 4.2.3.3 Changing a Server NID
 
 Our MGS is not changing nodes, but the OSTs are.
 Is there really just one simple MGS only operation?
 Those sections do not quite fit (to my way of thinking, which is
 in the process of being adjusted :}
 
 Thanks,
 Ron
 
 On Aug 14, 4:33 pm, Klaus Steden [EMAIL PROTECTED] wrote:
 Yes. There is an entry in the manual on this topic. You'll have to stop
 Lustre and update the MGS configuration, but it's a pretty quick operation.
 
 cheers,
 Klaus
 
 On 8/14/08 2:29 PM, Ron [EMAIL PROTECTED]did etch on stone tablets:
 
 Hi,
 We have set up a couple of test systems where the OSTs are mounted on
 the same node as the MDT.   The OSTs are LUNS on a SATA Beast
 controller accessible from multiple systems attached to a fibre
 channel switch. We would like to umount an OST from the MDT system
 and mount it on another system. We've tried doing this and even though
 there is network traffic between the new system and the mds system,
 the mds system seems to be ignoring the OST mount.  Can an OST's
 OSS change?
 Thanks,
 Ron
