Re: [Lustre-discuss] Nodes claim error with files, then say everything is fine.
On Wed, Aug 6, 2008 at 1:25 PM, Brian J. Murrell [EMAIL PROTECTED] wrote:
> On Wed, 2008-08-06 at 12:11 -0600, Chris Worley wrote:
> > On Wed, Aug 6, 2008 at 11:26 AM, Brian J. Murrell [EMAIL PROTECTED] wrote:
> > > +rpctrace debug is probably the way to go to see what the clients are(n't) doing in terms of keeping the MDS aware of their existence.
> > The log from the client perspective starts with a lot of packet mismatches:
> > 0100:0002:3:1217629906.902955:0:5592:0:(events.c:116:reply_in_callback()) early reply sized 176, expect 128
> > 0100:0002:3:1217633409.730495:0:5590:0:(events.c:116:reply_in_callback()) early reply sized 240, expect 128
> Given the amount of debug provided, this looks like it could be bug 16534, which is on my plate to test the solution for and land.

In adding info to bugzilla, you clarified that this is actually bug #16237.  This bug is fixed in Lustre 1.6.6... but 1.6.6 isn't released.  Is there a work-around for the 1.6.5.1 client (short of rebooting)?

Thanks,

Chris
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] lustre + debian
I've just gone through the exercise of recompiling the lenny packages on etch; it worked like a charm.  This is the procedure I used.  Hope it helps.

*Prerequisites

Any debian machine can be used to build the packages.  It does not have to be a lustre client or server.

*Install build essentials

Install the packages required to build debs (build-essential, module-assistant etc).

*Get source

Ensure sources.list contains the following lines:

deb http://debian.internal.sanger.ac.uk/debian/ etch main non-free contrib
deb-src http://debian.internal.sanger.ac.uk/debian/ lenny main contrib non-free

Download the source with:

#aptitude update
#apt-get source linux-image-2.6.18-X-686
#apt-get source lustre

This will unpack two directories, one with the lustre source and one with the kernel source.

*Build lustre userspace

Change to the lustre directory:

#cd lustre-X.X.X
#dpkg-buildpackage -r fakeroot

If the build fails with automake errors you will need to install a later automake version (debian/etch provides several to choose from).  This will build the following packages:

lustre-utils        # Userspace lustre utilities
lustre-dev          # Development headers
lustre-source       # Source for the kernel module
lustre-tests        # Test suite
linux-patch-lustre  # Patch for the linux kernel

Install the lustre-source and linux-patch-lustre packages on the build machine.  These packages contain the patches to the kernel source tree that are used in the next step of the build.

#dpkg -i linux-patch-lustre_XXX.deb
#dpkg -i lustre-source_XXX.deb

*Build lustre patched kernel

#cd linux-2.6-

We need to grab the .config file for the debian kernel.  We should be able to generate the config from the source package, but I'm not sure how.  The easiest way to get the correct config is to copy /boot/config-X.X.X from a machine already running the debian kernel.

#cp /boot/config-2.6.XXX-686 .config

Check the kernel config works (this might generate 1 or 2 minor questions; just hit m or y):
#make oldconfig

We can now build the kernel:

#export CONCURRENCY_LEVEL=3
#make-kpkg clean
#make-kpkg --rootcmd fakeroot --added-patches=lustre \
  --initrd --append-to-version -lustre-1.6.5.1 --revision=mmdd kernel_image

(You might be asked about extra scsi statistics options; selecting Y is probably a good idea.)  You should now have a kernel-image deb.

*Build lustre kernel modules

The lustre kernel modules can now be built:

#module-assistant -u /your/working/directory -k /path/to/the/kernel/linux-2.6-X.X build lustre

After the build has finished you should have a lustre-modules.deb.

*Install

To install lustre on a client or server machine, simply install the packages you have created:

linux-image-2.6.XX-lustre-X.X.X._XX.deb
lustre-modules-2.6.XX-lustre-X.X.X._XX.deb
lustre-utils_X.X.deb
liblustre-X-X.deb
lustre-dev-X.X.deb

The test suite is optional.  For configuration of networks and timeout options in /etc/modprobe.d/lustre, see the lustre manual.

*Extras not currently packaged

Lustre uses a special version of e2fsprogs.  It allows you to specify the disk raid geometry at filesystem creation time to optimise performance, and it has extra options to support the lfsck lustre filesystem consistency check.  Debian upstream have said they will package this in the future; in the meantime, you will have to build it yourself.  Note that the modified programs are only required on OST and MDS machines.

*Get the e2fsprogs source

Get the latest sun patch tarball, eg e2fsprogs-1.40.11-sun1-patches.tar.gz, from:
http://downloads.lustre.org/public/tools/e2fsprogs/latest/

You will also need the upstream source (in this case e2fsprogs 1.40.11).  This can be found at:
http://downloads.lustre.org/public/tools/e2fsprogs/upstream/

*Patch the source

#tar -xvf e2fsprogs-1.40.11.tar.gz
#tar -xvf e2fsprogs-1.40.11-sun1-patches.tar.gz

Patch the source with quilt:

#cd e2fsprogs-1.40.11
#ln -s -f ../patches .
#ln -s -f ../patches/series .
#quilt push -av

*Build the source

Note that the ext2fs-dev, libsqlite3-dev, sqlite3 and libdb4.3-dev headers and libraries must be installed before building.  (Note that libdb4.4 does not work, but the code will compile OK!?)

#./configure --with-lustre=/path/to/lustre/source
#make
#make install

--
Dr. Guy Coates, Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 x 6925
Fax: +44 (0)1223 496802

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
Re: [Lustre-discuss] Startup Sequence 1.6.5.1
The way we do it is, we don't even keep the startup sequence in /etc/fstab.  We have a separate script we run to mount the lustre filesystems, and we run it at the very end of boot.

On Fri, Aug 15, 2008 at 4:03 AM, Jakob Goldbach [EMAIL PROTECTED] wrote:
> On Thu, 2008-08-14 at 17:49 -0500, Mike Feuerstein wrote:
> > Is there a recommended startup mounting sequence for 1.6.5.1?  We are
> > internally debating MGS lun - MDS luns - OSTs vs. MGS lun - OSTs - MDS luns
> Both will work, but the latter is quieter in dmesg, as the MDS is a client of the OSTs.
>
> /Jakob
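The order Jakob describes (MGS, then OSTs, then the MDT last) can be sketched as a minimal startup script.  This is only an illustration; the device paths and mount points are hypothetical and depend on how the targets were formatted with mkfs.lustre:

```shell
#!/bin/sh
# Minimal sketch of a Lustre 1.6 server-side startup script.
# Device paths and mount points below are hypothetical examples.
# Mount order: MGS first, then OSTs, then the MDT last, since the
# MDS itself connects to the OSTs as a client.
set -e

mount -t lustre /dev/sda1 /mnt/mgs    # 1. MGS
mount -t lustre /dev/sdb1 /mnt/ost0   # 2. OST(s)
mount -t lustre /dev/sdc1 /mnt/mdt    # 3. MDT last: quieter recovery in dmesg
```

Keeping this out of /etc/fstab, as described above, avoids the boot sequencing problem entirely.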
[Lustre-discuss] LNET packets
I am doing a case study at my university and I am trying to analyze packets for LNET.  I want to compare this with other network-based filesystems, such as NFS and SMB.  I plan on using tcpdump to capture LNET packets, but I am not sure which port I should listen on.  Does anyone know what port Lustre runs on?

TIA
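For LNET over plain TCP (the socklnd, i.e. tcp0-style networks), the acceptor listens on port 988 by default, so a capture along these lines should work.  The interface name here is an assumption; substitute whatever interface carries your LNET traffic:

```shell
# Capture LNET (socklnd) traffic on its default acceptor port, 988.
# "eth0" is an assumed example interface name.
tcpdump -i eth0 -w lnet.pcap 'tcp port 988'
```

Note this only applies to TCP networks; if the traffic runs over InfiniBand (o2ib), tcpdump on an Ethernet interface will not see it.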
Re: [Lustre-discuss] lustre + debian
You should be able to use official Debian mirrors; that is all that we used.  You don't have to build your own packages from scratch either, as the packages are already in Lenny.  Just apt-get the packages that he built:

lustre-utils        # Userspace lustre utilities
lustre-dev          # Development headers
lustre-source       # Source for the kernel module
lustre-tests        # Test suite
linux-patch-lustre  # Patch for the linux kernel

And the kernel source, and just skip down to where he builds his kernel.

Robert

On 8/15/08 10:38 AM, Troy Benjegerdes [EMAIL PROTECTED] wrote:
> I'm about to try this, and figured it would be worth documenting on the wiki..
> http://wiki.lustre.org/index.php?title=Debian_Install
> So far the only issue is debian.internal.sanger.ac.uk is not visible to us outsiders ;)

--
Robert LeBlanc
Life Sciences Computer Support
Brigham Young University
[EMAIL PROTECTED]
(801)422-1882
Re: [Lustre-discuss] Re-using OST and MDT names
In replying to myself here, I did manage to get my disk mounted to perform a new benchmark test.  I could not figure a way around the

mount.lustre: mount /dev/sdg1 at /srv/lustre/mds/crew5-MDT failed: Address already in use
The target service's index is already in use. (/dev/sdg1)

error.  Even rebooting both the OSS and MDS did not help.  So, as this was just a test of hardware with a different stripesize setting on an LSI ELP RAID card (128kB in place of the default 64kB), I re-created both the OST and the MDT using a new, unique fsname and all of the same hardware.  This worked like a charm.  Someday I will have to figure out what to do with the cast-off MDT names which I apparently may no longer use...  Any comments, observations, or suggestions appreciated.

Later,
megan

On Aug 14, 12:55 pm, Ms. Megan Larko [EMAIL PROTECTED] wrote:
> Hello,
> As a part of my continuing to benchmark Lustre to ascertain what may be best suited for our needs here, I have re-created at the LSI ELP card level some of my arrays from my earlier benchmark posts.  The card is now sending /dev/sdf 998999 Mb with 128kB stripesize and /dev/sdg 6992995 Mb with 128kB stripesize to my OSS.  The sdf and sdg formatted fine with Lustre and mounted without issue on the OSS.  Recycling the MGS MDT's seems to have been a problem.  When I tried to mount the MDT on the MGS after mounting the new OST's, the mounts performed without error, but the bonnie benchmark test as run before would hang every time.
>
> Sample of errors in MGS file /var/log/messages:
> Aug 13 12:39:30 mds1 kernel: Lustre: crew5-OST0001-osc: Connection to service crew5-OST0001 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
> Aug 13 12:39:30 mds1 kernel: LustreError: 167-0: This client was evicted by crew5-OST0001; in progress operations using this service will fail.
> Aug 13 12:39:30 mds1 kernel: Lustre: crew5-OST0001-osc: Connection restored to service crew5-OST0001 using nid [EMAIL PROTECTED]
> Aug 13 12:39:30 mds1 kernel: Lustre: MDS crew5-MDT: crew5-OST0001_UUID now active, resetting orphans
> Aug 13 12:42:42 mds1 kernel: Lustre: 3406:0:(ldlm_lib.c:519:target_handle_reconnect()) crew5-MDT: 50b043bb-0e8c-7a5b-b0fe-6bdb67d21e0b reconnecting
> Aug 13 12:42:42 mds1 kernel: Lustre: 3406:0:(ldlm_lib.c:519:target_handle_reconnect()) Skipped 24 previous similar messages
> Aug 13 12:42:42 mds1 kernel: Lustre: 3406:0:(ldlm_lib.c:747:target_handle_connect()) crew5-MDT: refuse reconnection from [EMAIL PROTECTED]@o2ib to 0x81006994d000; still busy with 2 active RPCs
> Aug 13 12:42:42 mds1 kernel: Lustre: 3406:0:(ldlm_lib.c:747:target_handle_connect()) Skipped 24 previous similar messages
> Aug 13 12:42:42 mds1 kernel: LustreError: 3406:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-16) [EMAIL PROTECTED] x600107/t0 o38-[EMAIL PROTECTED]:-1 lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
> Aug 13 12:42:42 mds1 kernel: LustreError: 3406:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped 24 previous similar messages
> Aug 13 12:43:40 mds1 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED]  The ost_connect operation failed with -19
> Aug 13 12:43:40 mds1 kernel: LustreError: Skipped 7 previous similar messages
> Aug 13 12:47:50 mds1 kernel: Lustre: crew5-OST0001-osc: Connection to service crew5-OST0001 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
>
> Sample of errors on OSS file /var/log/messages:
> Aug 13 12:39:30 oss4 kernel: Lustre: crew5-OST0001: received MDS connection from [EMAIL PROTECTED]
> Aug 13 12:43:57 oss4 kernel: Lustre: crew5-OST0001: haven't heard from client crew5-mdtlov_UUID (at [EMAIL PROTECTED]) in 267 seconds.  I think it's dead, and I am evicting it.
> Aug 13 12:46:27 oss4 kernel: LustreError: 137-5: UUID 'crew5-OST_UUID' is not available for connect (no target)
> Aug 13 12:46:27 oss4 kernel: LustreError: Skipped 51 previous similar messages
> Aug 13 12:46:27 oss4 kernel: LustreError: 4151:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-19) [EMAIL PROTECTED] x600171/t0 o8-?@?:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -19/0
> Aug 13 12:46:27 oss4 kernel: LustreError: 4151:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped 52 previous similar messages
> Aug 13 12:47:50 oss4 kernel: Lustre: crew5-OST0001: received MDS connection from [EMAIL PROTECTED]
>
> In lctl all pings were successful.  Additionally, files on Lustre disks on our live system using the same MGS were all fine; no errors in the logfile.  I thought that maybe changing the disk kB size and reformatting the OST without reformatting the MDT was a problem.  So I unmounted OST and
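For reference, the "target service's index is already in use" error that Megan hit is a configuration-log problem on the MGS rather than a hardware one, and regenerating the logs with writeconf can often clear it without resorting to a new fsname.  A hedged sketch of the generic 1.6-era procedure, not verified against this exact setup, with hypothetical device paths:

```shell
# With the whole filesystem unmounted (clients, OSTs, then MDT):
# regenerate the configuration logs so stale target entries are dropped.
# Device paths below are hypothetical placeholders.
tunefs.lustre --writeconf /dev/mdt_device   # on the MDS/MGS node
tunefs.lustre --writeconf /dev/ost_device   # on each OSS, for each OST

# Then remount in the usual order: MGS/MDT first, then the OSTs.
```

See the writeconf section of the Lustre 1.6 manual before trying this on a filesystem with data you care about.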
Re: [Lustre-discuss] mounting an OST from another node attached to a fibre channel switch
Which section of which manual?  Please.  I.e., are you talking about:

Part No. 820-3681-10, Lustre manual version Lustre_1.6_man_v1.10, December 2007
4.2.3.2 Running the Writeconf Command
and/or
4.2.3.3 Changing a Server NID

Our MGS is not changing nodes, but the OSTs are.  Is there really just one simple MGS-only operation?  Those sections do not quite fit (to my way of thinking, which is in the process of being adjusted :}

Thanks,
Ron

On Aug 14, 4:33 pm, Klaus Steden [EMAIL PROTECTED] wrote:
> Yes.  There is an entry in the manual on this topic.  You'll have to stop Lustre and update the MGS configuration, but it's a pretty quick operation.
>
> cheers,
> Klaus
>
> On 8/14/08 2:29 PM, Ron [EMAIL PROTECTED] did etch on stone tablets:
> > Hi,
> > We have set up a couple of test systems where the OSTs are mounted on the same node as the MDT.  The OSTs are LUNs on a SATA Beast controller accessible from multiple systems attached to a fibre channel switch.  We would like to umount an OST from the MDT system and mount it on another system.  We've tried doing this, and even though there is network traffic between the new system and the mds system, the mds system seems to be ignoring the OST mount.  Can an OST's OSS change?
> > Thanks,
> > Ron
Re: [Lustre-discuss] lustre + debian
On Fri, Aug 15, 2008 at 9:48 AM, Robert LeBlanc [EMAIL PROTECTED] wrote:
> You should be able to use official Debian mirrors; that is all that we used.  You don't have to build your own packages from scratch either, as the packages are already in Lenny.  Just apt-get the packages that he built:
>
> lustre-utils        # Userspace lustre utilities
> lustre-dev          # Development headers
> lustre-source       # Source for the kernel module
> lustre-tests        # Test suite
> linux-patch-lustre  # Patch for the linux kernel
>
> And the kernel source, and just skip down to where he builds his kernel.
>
> Robert

Just thought I'd mention the package repository that I maintain as well for debian/ubuntu lustre packages:

http://www.pdsi-scidac.org/repository/debian
http://www.pdsi-scidac.org/repository/ubuntu

I rebuild the latest packages from debian and build them for all distributions of debian and ubuntu (except for ubuntu/intrepid and debian/experimental).  If you don't want to build everything on your own, using mine is what it's there for.

Thanks,
- David Brown

On 8/15/08 10:38 AM, Troy Benjegerdes [EMAIL PROTECTED] wrote:
> I'm about to try this, and figured it would be worth documenting on the wiki..
> http://wiki.lustre.org/index.php?title=Debian_Install
> So far the only issue is debian.internal.sanger.ac.uk is not visible to us outsiders ;)
Re: [Lustre-discuss] mounting an OST from another node attached to a fibre channel switch
Yes.  The MGS contains all the configuration information for the file system it serves, including the locations and network paths for all the MDS and OSS nodes within the file system.  What you want to do is tell the MGS that your OSS(es) have been moved to new addresses and have it update its records.  At that point, your file system will be usable in the new configuration.

hth,
Klaus

On 8/15/08 2:18 PM, Ron [EMAIL PROTECTED] did etch on stone tablets:
> Which section of which manual?  Please.  I.e., are you talking about:
> Part No. 820-3681-10, Lustre manual version Lustre_1.6_man_v1.10, December 2007
> 4.2.3.2 Running the Writeconf Command
> and/or
> 4.2.3.3 Changing a Server NID
> Our MGS is not changing nodes, but the OSTs are.  Is there really just one simple MGS-only operation?  Those sections do not quite fit (to my way of thinking, which is in the process of being adjusted :}
> Thanks,
> Ron
>
> On Aug 14, 4:33 pm, Klaus Steden [EMAIL PROTECTED] wrote:
> > Yes.  There is an entry in the manual on this topic.  You'll have to stop Lustre and update the MGS configuration, but it's a pretty quick operation.
> > cheers,
> > Klaus
> >
> > On 8/14/08 2:29 PM, Ron [EMAIL PROTECTED] did etch on stone tablets:
> > > Hi,
> > > We have set up a couple of test systems where the OSTs are mounted on the same node as the MDT.  The OSTs are LUNs on a SATA Beast controller accessible from multiple systems attached to a fibre channel switch.  We would like to umount an OST from the MDT system and mount it on another system.  We've tried doing this, and even though there is network traffic between the new system and the mds system, the mds system seems to be ignoring the OST mount.  Can an OST's OSS change?
> > > Thanks,
> > > Ron
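The manual sections Ron cites (4.2.3.2 and 4.2.3.3) boil down to stopping the filesystem and rewriting the configuration logs so they carry the targets' new server NIDs.  A hedged sketch of the 1.6-era sequence; device paths and the NID are hypothetical placeholders, and the exact flags should be checked against the manual for your version:

```shell
# 1. Unmount everything: clients, then OSTs, then the MDT.

# 2. Regenerate the configuration logs on each target
#    (devices and NID below are hypothetical examples):
tunefs.lustre --writeconf /dev/mdt_device                # on the MDS/MGS
tunefs.lustre --writeconf --erase-params \
    --mgsnode=10.0.0.1@tcp0 /dev/ost_device              # on the *new* OSS

# 3. Remount MGS/MDT first, then the OSTs from their new homes;
#    the MGS rebuilds its records with the new server NIDs.
```

Since the MGS itself is staying put here, only the moved OSTs need their parameters refreshed; the writeconf pass is what lets the MGS forget the old OSS addresses.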