Re: [Lustre-discuss] lo2iblnd and Mellanox IB question
I've had to rebuild against the Mellanox OFED every time I change Lustre or OFED versions. It's a bit of a catch-22 because you have to build the Mellanox OFED against the Lustre kernel, install the Mellanox OFED, then rebuild the Lustre modules against the Mellanox OFED. The procedure I use is as follows...

* install the upgraded Lustre kernel and kernel-devel rpms
* rebuild the Mellanox OFED against the Lustre kernel
  - mount -o loop MLNX_OFED.iso /root/mnt
  - /root/mnt/docs/mlnx_add_kernel_support.sh -i /root/MLNX_OFED.iso
* install the Mellanox OFED from the rebuilt MLNX_OFED.iso
* install kernel-ib-devel from the rebuilt MLNX_OFED.iso

Now rebuild the lustre-modules RPM to get a ko2iblnd.ko which is compatible with the Mellanox kernel-ib drivers...

* cd /usr/src/lustre-x.x.x
* configure --with-o2ib=/usr/src/openib
* make rpms

Ron.

-----Original Message-----
From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Ms. Megan Larko
Sent: November 20, 2012 4:21 PM
To: Lustre User Discussion Mailing List
Subject: [Lustre-discuss] lo2iblnd and Mellanox IB question

Hello to Everyone!

I have a question to which I think I know the answer, but I am seeking confirmation (re-assurance?). I have built a RHEL 6.2 system with lustre-2.1.2. I am using the rpms from the Whamcloud site for linux kernel 2.6.32_220.17.1.el6_lustre.x86_64 along with the version-matching lustre, lustre-modules, lustre-ldiskfs, and kernel-devel. I also have from the Whamcloud site kernel-ib-1.8.5-2.6.32-220.17.1.el6_lustre.x86_64 and the related kernel-ib-devel for same.

The lustre file system works properly for TCP. I would like to use InfiniBand. The system has a new Mellanox card for which mlxn1 firmware and drivers were installed. After this was done (I cannot speak to before) the IB network will come up on boot and copy and ping in a traditional network fashion.

Hard Part: I would like to run the lustre file system on the IB (ib0). I re-created the lustre network to use /etc/modprobe.d/lustre.conf pointing to o2ib in place of tcp0. I rebuilt the mgs/mdt and all osts to use the IB network (the mgs/mds --failnode=[new_IB_addr] and the osts point to the mgs on the IB net). When I modprobe lustre to start the system I receive error messages stating that there are Input/Output errors on the lustre modules fld.ko, fid.ko, mdc.ko, osc.ko and lov.ko. The lustre.ko cannot be started. A look in /var/log/messages reveals many "Unknown symbol" and "Disagrees about version of symbol" messages from the ko2iblnd module. A modprobe --dump-modversions /path/to/kernel/ko2iblnd.ko shows it pointing to the Module.symvers of the lustre kernel.

Am I correct in thinking that, because of the specific Mellanox IB hardware I have (with its own /usr/src/ofa_kernel/Module.symvers file), I have to build Lustre-2.1.2 from the tarball using configure --with-o2ib=/usr/src/ofa_kernel, mandating that this system use the ofa_kernel-1.8.5 modules and not the OFED 1.8.5 from the kernel-ib rpms to which Lustre defaults? Is a rebuild of lustre from source mandatory, or is there a way in which I may point to the appropriate symbols needed by ko2iblnd.ko?

Enjoy the Thanksgiving holiday for those U.S. readers. To everyone else in the world, have a great weekend!

Megan Larko
Hewlett-Packard
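For readers following the same path, here is a consolidated sketch of the sequence Ron describes. The ISO path, the Lustre source directory and the --with-o2ib path are placeholders; some Mellanox OFED releases install their build tree under /usr/src/ofa_kernel rather than /usr/src/openib, so adjust to whatever your OFED actually provides.

  # 1. install the Lustre kernel and kernel-devel rpms, then boot into that kernel

  # 2. rebuild the Mellanox OFED ISO against the running Lustre kernel
  mount -o loop /root/MLNX_OFED.iso /root/mnt
  /root/mnt/docs/mlnx_add_kernel_support.sh -i /root/MLNX_OFED.iso

  # 3. install the rebuilt OFED, including the kernel-ib and kernel-ib-devel rpms

  # 4. rebuild the Lustre modules against the Mellanox OFED headers
  cd /usr/src/lustre-x.x.x
  ./configure --with-o2ib=/usr/src/openib    # or /usr/src/ofa_kernel, depending on the OFED
  make rpms

  # 5. install the resulting lustre-modules rpm so ko2iblnd matches the kernel-ib drivers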
Re: [Lustre-discuss] Lustre newbie
Without actual error messages and the version of the installed code, I don't think anyone is going to be able to help much. Personally, the first place I would start is the logs on the MDS. You can also get the lustre version from that node by running:

  rpm -qa | grep lustre

Recently, I also inherited a Lustre system (running 1.8.3) which was exhibiting similar issues, and upgrading all the lustre servers to 1.8.8-wc1 (and CentOS 5.8) seems to have resolved all the issues.

Ron.

-----Original Message-----
From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Jason Brooks
Sent: June 28, 2012 2:35 PM
To: lustre-discuss@lists.lustre.org
Subject: [Lustre-discuss] Lustre newbie

Hello,

I am totally new to lustre. I have inherited a couple of clusters which have a lustre filesystem mounted on each node via infiniband. One cluster has 56 nodes on it, the other has about 18. There are 6 lustre servers, five of which are OSS's and the sixth is the MDS.

I have a problem: namely that the lustre filesystem is not mounting at times, or mysteriously unmounts itself. If I try to mount it, at times I will get an error, but I can't recall what it is. I think the latency values have something to do with it, but in truth, I am kind of at a loss as to where to start.

I have lustre 1.8 installed. I have the pdf manual by Sun, but what it really doesn't seem to illustrate well is how to step into a running system. Do any of you have any recommended reading?

Thanks!

--jason
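As a starting point for the kind of information Ron is asking for, something along these lines run on the MDS and on an affected client usually gives the list enough to work with. The mount point, log path and fabric (o2ib0 vs tcp0) are generic examples, not anything specific to Jason's site:

  # installed Lustre packages on each server and client
  rpm -qa | grep -i lustre

  # recent Lustre/LNET messages on the MDS and OSS nodes
  grep -i -E 'lustre|lnet' /var/log/messages | tail -n 100

  # list the configured Lustre devices and their state
  lctl dl

  # capture the exact error the next time a client mount fails
  mount -t lustre <mgsnid>@o2ib0:/<fsname> /mnt/lustre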
Re: [Lustre-discuss] Hardware upgrade routes
Hi Daniel,

I have done this on a live file system by deactivating the OSTs, migrating the data off the OSTs in question (see section 26.2 of the manual for a sample script), replacing the hardware and migrating it back.

_
Ron Jerome
Programmer/Analyst
National Research Council Canada
1200 Montreal Road, Ottawa, Ontario K1A 0R6
Government of Canada

From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of daniel.ha...@stfc.ac.uk
Sent: November 16, 2009 9:18 AM
To: lustre-discuss@lists.lustre.org
Subject: [Lustre-discuss] Hardware upgrade routes

Hi,

I'm interested to find out what possible solutions there are for upgrading storage hardware within a cluster over time, either following a failure or just through nodes coming to the end of their normal working life. We would expect a cluster to exist for many years whereas the individual nodes may only last a few years each. Ideally it should be possible to migrate data off an OST as required, but there doesn't appear to be anything in the manual which covers this use case specifically.

The closest thing seems to be in section 4.3.11 of the manual, "Removing and Restoring OSTs" (http://manual.lustre.org/manual/LustreManual18_HTML/ConfiguringLustre.html#50532400_57420):

"OSTs can be removed from and restored to a Lustre file system. Currently in Lustre, removing an OST really means that the OST is 'deactivated' in the file system, not permanently removed. A removed OST still appears in the file system; do not create a new OST with the same name."

Thus one route to migration to new hardware could be to remove an OST (making sure the name is not reused) and then use step 2.5 in section 4.3.11.1 to copy to the _new_ hardware, rather than recovering to the same hardware.

Does anyone have experience with this type of use case, or knowledge of alternative ways of handling this, which they could describe for me?

Many thanks,
Daniel.

***
British Atmospheric Data Centre
STFC Rutherford Appleton Laboratory
Chilton, Didcot, Oxfordshire, OX11 0QX
***
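A rough sketch of the drain-and-replace flow Ron describes. Device numbers, the file system name and mount points are examples only; newer releases ship an lfs_migrate helper script, while the 1.8-era manual's sample script simply copies and renames each file, which is what is shown here:

  # on the MDS: find the OSC device number for the OST being retired, then deactivate it
  lctl dl | grep osc
  lctl --device <devno> deactivate      # no new objects will be placed on this OST

  # on a client: list files with objects on that OST and rewrite them onto active OSTs
  lfs find /mnt/lustre --obd <fsname>-OST0004_UUID > /tmp/ost0004-files
  while read f; do
      cp -a "$f" "$f.tmp" && mv "$f.tmp" "$f"   # the rewrite allocates new objects elsewhere
  done < /tmp/ost0004-files

  # once the OST is empty: replace the hardware, restore or reformat the OST, then
  lctl --device <devno> activate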
Re: [Lustre-discuss] Bad distribution of files among OSTs
Another question I had with regards to this is how long have your OSS's been running without a reboot? Mine have been up for 148 days, which is probably longer than ever before. And now that I've said this, it just occurred to me that one of them was rebooted about three weeks ago and all the others have been up for almost 6 months. I don't know if this has any relevance, but it's the only thing I can think of that's different.

Ron.

-----Original Message-----
From: lustre-discuss-boun...@lists.lustre.org on behalf of Thomas Roth
Sent: Sun 11/1/2009 4:03 AM
To: Andreas Dilger
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] Bad distribution of files among OSTs

Another question: Could this situation, 10 full OSTs out of 200, lead to a significant drop in performance? Before, we could usually get the full 110MB/s or so over the 1Gbit/s ethernet lines of the clients. That had dropped to about 50%, but we did not find any other odd thing than the filling levels of the OSTs.

Regards,
Thomas

Andreas Dilger wrote:

On 2009-10-30, at 12:07, Thomas Roth wrote:

> in our 196-OST cluster, the previously perfect distribution of files among the OSTs is not working anymore, since ~ 2 weeks. The filling for most OSTs is between 57% and 62%, but some (~10) have risen up to 94%. I'm trying to fix that by having these OSTs deactivated on the MDT and finding and migrating away data from them, but it seems I'm not fast enough and it's an ongoing problem - I've just deactivated another OST with a threatening 67%.

Is this correlated to some upgrade of Lustre? What version are you using?

> Our qos_prio_free is at the default 90%. Our OST sizes are between 2.3TB and 4.5TB. We use striping level 1, so it would be possible to fill up an OST by just creating a 2TB file. However, I'm not aware of any such gigafiles (using robinhood to get a picture of our file system).

To fill the smallest OST from 60% to 90% would only need a few files that total 0.3 * 2.3TB, or 690GB. One way to find such files is to mount the full OSTs with ldiskfs and do "find /mnt/ost/O/0 -size +100G" to list the object IDs that are very large, and then in bug 21244 I've written a small program that dumps the MDS inode number from the specified objects. You can then use "debugfs -c -R 'ncheck {list of inode numbers}' /dev/${mdsdev}" on the MDS to find the pathnames of those files.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

--
Thomas Roth
Gesellschaft für Schwerionenforschung
Planckstr. 1 - 64291 Darmstadt, Germany
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986
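Putting Andreas's steps together, the hunt for the offending files looks roughly like this. The OST device name and mount point are examples; the object-to-MDS-inode step needs the helper program attached to bug 21244, which is not reproduced here:

  # on the OSS: mount the full OST read-only as ldiskfs and look for oversized objects
  mount -t ldiskfs -o ro /dev/sdX /mnt/ost
  find /mnt/ost/O/0 -size +100G -ls

  # map those object IDs to MDS inode numbers with the bug 21244 helper, then on the MDS
  # resolve the inode numbers to pathnames:
  debugfs -c -R "ncheck <inode1> <inode2> ..." /dev/${mdsdev}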
Re: [Lustre-discuss] Bad distribution of files among OSTs
Strangely (although I'm sure it's not related) I have seen the exact same behavior on my Lustre cluster in the last month or so. I have also never seen this before, and to the best of my knowledge there is no change in usage patterns. I'm running 1.6.7.2 on the servers.

Ron Jerome
National Research Council Canada.

-----Original Message-----
From: lustre-discuss-boun...@lists.lustre.org on behalf of Thomas Roth
Sent: Fri 10/30/2009 2:07 PM
To: lustre-discuss@lists.lustre.org
Subject: [Lustre-discuss] Bad distribution of files among OSTs

Hi all,

in our 196-OST cluster, the previously perfect distribution of files among the OSTs is not working anymore, since ~ 2 weeks. The filling for most OSTs is between 57% and 62%, but some (~10) have risen up to 94%. I'm trying to fix that by having these OSTs deactivated on the MDT and finding and migrating away data from them, but it seems I'm not fast enough and it's an ongoing problem - I've just deactivated another OST with a threatening 67%.

Our qos_prio_free is at the default 90%. Our OST sizes are between 2.3TB and 4.5TB. We use striping level 1, so it would be possible to fill up an OST by just creating a 2TB file. However, I'm not aware of any such gigafiles (using robinhood to get a picture of our file system). In addition, our users' behavior should not have changed recently.

In August, the entire cluster had filled up to almost 80% in a neatly even distribution among the OSTs, so we extended the cluster by more OSTs, migrating data to even the filling between old and new ones. This also succeeded, and up to October there was no indication of something not working. There are no error messages in the logs that would point to some OSTs being favored ;-)

So, what could be the cause of this misdistribution?

Regards,
Thomas
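For anyone watching for the same symptom, the imbalance is easy to spot from a client, and the qos_prio_free tunable Thomas mentions lives under /proc on the MDS in the 1.6/1.8 era. The file system name and mount point below are placeholders:

  # per-OST usage as seen from a client
  lfs df -h /mnt/lustre

  # QOS allocator weighting on the MDS (default 90%)
  cat /proc/fs/lustre/lov/<fsname>-mdtlov/qos_prio_free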
Re: [Lustre-discuss] MDT backup procedure
Hi John,

I migrated my MGS/MDT to new hardware just a few weeks ago without much difficulty. I did not use LVM snapshots though, rather the procedure outlined in the manual (section 15.1.3.1 of the 1.6 manual) using tar (with the sparse option, this is very important!) and getfattr. Mine is a combined MGS/MDT, so I also needed to run tunefs.lustre --writeconf to get the OSTs to update their configuration logs on the new server. I gave the new server the same IP address as the old one, so there weren't any issues with changing nids. It's been running great ever since.

FYI, it took a few hours to create the tar and extended attribute files on the old server (~3.4M inodes) and about half that time to restore them onto the new server (faster disks :) All in all, about 4 hours of down time.

Ron Jerome
National Research Council Canada.

-----Original Message-----
From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of John White
Sent: June 19, 2009 1:58 AM
To: Adam Knight
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] MDT backup procedure

On Jun 17, 2009, at 4:09 PM, Adam Knight wrote:

> Pertaining to your original email, rather than taking the MDT down to back up, it is very convenient to use LVM snapshots. With this functionality it creates an LV duplicate of the MDT and allows you to mount that as ldiskfs and back up files from a consistent copy (it won't be changing even if your MDT continues to add/remove data). Your lustre filesystem will therefore stay operational during the backup. If you time it cleverly, you can snapshot your MDT and OSTs at the same time and back up from all of them to have a consistent copy of the whole filesystem as well.

So following this, has anyone migrated an MDT to new storage with this sort of procedure?

- create an LVM'd MDT that produces snapshots
- use it for a while in production
- get some snazzy new disk
- shut down lustre
- take a snapshot of the MDT and shuffle it off to some different storage media
- create a new LVM with the snazzy new disk (specifically of a different size from the original MDT)
- restore the snapshot
- run lfsck for good measure (is this advisable on what could feasibly be a clean filesystem?)
- bring up lustre

Please keep in mind, I've used LVM but haven't used snapshots, so I'm not familiar with their limitations. We're looking to create a filesystem immediately but would like to get some much faster storage for the MDT later without burning and building a new FS.

Thanks for this verbose reply. It is exactly what I needed and what I suspected I would run into. We are planning on multiple backup procedures. Users will back up at checkpoints in their work flow, IT will back up the MDT nightly, and we are also looking at the possibility of backing up the complete file system. Thanks again for everyone's input, this gives me some good ammunition going forward for proposals.

Thanks,
Dan Kulinski
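For reference, a sketch of the manual-style backup-and-restore that Ron followed. Device names, mount points and file names are placeholders, and the exact steps in the manual for your release should be checked before trusting this with real metadata:

  # on the old MDS, with the MDT unmounted from Lustre:
  mount -t ldiskfs /dev/old_mdt /mnt/mdt
  cd /mnt/mdt
  getfattr -R -d -m '.*' -e hex -P . > /tmp/mdt_ea.bak   # save the extended attributes
  tar -czSf /tmp/mdt_backup.tgz .                        # -S/--sparse keeps the archive small
  cd /; umount /mnt/mdt

  # on the new MDS, after formatting the new MDT device:
  mount -t ldiskfs /dev/new_mdt /mnt/mdt
  cd /mnt/mdt
  tar -xzpf /tmp/mdt_backup.tgz
  setfattr --restore=/tmp/mdt_ea.bak                     # restore the extended attributes
  cd /; umount /mnt/mdt

  # for a combined MGS/MDT, regenerate the config logs so the OSTs re-register
  tunefs.lustre --writeconf /dev/new_mdt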
Re: [Lustre-discuss] Lustre installation and configuration problems
I think the problem you have, as Cliff alluded to, is a mismatch between your kernel version and the version of the Lustre kernel modules. You have kernel 2.6.18-92.el5 and are installing Lustre built for 2.6.18_92.1.17.el5. Note the .1.17 is significant, as the modules will end up in the wrong directory. There is an update to CentOS to bring the kernel to the matching 2.6.18_92.1.17.el5 version; you can pull it off the CentOS mirror site in the updates directory.

Ron.

From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Carlos Santana
Sent: June 17, 2009 11:21 AM
To: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] Lustre installation and configuration problems

And is there any specific installation order for a patchless client? Could someone please share it with me?

- CS.

On Wed, Jun 17, 2009 at 10:18 AM, Carlos Santana neu...@gmail.com wrote:

Huh... :( Sorry to bug you guys again...

I am planning to make a fresh start now as nothing seems to have worked for me. If you have any comments/feedback please share them. I would like to confirm the installation order before I make a fresh start. From Arden's experience (http://lists.lustre.org/pipermail/lustre-discuss/2009-June/010710.html), the lustre-modules package is installed last. As I was installing Lustre 1.8, I was referring to the 1.8 operations manual (http://manual.lustre.org/index.php?title=Main_Page). The installation order in the manual is different than what Arden has suggested. Will it make a difference in configuration at a later stage? Which one should I follow now? Any comments?

Thanks,
CS.

On Wed, Jun 17, 2009 at 12:35 AM, Carlos Santana neu...@gmail.com wrote:

Thanks Cliff. The depmod -a was successful before as well. I am using a CentOS 5.2 box. Following are the packages installed:

  [r...@localhost tmp]# rpm -qa | grep -i lustre
  lustre-modules-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp
  lustre-1.8.0-2.6.18_92.1.17.el5_lustre.1.8.0smp
  [r...@localhost tmp]# uname -a
  Linux localhost.localdomain 2.6.18-92.el5 #1 SMP Tue Jun 10 18:49:47 EDT 2008 i686 i686 i386 GNU/Linux

And here is an output from strace for mount: http://www.heypasteit.com/clip/8WT

Any further debugging hints?

Thanks,
CS.

On 6/16/09, Cliff White cliff.wh...@sun.com wrote:

Carlos Santana wrote:
> The '$ modprobe -l lustre*' did not show any module on a patchless client. modprobe -v returns 'FATAL: Module lustre not found'. How do I install a patchless client? I have tried the lustre-client-modules and lustre-client-<ver> rpm packages in both sequences. Am I missing anything?

Make sure the lustre-client-modules package matches your running kernel. Run depmod -a to be sure.

cliffw

Thanks,
CS.

On Tue, Jun 16, 2009 at 2:28 PM, Cliff White cliff.wh...@sun.com wrote:

Carlos Santana wrote:
> The lctl ping and 'net up' failed with the following messages:
>
>   [r...@localhost ~]# lctl ping 10.0.0.42
>   opening /dev/lnet failed: No such device
>   hint: the kernel modules may not be loaded
>   failed to ping 10.0.0...@tcp: No such device
>   [r...@localhost ~]# lctl network up
>   opening /dev/lnet failed: No such device
>   hint: the kernel modules may not be loaded
>   LNET configure error 19: No such device

Make sure modules are unloaded, then try modprobe -v. Looks like you have lnet mis-configured; if your module options are wrong, you will see an error during the modprobe.

cliffw

> I tried the lustre_rmmod and depmod commands and they did not return any error messages. Any further clues? Reinstall the patchless client again?
>
> - CS.

On Tue, Jun 16, 2009 at 1:32 PM, Cliff White cliff.wh...@sun.com wrote:

Carlos Santana wrote:
> I was able to run lustre_rmmod and depmod successfully. The '$ lctl list_nids' returned the server ip address and interface (tcp0). I tried to mount the file system on a remote client, but it failed with the following message.
>
>   [r...@localhost ~]# mount -t lustre 10.0.0...@tcp0:/lustre /mnt/lustre
>   mount.lustre: mount 10.0.0...@tcp0:/lustre at /mnt/lustre failed: No such device
>   Are the lustre modules loaded?
>   Check /etc/modprobe.conf and /proc/filesystems
>   Note 'alias lustre llite' should be removed from modprobe.conf
>
> However, the mounting is successful on a single node configuration - with the client on the same machine as the MDS and OST. Any clues? Where to look for logs and debug
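Since the thread keeps circling around patchless-client setup, here is a minimal sequence that matches the advice above. The package names follow the 1.8.0 examples from Carlos's own listing, and the lnet options line is a generic example, not his actual configuration:

  # the client packages must match the *running* kernel exactly
  uname -r
  rpm -qa | grep lustre-client

  # install the modules first, then the userspace tools, and rebuild module deps
  rpm -ivh lustre-client-modules-<ver>_<kernel>.rpm
  rpm -ivh lustre-client-<ver>_<kernel>.rpm
  depmod -a

  # tell LNET which interface to use (in /etc/modprobe.conf on EL5):
  #   options lnet networks=tcp0(eth0)

  modprobe lustre
  lctl list_nids
  mount -t lustre <mgsnid>@tcp0:/<fsname> /mnt/lustre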
Re: [Lustre-discuss] MDT backup/restore
Ok, replying to myself for the benefit of others who stumble upon this...

Using the -S (or --sparse) argument on the tar command when archiving MDT/MDS file systems solves the issue of the restored files being larger than the original and thus not fitting on the target file system. I would suggest that adding this to the documentation might be beneficial to all who attempt to move an MDS file system :-)

Thanks to the Lustre team for all their hard work,

Ron.

-----Original Message-----
From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Jerome, Ron
Sent: May 21, 2009 3:39 PM
To: lustre-discuss@lists.lustre.org
Cc: Brian J. Murrell
Subject: Re: [Lustre-discuss] MDT backup/restore

Hmmm, a little research in the archives leads me to believe that the --sparse option is required on the tar create command line. Would this be correct?

BTW, this MDT is running 1.6.7.1

Thanks,

_
Ron Jerome
Programmer/Analyst
National Research Council Canada
M-2, 1200 Montreal Road, Ottawa, Ontario K1A 0R6
Government of Canada
Phone: 613-993-5346  FAX: 613-941-1571
_

-----Original Message-----
From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Brian J. Murrell
Sent: May 21, 2009 2:15 PM
To: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] MDT backup/restore

On Thu, 2009-05-21 at 13:39 -0400, Jerome, Ron wrote:
> Hi all,

Hi Ron,

> I'm attempting to move my MDT to a new server and I'm seeing strange behavior when trying to restore the MDT tar file taken from the original disk to the new one. Basically, the existing MDT data appears to use about 2.5G, however when I restore the tar file on the new server it completely fills a 120G partition and the restoration fails with out of disk space errors??

There has been a lot of discussion on this list about backing up the MDT for relocation. Please review the archives. IIRC, there was even somebody reporting this exact issue.

b.
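A minimal illustration of the flag in question (paths are placeholders): without -S, the zero-filled regions of the MDT's sparse files are written out in full on restore, which is what blows a 2.5G metadata set past a 120G partition.

  # archive the ldiskfs-mounted MDT, preserving sparseness
  cd /mnt/mdt && tar -cSzf /tmp/mdt_backup.tgz .
  # restore on the new device
  cd /mnt/new_mdt && tar -xpzf /tmp/mdt_backup.tgz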
[Lustre-discuss] MDT backup/restore
Hi all,

I'm attempting to move my MDT to a new server and I'm seeing strange behavior when trying to restore the MDT tar file taken from the original disk to the new one. Basically, the existing MDT data appears to use about 2.5G, however when I restore the tar file on the new server it completely fills a 120G partition and the restoration fails with out of disk space errors??

I'm a little mystified as to what is happening here since the disk format is more or less the same. What follows is the disk information for the original disk followed by the new one. Any insights would be greatly appreciated...

= Original MDT =

[r...@mds1 data-MDT]# df -i
Filesystem            Inodes    IUsed     IFree IUse% Mounted on
/dev/md0            78151680  3368714  74782966    5% /mnt/data

[r...@mds1 data-MDT]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md0              112G  2.5G  102G   3% /mnt/data

[r...@mds1 data-MDT]# tune2fs -l /dev/md0
tune2fs 1.40.11.sun1 (17-June-2008)
device /dev/md0 mounted by lustre per /proc/fs/lustre/mds/data-MDT/mntdev
Filesystem volume name:   data-MDT
Last mounted on:          not available
Filesystem UUID:          57449444-0de9-42e3-a919-70d216bc01f5
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              78151680
Block count:              39070048
Reserved block count:     1953502
Free blocks:              28643263
Free inodes:              74782966
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1024
Blocks per group:         16384
Fragments per group:      16384
Inodes per group:         32768
Inode blocks per group:   4096
Filesystem created:       Tue May 29 13:49:47 2007
Last mount time:          Thu May 21 13:18:25 2009
Last write time:          Thu May 21 13:18:25 2009
Mount count:              180
Maximum mount count:      22
Last checked:             Tue May 29 13:49:47 2007
Check interval:           15552000 (6 months)
Next check after:         Sun Nov 25 12:49:47 2007
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               512
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      cf133553-2ae6-42a1-b0b5-cfbad6fd104b
Journal backup:           inode blocks

= New MDT =

[r...@mds2 ~]# df -i
Filesystem            Inodes   IUsed     IFree IUse% Mounted on
/dev/md2            35848192  384427  35463765    2% /root/data

[r...@mds2 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md2              120G  120G     0 100% /root/data

[r...@mds2 ~]# tune2fs -l /dev/md2
tune2fs 1.40.11.sun1 (17-June-2008)
Filesystem volume name:   data-MDT
Last mounted on:          not available
Filesystem UUID:          d197b003-5ce5-4d1b-8253-b196b2009d07
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file uninit_groups
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              35848192
Block count:              35842992
Reserved block count:     1792149
Free blocks:              125970
Free inodes:              35463765
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1015
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         32768
Inode blocks per group:   4096
Filesystem created:       Wed May 20 16:45:54 2009
Last mount time:          Thu May 21 13:27:52 2009
Last write time:          Thu May 21 13:27:52 2009
Mount count:              14
Maximum mount count:      26
Last checked:             Wed May 20 16:45:54 2009
Check interval:           15552000 (6 months)
Next check after:         Mon Nov 16 15:45:54 2009
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               512
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      819ff3d7-b341-4300-8cd8-b6a6b3020c4e
Journal backup:           inode blocks

_
Ron Jerome
Programmer/Analyst
National Research Council Canada
M-2, 1200 Montreal Road, Ottawa, Ontario K1A 0R6
Government of Canada
Phone: 613-993-5346  FAX: 613-941-1571
Re: [Lustre-discuss] MDT backup/restore
Hmmm, a little research in the archives leads me to believe that the --sparse option is required on the tar create command line. Would this be correct?

BTW, this MDT is running 1.6.7.1

Thanks,

_
Ron Jerome
Programmer/Analyst
National Research Council Canada
M-2, 1200 Montreal Road, Ottawa, Ontario K1A 0R6
Government of Canada
Phone: 613-993-5346  FAX: 613-941-1571
_

-----Original Message-----
From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Brian J. Murrell
Sent: May 21, 2009 2:15 PM
To: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] MDT backup/restore

On Thu, 2009-05-21 at 13:39 -0400, Jerome, Ron wrote:
> Hi all,

Hi Ron,

> I'm attempting to move my MDT to a new server and I'm seeing strange behavior when trying to restore the MDT tar file taken from the original disk to the new one. Basically, the existing MDT data appears to use about 2.5G, however when I restore the tar file on the new server it completely fills a 120G partition and the restoration fails with out of disk space errors??

There has been a lot of discussion on this list about backing up the MDT for relocation. Please review the archives. IIRC, there was even somebody reporting this exact issue.

b.
Re: [Lustre-discuss] speedy server shutdown
If I'm not mistaken, umount -f will unmount your OSTs (in fact any lustre mount, be it MDS, MGS, OST or client) without delay.

Ron Jerome
National Research Council Canada

-----Original Message-----
From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Robin Humble
Sent: February 8, 2009 11:29 PM
To: lustre-discuss@lists.lustre.org
Subject: [Lustre-discuss] speedy server shutdown

Hi,

when shutting down our OSS's and then MDS's we often wait 330s for each set of umount's to finish, eg.

  Feb  2 03:20:06 xemds2 kernel: Lustre: Mount still busy with 68 refs, waiting for 330 secs...
  Feb  2 03:20:11 xemds2 kernel: Lustre: Mount still busy with 68 refs, waiting for 325 secs...
  ...

is there a way to speed this up?

we're interested in the (perhaps unusual) case where all clients are gone because the power has failed, and the Lustre servers are running on UPS and need to be shut down ASAP. the tangible reward for a quick shutdown is that we can buy a lower capacity (cheaper) UPS if we can reliably and cleanly shut down all the Lustre servers in 10 mins, and preferably 3 minutes. if we're tweaking timeouts to do this then hopefully we can tweak them just before the shutdown and avoid running short timeouts in normal operation.

I'm probably missing something obvious, but I have looked through a bunch of /proc/{fs/lustre,sys/lnet,sys/lustre} entries and the Operations Manual and I can't actually see where the default 330s comes from... ??? it seems to be quite repeatable for both OSS's and MDS's.

we're using Lustre 1.6.6 or 1.6.5.1 on servers and patchless 1.6.4.3 on clients with x86_64 RHEL 5.2 everywhere.

thanks for any help!

cheers,
robin
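To make that concrete, a minimal sketch mirroring the order Robin already uses (mount points are examples; forcing the unmount skips waiting for the absent clients, and the targets simply go through recovery on the next mount):

  # on each OSS
  umount -f /mnt/lustre/ost0
  umount -f /mnt/lustre/ost1

  # then on the MDS/MGS
  umount -f /mnt/lustre/mdt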
Re: [Lustre-discuss] 1.6.5.1 - 1.6.6
For what it's worth, I've upgraded through every version of the Lustre 1.6.x series starting at 1.6.0, and am now running 1.6.6 on a production cluster (32TB in size with 5 OSS's, 15 OST's and one MDS/MDT), and I have never once run tunefs.lustre during the upgrade process.

Ron Jerome
National Research Council Canada.

-----Original Message-----
From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Jakob Goldbach
Sent: January 7, 2009 9:41 AM
To: lustre-discuss
Subject: [Lustre-discuss] 1.6.5.1 - 1.6.6

Hi,

The manual says I need to "tunefs.lustre --mgs --writeconf ..." when upgrading from 1.6.5.1 to 1.6.6. What happens if I want to downgrade to 1.6.5.1 again? I come from 1.6.4.3 - does that make a difference?

Thanks
/Jakob
Re: [Lustre-discuss] Lustre clients failing, and cant reconnect
For what it's worth... I've seen similar problems with clients not being able to connect to OSSs.

SERVER OS: Linux oss1 2.6.18-53.1.14.el5_lustre.1.6.5.1smp #1 SMP Thu Jun 26 01:38:50 EDT 2008 i686 i686 i386 GNU/Linux
CLIENT OS: Linux x15 2.6.18-53.1.14.el5_lustre.1.6.5smp #1 SMP Mon May 12 22:24:24 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

On the client side I see this...

CLIENT LOG =
Sep 1 15:17:22 x15 kernel: Lustre: Request x30990319 sent from data-OST0004-osc-81022067ec00 to NID [EMAIL PROTECTED] 100s ago has timed out (limit 100s).
Sep 1 15:17:22 x15 kernel: Lustre: Skipped 9 previous similar messages
Sep 1 15:17:22 x15 kernel: Lustre: data-OST0004-osc-81022067ec00: Connection to service data-OST0004 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Sep 1 15:17:22 x15 kernel: LustreError: 3834:0:(ldlm_request.c:986:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
Sep 1 15:17:22 x15 kernel: LustreError: 3834:0:(ldlm_request.c:1575:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
Sep 1 15:17:22 x15 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
Sep 1 15:17:22 x15 kernel: LustreError: Skipped 2 previous similar messages
Sep 1 15:17:47 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) data-OST0004-osc-81022067ec00: tried all connections, increasing latency to 6s
Sep 1 15:17:47 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) Skipped 4 previous similar messages
Sep 1 15:17:47 x15 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
Sep 1 15:18:37 x15 last message repeated 2 times
Sep 1 15:19:02 x15 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
Sep 1 15:19:27 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) data-OST0004-osc-81022067ec00: tried all connections, increasing latency to 26s
Sep 1 15:19:27 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) Skipped 3 previous similar messages
Sep 1 15:19:27 x15 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
Sep 1 15:20:17 x15 last message repeated 2 times
Sep 1 15:21:07 x15 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
Sep 1 15:21:07 x15 kernel: LustreError: Skipped 1 previous similar message
Sep 1 15:21:57 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) data-OST0004-osc-81022067ec00: tried all connections, increasing latency to 51s
Sep 1 15:21:57 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) Skipped 5 previous similar messages
Sep 1 15:22:22 x15 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
Sep 1 15:22:22 x15 kernel: LustreError: Skipped 2 previous similar messages
Sep 1 15:24:52 x15 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
Sep 1 15:24:52 x15 kernel: LustreError: Skipped 5 previous similar messages
Sep 1 15:27:22 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) data-OST0004-osc-81022067ec00: tried all connections, increasing latency to 51s
Sep 1 15:27:22 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) Skipped 12 previous similar messages
Sep 1 15:29:27 x15 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
Sep 1 15:29:27 x15 kernel: LustreError: Skipped 10 previous similar messages
Sep 1 15:37:47 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) data-OST0004-osc-81022067ec00: tried all connections, increasing latency to 51s
Sep 1 15:37:47 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) Skipped 24 previous similar messages
Sep 1 15:38:12 x15 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The ost_connect operation failed with -16
Sep 1 15:38:12 x15 kernel: LustreError: Skipped 20 previous similar messages
Sep 1 15:48:12 x15 kernel: Lustre: 3216:0:(import.c:395:import_select_connection()) data-OST0004-osc-81022067ec00: tried all connections, increasing latency to 51s
END CLIENT LOG =

Server log at corresponding time...

SERVER LOG =
Aug 31 04:02:04 oss1 syslogd
Re: [Lustre-discuss] Problem with e2fsprogs
Further to this, I was able to get e2fsprogs-1.40.7.sun3-0redhat.src.rpm to rebuild by changing the order in which the libraries are built. In Makefile.in I changed LIB_SUBDIRS from...

  LIB_SUBDIRS=lib/et lib/ss lib/e2p lib/ext2fs lib/uuid lib/blkid intl

To...

  LIB_SUBDIRS=lib/et lib/ss lib/ext2fs lib/e2p lib/uuid lib/blkid intl

i.e. building ext2fs prior to e2p. Otherwise ext2_err.h was missing when trying to build e2p.

Ron.

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jerome, Ron
Sent: June 19, 2008 9:18 PM
To: David Frioni
Cc: Lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] Problem with e2fsprogs

Thanks David,

I also tried that before posting to the list, but unfortunately it failed to build, so I didn't really pursue it much further, thinking perhaps it was as simple as the wrong file being packaged with the RHEL4 release.

Ron.

From: David Frioni [mailto:[EMAIL PROTECTED]
Sent: Thu 6/19/2008 5:32 PM
To: Jerome, Ron
Subject: Re: [Lustre-discuss] Problem with e2fsprogs

Jerome-

When I tried to install this I also ran into the libdb-4.3.so dependency problem. I resolved it by downloading the e2fsprogs src rpm from the lustre site and building it as follows:

  rpmbuild --rebuild e2fsprogs-1.40.7.sun3-0redhat.src.rpm

Hope this helps,
Dave

On Jun 19, 2008, at 5:10 PM, Jerome, Ron wrote:

> I downloaded Lustre(TM) 1.6.5 for Red Hat Enterprise Linux 4, i686 and when trying to install it on CentOS 4.6, I get dependency errors as shown below. It almost looks like this e2fsprogs package is for RHEL5, not RHEL4, however I did download it from the 4.x page (I did it twice just to make sure).
>
>   --- Package e2fsprogs.i386 0:1.40.7.sun3-0redhat set to be updated
>   --- Package net-snmp-libs.i386 0:5.1.2-11.el4_6.11.3 set to be updated
>   --- Package lustre-ldiskfs.i686 0:3.0.4-2.6.9_67.0.7.EL_lustre.1.6.5smp set to be updated
>   -- Running transaction check
>   -- Processing Dependency: libc.so.6(GLIBC_2.4) for package: e2fsprogs
>   -- Processing Dependency: libdb-4.3.so for package: e2fsprogs
>   -- Processing Dependency: rtld(GNU_HASH) for package: e2fsprogs
>   -- Finished Dependency Resolution
>   Error: Missing Dependency: libc.so.6(GLIBC_2.4) is needed by package e2fsprogs
>   Error: Missing Dependency: libdb-4.3.so is needed by package e2fsprogs
>   Error: Missing Dependency: rtld(GNU_HASH) is needed by package e2fsprogs
>
> _
> Ron Jerome
> Programmer/Analyst
> National Research Council Canada
> M-2, 1200 Montreal Road, Ottawa, Ontario K1A 0R6
> Government of Canada
> Phone: 613-993-5346  FAX: 613-941-1571
> _
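For anyone hitting the same failure, the change Ron describes can be applied to an unpacked e2fsprogs source tree before building; the directory name and the sed one-liner below are only illustrative (editing Makefile.in by hand does the same thing):

  # inside the unpacked e2fsprogs source from the src.rpm (directory name may differ)
  cd e2fsprogs-1.40.7.sun3
  # build lib/ext2fs before lib/e2p so that ext2_err.h exists when e2p is compiled
  sed -i 's|lib/e2p lib/ext2fs|lib/ext2fs lib/e2p|' Makefile.in
  ./configure
  make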
[Lustre-discuss] Problem with e2fsprogs
I downloaded Lustre(TM) 1.6.5 for Red Hat Enterprise Linux 4, i686 and when trying to install it on CentOS 4.6, I get dependency errors as shown below. It almost looks like this e2fsprogs package is for RHEL5, not RHEL4, however I did download it from the 4.x page (I did it twice just to make sure).

  --- Package e2fsprogs.i386 0:1.40.7.sun3-0redhat set to be updated
  --- Package net-snmp-libs.i386 0:5.1.2-11.el4_6.11.3 set to be updated
  --- Package lustre-ldiskfs.i686 0:3.0.4-2.6.9_67.0.7.EL_lustre.1.6.5smp set to be updated
  -- Running transaction check
  -- Processing Dependency: libc.so.6(GLIBC_2.4) for package: e2fsprogs
  -- Processing Dependency: libdb-4.3.so for package: e2fsprogs
  -- Processing Dependency: rtld(GNU_HASH) for package: e2fsprogs
  -- Finished Dependency Resolution
  Error: Missing Dependency: libc.so.6(GLIBC_2.4) is needed by package e2fsprogs
  Error: Missing Dependency: libdb-4.3.so is needed by package e2fsprogs
  Error: Missing Dependency: rtld(GNU_HASH) is needed by package e2fsprogs

_
Ron Jerome
Programmer/Analyst
National Research Council Canada
M-2, 1200 Montreal Road, Ottawa, Ontario K1A 0R6
Government of Canada
Phone: 613-993-5346  FAX: 613-941-1571
_
Re: [Lustre-discuss] 1.6.3 - 1.6.4.2 upgrade
I asked the same question a while back, and although I was never given a 100% guarantee (they only test between major and minor increments, not across multiple minor increments), the bottom line was that it should work, and it did (from 1.6.0.1 - 1.6.3 without issue). Basically, just install and reboot.

Ron Jerome
National Research Council Canada.

From: [EMAIL PROTECTED] on behalf of Charles Taylor
Sent: Sun 2/17/2008 7:04 PM
To: Lustre-discuss
Subject: Re: [Lustre-discuss] 1.6.3 - 1.6.4.2 upgrade

Turns out that the client upgrade worked just fine. I had not noticed that the ko2iblnd module was not in place. I'm still wondering if I need to do anything special with regard to upgrading the MGS/MDS and OSSs. I'm hoping to just dump the software in place and reboot with live clients. Seems kind of risky, but hey, the docs say you can do it for 1.4 - 1.6.3, so going from 1.6.3 - 1.6.4.2 ought to be a no-brainer, right? :)

Charlie Taylor
UF HPC Center

On Feb 17, 2008, at 3:44 PM, Charles Taylor wrote:

Just updated a single client from 1.6.3 to 1.6.4.2. The documentation seems to indicate that an upgraded client should still be able to mount the file system from a non-upgraded MGS/MDT. The documentation appears to be referring to a 1.4 to 1.6 upgrade but I made the leap that similar things would apply to 1.6.3 - 1.6.4.2. As I said, the servers are still running 1.6.3 and have not been touched. The client is upgraded to 1.6.4.2. When I try to mount the file system I get...

  Is the MGS specification correct?
  Is the filesystem name correct?
  If upgrading, is the copied client log valid? (see upgrade docs)

I've double-checked the first two but I have no idea what the third item refers to. The docs talk about using tunefs to manually copy client config logs when upgrading an MGS/MDS, but they seem to indicate that the only issue on the client one needs to worry about is the form of the mount command. In going from 1.6.3 to 1.6.4.2, that should not be an issue.

Have I missed a step? Do I need to do something to tell the MGS/MDS that the client has been upgraded? Is there newer documentation for going from 1.6.3 to 1.6.4.2? I was hoping that I could just upgrade the software on the MGS/MDS and OSS (in that order) and restart? Is that not the case?

Thanks,
charlie taylor
uf hpc center