Re: [Lustre-discuss] [EXTERNAL] Re: Newbie - Unable to mount the OST on the Client
In Lustre you MUST have an MGS _and_ an MDS node with an associated MDT (metadata target) running on the MDS. Typically the MDS and MGS are configured on the same node. If you don't have this, Lustre won't work.

Joe Mervini, Sandia National Laboratories, High Performance Computing, 505.844.6770, jame...@sandia.gov

On Jan 8, 2015, at 7:23 AM, Gupta, Amit <amit.gu...@optum.com> wrote:

Jongwoo,

This is the mount command that is being used on the client machine:

[root@Client ~]# mount -t lustre 10.177.33.10@tcp:/client1 /mnt/lustre

Here 10.177.33.10 is the MGS server and "client1" is the filesystem name for the OST. On the OSS, this filesystem is mounted as:

/dev/zd912  1069706420  483556  1013552180  1% /client1

The commands used on the OSS to format and mount "client1" were:

mkfs.lustre --ost --fsname=client1 --reformat --index=0 --mgsnode=10.177.33.10@tcp0 /dev/zvol/LSD1Pool/LusterVola0
mount -t lustre /dev/zvol/LSD1Pool/LusterVola0 /client1

From: 한종우 [mailto:jw@apexcns.com]
Sent: Wednesday, January 07, 2015 9:20 PM
To: Gupta, Amit
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] Newbie - Unable to mount the OST on the Client

Hi Gupta,

What I see is:
1. there is no MDS and/or filesystem name
2. the mount command did not specify a filesystem name; otherwise the filesystem name is client1

Try preparing an MDT volume, format it with a filesystem name (on the MGS is okay), mount the MDT, and retry the Lustre mount with the following command:

[root@Client ~]# mount -t lustre 10.177.33.10@tcp:/<FILESYSTEMNAME> /mnt/lustre

Regards,
Jongwoo Han

2015-01-07 22:25 GMT+09:00 Gupta, Amit <amit.gu...@optum.com>:

Hi All,

Need some guidance on getting past this error that I get when mounting the OST on the client. This is the first time I am trying to mount it, so I am not sure what I could be missing. I will be glad to provide any other logs that would be helpful to determine the root cause of this issue. Thanks.

There are 3 servers in the configuration running Lustre kernel-2.6.32-431.20.3.el6_lustre.x86_64:
10.177.33.10 is the MGS/MGT server
10.177.33.9 is the Lustre client
10.177.33.22 is the OSS

[root@Client ~]# mount -t lustre 10.177.33.10@tcp:/client1 /mnt/lustre
mount.lustre: mount 10.177.33.10@tcp:/client1 at /mnt/lustre failed: Invalid argument
This may have multiple causes.
Is 'client1' the correct filesystem name?
Are the mount options correct?
Check the syslog for more info

The OST is mounted on the OSS as /client1.

Error from logs:

Jan 7 13:10:46 client kernel: LustreError: 156-2: The client profile 'client1-client' could not be read from the MGS. Does that filesystem exist?
Jan 7 13:10:46 client kernel: LustreError: 11278:0:(lov_obd.c:951:lov_cleanup()) client1-clilov-8820337cec00: lov tgt 0 not cleaned! deathrow=0, lovrc=1
Jan 7 13:10:46 client kernel: Lustre: Unmounted client1-client
Jan 7 13:10:46 client kernel: LustreError: 13167:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount (-22)

Pinging MGS and client from the OSS:

[root@OSS ]# lctl ping 10.177.33.10
12345-0@lo
12345-10.177.33.10@tcp
[root@OSS]# lctl ping 10.177.33.9
12345-0@lo
12345-10.177.33.9@tcp

OST mounted as /client1 on the OSS:

/dev/zd912  1069706420  483556  1013552180  1% /client1

[root@OSS]# tunefs.lustre /dev/zd912
checking for existing Lustre data: found
Reading CONFIGS/mountdata

Read previous values:
Target: client1-OST
Index: 0
Lustre FS: client1
Mount type: ldiskfs
Flags: 0x2 (OST )
Persistent mount opts: errors=remount-ro
Parameters: mgsnode=10.177.33.10@tcp

Permanent disk data:
Target: client1-OST
Index: 0
Lustre FS: client1
Mount type: ldiskfs
Flags: 0x2 (OST )
Persistent mount opts: errors=remount-ro
Parameters: mgsnode=10.177.33.10@tcp

lctl modules output:
add-symbol-file lustre/osp/osp.o 0xa10f5000
add-symbol-file lustre/mdd/mdd.o 0xa0b12000
add-symbol-file lustre/lod/lod.o 0xa109f000
add-symbol-file lustre/mdt/mdt.o 0xa0fe7000
add-symbol-file lustre/mgs/mgs.o 0xa08c9000
add-symbol-file lustre/lfsck/lfsck.o 0xa0f17000
add-symbol-file lustre/ost/ost.o 0xa018c000
add-symbol-file lustre/mgc/mgc.o 0xa0ef9000
add-symbol-file lustre/quota/lquota.o 0xa0dfc000
add-symbol-file lustre/llite/lustre.o 0xa0ce2000
add-symbol-file lustre/lov/lov.o 0xa0c66000
add-symbol-file lustre/mdc/mdc.o
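For completeness, here is a minimal sketch of the missing step the replies point at: format and mount a combined MGS/MDT before the client mount can resolve the filesystem name. The MDT device path below is a placeholder; the filesystem name and NIDs are taken from the thread.

# On the MGS node (10.177.33.10), create and mount an MDT for the "client1" filesystem
# (/dev/sdX is a placeholder for whatever block device or zvol backs the MDT):
mkfs.lustre --fsname=client1 --mgs --mdt --index=0 /dev/sdX
mkdir -p /mnt/mdt
mount -t lustre /dev/sdX /mnt/mdt

# The OST format/mount shown in the thread stays the same; once the MDT is up,
# the client mount by filesystem name should succeed:
mount -t lustre 10.177.33.10@tcp:/client1 /mnt/lustre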
Re: [Lustre-discuss] [EXTERNAL] Lustre on ZFS MDS/MDT failover
I just ran into this same issue last week. There is a JIRA ticket on it at Intel, but in a nutshell mkfs.lustre on ZFS will only record the last mgsnode you specify in your command. To add an additional failover node you can use the zfs command to update the configuration:

zfs set lustre:failover.node=mgsnode1@network:mgsnode2@network <zpool name>/<zpool volume>

Hope this helps.

Joe Mervini, Sandia National Laboratories, High Performance Computing, 505.844.6770, jame...@sandia.gov

On Dec 1, 2014, at 10:41 AM, Ron Croonenberg <r...@lanl.gov> wrote:

Hello,

We're running/building Lustre on ZFS, and I noticed that when creating the MDT with mkfs.lustre on a zpool, specifying two --mgsnid parameters (one for the MGS and one for the MGS failover) causes a problem that results in not being able to mount the MDT. (I think it tries to connect to the failover node instead of the actual MGS.) With ldiskfs it just works and I can mount the MDT.

For MDT/MDS failover, is it enough to just specify the --failnode parameter, or does the --mgsnid parameter need to be specified too?

thanks,

Ron
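As a concrete illustration of the workaround above, a hedged sketch; the pool name, dataset name, and NIDs are placeholders, and the lustre:failover.node property name is taken directly from the command in the reply:

# Inspect the Lustre configuration properties stored on the dataset backing the target:
zfs get all mdtpool/mdt0 | grep lustre:

# Record both MGS NIDs so the target can reach either server (placeholder NIDs):
zfs set lustre:failover.node=10.0.0.1@o2ib:10.0.0.2@o2ib mdtpool/mdt0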
Re: [Lustre-discuss] [EXTERNAL] Lustre on ZFS MDS/MDT failover
Oh - BTW: you will need to do the same thing with your OSTs to set both mgsnodes. Also, you can use zfs get all <zpool name>/<zpool volume> to see the same info as you would with tunefs.lustre.

Joe Mervini, Sandia National Laboratories, High Performance Computing, 505.844.6770, jame...@sandia.gov

On Dec 1, 2014, at 11:38 AM, Joe Mervini <jame...@sandia.gov> wrote:

I just ran into this same issue last week. There is a JIRA ticket on it at Intel, but in a nutshell mkfs.lustre on ZFS will only record the last mgsnode you specify in your command. To add an additional failover node you can use the zfs command to update the configuration:

zfs set lustre:failover.node=mgsnode1@network:mgsnode2@network <zpool name>/<zpool volume>

Hope this helps.

Joe Mervini, Sandia National Laboratories, High Performance Computing, 505.844.6770, jame...@sandia.gov

On Dec 1, 2014, at 10:41 AM, Ron Croonenberg <r...@lanl.gov> wrote:

Hello,

We're running/building Lustre on ZFS, and I noticed that when creating the MDT with mkfs.lustre on a zpool, specifying two --mgsnid parameters (one for the MGS and one for the MGS failover) causes a problem that results in not being able to mount the MDT. (I think it tries to connect to the failover node instead of the actual MGS.) With ldiskfs it just works and I can mount the MDT.

For MDT/MDS failover, is it enough to just specify the --failnode parameter, or does the --mgsnid parameter need to be specified too?

thanks,

Ron
[Lustre-discuss] Need help
Hi,

I just upgraded our servers from RHEL 5.4 to RHEL 5.5 and went from Lustre 1.8.3 to 1.8.5. Now when I try to mount the OSTs I'm getting:

[root@aoss1 ~]# mount -t lustre /dev/disk/by-label/scratch2-OST0001 /mnt/lustre/local/scratch2-OST0001
mount.lustre: mount /dev/disk/by-label/scratch2-OST0001 at /mnt/lustre/local/scratch2-OST0001 failed: No such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)

tunefs.lustre looks okay on both the MDT (which is mounted) and the OSTs:

[root@amds1 ~]# tunefs.lustre /dev/disk/by-label/scratch2-MDT
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target: scratch2-MDT
Index: 0
Lustre FS: scratch2
Mount type: ldiskfs
Flags: 0x5 (MDT MGS )
Persistent mount opts: errors=panic,iopen_nopriv,user_xattr,maxdirsize=2000
Parameters: lov.stripecount=4 failover.node=failnode@tcp1 failover.node=failnode@o2ib1 mdt.group_upcall=/usr/sbin/l_getgroups

Permanent disk data:
Target: scratch2-MDT
Index: 0
Lustre FS: scratch2
Mount type: ldiskfs
Flags: 0x5 (MDT MGS )
Persistent mount opts: errors=panic,iopen_nopriv,user_xattr,maxdirsize=2000
Parameters: lov.stripecount=4 failover.node=failnode@tcp1 failover.node=failnode@o2ib1 mdt.group_upcall=/usr/sbin/l_getgroups

exiting before disk write.

[root@aoss1 ~]# tunefs.lustre /dev/disk/by-label/scratch2-OST0001
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target: scratch2-OST0001
Index: 1
Lustre FS: scratch2
Mount type: ldiskfs
Flags: 0x2 (OST )
Persistent mount opts: errors=panic,extents,mballoc
Parameters: mgsnode=mds-server1@tcp1 mgsnode=mds-server1@o2ib1 mgsnode=mds-server2@tcp1 mgsnode=mds-server2@o2ib1 failover.node=failnode@tcp1 failover.node=failnode@o2ib1

Permanent disk data:
Target: scratch2-OST0001
Index: 1
Lustre FS: scratch2
Mount type: ldiskfs
Flags: 0x2 (OST )
Persistent mount opts: errors=panic,extents,mballoc
Parameters: mgsnode=mds-server1@tcp1 mgsnode=mds-server1@o2ib1 mgsnode=mds-server2@tcp1 mgsnode=mds-server2@o2ib1 failover.node=falnode@tcp1 failover.node=failnode@o2ib1

exiting before disk write.

I am really stuck and could really use some help. Thanks.

== Joe Mervini, Sandia National Laboratories, Dept 09326, PO Box 5800 MS-0823, Albuquerque NM 87185-0823
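One generic recovery path in this situation (a hedged suggestion, not a confirmed fix for this report) is to regenerate the configuration logs so the targets re-register with the MGS after the upgrade. The sketch below assumes all clients and targets are unmounted first and uses the device labels from the post:

# With the whole filesystem stopped, regenerate the config logs:
tunefs.lustre --writeconf /dev/disk/by-label/scratch2-MDT
tunefs.lustre --writeconf /dev/disk/by-label/scratch2-OST0001   # repeat for each OST

# Then remount the MGS/MDT first, followed by the OSTs, then the clients.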
Re: [Lustre-discuss] What exactly is punch statistic?
Hi Cliff - it's been a long time...

It seems to me that file truncation is a bad thing - i.e., file writes are not completed. Is this an invalid assessment?

Thanks,
Joe

== Joe Mervini, Sandia National Laboratories, Dept 09326, PO Box 5800 MS-0823, Albuquerque NM 87185-0823

On Jun 16, 2011, at 12:52 PM, Cliff White wrote:

It is called when truncating a file - afaik it is showing you the number of truncates, more or less.

cliffw

On Thu, Jun 16, 2011 at 10:52 AM, Mervini, Joseph A <jame...@sandia.gov> wrote:

Hi,

I have been covertly trying for a long time to find out what "punch" means as far as Lustre llobdstat output is concerned, but have not really found anything definitive. Can someone answer that for me? (BTW: I am not alone in my ignorance... :) )

Thanks.

Joe Mervini, Sandia National Laboratories, High Performance Computing, 505.844.6770, jame...@sandia.gov

--
cliffw
Support Guy
WhamCloud, Inc.
www.whamcloud.com
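For anyone who wants to look at the counter directly, a hedged sketch of where the punch statistic shows up on a 1.8-era OSS; the OST name is a placeholder:

# llobdstat reads per-OST statistics; the raw counters live under /proc:
grep punch /proc/fs/lustre/obdfilter/scratch2-OST0001/stats

# Equivalent lctl form, across all local OSTs:
lctl get_param obdfilter.*.stats | grep punch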
[Lustre-discuss] What exactly is punch statistic?
Hi,

I have been covertly trying for a long time to find out what "punch" means as far as Lustre llobdstat output is concerned, but have not really found anything definitive. Can someone answer that for me? (BTW: I am not alone in my ignorance... :) )

Thanks.

Joe Mervini, Sandia National Laboratories, High Performance Computing, 505.844.6770, jame...@sandia.gov
Re: [Lustre-discuss] OST threads
That could be awful handy - especially when trying to tune a live file system for performance. Is that going to be a 2.0-only enhancement, or can it be applied to existing 1.8 versions?

On Feb 24, 2011, at 9:19 PM, Andreas Dilger wrote:

Yes, this can be set at startup time to limit the number of started threads. There is a patch I wrote to also reduce the number of running threads, but it wasn't landed yet.

Cheers, Andreas

On 2011-02-24, at 14:04, Mervini, Joseph A <jame...@sandia.gov> wrote:

I'm inclined to agree. So apparently the only time that modifying the runtime max values has a benefit is while threads_started is low?

Joe

Joe Mervini, Sandia National Laboratories, High Performance Computing, 505.844.6770, jame...@sandia.gov

On Feb 24, 2011, at 1:09 PM, Kevin Van Maren wrote:

However, I don't think you can decrease the number of running threads. See https://bugzilla.lustre.org/show_bug.cgi?id=22417 (and also https://bugzilla.lustre.org/show_bug.cgi?id=22516 )

Kevin

Mervini, Joseph A wrote:

Cool! Thank you Johann.

Joe Mervini, Sandia National Laboratories, High Performance Computing, 505.844.6770, jame...@sandia.gov

On Feb 24, 2011, at 11:05 AM, Johann Lombardi wrote:

On Thu, Feb 24, 2011 at 10:48:32AM -0700, Mervini, Joseph A wrote:
Quick question: Has runtime modification of the number of OST threads been implemented in Lustre 1.8.3?

Yes, see bugzilla ticket 18688. It was landed in 1.8.1.

Cheers, Johann

--
Johann Lombardi
Whamcloud, Inc.
www.whamcloud.com

== Joe Mervini, Sandia National Laboratories, Dept 09326, PO Box 5800 MS-0823, Albuquerque NM 87185-0823
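For reference, a hedged sketch of how the OST I/O service thread counts are typically inspected and capped on a 1.8-era OSS; the numeric values are placeholders, and (per the discussion above) lowering threads_max only takes effect before that many threads have already started:

# Check how many threads are running and the configured limits:
lctl get_param ost.OSS.ost_io.threads_started ost.OSS.ost_io.threads_min ost.OSS.ost_io.threads_max

# Cap the maximum at runtime:
lctl set_param ost.OSS.ost_io.threads_max=128

# Or fix the count at module load time (placeholder value), e.g. in /etc/modprobe.d:
# options ost oss_num_threads=128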
[Lustre-discuss] OST threads
Quick question: Has runtime modification of the number of OST threads been implemented in Lustre 1.8.3?

Joe Mervini, Sandia National Laboratories, High Performance Computing, 505.844.6770, jame...@sandia.gov
Re: [Lustre-discuss] OST threads
Cool! Thank you Johann.

Joe Mervini, Sandia National Laboratories, High Performance Computing, 505.844.6770, jame...@sandia.gov

On Feb 24, 2011, at 11:05 AM, Johann Lombardi wrote:

On Thu, Feb 24, 2011 at 10:48:32AM -0700, Mervini, Joseph A wrote:
Quick question: Has runtime modification of the number of OST threads been implemented in Lustre 1.8.3?

Yes, see bugzilla ticket 18688. It was landed in 1.8.1.

Cheers, Johann

--
Johann Lombardi
Whamcloud, Inc.
www.whamcloud.com
Re: [Lustre-discuss] OST threads
I'm inclined to agree. So apparently the only time that modifying the runtime max values has a benefit is while threads_started is low?

Joe

Joe Mervini, Sandia National Laboratories, High Performance Computing, 505.844.6770, jame...@sandia.gov

On Feb 24, 2011, at 1:09 PM, Kevin Van Maren wrote:

However, I don't think you can decrease the number of running threads. See https://bugzilla.lustre.org/show_bug.cgi?id=22417 (and also https://bugzilla.lustre.org/show_bug.cgi?id=22516 )

Kevin

Mervini, Joseph A wrote:

Cool! Thank you Johann.

Joe Mervini, Sandia National Laboratories, High Performance Computing, 505.844.6770, jame...@sandia.gov

On Feb 24, 2011, at 11:05 AM, Johann Lombardi wrote:

On Thu, Feb 24, 2011 at 10:48:32AM -0700, Mervini, Joseph A wrote:
Quick question: Has runtime modification of the number of OST threads been implemented in Lustre 1.8.3?

Yes, see bugzilla ticket 18688. It was landed in 1.8.1.

Cheers, Johann

--
Johann Lombardi
Whamcloud, Inc.
www.whamcloud.com
[Lustre-discuss] sanity check
Hoping for a quick sanity check:

I have migrated all the files that were on a damaged OST, recreated the software RAID array, and put a Lustre file system on it. I am now at the point where I want to re-introduce it to the scratch file system as if it was never gone. I used:

tunefs.lustre --index=27 /dev/md4

to set the right index for the file system (the information is below). I just want to make sure there is nothing else I need to do before I pull the trigger and mount it. (The things that have me concerned are the differences in the flags, and less so the "OST first_time update".)

pre rebuild:

[r...@oss-scratch obdfilter]# tunefs.lustre /dev/md4
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target: scratch1-OST001b
Index: 27
Lustre FS: scratch1
Mount type: ldiskfs
Flags: 0x2 (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib failover.node=10.10.10...@o2ib

Permanent disk data:
Target: scratch1-OST001b
Index: 27
Lustre FS: scratch1
Mount type: ldiskfs
Flags: 0x2 (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib failover.node=10.10.10...@o2ib

exiting before disk write.

after reformat and tunefs:

[r...@oss-scratch obdfilter]# tunefs.lustre --dryrun /dev/md4
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target: scratch1-OST001b
Index: 27
Lustre FS: scratch1
Mount type: ldiskfs
Flags: 0x62 (OST first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib failover.node=10.10.10...@o2ib

Permanent disk data:
Target: scratch1-OST001b
Index: 27
Lustre FS: scratch1
Mount type: ldiskfs
Flags: 0x62 (OST first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib failover.node=10.10.10...@o2ib

exiting before disk write.
Re: [Lustre-discuss] sanity check
Andreas,

I migrated all the files off the target with lfs_migrate. I didn't realize that I would need to retain any of the ldiskfs data if everything was moved. (I must have misinterpreted your earlier comment.)

So this is my current scenario:
1. All data from a failing OST has been migrated to other targets.
2. The original target was recreated via mdadm.
3. mkfs.lustre was run on the recreated target.
4. tunefs.lustre was run on the recreated target to set the index to what it was before it was reformatted.
5. No other data from the original target has been retained.

Question: Based on the above conditions, what do I need to do to get this OST back into the file system?

Thanks in advance.

Joe

On May 26, 2010, at 1:29 PM, Andreas Dilger wrote:

On 2010-05-26, at 13:18, Mervini, Joseph A wrote:

I have migrated all the files that were on a damaged OST, recreated the software RAID array, and put a Lustre file system on it. I am now at the point where I want to re-introduce it to the scratch file system as if it was never gone. I used:

tunefs.lustre --index=27 /dev/md4

to set the right index for the file system (the information is below). I just want to make sure there is nothing else I need to do before I pull the trigger and mount it. (The things that have me concerned are the differences in the flags, and less so the "OST first_time update".)

The use of tunefs.lustre is not sufficient to make the new OST identical to the previous one. You should also copy the O/0/LAST_ID file, last_rcvd, and mountdata files over, at which point you don't need tunefs.lustre at all.

pre rebuild:

[r...@oss-scratch obdfilter]# tunefs.lustre /dev/md4
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target: scratch1-OST001b
Index: 27
Lustre FS: scratch1
Mount type: ldiskfs
Flags: 0x2 (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib failover.node=10.10.10...@o2ib

Permanent disk data:
Target: scratch1-OST001b
Index: 27
Lustre FS: scratch1
Mount type: ldiskfs
Flags: 0x2 (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib failover.node=10.10.10...@o2ib

exiting before disk write.

after reformat and tunefs:

[r...@oss-scratch obdfilter]# tunefs.lustre --dryrun /dev/md4
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target: scratch1-OST001b
Index: 27
Lustre FS: scratch1
Mount type: ldiskfs
Flags: 0x62 (OST first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib failover.node=10.10.10...@o2ib

Permanent disk data:
Target: scratch1-OST001b
Index: 27
Lustre FS: scratch1
Mount type: ldiskfs
Flags: 0x62 (OST first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib failover.node=10.10.10...@o2ib

exiting before disk write.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
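For reference, a hedged sketch of what the copy Andreas described would have looked like had the original target still been mountable as ldiskfs; device names and mount points are placeholders, and with the original data already discarded this path is no longer available here:

# Mount the old and new OST devices as plain ldiskfs:
mount -t ldiskfs /dev/md_old /mnt/ost_old
mount -t ldiskfs /dev/md4 /mnt/ost_new

# Carry over the object allocation state and target identity:
cp /mnt/ost_old/O/0/LAST_ID       /mnt/ost_new/O/0/LAST_ID
cp /mnt/ost_old/last_rcvd         /mnt/ost_new/last_rcvd
cp /mnt/ost_old/CONFIGS/mountdata /mnt/ost_new/CONFIGS/mountdata

umount /mnt/ost_old /mnt/ost_new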
[Lustre-discuss] Best way to recover an OST
Hi,

We encountered a multi-disk failure on one of our mdadm RAID6 8+2 OSTs. Two drives failed in the array within the space of a couple of hours and were replaced. It is questionable whether both drives are actually bad, because we are seeing the same behavior in a test environment where a bad drive is actually causing a good drive to be kicked out of an array. Unfortunately, another of the drives encountered IO errors during the resync process and failed, causing the array to go out to lunch. The resync process was attempted two times with the same result.

Fortunately I am able (at least for now) to assemble the array with the existing 8/10 drives and am able to fsck, mount via ldiskfs and lustre, and am in the process of copying files from the vulnerable OST to a backup location using:

lfs find --obd <target> /scratch | cpio -puvdm ...

My question is: What is the best way to restore the OST? Obviously I will need to somehow restore the array to its full 8+2 configuration. Whether we need to start from scratch or use some other means, that is our first priority. But I would like to make the recovery as transparent to the users as possible.

One possible option that we are considering is simply removing the OST from Lustre, fixing the array, and copying the recovered files to a newly created OST (not desirable). Another is to fix the OST (not remove it from Lustre), delete the files that exist, and then copy the recovered files back. The problem that comes to mind in either scenario is what happens if a file is part of a striped file? Does it lose its affinity with the rest of the stripe?

Another scenario that we are wondering about: if we mount the OST via ldiskfs and copy everything on the file system to a backup location, fix the array maintaining the same tunefs.lustre configuration, then move everything back using the same method as it was backed up, will the files be presented to lustre (MDS and clients) just as they were before when mounted as a lustre file system?

Thanks in advance for your advice and help.

Joe Mervini
Sandia National Laboratories
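Regarding that last scenario (file-level backup and restore of the OST through ldiskfs), a hedged sketch of the usual approach; the key point is that the extended attributes must be preserved, since they carry the object-to-MDT mapping. Device, mount point, and backup paths are placeholders:

# Back up the OST contents, including Lustre's extended attributes:
mount -t ldiskfs /dev/md4 /mnt/ost
cd /mnt/ost
getfattr -R -d -m '.*' -e hex -P . > /backup/ost_ea.bak
tar czf /backup/ost_data.tgz --sparse .

# After rebuilding the array (same tunefs.lustre configuration), restore files and attributes:
cd /mnt/ost
tar xzpf /backup/ost_data.tgz
setfattr --restore=/backup/ost_ea.bak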
Re: [Lustre-discuss] Removing OSTs
It is possible, but it's painful and probably depends on the reason.

I had a situation a while back where the script I was using to run mkfs.lustre had the wrong fsname applied and as a result added the OSTs to the wrong Lustre file system. After realizing my mistake I backed out and reformatted the OSTs with the right parameters. The problem was that the file system still had those OSTs registered and produced lots of errors. I tried to fix it by doing things like deactivating the OSTs, etc., but nothing worked. My only option appeared to be to reformat the entire file system. However, since I had nothing to lose, I tried tunefs.lustre --writeconf on the OSTs and MDT to remove and re-register the OSTs with the MGS, and everything came back fine.

That being said, there was no data on the OSTs that I removed. I can't say what might happen to the file system if there is.

-Original Message-
From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Mag Gam
Sent: Monday, November 02, 2009 7:15 PM
To: lustre-discuss@lists.lustre.org
Subject: [Lustre-discuss] Removing OSTs

Is it possible to remove an OST permanently on 1.8.x?
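For reference, a hedged outline of the writeconf procedure mentioned above (1.8-era syntax; device paths are placeholders, and the whole filesystem must be stopped first):

# Unmount all clients, then all OSTs, then the MDT.

# On the MDS:
tunefs.lustre --writeconf /dev/mdt_device

# On each OSS, for every OST that should remain in the filesystem:
tunefs.lustre --writeconf /dev/ost_device

# Remount the MDT first, then the OSTs. Targets re-register with the MGS, and
# OSTs that are never remounted no longer appear in the regenerated config logs.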
Re: [Lustre-discuss] WARNING: data corruption issue found in 1.8.x releases
I'm not really sure why writethrough_cache_enable is being disabled, but the method we have used to disable read_cache_enable is:

echo 0 > /proc/fs/lustre/obdfilter/<ost name>/read_cache_enable

without any issues.

-Original Message-
From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Charles A. Taylor
Sent: Wednesday, September 09, 2009 12:07 PM
To: Johann Lombardi
Cc: lustre-discuss@lists.lustre.org discuss
Subject: Re: [Lustre-discuss] WARNING: data corruption issue found in 1.8.x releases

Just for the record, we've been running 1.8.1 for several weeks now with no problems. Well, truthfully, "no problems" is an exaggeration, but it is mostly working. We see lots of log messages we are not used to regarding client and server csum differences.

Anyway, your email concerned us, so we issued the recommended commands on our OSSs to disable the caching. That promptly crashed two of our OSSs. We got the servers back up and, after fsck'ing (fsck.ext4) all the OSTs and remounting lustre, one of the two OSSs promptly crashed again. We're still working through it, but we weren't having any problems - or at least none we were aware of - until we disabled the caching. Maybe we were already doomed - I don't know.

Right now I'm kind of wishing we had moved to 1.6.7.2 rather than 1.8.0.1/1.8.1. I think we got overconfident after running 1.6.4.2 for so long with so few problems.

Charlie Taylor
UF HPC Center

On Wed, 2009-09-09 at 17:00 +0200, Johann Lombardi wrote:

A bug has been identified in the 1.8 releases (1.8.0, 1.8.0.1 and 1.8.1 are impacted) that can cause data corruption on the OSTs. This problem is related to the OSS read cache feature that was introduced in 1.8.0. This can happen when a bulk read or write request is aborted due to the client being evicted or because the data transfer over the network has timed out. More details are available in bug 20560:
https://bugzilla.lustre.org/show_bug.cgi?id=20560

A patch is under testing and will be included in 1.8.1.1. Until 1.8.1.1 is available, we recommend disabling the OSS read cache feature. This feature can be disabled by running the two following commands on the OSSs:

# lctl set_param obdfilter.*.writethrough_cache_enable=0
# lctl set_param obdfilter.*.read_cache_enable=0

This has to be done each time an OST is restarted.

Best regards,
Johann, for the Lustre team
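Since the advisory notes the settings are lost whenever an OST is restarted, here is a hedged sketch of a small wrapper a site might run after mounting its OSTs; the commands are exactly the ones from the advisory, and wiring this into a mount script is an assumption, not part of the advisory itself:

#!/bin/sh
# Disable the OSS read cache on every local OST after the OSTs are mounted.
lctl set_param obdfilter.*.writethrough_cache_enable=0
lctl set_param obdfilter.*.read_cache_enable=0

# Verify both settings took effect:
lctl get_param obdfilter.*.read_cache_enable obdfilter.*.writethrough_cache_enable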