Re: [Lustre-discuss] Permanently removing an OST
To clarify: would still have some vestiges of the old OST, and would have to follow the other procedure if the OST index is reused, but the writeconf should remove all mention of the OST from "lctl dl", right? Kevin Van Maren wrote: > Andreas, > > This isn't the same as the similar thread, where an OST is being > replaced (keeping the same number). Doesn't he just have to re-do the > writeconf, to delete references to the OST in the MGS, as in Bug 22283? > > There will remain a gap in the OST numbers, but that should be okay if > there are no objects, right? > > Kevin > > > On May 26, 2010, at 6:22 PM, Florent Parent > wrote: > > >> On Wed, May 26, 2010 at 19:08, Andreas Dilger > >>> wrote: >>> On 2010-05-26, at 16:49, Florent Parent wrote: >>> on MGS: lctl conf_param osc.lustre1-OST002f-osc.active.osc.active=0 >>> If you specified the above, it is possible that you only >>> deactivated it on the MDS, not on the clients as well. >>> >> Right. It was executed on all clients as well. >> >> Many days later, and even following a complete server/clients reboot, we are now seeing this target being active on clients: on MDT: [r...@mds2 ~]# lctl get_param osc.lustre1-OST002f-osc.active osc.lustre1-OST002f-osc.active=0 [r...@mds2 ~]# lctl dl|grep 002f 51 UP osc lustre1-OST002f-osc lustre1-mdtlov_UUID 5 >>> The device is configured, but if it is not active it will not be >>> used for anything. >>> >>> on client: # ssh r101-n33 lctl dl |grep 002f 50 UP osc lustre1-OST002f-osc-810377354000 ef3f455a-7f67-134e-cf38-bcc0d9b89f26 4 >>> What does "active" report for this OSC on a client? >>> >> Shows 0 (I don't know why we are seeing a double entry here). So I >> guess it's inactive. I was under the impression references to the OST >> would go away. It's also confusing to have the OST show as UP in "lctl >> dl". >> >> # ssh r101-n33 lctl get_param osc.lustre1-OST002f-osc*.active >> osc.lustre1-OST002f-osc-810371ac3c00.active=0 >> osc.lustre1-OST002f-osc-810377354000.active=0 >> >> Thanks >> Florent >> ___ >> Lustre-discuss mailing list >> Lustre-discuss@lists.lustre.org >> http://lists.lustre.org/mailman/listinfo/lustre-discuss >> > ___ > Lustre-discuss mailing list > Lustre-discuss@lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-discuss > ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
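A minimal sketch of the writeconf procedure Kevin and Andreas are discussing (Bug 22283 and the 1.8 operations manual); the device paths, mount points and NIDs below are placeholders, not a verbatim recipe for this site:

# 1. stop everything: clients first, then the MDT, then all OSTs
client# umount /mnt/lustre
mds#    umount /mnt/mdt
oss#    umount /mnt/ost*

# 2. regenerate the configuration logs on every target
mds#    tunefs.lustre --writeconf /dev/mdtdev
oss#    tunefs.lustre --writeconf /dev/ostdev        # repeat for each OST

# 3. bring the filesystem back in order: MDT, then OSTs, then clients
mds#    mount -t lustre /dev/mdtdev /mnt/mdt
oss#    mount -t lustre /dev/ostdev /mnt/ost0
client# mount -t lustre mgsnid@tcp:/lustre1 /mnt/lustre

Because the MGS rebuilds its logs from the targets that actually register, an OST that is never remounted should no longer show up in "lctl dl" on freshly mounted clients, although its index remains unused unless the reuse procedure is followed.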
Re: [Lustre-discuss] Permanently removing an OST
Andreas, This isn't the same as the similar thread, where an OST is being replaced (keeping the same number). Doesn't he just have to re-do the writeconf, to delete references to the OST in the MGS, as in Bug 22283? There will remain a gap in the OST numbers, but that should be okay if there are no objects, right? Kevin On May 26, 2010, at 6:22 PM, Florent Parent wrote: > On Wed, May 26, 2010 at 19:08, Andreas Dilger > wrote: >> On 2010-05-26, at 16:49, Florent Parent wrote: >>> >>> on MGS: >>> lctl conf_param osc.lustre1-OST002f-osc.active.osc.active=0 >> >> If you specified the above, it is possible that you only >> deactivated it on the MDS, not on the clients as well. > > Right. It was executed on all clients as well. > >> >>> Many days later, and even following a complete server/clients >>> reboot, >>> we are now seeing this target being active on clients: >>> >>> on MDT: >>> [r...@mds2 ~]# lctl get_param osc.lustre1-OST002f-osc.active >>> osc.lustre1-OST002f-osc.active=0 >>> [r...@mds2 ~]# lctl dl|grep 002f >>> 51 UP osc lustre1-OST002f-osc lustre1-mdtlov_UUID 5 >> >> The device is configured, but if it is not active it will not be >> used for anything. >> >>> on client: >>> # ssh r101-n33 lctl dl |grep 002f >>> 50 UP osc lustre1-OST002f-osc-810377354000 >>> ef3f455a-7f67-134e-cf38-bcc0d9b89f26 4 >> >> What does "active" report for this OSC on a client? > > Shows 0 (I don't know why we are seeing a double entry here). So I > guess it's inactive. I was under the impression references to the OST > would go away. It's also confusing to have the OST show as UP in "lctl > dl". > > # ssh r101-n33 lctl get_param osc.lustre1-OST002f-osc*.active > osc.lustre1-OST002f-osc-810371ac3c00.active=0 > osc.lustre1-OST002f-osc-810377354000.active=0 > > Thanks > Florent > ___ > Lustre-discuss mailing list > Lustre-discuss@lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Permanently removing an OST
On Wed, May 26, 2010 at 19:08, Andreas Dilger wrote: > On 2010-05-26, at 16:49, Florent Parent wrote: >> >> on MGS: >> lctl conf_param osc.lustre1-OST002f-osc.active.osc.active=0 > > If you specified the above, it is possible that you only deactivated it on > the MDS, not on the clients as well. Right. It was executed on all clients as well. > >> Many days later, and even following a complete server/clients reboot, >> we are now seeing this target being active on clients: >> >> on MDT: >> [r...@mds2 ~]# lctl get_param osc.lustre1-OST002f-osc.active >> osc.lustre1-OST002f-osc.active=0 >> [r...@mds2 ~]# lctl dl|grep 002f >> 51 UP osc lustre1-OST002f-osc lustre1-mdtlov_UUID 5 > > The device is configured, but if it is not active it will not be used for > anything. > >> on client: >> # ssh r101-n33 lctl dl |grep 002f >> 50 UP osc lustre1-OST002f-osc-810377354000 >> ef3f455a-7f67-134e-cf38-bcc0d9b89f26 4 > > What does "active" report for this OSC on a client? Shows 0 (I don't know why we are seeing a double entry here). So I guess it's inactive. I was under the impression references to the OST would go away. It's also confusing to have the OST show as UP in "lctl dl". # ssh r101-n33 lctl get_param osc.lustre1-OST002f-osc*.active osc.lustre1-OST002f-osc-810371ac3c00.active=0 osc.lustre1-OST002f-osc-810377354000.active=0 Thanks Florent ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Permanently removing an OST
On 2010-05-26, at 16:49, Florent Parent wrote: > A while ago, we experienced multi disk failures on a raid6 ost. We > managed to migrate some data off the OST (lfs_migrate), and the > process was long (software raid was often failing). > > We reconstructed the target from scratch, which introduced a new OST. > Following the Lustre documentation on "Removing an OST from the File > System", we used the following procedure to permanently remove the old > OST: > > on MGS: > lctl conf_param osc.lustre1-OST002f-osc.active.osc.active=0 If you specified the above, it is possible that you only deactivated it on the MDS, not on the clients as well. > Many days later, and even following a complete server/clients reboot, > we are now seeing this target being active on clients: > > on MDT: > [r...@mds2 ~]# lctl get_param osc.lustre1-OST002f-osc.active > osc.lustre1-OST002f-osc.active=0 > [r...@mds2 ~]# lctl dl|grep 002f > 51 UP osc lustre1-OST002f-osc lustre1-mdtlov_UUID 5 The device is configured, but if it is not active it will not be used for anything. > on client: > # ssh r101-n33 lctl dl |grep 002f > 50 UP osc lustre1-OST002f-osc-810377354000 > ef3f455a-7f67-134e-cf38-bcc0d9b89f26 4 What does "active" report for this OSC on a client? Cheers, Andreas -- Andreas Dilger Lustre Technical Lead Oracle Corporation Canada Inc. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Permanently removing an OST
Hi,

A while ago, we experienced multi-disk failures on a RAID6 OST. We managed to
migrate some data off the OST (lfs_migrate), and the process was long (the
software RAID was often failing).

We reconstructed the target from scratch, which introduced a new OST.
Following the Lustre documentation on "Removing an OST from the File System",
we used the following procedure to permanently remove the old OST:

on MGS:
lctl conf_param osc.lustre1-OST002f-osc.active.osc.active=0

Many days later, and even following a complete server/client reboot, we are
now seeing this target being active on clients:

on MDT:
[r...@mds2 ~]# lctl get_param osc.lustre1-OST002f-osc.active
osc.lustre1-OST002f-osc.active=0
[r...@mds2 ~]# lctl dl|grep 002f
51 UP osc lustre1-OST002f-osc lustre1-mdtlov_UUID 5

on client:
# ssh r101-n33 lctl dl |grep 002f
50 UP osc lustre1-OST002f-osc-810377354000 ef3f455a-7f67-134e-cf38-bcc0d9b89f26 4

What are we missing from the procedure here? I'm really looking at
*permanently* disabling an OST from Lustre.

Thanks for any pointers.
Florent
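For reference, the syntax in the 1.8 manual's "Removing an OST from the File System" section names the OSC as <fsname>-OSTxxxx, roughly as below (a sketch with this thread's names filled in, not a verified fix for this site); as the rest of the thread shows, deactivation alone leaves the OSC configured and visible in "lctl dl", and a writeconf is needed to drop it entirely:

mgs# lctl conf_param lustre1-OST002f.osc.active=0

# then confirm on the MDS and on a client
mds#    lctl get_param osc.lustre1-OST002f-osc.active
client# lctl get_param osc.lustre1-OST002f-osc-*.active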
Re: [Lustre-discuss] sanity check
On 2010-05-26, at 13:47, Mervini, Joseph A wrote: > I migrated all the files off the target with lfs_migrate. I didn't realize > that I would need to retain any of the ldiskfs data if everything was moved. > (I must have misinterpreted your earlier comment.) > > So this is my current scenario: > > 1. All data from a failing OST has been migrated to other targets. > 2. The original target was recreated via mdadm. > 3. mkfs.lustre was run on the recreated target > 4. tunefs.lustre was run on the recreated target to set the index to what it > was before it was reformatted. > 5. No other data from the original target has been retained. > > Question: > > Based on the above conditions, what do I need to do to get this OST back into > the file system? Lustre is fairly robust about handling situations like this (e.g. recreating the last_rcvd file, the object heirarchy O/0/d{0..31}, etc). The one item that it will need help with is to recreate the LAST_ID file on the OST. You can do this by hand by extracting the last-precreated object from the MDS, and writing the LAST_ID file on the OST: # extract last allocated object for all OSTs mds# debugfs -c -R "dump lov_objids /tmp/lo" # cut out the last allocated object for this OST index mds# dd if=/tmp/lo of=/tmp/LAST_ID bs=8 skip=${OST index NN} count=1 # verify value is the right one (LAST_ID = next_id - 1) mds# lctl get_param osc.*OST00NN.prealloc_next_id # NN is OST index mds# od -td8 /tmp/LAST_ID # get OST filesystem ready for this value ossN# mount -t ldiskfs /dev/{ostdev} /mnt/tmp ossN# mkdir -p /mnt/tmp/O/0 mds# scp /tmp/LAST_ID ossN:/mnt/tmp/O/0/LAST_ID This will avoid the OST trying to recreate thousands/millions of objects when the OST next reconnects. This could probably be handled internally by the OST, by simply bumping the LAST_ID value in the case that it is currently < 2 and the MDS is requesting some large value. > On May 26, 2010, at 1:29 PM, Andreas Dilger wrote: > >> On 2010-05-26, at 13:18, Mervini, Joseph A wrote: >>> I have migrated all the files that were on a damaged OST and have recreated >>> the software raid array and put a lustre file system on it. >>> >>> I am now at the point where I want to re-introduce it to the scratch file >>> system as if it was never gone. I used: >>> >>> tunefs.lustre --index=27 /dev/md4 to get the right index for the file >>> system (the information is below). I just want to make sure there is >>> nothing else I need to do before I pull the trigger will mounting it. (The >>> things that have me concerned are the differences in the flags, and less so >>> the "OST first_time update.) >> >> The use of tunefs.lustre is not sufficient to make the new OST identical to >> the previous one. You should also copy the O/0/LAST_ID file, last_rcvd, and >> mountdata files over, at which point you don't need tunefs.lustre at all. 
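Andreas's steps above, gathered into one rough sketch: the MDT/OST device paths are placeholders, NN stands for the OST index (hexadecimal in target names, plain number for dd's skip=), and the od output should be checked against prealloc_next_id before anything is copied:

# on the MDS: pull the last-allocated object id for this OST
mds# debugfs -c -R "dump lov_objids /tmp/lo" /dev/mdtdev
mds# dd if=/tmp/lo of=/tmp/LAST_ID bs=8 skip=NN count=1
mds# lctl get_param osc.*OST00NN.prealloc_next_id    # expect LAST_ID = next_id - 1
mds# od -td8 /tmp/LAST_ID

# on the OSS: install it on the rebuilt OST
oss# mount -t ldiskfs /dev/ostdev /mnt/tmp
oss# mkdir -p /mnt/tmp/O/0
mds# scp /tmp/LAST_ID oss:/mnt/tmp/O/0/LAST_ID
oss# umount /mnt/tmp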
>> >>> >>> >>> [r...@oss-scratch obdfilter]# tunefs.lustre /dev/md4 >>> checking for existing Lustre data: found CONFIGS/mountdata >>> Reading CONFIGS/mountdata >>> >>> Read previous values: >>> Target: scratch1-OST001b >>> Index: 27 >>> Lustre FS: scratch1 >>> Mount type: ldiskfs >>> Flags: 0x2 >>> (OST ) >>> Persistent mount opts: errors=remount-ro,extents,mballoc >>> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib >>> failover.node=10.10.10...@o2ib >>> >>> >>> Permanent disk data: >>> Target: scratch1-OST001b >>> Index: 27 >>> Lustre FS: scratch1 >>> Mount type: ldiskfs >>> Flags: 0x2 >>> (OST ) >>> Persistent mount opts: errors=remount-ro,extents,mballoc >>> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib >>> failover.node=10.10.10...@o2ib >>> >>> exiting before disk write. >>> >>> >>> >>> >>> [r...@oss-scratch obdfilter]# tunefs.lustre --dryrun /dev/md4 >>> checking for existing Lustre data: found CONFIGS/mountdata >>> Reading CONFIGS/mountdata >>> >>> Read previous values: >>> Target: scratch1-OST001b >>> Index: 27 >>> Lustre FS: scratch1 >>> Mount type: ldiskfs >>> Flags: 0x62 >>> (OST first_time update ) >>> Persistent mount opts: errors=remount-ro,extents,mballoc >>> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib >>> failover.node=10.10.10...@o2ib >>> >>> >>> Permanent disk data: >>> Target: scratch1-OST001b >>> Index: 27 >>> Lustre FS: scratch1 >>> Mount type: ldiskfs >>> Flags: 0x62 >>> (OST first_time update ) >>> Persistent mount opts: errors=remount-ro,extents,mballoc >>> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib >>> failover.node=10.10.10...@o2ib >>> >>> exiting before disk write. >>> >>> >>> ___ >>> Lustre-discuss mailing list >>> Lustre-discuss@lists.lustre.org >>> http://lists.lustre.org/mailman/listinfo/lustre-discuss >> >> >> Cheers, Andreas >> -- >> Andreas Dilger >> Lustre Technical Lead >> Oracle Corporation Canada Inc. >> >>
Re: [Lustre-discuss] sanity check
Andreas, I migrated all the files off the target with lfs_migrate. I didn't realize that I would need to retain any of the ldiskfs data if everything was moved. (I must have misinterpreted your earlier comment.) So this is my current scenario: 1. All data from a failing OST has been migrated to other targets. 2. The original target was recreated via mdadm. 3. mkfs.lustre was run on the recreated target 4. tunefs.lustre was run on the recreated target to set the index to what it was before it was reformatted. 5. No other data from the original target has been retained. Question: Based on the above conditions, what do I need to do to get this OST back into the file system? Thanks in advance. Joe On May 26, 2010, at 1:29 PM, Andreas Dilger wrote: > On 2010-05-26, at 13:18, Mervini, Joseph A wrote: >> I have migrated all the files that were on a damaged OST and have recreated >> the software raid array and put a lustre file system on it. >> >> I am now at the point where I want to re-introduce it to the scratch file >> system as if it was never gone. I used: >> >> tunefs.lustre --index=27 /dev/md4 to get the right index for the file system >> (the information is below). I just want to make sure there is nothing else I >> need to do before I pull the trigger will mounting it. (The things that have >> me concerned are the differences in the flags, and less so the "OST >> first_time update.) > > The use of tunefs.lustre is not sufficient to make the new OST identical to > the previous one. You should also copy the O/0/LAST_ID file, last_rcvd, and > mountdata files over, at which point you don't need tunefs.lustre at all. > >> >> >> [r...@oss-scratch obdfilter]# tunefs.lustre /dev/md4 >> checking for existing Lustre data: found CONFIGS/mountdata >> Reading CONFIGS/mountdata >> >> Read previous values: >> Target: scratch1-OST001b >> Index: 27 >> Lustre FS: scratch1 >> Mount type: ldiskfs >> Flags: 0x2 >>(OST ) >> Persistent mount opts: errors=remount-ro,extents,mballoc >> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib >> failover.node=10.10.10...@o2ib >> >> >> Permanent disk data: >> Target: scratch1-OST001b >> Index: 27 >> Lustre FS: scratch1 >> Mount type: ldiskfs >> Flags: 0x2 >>(OST ) >> Persistent mount opts: errors=remount-ro,extents,mballoc >> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib >> failover.node=10.10.10...@o2ib >> >> exiting before disk write. >> >> >> >> >> [r...@oss-scratch obdfilter]# tunefs.lustre --dryrun /dev/md4 >> checking for existing Lustre data: found CONFIGS/mountdata >> Reading CONFIGS/mountdata >> >> Read previous values: >> Target: scratch1-OST001b >> Index: 27 >> Lustre FS: scratch1 >> Mount type: ldiskfs >> Flags: 0x62 >>(OST first_time update ) >> Persistent mount opts: errors=remount-ro,extents,mballoc >> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib >> failover.node=10.10.10...@o2ib >> >> >> Permanent disk data: >> Target: scratch1-OST001b >> Index: 27 >> Lustre FS: scratch1 >> Mount type: ldiskfs >> Flags: 0x62 >>(OST first_time update ) >> Persistent mount opts: errors=remount-ro,extents,mballoc >> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib >> failover.node=10.10.10...@o2ib >> >> exiting before disk write. >> >> >> ___ >> Lustre-discuss mailing list >> Lustre-discuss@lists.lustre.org >> http://lists.lustre.org/mailman/listinfo/lustre-discuss > > > Cheers, Andreas > -- > Andreas Dilger > Lustre Technical Lead > Oracle Corporation Canada Inc. 
> > ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] sanity check
On 2010-05-26, at 13:18, Mervini, Joseph A wrote: > I have migrated all the files that were on a damaged OST and have recreated > the software raid array and put a lustre file system on it. > > I am now at the point where I want to re-introduce it to the scratch file > system as if it was never gone. I used: > > tunefs.lustre --index=27 /dev/md4 to get the right index for the file system > (the information is below). I just want to make sure there is nothing else I > need to do before I pull the trigger will mounting it. (The things that have > me concerned are the differences in the flags, and less so the "OST > first_time update.) The use of tunefs.lustre is not sufficient to make the new OST identical to the previous one. You should also copy the O/0/LAST_ID file, last_rcvd, and mountdata files over, at which point you don't need tunefs.lustre at all. > > > [r...@oss-scratch obdfilter]# tunefs.lustre /dev/md4 > checking for existing Lustre data: found CONFIGS/mountdata > Reading CONFIGS/mountdata > > Read previous values: > Target: scratch1-OST001b > Index: 27 > Lustre FS: scratch1 > Mount type: ldiskfs > Flags: 0x2 > (OST ) > Persistent mount opts: errors=remount-ro,extents,mballoc > Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib > failover.node=10.10.10...@o2ib > > > Permanent disk data: > Target: scratch1-OST001b > Index: 27 > Lustre FS: scratch1 > Mount type: ldiskfs > Flags: 0x2 > (OST ) > Persistent mount opts: errors=remount-ro,extents,mballoc > Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib > failover.node=10.10.10...@o2ib > > exiting before disk write. > > > > > [r...@oss-scratch obdfilter]# tunefs.lustre --dryrun /dev/md4 > checking for existing Lustre data: found CONFIGS/mountdata > Reading CONFIGS/mountdata > > Read previous values: > Target: scratch1-OST001b > Index: 27 > Lustre FS: scratch1 > Mount type: ldiskfs > Flags: 0x62 > (OST first_time update ) > Persistent mount opts: errors=remount-ro,extents,mballoc > Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib > failover.node=10.10.10...@o2ib > > > Permanent disk data: > Target: scratch1-OST001b > Index: 27 > Lustre FS: scratch1 > Mount type: ldiskfs > Flags: 0x62 > (OST first_time update ) > Persistent mount opts: errors=remount-ro,extents,mballoc > Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib > failover.node=10.10.10...@o2ib > > exiting before disk write. > > > ___ > Lustre-discuss mailing list > Lustre-discuss@lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-discuss Cheers, Andreas -- Andreas Dilger Lustre Technical Lead Oracle Corporation Canada Inc. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
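Where the original OST device is still readable, the copy Andreas describes can be done with both volumes mounted as ldiskfs; a sketch with illustrative paths (in this particular thread the old array had already been rebuilt, so the LAST_ID reconstruction in Andreas's follow-up message applies instead):

oss# mount -t ldiskfs /dev/old_ost /mnt/old
oss# mount -t ldiskfs /dev/new_ost /mnt/new
oss# cp -a /mnt/old/last_rcvd         /mnt/new/last_rcvd
oss# mkdir -p /mnt/new/O/0
oss# cp -a /mnt/old/O/0/LAST_ID       /mnt/new/O/0/LAST_ID
oss# cp -a /mnt/old/CONFIGS/mountdata /mnt/new/CONFIGS/mountdata
oss# umount /mnt/old /mnt/new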
[Lustre-discuss] sanity check
Hoping for a quick sanity check:

I have migrated all the files that were on a damaged OST, recreated the
software RAID array, and put a Lustre file system on it.

I am now at the point where I want to re-introduce it to the scratch file
system as if it was never gone. I used:

tunefs.lustre --index=27 /dev/md4

to get the right index for the file system (the information is below). I just
want to make sure there is nothing else I need to do before I pull the trigger
with mounting it. (The things that have me concerned are the differences in
the flags, and less so the "OST first_time update".)

[r...@oss-scratch obdfilter]# tunefs.lustre /dev/md4
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target:     scratch1-OST001b
Index:      27
Lustre FS:  scratch1
Mount type: ldiskfs
Flags:      0x2
            (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib failover.node=10.10.10...@o2ib

Permanent disk data:
Target:     scratch1-OST001b
Index:      27
Lustre FS:  scratch1
Mount type: ldiskfs
Flags:      0x2
            (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib failover.node=10.10.10...@o2ib

exiting before disk write.

[r...@oss-scratch obdfilter]# tunefs.lustre --dryrun /dev/md4
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target:     scratch1-OST001b
Index:      27
Lustre FS:  scratch1
Mount type: ldiskfs
Flags:      0x62
            (OST first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib failover.node=10.10.10...@o2ib

Permanent disk data:
Target:     scratch1-OST001b
Index:      27
Lustre FS:  scratch1
Mount type: ldiskfs
Flags:      0x62
            (OST first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib failover.node=10.10.10...@o2ib

exiting before disk write.
Re: [Lustre-discuss] Future of lustre 1.8.3+
On 26/05/10 17:25, Ramiro Alba Queipo wrote: > On Wed, 2010-05-26 at 16:48 +0100, Guy Coates wrote: > One thing to watch out for in your kernel configs is to make sure that: CONFIG_SECURITY_FILE_CAPABILITIES=N >>> >>> OK. But the question is if this issue still applies for lustre-1.8.3 and >>> SLES kernel linux-2.6.27.39-0.3.1.tar.bz2. I mean, is quite surprising >>> that if this problems persist, Oracle is offering lustre packages for >>> SLES11 with CONFIG_SECURITY_FILE_CAPABILITIES=y ??? >>> I am just about to start testing, so I'd like to clarify this. >> >> The binary SLES packages are fine; it is the source packages that may be >> problematic, depending on your config. There is a bug filed against this > > Sorry Guy. May be there is something I am missing, but SLES11 rpm kernel > server packages for lustre-1.8.3 are created using a config with > ONFIG_SECURITY_FILE_CAPABILITIES=y (See yourself on the attachement You are entirely correct. Cheers, Guy -- Dr Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK Tel: +44 (0)1223 834244 ex 6925 Fax: +44 (0)1223 496802 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] What would happen if
If you are creating files that are a significant fraction of the free space on your OSTs, you _should_ stripe them across multiple OSTs (lfs setstripe). Also note that ENOSPC is likely to be returned before the OST is actually "out" of space: due to the OST pre-granting space to the clients, ENOSPC is returned when (free - grants) = 0. The more Lustre clients you have, the sooner this occurs. See Bug 22498 Additionally, there are some grant leaks that can cause the server's view of the grant to be inflated; see Bug 22755 Kevin Brian J. Murrell wrote: > On Wed, 2010-05-26 at 09:26 -0400, Scott wrote: > >> Hi, >> > > Hi, > > You don't provide any particulars (lustre version, lfs df output, etc.) > so I will answer generically... > > >> I am not using striping. Question, when creating a new file does Lustre try >> to write to the OST >> with the most free space? >> > > That depends on your lustre version and the values > in /proc/fs/lustre/lov/client-clilov-f385fc00/qos_*. Please see the > manual for details on those files and how they affect object allocation, > if you are running a relevant version of Lustre. > > >> Second question, if for whatever reason it picks the OST with 200Gb free and >> i was writing a file >> that would be 500Gb at finish, what would happen when that particular OST >> ran out of space? >> > > The client gets an ENOSPC. > > >> Would >> it leave the file as incomplete and show no free space? >> > > If the client application doesn't do any cleanup on getting an error, > yes. > > b. > > > > > ___ > Lustre-discuss mailing list > Lustre-discuss@lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-discuss > ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
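A hedged example of the striping Kevin suggests, using the 1.8-era lfs options (the directory, file name and stripe parameters are only illustrative):

# new files in this directory get 4 stripes of 1 MB each
client# lfs setstripe -c 4 -s 1M /mnt/lustre/bigfiles
# or pre-create one huge file striped over every OST
client# lfs setstripe -c -1 /mnt/lustre/bigfiles/500g.dat
# confirm the layout
client# lfs getstripe /mnt/lustre/bigfiles/500g.dat

Striping the 500 GB file over several OSTs means no single OST has to hold the whole thing, which is the point of Kevin's first remark.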
Re: [Lustre-discuss] Future of lustre 1.8.3+
On 26/05/10 16:31, Ramiro Alba Queipo wrote: > Hi Guy, > > On Wed, 2010-05-26 at 14:59 +0100, Guy Coates wrote: > The SLES11 kernel is at 2.6.27 so it could be usable for this. Also, I > Ok, I am getting http://downloads.lustre.org/public/kernels/sles11/linux-2.6.27.39-0.3.1.tar.bz2 but, please. Where can I get a suitable config file to apply both for servers and clients? >> >> One thing to watch out for in your kernel configs is to make sure that: >> >> CONFIG_SECURITY_FILE_CAPABILITIES=N > > OK. But the question is if this issue still applies for lustre-1.8.3 and > SLES kernel linux-2.6.27.39-0.3.1.tar.bz2. I mean, is quite surprising > that if this problems persist, Oracle is offering lustre packages for > SLES11 with CONFIG_SECURITY_FILE_CAPABILITIES=y ??? > I am just about to start testing, so I'd like to clarify this. The binary SLES packages are fine; it is the source packages that may be problematic, depending on your config. There is a bug filed against this now (22913), so no doubt it will be fixed in a subsequent release. In regard to your testing, it is easy to check if a client is mis-behaving; run: #cat /dev/zero > /lustre/filesystem & and watch the client IO stats with: #watch -n 1 cat /proc/fs/lustre/llite/*/stats If getxattr is going up with write_bytes, then you have a problem. Cheers, Guy -- Dr. Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK Tel: +44 (0)1223 834244 x 6925 Fax: +44 (0)1223 496802 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] two problems
Hi, My version of Lustre is 1.8.3 My filesystem is composed of one MGS/MDS server and two OSS. By testing, I tried to delete a OST and replace it with another OST and now the situation is this: cat /proc/fs/lustre/lov/lustre01-mdtlov/target_obd 0: lustre01-OST_UUID ACTIVE 2: lustre01-OST0002_UUID ACTIVE - first problem lustre01-OST0001_UUID ACTIVE is the OST was canceled and it had files, which of course now there are not more: ls -lrt total 12475312 ?- ? ?? ?? zero.dat ?- ? ?? ?? ubuntu-9.10-dvd-i386.iso ?- ? ?? ?? X_CentOS-5.4- x86_64-bin-DVD.iso ?- ? ?? ?? Windows_XP-Capodarco.iso ?- ? ?? ?? UBUNTU_CentOS-5.4- x86_64-bin-DVD.iso ?- ? ?? ?? KK_CentOS-5.4-x86_64- bin-DVD.iso ?- ? ?? ?? F_CentOS-5.4-x86_64- bin-DVD.iso ?- ? ?? ?? CentOS-5.3-i386-bin- DVD.iso ?- ? ?? ?? B_CentOS-5.4-x86_64- bin-DVD.iso ?- ? ?? ?? BAK_CentOS-5.4-x86_64- bin-DVD.iso ?- ? ?? ?? 2.iso I to delete them, follow these steps: on MGS/MDS server: e2fsck -n -v --mdsdb /root/mds_home_db /dev/mpath/mpath2 copy the file mds_home_db on OSS_1 and, one OSS_1 launch the following command: e2fsck -n -v --mdsdb /root/mds_home_db --ostdb /root/home_ost00db /dev/ mpath/mpath1 and do the same thing on the OSS_2: e2fsck -n -v --mdsdb /root/mds_home_db --ostdb /root/home_ost01db /dev/ mpath/mpath2 then copy the files mds_home_db, home_ost00db and home_ost01db on the Lustre Client, mount the lustre filesystem and run the commnand: lfsck -c -v --mdsdb /root/mds_home_db --ostdb /root/home_ost00db /root/ home_ost02db /LUSTRE but the command hangs: . . . . [0] zero-length orphan objid 1182 [0] zero-length orphan objid 1214 [0] zero-length orphan objid 1246 [0] zero-length orphan objid 1183 [0] zero-length orphan objid 1215 [0] zero-length orphan objid 1247 lfsck: ost_idx 0: pass3 OK (218 files total) MDS: max_id 161 OST: max_id 65 lfsck: ost_idx 1: pass1: check for duplicate objects lfsck: ost_idx 1: pass1 OK (11 files total) lfsck: ost_idx 1: pass2: check for missing inode objects and the server MGS/MDS go to in Kernel Panic and the Lustre Client log say: May 26 17:39:35 mdt02prdpom kernel: LustreError: 7105:0:(lov_ea.c: 248:lsm_unpackmd_v1()) OST index 1 missing May 26 17:39:35 mdt02prdpom kernel: LustreError: 7105:0:(lov_ea.c: 248:lsm_unpackmd_v1()) Skipped 21 previous similar messages May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 64:lov_dump_lmm_common()) objid 0x1b20003, magic 0x0bd10bd0, pattern 0x1 May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 67:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 1 May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 84:lov_dump_lmm_objects()) stripe 0 idx 1 subobj 0x0/0x2 May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 64:lov_dump_lmm_common()) objid 0x1b20005, magic 0x0bd10bd0, pattern 0x1 May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 67:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 1 May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 84:lov_dump_lmm_objects()) stripe 0 idx 1 subobj 0x0/0x3 May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 64:lov_dump_lmm_common()) objid 0x1b20006, magic 0x0bd10bd0, pattern 0x1 May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 67:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 1 May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 84:lov_dump_lmm_objects()) stripe 0 idx 1 subobj 0x0/0x4 May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 64:lov_dump_lmm_common()) objid 0x1b20008, magic 
0x0bd10bd0, pattern 0x1 May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 67:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 1 May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 84:lov_dump_lmm_objects()) stripe 0 idx 1 subobj 0x0/0x5 May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 64:lov_dump_lmm_common()) objid 0x1b2000a, magic 0x0bd10bd0, pattern 0x1 May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 67:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 1 May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 84:lov_dump_lmm_objects()) stripe 0 idx 1 subobj 0x0/0x6 May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 64:lov_dump_lmm_common()) objid 0x1b2000c, magic 0x0bd10bd0, pattern 0x1 May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 67:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 1 May 26 17:39:35 m
Re: [Lustre-discuss] Future of lustre 1.8.3+
Hi Guy, On Wed, 2010-05-26 at 14:59 +0100, Guy Coates wrote: > >> > >>> The SLES11 kernel is at 2.6.27 so it could be usable for this. Also, I > >>> > >> Ok, I am getting > >> http://downloads.lustre.org/public/kernels/sles11/linux-2.6.27.39-0.3.1.tar.bz2 > >> > >> but, please. Where can I get a suitable config file to apply both for > >> servers and clients? > > One thing to watch out for in your kernel configs is to make sure that: > > CONFIG_SECURITY_FILE_CAPABILITIES=N OK. But the question is if this issue still applies for lustre-1.8.3 and SLES kernel linux-2.6.27.39-0.3.1.tar.bz2. I mean, is quite surprising that if this problems persist, Oracle is offering lustre packages for SLES11 with CONFIG_SECURITY_FILE_CAPABILITIES=y ??? I am just about to start testing, so I'd like to clarify this. Cheers > > otherwise you will run into: > > https://bugzilla.lustre.org/show_bug.cgi?id=21439 > > (each write call causes 2 getxattr calls, which will pound your MDS into > the ground). > > The SLES11, debian/lenny and ubuntu kernels all have this feature set, > so if you are building clients against those kernels, you may be in > trouble. > > > Cheers, > > Guy > > -- > Dr. Guy Coates, Informatics System Group > The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK > Tel: +44 (0)1223 834244 x 6925 > Fax: +44 (0)1223 496802 > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > -- Ramiro Alba Centre Tecnològic de Tranferència de Calor http://www.cttc.upc.edu Escola Tècnica Superior d'Enginyeries Industrial i Aeronàutica de Terrassa Colom 11, E-08222, Terrassa, Barcelona, Spain Tel: (+34) 93 739 86 46 -- Aquest missatge ha estat analitzat per MailScanner a la cerca de virus i d'altres continguts perillosos, i es considera que està net. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Future of lustre 1.8.3+
On 26/05/10 16:18, Andreas Dilger wrote: > The problem with SELinux is that it is trying to access the security > xattr for each file access but Lustre does not cache xattrs on the client. > > The other main question about SELinux is whether it even makes sense in > a distributed environment. Just to be clear, SELinux was disabled on these machines (selinux=0 kernel option); simply having the kernel-config set still triggers the code path/bug. Cheers, Guy -- Dr. Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK Tel: +44 (0)1223 834244 x 6925 Fax: +44 (0)1223 496802 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Future of lustre 1.8.3+
The problem with SELinux is that it is trying to access the security xattr for each file access but Lustre does not cache xattrs on the client. The other main question about SELinux is whether it even makes sense in a distributed environment. For now (see bug) we have just disabled the access to this specific attribute in Lustre. It would be nice if someone with more understanding of SELinux would investigate if there is some global settings file that could be modified to exclude Lustre from the security policy checking, and then we can push this to the upstream distros. Cheers, Andreas On 2010-05-26, at 8:43, Gregory Matthews wrote: > Guy Coates wrote: >> One thing to watch out for in your kernel configs is to make sure >> that: >> >> CONFIG_SECURITY_FILE_CAPABILITIES=N > > I hope this is not the case for the now obsolete: > > CONFIG_EXT3_FS_SECURITY=y > > which appears to be enabled by default on RHEL5.x > > Its not entirely clear to me what this is for but would metadata > performance be better without it? > > GREG > >> >> otherwise you will run into: >> >> https://bugzilla.lustre.org/show_bug.cgi?id=21439 >> >> (each write call causes 2 getxattr calls, which will pound your MDS >> into >> the ground). >> >> The SLES11, debian/lenny and ubuntu kernels all have this feature >> set, >> so if you are building clients against those kernels, you may be in >> trouble. >> >> >> Cheers, >> >> Guy >> > > > -- > Greg Matthews01235 778658 > Senior Computer Systems Administrator > Diamond Light Source, Oxfordshire, UK > ___ > Lustre-discuss mailing list > Lustre-discuss@lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-discuss ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Future of lustre 1.8.3+
Guy Coates wrote: > One thing to watch out for in your kernel configs is to make sure that: > > CONFIG_SECURITY_FILE_CAPABILITIES=N I hope this is not the case for the now obsolete: CONFIG_EXT3_FS_SECURITY=y which appears to be enabled by default on RHEL5.x Its not entirely clear to me what this is for but would metadata performance be better without it? GREG > > otherwise you will run into: > > https://bugzilla.lustre.org/show_bug.cgi?id=21439 > > (each write call causes 2 getxattr calls, which will pound your MDS into > the ground). > > The SLES11, debian/lenny and ubuntu kernels all have this feature set, > so if you are building clients against those kernels, you may be in > trouble. > > > Cheers, > > Guy > -- Greg Matthews01235 778658 Senior Computer Systems Administrator Diamond Light Source, Oxfordshire, UK ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] Future of lustre 1.8.3+
On 21/05/10 10:08, Christopher Huhn wrote: > Hi Ramiro, > > Ramiro Alba Queipo wrote: >> On Thu, 2010-05-20 at 10:16 -0600, Andreas Dilger wrote: >> >>> The SLES11 kernel is at 2.6.27 so it could be usable for this. Also, I >>> >> Ok, I am getting >> http://downloads.lustre.org/public/kernels/sles11/linux-2.6.27.39-0.3.1.tar.bz2 >> >> but, please. Where can I get a suitable config file to apply both for >> servers and clients? One thing to watch out for in your kernel configs is to make sure that: CONFIG_SECURITY_FILE_CAPABILITIES=N otherwise you will run into: https://bugzilla.lustre.org/show_bug.cgi?id=21439 (each write call causes 2 getxattr calls, which will pound your MDS into the ground). The SLES11, debian/lenny and ubuntu kernels all have this feature set, so if you are building clients against those kernels, you may be in trouble. Cheers, Guy -- Dr. Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK Tel: +44 (0)1223 834244 x 6925 Fax: +44 (0)1223 496802 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
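A quick way to check whether a given client kernel was built with this option; the paths vary by distro, so treat these as sketches:

client# grep CONFIG_SECURITY_FILE_CAPABILITIES /boot/config-$(uname -r)
# or, if the kernel exports its configuration:
client# zgrep CONFIG_SECURITY_FILE_CAPABILITIES /proc/config.gz

"is not set" (or =n) is what you want on 1.8 clients affected by bug 21439.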
Re: [Lustre-discuss] What would happen if
On Wed, 2010-05-26 at 09:26 -0400, Scott wrote: > Hi, Hi, You don't provide any particulars (lustre version, lfs df output, etc.) so I will answer generically... > I am not using striping. Question, when creating a new file does Lustre try > to write to the OST > with the most free space? That depends on your lustre version and the values in /proc/fs/lustre/lov/client-clilov-f385fc00/qos_*. Please see the manual for details on those files and how they affect object allocation, if you are running a relevant version of Lustre. > Second question, if for whatever reason it picks the OST with 200Gb free and > i was writing a file > that would be 500Gb at finish, what would happen when that particular OST ran > out of space? The client gets an ENOSPC. > Would > it leave the file as incomplete and show no free space? If the client application doesn't do any cleanup on getting an error, yes. b. signature.asc Description: This is a digitally signed message part ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
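For completeness, the sort of checks Brian is pointing at, assuming a 1.8 client (the parameter names under lov.* can differ slightly between versions):

# free space per OST, as the allocator sees it
client# lfs df -h
# knobs that weight round-robin vs. free-space-based object allocation
client# lctl get_param lov.*.qos_prio_free lov.*.qos_threshold_rr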
[Lustre-discuss] What would happen if
Hi,

My Lustre system has 7 OSTs on 3 OSSs. Two of the OSTs have less than 200 GB
free while the other 5 have 3+ TB free. I am not using striping.

First question: when creating a new file, does Lustre try to write to the OST
with the most free space? If not, how does it choose which OST to write to?

Second question: if for whatever reason it picks the OST with 200 GB free and
I was writing a file that would be 500 GB at finish, what would happen when
that particular OST ran out of space? Would it leave the file as incomplete
and show no free space?

Thanks, trying to debug an odd problem.
Re: [Lustre-discuss] 1.8.1.1 -> 1.8.3 upgrade questions
Nope, wish i had .. "maximal mount count reached" Thanks, Mathias Gustavsson Linux System Manager AstraZeneca R&D Mölndal SE-431 83 Mölndal, Sweden Phone: +46 31 776 12 58 mathias.gustavs...@astrazeneca.com -- Confidentiality Notice: This message is private and may contain confidential and proprietary information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the contents of this message is not permitted and may be unlawful. -Original Message- From: turek.wojci...@googlemail.com on behalf of Wojciech Turek Sent: Tue 5/25/2010 11:33 PM To: Gustavsson, Mathias Cc: lustre-discuss@lists.lustre.org Subject: Re: [Lustre-discuss] 1.8.1.1 -> 1.8.3 upgrade questions Those look familiar. Have you run fsck on the OSTs and MDTs before upgrade? Best regards, Wojciech On 25 May 2010 16:01, Gustavsson, Mathias wrote: > Hi, > > We tried to do a 1.8.1.1 to 1.8.3 version upgrade this weekend, but we got > i/o error on all of our old file systems (created ~4 years ago), we have a > more recently created filesystem (only a couple of moths old) and that was > fine. > This was in the log of the active mds, i couldn't find anything related in > the log of the client at the time. > > May 22 14:16:28 semldxlucky kernel: LustreError: > 2664:0:(mds_open.c:441:mds_create_objects()) error creating objects for inode > 841541: rc = -5 > May 22 14:16:28 semldxlucky kernel: LustreError: > 2664:0:(mds_open.c:826:mds_finish_open()) mds_create_objects: rc = -5 > May 22 14:16:38 semldxlucky kernel: LustreError: > 2648:0:(mds_open.c:441:mds_create_objects()) error creating objects for inode > 841541: rc = -5 > May 22 14:16:38 semldxlucky kernel: LustreError: > 2648:0:(mds_open.c:441:mds_create_objects()) Skipped 1 previous similar > message > May 22 14:16:38 semldxlucky kernel: LustreError: > 2648:0:(mds_open.c:826:mds_finish_open()) mds_create_objects: rc = -5 > May 22 14:16:38 semldxlucky kernel: LustreError: > 2648:0:(mds_open.c:826:mds_finish_open()) Skipped 1 previous similar message > > MDS + OSS's version : CentOS 5.4 and lustre version 1.8.1.1 > Clients version : CentOS 4.8 and lustre version 1.6.7.2 > > Mathias Gustavsson > Linux System Manager > AstraZeneca R&D Mölndal > SE-431 83 Mölndal, Sweden > > Phone: +46 31 776 12 58 > mathias.gustavs...@astrazeneca.com > > > -- > Confidentiality Notice: This message is private and may contain confidential > and proprietary information. If you have received this message in error, > please notify us and remove it from your system and note that you must not > copy, distribute or take any action in reliance on it. Any unauthorized use > or disclosure of the contents of this message is not permitted and may be > unlawful. > > ___ > Lustre-discuss mailing list > Lustre-discuss@lists.lustre.org > http://lists.lustre.org/mailman/listinfo/lustre-discuss > -- -- Wojciech Turek Assistant System Manager High Performance Computing Service University of Cambridge Email: wj...@cam.ac.uk Tel: (+)44 1223 763517 ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
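For anyone following along: a read-only consistency check with Lustre's e2fsprogs can be run on the unmounted targets before an upgrade; the device names below are placeholders:

# -f forces a full pass, -n makes it report-only (no changes written)
mds# e2fsck -fn /dev/mdtdev
oss# e2fsck -fn /dev/ostdev

Once the report looks sane, a repair pass would use -fp or -fy in place of -fn, but only with the target unmounted and a backup in hand.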
Re: [Lustre-discuss] 1.8.1.1 -> 1.8.3 upgrade questions
These lines is only available when we are on version 1.8.3 May 22 13:19:12 semldxludwig kernel: LustreError: 12412:0:(lib-move.c:2441:LNetPut()) Error sending PUT to 12345-10.0.0@tcp: -113 May 22 13:19:51 semldxluis kernel: Lustre: 13460:0:(fsfilt-ldiskfs.c:1385:fsfilt_ldiskfs_setup()) filesystem doesn't have dir_index feature enabled And a couple of maximal mount count reached, file extents and mballoc enabled. And i got a LustreError with a lustre-log.1274525880.5458 in /tmp. I can't really sort what is relevant information from the OSS logs (except from the obvious client recovery traffic), give me a hint of what i should look for and I'll paste it in here. Thanks, Mathias Gustavsson Linux System Manager AstraZeneca R&D Mölndal SE-431 83 Mölndal, Sweden Phone: +46 31 776 12 58 mathias.gustavs...@astrazeneca.com -- Confidentiality Notice: This message is private and may contain confidential and proprietary information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the contents of this message is not permitted and may be unlawful. -Original Message- From: Andreas Dilger [mailto:andreas.dil...@oracle.com] Sent: Tue 5/25/2010 11:04 PM To: Gustavsson, Mathias Cc: lustre-discuss@lists.lustre.org Subject: Re: [Lustre-discuss] 1.8.1.1 -> 1.8.3 upgrade questions On 2010-05-25, at 09:01, Gustavsson, Mathias wrote: > We tried to do a 1.8.1.1 to 1.8.3 version upgrade this weekend, but we got > i/o error on all of our old file systems (created ~4 years ago), we have a > more recently created filesystem (only a couple of moths old) and that was > fine. > This was in the log of the active mds, i couldn't find anything related in > the log of the client at the time. > > May 22 14:16:28 semldxlucky kernel: LustreError: > 2664:0:(mds_open.c:441:mds_create_objects()) error creating objects for inode > 841541: rc = -5 > May 22 14:16:28 semldxlucky kernel: LustreError: > 2664:0:(mds_open.c:826:mds_finish_open()) mds_create_objects: rc = -5 > May 22 14:16:38 semldxlucky kernel: LustreError: > 2648:0:(mds_open.c:441:mds_create_objects()) error creating objects for inode > 841541: rc = -5 > May 22 14:16:38 semldxlucky kernel: LustreError: > 2648:0:(mds_open.c:441:mds_create_objects()) Skipped 1 previous similar > message > May 22 14:16:38 semldxlucky kernel: LustreError: > 2648:0:(mds_open.c:826:mds_finish_open()) mds_create_objects: rc = -5 > May 22 14:16:38 semldxlucky kernel: LustreError: > 2648:0:(mds_open.c:826:mds_finish_open()) Skipped 1 previous similar message > > MDS + OSS's version : CentOS 5.4 and lustre version 1.8.1.1 > Clients version : CentOS 4.8 and lustre version 1.6.7.2 This is really a problem between the MDS and the OSS. Is there anything in the OSS logs? Cheers, Andreas -- Andreas Dilger Lustre Technical Lead Oracle Corporation Canada Inc. ___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] How to eliminate an OST from an OSS server
On Wed, 2010-05-26 at 09:43 +0300, Mohamed Adel wrote:
> Dear,
>
> I have a pre-installed Lustre file system with two MDSs and two OSSs. Each
> OSS has 6 arrays of disks. In OSS01, one of the arrays is not working
> properly due to a physical error in one of the disks of the array. I wanted
> to eliminate this array from OSS01. I simply restarted the MDSs and the OSSs
> and didn't mount that array, then mounted the file system on the clients.
> The clients didn't produce any error after mounting, but the file system is
> not accessible; i.e. I can't even issue an "ls" command to view its
> contents. I need to eliminate that corrupted array to be able to use the
> file system, but I don't know how.

You likely want to deactivate the OST on the MDT, and possibly on the clients
also, depending on your goals.

> Could anyone help me with that, or just guide me to what I should read to
> learn how to fix this issue?

Have you checked the operations manual at
http://wiki.lustre.org/index.php/Lustre_Documentation? Maybe you did, but you
just didn't realize "deactivate" was the operation you were looking to
perform.

Cheers,
b.
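A sketch of the two flavours of deactivation Brian mentions; the device number, filesystem name and OST index are placeholders, and note that the conf_param form persists across remounts while the device form does not:

# temporary: stop the MDS from using the missing OST for new objects
mds# lctl dl | grep OST0003          # note the osc device number, e.g. 11
mds# lctl --device 11 deactivate

# persistent: record it in the configuration log via the MGS
mgs# lctl conf_param <fsname>-OST0003.osc.active=0

With the OSC for the absent OST deactivated on the MDS (and, if desired, on the clients), directory listings should stop hanging; files with objects on the missing OST will still return I/O errors when read.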