Re: [Lustre-discuss] Permanently removing an OST

2010-05-26 Thread Kevin Van Maren
To clarify: you would still have some vestiges of the old OST, and would have 
to follow the other procedure if the OST index is reused, but the writeconf 
should remove all mention of the OST from "lctl dl", right?
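
A minimal sketch of the writeconf step being referred to, assuming a 1.8-style
setup (device paths are placeholders; the full procedure in the manual should
be checked first):

# with all clients and all targets unmounted:
mdt# tunefs.lustre --writeconf /dev/{mdtdev}
ossN# tunefs.lustre --writeconf /dev/{ostdev}     # repeat on every remaining OST
# then remount in order: MDT first, then the OSTs, then the clients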


Kevin Van Maren wrote:
> Andreas,
>
> This isn't the same as the similar thread, where an OST is being  
> replaced (keeping the same number).  Doesn't he just have to re-do the  
> writeconf, to delete references to the OST in the MGS, as in Bug 22283?
>
> There will remain a gap in the OST numbers, but that should be okay if  
> there are no objects, right?
>
> Kevin
>
>
> On May 26, 2010, at 6:22 PM, Florent Parent   
> wrote:
>
>   
>> On Wed, May 26, 2010 at 19:08, Andreas Dilger wrote:
>>> On 2010-05-26, at 16:49, Florent Parent wrote:
>>>   
 on MGS:
 lctl conf_param osc.lustre1-OST002f-osc.active.osc.active=0
 
>>> If you specified the above, it is possible that you only  
>>> deactivated it on the MDS, not on the clients as well.
>>>   
>> Right. It was executed on all clients as well.
>>
>> 
 Many days later, and even following a complete server/clients  
 reboot,
 we are now seeing this target being active on clients:

 on MDT:
 [r...@mds2 ~]# lctl get_param osc.lustre1-OST002f-osc.active
 osc.lustre1-OST002f-osc.active=0
 [r...@mds2 ~]# lctl dl|grep 002f
 51 UP osc lustre1-OST002f-osc lustre1-mdtlov_UUID 5
 
>>> The device is configured, but if it is not active it will not be  
>>> used for anything.
>>>
>>>   
 on client:
 # ssh r101-n33 lctl dl |grep 002f
 50 UP osc lustre1-OST002f-osc-810377354000
 ef3f455a-7f67-134e-cf38-bcc0d9b89f26 4
 
>>> What does "active" report for this OSC on a client?
>>>   
>> Shows 0 (I don't know why we are seeing a double entry here). So I
>> guess it's inactive. I was under the impression references to the OST
>> would go away. It's also confusing to have the OST show as UP in "lctl
>> dl".
>>
>> # ssh r101-n33 lctl get_param osc.lustre1-OST002f-osc*.active
>> osc.lustre1-OST002f-osc-810371ac3c00.active=0
>> osc.lustre1-OST002f-osc-810377354000.active=0
>>
>> Thanks
>> Florent
>> ___
>> Lustre-discuss mailing list
>> Lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>> 
> ___
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>   

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Permanently removing an OST

2010-05-26 Thread Kevin Van Maren
Andreas,

This isn't the same as the similar thread, where an OST is being  
replaced (keeping the same number).  Doesn't he just have to re-do the  
writeconf, to delete references to the OST in the MGS, as in Bug 22283?

There will remain a gap in the OST numbers, but that should be okay if  
there are no objects, right?

Kevin


On May 26, 2010, at 6:22 PM, Florent Parent   
wrote:

> On Wed, May 26, 2010 at 19:08, Andreas Dilger wrote:
>> On 2010-05-26, at 16:49, Florent Parent wrote:
>>>
>>> on MGS:
>>> lctl conf_param osc.lustre1-OST002f-osc.active.osc.active=0
>>
>> If you specified the above, it is possible that you only  
>> deactivated it on the MDS, not on the clients as well.
>
> Right. It was executed on all clients as well.
>
>>
>>> Many days later, and even following a complete server/clients  
>>> reboot,
>>> we are now seeing this target being active on clients:
>>>
>>> on MDT:
>>> [r...@mds2 ~]# lctl get_param osc.lustre1-OST002f-osc.active
>>> osc.lustre1-OST002f-osc.active=0
>>> [r...@mds2 ~]# lctl dl|grep 002f
>>> 51 UP osc lustre1-OST002f-osc lustre1-mdtlov_UUID 5
>>
>> The device is configured, but if it is not active it will not be  
>> used for anything.
>>
>>> on client:
>>> # ssh r101-n33 lctl dl |grep 002f
>>> 50 UP osc lustre1-OST002f-osc-810377354000
>>> ef3f455a-7f67-134e-cf38-bcc0d9b89f26 4
>>
>> What does "active" report for this OSC on a client?
>
> Shows 0 (I don't know why we are seeing a double entry here). So I
> guess it's inactive. I was under the impression references to the OST
> would go away. It's also confusing to have the OST show as UP in "lctl
> dl".
>
> # ssh r101-n33 lctl get_param osc.lustre1-OST002f-osc*.active
> osc.lustre1-OST002f-osc-810371ac3c00.active=0
> osc.lustre1-OST002f-osc-810377354000.active=0
>
> Thanks
> Florent
> ___
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Permanently removing an OST

2010-05-26 Thread Florent Parent
On Wed, May 26, 2010 at 19:08, Andreas Dilger  wrote:
> On 2010-05-26, at 16:49, Florent Parent wrote:
>>
>> on MGS:
>> lctl conf_param osc.lustre1-OST002f-osc.active.osc.active=0
>
> If you specified the above, it is possible that you only deactivated it on 
> the MDS, not on the clients as well.

Right. It was executed on all clients as well.

>
>> Many days later, and even following a complete server/clients reboot,
>> we are now seeing this target being active on clients:
>>
>> on MDT:
>> [r...@mds2 ~]# lctl get_param osc.lustre1-OST002f-osc.active
>> osc.lustre1-OST002f-osc.active=0
>> [r...@mds2 ~]# lctl dl|grep 002f
>> 51 UP osc lustre1-OST002f-osc lustre1-mdtlov_UUID 5
>
> The device is configured, but if it is not active it will not be used for 
> anything.
>
>> on client:
>> # ssh r101-n33 lctl dl |grep 002f
>> 50 UP osc lustre1-OST002f-osc-810377354000
>> ef3f455a-7f67-134e-cf38-bcc0d9b89f26 4
>
> What does "active" report for this OSC on a client?

Shows 0 (I don't know why we are seeing a double entry here). So I
guess it's inactive. I was under the impression references to the OST
would go away. It's also confusing to have the OST show as UP in "lctl
dl".

# ssh r101-n33 lctl get_param osc.lustre1-OST002f-osc*.active
osc.lustre1-OST002f-osc-810371ac3c00.active=0
osc.lustre1-OST002f-osc-810377354000.active=0

Thanks
Florent
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Permanently removing an OST

2010-05-26 Thread Andreas Dilger
On 2010-05-26, at 16:49, Florent Parent wrote:
> A while ago, we experienced multi disk failures on a raid6 ost. We
> managed to migrate some data off the OST (lfs_migrate), and the
> process was long (software raid was often failing).
> 
> We reconstructed the target from scratch, which introduced a new OST.
> Following the Lustre documentation on "Removing an OST from the File
> System", we used the following procedure to permanently remove the old
> OST:
> 
> on MGS:
> lctl conf_param osc.lustre1-OST002f-osc.active.osc.active=0

If you specified the above, it is possible that you only deactivated it on the 
MDS, not on the clients as well.
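
For comparison, the conf_param form usually quoted for permanent deactivation
looks roughly like the following (a sketch using the names from this thread;
verify the exact syntax against the manual for your release):

mgs# lctl conf_param lustre1-OST002f.osc.active=0
# non-persistent per-node equivalent, runnable on the MDS or a client:
node# lctl set_param osc.lustre1-OST002f-osc*.active=0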

> Many days later, and even following a complete server/clients reboot,
> we are now seeing this target being active on clients:
> 
> on MDT:
> [r...@mds2 ~]# lctl get_param osc.lustre1-OST002f-osc.active
> osc.lustre1-OST002f-osc.active=0
> [r...@mds2 ~]# lctl dl|grep 002f
> 51 UP osc lustre1-OST002f-osc lustre1-mdtlov_UUID 5

The device is configured, but if it is not active it will not be used for 
anything.

> on client:
> # ssh r101-n33 lctl dl |grep 002f
> 50 UP osc lustre1-OST002f-osc-810377354000
> ef3f455a-7f67-134e-cf38-bcc0d9b89f26 4

What does "active" report for this OSC on a client?

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Permanently removing an OST

2010-05-26 Thread Florent Parent
Hi,

A while ago, we experienced multi disk failures on a raid6 ost. We
managed to migrate some data off the OST (lfs_migrate), and the
process was long (software raid was often failing).

We reconstructed the target from scratch, which introduced a new OST.
Following the Lustre documentation on "Removing an OST from the File
System", we used the following procedure to permanently remove the old
OST:

on MGS:
lctl conf_param osc.lustre1-OST002f-osc.active.osc.active=0

Many days later, and even following a complete server/clients reboot,
we are now seeing this target being active on clients:

on MDT:
[r...@mds2 ~]# lctl get_param osc.lustre1-OST002f-osc.active
osc.lustre1-OST002f-osc.active=0
[r...@mds2 ~]# lctl dl|grep 002f
 51 UP osc lustre1-OST002f-osc lustre1-mdtlov_UUID 5

on client:
# ssh r101-n33 lctl dl |grep 002f
 50 UP osc lustre1-OST002f-osc-810377354000
ef3f455a-7f67-134e-cf38-bcc0d9b89f26 4

What are we missing from the procedure here? I'm really looking at
*permanently* disabling OST from Lustre.

Thanks for any pointers.
Florent
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] sanity check

2010-05-26 Thread Andreas Dilger
On 2010-05-26, at 13:47, Mervini, Joseph A wrote:
> I migrated all the files off the target with lfs_migrate. I didn't realize 
> that I would need to retain any of the ldiskfs data if everything was moved. 
> (I must have misinterpreted your earlier comment.)
> 
> So this is my current scenario:
> 
> 1. All data from a failing OST has been migrated to other targets.
> 2. The original target was recreated via mdadm.
> 3. mkfs.lustre was run on the recreated target
> 4. tunefs.lustre was run on the recreated target to set the index to what it 
> was before it was reformatted.
> 5. No other data from the original target has been retained.
> 
> Question:
> 
> Based on the above conditions, what do I need to do to get this OST back into 
> the file system?

Lustre is fairly robust about handling situations like this (e.g. recreating 
the last_rcvd file, the object hierarchy O/0/d{0..31}, etc).  The one item that 
it will need help with is to recreate the LAST_ID file on the OST.  You can do 
this by hand by extracting the last-precreated object from the MDS, and writing 
the LAST_ID file on the OST:

# extract last allocated object for all OSTs
mds# debugfs -c -R "dump lov_objids /tmp/lo"
# cut out the last allocated object for this OST index (set OST_INDEX to the decimal index NN)
mds# dd if=/tmp/lo of=/tmp/LAST_ID bs=8 skip=$OST_INDEX count=1
# verify value is the right one (LAST_ID = next_id - 1)
mds# lctl get_param osc.*OST00NN.prealloc_next_id  # NN is OST index
mds# od -td8 /tmp/LAST_ID
# get OST filesystem ready for this value
ossN# mount -t ldiskfs /dev/{ostdev} /mnt/tmp
ossN# mkdir -p /mnt/tmp/O/0
mds# scp /tmp/LAST_ID ossN:/mnt/tmp/O/0/LAST_ID

This will avoid the OST trying to recreate thousands/millions of objects when 
the OST next reconnects.

This could probably be handled internally by the OST, by simply bumping the 
LAST_ID value in the case that it is currently < 2 and the MDS is requesting 
some large value.
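
A small follow-up check, as a sketch (NN and the device path are placeholders,
reusing the values from the steps above):

ossN# mount -t ldiskfs /dev/{ostdev} /mnt/tmp
ossN# od -td8 /mnt/tmp/O/0/LAST_ID                 # should match /tmp/LAST_ID from the MDS
ossN# umount /mnt/tmp
mds# lctl get_param osc.*OST00NN.prealloc_next_id  # should be LAST_ID + 1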

> On May 26, 2010, at 1:29 PM, Andreas Dilger wrote:
> 
>> On 2010-05-26, at 13:18, Mervini, Joseph A wrote:
>>> I have migrated all the files that were on a damaged OST and have recreated 
>>> the software raid array and put a lustre file system on it.
>>> 
>>> I am now at the point where I want to re-introduce it to the scratch file 
>>> system as if it was never gone. I used:
>>> 
>>> tunefs.lustre --index=27 /dev/md4 to get the right index for the file 
>>> system (the information is below). I just want to make sure there is 
>>> nothing else I need to do before I pull the trigger with mounting it. (The 
>>> things that have me concerned are the differences in the flags, and less so 
>>> the "OST first_time update".)
>> 
>> The use of tunefs.lustre is not sufficient to make the new OST identical to 
>> the previous one.  You should also copy the O/0/LAST_ID file, last_rcvd, and 
>> mountdata files over, at which point you don't need tunefs.lustre at all.
>> 
>>> 
>>> 
>>> [r...@oss-scratch obdfilter]# tunefs.lustre /dev/md4
>>> checking for existing Lustre data: found CONFIGS/mountdata
>>> Reading CONFIGS/mountdata
>>> 
>>> Read previous values:
>>> Target: scratch1-OST001b
>>> Index:  27
>>> Lustre FS:  scratch1
>>> Mount type: ldiskfs
>>> Flags:  0x2
>>>   (OST )
>>> Persistent mount opts: errors=remount-ro,extents,mballoc
>>> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib 
>>> failover.node=10.10.10...@o2ib
>>> 
>>> 
>>> Permanent disk data:
>>> Target: scratch1-OST001b
>>> Index:  27
>>> Lustre FS:  scratch1
>>> Mount type: ldiskfs
>>> Flags:  0x2
>>>   (OST )
>>> Persistent mount opts: errors=remount-ro,extents,mballoc
>>> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib 
>>> failover.node=10.10.10...@o2ib
>>> 
>>> exiting before disk write.
>>> 
>>> 
>>> 
>>> 
>>> [r...@oss-scratch obdfilter]# tunefs.lustre --dryrun /dev/md4
>>> checking for existing Lustre data: found CONFIGS/mountdata
>>> Reading CONFIGS/mountdata
>>> 
>>> Read previous values:
>>> Target: scratch1-OST001b
>>> Index:  27
>>> Lustre FS:  scratch1
>>> Mount type: ldiskfs
>>> Flags:  0x62
>>>   (OST first_time update )
>>> Persistent mount opts: errors=remount-ro,extents,mballoc
>>> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib 
>>> failover.node=10.10.10...@o2ib
>>> 
>>> 
>>> Permanent disk data:
>>> Target: scratch1-OST001b
>>> Index:  27
>>> Lustre FS:  scratch1
>>> Mount type: ldiskfs
>>> Flags:  0x62
>>>   (OST first_time update )
>>> Persistent mount opts: errors=remount-ro,extents,mballoc
>>> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib 
>>> failover.node=10.10.10...@o2ib
>>> 
>>> exiting before disk write.
>>> 
>>> 
>>> ___
>>> Lustre-discuss mailing list
>>> Lustre-discuss@lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>> 
>> 
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Lustre Technical Lead
>> Oracle Corporation Canada Inc.
>> 
>> 

Re: [Lustre-discuss] sanity check

2010-05-26 Thread Mervini, Joseph A
Andreas,

I migrated all the files off the target with lfs_migrate. I didn't realize that 
I would need to retain any of the ldiskfs data if everything was moved. (I must 
have misinterpreted your earlier comment.)

So this is my current scenario:

1. All data from a failing OST has been migrated to other targets.
2. The original target was recreated via mdadm.
3. mkfs.lustre was run on the recreated target
4. tunefs.lustre was run on the recreated target to set the index to what it 
was before it was reformatted.
5. No other data from the original target has been retained.

Question:

Based on the above conditions, what do I need to do to get this OST back into 
the file system?

Thanks in advance.

Joe
 
On May 26, 2010, at 1:29 PM, Andreas Dilger wrote:

> On 2010-05-26, at 13:18, Mervini, Joseph A wrote:
>> I have migrated all the files that were on a damaged OST and have recreated 
>> the software raid array and put a lustre file system on it.
>> 
>> I am now at the point where I want to re-introduce it to the scratch file 
>> system as if it was never gone. I used:
>> 
>> tunefs.lustre --index=27 /dev/md4 to get the right index for the file system 
>> (the information is below). I just want to make sure there is nothing else I 
>> need to do before I pull the trigger with mounting it. (The things that have 
>> me concerned are the differences in the flags, and less so the "OST 
>> first_time update".)
> 
> The use of tunefs.lustre is not sufficient to make the new OST identical to 
> the previous one.  You should also copy the O/0/LAST_ID file, last_rcvd, and 
> mountdata files over, at which point you don't need tunefs.lustre at all.
> 
>> 
>> 
>> [r...@oss-scratch obdfilter]# tunefs.lustre /dev/md4
>> checking for existing Lustre data: found CONFIGS/mountdata
>> Reading CONFIGS/mountdata
>> 
>> Read previous values:
>> Target: scratch1-OST001b
>> Index:  27
>> Lustre FS:  scratch1
>> Mount type: ldiskfs
>> Flags:  0x2
>>(OST )
>> Persistent mount opts: errors=remount-ro,extents,mballoc
>> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib 
>> failover.node=10.10.10...@o2ib
>> 
>> 
>> Permanent disk data:
>> Target: scratch1-OST001b
>> Index:  27
>> Lustre FS:  scratch1
>> Mount type: ldiskfs
>> Flags:  0x2
>>(OST )
>> Persistent mount opts: errors=remount-ro,extents,mballoc
>> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib 
>> failover.node=10.10.10...@o2ib
>> 
>> exiting before disk write.
>> 
>> 
>> 
>> 
>> [r...@oss-scratch obdfilter]# tunefs.lustre --dryrun /dev/md4
>> checking for existing Lustre data: found CONFIGS/mountdata
>> Reading CONFIGS/mountdata
>> 
>> Read previous values:
>> Target: scratch1-OST001b
>> Index:  27
>> Lustre FS:  scratch1
>> Mount type: ldiskfs
>> Flags:  0x62
>>(OST first_time update )
>> Persistent mount opts: errors=remount-ro,extents,mballoc
>> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib 
>> failover.node=10.10.10...@o2ib
>> 
>> 
>> Permanent disk data:
>> Target: scratch1-OST001b
>> Index:  27
>> Lustre FS:  scratch1
>> Mount type: ldiskfs
>> Flags:  0x62
>>(OST first_time update )
>> Persistent mount opts: errors=remount-ro,extents,mballoc
>> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib 
>> failover.node=10.10.10...@o2ib
>> 
>> exiting before disk write.
>> 
>> 
>> ___
>> Lustre-discuss mailing list
>> Lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Technical Lead
> Oracle Corporation Canada Inc.
> 
> 


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] sanity check

2010-05-26 Thread Andreas Dilger
On 2010-05-26, at 13:18, Mervini, Joseph A wrote:
> I have migrated all the files that were on a damaged OST and have recreated 
> the software raid array and put a lustre file system on it.
> 
> I am now at the point where I want to re-introduce it to the scratch file 
> system as if it was never gone. I used:
> 
> tunefs.lustre --index=27 /dev/md4 to get the right index for the file system 
> (the information is below). I just want to make sure there is nothing else I 
> need to do before I pull the trigger with mounting it. (The things that have 
> me concerned are the differences in the flags, and less so the "OST 
> first_time update".)

The use of tunefs.lustre is not sufficient to make the new OST identical to the 
previous one.  You should also copy the O/0/LAST_ID file, last_rcvd, and 
mountdata files over, at which point you don't need tunefs.lustre at all.
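
As a sketch of that copy step, assuming the old OST device is still readable
and can be mounted as ldiskfs alongside the new one (device paths are
examples):

oss# mount -t ldiskfs /dev/{old_ostdev} /mnt/old
oss# mount -t ldiskfs /dev/md4 /mnt/new
oss# mkdir -p /mnt/new/O/0
oss# cp /mnt/old/O/0/LAST_ID /mnt/new/O/0/LAST_ID
oss# cp /mnt/old/last_rcvd /mnt/new/last_rcvd
oss# cp /mnt/old/CONFIGS/mountdata /mnt/new/CONFIGS/mountdata
oss# umount /mnt/old /mnt/new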

> 
> 
> [r...@oss-scratch obdfilter]# tunefs.lustre /dev/md4
> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
> 
>  Read previous values:
> Target: scratch1-OST001b
> Index:  27
> Lustre FS:  scratch1
> Mount type: ldiskfs
> Flags:  0x2
> (OST )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib 
> failover.node=10.10.10...@o2ib
> 
> 
>  Permanent disk data:
> Target: scratch1-OST001b
> Index:  27
> Lustre FS:  scratch1
> Mount type: ldiskfs
> Flags:  0x2
> (OST )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib 
> failover.node=10.10.10...@o2ib
> 
> exiting before disk write.
> 
> 
> 
> 
> [r...@oss-scratch obdfilter]# tunefs.lustre --dryrun /dev/md4
> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
> 
>  Read previous values:
> Target: scratch1-OST001b
> Index:  27
> Lustre FS:  scratch1
> Mount type: ldiskfs
> Flags:  0x62
> (OST first_time update )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib 
> failover.node=10.10.10...@o2ib
> 
> 
>  Permanent disk data:
> Target: scratch1-OST001b
> Index:  27
> Lustre FS:  scratch1
> Mount type: ldiskfs
> Flags:  0x62
> (OST first_time update )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib 
> failover.node=10.10.10...@o2ib
> 
> exiting before disk write.
> 
> 
> ___
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss


Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] sanity check

2010-05-26 Thread Mervini, Joseph A
Hoping for a quick sanity check:

I have migrated all the files that were on a damaged OST and have recreated the 
software raid array and put a lustre file system on it.

I am now at the point where I want to re-introduce it to the scratch file 
system as if it was never gone. I used:

tunefs.lustre --index=27 /dev/md4 to get the right index for the file system 
(the information is below). I just want to make sure there is nothing else I 
need to do before I pull the trigger with mounting it. (The things that have me 
concerned are the differences in the flags, and less so the "OST first_time 
update".)





[r...@oss-scratch obdfilter]# tunefs.lustre /dev/md4
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

  Read previous values:
Target: scratch1-OST001b
Index:  27
Lustre FS:  scratch1
Mount type: ldiskfs
Flags:  0x2
 (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib 
failover.node=10.10.10...@o2ib


  Permanent disk data:
Target: scratch1-OST001b
Index:  27
Lustre FS:  scratch1
Mount type: ldiskfs
Flags:  0x2
 (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib 
failover.node=10.10.10...@o2ib

exiting before disk write.




[r...@oss-scratch obdfilter]# tunefs.lustre --dryrun /dev/md4
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

  Read previous values:
Target: scratch1-OST001b
Index:  27
Lustre FS:  scratch1
Mount type: ldiskfs
Flags:  0x62
 (OST first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib 
failover.node=10.10.10...@o2ib


  Permanent disk data:
Target: scratch1-OST001b
Index:  27
Lustre FS:  scratch1
Mount type: ldiskfs
Flags:  0x62
 (OST first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.10.1...@o2ib mgsnode=10.10.1...@o2ib 
failover.node=10.10.10...@o2ib

exiting before disk write.


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of lustre 1.8.3+

2010-05-26 Thread Guy Coates
On 26/05/10 17:25, Ramiro Alba Queipo wrote:
> On Wed, 2010-05-26 at 16:48 +0100, Guy Coates wrote:
>
 One thing to watch out for in your kernel configs is to make sure that:

 CONFIG_SECURITY_FILE_CAPABILITIES=N
>>>
>>> OK. But the question is whether this issue still applies to lustre-1.8.3 and
>>> the SLES kernel linux-2.6.27.39-0.3.1.tar.bz2. I mean, it is quite surprising
>>> that, if this problem persists, Oracle is offering Lustre packages for
>>> SLES11 with CONFIG_SECURITY_FILE_CAPABILITIES=y.
>>> I am just about to start testing, so I'd like to clarify this.
>>
>> The binary SLES packages are fine; it is the source packages that may be
>> problematic, depending on your config. There is a bug filed against this
>
> Sorry Guy. Maybe there is something I am missing, but the SLES11 kernel server
> RPM packages for lustre-1.8.3 are created using a config with
> CONFIG_SECURITY_FILE_CAPABILITIES=y (see for yourself in the attachment).

You are entirely correct.

Cheers,

Guy

-- 
Dr Guy Coates,  Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 ex 6925
Fax: +44 (0)1223 496802


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] What would happen if

2010-05-26 Thread Kevin Van Maren
If you are creating files that are a significant fraction of the free 
space on your OSTs, you _should_ stripe
them across multiple OSTs (lfs setstripe).
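
A minimal sketch of doing that for one large file (1.8-era lfs syntax assumed;
the path and stripe count are examples):

client# lfs setstripe -c 4 /lustre/bigdir/bigfile   # create the file with 4 stripes (-c -1 = all OSTs)
client# dd if=/dev/zero of=/lustre/bigdir/bigfile bs=1M count=512000
client# lfs getstripe /lustre/bigdir/bigfile        # confirm which OSTs hold the objects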

Also note that ENOSPC is likely to be returned before the OST is actually "out" 
of space: due to the OST pre-granting space to the clients, ENOSPC is returned 
when (free - grants) = 0.  The more Lustre clients you have, the sooner this 
occurs.  See Bug 22498.

Additionally, there are some grant leaks that can cause the server's view of 
the grant to be inflated; see Bug 22755.

Kevin


Brian J. Murrell wrote:
> On Wed, 2010-05-26 at 09:26 -0400, Scott wrote: 
>   
>> Hi,
>> 
>
> Hi,
>
> You don't provide any particulars (lustre version, lfs df output, etc.)
> so I will answer generically...
>
>   
>> I am not using striping.  Question, when creating a new file does Lustre try 
>> to write to the OST 
>> with the most free space?
>> 
>
> That depends on your lustre version and the values
> in /proc/fs/lustre/lov/client-clilov-f385fc00/qos_*.  Please see the
> manual for details on those files and how they affect object allocation,
> if you are running a relevant version of Lustre.
>
>   
>> Second question, if for whatever reason it picks the OST with 200Gb free and 
>> i was writing a file 
>> that would be 500Gb at finish, what would happen when that particular OST 
>> ran out of space?
>> 
>
> The client gets an ENOSPC.
>
>   
>> Would 
>> it leave the file as incomplete and show no free space?
>> 
>
> If the client application doesn't do any cleanup on getting an error,
> yes.
>
> b.
>
>   
> 
>
> ___
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>   

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of lustre 1.8.3+

2010-05-26 Thread Guy Coates
On 26/05/10 16:31, Ramiro Alba Queipo wrote:
> Hi Guy,
> 
> On Wed, 2010-05-26 at 14:59 +0100, Guy Coates wrote:
   
> The SLES11 kernel is at 2.6.27 so it could be usable for this. Also, I 
> 
 Ok, I am getting
 http://downloads.lustre.org/public/kernels/sles11/linux-2.6.27.39-0.3.1.tar.bz2

 but, please. Where can I get a suitable config file to apply both for
 servers and clients?
>>
>> One thing to watch out for in your kernel configs is to make sure that:
>>
>> CONFIG_SECURITY_FILE_CAPABILITIES=N
> 
> OK. But the question is whether this issue still applies to lustre-1.8.3 and
> the SLES kernel linux-2.6.27.39-0.3.1.tar.bz2. I mean, it is quite surprising
> that, if this problem persists, Oracle is offering Lustre packages for
> SLES11 with CONFIG_SECURITY_FILE_CAPABILITIES=y.
> I am just about to start testing, so I'd like to clarify this.

The binary SLES packages are fine; it is the source packages that may be
problematic, depending on your config. There is a bug filed against this
now (22913), so no doubt it will be fixed in a subsequent release.

In regard to your testing, it is easy to check if a client is mis-behaving;

run:

#cat /dev/zero > /lustre/filesystem &

and watch the client IO stats with:

#watch -n 1 cat /proc/fs/lustre/llite/*/stats


If getxattr is going up with write_bytes, then you have a problem.
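
If the full stats output is too noisy, a filtered variant of the same check
(a sketch):

#watch -n 1 'grep -E "getxattr|write_bytes" /proc/fs/lustre/llite/*/stats'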

Cheers,

Guy

-- 
Dr. Guy Coates, Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 x 6925
Fax: +44 (0)1223 496802


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] two problems

2010-05-26 Thread Stefano Elmopi



Hi,

My version of Lustre is 1.8.3
My filesystem is composed of one MGS/MDS server and two OSSs.
While testing, I tried to delete an OST and replace it with another OST,
and now the situation is this:

cat /proc/fs/lustre/lov/lustre01-mdtlov/target_obd
0: lustre01-OST_UUID ACTIVE
2: lustre01-OST0002_UUID ACTIVE

- first problem
lustre01-OST0001_UUID is the OST that was removed; it had files, which of
course are no longer accessible:

ls -lrt
total 12475312
?- ? ?? ?? zero.dat
?- ? ?? ?? ubuntu-9.10-dvd-i386.iso
?- ? ?? ?? X_CentOS-5.4-x86_64-bin-DVD.iso
?- ? ?? ?? Windows_XP-Capodarco.iso
?- ? ?? ?? UBUNTU_CentOS-5.4-x86_64-bin-DVD.iso
?- ? ?? ?? KK_CentOS-5.4-x86_64-bin-DVD.iso
?- ? ?? ?? F_CentOS-5.4-x86_64-bin-DVD.iso
?- ? ?? ?? CentOS-5.3-i386-bin-DVD.iso
?- ? ?? ?? B_CentOS-5.4-x86_64-bin-DVD.iso
?- ? ?? ?? BAK_CentOS-5.4-x86_64-bin-DVD.iso
?- ? ?? ?? 2.iso


To delete them, I followed these steps:

on MGS/MDS server:

e2fsck -n -v --mdsdb /root/mds_home_db /dev/mpath/mpath2

copy the file mds_home_db to OSS_1 and, on OSS_1, launch the following command:


e2fsck -n -v --mdsdb /root/mds_home_db --ostdb /root/home_ost00db /dev/mpath/mpath1


and do the same thing on the OSS_2:

e2fsck -n -v --mdsdb /root/mds_home_db --ostdb /root/home_ost01db /dev/mpath/mpath2


then copy the files mds_home_db, home_ost00db and home_ost01db to the Lustre client,

mount the Lustre filesystem and run the command:

lfsck -c -v --mdsdb /root/mds_home_db --ostdb /root/home_ost00db /root/home_ost02db /LUSTRE


but the command hangs:

.
.
.
.
[0] zero-length orphan objid 1182
[0] zero-length orphan objid 1214
[0] zero-length orphan objid 1246
[0] zero-length orphan objid 1183
[0] zero-length orphan objid 1215
[0] zero-length orphan objid 1247
lfsck: ost_idx 0: pass3 OK (218 files total)
MDS: max_id 161 OST: max_id 65
lfsck: ost_idx 1: pass1: check for duplicate objects
lfsck: ost_idx 1: pass1 OK (11 files total)
lfsck: ost_idx 1: pass2: check for missing inode objects


and the MGS/MDS server then goes into a kernel panic,
and the Lustre client log says:
May 26 17:39:35 mdt02prdpom kernel: LustreError: 7105:0:(lov_ea.c: 
248:lsm_unpackmd_v1()) OST index 1 missing
May 26 17:39:35 mdt02prdpom kernel: LustreError: 7105:0:(lov_ea.c: 
248:lsm_unpackmd_v1()) Skipped 21 previous similar messages
May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 
64:lov_dump_lmm_common()) objid 0x1b20003, magic 0x0bd10bd0, pattern 0x1
May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 
67:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 1
May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 
84:lov_dump_lmm_objects()) stripe 0 idx 1 subobj 0x0/0x2
May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 
64:lov_dump_lmm_common()) objid 0x1b20005, magic 0x0bd10bd0, pattern 0x1
May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 
67:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 1
May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 
84:lov_dump_lmm_objects()) stripe 0 idx 1 subobj 0x0/0x3
May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 
64:lov_dump_lmm_common()) objid 0x1b20006, magic 0x0bd10bd0, pattern 0x1
May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 
67:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 1
May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 
84:lov_dump_lmm_objects()) stripe 0 idx 1 subobj 0x0/0x4
May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 
64:lov_dump_lmm_common()) objid 0x1b20008, magic 0x0bd10bd0, pattern 0x1
May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 
67:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 1
May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 
84:lov_dump_lmm_objects()) stripe 0 idx 1 subobj 0x0/0x5
May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 
64:lov_dump_lmm_common()) objid 0x1b2000a, magic 0x0bd10bd0, pattern 0x1
May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 
67:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 1
May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 
84:lov_dump_lmm_objects()) stripe 0 idx 1 subobj 0x0/0x6
May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 
64:lov_dump_lmm_common()) objid 0x1b2000c, magic 0x0bd10bd0, pattern 0x1
May 26 17:39:35 mdt02prdpom kernel: Lustre: 7105:0:(lov_pack.c: 
67:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 1
May 26 17:39:35 m

Re: [Lustre-discuss] Future of lustre 1.8.3+

2010-05-26 Thread Ramiro Alba Queipo
Hi Guy,

On Wed, 2010-05-26 at 14:59 +0100, Guy Coates wrote:
> >>   
> >>> The SLES11 kernel is at 2.6.27 so it could be usable for this. Also, I 
> >>> 
> >> Ok, I am getting
> >> http://downloads.lustre.org/public/kernels/sles11/linux-2.6.27.39-0.3.1.tar.bz2
> >>
> >> but, please. Where can I get a suitable config file to apply both for
> >> servers and clients?
> 
> One thing to watch out for in your kernel configs is to make sure that:
> 
> CONFIG_SECURITY_FILE_CAPABILITIES=N

OK. But the question is whether this issue still applies to lustre-1.8.3 and
the SLES kernel linux-2.6.27.39-0.3.1.tar.bz2. I mean, it is quite surprising
that, if this problem persists, Oracle is offering Lustre packages for
SLES11 with CONFIG_SECURITY_FILE_CAPABILITIES=y.
I am just about to start testing, so I'd like to clarify this.

Cheers

> 
> otherwise you will run into:
> 
> https://bugzilla.lustre.org/show_bug.cgi?id=21439
> 
> (each write call causes 2 getxattr calls, which will pound your MDS into
> the ground).
> 
> The SLES11, debian/lenny and ubuntu kernels all have this feature set,
> so if you are building clients against those kernels, you may be in
> trouble.
> 
> 
> Cheers,
> 
> Guy
> 
> -- 
> Dr. Guy Coates, Informatics System Group
> The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
> Tel: +44 (0)1223 834244 x 6925
> Fax: +44 (0)1223 496802
> 
> 
> -- 
>  The Wellcome Trust Sanger Institute is operated by Genome Research 
>  Limited, a charity registered in England with number 1021457 and a 
>  company registered in England with number 2742969, whose registered 
>  office is 215 Euston Road, London, NW1 2BE. 
> 
-- 
Ramiro Alba

Centre Tecnològic de Tranferència de Calor
http://www.cttc.upc.edu


Escola Tècnica Superior d'Enginyeries
Industrial i Aeronàutica de Terrassa
Colom 11, E-08222, Terrassa, Barcelona, Spain
Tel: (+34) 93 739 86 46


-- 
Aquest missatge ha estat analitzat per MailScanner
a la cerca de virus i d'altres continguts perillosos,
i es considera que està net.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of lustre 1.8.3+

2010-05-26 Thread Guy Coates
On 26/05/10 16:18, Andreas Dilger wrote:
> The problem with SELinux is that it is trying to access the security
> xattr for each file access but Lustre does not cache xattrs on the client.
> 
> The other main question about SELinux is whether it even makes sense in
> a distributed environment.

Just to be clear, SELinux was disabled on these machines (selinux=0
kernel option); simply having the kernel-config set still triggers the
code path/bug.

Cheers,

Guy

-- 
Dr. Guy Coates, Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 x 6925
Fax: +44 (0)1223 496802


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of lustre 1.8.3+

2010-05-26 Thread Andreas Dilger
The problem with SELinux is that it is trying to access the security  
xattr for each file access but Lustre does not cache xattrs on the  
client.

The other main question about SELinux is whether it even makes sense  
in a distributed environment.

For now (see bug) we have just disabled the access to this specific  
attribute in Lustre.  It would be nice if someone with more  
understanding of SELinux would investigate if there is some global  
settings file that could be modified to exclude Lustre from the  
security policy checking, and then we can push this to the upstream  
distros.

Cheers, Andreas

On 2010-05-26, at 8:43, Gregory Matthews   
wrote:

> Guy Coates wrote:
>> One thing to watch out for in your kernel configs is to make sure  
>> that:
>>
>> CONFIG_SECURITY_FILE_CAPABILITIES=N
>
> I hope this is not the case for the now obsolete:
>
> CONFIG_EXT3_FS_SECURITY=y
>
> which appears to be enabled by default on RHEL5.x
>
> It's not entirely clear to me what this is for, but would metadata
> performance be better without it?
>
> GREG
>
>>
>> otherwise you will run into:
>>
>> https://bugzilla.lustre.org/show_bug.cgi?id=21439
>>
>> (each write call causes 2 getxattr calls, which will pound your MDS  
>> into
>> the ground).
>>
>> The SLES11, debian/lenny and ubuntu kernels all have this feature  
>> set,
>> so if you are building clients against those kernels, you may be in
>> trouble.
>>
>>
>> Cheers,
>>
>> Guy
>>
>
>
> -- 
> Greg Matthews01235 778658
> Senior Computer Systems Administrator
> Diamond Light Source, Oxfordshire, UK
> ___
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of lustre 1.8.3+

2010-05-26 Thread Gregory Matthews
Guy Coates wrote:
> One thing to watch out for in your kernel configs is to make sure that:
> 
> CONFIG_SECURITY_FILE_CAPABILITIES=N

I hope this is not the case for the now obsolete:

CONFIG_EXT3_FS_SECURITY=y

which appears to be enabled by default on RHEL5.x

It's not entirely clear to me what this is for, but would metadata 
performance be better without it?

GREG

> 
> otherwise you will run into:
> 
> https://bugzilla.lustre.org/show_bug.cgi?id=21439
> 
> (each write call causes 2 getxattr calls, which will pound your MDS into
> the ground).
> 
> The SLES11, debian/lenny and ubuntu kernels all have this feature set,
> so if you are building clients against those kernels, you may be in
> trouble.
> 
> 
> Cheers,
> 
> Guy
> 


-- 
Greg Matthews01235 778658
Senior Computer Systems Administrator
Diamond Light Source, Oxfordshire, UK
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Future of lustre 1.8.3+

2010-05-26 Thread Guy Coates
On 21/05/10 10:08, Christopher Huhn wrote:
> Hi Ramiro,
> 
> Ramiro Alba Queipo wrote:
>> On Thu, 2010-05-20 at 10:16 -0600, Andreas Dilger wrote:
>>   
>>> The SLES11 kernel is at 2.6.27 so it could be usable for this. Also, I 
>>> 
>> Ok, I am getting
>> http://downloads.lustre.org/public/kernels/sles11/linux-2.6.27.39-0.3.1.tar.bz2
>>
>> but, please. Where can I get a suitable config file to apply both for
>> servers and clients?

One thing to watch out for in your kernel configs is to make sure that:

CONFIG_SECURITY_FILE_CAPABILITIES=N

otherwise you will run into:

https://bugzilla.lustre.org/show_bug.cgi?id=21439

(each write call causes 2 getxattr calls, which will pound your MDS into
the ground).

The SLES11, debian/lenny and ubuntu kernels all have this feature set,
so if you are building clients against those kernels, you may be in
trouble.
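
A quick way to check whether a given kernel has the option set, as a sketch
(standard config locations assumed):

grep CONFIG_SECURITY_FILE_CAPABILITIES /boot/config-$(uname -r)
zcat /proc/config.gz | grep CONFIG_SECURITY_FILE_CAPABILITIES   # if the running kernel exposes its config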


Cheers,

Guy

-- 
Dr. Guy Coates, Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 x 6925
Fax: +44 (0)1223 496802


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] What would happen if

2010-05-26 Thread Brian J. Murrell
On Wed, 2010-05-26 at 09:26 -0400, Scott wrote: 
> Hi,

Hi,

You don't provide any particulars (lustre version, lfs df output, etc.)
so I will answer generically...

> I am not using striping.  Question, when creating a new file does Lustre try 
> to write to the OST 
> with the most free space?

That depends on your lustre version and the values
in /proc/fs/lustre/lov/client-clilov-f385fc00/qos_*.  Please see the
manual for details on those files and how they affect object allocation,
if you are running a relevant version of Lustre.
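
A sketch of looking at those tunables on a client (1.8-era /proc paths
assumed; the lov device name will differ on your system):

cat /proc/fs/lustre/lov/*-clilov-*/qos_prio_free      # how strongly free space is weighted
cat /proc/fs/lustre/lov/*-clilov-*/qos_threshold_rr   # imbalance required before leaving round-robin allocation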

> Second question, if for whatever reason it picks the OST with 200Gb free and 
> i was writing a file 
> that would be 500Gb at finish, what would happen when that particular OST ran 
> out of space?

The client gets an ENOSPC.

> Would 
> it leave the file as incomplete and show no free space?

If the client application doesn't do any cleanup on getting an error,
yes.

b.



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] What would happen if

2010-05-26 Thread Scott
Hi,

My Lustre system has 7 OSTs on 3 OSSs. 2 of the OSTs have less than 200 GB 
free while the other 5 have 3+ TB free.

I am not using striping. Question: when creating a new file, does Lustre try to 
write to the OST with the most free space? If not, how does it choose which OST 
to write to?

Second question: if for whatever reason it picks the OST with 200 GB free and I 
was writing a file that would be 500 GB at finish, what would happen when that 
particular OST ran out of space? Would it leave the file as incomplete and show 
no free space?

Thanks, trying to debug an odd problem.
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] 1.8.1.1 -> 1.8.3 upgrade questions

2010-05-26 Thread Gustavsson, Mathias
Nope, wish I had... "maximal mount count reached"

Thanks,

Mathias Gustavsson
Linux System Manager
AstraZeneca R&D Mölndal
SE-431 83 Mölndal, Sweden

Phone:  +46 31 776 12 58
mathias.gustavs...@astrazeneca.com




--
Confidentiality Notice: This message is private and may contain confidential 
and proprietary information. If you have received this message in error, please 
notify us and remove it from your system and note that you must not copy, 
distribute or take any action in reliance on it. Any unauthorized use or 
disclosure of the contents of this message is not permitted and may be unlawful.
 
-Original Message-
From: turek.wojci...@googlemail.com on behalf of Wojciech Turek
Sent: Tue 5/25/2010 11:33 PM
To: Gustavsson, Mathias
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] 1.8.1.1 -> 1.8.3 upgrade questions
 
Those look familiar. Have you run fsck on the OSTs and MDTs before upgrade?

Best regards,

Wojciech

On 25 May 2010 16:01, Gustavsson, Mathias wrote:
> Hi,
>
> We tried to do a 1.8.1.1 to 1.8.3 version upgrade this weekend, but we got 
> i/o error on all of our old file systems (created ~4 years ago), we have a 
> more recently created filesystem (only a couple of months old) and that was 
> fine.
> This was in the log of the active mds, i couldn't find anything related in 
> the log of the client at the time.
>
> May 22 14:16:28 semldxlucky kernel: LustreError: 
> 2664:0:(mds_open.c:441:mds_create_objects()) error creating objects for inode 
> 841541: rc = -5
> May 22 14:16:28 semldxlucky kernel: LustreError: 
> 2664:0:(mds_open.c:826:mds_finish_open()) mds_create_objects: rc = -5
> May 22 14:16:38 semldxlucky kernel: LustreError: 
> 2648:0:(mds_open.c:441:mds_create_objects()) error creating objects for inode 
> 841541: rc = -5
> May 22 14:16:38 semldxlucky kernel: LustreError: 
> 2648:0:(mds_open.c:441:mds_create_objects()) Skipped 1 previous similar 
> message
> May 22 14:16:38 semldxlucky kernel: LustreError: 
> 2648:0:(mds_open.c:826:mds_finish_open()) mds_create_objects: rc = -5
> May 22 14:16:38 semldxlucky kernel: LustreError: 
> 2648:0:(mds_open.c:826:mds_finish_open()) Skipped 1 previous similar message
>
> MDS + OSS's version : CentOS 5.4 and lustre version 1.8.1.1
> Clients version : CentOS 4.8 and lustre version 1.6.7.2
>
> Mathias Gustavsson
> Linux System Manager
> AstraZeneca R&D Mölndal
> SE-431 83 Mölndal, Sweden
>
> Phone:  +46 31 776 12 58
> mathias.gustavs...@astrazeneca.com
>
>
> --
> Confidentiality Notice: This message is private and may contain confidential 
> and proprietary information. If you have received this message in error, 
> please notify us and remove it from your system and note that you must not 
> copy, distribute or take any action in reliance on it. Any unauthorized use 
> or disclosure of the contents of this message is not permitted and may be 
> unlawful.
>
> ___
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>



-- 
--
Wojciech Turek

Assistant System Manager

High Performance Computing Service
University of Cambridge
Email: wj...@cam.ac.uk
Tel: (+)44 1223 763517

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] 1.8.1.1 -> 1.8.3 upgrade questions

2010-05-26 Thread Gustavsson, Mathias
These lines only appear when we are on version 1.8.3:

May 22 13:19:12 semldxludwig kernel: LustreError: 
12412:0:(lib-move.c:2441:LNetPut()) Error sending PUT to 12345-10.0.0@tcp: 
-113
May 22 13:19:51 semldxluis kernel: Lustre: 
13460:0:(fsfilt-ldiskfs.c:1385:fsfilt_ldiskfs_setup()) filesystem doesn't have 
dir_index feature enabled

And a couple of "maximal mount count reached", "file extents" and "mballoc 
enabled" messages.
And I got a LustreError with a lustre-log.1274525880.5458 in /tmp.

I can't really sort out what is relevant information in the OSS logs (apart 
from the obvious client recovery traffic); give me a hint of what I should look 
for and I'll paste it in here.

Thanks, 


Mathias Gustavsson
Linux System Manager
AstraZeneca R&D Mölndal
SE-431 83 Mölndal, Sweden

Phone:  +46 31 776 12 58
mathias.gustavs...@astrazeneca.com




--
Confidentiality Notice: This message is private and may contain confidential 
and proprietary information. If you have received this message in error, please 
notify us and remove it from your system and note that you must not copy, 
distribute or take any action in reliance on it. Any unauthorized use or 
disclosure of the contents of this message is not permitted and may be unlawful.
 
-Original Message-
From: Andreas Dilger [mailto:andreas.dil...@oracle.com]
Sent: Tue 5/25/2010 11:04 PM
To: Gustavsson, Mathias
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] 1.8.1.1 -> 1.8.3 upgrade questions
 
On 2010-05-25, at 09:01, Gustavsson, Mathias wrote:
> We tried to do a 1.8.1.1 to 1.8.3 version upgrade this weekend, but we got 
> i/o error on all of our old file systems (created ~4 years ago), we have a 
> more recently created filesystem (only a couple of months old) and that was 
> fine.
> This was in the log of the active mds, i couldn't find anything related in 
> the log of the client at the time.
> 
> May 22 14:16:28 semldxlucky kernel: LustreError: 
> 2664:0:(mds_open.c:441:mds_create_objects()) error creating objects for inode 
> 841541: rc = -5
> May 22 14:16:28 semldxlucky kernel: LustreError: 
> 2664:0:(mds_open.c:826:mds_finish_open()) mds_create_objects: rc = -5
> May 22 14:16:38 semldxlucky kernel: LustreError: 
> 2648:0:(mds_open.c:441:mds_create_objects()) error creating objects for inode 
> 841541: rc = -5
> May 22 14:16:38 semldxlucky kernel: LustreError: 
> 2648:0:(mds_open.c:441:mds_create_objects()) Skipped 1 previous similar 
> message
> May 22 14:16:38 semldxlucky kernel: LustreError: 
> 2648:0:(mds_open.c:826:mds_finish_open()) mds_create_objects: rc = -5
> May 22 14:16:38 semldxlucky kernel: LustreError: 
> 2648:0:(mds_open.c:826:mds_finish_open()) Skipped 1 previous similar message
> 
> MDS + OSS's version : CentOS 5.4 and lustre version 1.8.1.1 
> Clients version : CentOS 4.8 and lustre version 1.6.7.2

This is really a problem between the MDS and the OSS.  Is there anything in the 
OSS logs?

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] How to eliminate an OST from an OSS server

2010-05-26 Thread Brian J. Murrell
On Wed, 2010-05-26 at 09:43 +0300, Mohamed Adel wrote: 
> Dear,
> 
> I have a pre-installed lustre file-system with two MDSs and two OSSs. Each 
> OSS has 6 arrays of disks. In OSS01, one of the arrays is not working 
> properly due to a physical error in one of the disks of the array. I wanted 
> to eliminate this array from OSS01. Simply I restarted MDSs and the OSSs and 
> didn't mount that array. Then I mounted the file-system on the clients. The 
> clients didn't produce any error after mounting but the file-system is not 
> accessible; i.e. I can't even issue an "ls" command to view the contents of 
> the file-system. I need to eliminate that corrupted array to be able to use 
> the file-system but don't know how.

You likely want to deactivate the OST on the MDT and possibly on the
clients also, depending on your goals.
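
A minimal sketch of that deactivation (the OST name and device number are
placeholders; adjust them to the target backed by the broken array):

mds# lctl dl | grep OST0005                  # find the osc device number for the missing OST
mds# lctl --device N deactivate              # stop the MDS creating/using objects on it
# and, if needed, on each client (not persistent across remounts):
client# lctl set_param osc.<fsname>-OST0005-osc*.active=0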

> Could anyone help me with that or just guide me what should I read to learn 
> how to fix this issue.

Have you checked the operations manual at
http://wiki.lustre.org/index.php/Lustre_Documentation?

Maybe you did but you just didn't realize "deactivate" was the operation
you were looking to perform.

Cheers,
b.



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss