[Lustre-discuss] Multiple IB ports

2011-03-20 Thread Brian O'Connor
Hi,

Is anybody actually using multiple IB ports on a client as an aggregated
connection?

I.e. many OSSs with one QDR IB port each, and clients with four QDR IB ports.
Assuming the normal issues with bus bandwidth etc., what sort of performance
can I expect?

 

QDR ~ 3-4 GB/s

 

I'm trying to size a cluster and clients to get ~10 GB/s on *one* client node.

 

If I can aggregate IB linearly, the next step will be to try and figure out
how to get 10 GB/s to local storage :-(
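
(For the arithmetic: four QDR ports at roughly 3 GB/s each is ~12 GB/s of raw
link bandwidth, so ~10 GB/s from one client needs close-to-linear aggregation.
Below is a minimal sketch of one LNET layout that aims at this: one LNET
network per client port, with the single-port servers spread across those
networks, so a widely striped file is read over all four ports at once.
Interface names and network numbers are examples only; I have not benchmarked
this at that scale.)

    # Example only: /etc/modprobe.conf on the 4-port client.
    # One LNET network per IB port; traffic to servers on different
    # networks then flows over different ports.
    options lnet networks="o2ib0(ib0),o2ib1(ib1),o2ib2(ib2),o2ib3(ib3)"

    # Example only: each single-port OSS sits on one of those networks,
    # e.g. a quarter of the OSSs on o2ib2:
    options lnet networks="o2ib2(ib0)"

Whether the aggregation is actually linear then depends on striping files
across OSSs on all four networks, plus the client's PCIe/memory bandwidth and
checksum overhead.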

 

 

Sometimes customers are crazy...

Brian O'Connor
-
SGI Consulting
Email: bri...@sgi.com, Mobile +61 417 746 452
Phone: +61 3 9963 1900, Fax: +61 3 9963 1902
357 Camberwell Road, Camberwell, Victoria, 3124
AUSTRALIA http://www.sgi.com/support/services
-
 



Re: [Lustre-discuss] persistent client re-connect failure

2011-03-20 Thread Samuel Aparicio
Follow-up to this posting: I notice on the client that lctl device_list
reports the following:


  0 UP mgc MGC10.9.89.51@tcp 5a76b5b6-82bf-2053-8c17-e68ffe552edc 5
  1 UP lov lustre-clilov-8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 4
  2 UP mdc lustre-MDT-mdc-8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 5
  3 UP osc lustre-OST-osc-8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 5
  4 UP osc lustre-OST0001-osc-8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 5
  5 UP osc lustre-OST0002-osc-8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 5
  6 UP osc lustre-OST0003-osc-8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 4
  7 UP osc lustre-OST0004-osc-8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 5
  8 UP osc lustre-OST0005-osc-8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 5
  9 UP osc lustre-OST0006-osc-8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 5
 10 UP lov lustre-clilov-810c92f2b800 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 4
 11 UP mdc lustre-MDT-mdc-810c92f2b800 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
 12 UP osc lustre-OST-osc-810c92f2b800 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
 13 UP osc lustre-OST0001-osc-810c92f2b800 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
 14 UP osc lustre-OST0002-osc-810c92f2b800 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
 15 UP osc lustre-OST0003-osc-810c92f2b800 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 4
 16 UP osc lustre-OST0004-osc-810c92f2b800 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
 17 UP osc lustre-OST0005-osc-810c92f2b800 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
 18 UP osc lustre-OST0006-osc-810c92f2b800 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
 19 UP lov lustre-clilov-81047a45c000 6a3d5815-4851-31b0-9400-c8892e11dae4 4
 20 UP mdc lustre-MDT-mdc-81047a45c000 6a3d5815-4851-31b0-9400-c8892e11dae4 5
 21 UP osc lustre-OST-osc-81047a45c000 6a3d5815-4851-31b0-9400-c8892e11dae4 5
 22 UP osc lustre-OST0001-osc-81047a45c000 6a3d5815-4851-31b0-9400-c8892e11dae4 5
 23 UP osc lustre-OST0002-osc-81047a45c000 6a3d5815-4851-31b0-9400-c8892e11dae4 5
 24 UP osc lustre-OST0003-osc-81047a45c000 6a3d5815-4851-31b0-9400-c8892e11dae4 4
 25 UP osc lustre-OST0004-osc-81047a45c000 6a3d5815-4851-31b0-9400-c8892e11dae4 5
 26 UP osc lustre-OST0005-osc-81047a45c000 6a3d5815-4851-31b0-9400-c8892e11dae4 5
 27 UP osc lustre-OST0006-osc-81047a45c000 6a3d5815-4851-31b0-9400-c8892e11dae4 5


However, OST0003 is non-existent; it was deactivated on the MDS. Why would the
clients still think it exists?
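
(A minimal sketch of what I would check next, assuming the device numbers
above; the parameter names are the usual 1.8-era ones and may differ on other
builds. As far as I know, deactivating an OST with conf_param on the MDS marks
the clients' imports inactive but does not remove the OSC devices from the
client configuration, so they keep showing up in lctl device_list; the
refcount of 4 on the OST0003 OSC entries, versus 5 on the other OSCs, may
simply reflect the missing connection.)

    # 0 here means this client already treats the import as
    # administratively disabled
    lctl get_param osc.lustre-OST0003-osc-*.active

    # If a client still reports 1 and should not use OST0003,
    # disable it locally on that client
    lctl set_param osc.lustre-OST0003-osc-*.active=0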

Professor Samuel Aparicio BM BCh PhD FRCPath
Nan and Lorraine Robertson Chair UBC/BC Cancer Agency
675 West 10th, Vancouver V5Z 1L3, Canada.
office: +1 604 675 8200 lab website http://molonc.bccrc.ca

PLEASE SUPPORT MY FUNDRAISING FOR THE RIDE TO SEATTLE AND 
THE WEEKEND TO END WOMENS CANCERS. YOU CAN DONATE AT THE LINKS BELOW
Ride to Seattle Fundraiser
Weekend to End Womens Cancers




On Mar 20, 2011, at 8:41 PM, Samuel Aparicio wrote:

> I am stuck with the following issue on a client attached to a lustre system.
> we are running lustre 1.8.5
> somehow connectivity to the OST failed at some point and the mount hung.
> after unmounting and re-mounting the client attempts to reconnect.
> lctl ping shows the client to be connected and normal ping to the OSS/MGS 
> servers shows connectivity.
> 
> remounting the filesystem results in only some files being visible.
> the kernel messages are as follows:
> -
> Lustre: setting import lustre-OST0003_UUID INACTIVE by administrator request
> Lustre: lustre-OST0003-osc-8110238c7400.osc: set parameter active=0
> Lustre: Skipped 3 previous similar messages
> LustreError: 14114:0:(lov_obd.c:315:lov_connect_obd()) not connecting OSC ^\; 
> administratively disabled
> Lustre: Client lustre-client has started
> LustreError: 14207:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc 
> -5, returning -EIO
> LustreError: 14207:0:(file.c:995:ll_glimpse_size()) Skipped 1 previous 
> similar message
> LustreError: 14207:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc 
> -5, returning -EIO
> LustreError: 14686:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc 
> -5, returning -EIO
> Lustre: 22218:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request 
> x1363662012007464 sent from lustre-OST-osc-8110238c7400 to NID 
> 10.9.89.21@tcp 16s ago has timed out (16s prior to deadline).
>   req@810459ce4c00 x1363662012007464/t0 
> o8->lustre-OST_UUID@10.9.89.21@tcp:28/4 lens 368/584 e 0 to 1 dl 
> 1300678232 ref 1 fl Rpc:N/0/0 rc 0/0
> Lustre: 22218:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 182 
> previous similar messages
> Lustre: 22219:0:(import.c:517:import_select_connection()) 
> lustre-OST-osc-8110238c7400: tried all connections, increasing 
> latency to 18s
> Lustre: 22219:0:(import.c:517:import_select_connection()) Skipped 203 
> previous sim

[Lustre-discuss] persistent client re-connect failure

2011-03-20 Thread Samuel Aparicio
I am stuck with the following issue on a client attached to a Lustre system.
We are running Lustre 1.8.5.
Somehow connectivity to the OST failed at some point and the mount hung.
After unmounting and re-mounting, the client attempts to reconnect.
lctl ping shows the client to be connected, and a normal ping to the OSS/MGS
servers shows connectivity.

Remounting the filesystem results in only some files being visible.
The kernel messages are as follows:
-
Lustre: setting import lustre-OST0003_UUID INACTIVE by administrator request
Lustre: lustre-OST0003-osc-8110238c7400.osc: set parameter active=0
Lustre: Skipped 3 previous similar messages
LustreError: 14114:0:(lov_obd.c:315:lov_connect_obd()) not connecting OSC ^\; 
administratively disabled
Lustre: Client lustre-client has started
LustreError: 14207:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc -5, 
returning -EIO
LustreError: 14207:0:(file.c:995:ll_glimpse_size()) Skipped 1 previous similar 
message
LustreError: 14207:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc -5, 
returning -EIO
LustreError: 14686:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc -5, 
returning -EIO
Lustre: 22218:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request 
x1363662012007464 sent from lustre-OST-osc-8110238c7400 to NID 
10.9.89.21@tcp 16s ago has timed out (16s prior to deadline).
  req@810459ce4c00 x1363662012007464/t0 
o8->lustre-OST_UUID@10.9.89.21@tcp:28/4 lens 368/584 e 0 to 1 dl 1300678232 
ref 1 fl Rpc:N/0/0 rc 0/0
Lustre: 22218:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 182 
previous similar messages
Lustre: 22219:0:(import.c:517:import_select_connection()) 
lustre-OST-osc-8110238c7400: tried all connections, increasing latency 
to 18s
Lustre: 22219:0:(import.c:517:import_select_connection()) Skipped 203 previous 
similar messages


An ls of the filesystem shows:

drwxr-xr-x 4 amcpherson users 4096 Mar 19 10:38 amcpherson
?- ? ?  ??? compute-2-0-testwrite
?- ? ?  ??? hello

--

Other clients on the system are able to mount and see the files perfectly well.

Can anyone help with what the errors above imply?

A simple network connectivity issue does not seem to be the case here,
yet the client's attempts to re-connect to the OST fail.
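
(One way to confirm the pattern, in case it helps: the entries showing up as
'?' should be exactly the files with objects on the OST this client cannot
reach. lfs getstripe asks the MDT for the layout, so it should still work for
those files; the path below is only an example.)

    # Show which OST index (obdidx) holds this file's objects -- example path
    lfs getstripe /lustre/amcpherson/compute-2-0-testwrite

    # Files whose stripes include the unreachable OST fail the size glimpse
    # (the obd_enqueue / -EIO messages above); files striped only on healthy
    # OSTs remain fully visible.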


Professor Samuel Aparicio BM BCh PhD FRCPath
Nan and Lorraine Robertson Chair UBC/BC Cancer Agency
675 West 10th, Vancouver V5Z 1L3, Canada.
office: +1 604 675 8200 lab website http://molonc.bccrc.ca

PLEASE SUPPORT MY FUNDRAISING FOR THE RIDE TO SEATTLE AND 
THE WEEKEND TO END WOMENS CANCERS. YOU CAN DONATE AT THE LINKS BELOW
Ride to Seattle Fundraiser
Weekend to End Womens Cancers






Re: [Lustre-discuss] lfs check servers and lbug problem

2011-03-20 Thread Colin Faber
Hi,

Exact error messages are useful in this case.

-cf
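
(A minimal sketch of how those are usually captured on the affected client, in
case it helps; the output paths are just examples.)

    # Console / LBUG messages from the kernel ring buffer
    dmesg > /tmp/dmesg-lbug.txt

    # Lustre's internal debug buffer, dumped to a file for the report
    lctl dk /tmp/lustre-debug.log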


On 03/20/2011 11:16 AM, Lex wrote:
> Hi there
>
> I have a problem with my lustre system,
>
> After using: *
>
> lctl --device 18 conf_param lustre-OST000d.osc.active=0
>
> *on MDS to permanently deactivate OST, i have a trouble with lfs and 
> lbug on lustre client. In a newly mount lustre client, and when i used 
> lfs to check:
>
> *lfs check servers
>
> *lfs was hang and lbug kept dumping on my lustre client.
> I still could access my data in lustre client, but when i executed 
> "*df -h*" command, it was hang with "*kernel panic - not syncing"* 
> report, it's really bad, i had to restart my server manually.
>
> Although at the same time, every other clients ( mounted client ) 
> worked well, but i still really worry about my data. I don't know 
> whether this bug is harmful with my data or not.
>
> This problem is exactly like this bug 
>  with lfs ( i found 
> it by googling around ) .
>
> Although the status of this bug is "resolved" but i'm currently being 
> in trouble with it, with all the same symptoms.
>
> I also found this post 
>  
> from Andreas that mentioned a patch of this bug but i couldn't find 
> anything.
>
> I'm using Lustre 1.8 and CentOS 5.5
>
> Could anyone give me some advice in this situation ? Is there a way to 
> fix this bug ?
>
> Many thanks
>
>
>


[Lustre-discuss] MDS activity monitoring

2011-03-20 Thread Andreas Davour
Hi

I've decided I want to keep an eye on the activity on my MDS. Poking about
in /proc I found both /proc/fs/lustre/mds/-MDT/stats and
/proc/fs/lustre/mdt/MDS/mds/stats, which both contain stats about metadata
operations; while similar, they are slightly different.

Now, I'm not really sure how they differ, or why there are two
/proc/fs/lustre/md* directories with stats like that.

Can somebody who knows something about Lustre internals shed some light on
the layout of the /proc files, and where best to look to get a grip on
what's going on in the file system?
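
(For reference, a minimal sketch of sampling one of those stats files for
per-second op rates; the glob below is an assumption based on the 1.8-era
path above and needs the real fsname/MDT name.)

    #!/bin/sh
    # Rough sketch: print per-second metadata-op rates from the MDS stats file.
    STATS=/proc/fs/lustre/mds/*-MDT*/stats
    INTERVAL=10

    cat $STATS > /tmp/mds_stats.before
    sleep $INTERVAL
    cat $STATS > /tmp/mds_stats.after

    # Counter lines look like "<opname> <count> samples [<unit>] ...".
    # Print every counter that moved during the interval.
    awk -v t=$INTERVAL 'NR==FNR { before[$1] = $2; next }
        $2 ~ /^[0-9]+$/ && $2 > before[$1] {
            printf "%-24s %10.1f /s\n", $1, ($2 - before[$1]) / t
        }' /tmp/mds_stats.before /tmp/mds_stats.after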

/andreas
-- 
Systems Engineer
PDC Center for High Performance Computing
CSC School of Computer Science and Communication
KTH Royal Institute of Technology
SE-100 44 Stockholm, Sweden
Phone: 087906658
"A satellite, an earring, and a dust bunny are what made America great!"


[Lustre-discuss] lfs check servers and lbug problem

2011-03-20 Thread Lex
Hi there

I have a problem with my Lustre system.

After using:

lctl --device 18 conf_param lustre-OST000d.osc.active=0

on the MDS to permanently deactivate an OST, I ran into trouble with lfs and
an LBUG on a Lustre client. On a newly mounted client, when I used lfs to
check:

lfs check servers

lfs hung and LBUGs kept dumping on the client.
I could still access my data from that client, but when I executed "df -h"
it hung with a "kernel panic - not syncing" report, which is really bad; I
had to restart the server manually.

At the same time every other (mounted) client worked well, but I am still
really worried about my data. I don't know whether this bug is harmful to my
data or not.

This problem looks exactly like this bug with lfs (I found it by googling
around).

Although the status of that bug is "resolved", I am currently hitting it,
with all the same symptoms.

I also found this post from Andreas that mentioned a patch for this bug, but
I couldn't find anything.

I'm using Lustre 1.8 and CentOS 5.5.

Could anyone give me some advice in this situation? Is there a way to fix
this bug?
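
(A minimal sketch of a client-side check that may help narrow this down;
parameter names are the usual 1.8 ones and may differ on your build.)

    # Find the OSC device for the deactivated OST on this client
    lctl dl | grep OST000d

    # 0 means this client already treats the import as administratively disabled
    lctl get_param osc.lustre-OST000d-osc-*.active

    # If it still shows 1, disable it locally before poking at the servers
    lctl set_param osc.lustre-OST000d-osc-*.active=0

    # lfs df is a lighter-weight check than "lfs check servers"; a properly
    # deactivated OST should show up as inactive rather than being queried
    lfs df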

Many thanks