[lustre-discuss] (no subject)

2022-05-17 Thread Alastair Basden via lustre-discuss

Hi all,

We had a problem with one of our MDSes (ldiskfs) on Lustre 2.12.6, which we 
think is a bug, but we haven't been able to identify it.  Can anyone shed 
any light?  We unmounted and remounted the MDT at around 23:00.


Client logs:
May 16 22:15:41 m8011 kernel: LustreError: 11-0: 
lustrefs8-MDT-mdc-956fb73c3800: operation ldlm_enqueue to node 
172.18.185.1@o2ib failed: rc = -107
May 16 22:15:41 m8011 kernel: Lustre: lustrefs8-MDT-mdc-956fb73c3800: 
Connection to lustrefs8-MDT (at 172.18.185.1@o2ib) was lost; in progress 
operations using this service will wait for recovery to complete
May 16 22:15:41 m8011 kernel: LustreError: Skipped 5 previous similar messages
May 16 22:15:48 m8011 kernel: Lustre: 
101710:0:(client.c:2146:ptlrpc_expire_one_request()) @@@ Request sent has timed 
out for slow reply: [sent 1652735641/real 1652735641]  req@949d8cb1de80 
x1724290358528896/t0(0) 
o101->lustrefs8-MDT-mdc-956fb73c3800@172.18.185.1@o2ib:12/10 lens 
480/568 e 4 to 1 dl 1652735748 ref 2 fl Rpc:X/0/ rc 0/-1
May 16 22:15:48 m8011 kernel: Lustre: 
101710:0:(client.c:2146:ptlrpc_expire_one_request()) Skipped 6 previous similar 
messages
May 16 23:00:15 m8011 kernel: Lustre: 
4784:0:(client.c:2146:ptlrpc_expire_one_request()) @@@ Request sent has timed out 
for slow reply: [sent 1652738408/real 1652738408]  req@94ea07314380 
x1724290358763776/t0(0) o400->MGC172.18.185.1@o2ib@172.18.185.1@o2ib:26/25 lens 
224/224 e 0 to 1 dl 1652738415 ref 1 fl Rpc:XN/0/ rc 0/-1
May 16 23:00:15 m8011 kernel: LustreError: 166-1: MGC172.18.185.1@o2ib: 
Connection to MGS (at 172.18.185.1@o2ib) was lost; in progress operations using 
this service will fail
May 16 23:00:15 m8011 kernel: Lustre: Evicted from MGS (at 
MGC172.18.185.1@o2ib_0) after server handle changed from 0xdb7c7c778c8908d6 to 
0xdb7c7cbad3be9e79
May 16 23:00:15 m8011 kernel: Lustre: MGC172.18.185.1@o2ib: Connection restored 
to MGC172.18.185.1@o2ib_0 (at 172.18.185.1@o2ib)
May 16 23:01:49 m8011 kernel: LustreError: 167-0: 
lustrefs8-MDT-mdc-956fb73c3800: This client was evicted by 
lustrefs8-MDT; in progress operations using this service will fail.
May 16 23:01:49 m8011 kernel: LustreError: 
101719:0:(vvp_io.c:1562:vvp_io_init()) lustrefs8: refresh file layout 
[0x28107:0x9b08:0x0] error -108.
May 16 23:01:49 m8011 kernel: LustreError: 
101719:0:(vvp_io.c:1562:vvp_io_init()) Skipped 3 previous similar messages
May 16 23:01:49 m8011 kernel: Lustre: lustrefs8-MDT-mdc-956fb73c3800: 
Connection restored to 172.18.185.1@o2ib (at 172.18.185.1@o2ib)



MDS server logs:
May 16 22:15:40 c8mds1 kernel: LustreError: 
10686:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired 
after 99s: evicting client at 172.18.181.11@o2ib  ns: 
mdt-lustrefs8-MDT_UUID lock: 97b3730d98c0/0xdb7c7cbad3be1c7b lrc: 3/0,0 
mode: PW/PW res: [0x29119:0x327f:0x0].0x0 bits 0x40/0x0 rrc: 201 type: IBT 
flags: 0x6020040020 nid: 172.18.181.11@o2ib remote: 0xe62e31610edfb808 
expref: 90 pid: 10707 timeout: 8482830 lvb_type: 0
May 16 22:15:40 c8mds1 kernel: LustreError: 
10712:0:(ldlm_lockd.c:1351:ldlm_handle_enqueue0()) ### lock on destroyed export 
9769eaf46c00 ns: mdt-lustrefs8-MDT_UUID lock: 
97d828635e80/0xdb7c7cbad3be1c90 lrc: 3/0,0 mode: PW/PW res: 
[0x29119:0x327f:0x0].0x0 bits 0x40/0x0 rrc: 199 type: IBT flags: 
0x5020040020 nid: 172.18.181.11@o2ib remote: 0xe62e31610edfb80f expref: 77 
pid: 10712 timeout: 0 lvb_type: 0
May 16 22:15:40 c8mds1 kernel: LustreError: 
10712:0:(ldlm_lockd.c:1351:ldlm_handle_enqueue0()) Skipped 27 previous similar 
messages
May 16 22:17:22 c8mds1 kernel: LNet: Service thread pid 10783 was inactive for 
200.73s. The thread might be hung, or it might only be slow and will resume 
later. Dumping the stack trace for debugging purposes:
May 16 22:17:22 c8mds1 kernel: LNet: Skipped 3 previous similar messages
May 16 22:17:22 c8mds1 kernel: Pid: 10783, comm: mdt01_040 
3.10.0-1160.2.1.el7_lustre.x86_64 #1 SMP Wed Dec 9 20:53:35 UTC 2020
May 16 22:17:22 c8mds1 kernel: Call Trace:
May 16 22:17:22 c8mds1 kernel: [] 
ldlm_completion_ast+0x430/0x860 [ptlrpc]
May 16 22:17:22 c8mds1 kernel: [] 
ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
May 16 22:17:22 c8mds1 kernel: [] 
mdt_object_local_lock+0x50b/0xb20 [mdt]
May 16 22:17:22 c8mds1 kernel: [] 
mdt_object_lock_internal+0x70/0x360 [mdt]
May 16 22:17:22 c8mds1 kernel: [] mdt_object_lock+0x20/0x30 
[mdt]
May 16 22:17:22 c8mds1 kernel: [] mdt_brw_enqueue+0x44b/0x760 
[mdt]
May 16 22:17:22 c8mds1 kernel: [] mdt_intent_brw+0x1f/0x30 
[mdt]
May 16 22:17:22 c8mds1 kernel: [] 
mdt_intent_policy+0x435/0xd80 [mdt]
May 16 22:17:22 c8mds1 kernel: [] 
ldlm_lock_enqueue+0x376/0x9b0 [ptlrpc]
May 16 22:17:22 c8mds1 kernel: [] 
ldlm_handle_enqueue0+0xa86/0x1620 [ptlrpc]
May 16 22:17:22 c8mds1 kernel: [] tgt_enqueue+0x62/0x210 
[ptlrpc]
May 16 22:17:22 c8mds1 kernel: [] 
tgt_request_handle+0xada/0x1570 [ptlrpc]
May 16 

Re: [lustre-discuss] (no subject)

2022-01-12 Thread Peter Jones via lustre-discuss
Hi Koos

Thanks for bringing this up. This was just due to the person who usually copies 
the files across being on vacation at the time of the release. It’s there now.

Peter

From: lustre-discuss  on behalf of 
"Meijering, Koos via lustre-discuss" 
Reply-To: "Meijering, Koos" 
Date: Wednesday, January 12, 2022 at 1:38 AM
To: lustre-discuss 
Subject: [lustre-discuss] (no subject)

Hello,

On December 17, 2021, Lustre 2.12.8 was released. In the past there was also 
a Mellanox InfiniBand build; is there a plan to make a 2.12.8 build for 
InfiniBand, or do we need to set up our own build servers?
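(For anyone who does end up rolling their own, a rough sketch of building client 
RPMs against a Mellanox OFED stack is below. The repository URL is the usual 
Whamcloud one; the release tag name and the MOFED source path are assumptions 
that vary by release.)

  git clone git://git.whamcloud.com/fs/lustre-release.git
  cd lustre-release
  git checkout 2.12.8          # tag name may differ, e.g. v2_12_8
  sh autogen.sh
  ./configure --disable-server \
      --with-o2ib=/usr/src/ofa_kernel/default   # MOFED kernel sources (assumed path)
  make rpms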

Best regards
Koos Meijering

HPC/Certificaten/Lokaal Overleg
Rijksuniversiteit Groningen

h.meijer...@rug.nl<mailto:h.meijer...@rug.nl>
https://www.rug.nl/cit

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] (no subject)

2018-05-21 Thread Yugendra Guvvala
Hi Cory,

Thank you. I see from the link that interoperability support only goes back
to 2.9.x, so I understand that we can't have client modules at version 2.4.x.
I also see the supported kernel versions.

Thanks,
Yugi






On Mon, May 21, 2018 at 12:40 PM, Cory Spitz <spitz...@cray.com> wrote:

> Hello, Yugi.
>
>
>
> Please note that 2.10.3 is the latest LTS release and 2.10.4 is about to
> be released very soon.
>
> RHEL/CentOS 6.9 clients and RHEL/CentOS 7.4 servers are supported in
> 2.10.3.  See: http://wiki.lustre.org/Lustre_2.10.3_Changelog.  2.10.4
> will add support for RHEL/CentOS 7.5 servers.
>
>
>
> -Cory
>
>
>
> --
>
>
>
>
>
> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on
> behalf of Yugendra Guvvala <yguvv...@cambridgecomputer.com>
> Date: Monday, May 21, 2018 at 11:12 AM
> To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
> Subject: [lustre-discuss] (no subject)
>
>
>
> Hi,
>
>
>
> We currently run Lustre 2.4
>
>
>
> We are looking to upgrade to Lustre version 2.10.2. Do we have to upgrade
> the Lustre client modules, or is it backward compatible?
>
>
>
> What are the compatible kernel versions for 2.10.2 on both the server and
> client sides?
>
>
>
> We want to keep the clients' OS at CentOS 6.8 and upgrade the storage
> servers to CentOS 7.4. We are trying to phase this upgrade, servers first
> and then clients, and, given software and driver compatibility, we are
> trying to find the best upgrade path; any suggestions are welcome.
>
>
>
> We could only find OS versions, and not kernel versions, on the support
> matrix here:
>
>
>
> https://wiki.hpdd.intel.com/display/PUB/Lustre+Support+Matrix
>
>
>
> Thanks,
>
> Yugi
>
>
>
>
>



-- 
Thanks,

Yugendra Guvvala | HPC Technologist | Cambridge Computer | "Artists in Data Storage"
Direct: 781-250-3273 | Cell: 806-773-4464 | yguvv...@cambridgecomputer.com | www.cambridgecomputer.com
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] (no subject)

2018-05-21 Thread Cory Spitz
Hello, Yugi.

Please note that 2.10.3 is the latest LTS release and 2.10.4 is about to be 
released very soon.
RHEL/CentOS 6.9 clients and RHEL/CentOS 7.4 servers are supported in 2.10.3.  
See: http://wiki.lustre.org/Lustre_2.10.3_Changelog.  2.10.4 will add support 
for RHEL/CentOS 7.5 servers.

-Cory

--


From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of 
Yugendra Guvvala <yguvv...@cambridgecomputer.com>
Date: Monday, May 21, 2018 at 11:12 AM
To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
Subject: [lustre-discuss] (no subject)

Hi,

We currently run Lustre 2.4

We are looking to upgrade to Lustre version 2.10.2. Do we have to upgrade 
the Lustre client modules, or is it backward compatible?

What are the compatible kernel versions for 2.10.2 on both the server and 
client sides?

We want to keep the clients' OS at CentOS 6.8 and upgrade the storage servers 
to CentOS 7.4. We are trying to phase this upgrade, servers first and then 
clients, and, given software and driver compatibility, we are trying to find 
the best upgrade path; any suggestions are welcome.

We could only find OS versions, and not kernel versions, on the support matrix here:

https://wiki.hpdd.intel.com/display/PUB/Lustre+Support+Matrix

Thanks,
Yugi


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] (no subject)

2018-05-21 Thread Yugendra Guvvala
Hi,

We currently run Lustre 2.4

We are looking to upgrade to Lustre version 2.10.2. Do we have to upgrade
the Lustre client modules, or is it backward compatible?

What are the compatible kernel versions for 2.10.2 on both the server and
client sides?

We want to keep the clients' OS at CentOS 6.8 and upgrade the storage servers
to CentOS 7.4. We are trying to phase this upgrade, servers first and then
clients, and, given software and driver compatibility, we are trying to find
the best upgrade path; any suggestions are welcome.

We could only find OS versions, and not kernel versions, on the support
matrix here:

https://wiki.hpdd.intel.com/display/PUB/Lustre+Support+Matrix

Thanks,
Yugi
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[Lustre-discuss] (no subject)

2012-02-23 Thread zhengfeng
Dear all:

While testing Lustre on the OST, we got the following error:

Feb  6 23:25:09 localhost kernel: LustreError: 
28483:0:(osc_request.c:716:osc_announce_cached()) dirty 33673216  dirty_max 
33554432

I was wondering: are there too many dirty pages in memory, causing this 
problem because the pdflush thread had not written back the dirty pages in 
time?
Could you give me some advice about this issue?  Many Thanks.
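(For reference, the dirty_max in that message, 33554432 bytes = 32 MB, corresponds 
to the client's per-OSC max_dirty_mb tunable. It can be checked with something 
like the following; parameter names are from memory and may differ between 
Lustre versions.)

  # per-OSC dirty cache limit; 32 MB matches the dirty_max 33554432 above
  lctl get_param osc.*.max_dirty_mb
  # it can be raised per OSC if needed; the value below is only an illustration
  lctl set_param osc.*.max_dirty_mb=64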


Best Regards 


feng
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] (no subject)

2010-03-23 Thread tim . lund
test ignore

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] (no subject)

2010-03-23 Thread tim . lund
test ignore

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] (no subject)

2009-11-24 Thread Antonio Concas
You cannot use a NetApp appliance directly.
First of all, you should enable the FC license on the NetApp;
then you need a Fibre Channel switch such as a Cisco MDS or a Brocade;
then you have to connect the Cisco or Brocade switch to the NetApp FC ports;
and last but not least, you need servers with Fibre Channel connections,
directly attached to the FC switch, that mount the NetApp LUNs. That way you
can implement Lustre using the NetApp as one or more OSTs.

cheers,
Antonio

rishi pathak wrote:
 If your NetApp appliance has the ability to export volumes as iSCSI,
 then you have a chance.

 On Sun, Nov 22, 2009 at 12:23 PM, muhammed navas
 navasonl...@gmail.com wrote:

 My company has multiple clusters running on NFS. We would like to
 test the Lustre file system in our cluster. We are using NetApp for
 storage. I went through a lot of Lustre docs; most of them are
 talking about local HD storage (OSTs). May I know how I can
 implement Lustre using NetApp as storage (OSTs)?




 Regards,
 Muhammed navas
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss




 -- 
 Regards--
 Rishi Pathak
 National PARAM Supercomputing Facility
 Center for Development of Advanced Computing(C-DAC)
 Pune University Campus,Ganesh Khind Road
 Pune-Maharastra
 

 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
   

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] (no subject)

2009-11-23 Thread James Beal

I don't think it would be particularly sane to use iSCSI, IMHO.

ONTAP 8.0 does have a clustered mode, which allows a number of
systems to have a single namespace and to load balance between a number
of servers.


From the Announcement they sent round in September.

Data ONTAP 8.0 is the first release of the Data ONTAP 8 release  
family. It is a single codebase, with two separately orderable products:

Data ONTAP 8.0 7-Mode: The next release of Data ONTAP 7G (after 7.3.x)
Data ONTAP 8.0 Cluster-Mode: The next release of Data ONTAP GX (after  
10.0.x)
The RCx classification indicates that NetApp has completed its  
internal testing of the release. RCs are provided primarily to enable  
customers who want to start early on exploring the release for either  
new features or bug fixes, or who want to start testing the release  
before deploying it in critical production environments. NetApp may  
provide multiple RCs as is necessary to address specific issues found  
before the release becomes a General Availability (GA) release. NetApp  
Global Services (NGS) provides support for RCs.




On 23 Nov 2009, at 13:27, rishi pathak wrote:

If your NetApp appliance has the ability to export volumes as
iSCSI, then you have a chance.


On Sun, Nov 22, 2009 at 12:23 PM, muhammed navas navasonl...@gmail.com 
 wrote:
My company has multiple clusters running on NFS. We would like to
test the Lustre file system in our cluster. We are using NetApp for
storage. I went through a lot of Lustre docs; most of them are
talking about local HD storage (OSTs). May I know how I can implement
Lustre using NetApp as storage (OSTs)?






___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] (no subject)

2009-11-21 Thread muhammed navas
My company has multiple clusters running on NFS. We would like to test the
Lustre file system in our cluster. We are using NetApp for storage. I went
through a lot of Lustre docs; most of them are talking about local HD
storage (OSTs). May I know how I can implement Lustre using NetApp as
storage (OSTs)?




Regards,
Muhammed navas
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] (no subject)

2009-11-21 Thread Andreas Dilger
On 2009-11-21, at 23:53, muhammed navas wrote:
 My company has multiple clusters running on NFS. We would like to
 test the Lustre file system in our cluster. We are using NetApp for
 storage. I went through a lot of Lustre docs; most of them are
 talking about local HD storage (OSTs). May I know how I can implement
 Lustre using NetApp as storage (OSTs)?


This isn't really possible.  NetApp servers already export the NFS
protocol, and that isn't something Lustre can use.

The benefit of Lustre is that you can instead use much less expensive  
commodity servers to provide the storage, and Lustre will export it as  
a single filesystem to clients.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] (no subject)

2009-05-11 Thread Hayes, Robert N
While performing a single copy, single client write/read test using dd, we are 
finding that our Nehalem clients running
2.6.18-92.1.10.el5-lustre-1.6.5.1
write about half the speed of our Nehalem clients running
2.6.18-53.1.13.el5_lustre.1.6.4.3 to three different lustre file systems.
This is true even though the slower clients have the same processors and more 
RAM, 18GB for the slow writers and 12GB for the fast writers. Both systems use 
OFED 1.3.1. All benchmarks we use perform better on the slow-write clients and 
read speed from LFS is comparable across all clients.
max_rpcs_in_flight and max_pages_per_rpc are at their defaults on both systems.
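(Those defaults can be confirmed on the clients with something like the following;
on older 1.6.x clients the same knobs live directly under /proc, and the set_param
value is only an illustration, not a recommendation from this thread.)

  lctl get_param osc.*.max_rpcs_in_flight osc.*.max_pages_per_rpc
  cat /proc/fs/lustre/osc/*/max_rpcs_in_flight     # older 1.6.x clients
  lctl set_param osc.*.max_rpcs_in_flight=16       # example tuning only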
They are on the same IB network, with the same QDR cards and IB connectivity 
has been verified with the IB utilities. They are almost identical in bandwidth 
and latency.

We're also using the same modprobe.conf and openibd.conf files on both systems.
We're using 34GB file size on the 12GB and 18GB RAM systems, 137GB file on the 
96GB RAM system. So it's not a matter of caching in RAM.

Are there known issues with our 2.6.18-92.1.10.el5-lustre-1.6.5.1 combination?

This is not a problem with the lustre file system as we get the same type of 
results no matter which of our three lustre systems the test is being written 
to.

Here are the summaries from several runs of ost-survey on our new Lustre 
system. Please comment on the worst/best deltas of the read and write 
operations.
Number of Active OST devices : 96
Worst Read:     38.167753   38.932928   39.006537   39.782153   38.717915
Best Read:      61.704534   61.832461   63.284999   65.000491   61.836016
Read Average:   51.433847   51.281630   51.297278   51.582327   51.318410
Worst Write:    34.311237   49.009757   55.272744   51.532331   51.816523
Best Write:     94.001170   96.033483   93.401792   93.081544   91.030717
Write Average:  74.248683   71.831019   75.179863   74.723100   74.930529

/bob


Bob Hayes
System Administrator
SSG-DRD-DP
Office:  253-371-3040
Cell: 253-441-5482
e-mail: robert.n.ha...@intel.com

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] (no subject)

2009-05-11 Thread Andreas Dilger
On May 11, 2009  13:35 -0700, Hayes, Robert N wrote:
 While performing a single copy, single client write/read test using dd,
 we find that our Nehalem clients running 2.6.18-92.1.10.el5-lustre-1.6.5.1
 write about half the speed of our Nehalem clients running
 2.6.18-53.1.13.el5_lustre.1.6.4.3 to three different lustre file systems.

 This is true even though the slower clients have the same processors and
 more RAM, 18GB for the slow writers and 12GB for the fast writers. Both
 systems use OFED 1.3.1. All benchmarks we use perform better on the
 slow-write clients and read speed from LFS is comparable across all
 clients.

Have you tried booting the slower-with-more-RAM clients using mem=12G
to see if the performance gets worse with more RAM?  There is a known
performance bottleneck with the client-side cache in 1.6 clients, and
you may be triggering this...
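(For reference, a sketch of that test on a RHEL5-era client: append mem=12G to
the kernel line in /boot/grub/grub.conf and reboot. The kernel image name and
root device below are illustrative only.)

  # /boot/grub/grub.conf
  kernel /vmlinuz-2.6.18-92.1.10.el5_lustre.1.6.5.1 ro root=LABEL=/ mem=12G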

If you have the luxury to do so, testing a 1.8.0 client's IO performance
against the same filesystems would also determine if the client-side
cache performance fixes therein will already solve your problems.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] (no subject)

2009-05-11 Thread David Dillow
On Mon, 2009-05-11 at 13:35 -0700, Hayes, Robert N wrote:
 While performing a single copy, single client write/read test using
 dd, we are finding that our Nehalem clients running 
 
 2.6.18-92.1.10.el5-lustre-1.6.5.1 
 
 write about half the speed of our Nehalem clients running 
 
 2.6.18-53.1.13.el5_lustre.1.6.4.3 to three different lustre file
 systems.

We've seen a fairly substantial block-level device throughput regression
going from -53 to -92 without involving Lustre, but I've not yet had
time to run down the changes to see what could be causing it.

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] (no subject)

2009-05-11 Thread Hayes, Robert N
Dave,
Does the substantial block-level device throughput regression exist in 
2.6.18-128?

/bob

-Original Message-
From: David Dillow [mailto:dillo...@ornl.gov] 
Sent: Monday, May 11, 2009 2:20 PM
To: Hayes, Robert N
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] (no subject)

On Mon, 2009-05-11 at 13:35 -0700, Hayes, Robert N wrote:
 While performing a single copy, single client write/read test using
 dd, we are finding that our Nehalem clients running 
 
 2.6.18-92.1.10.el5-lustre-1.6.5.1 
 
 write about half the speed of our Nehalem clients running 
 
 2.6.18-53.1.13.el5_lustre.1.6.4.3 to three different lustre file
 systems.

We've seen a fairly substantial block-level device throughput regression
going from -53 to -92 without involving Lustre, but I've not yet had
time to run down the changes to see what could be causing it.

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] (no subject)

2009-05-11 Thread David Dillow
On Mon, 2009-05-11 at 15:44 -0700, Hayes, Robert N wrote:
 Dave
 Does the substantial block-level device throughput regression exist
 in 2.6.18-128?

I couldn't say, haven't tested it. Our environment is SRP over IB for
the storage, so that'll be my focus when I look at the changes between
the two kernels.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] (no subject)

2009-05-11 Thread Andreas Dilger
On May 11, 2009  15:44 -0700, Hayes, Robert N wrote:
 Does the substantial block-level device throughput regression exist
 in 2.6.18-128?

Note that "block-level device" is meaningless from the point of view
of Lustre clients.  If you changed the client software only, then this
shouldn't be a factor.

 -Original Message-
 From: David Dillow [mailto:dillo...@ornl.gov] 
 Sent: Monday, May 11, 2009 2:20 PM
 To: Hayes, Robert N
 Cc: lustre-discuss@lists.lustre.org
 Subject: Re: [Lustre-discuss] (no subject)
 
 On Mon, 2009-05-11 at 13:35 -0700, Hayes, Robert N wrote:
  While performing a single copy, single client write/read test using
  dd, we are finding that our Nehalem clients running 
  
  2.6.18-92.1.10.el5-lustre-1.6.5.1 
  
  write about half the speed of our Nehalem clients running 
  
  2.6.18-53.1.13.el5_lustre.1.6.4.3 to three different lustre file
  systems.
 
 We've seen a fairly substantial block-level device throughput regression
 going from -53 to -92 without involving Lustre, but I've not yet had
 time to run down the changes to see what could be causing it.
 
 -- 
 Dave Dillow
 National Center for Computational Science
 Oak Ridge National Laboratory
 (865) 241-6602 office
 
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] (no subject)

2009-05-11 Thread Andreas Dilger
On May 11, 2009  14:38 -0700, Hayes, Robert N wrote:
 We will test the mem=12G suggestion. Before attempting the 1.8.0 client,
 can you confirm that a 1.8 client should work with a 1.6 server without
 causing any more complications?

Yes, the 1.8.x clients are interoperable with 1.6.x servers.  If you are
worried about testing this out during live system time then you can wait
for an outage window to test the 1.8 client in isolation.  There is
nothing to do on the server, and just RPM upgrade/downgrades on the client.
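(As a rough sketch, the client-side swap amounts to something like the following;
exact package file names depend on the kernel and OFED build you download.)

  rpm -e lustre-client lustre-client-modules          # remove the 1.6.x client packages
  rpm -ivh lustre-client-modules-1.8.0-*.rpm lustre-client-1.8.0-*.rpm
  # downgrading is the same operation with the 1.6.x RPMs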

 -Original Message-
 From: andreas.dil...@sun.com [mailto:andreas.dil...@sun.com] On Behalf Of 
 Andreas Dilger
 Sent: Monday, May 11, 2009 1:54 PM
 To: Hayes, Robert N
 Cc: lustre-discuss@lists.lustre.org
 Subject: Re: [Lustre-discuss] (no subject)
 
 On May 11, 2009  13:35 -0700, Hayes, Robert N wrote:
  While performing a single copy, single client write/read test using dd,
  we find that our Nehalem clients running 2.6.18-92.1.10.el5-lustre-1.6.5.1
  write about half the speed of our Nehalem clients running
  2.6.18-53.1.13.el5_lustre.1.6.4.3 to three different lustre file systems.
 
  This is true even though the slower clients have the same processors and
  more RAM, 18GB for the slow writers and 12GB for the fast writers. Both
  systems use OFED 1.3.1. All benchmarks we use perform better on the
  slow-write clients and read speed from LFS is comparable across all
  clients.
 
 Have you tried booting the slower-with-more-RAM clients using mem=12G
 to see if the performance gets worse with more RAM?  There is a known
 performance bottleneck with the client-side cache in 1.6 clients, and
 you may be triggering this...
 
 If you have the luxury to do so, testing a 1.8.0 client's IO performance
 against the same filesystems would also determine if the client-side
 cache performance fixes therein will already solve your problems.
 
 Cheers, Andreas
 --
 Andreas Dilger
 Sr. Staff Engineer, Lustre Group
 Sun Microsystems of Canada, Inc.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] (no subject)

2009-05-11 Thread Charles Taylor

On May 11, 2009, at 8:07 PM, Andreas Dilger wrote:

 On May 11, 2009  14:38 -0700, Hayes, Robert N wrote:
 We will test the mem=12G suggestion. Before attempting the 1.8.0  
 client,
 can you confirm that a 1.8 client should work with a 1.6 server  
 without
 causing any more complications?

 Yes, the 1.8.x clients are interoperable with 1.6.x servers.  If you  
 are
 worried about testing this out during live system time then you can  
 wait
 for an outage window to test the 1.8 client in isolation.  There is
 nothing to do on the server, and just RPM upgrade/downgrades on the  
 client.

And it's a beautiful thing.  :)

Charlie Taylor
UF HPC Center

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] (no subject)

2009-04-02 Thread Roger Spellman
Is there a way to know how many clients are presently mounted?  If so,
does this value change as clients unmount? 

 

I thought that the counter in /proc/fs/lustre/mds/num_refs was such a
counter.  But, I recently mounted two clients, and I did not see that
value change.
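(A possibly relevant check: on 1.6-era servers the per-MDS export count is closer
to a client count than the module-wide num_refs. The paths below are from memory
and vary by version, and the count usually includes one or two internal exports
in addition to real clients.)

  cat /proc/fs/lustre/mds/*/num_exports    # roughly: connected clients per MDS
  ls /proc/fs/lustre/mds/*/exports/        # one entry per client NID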

 

Thanks.

 

Roger Spellman

Staff Engineer

Terascala, Inc.

508-588-1501

www.terascala.com

 

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] (no subject)

2009-03-06 Thread Evan Felix
Sebastian, you are correct on the OST memory; you probably won't need the
16GB, and if you can save money by using only 4GB and buying bigger/more
drives, you will probably be better off.

Evan


On 3/5/09 6:29 PM, Sebastian Gutierrez
sebastian.gutier...@cs.stanford.edu wrote:

 Hello
 
 I am looking for advice or suggestions on spec'ing the MDS and OSS for a
 new file system I am building.
 
 The I/O patterns, as they have been explained to me, are jobs that either
 create a big MPEG file or thousands of roughly 100 KB frames for video.
 This could grow to hundreds of thousands of 100 KB files.
 
 I have spec'ed:
 2 MDS for a cluster
 8x300GB SAS 15k drives for the MDT in a RAID 10 (I used RAID 10 to reduce
 the chance of the MDT becoming a bottleneck with many small files)
 2x300GB SAS 15k drives for the MGS in a RAID 1
 1x hot spare
 32GB 800MHz RAM
 3.0 or 2.5 GHz dual quad-core procs
 
 4 OSS servers
 16GB 667MHz RAM
 2.5 GHz dual quad-core procs
 6x1TB SATA 3 7.2k disks, 5TB OST in a RAID 5 (possibly running software RAID)
 expandable to 2 5TB OSTs per OSS
 1x hot spare
 
 Since this is the first Lustre file system I am building, I looked at the
 sizing page:
 http://manual.lustre.org/manual/LustreManual16_HTML/SystemLimits.html#50548887_46859
 
 The document mentions only dual-core procs with regard to the MDS.
 
 The document also mentions a much lower amount of system RAM required for
 the OST.
 
 Am I wasting resources here by over-spec'ing these boxes, or is it a good
 idea to over-spec a bit?
 
 Am I on the right track based on the I/O pattern as I understand it?
 
 Sebastian
 
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] (no subject)

2009-03-05 Thread Sebastian Gutierrez
Hello

I am looking for advice or suggestions on spec'ing the MDS and OSS for a
new file system I am building.

The I/O patterns, as they have been explained to me, are jobs that either
create a big MPEG file or thousands of roughly 100 KB frames for video.
This could grow to hundreds of thousands of 100 KB files.

I have spec'ed:
2 MDS for a cluster
8x300GB SAS 15k drives for the MDT in a RAID 10 (I used RAID 10 to reduce
the chance of the MDT becoming a bottleneck with many small files)
2x300GB SAS 15k drives for the MGS in a RAID 1
1x hot spare
32GB 800MHz RAM
3.0 or 2.5 GHz dual quad-core procs

4 OSS servers
16GB 667MHz RAM
2.5 GHz dual quad-core procs
6x1TB SATA 3 7.2k disks, 5TB OST in a RAID 5 (possibly running software RAID)
expandable to 2 5TB OSTs per OSS
1x hot spare

Since this is the first Lustre file system I am building, I looked at the
sizing page:
http://manual.lustre.org/manual/LustreManual16_HTML/SystemLimits.html#50548887_46859

The document mentions only dual-core procs with regard to the MDS.

The document also mentions a much lower amount of system RAM required for
the OST.

Am I wasting resources here by over-spec'ing these boxes, or is it a good
idea to over-spec a bit?

Am I on the right track based on the I/O pattern as I understand it?

Sebastian

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] (no subject)

2009-02-10 Thread Brian Stone
subscribe
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] (unknown subject)

2008-10-13 Thread Deval kulshrestha

Hi Kevin

Thanks for your reply. Now I can set it up.

Regards
Deval 

Message: 9
Date: Thu, 09 Oct 2008 05:46:27 -0600
From: Kevin Van Maren [EMAIL PROTECTED]
Subject: Re: [Lustre-discuss] Test setup configuration
To: Deval kulshrestha [EMAIL PROTECTED]
Cc: lustre-discuss@lists.lustre.org
Message-ID: [EMAIL PROTECTED]
Content-Type: text/plain; format=flowed; charset=ISO-8859-1

Deval kulshrestha wrote:

 Hi

  

 I am a new Lustre user, trying to evaluate Lustre with a few
 configurations. I am going through the Lustre 1.6 Operations Manual, but I
 am not able to understand which packages should be installed on the MDS,
 OSS, and client.

 Should I install all the packages on all three types of nodes?

  

 Please explain

  

 Best Regards

 Deval K


Lustre servers (MDS/OSS):
  kernel-lustre-smp // patched server kernel
  lustre-modules // Lustre kernel modules
  lustre // user space tools (server)
  lustre-ldiskfs // ldiskfs
  e2fsprogs // filesystem tools

You can install all those RPMs on the client as well, but it is not 
necessary.
Lustre clients (assuming you have the matching vendor kernel for the 
lustre-modules installed):
  lustre-client-modules // kernel modules for client
  lustre-client // user space (client)
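(A minimal install sketch using those package groups; the version placeholders
stand in for whichever release you download.)

  # servers (MDS/OSS)
  rpm -ivh kernel-lustre-smp-<ver>.rpm lustre-modules-<ver>.rpm \
           lustre-<ver>.rpm lustre-ldiskfs-<ver>.rpm
  rpm -Uvh e2fsprogs-<ver>.rpm
  # clients (matching vendor kernel already installed)
  rpm -ivh lustre-client-modules-<ver>.rpm lustre-client-<ver>.rpm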

Kevin



--

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


End of Lustre-discuss Digest, Vol 33, Issue 12
**




___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] (no subject)

2008-07-07 Thread Jan H. Julian
While running 'cat /proc/sys/lnet/peers' across all of our client nodes, I  
consistently see all nodes marking two NIDs with state down and  
many NIDs with a min in the negative range.  It seems as though this  
could not be a good state, but we are running???
We're still on 1.4.11.  Does this indicate a problem?
...
mg30

  nid                refs  state  max  rtr  min  tx  min  queue
  [EMAIL PROTECTED]     1  up       8    8    8   8    2      0
  [EMAIL PROTECTED]     1  down     8    8    8   8    2      0
  [EMAIL PROTECTED]     1  down     8    8    8   8    2      0
  [EMAIL PROTECTED]     1  up       8    8    8   8    7      0
  [EMAIL PROTECTED]     1  up       8    8    8   8    2      0
  [EMAIL PROTECTED]     1  up       8    8    8   8    2      0
  [EMAIL PROTECTED]     1  up       8    8    8   8    2      0

mt023

  nid                refs  state  max  rtr  min  tx  min  queue
  [EMAIL PROTECTED]     1  up       8    8    8   8  -16      0
  [EMAIL PROTECTED]     1  down     8    8    8   8  -66      0
  [EMAIL PROTECTED]     1  down     8    8    8   8  -63      0
  [EMAIL PROTECTED]     1  up       8    8    8   8    4      0
  [EMAIL PROTECTED]     1  up       8    8    8   8   -8      0
  [EMAIL PROTECTED]     1  up       8    8    8   8   -8      0
  [EMAIL PROTECTED]     1  up       8    8    8   8   -8      0
...
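(One quick check is an LNET-level ping of the two down peers from a client;
lctl ping should be available on 1.4.11, though that is from memory, and the
NID below is a placeholder.)

  lctl ping <nid-of-down-peer>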

Jan H. Julian
System Administrator
Midnight Cluster Activity Lead
[EMAIL PROTECTED]
907-450-8641



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] (no subject)

2008-06-26 Thread Chad Kerner
We have been having some trouble with one of our filesystems.  We have 
fsck'd all of the disks.  When we try to start the MDS for this filesystem, it 
never actually comes back online.  It stays in an 'AT' status.  I was 
wondering if anyone had some idea of where we should start looking.

Thanks,
Chad


55 AT mds proj proj_UUID 2
  56 UP lov lov_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 4
  57 UP osc OSC_linc-io17_ostp1_1_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  58 UP osc OSC_linc-io17_ostp2_1_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  59 UP osc OSC_linc-io17_ostp3_1_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  60 UP osc OSC_linc-io17_ostp4_1_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  61 UP osc OSC_linc-io17_ostp5_1_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  62 UP osc OSC_linc-io17_ostp6_1_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  63 UP osc OSC_linc-io17_ostp1_2_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  64 UP osc OSC_linc-io17_ostp2_2_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  65 UP osc OSC_linc-io17_ostp3_2_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  66 UP osc OSC_linc-io17_ostp4_2_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  67 UP osc OSC_linc-io17_ostp5_2_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  68 UP osc OSC_linc-io17_ostp6_2_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  69 UP osc OSC_linc-io17_ostp1_3_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  70 UP osc OSC_linc-io17_ostp2_3_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  71 UP osc OSC_linc-io17_ostp3_3_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  72 UP osc OSC_linc-io17_ostp4_3_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  73 UP osc OSC_linc-io17_ostp5_3_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  74 UP osc OSC_linc-io17_ostp6_3_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  75 UP osc OSC_linc-io17_ostp1_4_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  76 UP osc OSC_linc-io17_ostp2_4_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  77 UP osc OSC_linc-io17_ostp3_4_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  78 UP osc OSC_linc-io17_ostp4_4_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  79 UP osc OSC_linc-io17_ostp5_4_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  80 UP osc OSC_linc-io17_ostp6_4_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  81 UP osc OSC_linc-io17_ostp1_5_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  82 UP osc OSC_linc-io17_ostp2_5_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  83 UP osc OSC_linc-io17_ostp3_5_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  84 UP osc OSC_linc-io17_ostp4_5_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  85 UP osc OSC_linc-io17_ostp5_5_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  86 UP osc OSC_linc-io17_ostp6_5_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  87 UP osc OSC_linc-io17_ostp1_6_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  88 UP osc OSC_linc-io17_ostp2_6_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  89 UP osc OSC_linc-io17_ostp3_6_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  90 UP osc OSC_linc-io17_ostp4_6_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  91 UP osc OSC_linc-io17_ostp5_6_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  92 UP osc OSC_linc-io17_ostp6_6_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  93 UP osc OSC_linc-io17_ostp1_7_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  94 UP osc OSC_linc-io17_ostp2_7_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  95 UP osc OSC_linc-io17_ostp3_7_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  96 UP osc OSC_linc-io17_ostp4_7_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  97 UP osc OSC_linc-io17_ostp5_7_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
  98 UP osc OSC_linc-io17_ostp6_7_proj 
319d77f2-9570-4198-93dd-f24ca0cfe4b5 5




Chad
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] (no subject)

2008-04-08 Thread Chad Kerner
I am generating the MDT database for one of our lustre filesystems.  I 
am getting the following messages.  Should I be concerned, or are these 
going to correct themselves once the lfsck is done?

  warning MDS inode 
warning MDS inode git-update-ref 
(inum 14777090): DB_KEYEXIST: Key/data pair already exists hard link?
warning MDS inode git-upload-archive (inum 14777090): DB_KEYEXIST: 
Key/data pair already exists hard link?
warning MDS inode git-verify-pack (inum 14777090): DB_KEYEXIST: Key/data 
pair already exists hard link?
warning MDS inode git-verify-tag (inum 14777090): DB_KEYEXIST: Key/data 
pair already exists hard link?
warning MDS inode git-write-tree (inum 14777090): DB_KEYEXIST: Key/data 
pair already exists hard link?
warning MDS inode warning MDS inode git-pack-refs (inum 14777090): 
DB_KEYEXIST: Key/data pair already exists hard link?
warning MDS inode git-http-fetch (inum 14777090): DB_KEYEXIST: Key/data 
pair already exists hard link?
warning MDS inode idl (inum 14336878): DB_KEYEXIST: Key/data pair 
already exists hard link?
warning MDS inode idl_assistant (inum 14336878): DB_KEYEXIST: Key/data 
pair already exists hard link?

Thanks,
Chad
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss