[lustre-discuss] (no subject)
Hi all,

We had a problem with one of our MDS nodes (ldiskfs) on Lustre 2.12.6, which we think is a bug - but we haven't been able to identify it. Can anyone shed any light? We unmounted and remounted the MDT at around 23:00.

Client logs:

May 16 22:15:41 m8011 kernel: LustreError: 11-0: lustrefs8-MDT-mdc-956fb73c3800: operation ldlm_enqueue to node 172.18.185.1@o2ib failed: rc = -107
May 16 22:15:41 m8011 kernel: Lustre: lustrefs8-MDT-mdc-956fb73c3800: Connection to lustrefs8-MDT (at 172.18.185.1@o2ib) was lost; in progress operations using this service will wait for recovery to complete
May 16 22:15:41 m8011 kernel: LustreError: Skipped 5 previous similar messages
May 16 22:15:48 m8011 kernel: Lustre: 101710:0:(client.c:2146:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1652735641/real 1652735641] req@949d8cb1de80 x1724290358528896/t0(0) o101->lustrefs8-MDT-mdc-956fb73c3800@172.18.185.1@o2ib:12/10 lens 480/568 e 4 to 1 dl 1652735748 ref 2 fl Rpc:X/0/ rc 0/-1
May 16 22:15:48 m8011 kernel: Lustre: 101710:0:(client.c:2146:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
May 16 23:00:15 m8011 kernel: Lustre: 4784:0:(client.c:2146:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1652738408/real 1652738408] req@94ea07314380 x1724290358763776/t0(0) o400->MGC172.18.185.1@o2ib@172.18.185.1@o2ib:26/25 lens 224/224 e 0 to 1 dl 1652738415 ref 1 fl Rpc:XN/0/ rc 0/-1
May 16 23:00:15 m8011 kernel: LustreError: 166-1: MGC172.18.185.1@o2ib: Connection to MGS (at 172.18.185.1@o2ib) was lost; in progress operations using this service will fail
May 16 23:00:15 m8011 kernel: Lustre: Evicted from MGS (at MGC172.18.185.1@o2ib_0) after server handle changed from 0xdb7c7c778c8908d6 to 0xdb7c7cbad3be9e79
May 16 23:00:15 m8011 kernel: Lustre: MGC172.18.185.1@o2ib: Connection restored to MGC172.18.185.1@o2ib_0 (at 172.18.185.1@o2ib)
May 16 23:01:49 m8011 kernel: LustreError: 167-0: lustrefs8-MDT-mdc-956fb73c3800: This client was evicted by lustrefs8-MDT; in progress operations using this service will fail.
May 16 23:01:49 m8011 kernel: LustreError: 101719:0:(vvp_io.c:1562:vvp_io_init()) lustrefs8: refresh file layout [0x28107:0x9b08:0x0] error -108.
May 16 23:01:49 m8011 kernel: LustreError: 101719:0:(vvp_io.c:1562:vvp_io_init()) Skipped 3 previous similar messages
May 16 23:01:49 m8011 kernel: Lustre: lustrefs8-MDT-mdc-956fb73c3800: Connection restored to 172.18.185.1@o2ib (at 172.18.185.1@o2ib)

MDS server logs:

May 16 22:15:40 c8mds1 kernel: LustreError: 10686:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 99s: evicting client at 172.18.181.11@o2ib ns: mdt-lustrefs8-MDT_UUID lock: 97b3730d98c0/0xdb7c7cbad3be1c7b lrc: 3/0,0 mode: PW/PW res: [0x29119:0x327f:0x0].0x0 bits 0x40/0x0 rrc: 201 type: IBT flags: 0x6020040020 nid: 172.18.181.11@o2ib remote: 0xe62e31610edfb808 expref: 90 pid: 10707 timeout: 8482830 lvb_type: 0
May 16 22:15:40 c8mds1 kernel: LustreError: 10712:0:(ldlm_lockd.c:1351:ldlm_handle_enqueue0()) ### lock on destroyed export 9769eaf46c00 ns: mdt-lustrefs8-MDT_UUID lock: 97d828635e80/0xdb7c7cbad3be1c90 lrc: 3/0,0 mode: PW/PW res: [0x29119:0x327f:0x0].0x0 bits 0x40/0x0 rrc: 199 type: IBT flags: 0x5020040020 nid: 172.18.181.11@o2ib remote: 0xe62e31610edfb80f expref: 77 pid: 10712 timeout: 0 lvb_type: 0
May 16 22:15:40 c8mds1 kernel: LustreError: 10712:0:(ldlm_lockd.c:1351:ldlm_handle_enqueue0()) Skipped 27 previous similar messages
May 16 22:17:22 c8mds1 kernel: LNet: Service thread pid 10783 was inactive for 200.73s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
May 16 22:17:22 c8mds1 kernel: LNet: Skipped 3 previous similar messages
May 16 22:17:22 c8mds1 kernel: Pid: 10783, comm: mdt01_040 3.10.0-1160.2.1.el7_lustre.x86_64 #1 SMP Wed Dec 9 20:53:35 UTC 2020
May 16 22:17:22 c8mds1 kernel: Call Trace:
May 16 22:17:22 c8mds1 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc]
May 16 22:17:22 c8mds1 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
May 16 22:17:22 c8mds1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
May 16 22:17:22 c8mds1 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt]
May 16 22:17:22 c8mds1 kernel: [] mdt_object_lock+0x20/0x30 [mdt]
May 16 22:17:22 c8mds1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt]
May 16 22:17:22 c8mds1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt]
May 16 22:17:22 c8mds1 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt]
May 16 22:17:22 c8mds1 kernel: [] ldlm_lock_enqueue+0x376/0x9b0 [ptlrpc]
May 16 22:17:22 c8mds1 kernel: [] ldlm_handle_enqueue0+0xa86/0x1620 [ptlrpc]
May 16 22:17:22 c8mds1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
May 16 22:17:22 c8mds1 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc]
May 16
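In case it helps anyone triaging something similar, a minimal first-look sketch using standard lctl commands (the target names are whatever your MDT/MDC are called on your system):

    # on the MDS: confirm whether recovery actually completed after the remount
    lctl get_param mdt.*.recovery_status

    # dump the in-memory Lustre debug log to a file before it wraps
    lctl dk /tmp/lustre-debug.$(hostname).log

    # on a client: inspect the MDC import state (shows evictions/reconnects)
    lctl get_param mdc.*.import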
Re: [lustre-discuss] (no subject)
Hi Koos

Thanks for bringing this up. This was just due to the person who usually copies the files across being on vacation at the time of the release. It's there now.

Peter

From: lustre-discuss on behalf of "Meijering, Koos via lustre-discuss"
Reply-To: "Meijering, Koos"
Date: Wednesday, January 12, 2022 at 1:38 AM
To: lustre-discuss
Subject: [lustre-discuss] (no subject)

Hello,

On December 17, 2021 Lustre 2.12.8 was released. In the past there was also a Mellanox InfiniBand build; is there a plan to make a 2.12.8 build for InfiniBand, or do we need to set up our own build servers?

Best regards
Koos Meijering
HPC/Certificaten/Lokaal Overleg
Rijksuniversiteit Groningen
h.meijer...@rug.nl
https://www.rug.nl/cit
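For anyone who does end up building their own packages against Mellanox OFED, a rough sketch of the usual configure step - not the official build recipe, and the paths are illustrative (point --with-o2ib at your installed MOFED kernel sources):

    # from a 2.12.8 source tree, with MOFED and kernel-devel installed
    sh autogen.sh
    ./configure --with-linux=/usr/src/kernels/$(uname -r) \
                --with-o2ib=/usr/src/ofa_kernel/default
    make rpms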
Re: [lustre-discuss] (no subject)
Hi Cory,

Thank you. I see from the link that interoperability support goes back only to 2.9.x, so I understand that we can't stay on client modules at version 2.4.x. I also see the supported kernel versions.

Thanks,
Yugi

On Mon, May 21, 2018 at 12:40 PM, Cory Spitz <spitz...@cray.com> wrote:
> Hello, Yugi.
>
> Please note that 2.10.3 is the latest LTS release and 2.10.4 is about to be released very soon.
>
> RHEL/CentOS 6.9 clients and RHEL/CentOS 7.4 servers are supported in 2.10.3. See: http://wiki.lustre.org/Lustre_2.10.3_Changelog. 2.10.4 will add support for RHEL/CentOS 7.5 servers.
>
> -Cory
>
> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of Yugendra Guvvala <yguvv...@cambridgecomputer.com>
> Date: Monday, May 21, 2018 at 11:12 AM
> To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
> Subject: [lustre-discuss] (no subject)
>
> Hi,
>
> We currently run Lustre 2.4.
>
> We are looking to upgrade to Lustre 2.10.2. Do we have to upgrade the Lustre client modules, or is it backward compatible?
>
> What are the compatible kernel versions for 2.10.2 for both server and client modules?
>
> We want to keep the clients at CentOS 6.8 and upgrade the storage servers to CentOS 7.4. We are trying to phase the upgrade (servers first, then clients) and, with all the software and driver compatibility involved, we are trying to find the best upgrade path. Any suggestions are welcome.
>
> We could only find OS versions, not kernel versions, on the support matrix here:
>
> https://wiki.hpdd.intel.com/display/PUB/Lustre+Support+Matrix
>
> Thanks,
> Yugi

--
Thanks,
Yugendra Guvvala | HPC Technologist | Cambridge Computer | "Artists in Data Storage"
Direct: 781-250-3273 | Cell: 806-773-4464 | yguvv...@cambridgecomputer.com | www.cambridgecomputer.com
Re: [lustre-discuss] (no subject)
Hello, Yugi.

Please note that 2.10.3 is the latest LTS release and 2.10.4 is about to be released very soon.

RHEL/CentOS 6.9 clients and RHEL/CentOS 7.4 servers are supported in 2.10.3. See: http://wiki.lustre.org/Lustre_2.10.3_Changelog. 2.10.4 will add support for RHEL/CentOS 7.5 servers.

-Cory

From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of Yugendra Guvvala <yguvv...@cambridgecomputer.com>
Date: Monday, May 21, 2018 at 11:12 AM
To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
Subject: [lustre-discuss] (no subject)

Hi,

We currently run Lustre 2.4.

We are looking to upgrade to Lustre 2.10.2. Do we have to upgrade the Lustre client modules, or is it backward compatible?

What are the compatible kernel versions for 2.10.2 for both server and client modules?

We want to keep the clients at CentOS 6.8 and upgrade the storage servers to CentOS 7.4. We are trying to phase the upgrade (servers first, then clients) and, with all the software and driver compatibility involved, we are trying to find the best upgrade path. Any suggestions are welcome.

We could only find OS versions, not kernel versions, on the support matrix here:

https://wiki.hpdd.intel.com/display/PUB/Lustre+Support+Matrix

Thanks,
Yugi
[lustre-discuss] (no subject)
Hi,

We currently run Lustre 2.4.

We are looking to upgrade to Lustre 2.10.2. Do we have to upgrade the Lustre client modules, or is it backward compatible?

What are the compatible kernel versions for 2.10.2 for both server and client modules?

We want to keep the clients at CentOS 6.8 and upgrade the storage servers to CentOS 7.4. We are trying to phase the upgrade (servers first, then clients) and, with all the software and driver compatibility involved, we are trying to find the best upgrade path. Any suggestions are welcome.

We could only find OS versions, not kernel versions, on the support matrix here:

https://wiki.hpdd.intel.com/display/PUB/Lustre+Support+Matrix

Thanks,
Yugi
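Before planning a jump like this, it is worth confirming exactly what each node is running; a minimal sketch using standard commands:

    # Lustre version on the local node (works on clients and servers)
    lctl get_param version

    # kernel version, to compare against the changelog's supported list
    uname -r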
[Lustre-discuss] (no subject)
Dear all:

When we were testing Lustre on an OST, we got the following error:

Feb 6 23:25:09 localhost kernel: LustreError: 28483:0:(osc_request.c:716:osc_announce_cached()) dirty 33673216 dirty_max 33554432

I was wondering: is the problem that there are too many dirty pages in memory, i.e. that the pdflush thread had not written the dirty pages back in time? Could you give me some advice about this issue?

Many Thanks.
Best Regards
feng
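The numbers in the message line up with the per-OSC dirty cache cap: dirty_max 33554432 bytes is exactly 32 MB, the default osc max_dirty_mb in that era. A minimal sketch for inspecting and, if appropriate for your workload, raising it on the client (classic /proc layout):

    # per-OSC cap on dirty client cache, in MB
    cat /proc/fs/lustre/osc/*/max_dirty_mb

    # current dirty bytes per OSC, to compare against the cap
    cat /proc/fs/lustre/osc/*/cur_dirty_bytes

    # raise the cap to 64 MB on every OSC on this client
    for f in /proc/fs/lustre/osc/*/max_dirty_mb; do echo 64 > "$f"; done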
[Lustre-discuss] (no subject)
test ignore
[Lustre-discuss] (no subject)
test ignore
Re: [Lustre-discuss] (no subject)
You cannot use a NetApp appliance directly. First of all, you should enable the FC license on the NetApp; then you need a Fibre Channel switch (such as a Cisco MDS or a Brocade); then you have to connect the Cisco or Brocade switch to the NetApp FC ports; and last but not least you need a computer with a Fibre Channel connection, directly attached to the FC switch, that mounts the NetApp LUN. That way you can implement Lustre using a NetApp as one or more OSTs.

cheers,
Antonio

rishi pathak wrote:
> If your NetApp appliance has the ability to export volumes as iSCSI then you have a chance.
>
> On Sun, Nov 22, 2009 at 12:23 PM, muhammed navas <navasonl...@gmail.com> wrote:
> > My company has multiple clusters running on NFS. We would like to test the Lustre file system in our cluster. We are using NetApp for storage. I went through a lot of Lustre docs; most of them talk about local HD storage (OSTs). May I know how I can implement Lustre using the NetApp as storage (OST)?
> >
> > Regards,
> > Muhammed navas

--
Regards--
Rishi Pathak
National PARAM Supercomputing Facility
Center for Development of Advanced Computing (C-DAC)
Pune University Campus, Ganesh Khind Road
Pune-Maharastra
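Once the LUN is visible as a local block device on the OSS, the Lustre side is the usual mkfs/mount - a sketch with an illustrative device and MGS NID, not NetApp-specific advice:

    # format the FC LUN as an OST for filesystem "testfs"
    mkfs.lustre --fsname=testfs --ost --mgsnode=192.168.1.10@tcp /dev/sdb

    # mount it to bring the OST into the filesystem
    mkdir -p /mnt/ost0
    mount -t lustre /dev/sdb /mnt/ost0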
Re: [Lustre-discuss] (no subject)
I don't think it would be particularly sane to use iSCSI, IMHO.

ONTAP 8.0 does have a clustered mode which allows a number of systems to have a single namespace and load balance between a number of servers. From the announcement they sent round in September:

Data ONTAP 8.0 is the first release of the Data ONTAP 8 release family. It is a single codebase, with two separately orderable products:
Data ONTAP 8.0 7-Mode: The next release of Data ONTAP 7G (after 7.3.x)
Data ONTAP 8.0 Cluster-Mode: The next release of Data ONTAP GX (after 10.0.x)

The RCx classification indicates that NetApp has completed its internal testing of the release. RCs are provided primarily to enable customers who want to start early on exploring the release for either new features or bug fixes, or who want to start testing the release before deploying it in critical production environments. NetApp may provide multiple RCs as necessary to address specific issues found before the release becomes a General Availability (GA) release. NetApp Global Services (NGS) provides support for RCs.

On 23 Nov 2009, at 13:27, rishi pathak wrote:
> If your NetApp appliance has the ability to export volumes as iSCSI then you have a chance.
>
> On Sun, Nov 22, 2009 at 12:23 PM, muhammed navas <navasonl...@gmail.com> wrote:
> > My company has multiple clusters running on NFS. We would like to test the Lustre file system in our cluster. We are using NetApp for storage. I went through a lot of Lustre docs; most of them talk about local HD storage (OSTs). May I know how I can implement Lustre using the NetApp as storage (OST)?
[Lustre-discuss] (no subject)
My company has multiple clusters running on NFS. We would like to test the Lustre file system in our cluster. We are using NetApp for storage. I went through a lot of Lustre docs; most of them talk about local HD storage (OSTs). May I know how I can implement Lustre using the NetApp as storage (OST)?

Regards,
Muhammed navas
Re: [Lustre-discuss] (no subject)
On 2009-11-21, at 23:53, muhammed navas wrote:
> My company has multiple clusters running on NFS. We would like to test the Lustre file system in our cluster. We are using NetApp for storage. I went through a lot of Lustre docs; most of them talk about local HD storage (OSTs). May I know how I can implement Lustre using the NetApp as storage (OST)?

This isn't really possible. NetApp servers are exporting the NFS protocol already, and that isn't what Lustre can use. The benefit of Lustre is that you can instead use much less expensive commodity servers to provide the storage, and Lustre will export it as a single filesystem to clients.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
[Lustre-discuss] (no subject)
While performing a single-copy, single-client write/read test using dd, we are finding that our Nehalem clients running 2.6.18-92.1.10.el5-lustre-1.6.5.1 write at about half the speed of our Nehalem clients running 2.6.18-53.1.13.el5_lustre.1.6.4.3, against three different Lustre file systems. This is true even though the slower clients have the same processors and more RAM: 18GB for the slow writers and 12GB for the fast writers. Both systems use OFED 1.3.1. All benchmarks we use perform better on the slow-write clients, and read speed from LFS is comparable across all clients.

max_rpcs_in_flight and max_pages_per_rpc are at their defaults on both systems. They are on the same IB network with the same QDR cards, and IB connectivity has been verified with the IB utilities; they are almost identical in bandwidth and latency. We're also using the same modprobe.conf and openibd.conf files on both systems. We're using a 34GB file size on the 12GB and 18GB RAM systems, and a 137GB file on the 96GB RAM system, so it's not a matter of caching in RAM.

Are there known issues with our 2.6.18-92.1.10.el5-lustre-1.6.5.1 combination? This is not a problem with the Lustre file system itself, as we get the same type of results no matter which of our three Lustre systems the test is being written to.

Here are the summaries from several runs of ost-survey on our new Lustre system. Please comment on the worst/best deltas of the read and write operations.

Number of Active OST devices : 96
(columns are the five runs)
Worst Read      38.167753  38.932928  39.006537  39.782153  38.717915
Best Read       61.704534  61.832461  63.284999  65.000491  61.836016
Read Average    51.433847  51.281630  51.297278  51.582327  51.318410
Worst Write     34.311237  49.009757  55.272744  51.532331  51.816523
Best Write      94.001170  96.033483  93.401792  93.081544  91.030717
Write Average   74.248683  71.831019  75.179863  74.723100  74.930529

/bob

Bob Hayes
System Administrator SSG-DRD-DP
Office: 253-371-3040
Cell: 253-441-5482
e-mail: robert.n.ha...@intel.com
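For reference, the single-stream dd test described above is typically something like the following sketch (sizes chosen, as in the post, to exceed client RAM; the mount point and file name are illustrative):

    # sequential write of ~34 GB to the Lustre mount
    dd if=/dev/zero of=/mnt/lustre/ddtest bs=1M count=34000

    # drop the page cache so the read-back comes from the OSTs
    echo 3 > /proc/sys/vm/drop_caches

    # sequential read-back
    dd if=/mnt/lustre/ddtest of=/dev/null bs=1M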
Re: [Lustre-discuss] (no subject)
On May 11, 2009 13:35 -0700, Hayes, Robert N wrote:
> While performing a single copy, single client write/read test using dd, we find that our Nehalem clients running 2.6.18-92.1.10.el5-lustre-1.6.5.1 write about half the speed of our Nehalem clients running 2.6.18-53.1.13.el5_lustre.1.6.4.3 to three different lustre file systems. This is true even though the slower clients have the same processors and more RAM, 18GB for the slow writers and 12GB for the fast writers. Both systems use OFED 1.3.1. All benchmarks we use perform better on the slow-write clients and read speed from LFS is comparable across all clients.

Have you tried booting the slower-with-more-RAM clients using mem=12G to see if the performance gets worse with more RAM? There is a known performance bottleneck with the client-side cache in 1.6 clients, and you may be triggering this...

If you have the luxury to do so, testing a 1.8.0 client's IO performance against the same filesystems would also determine if the client-side cache performance fixes therein will already solve your problems.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
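For anyone trying the suggestion, a sketch of where the mem= cap goes on an EL5-era client (the grub.conf entry below is illustrative; reboot afterwards):

    # /boot/grub/grub.conf - append mem=12G to the kernel line
    kernel /vmlinuz-2.6.18-92.1.10.el5-lustre-1.6.5.1 ro root=/dev/VolGroup00/LogVol00 mem=12G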
Re: [Lustre-discuss] (no subject)
On Mon, 2009-05-11 at 13:35 -0700, Hayes, Robert N wrote:
> While performing a single copy, single client write/read test using dd, we are finding that our Nehalem clients running 2.6.18-92.1.10.el5-lustre-1.6.5.1 write about half the speed of our Nehalem clients running 2.6.18-53.1.13.el5_lustre.1.6.4.3 to three different lustre file systems.

We've seen a fairly substantial block-level device throughput regression going from -53 to -92 without involving Lustre, but I've not yet had time to run down the changes to see what could be causing it.
--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
Re: [Lustre-discuss] (no subject)
Dave,

Does the substantial block-level device throughput regression exist in 2.6.18-128?

/bob

-----Original Message-----
From: David Dillow [mailto:dillo...@ornl.gov]
Sent: Monday, May 11, 2009 2:20 PM
To: Hayes, Robert N
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] (no subject)

On Mon, 2009-05-11 at 13:35 -0700, Hayes, Robert N wrote:
> While performing a single copy, single client write/read test using dd, we are finding that our Nehalem clients running 2.6.18-92.1.10.el5-lustre-1.6.5.1 write about half the speed of our Nehalem clients running 2.6.18-53.1.13.el5_lustre.1.6.4.3 to three different lustre file systems.

We've seen a fairly substantial block-level device throughput regression going from -53 to -92 without involving Lustre, but I've not yet had time to run down the changes to see what could be causing it.
--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
Re: [Lustre-discuss] (no subject)
On Mon, 2009-05-11 at 15:44 -0700, Hayes, Robert N wrote:
> Dave,
> Does the substantial block-level device throughput regression exist in 2.6.18-128?

I couldn't say; I haven't tested it. Our environment is SRP over IB for the storage, so that'll be my focus when I look at the changes between the two kernels.
--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
Re: [Lustre-discuss] (no subject)
On May 11, 2009 15:44 -0700, Hayes, Robert N wrote:
> Does the substantial block-level device throughput regression exist in 2.6.18-128?

Note that "block-level device" is meaningless from the point of view of Lustre clients. If you changed the client software only, then this shouldn't be a factor.

> -----Original Message-----
> From: David Dillow [mailto:dillo...@ornl.gov]
> Sent: Monday, May 11, 2009 2:20 PM
> To: Hayes, Robert N
> Cc: lustre-discuss@lists.lustre.org
> Subject: Re: [Lustre-discuss] (no subject)
>
> On Mon, 2009-05-11 at 13:35 -0700, Hayes, Robert N wrote:
> > While performing a single copy, single client write/read test using dd, we are finding that our Nehalem clients running 2.6.18-92.1.10.el5-lustre-1.6.5.1 write about half the speed of our Nehalem clients running 2.6.18-53.1.13.el5_lustre.1.6.4.3 to three different lustre file systems.
>
> We've seen a fairly substantial block-level device throughput regression going from -53 to -92 without involving Lustre, but I've not yet had time to run down the changes to see what could be causing it.
> --
> Dave Dillow
> National Center for Computational Science
> Oak Ridge National Laboratory
> (865) 241-6602 office

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Re: [Lustre-discuss] (no subject)
On May 11, 2009 14:38 -0700, Hayes, Robert N wrote:
> We will test the mem=12G suggestion. Before attempting the 1.8.0 client, can you confirm that a 1.8 client should work with a 1.6 server without causing any more complications?

Yes, the 1.8.x clients are interoperable with 1.6.x servers. If you are worried about testing this out during live system time then you can wait for an outage window to test the 1.8 client in isolation. There is nothing to do on the server, and just RPM upgrade/downgrades on the client.

> -----Original Message-----
> From: andreas.dil...@sun.com [mailto:andreas.dil...@sun.com] On Behalf Of Andreas Dilger
> Sent: Monday, May 11, 2009 1:54 PM
> To: Hayes, Robert N
> Cc: lustre-discuss@lists.lustre.org
> Subject: Re: [Lustre-discuss] (no subject)
>
> On May 11, 2009 13:35 -0700, Hayes, Robert N wrote:
> > While performing a single copy, single client write/read test using dd, we find that our Nehalem clients running 2.6.18-92.1.10.el5-lustre-1.6.5.1 write about half the speed of our Nehalem clients running 2.6.18-53.1.13.el5_lustre.1.6.4.3 to three different lustre file systems. This is true even though the slower clients have the same processors and more RAM, 18GB for the slow writers and 12GB for the fast writers. Both systems use OFED 1.3.1. All benchmarks we use perform better on the slow-write clients and read speed from LFS is comparable across all clients.
>
> Have you tried booting the slower-with-more-RAM clients using mem=12G to see if the performance gets worse with more RAM? There is a known performance bottleneck with the client-side cache in 1.6 clients, and you may be triggering this...
>
> If you have the luxury to do so, testing a 1.8.0 client's IO performance against the same filesystems would also determine if the client-side cache performance fixes therein will already solve your problems.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Re: [Lustre-discuss] (no subject)
On May 11, 2009, at 8:07 PM, Andreas Dilger wrote:
> On May 11, 2009 14:38 -0700, Hayes, Robert N wrote:
> > We will test the mem=12G suggestion. Before attempting the 1.8.0 client, can you confirm that a 1.8 client should work with a 1.6 server without causing any more complications?
>
> Yes, the 1.8.x clients are interoperable with 1.6.x servers. If you are worried about testing this out during live system time then you can wait for an outage window to test the 1.8 client in isolation. There is nothing to do on the server, and just RPM upgrade/downgrades on the client.

And it's a beautiful thing. :)

Charlie Taylor
UF HPC Center
[Lustre-discuss] (no subject)
Is there a way to know how many clients are presently mounted? If so, does this value change as clients unmount?

I thought that the counter in /proc/fs/lustre/mds/num_refs was such a counter. But I recently mounted two clients, and I did not see that value change.

Thanks.

Roger Spellman
Staff Engineer
Terascala, Inc.
508-588-1501
www.terascala.com
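For comparison, the per-target export count is usually the closer thing to a live client count - a sketch, assuming a 1.6/1.8-era server with a target named lustre-MDT0000 (adjust the name; note the count also includes non-client connections such as the server's own exports):

    # number of exports (connections) currently held by the MDT
    cat /proc/fs/lustre/mds/lustre-MDT0000/num_exports

    # or, where lctl get_param is available
    lctl get_param mds.lustre-MDT0000.num_exports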
Re: [Lustre-discuss] (no subject)
Sebastian, you are correct on the OST memory; you probably won't need the 16GB, and if you can save money by using only 4GB and buying bigger/more drives, you will probably be better off.

Evan

On 3/5/09 6:29 PM, Sebastian Gutierrez <sebastian.gutier...@cs.stanford.edu> wrote:
> Hello
>
> I am looking for advice or suggestions on spec'ing the MDS and OSS nodes for a new file system I am building. The IO patterns, as they have been explained to me, are jobs that either create a big MPEG file or thousands of roughly 100 KB frames for video. This could grow to hundreds of thousands of 100 KB files.
>
> I have specced 2 MDS nodes for a cluster:
> - 8x 300GB SAS 15k drives for the MDT in a RAID 10 (I used RAID 10 to reduce the chance of the MDT becoming a bottleneck with many small files)
> - 2x 300GB SAS 15k drives for the MGS in a RAID 1
> - 1x hotspare
> - 32GB 800MHz RAM
> - 3.0 or 2.5 GHz dual quad-core procs
>
> 4 OSS servers, each with:
> - 16GB 667MHz RAM
> - 2.5 GHz dual quad-core procs
> - 6x 1TB SATA 3Gb/s 7.2k disks
> - 5TB OST in a RAID 5 (possibly running software RAID), expandable to 2x 5TB OSTs per OSS
> - 1x hotspare
>
> Since this is the first Lustre file system I am building, I noticed the sizing page:
> http://manual.lustre.org/manual/LustreManual16_HTML/SystemLimits.html#50548887_46859
>
> The document mentions only dual-core procs in regards to the MDS. The document also mentions a much lower amount of system RAM required for the OST. Am I wasting resources here by over-spec'ing these boxes, or is it a good idea to over-spec a bit? Am I on the right track based on the IO pattern as I understand it?
>
> Sebastian
[Lustre-discuss] (no subject)
Hello

I am looking for advice or suggestions on spec'ing the MDS and OSS nodes for a new file system I am building. The IO patterns, as they have been explained to me, are jobs that either create a big MPEG file or thousands of roughly 100 KB frames for video. This could grow to hundreds of thousands of 100 KB files.

I have specced 2 MDS nodes for a cluster:
- 8x 300GB SAS 15k drives for the MDT in a RAID 10 (I used RAID 10 to reduce the chance of the MDT becoming a bottleneck with many small files)
- 2x 300GB SAS 15k drives for the MGS in a RAID 1
- 1x hotspare
- 32GB 800MHz RAM
- 3.0 or 2.5 GHz dual quad-core procs

4 OSS servers, each with:
- 16GB 667MHz RAM
- 2.5 GHz dual quad-core procs
- 6x 1TB SATA 3Gb/s 7.2k disks
- 5TB OST in a RAID 5 (possibly running software RAID), expandable to 2x 5TB OSTs per OSS
- 1x hotspare

Since this is the first Lustre file system I am building, I noticed the sizing page:

http://manual.lustre.org/manual/LustreManual16_HTML/SystemLimits.html#50548887_46859

The document mentions only dual-core procs in regards to the MDS. The document also mentions a much lower amount of system RAM required for the OST. Am I wasting resources here by over-spec'ing these boxes, or is it a good idea to over-spec a bit? Am I on the right track based on the IO pattern as I understand it?

Sebastian
[Lustre-discuss] (no subject)
subscribe
[Lustre-discuss] (unknown subject)
Hi Kevin

Thanks for your reply. Now I can set it up.

Regards
Deval

Message: 9
Date: Thu, 09 Oct 2008 05:46:27 -0600
From: Kevin Van Maren [EMAIL PROTECTED]
Subject: Re: [Lustre-discuss] Test setup configuration
To: Deval kulshrestha [EMAIL PROTECTED]
Cc: lustre-discuss@lists.lustre.org

Deval kulshrestha wrote:
> Hi
> I am a new Lustre user, trying to evaluate Lustre with a few configurations. I am going through the Lustre 1.6 Operations Manual, but I am not able to understand which packages should be installed on the MDS, OSS, and client. Should I install all the packages on all three types of nodes? Please explain.
> Best Regards
> Deval K

Lustre servers (MDS/OSS):
kernel-lustre-smp    // patched server kernel
lustre-modules       // Lustre kernel modules
lustre               // user space tools (server)
lustre-ldiskfs       // ldiskfs
e2fsprogs            // filesystem tools

You can install all those RPMs on the client as well, but it is not necessary.

Lustre clients (assuming you have the matching vendor kernel for the lustre-modules installed):
lustre-client-modules  // kernel modules for client
lustre-client          // user space (client)

Kevin
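A sketch of the matching install commands (package names as listed above; <ver> stands in for whatever release you downloaded):

    # on MDS/OSS nodes
    rpm -ivh kernel-lustre-smp-<ver>.rpm lustre-modules-<ver>.rpm \
             lustre-<ver>.rpm lustre-ldiskfs-<ver>.rpm e2fsprogs-<ver>.rpm

    # on client nodes (matching vendor kernel already installed)
    rpm -ivh lustre-client-modules-<ver>.rpm lustre-client-<ver>.rpm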
[Lustre-discuss] (no subject)
While running "cat /proc/sys/lnet/peers" across all of our client nodes, I consistently see all nodes marking two NIDs with state down, and many NIDs with a min in the negative range. It seems as though this could not be a good state, but we are running??? We're still on 1.4.11. Does this indicate a problem???

...
mg30
nid                refs  state  max  rtr  min  tx  min  queue
[EMAIL PROTECTED]     1  up       8    8    8   8    2      0
[EMAIL PROTECTED]     1  down     8    8    8   8    2      0
[EMAIL PROTECTED]     1  down     8    8    8   8    2      0
[EMAIL PROTECTED]     1  up       8    8    8   8    7      0
[EMAIL PROTECTED]     1  up       8    8    8   8    2      0
[EMAIL PROTECTED]     1  up       8    8    8   8    2      0
[EMAIL PROTECTED]     1  up       8    8    8   8    2      0

mt023
nid                refs  state  max  rtr  min  tx  min  queue
[EMAIL PROTECTED]     1  up       8    8    8   8  -16      0
[EMAIL PROTECTED]     1  down     8    8    8   8  -66      0
[EMAIL PROTECTED]     1  down     8    8    8   8  -63      0
[EMAIL PROTECTED]     1  up       8    8    8   8    4      0
[EMAIL PROTECTED]     1  up       8    8    8   8   -8      0
[EMAIL PROTECTED]     1  up       8    8    8   8   -8      0
[EMAIL PROTECTED]     1  up       8    8    8   8   -8      0
...

Jan H. Julian
System Administrator
Midnight Cluster Activity Lead
[EMAIL PROTECTED]
907-450-8641
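For anyone comparing notes: a quick probe of a peer that shows down (the NID below is a placeholder, since the real ones were redacted in this archive, and this assumes lctl ping is available on your release):

    # probe a peer NID directly from this node
    lctl ping 10.0.0.5@o2ib

    # then re-read the table to see whether state or min recovered
    cat /proc/sys/lnet/peers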
[Lustre-discuss] (no subject)
We have been having some trouble with one of our filesystems. We have fsck'd all of the disks. When we try to start the MDS for this fs, it never actually comes back online. It is staying in an 'AT' status. Was wondering if anyone had some idea where we should start looking.

Thanks,
Chad

 55 AT mds proj proj_UUID 2
 56 UP lov lov_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 4
 57 UP osc OSC_linc-io17_ostp1_1_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 58 UP osc OSC_linc-io17_ostp2_1_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 59 UP osc OSC_linc-io17_ostp3_1_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 60 UP osc OSC_linc-io17_ostp4_1_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 61 UP osc OSC_linc-io17_ostp5_1_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 62 UP osc OSC_linc-io17_ostp6_1_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 63 UP osc OSC_linc-io17_ostp1_2_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 64 UP osc OSC_linc-io17_ostp2_2_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 65 UP osc OSC_linc-io17_ostp3_2_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 66 UP osc OSC_linc-io17_ostp4_2_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 67 UP osc OSC_linc-io17_ostp5_2_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 68 UP osc OSC_linc-io17_ostp6_2_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 69 UP osc OSC_linc-io17_ostp1_3_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 70 UP osc OSC_linc-io17_ostp2_3_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 71 UP osc OSC_linc-io17_ostp3_3_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 72 UP osc OSC_linc-io17_ostp4_3_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 73 UP osc OSC_linc-io17_ostp5_3_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 74 UP osc OSC_linc-io17_ostp6_3_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 75 UP osc OSC_linc-io17_ostp1_4_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 76 UP osc OSC_linc-io17_ostp2_4_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 77 UP osc OSC_linc-io17_ostp3_4_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 78 UP osc OSC_linc-io17_ostp4_4_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 79 UP osc OSC_linc-io17_ostp5_4_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 80 UP osc OSC_linc-io17_ostp6_4_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 81 UP osc OSC_linc-io17_ostp1_5_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 82 UP osc OSC_linc-io17_ostp2_5_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 83 UP osc OSC_linc-io17_ostp3_5_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 84 UP osc OSC_linc-io17_ostp4_5_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 85 UP osc OSC_linc-io17_ostp5_5_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 86 UP osc OSC_linc-io17_ostp6_5_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 87 UP osc OSC_linc-io17_ostp1_6_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 88 UP osc OSC_linc-io17_ostp2_6_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 89 UP osc OSC_linc-io17_ostp3_6_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 90 UP osc OSC_linc-io17_ostp4_6_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 91 UP osc OSC_linc-io17_ostp5_6_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 92 UP osc OSC_linc-io17_ostp6_6_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 93 UP osc OSC_linc-io17_ostp1_7_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 94 UP osc OSC_linc-io17_ostp2_7_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 95 UP osc OSC_linc-io17_ostp3_7_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 96 UP osc OSC_linc-io17_ostp4_7_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 97 UP osc OSC_linc-io17_ostp5_7_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5
 98 UP osc OSC_linc-io17_ostp6_7_proj 319d77f2-9570-4198-93dd-f24ca0cfe4b5 5

Chad
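If it helps, a trivial sketch for picking the stuck device out of a listing this long (assuming the output above came from lctl dl):

    # show only devices that are not fully UP
    lctl dl | grep -v ' UP '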
[Lustre-discuss] (no subject)
I am generating the MDT database for one of our Lustre filesystems. I am getting the following messages. Should I be concerned, or are these going to correct themselves once the lfsck is done?

warning MDS inode
warning MDS inode git-update-ref (inum 14777090): DB_KEYEXIST: Key/data pair already exists hard link?
warning MDS inode git-upload-archive (inum 14777090): DB_KEYEXIST: Key/data pair already exists hard link?
warning MDS inode git-verify-pack (inum 14777090): DB_KEYEXIST: Key/data pair already exists hard link?
warning MDS inode git-verify-tag (inum 14777090): DB_KEYEXIST: Key/data pair already exists hard link?
warning MDS inode git-write-tree (inum 14777090): DB_KEYEXIST: Key/data pair already exists hard link?
warning MDS inode
warning MDS inode git-pack-refs (inum 14777090): DB_KEYEXIST: Key/data pair already exists hard link?
warning MDS inode git-http-fetch (inum 14777090): DB_KEYEXIST: Key/data pair already exists hard link?
warning MDS inode idl (inum 14336878): DB_KEYEXIST: Key/data pair already exists hard link?
warning MDS inode idl_assistant (inum 14336878): DB_KEYEXIST: Key/data pair already exists hard link?

Thanks,
Chad
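For context, the databases being generated here come from the 1.x-era lfsck workflow, which (per the manual of that era; device paths and output locations below are illustrative) looks roughly like:

    # on the MDS: build the MDS database with a read-only e2fsck pass
    e2fsck -n -v --mdsdb /tmp/mdsdb /dev/mdtdev

    # on each OSS: build an OST database, referencing the MDS database
    e2fsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb /dev/ostdev

    # on a client: run lfsck against the collected databases
    lfsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb /mnt/lustre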