Perhaps I missed it somewhere, but in order to do a fair comparison, can you detail the hardware/software behind the NFS server?
On Fri, Jan 8, 2021 at 1:35 PM Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] <darby.vicke...@nasa.gov> wrote:
>
> Perhaps a better question to ask (although very closely related) would be how can we improve the MD tests in the io500 benchmark?
>
> In the info below, these are the file systems:
>
> nobackup – a lustre FS on the hardware we've been discussing with a ZFS MDT, nominally running on mds0
> ephemeral – a lustre FS on the hardware we've been discussing with an ldiskfs MDT, nominally running on mds1
> scratch – a standard NFS mount
> local – a local SSD
>
> A little more background on the motivation here. We have some fairly large software development projects in the lab. One of the largest active projects has a git repo with about 500,000 files totaling 5 GB in size. A clone of this repo takes 550 seconds on lustre and about 150 seconds on NFS. A status takes 15 seconds on lustre and 3 seconds on NFS. Not surprisingly, the timings are greatly reduced on a local SSD. See the attached plot in git_timings.pdf for details. The slowness on lustre is largely (completely?) driven by the MD performance. Obviously, we work with the repo on a local file system when possible to avoid the performance hit. But one of the workflows involves Monte Carlo analysis against this repo, varying dozens of parameters, running thousands of cases and analyzing the results. This produces a lot of data and necessitates the shared FS, both for running the Monte Carlo cases and simply for storing the amounts of data these runs produce.
>
> There are several other scenarios in which we are working with smaller, but still sizeable, data sets (git repos and other forms) on the lustre file system, and the MD sluggishness is noticeable and annoying. So we would like to try to improve MD performance.
>
> To further characterize and compare the IO performance on these file systems, I've run the io500 benchmarks. The attached plots show the results. This is a completely "out of the box" run on a single node. That is, I'm just running "./io500.sh config-minimal.ini". (I've run the 10-node results too, or tried to, for more direct comparison to the results on io500.org, but that's a slightly different objective.) I figure the single-node run is analogous to the scenario of a person working with a git repo. This is on a 10 gigabit ethernet client. Details attached, but the MD results are fairly consistent with the above git timings – lustre is about 3x to 10x slower than NFS. I'd be curious to get some feedback on these MD performance numbers. Do they seem low compared to other LFSs out there? As I mentioned in the original post in this thread, our numbers are quite low when compared to even the lowest numbers on the current io500 list.
>
> How is MD performance expected to scale with increasing numbers of clients? I know bandwidth increases as you grab more OSTs, but would MD performance be expected to increase at all? We are not using DoM or DNE.
>
> Also, as mentioned before, we will upgrade lustre soon. I'd like to stick with the 2.12 LTS stream. But would the upcoming 2.14 have any potential MD performance advantages?
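
For reference, a minimal sketch of that kind of timing comparison (the repo URL and the mount points below are placeholders, not the actual lab setup):

  #!/bin/bash
  # Time a clone and a status of the same repo on each file system.
  # REPO and the mount points are placeholders.
  REPO=https://example.com/big-repo.git

  for fs in /nobackup /ephemeral /scratch /tmp/local-ssd; do
      dest=$fs/$USER/git-timing-test
      rm -rf "$dest"
      echo "== $fs =="
      /usr/bin/time -f "clone:  %e s" git clone -q "$REPO" "$dest"
      /usr/bin/time -f "status: %e s" git -C "$dest" status > /dev/null
  done

  # Single-node io500 run, as described above:
  #   ./io500.sh config-minimal.ini
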
> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of "Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.]" <darby.vicke...@nasa.gov>
> Date: Wednesday, January 6, 2021 at 9:29 AM
> To: Andreas Dilger <adil...@whamcloud.com>
> Cc: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
> Subject: Re: [lustre-discuss] [EXTERNAL] Re: Tuning for metadata performance
>
> My apologies – I posted some bad info. While we started out with the HDDs in the MDS, pretty early on we switched to SSDs. So that's not the source of our MD slowness. Can you do NVMe in an external JBOD?
>
> From: Andreas Dilger <adil...@whamcloud.com>
> Date: Tuesday, January 5, 2021 at 11:51 AM
> To: "Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.]" <darby.vicke...@nasa.gov>
> Cc: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
> Subject: [EXTERNAL] Re: [lustre-discuss] Tuning for metadata performance
>
> Probably the best single thing you could do for metadata performance would be to switch to SSD, or better NVMe, storage. ZFS is very sync and IOPS hungry, so using HDDs is a killer for ZFS metadata performance.
>
> If you want to minimize the downtime, you could incrementally replace the HDDs in the zpool with larger SSD devices and resilver between each one. I recall LLNL doing this in the first months of their first ZFS-based Lustre filesystem for this reason.
>
> Going to NVMe-based devices is even better for IOPS/bandwidth, but can't be done completely live. You could potentially use repeated zfs send/recv to get an almost up-to-date copy on a new MDS, then take a small outage to do the final resync. However, I've also seen reports that send/recv is painfully slow with HDD MDTs, so you should probably test that before committing to a solution.
>
> Cheers, Andreas
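
To make those two options concrete, a rough sketch of what they might look like (the pool, dataset, device, and host names below are placeholders, not the actual configuration):

  # Option 1: swap the MDT pool's HDDs for SSDs in place, one disk at a
  # time, letting each resilver finish before starting the next.
  zpool replace mdtpool sdX sdY     # old HDD -> new SSD (placeholder names)
  zpool status mdtpool              # watch resilver progress

  # Option 2: stage a copy of the MDT dataset on a new (e.g. NVMe-backed)
  # MDS with zfs send/recv, then take a short outage for the final
  # incremental sync.
  zfs snapshot mdtpool/mdt0@migrate1
  zfs send mdtpool/mdt0@migrate1 | ssh new-mds zfs recv newpool/mdt0

  # ...later, with the MDT stopped and unmounted on the old MDS:
  zfs snapshot mdtpool/mdt0@migrate2
  zfs send -i @migrate1 mdtpool/mdt0@migrate2 | ssh new-mds zfs recv -F newpool/mdt0

A real migration would also need the Lustre-related dataset properties carried over (e.g. with zfs send -p), so this is only the general shape of the procedure, not a tested recipe.
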
> On Jan 5, 2021, at 08:47, Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] <darby.vicke...@nasa.gov> wrote:
>
> Hello,
>
> I'm looking for some advice on tuning our existing lustre file system to achieve better metadata performance. This file system is getting fairly old – it's been in production for almost 4 years now. The hardware and our existing tuning efforts can be found here:
>
> http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2017-April/014390.html
>
> The hardware is the same but we have upgraded the software stack a few times – now on CentOS 7.6, ZFS 0.7.9 and lustre 2.10.8. We do plan to upgrade to the latest CentOS 7.x and either lustre 2.12 or 2.13 soon. The MDS hardware isn't well described in that thread, so here are more details:
>
> Chassis: Supermicro 2U Twin Server
> Processor: 4 x Quad-Core Xeon E5-2637 v2 3.50GHz (2 sockets/8 cores per node)
> Memory: 16 x 16GB PC3-14900 1866MHz DDR3 ECC Registered DIMM (128GB per node)
>
> External JBOD:
> Chassis: 24x HotSwap 2.5" SAS 12Gb/s SAS Dual Expander
> Drives: 12 x 600GB SAS 3.0 12.0Gb/s 15000RPM 2.5" Seagate Enterprise Performance 15K HDD (512n)
> Controller Card: LSI SAS 9300-8e SAS 12Gb/s PCIe 3.0 8-Port Host Bus Adapter
>
> The above hardware and tuning served us well for a long time, but the lab has grown, both in the number of lustre clients (now up to ~200 ethernet clients and ~500 IB clients) and the number of users in the lab. With the extra users have come different types of workloads. Previously, the file system was mostly used for workloads with a fairly small number of large files. We now see workloads that include hundreds of concurrent processes all doing mixed small and large file IO on a lot of files (e.g. each process clones a repo, compiles a code and runs a serial sim that writes a lot of data).
>
> I recently ran the io500 tests and our LFS stats for MDEasy and MDHard are pretty bad, even when compared to the lowest MD stats on the current io500 list. Our standard NFS server handily beats our LFS wrt MD performance. So I'm hopeful that we can squeeze more MD performance out of our LFS. Obviously, software tuning on the existing hardware would be preferred, but we are open to hardware additions/upgrades if that would help (e.g. adding more MDSs). There are a lot of tuning options in both ZFS and lustre, so I'm hoping someone can point me in the right direction. Are DNE and/or DoM expected to help? I attended the SC20 Lustre BoF and it sounds like 2.13 has some metadata performance improvements, so just an upgrade might help. We have dual MDSs now, but for HA, not performance. I'd hate to lose the HA aspect, as we utilize it for failover quite a bit (maintenance, etc.), but it would probably be worth it if MD performance were significantly improved. If I understand correctly, there is some overhead with DNE and performance suffers with just two MDSs, with a benefit at 4 or more MDSs, correct? So that wouldn't be a good option for us unless we add MDSs? Would an upgrade to SSD or NVMe in our MDTs help?
>
> I would greatly appreciate thoughts on the best path forward for making improvements.
>
> Thanks,
> Darby

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org