Perhaps I missed it somewhere, but in order to do a fair comparison, can you detail the hardware/software behind the NFS server?
On Fri, Jan 8, 2021 at 1:35 PM Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] <darby.vicke...@nasa.gov> wrote:
>
> Perhaps a better question to ask (although very closely related) would be how can we improve the MD tests in the io500 benchmark?
>
> In the info below, these are the file systems:
>
> nobackup – a lustre FS on the hardware we've been discussing with a ZFS MDT, nominally running on mds0
> ephemeral – a lustre FS on the hardware we've been discussing with an ldiskfs MDT, nominally running on mds1
> scratch – a standard NFS mount
> local – a local SSD
>
> A little more background on the motivation here. We have some fairly large software development projects in the lab. One of the largest active projects has a git repo with about 500,000 files totaling 5 GB in size. A clone of this repo takes 550 seconds on lustre and about 150 seconds on NFS. A status takes 15 seconds on lustre and 3 seconds on NFS. Not surprisingly, the timings are greatly reduced on a local SSD. See the attached plot in git_timings.pdf for details. The slowness on lustre is largely (completely?) driven by the MD performance. Obviously, we work with the repo on a local file system when possible to avoid the performance hit. But one of the workflows involves Monte Carlo analysis against this repo, varying dozens of parameters, running thousands of cases and analyzing the results. This produces a lot of data and necessitates the shared FS, both for running the Monte Carlo cases and simply for storing the amounts of data these runs produce.
>
> There are several other scenarios in which we are working with smaller, but still sizeable, data sets (git repos and other forms) on the lustre file system, and the MD sluggishness is noticeable and annoying. So we would like to try to improve MD performance.
>
> To further characterize and compare the IO performance on these file systems, I've run the io500 benchmarks. The attached plots show the results. This is a completely "out of the box" run on a single node. That is, I'm just running "./io500.sh config-minimal.ini". (I've run the 10-node results too, or tried to, for more direct comparison to the results on io500.org, but that's a slightly different objective.) I figure the single-node run is analogous to the scenario of a person working with a git repo. This is on a 10 gigabit ethernet client. Details attached, but the MD results are fairly consistent with the above git timings – lustre is about 3x to 10x slower than NFS. I'd be curious to get some feedback on these MD performance numbers. Do they seem low compared to other LFSs out there? As I mentioned in the original post in this thread, our numbers are quite low when compared to even the lowest numbers on the current io500 list.
>
> How is MD performance expected to scale with increasing numbers of clients? I know bandwidth increases as you grab more OSTs, but would MD performance be expected to increase at all? We are not using DoM or DNE.
>
> Also, as mentioned before, we will upgrade lustre soon. I'd like to stick with the 2.12 LTS stream. But would the upcoming 2.14 have any potential MD performance advantages?
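
For reference, a minimal sketch of that kind of timing comparison (the repo URL and the mount points below are placeholders, not the actual lab setup):

  #!/bin/bash
  # Time a clone and a status of the same repo on each file system.
  # REPO and the mount points are placeholders.
  REPO=https://example.com/big-repo.git

  for fs in /nobackup /ephemeral /scratch /tmp/local-ssd; do
      dest=$fs/$USER/git-timing-test
      rm -rf "$dest"
      echo "== $fs =="
      /usr/bin/time -f "clone:  %e s" git clone -q "$REPO" "$dest"
      /usr/bin/time -f "status: %e s" git -C "$dest" status > /dev/null
  done

  # Single-node io500 run, as described above:
  #   ./io500.sh config-minimal.ini
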
> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of "Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.]" <darby.vicke...@nasa.gov>
> Date: Wednesday, January 6, 2021 at 9:29 AM
> To: Andreas Dilger <adil...@whamcloud.com>
> Cc: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
> Subject: Re: [lustre-discuss] [EXTERNAL] Re: Tuning for metadata performance
>
> My apologies – I posted some bad info. While we started out with the HDDs in the MDS, pretty early on we switched to SSDs. So that's not the source of our MD slowness. Can you do NVMe in an external JBOD?
>
> From: Andreas Dilger <adil...@whamcloud.com>
> Date: Tuesday, January 5, 2021 at 11:51 AM
> To: "Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.]" <darby.vicke...@nasa.gov>
> Cc: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
> Subject: [EXTERNAL] Re: [lustre-discuss] Tuning for metadata performance
>
> Probably the best single thing you could do for metadata performance would be to switch to SSD, or better NVMe, storage. ZFS is very sync and IOPS hungry, so using HDDs is a killer for ZFS metadata performance.
>
> If you want to minimize the downtime, you could incrementally replace the HDDs in the zpool with larger SSD devices and resilver between each one. I recall LLNL doing this in the first months of their first ZFS-based Lustre filesystem for this reason.
>
> Going to NVMe-based devices is even better for IOPS/bandwidth, but can't be done completely live. You could potentially use repeated zfs send/recv to get an almost up-to-date copy on a new MDS, then take a small outage to do the final resync. However, I've also seen reports that send/recv is painfully slow with HDD MDTs, so you should probably test that before committing to a solution.
>
> Cheers, Andreas
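
To make those two options concrete, a rough sketch of what they might look like (the pool, dataset, device, and host names below are placeholders, not the actual configuration):

  # Option 1: swap the MDT pool's HDDs for SSDs in place, one disk at a
  # time, letting each resilver finish before starting the next.
  zpool replace mdtpool sdX sdY     # old HDD -> new SSD (placeholder names)
  zpool status mdtpool              # watch resilver progress

  # Option 2: stage a copy of the MDT dataset on a new (e.g. NVMe-backed)
  # MDS with zfs send/recv, then take a short outage for the final
  # incremental sync.
  zfs snapshot mdtpool/mdt0@migrate1
  zfs send mdtpool/mdt0@migrate1 | ssh new-mds zfs recv newpool/mdt0

  # ...later, with the MDT stopped and unmounted on the old MDS:
  zfs snapshot mdtpool/mdt0@migrate2
  zfs send -i @migrate1 mdtpool/mdt0@migrate2 | ssh new-mds zfs recv -F newpool/mdt0

A real migration would also need the Lustre-related dataset properties carried over (e.g. with zfs send -p), so this is only the general shape of the procedure, not a tested recipe.
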
> On Jan 5, 2021, at 08:47, Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] <darby.vicke...@nasa.gov> wrote:
>
> Hello,
>
> I'm looking for some advice on tuning our existing lustre file system to achieve better metadata performance. This file system is getting fairly old – it's been in production for almost 4 years now. The hardware and our existing tuning efforts can be found here:
>
> http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2017-April/014390.html
>
> The hardware is the same but we have upgraded the software stack a few times – now on CentOS 7.6, ZFS 0.7.9 and lustre 2.10.8. We do plan to upgrade to the latest CentOS 7.x and either lustre 2.12 or 2.13 soon. The MDS hardware isn't well described in that thread, so here are more details:
>
> Chassis: Supermicro 2U Twin Server
> Processor: 4 x Quad-Core Xeon E5-2637 v2 3.50GHz (2 sockets/8 cores per node)
> Memory: 16 x 16GB PC3-14900 1866MHz DDR3 ECC Registered DIMM (128GB per node)
>
> External JBOD:
> Chassis: 24x HotSwap 2.5" SAS 12Gb/s SAS Dual Expander
> Drives: 12 x 600GB SAS 3.0 12.0Gb/s 15000RPM 2.5" Seagate Enterprise Performance 15K HDD (512n)
> Controller Card: LSI SAS 9300-8e SAS 12Gb/s PCIe 3.0 8-Port Host Bus Adapter
>
> The above hardware and tuning served us well for a long time, but the lab has grown, both in the number of lustre clients (now up to ~200 ethernet clients and ~500 IB clients) and the number of users in the lab. With the extra users have come different types of workloads. Previously, the file system was mostly used for workloads with a fairly small number of large files. We now see workloads that include hundreds of concurrent processes all doing mixed small and large file IO on a lot of files (e.g. each process clones a repo, compiles a code and runs a serial sim that writes a lot of data).
>
> I recently ran the io500 tests and our LFS stats for MDEasy and MDHard are pretty bad, even when compared to the lowest MD stats on the current io500 list. Our standard NFS server handily beats our LFS wrt MD performance. So I'm hopeful that we can squeeze more MD performance out of our LFS. Obviously, software tuning on the existing hardware would be preferred, but we are open to hardware additions/upgrades if that would help (e.g. adding more MDSs). There are a lot of tuning options in both ZFS and lustre, so I'm hoping someone can point me in the right direction. Are DNE and/or DoM expected to help? I attended the SC20 Lustre BoF and it sounds like 2.13 has some metadata performance improvements, so just an upgrade might help. We have dual MDSs now, but for HA, not performance. I'd hate to lose the HA aspect, as we utilize it for failover quite a bit (maintenance, etc.), but it would probably be worth it if MD performance were significantly improved. If I understand correctly, there is some overhead with DNE and performance suffers with just two MDSs, with a benefit at 4 or more MDSs, correct? So that wouldn't be a good option for us unless we add MDSs? Would an upgrade to SSD or NVMe in our MDTs help?
>
> I would greatly appreciate thoughts on the best path forward for making improvements.
>
> Thanks,
> Darby

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org