Re: [lustre-discuss] bad performance with Lustre/ZFS on NVMe SSD
Riccardo,

It can be helpful to see the output of the following commands on the ZFS pool host, both when you read files through a Lustre client and when you read directly through ZFS:

  # zpool iostat -lq -y zpool_name 1
  # zpool iostat -w -y zpool_name 5
  # zpool iostat -r -y zpool_name 5

  -q  queue statistics
  -l  latency statistics
  -r  request size histogram
  -w  (undocumented) latency statistics

I did see different behavior on the ZFS pool for the same dd/fio command when reading a file from a Lustre mount on a different host versus reading directly from ZFS on the OSS. I created a separate ZFS dataset with similar settings on the Lustre zpool. Lustre I/O shows up on the ZFS pool as 128KB requests, while dd/fio directly on ZFS issues 1MB requests; the dd/fio command used 1MB I/O in both cases.

zptevlfs6   sync_read    sync_write   async_read   async_write  scrub
req_size    ind    agg   ind    agg   ind    agg   ind    agg   ind    agg
--------------------------------------------------------------------------
512           0      0     0      0     0      0     0      0     0      0
1K            0      0     0      0     0      0     0      0     0      0
2K            0      0     0      0     0      0     0      0     0      0
4K            0      0     0      0     0      0     0      0     0      0
8K            0      0     0      0     0      0     0      0     0      0
16K           0      0     0      0     0      0     0      0     0      0
32K           0      0     0      0     0      0     0      0     0      0
64K           0      0     0      0     0      0     0      0     0      0
128K          0      0     0      0  2.00K     0     0      0     0      0  <
256K          0      0     0      0     0      0     0      0     0      0
512K          0      0     0      0     0      0     0      0     0      0
1M            0      0     0      0   125      0     0      0     0      0  <
2M            0      0     0      0     0      0     0      0     0      0
4M            0      0     0      0     0      0     0      0     0      0
8M            0      0     0      0     0      0     0      0     0      0
16M           0      0     0      0     0      0     0      0     0      0
^C

Alex.

On 4/9/18, 6:15 PM, "lustre-discuss on behalf of Dilger, Andreas" wrote:

On Apr 6, 2018, at 23:04, Riccardo Veraldi wrote:
>
> So I've been struggling for months with these low performances on Lustre/ZFS.
> Looking for hints.
>
> 3 OSSes, RHEL 7.4, Lustre 2.10.3 and zfs 0.7.6.
> Each OSS has one OST raidz.
>
>   pool: drpffb-ost01
>  state: ONLINE
>   scan: none requested
>   trim: completed on Fri Apr 6 21:53:04 2018 (after 0h3m)
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         drpffb-ost01  ONLINE       0     0     0
>           raidz1-0    ONLINE       0     0     0
>             nvme0n1   ONLINE       0     0     0
>             nvme1n1   ONLINE       0     0     0
>             nvme2n1   ONLINE       0     0     0
>             nvme3n1   ONLINE       0     0     0
>             nvme4n1   ONLINE       0     0     0
>             nvme5n1   ONLINE       0     0     0
>
> While the raidz without Lustre performs well at 6GB/s (1GB/s per disk),
> with Lustre on top of it performance is really poor.
> Most of all, it is not stable at all and goes up and down between
> 1.5GB/s and 6GB/s. I tested with obdfilter-survey.
> LNET is OK and working at 6GB/s (using InfiniBand FDR).
>
> What could be the cause of OST performance going up and down like a
> roller coaster?

Riccardo,
To take a step back for a minute: have you tested all of the devices individually, and also concurrently, with some low-level tool like sgpdd or vdbench? After that is known to be working, have you tested with obdfilter-survey locally on the OSS, then remotely on the client(s), so that we can isolate where the bottleneck is being hit?

Cheers, Andreas

> For reference, here are a few considerations.
>
> Filesystem parameters:
>
> zfs set mountpoint=none drpffb-ost01
> zfs set sync=disabled drpffb-ost01
> zfs set atime=off drpffb-ost01
> zfs set redundant_metadata=most drpffb-ost01
> zfs set xattr=sa drpffb-ost01
> zfs set recordsize=1M drpffb-ost01
>
> The NVMe SSDs are 4KB/sector, hence:
>
> ashift=12
>
> ZFS module parameters:
>
> options zfs zfs_prefetch_disable=1
> options zfs zfs_txg_history=120
> options zfs metaslab_debug_unload=1
> #
> options zfs zfs_vdev_scheduler=deadline
> options zfs
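Alex's comparison of request sizes can be sketched roughly as follows. This is a minimal example, assuming a pool named drpffb-ost01, a Lustre client mount at /mnt/lustre, and a plain ZFS dataset for the direct read; all paths and names are illustrative, not from the original posts:

```shell
# Hypothetical pool name and paths; adjust to your setup.
POOL=drpffb-ost01

# On the OSS, watch the per-request-size histogram while the reads run
# (-y skips the since-boot summary, 5 is the reporting interval in seconds):
zpool iostat -r -y "$POOL" 5 &

# Read through the Lustre client (run this on a client node); with Lustre
# on top of ZFS, the pool reportedly sees 128KB requests here:
dd if=/mnt/lustre/testfile of=/dev/null bs=1M

# Read directly from a ZFS dataset on the OSS for comparison; here the
# pool reportedly sees 1MB requests matching the dd block size:
dd if=/drpffb-ost01/testds/testfile of=/dev/null bs=1M
```

The point of the exercise is that the same 1MB application I/O shows up at the pool with different request sizes depending on whether it passes through the Lustre stack, which is where the histogram columns in the output above differ.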
Re: [lustre-discuss] LNET Multi-rail
Thanks for the info. A few observations I found so far:

- I think LU-10297 has solved my stability issues.
- lustre.conf does work with comma separation of interfaces, i.e. o2ib(ib0,ib1). However, peers need to be configured with ldev.conf or lnetctl.
- Defining peering ('lnetctl peer add' and ARP settings) on the client only seems to make multi-rail work both ways.

I'm a bit puzzled by the last observation. I expected that both ends needed to define peers? The client NID does not show as multi-rail (lnetctl peer show) on the server.

Cheers,
Hans Henrik

On 14-03-2018 03:00, Riccardo Veraldi wrote:
It works for me, but you have to set up lnet.conf correctly, either manually or using lnetctl to add peers. Then you export your configuration to lnet.conf and it will be loaded at reboot. I had to add my peers manually; I think peer auto-discovery is not yet operational on 2.10.3.

I suppose you are no longer using lustre.conf to configure interfaces (ib, tcp) and that you are using the new Lustre DLC style:
http://wiki.lustre.org/Dynamic_LNET_Configuration

Also, I do not know if you did this yet, but you should configure ARP settings and also rt_tables for your IB interfaces if you use multi-rail. Here is an example; I had to do that to have things working properly:
https://wiki.hpdd.intel.com/display/LNet/MR+Cluster+Setup

You may also want to check that your IB interfaces (if you have a dual-port InfiniBand card like I have) can really double the performance when you enable both of them. The InfiniBand PCIe card has to be capable of feeding enough traffic to both ports, or it will just be useful as a failover device without improving the speed as you might want.

In my configuration failover is working. If I disconnect one port, the other will still work. Of course, if you disconnect it while traffic is going through, you may have a problem with that stream of data, but new traffic will be handled correctly.
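The DLC-style peer setup discussed here might look like the following sketch. The NIDs, interface names, and network numbering are made up for illustration; check `lnetctl --help` and the Dynamic LNET Configuration wiki page for the exact syntax on your version:

```shell
# Configure two local LNet networks, one per IB port
# (interface names ib0/ib1 are assumptions):
lnetctl net add --net o2ib --if ib0
lnetctl net add --net o2ib1 --if ib1

# Add a multi-rail peer by listing its additional NIDs under a primary NID
# (addresses are illustrative):
lnetctl peer add --prim_nid 10.0.10.1@o2ib --nid 10.0.10.2@o2ib1

# Export the running configuration so it can be reloaded at boot:
lnetctl export > /etc/lnet.conf

# Verify what the node knows about its peers:
lnetctl peer show
```

This matches the workflow Riccardo describes below: configure peers with lnetctl (auto-discovery not being available on 2.10.3), then export to lnet.conf for reload at boot.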
I do not know if there is a way to avoid this; I am just talking about my experience, and as I said, I am more interested in performance than failover.

Riccardo

On 3/13/18 8:05 AM, Hans Henrik Happe wrote:
Hi,

I'm testing LNET multi-rail with 2.10.3 and I ran into some questions that I couldn't find in the documentation or elsewhere.

As I understand the design document, "Dynamic peer discovery" will make it possible to discover a multi-rail peer without adding it manually? Is that functionality in 2.10.3?

Will failover work without doing anything special? I've tested with two IB ports, and unplugging resulted in no I/O from the client, and replugging didn't resolve it. How do I make an active/passive setup? One example I would really like to see in the documentation is the obvious o2ib-tcp combination, where tcp is used if o2ib is down and it fails back if o2ib comes up again.

Anyone using MR in production? Done a bit of testing with dual IB on both server and client and had a few crashes.

Cheers,
Hans Henrik

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
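The ARP and rt_tables tuning mentioned for dual-port IB setups is, as a sketch, along these lines. Interface names, addresses, and table names are assumptions for illustration; the MR Cluster Setup wiki page linked above is the authoritative recipe:

```shell
# Relax Linux's default ARP behavior so each IB port answers ARP only
# for its own address (otherwise one port can answer for both):
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
sysctl -w net.ipv4.conf.ib0.arp_filter=1
sysctl -w net.ipv4.conf.ib1.arp_filter=1

# Per-interface routing tables so replies leave through the port the
# request arrived on (table numbers/names are arbitrary):
echo "100 ib0tab" >> /etc/iproute2/rt_tables
echo "101 ib1tab" >> /etc/iproute2/rt_tables
ip route add 10.0.10.0/24 dev ib0 table ib0tab
ip route add 10.0.10.0/24 dev ib1 table ib1tab
ip rule add from 10.0.10.1 table ib0tab
ip rule add from 10.0.10.2 table ib1tab
```

Without source-based rules like these, both IPoIB addresses sit on the same subnet and the kernel may route all traffic through one port, which defeats the multi-rail bandwidth gain Riccardo cautions about above.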
Re: [lustre-discuss] building lustre 2.11.50 on CentOS 7.4
Problem solved: another git pull today, followed by autogen.sh and configure, has made the error go away. I assume it was LU-10752, which was fixed by a patch from James Simmons (commit 6189ae07c5161d14c9e9f863a400045f923f2301) that landed on the hpdd git 16 hours ago.

Martin

On 04/09/2018 04:55 PM, Martin Hecht wrote:
> Hi,
>
> I'm trying to build Lustre 2.11 from source, with ldiskfs, on CentOS 7.4.
>
> Patching the kernel for ldiskfs worked fine; I have installed and booted
> the patched kernel as well as the devel rpm, but when I run `make rpms`
> it exits with the following errors:
>
> Processing files: lustre-2.11.50-1.el7.centos.x86_64
> error: File not found:
> /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/init.d/lsvcgss
> error: File not found:
> /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/sysconfig/lsvcgss
> error: File not found:
> /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/request-key.d/lgssc.conf
>
> RPM build errors:
> File not found:
> /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/init.d/lsvcgss
> File not found:
> /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/sysconfig/lsvcgss
> File not found:
> /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/request-key.d/lgssc.conf
> make: *** [rpms] Error 1
>
> Just `make` works fine, so the problem is something with packaging the
> rpms. Any hints?
>
> kind regards,
> Martin

-- Dr.
Martin Hecht
High Performance Computing Center Stuttgart (HLRS)
Office 0.051, HPCN Production, IT-Security
University of Stuttgart
Nobelstraße 19, 70569 Stuttgart, Germany
Tel: +49(0)711/685-65799  Fax: -55799
Mail: he...@hlrs.de
Web: http://www.hlrs.de/people/hecht/
PGP Key available at: https://www.hlrs.de/fileadmin/user_upload/Martin_Hecht.pgp
PGP Key Fingerprint: 41BB 33E9 7170 3864 D5B3 44AD 5490 010B 96C2 6E4A
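For reference, the rebuild sequence Martin describes amounts to roughly the following. The repository URL reflects the hpdd git hosting as of 2018 and is stated here as an assumption (it may since have moved); the individual commands (`autogen.sh`, `configure`, `make rpms`) are taken from the thread itself:

```shell
# Fetch the current Lustre source tree (URL is the 2018-era hpdd location):
git clone git://git.hpdd.intel.com/fs/lustre-release.git
cd lustre-release

# Pick up the latest fixes (e.g. the LU-10752 packaging fix mentioned above):
git pull

# Regenerate the build system, configure, and build binary RPMs:
sh autogen.sh
./configure
make rpms
```

As the thread notes, `make` alone succeeded even before the fix; only the RPM packaging step (`make rpms`) failed on the missing lsvcgss files, so rerunning from a fresh pull through `make rpms` is the full verification.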