Re: [lustre-discuss] bad performance with Lustre/ZFS on NVMe SSD

2018-04-10 Thread Alexander I Kulyavtsev
Ricardo,
It can be helpful to look at the output of the following commands on the zfs pool
host, both while you read files through a lustre client and while you read them
directly through zfs:

# zpool iostat -lq -y zpool_name 1
# zpool iostat -w -y zpool_name 5
# zpool iostat -r -y zpool_name 5

-l  latency statistics
-q  queue statistics
-w  (undocumented) latency histograms
-r  request size histograms

I did see different read behavior on the zfs pool for the same dd/fio command,
depending on whether the file was read from a lustre mount on another host or
directly from zfs on the OSS. (For the direct test I created a separate zfs
dataset with similar zfs settings on the lustre zpool.)
Lustre IO shows up on the zfs pool as 128KB requests, while dd/fio reading the
zfs dataset directly issues 1MB requests; the dd/fio command used 1MB IO in both
cases.
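For reference, something like the following can be used for the direct-on-OSS
comparison (dataset name, mountpoint, and sizes below are only examples):

# plain dataset on the Lustre OST pool, for direct reads on the OSS
zfs create -o recordsize=1M -o atime=off -o mountpoint=/mnt/ztest drpffb-ost01/ztest

# write a test file, drop caches, then read it back directly
dd if=/dev/zero of=/mnt/ztest/testfile bs=1M count=8192
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/ztest/testfile of=/dev/null bs=1M

# run the same 1MB read against a file on the Lustre mount from a client,
# and in another shell on the OSS watch the request size histogram
zpool iostat -r -y drpffb-ost01 5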

zptevlfs6     sync_read    sync_write    async_read    async_write    scrub
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0
4K              0      0      0      0      0      0      0      0      0      0
8K              0      0      0      0      0      0      0      0      0      0
16K             0      0      0      0      0      0      0      0      0      0
32K             0      0      0      0      0      0      0      0      0      0
64K             0      0      0      0      0      0      0      0      0      0
128K            0      0      0      0  2.00K      0      0      0      0      0  <
256K            0      0      0      0      0      0      0      0      0      0
512K            0      0      0      0      0      0      0      0      0      0
1M              0      0      0      0    125      0      0      0      0      0  <
2M              0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0

^C

Alex.


On 4/9/18, 6:15 PM, "lustre-discuss on behalf of Dilger, Andreas" wrote:

On Apr 6, 2018, at 23:04, Riccardo Veraldi wrote:
> 
> So I've been struggling for months with poor performance on Lustre/ZFS.
> 
> Looking for hints.
> 
> 3 OSSes, RHEL 7.4, Lustre 2.10.3 and zfs 0.7.6
> 
> each OSS has one raidz OST
> 
>   pool: drpffb-ost01
>  state: ONLINE
>   scan: none requested
>   trim: completed on Fri Apr  6 21:53:04 2018 (after 0h3m)
> config:
> 
> NAME  STATE READ WRITE CKSUM
> drpffb-ost01  ONLINE   0 0 0
>   raidz1-0ONLINE   0 0 0
> nvme0n1   ONLINE   0 0 0
> nvme1n1   ONLINE   0 0 0
> nvme2n1   ONLINE   0 0 0
> nvme3n1   ONLINE   0 0 0
> nvme4n1   ONLINE   0 0 0
> nvme5n1   ONLINE   0 0 0
> 
> while the raidz without Lustre performs well at 6GB/s (1GB/s per disk),
> with Lustre on top of it performance is really poor.
> Above all, it is not stable at all and goes up and down between
> 1.5GB/s and 6GB/s (tested with obdfilter-survey).
> LNET is ok and working at 6GB/s (using InfiniBand FDR).
> 
> What could be the cause of OST performance going up and down like a
> roller coaster ?

Riccardo,
to take a step back for a minute, have you tested all of the devices
individually, and also concurrently with some low-level tool like
sgpdd or vdbench?  After that is known to be working, have you tested
with obdfilter-survey locally on the OSS, then remotely on the client(s),
so that we can isolate where the bottleneck is being hit?
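For example, a local obdfilter-survey run from lustre-iokit might look like the
following (the OST target name, sizes, and thread counts are only placeholders):

# on the OSS: exercise the OST backend directly (case=disk)
nobjlo=1 nobjhi=4 thrlo=1 thrhi=32 size=8192 \
    case=disk targets="drpffb-OST0000" obdfilter-survey

# from a client, the same script can be run with case=netdisk to include
# the network path (see the lustre-iokit documentation for the exact
# targets= syntax in that mode)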

Cheers, Andreas


> for reference here are few considerations:
> 
> filesystem parameters:
> 
> zfs set mountpoint=none drpffb-ost01
> zfs set sync=disabled drpffb-ost01
> zfs set atime=off drpffb-ost01
> zfs set redundant_metadata=most drpffb-ost01
> zfs set xattr=sa drpffb-ost01
> zfs set recordsize=1M drpffb-ost01
> 
> The NVMe SSDs are 4KB/sector
> 
> ashift=12
> 
> 
> ZFS module parameters
> 
> options zfs zfs_prefetch_disable=1
> options zfs zfs_txg_history=120
> options zfs metaslab_debug_unload=1
> #
> options zfs zfs_vdev_scheduler=deadline
> options zfs 

Re: [lustre-discuss] LNET Multi-rail

2018-04-10 Thread Hans Henrik Happe

Thanks for the info. A few observations so far:

- I think LU-10297 has solved my stability issues.
- lustre.conf does work with comma-separated interfaces, i.e.
o2ib(ib0,ib1). However, peers need to be configured with ldev.conf or
lnetctl.
- Defining peering ('lnetctl peer add' and ARP settings) on the client
only seems to make multi-rail work both ways.


I'm a bit puzzled by the last observation; I expected that both ends
would need to define peers. The client NID does not show as multi-rail
('lnetctl peer show') on the server.
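A minimal sketch of the client-side peer definition I mean (the NIDs and
network names below are only placeholders):

# define the server as a multi-rail peer with both of its interfaces
lnetctl peer add --prim_nid 10.0.0.1@o2ib --nid 10.0.1.1@o2ib1

# check what the node knows about its peers, including the Multi-Rail flag
lnetctl peer show -v

# save the running configuration so it is restored at boot
lnetctl export > /etc/lnet.conf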


Cheers,
Hans Henrik

On 14-03-2018 03:00, Riccardo Veraldi wrote:

It works for me, but you have to set up lnet.conf correctly, either
manually or using lnetctl to add peers. Then you export your
configuration to lnet.conf and it will be loaded at reboot. I had to add
my peers manually; I think peer auto discovery is not yet operational in
2.10.3.
I suppose you are no longer using lustre.conf to configure interfaces
(ib,tcp) and that you are using the new Lustre DLC style:

http://wiki.lustre.org/Dynamic_LNET_Configuration
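A minimal DLC-style sketch of that workflow (the network type and interface
names are only examples):

# configure LNet dynamically instead of via lustre.conf
lnetctl lnet configure
lnetctl net add --net o2ib --if ib0,ib1

# after adding peers manually (no auto discovery in 2.10.3), save the
# configuration; the lnet service imports /etc/lnet.conf at boot
lnetctl export > /etc/lnet.conf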

Also, I do not know if you did this yet, but you should configure ARP
settings and also rt_tables for your IB interfaces if you use multi-rail.
Here is an example; I had to do that to get things working properly:

https://wiki.hpdd.intel.com/display/LNet/MR+Cluster+Setup
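As a rough sketch of the kind of ARP and rt_tables setup that page describes
(interface names, addresses, and table IDs below are placeholders; use the
exact sysctl values from the wiki for your site):

# typical ARP tuning for two IB interfaces on the same subnet
sysctl -w net.ipv4.conf.ib0.arp_ignore=1
sysctl -w net.ipv4.conf.ib0.arp_announce=2
sysctl -w net.ipv4.conf.ib1.arp_ignore=1
sysctl -w net.ipv4.conf.ib1.arp_announce=2

# per-interface routing tables so replies leave via the interface
# they arrived on
echo "200 ib0table" >> /etc/iproute2/rt_tables
echo "201 ib1table" >> /etc/iproute2/rt_tables
ip route add 10.0.0.0/24 dev ib0 table ib0table
ip route add 10.0.0.0/24 dev ib1 table ib1table
ip rule add from 10.0.0.10 table ib0table
ip rule add from 10.0.0.11 table ib1table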

You may also want to check that your IB interfaces (if you have a dual-port
InfiniBand card like I have) can really double the performance when you
enable both of them.
The PCIe bandwidth of the InfiniBand card has to be capable of feeding enough
traffic to both ports; otherwise the second port is only useful as a failover
device and will not improve the speed as you might want.

In my configuration failover is working. If I disconnect one port, the
other will still work. Of course, if you disconnect it while traffic is
going through, you may have a problem with that stream of data, but new
traffic will be handled correctly. I do not know if there is a way to
avoid this; I am just talking about my experience, and as I said I am
more interested in performance than in failover.


Riccardo


On 3/13/18 8:05 AM, Hans Henrik Happe wrote:

Hi,

I'm testing LNET multi-rail with 2.10.3 and I ran into some questions
that I couldn't find in the documentation or elsewhere.

As I understand it, the design document "Dynamic peer discovery" will make
it possible to discover multi-rail peers without adding them manually.
Is that functionality in 2.10.3?

Will failover work without doing anything special? I've tested with
two IB ports; unplugging one resulted in no I/O from the client, and
replugging didn't resolve it.

How do I make an active/passive setup? One example I would really
like to see in the documentation is the obvious o2ib-tcp combination,
where tcp is used if o2ib is down, with failback when o2ib comes up again.

Anyone using MR in production? I've done a bit of testing with dual IB on
both server and client and had a few crashes.

Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] building lustre 2.11.50 on CentOS 7.4

2018-04-10 Thread Martin Hecht

problem solved:

another git pull today, followed by autogen.sh and configure, has made
the error go away.

I assume it was LU-10752, which was fixed by a patch from James Simmons
(commit 6189ae07c5161d14c9e9f863a400045f923f2301) that landed on the
hpdd git 16 hours ago.
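A minimal sketch of that rebuild sequence (directory and configure options are
whatever your checkout uses):

cd lustre-release        # your checkout of the Lustre tree
git pull                 # pick up the LU-10752 fix
sh autogen.sh
./configure              # plus whatever options you normally use
make rpms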

Martin

On 04/09/2018 04:55 PM, Martin Hecht wrote:
> Hi,
>
> I'm trying to build lustre 2.11 from source, with ldiskfs on CentOS 7.4.
>
> Patching the kernel for ldiskfs worked fine; I have installed and booted
> the patched kernel as well as the devel rpm, but when I run `make rpms`
> it exits with the following errors:
>
> Processing files: lustre-2.11.50-1.el7.centos.x86_64
> error: File not found:
> /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/init.d/lsvcgss
> error: File not found:
> /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/sysconfig/lsvcgss
> error: File not found:
> /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/request-key.d/lgssc.conf
>
>
> RPM build errors:
>     File not found:
> /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/init.d/lsvcgss
>     File not found:
> /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/sysconfig/lsvcgss
>     File not found:
> /tmp/rpmbuild-lustre-build-KfxFi3eG/BUILDROOT/lustre-2.11.50-1.x86_64/etc/request-key.d/lgssc.conf
> make: *** [rpms] Error 1
>
> just `make` works fine, so the problem is something with packaging the
> rpms. Any hints?
>
> kind regards,
> Martin
>
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


-- 
Dr. Martin Hecht
High Performance Computing Center Stuttgart (HLRS)
Office 0.051, HPCN Production, IT-Security
University of Stuttgart
Nobelstraße 19, 70569 Stuttgart, Germany
Tel: +49(0)711/685-65799  Fax: -55799
Mail: he...@hlrs.de
Web: http://www.hlrs.de/people/hecht/
PGP Key available at: https://www.hlrs.de/fileadmin/user_upload/Martin_Hecht.pgp
PGP Key Fingerprint: 41BB 33E9 7170 3864 D5B3 44AD 5490 010B 96C2 6E4A



___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org