I found out that ko2iblnd is not getting its settings from
/etc/modprobe.d/ko2iblnd.conf when the file looks like this:

alias ko2iblnd-opa ko2iblnd
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4

install ko2iblnd /usr/sbin/ko2iblnd-probe

but if I modify ko2iblnd.conf like this, then the settings are loaded:

options ko2iblnd peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4

install ko2iblnd /usr/sbin/ko2iblnd-probe
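
A quick way to confirm which values the running module actually picked up is to read them back through sysfs (a minimal check; the only assumption is the standard /sys/module parameter layout):

for p in peer_credits peer_credits_hiw credits concurrent_sends ntx map_on_demand; do
    # print each ko2iblnd parameter as the running kernel exposes it
    printf '%s = ' "$p"; cat /sys/module/ko2iblnd/parameters/$p
done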

The LNet tests now show better behaviour, but I would still expect more than this.
Is it possible to tune the parameters in /etc/modprobe.d/ko2iblnd.conf so that
the Mellanox ConnectX-3 works more efficiently?

[LNet Rates of servers]
[R] Avg: 2286     RPC/s Min: 0        RPC/s Max: 4572     RPC/s
[W] Avg: 3322     RPC/s Min: 0        RPC/s Max: 6643     RPC/s
[LNet Bandwidth of servers]
[R] Avg: 625.23   MiB/s Min: 0.00     MiB/s Max: 1250.46  MiB/s
[W] Avg: 1035.85  MiB/s Min: 0.00     MiB/s Max: 2071.69  MiB/s
[LNet Rates of servers]
[R] Avg: 2286     RPC/s Min: 1        RPC/s Max: 4571     RPC/s
[W] Avg: 3321     RPC/s Min: 1        RPC/s Max: 6641     RPC/s
[LNet Bandwidth of servers]
[R] Avg: 625.55   MiB/s Min: 0.00     MiB/s Max: 1251.11  MiB/s
[W] Avg: 1035.05  MiB/s Min: 0.00     MiB/s Max: 2070.11  MiB/s
[LNet Rates of servers]
[R] Avg: 2291     RPC/s Min: 0        RPC/s Max: 4581     RPC/s
[W] Avg: 3329     RPC/s Min: 0        RPC/s Max: 6657     RPC/s
[LNet Bandwidth of servers]
[R] Avg: 626.55   MiB/s Min: 0.00     MiB/s Max: 1253.11  MiB/s
[W] Avg: 1038.05  MiB/s Min: 0.00     MiB/s Max: 2076.11  MiB/s
session is ended
./lnet_test.sh: line 17: 23394 Terminated              lst stat servers




On 8/19/17 4:20 AM, Arman Khalatyan wrote:
> Just a minor comment: you should push up the performance of your nodes, they
> are not running at their max CPU frequencies, so all tests might be
> inconsistent. In order to get the most out of IB, run the following:
> tuned-adm profile latency-performance
> for more options use:
> tuned-adm list
>
> It will be interesting to see the difference.
>
> On 19.08.2017 at 3:57 AM, "Riccardo Veraldi"
> <riccardo.vera...@cnaf.infn.it> wrote:
>
>     Hello Keith and Dennis, these are the tests I ran.
>
>       * obdfilter-survey shows that I can saturate disk performance:
>         the NVMe/ZFS backend is performing very well and it is faster
>         than my InfiniBand network
>
>     pool          alloc   free   read  write   read  write
>     ------------  -----  -----  -----  -----  -----  -----
>     drpffb-ost01  3.31T  3.19T      3  35.7K  16.0K  7.03G
>       raidz1      3.31T  3.19T      3  35.7K  16.0K  7.03G
>         nvme0n1       -      -      1  5.95K  7.99K  1.17G
>         nvme1n1       -      -      0  6.01K      0  1.18G
>         nvme2n1       -      -      0  5.93K      0  1.17G
>         nvme3n1       -      -      0  5.88K      0  1.16G
>         nvme4n1       -      -      1  5.95K  7.99K  1.17G
>         nvme5n1       -      -      0  5.96K      0  1.17G
>     ------------  -----  -----  -----  -----  -----  -----
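>
>     (For reference, per-vdev numbers like the table above are what a command
>     along these lines prints while the survey is running; the pool name is
>     taken from the table itself:
>     zpool iostat -v drpffb-ost01 1
>     )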
>
>     these are the test results:
>
>     Fri Aug 18 16:54:48 PDT 2017 Obdfilter-survey for case=disk from drp-tst-ffb01
>     ost  1 sz 10485760K rsz 1024K obj    1 thr    1 write 7633.08 SHORT rewrite 7558.78 SHORT read 3205.24 [3213.70, 3226.78]
>     ost  1 sz 10485760K rsz 1024K obj    1 thr    2 write 7996.89 SHORT rewrite 7903.42 SHORT read 5264.70 SHORT
>     ost  1 sz 10485760K rsz 1024K obj    2 thr    2 write 7718.94 SHORT rewrite 7977.84 SHORT read 5802.17 SHORT
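>
>     (For anyone reproducing these numbers: a sweep over these sz/obj/thr
>     combinations is roughly what the lustre-iokit obdfilter-survey script
>     produces when driven like the following; the targets value is a placeholder:
>     case=disk size=10240 nobjlo=1 nobjhi=2 thrlo=1 thrhi=2 targets="drpffb-OST0001" obdfilter-survey
>     )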
>
>       * LNet self-test, and here I see the problems. For reference,
>         172.21.52.[83,84] are the two OSSes and 172.21.52.86 is the
>         reader/writer. Here is the script that I ran:
>
>     #!/bin/bash
>     export LST_SESSION=$$
>     lst new_session read_write
>     lst add_group servers 172.21.52.[83,84]@o2ib5
>     lst add_group readers 172.21.52.86@o2ib5
>     lst add_group writers 172.21.52.86@o2ib5
>     lst add_batch bulk_rw
>     lst add_test --batch bulk_rw --from readers --to servers \
>     brw read check=simple size=1M
>     lst add_test --batch bulk_rw --from writers --to servers \
>     brw write check=full size=1M
>     # start running
>     lst run bulk_rw
>     # display server stats for 30 seconds
>     lst stat servers & sleep 30; kill $!
>     # tear down
>     lst end_session
>
>
>     here are the results:
>
>     SESSION: read_write FEATURES: 1 TIMEOUT: 300 FORCE: No
>     172.21.52.[83,84]@o2ib5 are added to session
>     172.21.52.86@o2ib5 are added to session
>     172.21.52.86@o2ib5 are added to session
>     Test was added successfully
>     Test was added successfully
>     bulk_rw is running now
>     [LNet Rates of servers]
>     [R] Avg: 1751     RPC/s Min: 0        RPC/s Max: 3502     RPC/s
>     [W] Avg: 2525     RPC/s Min: 0        RPC/s Max: 5050     RPC/s
>     [LNet Bandwidth of servers]
>     [R] Avg: 488.79   MiB/s Min: 0.00     MiB/s Max: 977.59   MiB/s
>     [W] Avg: 773.99   MiB/s Min: 0.00     MiB/s Max: 1547.99  MiB/s
>     [LNet Rates of servers]
>     [R] Avg: 1718     RPC/s Min: 0        RPC/s Max: 3435     RPC/s
>     [W] Avg: 2479     RPC/s Min: 0        RPC/s Max: 4958     RPC/s
>     [LNet Bandwidth of servers]
>     [R] Avg: 478.19   MiB/s Min: 0.00     MiB/s Max: 956.39   MiB/s
>     [W] Avg: 761.74   MiB/s Min: 0.00     MiB/s Max: 1523.47  MiB/s
>     [LNet Rates of servers]
>     [R] Avg: 1734     RPC/s Min: 0        RPC/s Max: 3467     RPC/s
>     [W] Avg: 2506     RPC/s Min: 0        RPC/s Max: 5012     RPC/s
>     [LNet Bandwidth of servers]
>     [R] Avg: 480.79   MiB/s Min: 0.00     MiB/s Max: 961.58   MiB/s
>     [W] Avg: 772.49   MiB/s Min: 0.00     MiB/s Max: 1544.98  MiB/s
>     [LNet Rates of servers]
>     [R] Avg: 1722     RPC/s Min: 0        RPC/s Max: 3444     RPC/s
>     [W] Avg: 2486     RPC/s Min: 0        RPC/s Max: 4972     RPC/s
>     [LNet Bandwidth of servers]
>     [R] Avg: 479.09   MiB/s Min: 0.00     MiB/s Max: 958.18   MiB/s
>     [W] Avg: 764.19   MiB/s Min: 0.00     MiB/s Max: 1528.38  MiB/s
>     [LNet Rates of servers]
>     [R] Avg: 1741     RPC/s Min: 0        RPC/s Max: 3482     RPC/s
>     [W] Avg: 2513     RPC/s Min: 0        RPC/s Max: 5025     RPC/s
>     [LNet Bandwidth of servers]
>     [R] Avg: 484.59   MiB/s Min: 0.00     MiB/s Max: 969.19   MiB/s
>     [W] Avg: 771.94   MiB/s Min: 0.00     MiB/s Max: 1543.87  MiB/s
>     session is ended
>     ./lnet_test.sh: line 17:  4940 Terminated              lst stat
>     servers
>
>     so it looks like LNet is really underperforming, running at half or
>     less of the InfiniBand capability.
>     How can I find out what is causing this?
>
>     running bandwidth tests with the InfiniBand perftest tools I get good results:
>
>
>     ************************************
>     * Waiting for client to connect... *
>     ************************************
>
>     
> ---------------------------------------------------------------------------------------
>                         Send BW Test
>      Dual-port       : OFF        Device         : mlx4_0
>      Number of qps   : 1        Transport type : IB
>      Connection type : RC        Using SRQ      : OFF
>      RX depth        : 512
>      CQ Moderation   : 100
>      Mtu             : 2048[B]
>      Link type       : IB
>      Max inline data : 0[B]
>      rdma_cm QPs     : OFF
>      Data ex. method : Ethernet
>     
> ---------------------------------------------------------------------------------------
>      local address: LID 0x07 QPN 0x020f PSN 0xacc37a
>      remote address: LID 0x0a QPN 0x020f PSN 0x91a069
>     
> ---------------------------------------------------------------------------------------
>      #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]  
>     MsgRate[Mpps]
>     Conflicting CPU frequency values detected: 1249.234000 !=
>     1326.000000. CPU Frequency is not max.
>      2          1000             0.00               11.99            
>     6.285330
>     Conflicting CPU frequency values detected: 1314.910000 !=
>     1395.460000. CPU Frequency is not max.
>      4          1000             0.00               28.26            
>     7.409324
>     Conflicting CPU frequency values detected: 1314.910000 !=
>     1460.207000. CPU Frequency is not max.
>      8          1000             0.00               54.47            
>     7.139164
>     Conflicting CPU frequency values detected: 1314.910000 !=
>     1244.320000. CPU Frequency is not max.
>      16         1000             0.00               113.13           
>     7.413889
>     Conflicting CPU frequency values detected: 1314.910000 !=
>     1460.207000. CPU Frequency is not max.
>      32         1000             0.00               226.07           
>     7.407811
>     Conflicting CPU frequency values detected: 1469.703000 !=
>     1301.031000. CPU Frequency is not max.
>      64         1000             0.00               452.12           
>     7.407465
>     Conflicting CPU frequency values detected: 1469.703000 !=
>     1301.031000. CPU Frequency is not max.
>      128        1000             0.00               845.45           
>     6.925918
>     Conflicting CPU frequency values detected: 1469.703000 !=
>     1362.257000. CPU Frequency is not max.
>      256        1000             0.00               1746.93          
>     7.155406
>     Conflicting CPU frequency values detected: 1469.703000 !=
>     1362.257000. CPU Frequency is not max.
>      512        1000             0.00               2766.93          
>     5.666682
>     Conflicting CPU frequency values detected: 1296.714000 !=
>     1204.675000. CPU Frequency is not max.
>      1024       1000             0.00               3516.26          
>     3.600646
>     Conflicting CPU frequency values detected: 1296.714000 !=
>     1325.535000. CPU Frequency is not max.
>      2048       1000             0.00               3630.93          
>     1.859035
>     Conflicting CPU frequency values detected: 1296.714000 !=
>     1331.312000. CPU Frequency is not max.
>      4096       1000             0.00               3702.39          
>     0.947813
>     Conflicting CPU frequency values detected: 1296.714000 !=
>     1200.027000. CPU Frequency is not max.
>      8192       1000             0.00               3724.82          
>     0.476777
>     Conflicting CPU frequency values detected: 1384.902000 !=
>     1314.113000. CPU Frequency is not max.
>      16384      1000             0.00               3731.21          
>     0.238798
>     Conflicting CPU frequency values detected: 1578.078000 !=
>     1200.027000. CPU Frequency is not max.
>      32768      1000             0.00               3735.32          
>     0.119530
>     Conflicting CPU frequency values detected: 1578.078000 !=
>     1200.027000. CPU Frequency is not max.
>      65536      1000             0.00               3736.98          
>     0.059792
>     Conflicting CPU frequency values detected: 1578.078000 !=
>     1200.027000. CPU Frequency is not max.
>      131072     1000             0.00               3737.80          
>     0.029902
>     Conflicting CPU frequency values detected: 1578.078000 !=
>     1200.027000. CPU Frequency is not max.
>      262144     1000             0.00               3738.43          
>     0.014954
>     Conflicting CPU frequency values detected: 1570.507000 !=
>     1200.027000. CPU Frequency is not max.
>      524288     1000             0.00               3738.50          
>     0.007477
>     Conflicting CPU frequency values detected: 1457.019000 !=
>     1236.152000. CPU Frequency is not max.
>      1048576    1000             0.00               3738.65          
>     0.003739
>     Conflicting CPU frequency values detected: 1411.597000 !=
>     1234.957000. CPU Frequency is not max.
>      2097152    1000             0.00               3738.65          
>     0.001869
>     Conflicting CPU frequency values detected: 1369.828000 !=
>     1516.851000. CPU Frequency is not max.
>      4194304    1000             0.00               3738.80          
>     0.000935
>     Conflicting CPU frequency values detected: 1564.664000 !=
>     1247.574000. CPU Frequency is not max.
>      8388608    1000             0.00               3738.76          
>     0.000467
>     
> ---------------------------------------------------------------------------------------
>
>     RDMA modules are loaded
>
>     rpcrdma                90366  0
>     rdma_ucm               26837  0
>     ib_uverbs              51854  2 ib_ucm,rdma_ucm
>     rdma_cm                53755  5
>     rpcrdma,ko2iblnd,ib_iser,rdma_ucm,ib_isert
>     ib_cm                  47149  5 rdma_cm,ib_srp,ib_ucm,ib_srpt,ib_ipoib
>     iw_cm                  46022  1 rdma_cm
>     ib_core               210381  15
>     
> rdma_cm,ib_cm,iw_cm,rpcrdma,ko2iblnd,mlx4_ib,ib_srp,ib_ucm,ib_iser,ib_srpt,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib,ib_isert
>     sunrpc                334343  17
>     nfs,nfsd,rpcsec_gss_krb5,auth_rpcgss,lockd,nfsv4,rpcrdma,nfs_acl
>
>     I do not know where to look to make LNet perform faster. I am
>     running my ib0 interface in connected mode with a 65520-byte MTU.
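>
>     (A minimal way to double-check the IPoIB mode and MTU on the interface,
>     assuming the standard sysfs layout:
>     cat /sys/class/net/ib0/mode
>     ip link show ib0
>     The first should print "connected" and the second should show mtu 65520.)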
>
>     Any hint will be much appreciated
>
>     thank you
>
>     Rick
>
>
>
>
>     On 8/18/17 9:05 AM, Mannthey, Keith wrote:
>>     I would suggest a few other tests to help isolate where the issue
>> might be.
>>
>>     1. What is the single-thread "dd" write speed? (See the sketch after this list.)
>>      
>>     2. lnet_selftest: Please see "Chapter 28. Testing Lustre Network
>> Performance (LNet Self-Test)" in the Lustre manual if this is a new test for
>> you.
>>     This will help show how much LNet bandwidth you have from your single
>> client.  There are tunables in the LNet layer that can affect things.  Which
>> QDR HCA are you using?
>>
>>     3. obdfilter-survey: Please see "29.3. Testing OST Performance
>> (obdfilter-survey)" in the Lustre manual.  This test will help demonstrate
>> what the backend NVMe/ZFS setup can do at the OBD layer in Lustre.
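>>
>>     (A minimal sketch of the single-thread dd write test mentioned in point 1;
>> the mount point, file name and size are placeholders:
>>     dd if=/dev/zero of=/mnt/lustre/ddtest bs=1M count=10000 oflag=direct
>>     )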
>>
>>     Thanks,
>>      Keith 
>>     -----Original Message-----
>>     From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On
>> Behalf Of Riccardo Veraldi
>>     Sent: Thursday, August 17, 2017 10:48 PM
>>     To: Dennis Nelson <dnel...@ddn.com>; lustre-discuss@lists.lustre.org
>>     Subject: Re: [lustre-discuss] Lustre poor performance
>>
>>     this is my lustre.conf
>>
>>     [drp-tst-ffb01:~]$ cat /etc/modprobe.d/lustre.conf
>>     options lnet networks=o2ib5(ib0),tcp5(enp1s0f0)
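>>
>>     (A quick sanity check of which NIDs LNet actually configured from this
>> file, using the standard lctl utility:
>>     lctl list_nids
>>     )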
>>
>>     data transfer is over infiniband
>>
>>     ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
>>             inet 172.21.52.83  netmask 255.255.252.0  broadcast 172.21.55.255
>>
>>
>>     On 8/17/17 10:45 PM, Riccardo Veraldi wrote:
>>>     On 8/17/17 9:22 PM, Dennis Nelson wrote:
>>>>     It appears that you are running iozone on a single client?  What kind 
>>>> of network is tcp5?  Have you looked at the network to make sure it is not 
>>>> the bottleneck?
>>>>
>>>     yes, the data transfer is on the ib0 interface, and I did a memory-to-memory
>>>     test through InfiniBand QDR resulting in 3.7 GB/sec.
>>>     TCP is used to connect to the MDS. It is called tcp5 to differentiate it from
>>>     my other Lustre clusters; I could have called it tcp, but it makes no
>>>     difference performance-wise.
>>>     Yes, I ran the test from a single node, and I also ran the same test
>>>     locally on a zpool identical to the one on the Lustre OSS.
>>>     I have 4 identical servers, each of them with the same NVMe disks:
>>>
>>>     server1: OSS - OST1 Lustre/ZFS  raidz1
>>>
>>>     server2: OSS - OST2 Lustre/ZFS  raidz1
>>>
>>>     server3: local ZFS raidz1
>>>
>>>     server4: Lustre client
>>>
>>>
>>>
>>
>
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
