I ran my LNet self-test again, this time adding --concurrency=16, and now I
can use all of the IB bandwidth (3.5 GB/s).
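
For reference, the only change versus the lnet_test.sh script quoted further
down in this thread is on the two "lst add_test" lines, roughly:

lst add_test --batch bulk_rw --concurrency=16 --from readers --to servers \
brw read check=simple size=1M
lst add_test --batch bulk_rw --concurrency=16 --from writers --to servers \
brw write check=full size=1M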

The only thing I do not understand is why ko2iblnd.conf is not loaded
properly: I had to remove the alias in the config file for the proper
peer_credits settings to be applied.
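
(If I understand ko2iblnd-probe correctly, the ko2iblnd-opa alias options are
only applied when OPA hardware is detected, which would explain why they were
skipped on ConnectX-3.) A quick way to check which values the module actually
picked up (assuming ko2iblnd exposes its parameters read-only under sysfs, as
modules normally do):

cat /sys/module/ko2iblnd/parameters/peer_credits
cat /sys/module/ko2iblnd/parameters/concurrent_sends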

Thanks to everyone for helping.

Riccardo

On 8/19/17 8:54 AM, Riccardo Veraldi wrote:
>
> I found out that ko2iblnd is not getting its settings from
> /etc/modprobe.d/ko2iblnd.conf:
> alias ko2iblnd-opa ko2iblnd
> options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024
> concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048
> fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
>
> install ko2iblnd /usr/sbin/ko2iblnd-probe
>
> But if I modify ko2iblnd.conf like this, then the settings are loaded:
>
> options ko2iblnd peer_credits=128 peer_credits_hiw=64 credits=1024
> concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048
> fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
>
> install ko2iblnd /usr/sbin/ko2iblnd-probe
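>
> To pick up changed options the module has to be reloaded; a minimal sketch,
> assuming the Lustre targets are unmounted first:
>
> lustre_rmmod
> modprobe ko2iblnd
> lctl network up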
>
> LNet tests show better behaviour, but I would still expect more than this.
> Is it possible to tune parameters in /etc/modprobe.d/ko2iblnd.conf so
> that the Mellanox ConnectX-3 will work more efficiently?
>
> [LNet Rates of servers]
> [R] Avg: 2286     RPC/s Min: 0        RPC/s Max: 4572     RPC/s
> [W] Avg: 3322     RPC/s Min: 0        RPC/s Max: 6643     RPC/s
> [LNet Bandwidth of servers]
> [R] Avg: 625.23   MiB/s Min: 0.00     MiB/s Max: 1250.46  MiB/s
> [W] Avg: 1035.85  MiB/s Min: 0.00     MiB/s Max: 2071.69  MiB/s
> [LNet Rates of servers]
> [R] Avg: 2286     RPC/s Min: 1        RPC/s Max: 4571     RPC/s
> [W] Avg: 3321     RPC/s Min: 1        RPC/s Max: 6641     RPC/s
> [LNet Bandwidth of servers]
> [R] Avg: 625.55   MiB/s Min: 0.00     MiB/s Max: 1251.11  MiB/s
> [W] Avg: 1035.05  MiB/s Min: 0.00     MiB/s Max: 2070.11  MiB/s
> [LNet Rates of servers]
> [R] Avg: 2291     RPC/s Min: 0        RPC/s Max: 4581     RPC/s
> [W] Avg: 3329     RPC/s Min: 0        RPC/s Max: 6657     RPC/s
> [LNet Bandwidth of servers]
> [R] Avg: 626.55   MiB/s Min: 0.00     MiB/s Max: 1253.11  MiB/s
> [W] Avg: 1038.05  MiB/s Min: 0.00     MiB/s Max: 2076.11  MiB/s
> session is ended
> ./lnet_test.sh: line 17: 23394 Terminated              lst stat servers
>
>
>
>
> On 8/19/17 4:20 AM, Arman Khalatyan wrote:
>> Just a minor comment: you should push up the performance of your nodes;
>> they are not running at their maximum CPU frequencies, so all tests may be
>> inconsistent. In order to get the most out of IB, run the following:
>> tuned-adm profile latency-performance
>> For more options use:
>> tuned-adm list
>>
>> It will be interesting to see the difference.
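>>
>> A quick way to verify the effect (assuming the cpupower utility is
>> installed):
>> tuned-adm active
>> cpupower frequency-info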
>>
>> On 19.08.2017 at 3:57 AM, "Riccardo Veraldi"
>> <riccardo.vera...@cnaf.infn.it <mailto:riccardo.vera...@cnaf.infn.it>> wrote:
>>
>>     Hello Keith and Dennis, these are the tests I ran.
>>
>>       * obdfilter-survey shows that I can saturate disk performance;
>>         the NVMe/ZFS backend is performing very well and is faster
>>         than my InfiniBand network.
>>
>>     pool          alloc   free   read  write   read  write
>>     ------------  -----  -----  -----  -----  -----  -----
>>     drpffb-ost01  3.31T  3.19T      3  35.7K  16.0K  7.03G
>>       raidz1      3.31T  3.19T      3  35.7K  16.0K  7.03G
>>         nvme0n1       -      -      1  5.95K  7.99K  1.17G
>>         nvme1n1       -      -      0  6.01K      0  1.18G
>>         nvme2n1       -      -      0  5.93K      0  1.17G
>>         nvme3n1       -      -      0  5.88K      0  1.16G
>>         nvme4n1       -      -      1  5.95K  7.99K  1.17G
>>         nvme5n1       -      -      0  5.96K      0  1.17G
>>     ------------  -----  -----  -----  -----  -----  -----
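>>
>>     The table above is from watching the pool during the run, roughly:
>>     zpool iostat -v drpffb-ost01 1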
>>
>>     These are the test results:
>>
>>     Fri Aug 18 16:54:48 PDT 2017 Obdfilter-survey for case=disk from
>>     drp-tst-ffb01
>>     ost  1 sz 10485760K rsz 1024K obj    1 thr    1 write 7633.08 SHORT
>>     rewrite 7558.78 SHORT read 3205.24 [3213.70, 3226.78]
>>     ost  1 sz 10485760K rsz 1024K obj    1 thr    2 write 7996.89 SHORT
>>     rewrite 7903.42 SHORT read 5264.70 SHORT
>>     ost  1 sz 10485760K rsz 1024K obj    2 thr    2 write 7718.94 SHORT
>>     rewrite 7977.84 SHORT read 5802.17 SHORT
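>>
>>     The parameters above correspond to an obdfilter-survey invocation
>>     roughly like this (variable names as documented in the Lustre manual;
>>     targets would point at the local OST):
>>     case=disk size=10240 nobjlo=1 nobjhi=2 thrlo=1 thrhi=2 obdfilter-survey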
>>
>>       * LNet self-test, and here I see the problem. For reference,
>>         172.21.52.[83,84] are the two OSSes and 172.21.52.86 is the
>>         reader/writer. Here is the script that I ran:
>>
>>     #!/bin/bash
>>     export LST_SESSION=$$
>>     lst new_session read_write
>>     lst add_group servers 172.21.52.[83,84]@o2ib5
>>     lst add_group readers 172.21.52.86@o2ib5
>>     lst add_group writers 172.21.52.86@o2ib5
>>     lst add_batch bulk_rw
>>     lst add_test --batch bulk_rw --from readers --to servers \
>>     brw read check=simple size=1M
>>     lst add_test --batch bulk_rw --from writers --to servers \
>>     brw write check=full size=1M
>>     # start running
>>     lst run bulk_rw
>>     # display server stats for 30 seconds
>>     lst stat servers & sleep 30; kill $!
>>     # tear down
>>     lst end_session
>>
>>
>>     Here are the results:
>>
>>     SESSION: read_write FEATURES: 1 TIMEOUT: 300 FORCE: No
>>     172.21.52.[83,84]@o2ib5 are added to session
>>     172.21.52.86@o2ib5 are added to session
>>     172.21.52.86@o2ib5 are added to session
>>     Test was added successfully
>>     Test was added successfully
>>     bulk_rw is running now
>>     [LNet Rates of servers]
>>     [R] Avg: 1751     RPC/s Min: 0        RPC/s Max: 3502     RPC/s
>>     [W] Avg: 2525     RPC/s Min: 0        RPC/s Max: 5050     RPC/s
>>     [LNet Bandwidth of servers]
>>     [R] Avg: 488.79   MiB/s Min: 0.00     MiB/s Max: 977.59   MiB/s
>>     [W] Avg: 773.99   MiB/s Min: 0.00     MiB/s Max: 1547.99  MiB/s
>>     [LNet Rates of servers]
>>     [R] Avg: 1718     RPC/s Min: 0        RPC/s Max: 3435     RPC/s
>>     [W] Avg: 2479     RPC/s Min: 0        RPC/s Max: 4958     RPC/s
>>     [LNet Bandwidth of servers]
>>     [R] Avg: 478.19   MiB/s Min: 0.00     MiB/s Max: 956.39   MiB/s
>>     [W] Avg: 761.74   MiB/s Min: 0.00     MiB/s Max: 1523.47  MiB/s
>>     [LNet Rates of servers]
>>     [R] Avg: 1734     RPC/s Min: 0        RPC/s Max: 3467     RPC/s
>>     [W] Avg: 2506     RPC/s Min: 0        RPC/s Max: 5012     RPC/s
>>     [LNet Bandwidth of servers]
>>     [R] Avg: 480.79   MiB/s Min: 0.00     MiB/s Max: 961.58   MiB/s
>>     [W] Avg: 772.49   MiB/s Min: 0.00     MiB/s Max: 1544.98  MiB/s
>>     [LNet Rates of servers]
>>     [R] Avg: 1722     RPC/s Min: 0        RPC/s Max: 3444     RPC/s
>>     [W] Avg: 2486     RPC/s Min: 0        RPC/s Max: 4972     RPC/s
>>     [LNet Bandwidth of servers]
>>     [R] Avg: 479.09   MiB/s Min: 0.00     MiB/s Max: 958.18   MiB/s
>>     [W] Avg: 764.19   MiB/s Min: 0.00     MiB/s Max: 1528.38  MiB/s
>>     [LNet Rates of servers]
>>     [R] Avg: 1741     RPC/s Min: 0        RPC/s Max: 3482     RPC/s
>>     [W] Avg: 2513     RPC/s Min: 0        RPC/s Max: 5025     RPC/s
>>     [LNet Bandwidth of servers]
>>     [R] Avg: 484.59   MiB/s Min: 0.00     MiB/s Max: 969.19   MiB/s
>>     [W] Avg: 771.94   MiB/s Min: 0.00     MiB/s Max: 1543.87  MiB/s
>>     session is ended
>>     ./lnet_test.sh: line 17:  4940 Terminated              lst stat
>>     servers
>>
>>     So it looks like LNet is really underperforming, reaching only half
>>     or less of the InfiniBand capability.
>>     How can I find out what is causing this?
>>
>>     Running benchmarks with the InfiniBand perftest tools, I get good results:
>>
>>
>>     ************************************
>>     * Waiting for client to connect... *
>>     ************************************
>>
>>     
>> ---------------------------------------------------------------------------------------
>>                         Send BW Test
>>      Dual-port       : OFF        Device         : mlx4_0
>>      Number of qps   : 1        Transport type : IB
>>      Connection type : RC        Using SRQ      : OFF
>>      RX depth        : 512
>>      CQ Moderation   : 100
>>      Mtu             : 2048[B]
>>      Link type       : IB
>>      Max inline data : 0[B]
>>      rdma_cm QPs     : OFF
>>      Data ex. method : Ethernet
>>     
>> ---------------------------------------------------------------------------------------
>>      local address: LID 0x07 QPN 0x020f PSN 0xacc37a
>>      remote address: LID 0x0a QPN 0x020f PSN 0x91a069
>>     
>> ---------------------------------------------------------------------------------------
>>      #bytes     #iterations    BW peak[MB/sec]    BW
>>     average[MB/sec]   MsgRate[Mpps]
>>     Conflicting CPU frequency values detected: 1249.234000 !=
>>     1326.000000. CPU Frequency is not max.
>>      2          1000             0.00               11.99            
>>     6.285330
>>     Conflicting CPU frequency values detected: 1314.910000 !=
>>     1395.460000. CPU Frequency is not max.
>>      4          1000             0.00               28.26            
>>     7.409324
>>     Conflicting CPU frequency values detected: 1314.910000 !=
>>     1460.207000. CPU Frequency is not max.
>>      8          1000             0.00               54.47            
>>     7.139164
>>     Conflicting CPU frequency values detected: 1314.910000 !=
>>     1244.320000. CPU Frequency is not max.
>>      16         1000             0.00               113.13           
>>     7.413889
>>     Conflicting CPU frequency values detected: 1314.910000 !=
>>     1460.207000. CPU Frequency is not max.
>>      32         1000             0.00               226.07           
>>     7.407811
>>     Conflicting CPU frequency values detected: 1469.703000 !=
>>     1301.031000. CPU Frequency is not max.
>>      64         1000             0.00               452.12           
>>     7.407465
>>     Conflicting CPU frequency values detected: 1469.703000 !=
>>     1301.031000. CPU Frequency is not max.
>>      128        1000             0.00               845.45           
>>     6.925918
>>     Conflicting CPU frequency values detected: 1469.703000 !=
>>     1362.257000. CPU Frequency is not max.
>>      256        1000             0.00               1746.93          
>>     7.155406
>>     Conflicting CPU frequency values detected: 1469.703000 !=
>>     1362.257000. CPU Frequency is not max.
>>      512        1000             0.00               2766.93          
>>     5.666682
>>     Conflicting CPU frequency values detected: 1296.714000 !=
>>     1204.675000. CPU Frequency is not max.
>>      1024       1000             0.00               3516.26          
>>     3.600646
>>     Conflicting CPU frequency values detected: 1296.714000 !=
>>     1325.535000. CPU Frequency is not max.
>>      2048       1000             0.00               3630.93          
>>     1.859035
>>     Conflicting CPU frequency values detected: 1296.714000 !=
>>     1331.312000. CPU Frequency is not max.
>>      4096       1000             0.00               3702.39          
>>     0.947813
>>     Conflicting CPU frequency values detected: 1296.714000 !=
>>     1200.027000. CPU Frequency is not max.
>>      8192       1000             0.00               3724.82          
>>     0.476777
>>     Conflicting CPU frequency values detected: 1384.902000 !=
>>     1314.113000. CPU Frequency is not max.
>>      16384      1000             0.00               3731.21          
>>     0.238798
>>     Conflicting CPU frequency values detected: 1578.078000 !=
>>     1200.027000. CPU Frequency is not max.
>>      32768      1000             0.00               3735.32          
>>     0.119530
>>     Conflicting CPU frequency values detected: 1578.078000 !=
>>     1200.027000. CPU Frequency is not max.
>>      65536      1000             0.00               3736.98          
>>     0.059792
>>     Conflicting CPU frequency values detected: 1578.078000 !=
>>     1200.027000. CPU Frequency is not max.
>>      131072     1000             0.00               3737.80          
>>     0.029902
>>     Conflicting CPU frequency values detected: 1578.078000 !=
>>     1200.027000. CPU Frequency is not max.
>>      262144     1000             0.00               3738.43          
>>     0.014954
>>     Conflicting CPU frequency values detected: 1570.507000 !=
>>     1200.027000. CPU Frequency is not max.
>>      524288     1000             0.00               3738.50          
>>     0.007477
>>     Conflicting CPU frequency values detected: 1457.019000 !=
>>     1236.152000. CPU Frequency is not max.
>>      1048576    1000             0.00               3738.65          
>>     0.003739
>>     Conflicting CPU frequency values detected: 1411.597000 !=
>>     1234.957000. CPU Frequency is not max.
>>      2097152    1000             0.00               3738.65          
>>     0.001869
>>     Conflicting CPU frequency values detected: 1369.828000 !=
>>     1516.851000. CPU Frequency is not max.
>>      4194304    1000             0.00               3738.80          
>>     0.000935
>>     Conflicting CPU frequency values detected: 1564.664000 !=
>>     1247.574000. CPU Frequency is not max.
>>      8388608    1000             0.00               3738.76          
>>     0.000467
>>     
>> ---------------------------------------------------------------------------------------
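>>
>>     The output above is from the perftest ib_send_bw benchmark, run roughly
>>     like this (-a sweeps all message sizes; the server address below is just
>>     a placeholder):
>>
>>     ib_send_bw -a                # on one node (server side)
>>     ib_send_bw -a <server-ip>    # on the other node (client side)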
>>
>>     The RDMA modules are loaded:
>>
>>     rpcrdma                90366  0
>>     rdma_ucm               26837  0
>>     ib_uverbs              51854  2 ib_ucm,rdma_ucm
>>     rdma_cm                53755  5
>>     rpcrdma,ko2iblnd,ib_iser,rdma_ucm,ib_isert
>>     ib_cm                  47149  5
>>     rdma_cm,ib_srp,ib_ucm,ib_srpt,ib_ipoib
>>     iw_cm                  46022  1 rdma_cm
>>     ib_core               210381  15
>>     
>> rdma_cm,ib_cm,iw_cm,rpcrdma,ko2iblnd,mlx4_ib,ib_srp,ib_ucm,ib_iser,ib_srpt,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib,ib_isert
>>     sunrpc                334343  17
>>     nfs,nfsd,rpcsec_gss_krb5,auth_rpcgss,lockd,nfsv4,rpcrdma,nfs_acl
>>
>>     I do not know where to look to make LNet perform faster. I am
>>     running my ib0 interface in connected mode with a 65520-byte MTU.
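>>
>>     Connected mode and the MTU can be double-checked on the IPoIB interface
>>     with:
>>
>>     cat /sys/class/net/ib0/mode
>>     ip link show ib0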
>>
>>     Any hint will be much appreciated.
>>
>>     thank you
>>
>>     Rick
>>
>>
>>
>>
>>     On 8/18/17 9:05 AM, Mannthey, Keith wrote:
>>>     I would suggest a few other tests to help isolate where the issue
>>>     might be.
>>>
>>>     1. What is the single-thread "dd" write speed? (See the sketch after
>>>     this list.)
>>>
>>>     2. lnet_selftest: Please see "Chapter 28. Testing Lustre Network
>>>     Performance (LNet Self-Test)" in the Lustre manual if this is a new
>>>     test for you. This will help show how much LNet bandwidth you have
>>>     from your single client. There are tunables in the LNet layer that
>>>     can affect things. Which QDR HCA are you using?
>>>
>>>     3. obdfilter-survey: Please see "29.3. Testing OST Performance
>>>     (obdfilter-survey)" in the Lustre manual. This test will help
>>>     demonstrate what the backend NVMe/ZFS setup can do at the OBD layer
>>>     in Lustre.
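>>>
>>>     For item 1, a single-thread write test on the Lustre client would be
>>>     something like this (hypothetical mount point and size; oflag=direct
>>>     bypasses the client page cache):
>>>
>>>     dd if=/dev/zero of=/mnt/lustre/ddtest bs=1M count=10240 oflag=direct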
>>>
>>>     Thanks,
>>>      Keith 
>>>     -----Original Message-----
>>>     From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org
>>>     <mailto:lustre-discuss-boun...@lists.lustre.org>] On Behalf Of Riccardo 
>>> Veraldi
>>>     Sent: Thursday, August 17, 2017 10:48 PM
>>>     To: Dennis Nelson <dnel...@ddn.com> <mailto:dnel...@ddn.com>; 
>>> lustre-discuss@lists.lustre.org
>>>     <mailto:lustre-discuss@lists.lustre.org>
>>>     Subject: Re: [lustre-discuss] Lustre poor performance
>>>
>>>     this is my lustre.conf
>>>
>>>     [drp-tst-ffb01:~]$ cat /etc/modprobe.d/lustre.conf
>>>     options lnet networks=o2ib5(ib0),tcp5(enp1s0f0)
>>>
>>>     Data transfer is over InfiniBand:
>>>
>>>     ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
>>>             inet 172.21.52.83  netmask 255.255.252.0  broadcast 
>>> 172.21.55.255
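>>>
>>>     The NIDs configured from lustre.conf can be confirmed on each node with:
>>>     lctl list_nids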
>>>
>>>
>>>     On 8/17/17 10:45 PM, Riccardo Veraldi wrote:
>>>>     On 8/17/17 9:22 PM, Dennis Nelson wrote:
>>>>>     It appears that you are running iozone on a single client?  What kind 
>>>>> of network is tcp5?  Have you looked at the network to make sure it is 
>>>>> not the bottleneck?
>>>>>
>>>>     Yes, the data transfer is on the ib0 interface, and I did a
>>>>     memory-to-memory test through InfiniBand QDR resulting in 3.7 GB/s.
>>>>     TCP is used to connect to the MDS. It is tcp5 to differentiate it from
>>>>     my many other Lustre clusters. I could have called it tcp, but it does
>>>>     not make any difference performance-wise.
>>>>     Yes, I ran the test from one single node; I also ran the same test
>>>>     locally on a zpool identical to the one on the Lustre OSS.
>>>>     I have 4 identical servers, each of them with the same NVMe disks:
>>>>
>>>>     server1: OSS - OST1 Lustre/ZFS  raidz1
>>>>
>>>>     server2: OSS - OST2 Lustre/ZFS  raidz1
>>>>
>>>>     server3: local ZFS raidz1
>>>>
>>>>     server4: Lustre client
>>>>
>>>>
>>>>
>>>
>>
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
