I ran my LNet self test again, and this time, adding --concurrency=16, I can use all of the IB bandwidth (3.5 GB/s).
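For reference, this is how the concurrency option fits into the lst script quoted further down (a sketch only; the --concurrency=16 flag on the two add_test lines is the single change with respect to the original script):

    lst add_test --batch bulk_rw --concurrency=16 --from readers --to servers \
        brw read check=simple size=1M
    lst add_test --batch bulk_rw --concurrency=16 --from writers --to servers \
        brw write check=full size=1M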
The only thing I do not understand is why ko2iblnd.conf is not loaded properly, and why I had to remove the alias in the config file to get the proper peer_credits settings loaded.

thanks to everyone for helping

Riccardo

On 8/19/17 8:54 AM, Riccardo Veraldi wrote:
>
> I found out that ko2iblnd is not getting its settings from
> /etc/modprobe.d/ko2iblnd.conf:
>
> alias ko2iblnd-opa ko2iblnd
> options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024
>     concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048
>     fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
>
> install ko2iblnd /usr/sbin/ko2iblnd-probe
>
> but if I modify ko2iblnd.conf like this, then the settings are loaded:
>
> options ko2iblnd peer_credits=128 peer_credits_hiw=64 credits=1024
>     concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048
>     fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
>
> install ko2iblnd /usr/sbin/ko2iblnd-probe
>
> The LNet tests show better behaviour, but I would still expect more than this.
> Is it possible to tune the parameters in /etc/modprobe.d/ko2iblnd.conf so
> that the Mellanox ConnectX-3 will work more efficiently?
>
> [LNet Rates of servers]
> [R] Avg: 2286 RPC/s  Min: 0 RPC/s  Max: 4572 RPC/s
> [W] Avg: 3322 RPC/s  Min: 0 RPC/s  Max: 6643 RPC/s
> [LNet Bandwidth of servers]
> [R] Avg: 625.23 MiB/s  Min: 0.00 MiB/s  Max: 1250.46 MiB/s
> [W] Avg: 1035.85 MiB/s  Min: 0.00 MiB/s  Max: 2071.69 MiB/s
> [LNet Rates of servers]
> [R] Avg: 2286 RPC/s  Min: 1 RPC/s  Max: 4571 RPC/s
> [W] Avg: 3321 RPC/s  Min: 1 RPC/s  Max: 6641 RPC/s
> [LNet Bandwidth of servers]
> [R] Avg: 625.55 MiB/s  Min: 0.00 MiB/s  Max: 1251.11 MiB/s
> [W] Avg: 1035.05 MiB/s  Min: 0.00 MiB/s  Max: 2070.11 MiB/s
> [LNet Rates of servers]
> [R] Avg: 2291 RPC/s  Min: 0 RPC/s  Max: 4581 RPC/s
> [W] Avg: 3329 RPC/s  Min: 0 RPC/s  Max: 6657 RPC/s
> [LNet Bandwidth of servers]
> [R] Avg: 626.55 MiB/s  Min: 0.00 MiB/s  Max: 1253.11 MiB/s
> [W] Avg: 1038.05 MiB/s  Min: 0.00 MiB/s  Max: 2076.11 MiB/s
> session is ended
> ./lnet_test.sh: line 17: 23394 Terminated              lst stat servers
>
> On 8/19/17 4:20 AM, Arman Khalatyan wrote:
>> Just a minor comment: you should push up the performance of your nodes;
>> they are not running at their maximum CPU frequencies, so all tests may
>> be inconsistent. To get the most out of IB, run the following:
>>
>> tuned-adm profile latency-performance
>>
>> For more options use:
>>
>> tuned-adm list
>>
>> It will be interesting to see the difference.
>>
>> On 19.08.2017 at 3:57 AM, "Riccardo Veraldi"
>> <riccardo.vera...@cnaf.infn.it> wrote:
>>
>> Hello Keith and Dennis, these are the tests I ran.
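(For context on the first test below: obdfilter-survey is driven by environment variables placed in front of the script, as in the Lustre manual. Judging from the output, with obj up to 2, thr up to 2, and sz 10485760K, the run was probably close to the following sketch; the exact values are my reconstruction, not a verified record of the invocation:

    nobjhi=2 thrhi=2 size=10240 case=disk sh obdfilter-survey
)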
>>
>> * obdfilter-survey shows that I can saturate disk performance; the
>>   NVMe/ZFS backend is performing very well, and it is faster than my
>>   InfiniBand network.
>>
>>   pool          alloc   free   read  write   read  write
>>   ------------  -----  -----  -----  -----  -----  -----
>>   drpffb-ost01  3.31T  3.19T      3  35.7K  16.0K  7.03G
>>     raidz1      3.31T  3.19T      3  35.7K  16.0K  7.03G
>>       nvme0n1       -      -      1  5.95K  7.99K  1.17G
>>       nvme1n1       -      -      0  6.01K      0  1.18G
>>       nvme2n1       -      -      0  5.93K      0  1.17G
>>       nvme3n1       -      -      0  5.88K      0  1.16G
>>       nvme4n1       -      -      1  5.95K  7.99K  1.17G
>>       nvme5n1       -      -      0  5.96K      0  1.17G
>>   ------------  -----  -----  -----  -----  -----  -----
>>
>>   These are the test results:
>>
>>   Fri Aug 18 16:54:48 PDT 2017 Obdfilter-survey for case=disk from drp-tst-ffb01
>>   ost 1 sz 10485760K rsz 1024K obj 1 thr 1  write 7633.08 SHORT  rewrite 7558.78 SHORT  read 3205.24 [3213.70, 3226.78]
>>   ost 1 sz 10485760K rsz 1024K obj 1 thr 2  write 7996.89 SHORT  rewrite 7903.42 SHORT  read 5264.70 SHORT
>>   ost 1 sz 10485760K rsz 1024K obj 2 thr 2  write 7718.94 SHORT  rewrite 7977.84 SHORT  read 5802.17 SHORT
>>
>> * LNet self test, and here I see the problems. For reference,
>>   172.21.52.[83,84] are the two OSSes and 172.21.52.86 is the
>>   reader/writer. Here is the script that I ran:
>>
>>   #!/bin/bash
>>   export LST_SESSION=$$
>>   lst new_session read_write
>>   lst add_group servers 172.21.52.[83,84]@o2ib5
>>   lst add_group readers 172.21.52.86@o2ib5
>>   lst add_group writers 172.21.52.86@o2ib5
>>   lst add_batch bulk_rw
>>   lst add_test --batch bulk_rw --from readers --to servers \
>>       brw read check=simple size=1M
>>   lst add_test --batch bulk_rw --from writers --to servers \
>>       brw write check=full size=1M
>>   # start running
>>   lst run bulk_rw
>>   # display server stats for 30 seconds
>>   lst stat servers & sleep 30; kill $!
>>   # tear down
>>   lst end_session
>>
>> Here are the results:
>>
>>   SESSION: read_write FEATURES: 1 TIMEOUT: 300 FORCE: No
>>   172.21.52.[83,84]@o2ib5 are added to session
>>   172.21.52.86@o2ib5 are added to session
>>   172.21.52.86@o2ib5 are added to session
>>   Test was added successfully
>>   Test was added successfully
>>   bulk_rw is running now
>>   [LNet Rates of servers]
>>   [R] Avg: 1751 RPC/s  Min: 0 RPC/s  Max: 3502 RPC/s
>>   [W] Avg: 2525 RPC/s  Min: 0 RPC/s  Max: 5050 RPC/s
>>   [LNet Bandwidth of servers]
>>   [R] Avg: 488.79 MiB/s  Min: 0.00 MiB/s  Max: 977.59 MiB/s
>>   [W] Avg: 773.99 MiB/s  Min: 0.00 MiB/s  Max: 1547.99 MiB/s
>>   [LNet Rates of servers]
>>   [R] Avg: 1718 RPC/s  Min: 0 RPC/s  Max: 3435 RPC/s
>>   [W] Avg: 2479 RPC/s  Min: 0 RPC/s  Max: 4958 RPC/s
>>   [LNet Bandwidth of servers]
>>   [R] Avg: 478.19 MiB/s  Min: 0.00 MiB/s  Max: 956.39 MiB/s
>>   [W] Avg: 761.74 MiB/s  Min: 0.00 MiB/s  Max: 1523.47 MiB/s
>>   [LNet Rates of servers]
>>   [R] Avg: 1734 RPC/s  Min: 0 RPC/s  Max: 3467 RPC/s
>>   [W] Avg: 2506 RPC/s  Min: 0 RPC/s  Max: 5012 RPC/s
>>   [LNet Bandwidth of servers]
>>   [R] Avg: 480.79 MiB/s  Min: 0.00 MiB/s  Max: 961.58 MiB/s
>>   [W] Avg: 772.49 MiB/s  Min: 0.00 MiB/s  Max: 1544.98 MiB/s
>>   [LNet Rates of servers]
>>   [R] Avg: 1722 RPC/s  Min: 0 RPC/s  Max: 3444 RPC/s
>>   [W] Avg: 2486 RPC/s  Min: 0 RPC/s  Max: 4972 RPC/s
>>   [LNet Bandwidth of servers]
>>   [R] Avg: 479.09 MiB/s  Min: 0.00 MiB/s  Max: 958.18 MiB/s
>>   [W] Avg: 764.19 MiB/s  Min: 0.00 MiB/s  Max: 1528.38 MiB/s
>>   [LNet Rates of servers]
>>   [R] Avg: 1741 RPC/s  Min: 0 RPC/s  Max: 3482 RPC/s
>>   [W] Avg: 2513 RPC/s  Min: 0 RPC/s  Max: 5025 RPC/s
>>   [LNet Bandwidth of servers]
>>   [R] Avg: 484.59 MiB/s  Min: 0.00 MiB/s  Max: 969.19 MiB/s
>>   [W] Avg: 771.94 MiB/s  Min: 0.00 MiB/s  Max: 1543.87 MiB/s
>>   session is ended
>>   ./lnet_test.sh: line 17: 4940 Terminated              lst stat servers
>>
>> So it looks like LNet is really underperforming, delivering half or less
>> of the InfiniBand capability. How can I find out what is causing this?
>>
>> Running performance tests with the InfiniBand tools, I get good results:
>>
>>   ************************************
>>   * Waiting for client to connect... *
>>   ************************************
>>
>>   ---------------------------------------------------------------------------------------
>>                       Send BW Test
>>   Dual-port       : OFF          Device         : mlx4_0
>>   Number of qps   : 1            Transport type : IB
>>   Connection type : RC           Using SRQ      : OFF
>>   RX depth        : 512
>>   CQ Moderation   : 100
>>   Mtu             : 2048[B]
>>   Link type       : IB
>>   Max inline data : 0[B]
>>   rdma_cm QPs     : OFF
>>   Data ex. method : Ethernet
>>   ---------------------------------------------------------------------------------------
>>   local address:  LID 0x07 QPN 0x020f PSN 0xacc37a
>>   remote address: LID 0x0a QPN 0x020f PSN 0x91a069
>>   ---------------------------------------------------------------------------------------
>>   #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]    MsgRate[Mpps]
>>   Conflicting CPU frequency values detected: 1249.234000 != 1326.000000. CPU Frequency is not max.
>>   2          1000           0.00               11.99                 6.285330
>>   Conflicting CPU frequency values detected: 1314.910000 != 1395.460000. CPU Frequency is not max.
>>   4          1000           0.00               28.26                 7.409324
>>   Conflicting CPU frequency values detected: 1314.910000 != 1460.207000. CPU Frequency is not max.
>>   8          1000           0.00               54.47                 7.139164
>>   Conflicting CPU frequency values detected: 1314.910000 != 1244.320000. CPU Frequency is not max.
>>   16         1000           0.00               113.13                7.413889
>>   Conflicting CPU frequency values detected: 1314.910000 != 1460.207000. CPU Frequency is not max.
>>   32         1000           0.00               226.07                7.407811
>>   Conflicting CPU frequency values detected: 1469.703000 != 1301.031000. CPU Frequency is not max.
>>   64         1000           0.00               452.12                7.407465
>>   Conflicting CPU frequency values detected: 1469.703000 != 1301.031000. CPU Frequency is not max.
>>   128        1000           0.00               845.45                6.925918
>>   Conflicting CPU frequency values detected: 1469.703000 != 1362.257000. CPU Frequency is not max.
>>   256        1000           0.00               1746.93               7.155406
>>   Conflicting CPU frequency values detected: 1469.703000 != 1362.257000. CPU Frequency is not max.
>>   512        1000           0.00               2766.93               5.666682
>>   Conflicting CPU frequency values detected: 1296.714000 != 1204.675000. CPU Frequency is not max.
>>   1024       1000           0.00               3516.26               3.600646
>>   Conflicting CPU frequency values detected: 1296.714000 != 1325.535000. CPU Frequency is not max.
>>   2048       1000           0.00               3630.93               1.859035
>>   Conflicting CPU frequency values detected: 1296.714000 != 1331.312000. CPU Frequency is not max.
>>   4096       1000           0.00               3702.39               0.947813
>>   Conflicting CPU frequency values detected: 1296.714000 != 1200.027000. CPU Frequency is not max.
>>   8192       1000           0.00               3724.82               0.476777
>>   Conflicting CPU frequency values detected: 1384.902000 != 1314.113000. CPU Frequency is not max.
>>   16384      1000           0.00               3731.21               0.238798
>>   Conflicting CPU frequency values detected: 1578.078000 != 1200.027000. CPU Frequency is not max.
>>   32768      1000           0.00               3735.32               0.119530
>>   Conflicting CPU frequency values detected: 1578.078000 != 1200.027000. CPU Frequency is not max.
>>   65536      1000           0.00               3736.98               0.059792
>>   Conflicting CPU frequency values detected: 1578.078000 != 1200.027000. CPU Frequency is not max.
>>   131072     1000           0.00               3737.80               0.029902
>>   Conflicting CPU frequency values detected: 1578.078000 != 1200.027000. CPU Frequency is not max.
>>   262144     1000           0.00               3738.43               0.014954
>>   Conflicting CPU frequency values detected: 1570.507000 != 1200.027000. CPU Frequency is not max.
>>   524288     1000           0.00               3738.50               0.007477
>>   Conflicting CPU frequency values detected: 1457.019000 != 1236.152000. CPU Frequency is not max.
>>   1048576    1000           0.00               3738.65               0.003739
>>   Conflicting CPU frequency values detected: 1411.597000 != 1234.957000. CPU Frequency is not max.
>>   2097152    1000           0.00               3738.65               0.001869
>>   Conflicting CPU frequency values detected: 1369.828000 != 1516.851000. CPU Frequency is not max.
>>   4194304    1000           0.00               3738.80               0.000935
>>   Conflicting CPU frequency values detected: 1564.664000 != 1247.574000. CPU Frequency is not max.
>>   8388608    1000           0.00               3738.76               0.000467
>>   ---------------------------------------------------------------------------------------
>>
>> The RDMA modules are loaded:
>>
>>   rpcrdma                90366  0
>>   rdma_ucm               26837  0
>>   ib_uverbs              51854  2 ib_ucm,rdma_ucm
>>   rdma_cm                53755  5 rpcrdma,ko2iblnd,ib_iser,rdma_ucm,ib_isert
>>   ib_cm                  47149  5 rdma_cm,ib_srp,ib_ucm,ib_srpt,ib_ipoib
>>   iw_cm                  46022  1 rdma_cm
>>   ib_core               210381 15 rdma_cm,ib_cm,iw_cm,rpcrdma,ko2iblnd,mlx4_ib,ib_srp,ib_ucm,ib_iser,ib_srpt,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib,ib_isert
>>   sunrpc                334343 17 nfs,nfsd,rpcsec_gss_krb5,auth_rpcgss,lockd,nfsv4,rpcrdma,nfs_acl
>>
>> I do not know where to look to make LNet perform faster. I am running my
>> ib0 interface in connected mode with a 65520-byte MTU.
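(A quick way to double-check the IPoIB settings on each node is sketched below; it assumes the interface is named ib0 on every host. Also note that, as far as I understand, the o2ib LND moves bulk data over verbs RDMA directly, so the IPoIB mode and MTU mainly affect TCP traffic on ib0 rather than LNet bandwidth.)

    cat /sys/class/net/ib0/mode   # expected: connected
    ip link show ib0              # expected: ... mtu 65520 ...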
>>
>> Any hint will be much appreciated.
>>
>> thank you
>>
>> Rick
>>
>> On 8/18/17 9:05 AM, Mannthey, Keith wrote:
>>> I would suggest a few other tests to help isolate where the issue
>>> might be.
>>>
>>> 1. What is the single-thread "dd" write speed?
>>>
>>> 2. lnet_selftest: Please see "Chapter 28. Testing Lustre Network
>>> Performance (LNet Self-Test)" in the Lustre manual if this is a new test
>>> for you. This will help show how much LNet bandwidth you have from your
>>> single client. There are tunables in the LNet layer that can affect
>>> things. Which QDR HCA are you using?
>>>
>>> 3. obdfilter-survey: Please see "29.3. Testing OST Performance
>>> (obdfilter-survey)" in the Lustre manual. This test will help demonstrate
>>> what the backend NVMe/ZFS setup can do at the OBD layer in Lustre.
>>>
>>> Thanks,
>>>  Keith
>>>
>>> -----Original Message-----
>>> From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Riccardo Veraldi
>>> Sent: Thursday, August 17, 2017 10:48 PM
>>> To: Dennis Nelson <dnel...@ddn.com>; lustre-discuss@lists.lustre.org
>>> Subject: Re: [lustre-discuss] Lustre poor performance
>>>
>>> this is my lustre.conf
>>>
>>> [drp-tst-ffb01:~]$ cat /etc/modprobe.d/lustre.conf
>>> options lnet networks=o2ib5(ib0),tcp5(enp1s0f0)
>>>
>>> data transfer is over InfiniBand
>>>
>>> ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
>>>         inet 172.21.52.83  netmask 255.255.252.0  broadcast 172.21.55.255
>>>
>>> On 8/17/17 10:45 PM, Riccardo Veraldi wrote:
>>>> On 8/17/17 9:22 PM, Dennis Nelson wrote:
>>>>> It appears that you are running iozone on a single client? What kind
>>>>> of network is tcp5? Have you looked at the network to make sure it is
>>>>> not the bottleneck?
>>>>>
>>>> Yes, the data transfer is on the ib0 interface, and I did a memory-to-memory
>>>> test through InfiniBand QDR resulting in 3.7 GB/s.
>>>> TCP is used to connect to the MDS. It is tcp5 to differentiate it from
>>>> my many other Lustre clusters; I could have called it tcp, but it does
>>>> not make any difference performance-wise.
>>>> I ran the test from one single node, yes, and I also ran the same test
>>>> locally on a zpool identical to the one on the Lustre OSS.
>>>> I have 4 identical servers, each of them with the same NVMe disks:
>>>>
>>>> server1: OSS - OST1 Lustre/ZFS raidz1
>>>> server2: OSS - OST2 Lustre/ZFS raidz1
>>>> server3: local ZFS raidz1
>>>> server4: Lustre client
>>>>
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org