I found out that ko2iblnd is not picking up its settings from /etc/modprobe.d/ko2iblnd.conf:

alias ko2iblnd-opa ko2iblnd
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
install ko2iblnd /usr/sbin/ko2iblnd-probe

But if I modify ko2iblnd.conf like this, then the settings are loaded:

options ko2iblnd peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
install ko2iblnd /usr/sbin/ko2iblnd-probe

As far as I can tell this makes sense: ko2iblnd-probe only applies the ko2iblnd-opa alias when it detects Omni-Path (hfi1) hardware, so on a Mellanox-only node the options attached to that alias are never used.
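A quick way to double-check which values the module actually picked up is to read them back from sysfs after the Lustre modules have been reloaded. A minimal sketch, assuming the parameters are exported under /sys/module/ko2iblnd/parameters (the list just mirrors the options above):

# read back what ko2iblnd is actually using
for p in peer_credits peer_credits_hiw credits concurrent_sends ntx \
         map_on_demand conns_per_peer; do
    printf '%-18s ' "$p"; cat /sys/module/ko2iblnd/parameters/$p
done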
With that change the LNet tests show better behaviour, but I would still expect more than this. Is it possible to tune the parameters in /etc/modprobe.d/ko2iblnd.conf so that the Mellanox ConnectX-3 works more efficiently?

[LNet Rates of servers]
[R] Avg: 2286 RPC/s Min: 0 RPC/s Max: 4572 RPC/s
[W] Avg: 3322 RPC/s Min: 0 RPC/s Max: 6643 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 625.23 MiB/s Min: 0.00 MiB/s Max: 1250.46 MiB/s
[W] Avg: 1035.85 MiB/s Min: 0.00 MiB/s Max: 2071.69 MiB/s
[LNet Rates of servers]
[R] Avg: 2286 RPC/s Min: 1 RPC/s Max: 4571 RPC/s
[W] Avg: 3321 RPC/s Min: 1 RPC/s Max: 6641 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 625.55 MiB/s Min: 0.00 MiB/s Max: 1251.11 MiB/s
[W] Avg: 1035.05 MiB/s Min: 0.00 MiB/s Max: 2070.11 MiB/s
[LNet Rates of servers]
[R] Avg: 2291 RPC/s Min: 0 RPC/s Max: 4581 RPC/s
[W] Avg: 3329 RPC/s Min: 0 RPC/s Max: 6657 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 626.55 MiB/s Min: 0.00 MiB/s Max: 1253.11 MiB/s
[W] Avg: 1038.05 MiB/s Min: 0.00 MiB/s Max: 2076.11 MiB/s
session is ended
./lnet_test.sh: line 17: 23394 Terminated              lst stat servers
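It also helps to confirm which tunables are actually in effect on the o2ib NI itself. A sketch, assuming the node has a Lustre release new enough to ship lnetctl:

# show the o2ib5 NI together with its tunables (peer_credits, credits, ...)
lnetctl net show --net o2ib5 --verbose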
On 8/19/17 4:20 AM, Arman Khalatyan wrote:
> just a minor comment,
> you should push up the performance of your nodes, they are not running at
> the max CPU frequencies. All tests might be inconsistent. In order to
> get the most out of IB, run the following:
> tuned-adm profile latency-performance
> for more options use:
> tuned-adm list
>
> It will be interesting to see the difference.
>
> On 19.08.2017 at 3:57 AM, "Riccardo Veraldi"
> <riccardo.vera...@cnaf.infn.it> wrote:
>
> Hello Keith and Dennis, these are the tests I ran.
>
> * obdfilter-survey shows that I can saturate disk performance;
> the NVMe/ZFS backend is performing very well and it is faster
> than my InfiniBand network.
>
> pool          alloc   free   read  write   read  write
> ------------  -----  -----  -----  -----  -----  -----
> drpffb-ost01  3.31T  3.19T      3  35.7K  16.0K  7.03G
>   raidz1      3.31T  3.19T      3  35.7K  16.0K  7.03G
>     nvme0n1       -      -      1  5.95K  7.99K  1.17G
>     nvme1n1       -      -      0  6.01K      0  1.18G
>     nvme2n1       -      -      0  5.93K      0  1.17G
>     nvme3n1       -      -      0  5.88K      0  1.16G
>     nvme4n1       -      -      1  5.95K  7.99K  1.17G
>     nvme5n1       -      -      0  5.96K      0  1.17G
> ------------  -----  -----  -----  -----  -----  -----
>
> These are the test results:
>
> Fri Aug 18 16:54:48 PDT 2017 Obdfilter-survey for case=disk from drp-tst-ffb01
> ost 1 sz 10485760K rsz 1024K obj 1 thr 1 write 7633.08 SHORT rewrite 7558.78 SHORT read 3205.24 [3213.70, 3226.78]
> ost 1 sz 10485760K rsz 1024K obj 1 thr 2 write 7996.89 SHORT rewrite 7903.42 SHORT read 5264.70 SHORT
> ost 1 sz 10485760K rsz 1024K obj 2 thr 2 write 7718.94 SHORT rewrite 7977.84 SHORT read 5802.17 SHORT
>
> * LNet self-test, and here I see the problems. For reference,
> 172.21.52.[83,84] are the two OSSes and 172.21.52.86 is the
> reader/writer. Here is the script that I ran:
>
> #!/bin/bash
> export LST_SESSION=$$
> lst new_session read_write
> lst add_group servers 172.21.52.[83,84]@o2ib5
> lst add_group readers 172.21.52.86@o2ib5
> lst add_group writers 172.21.52.86@o2ib5
> lst add_batch bulk_rw
> lst add_test --batch bulk_rw --from readers --to servers \
>     brw read check=simple size=1M
> lst add_test --batch bulk_rw --from writers --to servers \
>     brw write check=full size=1M
> # start running
> lst run bulk_rw
> # display server stats for 30 seconds
> lst stat servers & sleep 30; kill $!
> # tear down
> lst end_session
>
> Here are the results:
>
> SESSION: read_write FEATURES: 1 TIMEOUT: 300 FORCE: No
> 172.21.52.[83,84]@o2ib5 are added to session
> 172.21.52.86@o2ib5 are added to session
> 172.21.52.86@o2ib5 are added to session
> Test was added successfully
> Test was added successfully
> bulk_rw is running now
> [LNet Rates of servers]
> [R] Avg: 1751 RPC/s Min: 0 RPC/s Max: 3502 RPC/s
> [W] Avg: 2525 RPC/s Min: 0 RPC/s Max: 5050 RPC/s
> [LNet Bandwidth of servers]
> [R] Avg: 488.79 MiB/s Min: 0.00 MiB/s Max: 977.59 MiB/s
> [W] Avg: 773.99 MiB/s Min: 0.00 MiB/s Max: 1547.99 MiB/s
> [LNet Rates of servers]
> [R] Avg: 1718 RPC/s Min: 0 RPC/s Max: 3435 RPC/s
> [W] Avg: 2479 RPC/s Min: 0 RPC/s Max: 4958 RPC/s
> [LNet Bandwidth of servers]
> [R] Avg: 478.19 MiB/s Min: 0.00 MiB/s Max: 956.39 MiB/s
> [W] Avg: 761.74 MiB/s Min: 0.00 MiB/s Max: 1523.47 MiB/s
> [LNet Rates of servers]
> [R] Avg: 1734 RPC/s Min: 0 RPC/s Max: 3467 RPC/s
> [W] Avg: 2506 RPC/s Min: 0 RPC/s Max: 5012 RPC/s
> [LNet Bandwidth of servers]
> [R] Avg: 480.79 MiB/s Min: 0.00 MiB/s Max: 961.58 MiB/s
> [W] Avg: 772.49 MiB/s Min: 0.00 MiB/s Max: 1544.98 MiB/s
> [LNet Rates of servers]
> [R] Avg: 1722 RPC/s Min: 0 RPC/s Max: 3444 RPC/s
> [W] Avg: 2486 RPC/s Min: 0 RPC/s Max: 4972 RPC/s
> [LNet Bandwidth of servers]
> [R] Avg: 479.09 MiB/s Min: 0.00 MiB/s Max: 958.18 MiB/s
> [W] Avg: 764.19 MiB/s Min: 0.00 MiB/s Max: 1528.38 MiB/s
> [LNet Rates of servers]
> [R] Avg: 1741 RPC/s Min: 0 RPC/s Max: 3482 RPC/s
> [W] Avg: 2513 RPC/s Min: 0 RPC/s Max: 5025 RPC/s
> [LNet Bandwidth of servers]
> [R] Avg: 484.59 MiB/s Min: 0.00 MiB/s Max: 969.19 MiB/s
> [W] Avg: 771.94 MiB/s Min: 0.00 MiB/s Max: 1543.87 MiB/s
> session is ended
> ./lnet_test.sh: line 17: 4940 Terminated              lst stat servers
>
> So it looks like LNet is really underperforming, delivering half or
> less of the InfiniBand capability. How can I find out what is causing this?
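One thing worth ruling out on the self-test side itself: lst add_test keeps only a small number of requests in flight per test unless --concurrency is raised, which can limit the bandwidth a single client reports. A sketch of the same two add_test lines from the script above with a higher concurrency (the value 8 is purely illustrative):

lst add_test --batch bulk_rw --concurrency 8 --from readers --to servers \
    brw read check=simple size=1M
lst add_test --batch bulk_rw --concurrency 8 --from writers --to servers \
    brw write check=full size=1M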
> Running performance tests with the InfiniBand perftest tools I get good results:
>
> ************************************
> * Waiting for client to connect... *
> ************************************
>
> ---------------------------------------------------------------------------------------
>                     Send BW Test
>  Dual-port       : OFF          Device         : mlx4_0
>  Number of qps   : 1            Transport type : IB
>  Connection type : RC           Using SRQ      : OFF
>  RX depth        : 512
>  CQ Moderation   : 100
>  Mtu             : 2048[B]
>  Link type       : IB
>  Max inline data : 0[B]
>  rdma_cm QPs     : OFF
>  Data ex. method : Ethernet
> ---------------------------------------------------------------------------------------
>  local address: LID 0x07 QPN 0x020f PSN 0xacc37a
>  remote address: LID 0x0a QPN 0x020f PSN 0x91a069
> ---------------------------------------------------------------------------------------
>  #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]    MsgRate[Mpps]
> Conflicting CPU frequency values detected: 1249.234000 != 1326.000000. CPU Frequency is not max.
>  2          1000           0.00               11.99                 6.285330
> Conflicting CPU frequency values detected: 1314.910000 != 1395.460000. CPU Frequency is not max.
>  4          1000           0.00               28.26                 7.409324
> Conflicting CPU frequency values detected: 1314.910000 != 1460.207000. CPU Frequency is not max.
>  8          1000           0.00               54.47                 7.139164
> Conflicting CPU frequency values detected: 1314.910000 != 1244.320000. CPU Frequency is not max.
>  16         1000           0.00               113.13                7.413889
> Conflicting CPU frequency values detected: 1314.910000 != 1460.207000. CPU Frequency is not max.
>  32         1000           0.00               226.07                7.407811
> Conflicting CPU frequency values detected: 1469.703000 != 1301.031000. CPU Frequency is not max.
>  64         1000           0.00               452.12                7.407465
> Conflicting CPU frequency values detected: 1469.703000 != 1301.031000. CPU Frequency is not max.
>  128        1000           0.00               845.45                6.925918
> Conflicting CPU frequency values detected: 1469.703000 != 1362.257000. CPU Frequency is not max.
>  256        1000           0.00               1746.93               7.155406
> Conflicting CPU frequency values detected: 1469.703000 != 1362.257000. CPU Frequency is not max.
>  512        1000           0.00               2766.93               5.666682
> Conflicting CPU frequency values detected: 1296.714000 != 1204.675000. CPU Frequency is not max.
>  1024       1000           0.00               3516.26               3.600646
> Conflicting CPU frequency values detected: 1296.714000 != 1325.535000. CPU Frequency is not max.
>  2048       1000           0.00               3630.93               1.859035
> Conflicting CPU frequency values detected: 1296.714000 != 1331.312000. CPU Frequency is not max.
>  4096       1000           0.00               3702.39               0.947813
> Conflicting CPU frequency values detected: 1296.714000 != 1200.027000. CPU Frequency is not max.
>  8192       1000           0.00               3724.82               0.476777
> Conflicting CPU frequency values detected: 1384.902000 != 1314.113000. CPU Frequency is not max.
>  16384      1000           0.00               3731.21               0.238798
> Conflicting CPU frequency values detected: 1578.078000 != 1200.027000. CPU Frequency is not max.
>  32768      1000           0.00               3735.32               0.119530
> Conflicting CPU frequency values detected: 1578.078000 != 1200.027000. CPU Frequency is not max.
>  65536      1000           0.00               3736.98               0.059792
> Conflicting CPU frequency values detected: 1578.078000 != 1200.027000. CPU Frequency is not max.
>  131072     1000           0.00               3737.80               0.029902
> Conflicting CPU frequency values detected: 1578.078000 != 1200.027000. CPU Frequency is not max.
>  262144     1000           0.00               3738.43               0.014954
> Conflicting CPU frequency values detected: 1570.507000 != 1200.027000. CPU Frequency is not max.
>  524288     1000           0.00               3738.50               0.007477
> Conflicting CPU frequency values detected: 1457.019000 != 1236.152000. CPU Frequency is not max.
>  1048576    1000           0.00               3738.65               0.003739
> Conflicting CPU frequency values detected: 1411.597000 != 1234.957000. CPU Frequency is not max.
>  2097152    1000           0.00               3738.65               0.001869
> Conflicting CPU frequency values detected: 1369.828000 != 1516.851000. CPU Frequency is not max.
>  4194304    1000           0.00               3738.80               0.000935
> Conflicting CPU frequency values detected: 1564.664000 != 1247.574000. CPU Frequency is not max.
>  8388608    1000           0.00               3738.76               0.000467
> ---------------------------------------------------------------------------------------
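Those "CPU Frequency is not max" warnings are consistent with Arman's comment above about the nodes not running at their maximum frequency. A quick check of the active tuned profile and cpufreq governor (a sketch, assuming the standard tuned and cpufreq tooling is installed):

# check the active tuned profile and the governor in use
tuned-adm active
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor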
>
> RDMA modules are loaded:
>
> rpcrdma               90366  0
> rdma_ucm              26837  0
> ib_uverbs             51854  2 ib_ucm,rdma_ucm
> rdma_cm               53755  5 rpcrdma,ko2iblnd,ib_iser,rdma_ucm,ib_isert
> ib_cm                 47149  5 rdma_cm,ib_srp,ib_ucm,ib_srpt,ib_ipoib
> iw_cm                 46022  1 rdma_cm
> ib_core              210381  15 rdma_cm,ib_cm,iw_cm,rpcrdma,ko2iblnd,mlx4_ib,ib_srp,ib_ucm,ib_iser,ib_srpt,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib,ib_isert
> sunrpc               334343  17 nfs,nfsd,rpcsec_gss_krb5,auth_rpcgss,lockd,nfsv4,rpcrdma,nfs_acl
>
> I do not know where to look to get LNet performing faster. I am
> running my ib0 interface in connected mode with a 65520-byte MTU.
>
> Any hint will be much appreciated.
>
> thank you
>
> Rick
>
>
> On 8/18/17 9:05 AM, Mannthey, Keith wrote:
>> I would suggest a few other tests to help isolate where the issue
>> might be.
>>
>> 1. What is the single-thread "dd" write speed?
>>
>> 2. lnet_selftest: Please see "Chapter 28. Testing Lustre Network
>> Performance (LNet Self-Test)" in the Lustre manual if this is a new test for you.
>> This will help show how much LNet bandwidth you have from your single
>> client. There are tunables in the LNet layer that can affect things. Which
>> QDR HCA are you using?
>>
>> 3. obdfilter-survey: Please see "29.3. Testing OST Performance
>> (obdfilter-survey)" in the Lustre manual. This test will help demonstrate
>> what the backend NVMe/ZFS setup can do at the OBD layer in Lustre.
>>
>> Thanks,
>> Keith
>>
>> -----Original Message-----
>> From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Riccardo Veraldi
>> Sent: Thursday, August 17, 2017 10:48 PM
>> To: Dennis Nelson <dnel...@ddn.com>; lustre-discuss@lists.lustre.org
>> Subject: Re: [lustre-discuss] Lustre poor performance
>>
>> this is my lustre.conf
>>
>> [drp-tst-ffb01:~]$ cat /etc/modprobe.d/lustre.conf
>> options lnet networks=o2ib5(ib0),tcp5(enp1s0f0)
>>
>> data transfer is over InfiniBand
>>
>> ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
>>         inet 172.21.52.83  netmask 255.255.252.0  broadcast 172.21.55.255
>>
>>
>> On 8/17/17 10:45 PM, Riccardo Veraldi wrote:
>>> On 8/17/17 9:22 PM, Dennis Nelson wrote:
>>>> It appears that you are running iozone on a single client? What kind
>>>> of network is tcp5? Have you looked at the network to make sure it is not
>>>> the bottleneck?
>>>>
>>> Yes, the data transfer is on the ib0 interface, and I did a memory-to-memory
>>> test through InfiniBand QDR resulting in 3.7 GB/sec.
>>> tcp is used to connect to the MDS. It is tcp5 to differentiate it from
>>> my many other Lustre clusters. I could have called it tcp but it does
>>> not make any difference performance-wise.
>>> Yes, I ran the test from one single node; I also ran the same test
>>> locally on a zpool identical to the one on the Lustre OSS.
>>> I have 4 identical servers, each of them with the same NVMe disks:
>>>
>>> server1: OSS - OST1 Lustre/ZFS raidz1
>>>
>>> server2: OSS - OST2 Lustre/ZFS raidz1
>>>
>>> server3: local ZFS raidz1
>>>
>>> server4: Lustre client
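Coming back to Keith's first suggestion above, the single-thread write speed from the client can be checked with a plain dd run against the Lustre mount. A minimal sketch; the mount point, file name and transfer size are only illustrative:

# single-stream write from the Lustre client, bypassing the client page cache
dd if=/dev/zero of=/mnt/lustre/ddtest bs=1M count=10240 oflag=direct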
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org