On 8/22/17 9:22 AM, Mannthey, Keith wrote:
> You may want to file a JIRA ticket if the ko2iblnd-opa settings were
> being automatically used on your Mellanox setup. That is not expected.

Yes, they are automatically applied on my Mellanox setup, and the ko2iblnd-probe script does not seem to be working properly.

> On another note: as you say, your NVMe backend is much faster than the
> QDR link speed. You may want to look at the new LNet Multi-Rail feature
> to boost network bandwidth. You can add a second QDR HCA/port and get
> more LNet bandwidth out of your OSS server. It is a new feature that is
> a bit of work to use, but if you are chasing bandwidth it might be worth
> the effort.

I have a dual-port InfiniBand card, so I was thinking of bonding the two ports to get more bandwidth. Is this what you mean by the Multi-Rail feature?
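For what it's worth, LNet Multi-Rail (Lustre 2.10+) is not interface bonding: each port is configured as a separate LNet interface and LNet spreads traffic across them. A minimal configuration sketch with lnetctl, using the o2ib5 network from this thread; the interface name ib1 for the second port is an assumption:

```shell
# Multi-Rail configuration sketch (Lustre 2.10+). The interface name
# ib1 is an assumption; use the actual name of the second HCA port.
lnetctl lnet configure                    # initialise LNet via the DLC interface
lnetctl net add --net o2ib5 --if ib0,ib1  # both ports on the same LNet network
lnetctl net show                          # verify that both NIDs came up
lnetctl export > /etc/lnet.conf           # persist the configuration
```

An IPoIB bond, by contrast, would generally give the o2ib LND failover rather than aggregated bandwidth, which is why Multi-Rail is the suggested route here.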
Thanks,
Rick

> Thanks,
> Keith
>
> From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Chris Horn
> Sent: Monday, August 21, 2017 12:40 PM
> To: Riccardo Veraldi <riccardo.vera...@cnaf.infn.it>; Arman Khalatyan <arm2...@gmail.com>
> Cc: lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] Lustre poor performance
>
> The ko2iblnd-opa settings are tuned specifically for Intel Omni-Path.
> Take a look at the /usr/sbin/ko2iblnd-probe script to see how OPA
> hardware is detected and the "ko2iblnd-opa" settings get used.
>
> Chris Horn
>
> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of Riccardo Veraldi <riccardo.vera...@cnaf.infn.it>
> Date: Saturday, August 19, 2017 at 5:00 PM
> To: Arman Khalatyan <arm2...@gmail.com>
> Cc: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
> Subject: Re: [lustre-discuss] Lustre poor performance
>
> I ran my LNet self-test again, this time adding --concurrency=16, and I
> can now use all of the IB bandwidth (3.5 GB/s).
>
> The only thing I do not understand is why ko2iblnd.conf is not loaded
> properly, and why I had to remove the alias in the config file to get
> the proper peer_credits settings loaded.
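The 3.5 GB/s figure is consistent with the wire limit. As a back-of-envelope check (assuming a 4x QDR link with 8b/10b encoding):

```shell
# Rough QDR ceiling: 40 Gb/s signaling, 8b/10b encoding leaves 32 Gb/s
# of payload, i.e. about 4 GB/s before protocol overhead.
signal_gbps=40
data_gbps=$(( signal_gbps * 8 / 10 ))   # 32 Gb/s of data
max_mbps=$(( data_gbps * 1000 / 8 ))    # 4000 MB/s
echo "QDR payload ceiling: ${data_gbps} Gb/s = ${max_mbps} MB/s"
```

So both the 3.5 GB/s self-test result and the ~3.7 GB/s ib_send_bw result further down the thread sit close to the practical ceiling of the link.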
> Thanks to everyone for helping.
>
> Riccardo
>
> On 8/19/17 8:54 AM, Riccardo Veraldi wrote:
>
> I found out that ko2iblnd is not getting its settings from
> /etc/modprobe.d/ko2iblnd.conf:
>
>   alias ko2iblnd-opa ko2iblnd
>   options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
>   install ko2iblnd /usr/sbin/ko2iblnd-probe
>
> but if I modify ko2iblnd.conf like this, then the settings are loaded:
>
>   options ko2iblnd peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
>   install ko2iblnd /usr/sbin/ko2iblnd-probe
>
> The LNet tests show better behaviour, but I would still expect more than
> this. Is it possible to tune the parameters in /etc/modprobe.d/ko2iblnd.conf
> so that the Mellanox ConnectX-3 works more efficiently?
>
>   [LNet Rates of servers]
>   [R] Avg: 2286 RPC/s    Min: 0 RPC/s      Max: 4572 RPC/s
>   [W] Avg: 3322 RPC/s    Min: 0 RPC/s      Max: 6643 RPC/s
>   [LNet Bandwidth of servers]
>   [R] Avg: 625.23 MiB/s  Min: 0.00 MiB/s   Max: 1250.46 MiB/s
>   [W] Avg: 1035.85 MiB/s Min: 0.00 MiB/s   Max: 2071.69 MiB/s
>   [LNet Rates of servers]
>   [R] Avg: 2286 RPC/s    Min: 1 RPC/s      Max: 4571 RPC/s
>   [W] Avg: 3321 RPC/s    Min: 1 RPC/s      Max: 6641 RPC/s
>   [LNet Bandwidth of servers]
>   [R] Avg: 625.55 MiB/s  Min: 0.00 MiB/s   Max: 1251.11 MiB/s
>   [W] Avg: 1035.05 MiB/s Min: 0.00 MiB/s   Max: 2070.11 MiB/s
>   [LNet Rates of servers]
>   [R] Avg: 2291 RPC/s    Min: 0 RPC/s      Max: 4581 RPC/s
>   [W] Avg: 3329 RPC/s    Min: 0 RPC/s      Max: 6657 RPC/s
>   [LNet Bandwidth of servers]
>   [R] Avg: 626.55 MiB/s  Min: 0.00 MiB/s   Max: 1253.11 MiB/s
>   [W] Avg: 1038.05 MiB/s Min: 0.00 MiB/s   Max: 2076.11 MiB/s
>   session is ended
>   ./lnet_test.sh: line 17: 23394 Terminated   lst stat servers
>
> On 8/19/17 4:20 AM, Arman Khalatyan wrote:
>
> Just a minor comment: you should push up the performance of your nodes;
> they are not running at their maximum CPU frequencies, so all tests may
> be inconsistent. To get the most out of IB, run the following:
>
>   tuned-adm profile latency-performance
>
> For more options use:
>
>   tuned-adm list
>
> It will be interesting to see the difference.
>
> On 19.08.2017 3:57 AM, Riccardo Veraldi <riccardo.vera...@cnaf.infn.it> wrote:
>
> Hello Keith and Dennis, these are the tests I ran.
>
> * obdfilter-survey shows that I can saturate disk performance; the
>   NVMe/ZFS backend is performing very well and is faster than my
>   InfiniBand network:
>
>   pool          alloc   free   read  write   read  write
>   ------------  -----  -----  -----  -----  -----  -----
>   drpffb-ost01  3.31T  3.19T      3  35.7K  16.0K  7.03G
>     raidz1      3.31T  3.19T      3  35.7K  16.0K  7.03G
>       nvme0n1       -      -      1  5.95K  7.99K  1.17G
>       nvme1n1       -      -      0  6.01K      0  1.18G
>       nvme2n1       -      -      0  5.93K      0  1.17G
>       nvme3n1       -      -      0  5.88K      0  1.16G
>       nvme4n1       -      -      1  5.95K  7.99K  1.17G
>       nvme5n1       -      -      0  5.96K      0  1.17G
>   ------------  -----  -----  -----  -----  -----  -----
>
>   These are the test results:
>
>   Fri Aug 18 16:54:48 PDT 2017 Obdfilter-survey for case=disk from drp-tst-ffb01
>   ost 1 sz 10485760K rsz 1024K obj 1 thr 1 write 7633.08 SHORT rewrite 7558.78 SHORT read 3205.24 [3213.70, 3226.78]
>   ost 1 sz 10485760K rsz 1024K obj 1 thr 2 write 7996.89 SHORT rewrite 7903.42 SHORT read 5264.70 SHORT
>   ost 1 sz 10485760K rsz 1024K obj 2 thr 2 write 7718.94 SHORT rewrite 7977.84 SHORT read 5802.17 SHORT
>
> * LNet self-test, and here I see the problem. For reference,
>   172.21.52.[83,84] are the two OSSes and 172.21.52.86 is the
>   reader/writer.
> Here is the script that I ran:
>
>   #!/bin/bash
>   export LST_SESSION=$$
>   lst new_session read_write
>   lst add_group servers 172.21.52.[83,84]@o2ib5
>   lst add_group readers 172.21.52.86@o2ib5
>   lst add_group writers 172.21.52.86@o2ib5
>   lst add_batch bulk_rw
>   lst add_test --batch bulk_rw --from readers --to servers \
>       brw read check=simple size=1M
>   lst add_test --batch bulk_rw --from writers --to servers \
>       brw write check=full size=1M
>   # start running
>   lst run bulk_rw
>   # display server stats for 30 seconds
>   lst stat servers & sleep 30; kill $!
>   # tear down
>   lst end_session
>
> Here are the results:
>
>   SESSION: read_write FEATURES: 1 TIMEOUT: 300 FORCE: No
>   172.21.52.[83,84]@o2ib5 are added to session
>   172.21.52.86@o2ib5 are added to session
>   172.21.52.86@o2ib5 are added to session
>   Test was added successfully
>   Test was added successfully
>   bulk_rw is running now
>   [LNet Rates of servers]
>   [R] Avg: 1751 RPC/s    Min: 0 RPC/s      Max: 3502 RPC/s
>   [W] Avg: 2525 RPC/s    Min: 0 RPC/s      Max: 5050 RPC/s
>   [LNet Bandwidth of servers]
>   [R] Avg: 488.79 MiB/s  Min: 0.00 MiB/s   Max: 977.59 MiB/s
>   [W] Avg: 773.99 MiB/s  Min: 0.00 MiB/s   Max: 1547.99 MiB/s
>   [LNet Rates of servers]
>   [R] Avg: 1718 RPC/s    Min: 0 RPC/s      Max: 3435 RPC/s
>   [W] Avg: 2479 RPC/s    Min: 0 RPC/s      Max: 4958 RPC/s
>   [LNet Bandwidth of servers]
>   [R] Avg: 478.19 MiB/s  Min: 0.00 MiB/s   Max: 956.39 MiB/s
>   [W] Avg: 761.74 MiB/s  Min: 0.00 MiB/s   Max: 1523.47 MiB/s
>   [LNet Rates of servers]
>   [R] Avg: 1734 RPC/s    Min: 0 RPC/s      Max: 3467 RPC/s
>   [W] Avg: 2506 RPC/s    Min: 0 RPC/s      Max: 5012 RPC/s
>   [LNet Bandwidth of servers]
>   [R] Avg: 480.79 MiB/s  Min: 0.00 MiB/s   Max: 961.58 MiB/s
>   [W] Avg: 772.49 MiB/s  Min: 0.00 MiB/s   Max: 1544.98 MiB/s
>   [LNet Rates of servers]
>   [R] Avg: 1722 RPC/s    Min: 0 RPC/s      Max: 3444 RPC/s
>   [W] Avg: 2486 RPC/s    Min: 0 RPC/s      Max: 4972 RPC/s
>   [LNet Bandwidth of servers]
>   [R] Avg: 479.09 MiB/s  Min: 0.00 MiB/s   Max: 958.18 MiB/s
>   [W] Avg: 764.19 MiB/s  Min: 0.00 MiB/s   Max: 1528.38 MiB/s
>   [LNet Rates of servers]
>   [R] Avg: 1741 RPC/s    Min: 0 RPC/s      Max: 3482 RPC/s
>   [W] Avg: 2513 RPC/s    Min: 0 RPC/s      Max: 5025 RPC/s
>   [LNet Bandwidth of servers]
>   [R] Avg: 484.59 MiB/s  Min: 0.00 MiB/s   Max: 969.19 MiB/s
>   [W] Avg: 771.94 MiB/s  Min: 0.00 MiB/s   Max: 1543.87 MiB/s
>   session is ended
>   ./lnet_test.sh: line 17: 4940 Terminated   lst stat servers
>
> So it looks like LNet is under-performing, delivering half or less of
> the InfiniBand capability. How can I find out what is causing this?
>
> Running performance tests with the InfiniBand tools I get good results:
>
>   ************************************
>   * Waiting for client to connect... *
>   ************************************
>
>   ---------------------------------------------------------------------
>                       Send BW Test
>    Dual-port       : OFF          Device         : mlx4_0
>    Number of qps   : 1            Transport type : IB
>    Connection type : RC           Using SRQ      : OFF
>    RX depth        : 512
>    CQ Moderation   : 100
>    Mtu             : 2048[B]
>    Link type       : IB
>    Max inline data : 0[B]
>    rdma_cm QPs     : OFF
>    Data ex. method : Ethernet
>   ---------------------------------------------------------------------
>    local address:  LID 0x07 QPN 0x020f PSN 0xacc37a
>    remote address: LID 0x0a QPN 0x020f PSN 0x91a069
>   ---------------------------------------------------------------------
>    #bytes  #iterations  BW peak[MB/sec]  BW average[MB/sec]  MsgRate[Mpps]
>   Conflicting CPU frequency values detected: 1249.234000 != 1326.000000. CPU Frequency is not max.
>   (the same warning repeats before every row below; further copies are omitted)
>    2        1000  0.00    11.99  6.285330
>    4        1000  0.00    28.26  7.409324
>    8        1000  0.00    54.47  7.139164
>    16       1000  0.00   113.13  7.413889
>    32       1000  0.00   226.07  7.407811
>    64       1000  0.00   452.12  7.407465
>    128      1000  0.00   845.45  6.925918
>    256      1000  0.00  1746.93  7.155406
>    512      1000  0.00  2766.93  5.666682
>    1024     1000  0.00  3516.26  3.600646
>    2048     1000  0.00  3630.93  1.859035
>    4096     1000  0.00  3702.39  0.947813
>    8192     1000  0.00  3724.82  0.476777
>    16384    1000  0.00  3731.21  0.238798
>    32768    1000  0.00  3735.32  0.119530
>    65536    1000  0.00  3736.98  0.059792
>    131072   1000  0.00  3737.80  0.029902
>    262144   1000  0.00  3738.43  0.014954
>    524288   1000  0.00  3738.50  0.007477
>    1048576  1000  0.00  3738.65  0.003739
>    2097152  1000  0.00  3738.65  0.001869
>    4194304  1000  0.00  3738.80  0.000935
>    8388608  1000  0.00  3738.76  0.000467
>   ---------------------------------------------------------------------
>
> The RDMA modules are loaded:
>
>   rpcrdma    90366  0
>   rdma_ucm   26837  0
>   ib_uverbs  51854  2   ib_ucm,rdma_ucm
>   rdma_cm    53755  5   rpcrdma,ko2iblnd,ib_iser,rdma_ucm,ib_isert
>   ib_cm      47149  5   rdma_cm,ib_srp,ib_ucm,ib_srpt,ib_ipoib
>   iw_cm      46022  1   rdma_cm
>   ib_core   210381 15   rdma_cm,ib_cm,iw_cm,rpcrdma,ko2iblnd,mlx4_ib,ib_srp,ib_ucm,ib_iser,ib_srpt,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib,ib_isert
>   sunrpc    334343 17   nfs,nfsd,rpcsec_gss_krb5,auth_rpcgss,lockd,nfsv4,rpcrdma,nfs_acl
>
> I do not know where to look to make LNet perform faster. I am running
> my ib0 interface in connected mode with a 65520-byte MTU.
>
> Any hint will be much appreciated.
>
> Thank you,
>
> Rick
>
> On 8/18/17 9:05 AM, Mannthey, Keith wrote:
>
> I would suggest a few other tests to help isolate where the issue might be:
>
> 1. What is the single-thread "dd" write speed?
>
> 2. lnet_selftest: please see "Chapter 28.
> Testing Lustre Network Performance (LNet Self-Test)" in the Lustre
> manual if this is a new test for you. This will help show how much LNet
> bandwidth you have from your single client. There are tunables in the
> LNet layer that can affect things. Which QDR HCA are you using?
>
> 3. obdfilter-survey: please see "29.3. Testing OST Performance
> (obdfilter-survey)" in the Lustre manual. This test will help
> demonstrate what the backend NVMe/ZFS setup can do at the OBD layer in
> Lustre.
>
> Thanks,
> Keith
>
> -----Original Message-----
> From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Riccardo Veraldi
> Sent: Thursday, August 17, 2017 10:48 PM
> To: Dennis Nelson <dnel...@ddn.com>; lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] Lustre poor performance
>
> This is my lustre.conf:
>
>   [drp-tst-ffb01:~]$ cat /etc/modprobe.d/lustre.conf
>   options lnet networks=o2ib5(ib0),tcp5(enp1s0f0)
>
> Data transfer is over InfiniBand:
>
>   ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 65520
>        inet 172.21.52.83 netmask 255.255.252.0 broadcast 172.21.55.255
>
> On 8/17/17 10:45 PM, Riccardo Veraldi wrote:
>
>> On 8/17/17 9:22 PM, Dennis Nelson wrote:
>>
>> It appears that you are running iozone on a single client? What kind
>> of network is tcp5? Have you looked at the network to make sure it is
>> not the bottleneck?
>
> Yes, the data transfer is on the ib0 interface, and I did a
> memory-to-memory test through InfiniBand QDR resulting in 3.7 GB/s.
> TCP is used to connect to the MDS. It is tcp5 to differentiate it from
> my many other Lustre clusters; I could have called it tcp, but it makes
> no difference performance-wise.
> I ran the test from one single node, yes, and I ran the same test also
> locally on a zpool identical to the one on the Lustre OSS.
> I have 4 identical servers, each of them with the same NVMe disks:
>
>   server1: OSS - OST1 Lustre/ZFS raidz1
>   server2: OSS - OST2 Lustre/ZFS raidz1
>   server3: local ZFS raidz1
>   server4: Lustre client
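Putting the thread's three measurements side by side makes the original gap concrete. A quick sanity calculation (values quoted from the outputs above; the MiB/MB distinction is ignored for this rough check):

```shell
# Rough comparison of the measurements quoted in this thread.
backend_write=7633   # obdfilter-survey write, MB/s (ZFS/NVMe backend)
wire=3738            # ib_send_bw average, MB/s (QDR point-to-point)
lst_write=1548       # first lst run, [W] Max, MiB/s
# Integer percentages are close enough for a sanity check here.
echo "lst moved $(( lst_write * 100 / wire ))% of what ib_send_bw shows the wire can carry"
echo "the ZFS backend can absorb $(( backend_write * 100 / wire ))% of the wire rate"
```

So the backend can take roughly twice what the wire delivers, while the first self-test run moved under half of the wire rate; that is the gap the --concurrency=16 change later closed.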
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org