Re: [lustre-discuss] [External] Re: Very bad lnet ethernet read performance

2019-08-16 Thread Louis Bailleul
Thanks for the pointers.

Flow control has limited impact at this point (no change under lnet_selftest 
and a ~10% drop when disabled under iperf).
All machines have tcp_sack enabled.
Checksums don't seem to make a difference either.
Bumping up max_rpcs_in_flight didn't improve much, but it seems to have made 
the write speed more consistent.
read_ahead had no effect on read performance.
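
(For anyone following along, these settings can be inspected on a client roughly 
as follows; the osc/llite instance names vary per filesystem, so wildcards are 
used here:)

# sysctl net.ipv4.tcp_sack                    # selective ack on/off
# lctl get_param osc.*.checksums              # data checksums per OSC
# lctl get_param osc.*.max_rpcs_in_flight
# lctl get_param llite.*.max_read_ahead_mb    # client readahead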

At this point I am struggling to understand what actually affects reads.
iperf between clients and OSS gives a combined bandwidth that reaches ~90% of 
link capacity (43.7GB/s), but lnet_selftest maxes out at ~14GB/s, so about 28%.

Any clues on which lnet tunables / settings could have an impact here?
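
(The LNet tunables actually in effect can be dumped with lnetctl; a quick sketch, 
to be run on both clients and OSS:)

# lnetctl net show -v    # per-NI credits, peer_credits, etc.
# lnetctl stats show     # LNet-level message and drop counters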

Best regards,
Louis

On 13/08/2019 12:53, Raj wrote:
Louis,
I would also try:
- turning on selective ack (net.ipv4.tcp_sack=1) on all nodes. This helps 
although there is a CVE out there for older kernels.
- turning off checksum osc.ostid*.checksums. This can be turned off per OST/FS 
on clients.
- Increasing max_pages_per_rpc to 16M. Although this may not help with your 
reads.
- Increasing max_rpcs_in_flight, and setting max_dirty_mb to 2 x max_rpcs_in_flight.
- Increasing llite.ostid*.max_read_ahead_mb up to 1024 on clients. Again 
this can be set per OST/FS. (Example lctl commands for these are sketched below.)
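
(For illustration, the client-side commands for the above would look roughly 
like this; the values and wildcards are placeholders, and 16M RPCs also need 
brw_size raised on the OSTs per the Lustre manual:)

# lctl set_param osc.*.checksums=0
# lctl set_param osc.*.max_pages_per_rpc=16M    # with obdfilter.*.brw_size=16 on the OSS
# lctl set_param osc.*.max_rpcs_in_flight=64
# lctl set_param osc.*.max_dirty_mb=128         # 2 x max_rpcs_in_flight
# lctl set_param llite.*.max_read_ahead_mb=1024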

_Raj

On Mon, Aug 12, 2019 at 12:12 PM Shawn Hall <shawn.h...@nag.com> wrote:
Do you have Ethernet flow control configured on all ports (especially the 
uplink ports)?  We’ve found that flow control is critical when there are 
mismatched uplink/client port speeds.
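
(On the NIC side, pause-frame settings can be checked and changed with ethtool; 
the interface name below is just a placeholder, and switch uplink ports have to 
be checked in the switch OS itself:)

# ethtool -a eth0                 # show current rx/tx pause settings
# ethtool -A eth0 rx on tx on     # enable flow control on the NIC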

Shawn

From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> On Behalf Of Louis Bailleul
Sent: Monday, August 12, 2019 1:08 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Very bad lnet ethernet read performance

Hi all,

I am trying to understand what I am doing wrong here.
I have a Lustre 2.12.1 system backed by NVMe drives under ZFS, for which 
obdfilter-survey gives decent values:
ost  2 sz 536870912K rsz 1024K obj2 thr  256 write 15267.49 [6580.36, 
8664.20] rewrite 15225.24 [6559.05, 8900.54] read 19739.86 [9062.25, 10429.04]
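
(Those numbers came from an obdfilter-survey run along these lines; the size and 
thread/object limits below are placeholders matching the output format above 
rather than the exact invocation:)

# nobjhi=2 thrhi=256 size=1024 case=disk obdfilter-survey
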
But my actual Lustre performance is pretty poor in comparison (I can't top 
8GB/s write and 13.5GB/s read).
So I started to question my lnet tuning, but playing with peer_credits and 
max_pages_per_rpc didn't help.

My test setup consists of 133x10G Ethernet clients (uplinks between end devices 
and OSS are 2x100G for every 20 nodes).
The single OSS is fitted with a 2x100G Ethernet bond.

I have tried to understand the problem using lnet_selftest but I'll need some 
help/documentation as the results don't make sense to me.
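
(The runs below used lst with roughly the following recipe; the NIDs are 
placeholders, and the group names match the lfrom/lto labels in the stats 
output:)

# modprobe lnet_selftest
# export LST_SESSION=$$
# lst new_session read_bw
# lst add_group lfrom 10.0.0.[1-133]@tcp     # clients
# lst add_group lto 10.0.1.1@tcp             # OSS
# lst add_batch bulk_read
# lst add_test --batch bulk_read --from lfrom --to lto brw read size=1M
# lst run bulk_read
# lst stat lfrom lto
# lst end_session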

Testing a single 10G client
[LNet Rates of lfrom]
[R] Avg: 2231 RPC/s Min: 2231 RPC/s Max: 2231 RPC/s
[W] Avg: 1156 RPC/s Min: 1156 RPC/s Max: 1156 RPC/s
[LNet Bandwidth of lfrom]
[R] Avg: 1075.16  MiB/s Min: 1075.16  MiB/s Max: 1075.16  MiB/s
[W] Avg: 0.18 MiB/s Min: 0.18 MiB/s Max: 0.18 MiB/s
[LNet Rates of lto]
[R] Avg: 1179 RPC/s Min: 1179 RPC/s Max: 1179 RPC/s
[W] Avg: 2254 RPC/s Min: 2254 RPC/s Max: 2254 RPC/s
[LNet Bandwidth of lto]
[R] Avg: 0.19 MiB/s Min: 0.19 MiB/s Max: 0.19 MiB/s
[W] Avg: 1075.17  MiB/s Min: 1075.17  MiB/s Max: 1075.17  MiB/s
With 10x10G clients:
[LNet Rates of lfrom]
[R] Avg: 1416 RPC/s Min: 1102 RPC/s Max: 1642 RPC/s
[W] Avg: 708  RPC/s Min: 551  RPC/s Max: 821  RPC/s
[LNet Bandwidth of lfrom]
[R] Avg: 708.20   MiB/s Min: 550.77   MiB/s Max: 820.96   MiB/s
[W] Avg: 0.11 MiB/s Min: 0.08 MiB/s Max: 0.13 MiB/s
[LNet Rates of lto]
[R] Avg: 7084 RPC/s Min: 7084 RPC/s Max: 7084 RPC/s
[W] Avg: 14165 RPC/s Min: 14165 RPC/s Max: 14165 RPC/s
[LNet Bandwidth of lto]
[R] Avg: 1.08 MiB/s Min: 1.08 MiB/s Max: 1.08 MiB/s
[W] Avg: 7081.86  MiB/s Min: 7081.86  MiB/s Max: 7081.86  MiB/s

With all 133x10G clients:
[LNet Rates of lfrom]
[R] Avg: 510  RPC/s Min: 98   RPC/s Max: 23457 RPC/s
[W] Avg: 510  RPC/s Min: 49   RPC/s Max: 45863 RPC/s
[LNet Bandwidth of lfrom]
[R] Avg: 169.87   MiB/s Min: 48.77MiB/s Max: 341.26   MiB/s
[W] Avg: 169.86   MiB/s Min: 0.01 MiB/s Max: 22757.92 MiB/s
[LNet Rates of lto]
[R] Avg: 23458 RPC/s Min: 23458 RPC/s Max: 23458 RPC/s
[W] Avg: 45876 RPC/s Min: 45876 RPC/s Max: 45876 RPC/s
[LNet Bandwidth of lto]
[R] Avg: 341.12   MiB/s Min: 341.12   MiB/s Max: 341.12   MiB/s
[W] Avg: 22758.42 MiB/s Min: 22758.42 MiB/s Max: 22758.42 MiB/s

So if I add clients the aggregate write bandwidth more or less stacks, but the 
read bandwidth decreases???
When throwing all the nodes at the system, I am pretty happy with the ~22GB/s 
write as this is around 90% of the 2x100G, but the 341MB/s read sounds very 
weird considering that this is a third of the performance of a single client.

These are my ksocklnd tunings:
# for i in /sys/module/ksocklnd/parameters/*; do echo "$i 
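
(For reference, a complete form of that kind of dump would be something like:)

# for i in /sys/module/ksocklnd/parameters/*; do echo "$i: $(cat $i)"; done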

[lustre-discuss] How to get CPU and Network usage for a particular OST

2019-08-16 Thread Masudul Hasan Masud Bhuiyan
Hi,
I am not sure if it's possible, but I need to know the CPU and memory usage and
network metrics like packet drops, RTT, etc. for a particular OST. How can I get
this information in real time?

Regards.