Re: [lustre-discuss] Very bad lnet ethernet read performance
Louis,

I would also try:

- Turning on selective ACK (net.ipv4.tcp_sack=1) on all nodes. This helps, although there is a CVE out there for older kernels.
- Turning off checksums (osc.ostid*.checksums). This can be turned off per OST/FS on clients.
- Increasing max_pages_per_rpc so that RPCs can reach 16MB. Although this may not help with your reads.
- Increasing max_rpcs_in_flight, and setting max_dirty_mb to 2 x max_rpcs_in_flight.
- Increasing llite.ostid*.max_read_ahead_mb to up to 1024 on clients. Again, this can be set per OST/FS.

(Rough command equivalents for these, and a sample lnet_selftest session, are sketched at the end of this message.)

_Raj

On Mon, Aug 12, 2019 at 12:12 PM Shawn Hall wrote:

> Do you have Ethernet flow control configured on all ports (especially the uplink ports)? We’ve found that flow control is critical when there are mismatched uplink/client port speeds.
>
> Shawn
>
> *From:* lustre-discuss *On Behalf Of* Louis Bailleul
> *Sent:* Monday, August 12, 2019 1:08 PM
> *To:* lustre-discuss@lists.lustre.org
> *Subject:* [lustre-discuss] Very bad lnet ethernet read performance
>
> Hi all,
>
> I am trying to understand what I am doing wrong here.
> I have a Lustre 2.12.1 system backed by NVMe drives under ZFS, for which obdfilter-survey gives decent values:
>
> ost 2 sz 536870912K rsz 1024K obj 2 thr 256 write 15267.49 [6580.36, 8664.20] rewrite 15225.24 [6559.05, 8900.54] read 19739.86 [9062.25, 10429.04]
>
> But my actual Lustre performance is pretty poor in comparison (I can't top 8GB/s write and 13.5GB/s read).
> So I started to question my LNet tuning, but playing with peer_credits and max_pages_per_rpc didn't help.
>
> My test setup consists of 133 x 10G Ethernet clients (uplinks between end devices and the OSS are 2x100G for every 20 nodes).
> The single OSS is fitted with a bond of 2x100G Ethernet.
>
> I have tried to understand the problem using lnet_selftest, but I'll need some help/doco, as this doesn't make sense to me.
>
> Testing a single 10G client:
>
> [LNet Rates of lfrom]
> [R] Avg: 2231 RPC/s Min: 2231 RPC/s Max: 2231 RPC/s
> [W] Avg: 1156 RPC/s Min: 1156 RPC/s Max: 1156 RPC/s
> [LNet Bandwidth of lfrom]
> [R] Avg: 1075.16 MiB/s Min: 1075.16 MiB/s Max: 1075.16 MiB/s
> [W] Avg: 0.18 MiB/s Min: 0.18 MiB/s Max: 0.18 MiB/s
> [LNet Rates of lto]
> [R] Avg: 1179 RPC/s Min: 1179 RPC/s Max: 1179 RPC/s
> [W] Avg: 2254 RPC/s Min: 2254 RPC/s Max: 2254 RPC/s
> [LNet Bandwidth of lto]
> [R] Avg: 0.19 MiB/s Min: 0.19 MiB/s Max: 0.19 MiB/s
> [W] Avg: 1075.17 MiB/s Min: 1075.17 MiB/s Max: 1075.17 MiB/s
>
> With 10x10G clients:
>
> [LNet Rates of lfrom]
> [R] Avg: 1416 RPC/s Min: 1102 RPC/s Max: 1642 RPC/s
> [W] Avg: 708 RPC/s Min: 551 RPC/s Max: 821 RPC/s
> [LNet Bandwidth of lfrom]
> [R] Avg: 708.20 MiB/s Min: 550.77 MiB/s Max: 820.96 MiB/s
> [W] Avg: 0.11 MiB/s Min: 0.08 MiB/s Max: 0.13 MiB/s
> [LNet Rates of lto]
> [R] Avg: 7084 RPC/s Min: 7084 RPC/s Max: 7084 RPC/s
> [W] Avg: 14165 RPC/s Min: 14165 RPC/s Max: 14165 RPC/s
> [LNet Bandwidth of lto]
> [R] Avg: 1.08 MiB/s Min: 1.08 MiB/s Max: 1.08 MiB/s
> [W] Avg: 7081.86 MiB/s Min: 7081.86 MiB/s Max: 7081.86 MiB/s
>
> With all 133x10G clients:
>
> [LNet Rates of lfrom]
> [R] Avg: 510 RPC/s Min: 98 RPC/s Max: 23457 RPC/s
> [W] Avg: 510 RPC/s Min: 49 RPC/s Max: 45863 RPC/s
> [LNet Bandwidth of lfrom]
> [R] Avg: 169.87 MiB/s Min: 48.77 MiB/s Max: 341.26 MiB/s
> [W] Avg: 169.86 MiB/s Min: 0.01 MiB/s Max: 22757.92 MiB/s
> [LNet Rates of lto]
> [R] Avg: 23458 RPC/s Min: 23458 RPC/s Max: 23458 RPC/s
> [W] Avg: 45876 RPC/s Min: 45876 RPC/s Max: 45876 RPC/s
> [LNet Bandwidth of lto]
> [R] Avg: 341.12 MiB/s Min: 341.12 MiB/s Max: 341.12 MiB/s
> [W] Avg: 22758.42 MiB/s Min: 22758.42 MiB/s Max: 22758.42 MiB/s
>
> So if I add clients, the aggregate write bandwidth somewhat stacks, but the read bandwidth decreases???
> When throwing all the nodes at the system, I am pretty happy with the ~22GB/s on write, as this is about 90% of the 2x100G, but the 341MB/s read seems very weird considering that this is a third of the performance of a single client.
>
> These are my ksocklnd tunings:
>
> # for i in /sys/module/ksocklnd/parameters/*; do echo "$i : $(cat $i)"; done
> /sys/module/ksocklnd/parameters/credits : 1024
> /sys/module/ksocklnd/parameters/eager_ack : 0
> /sys/module/ksocklnd/parameters/enable_csum : 0
> /sys/module/ksocklnd/parameters/enable_irq_affinity : 0
> /sys/module/ksocklnd/parameters/inject_csum_error : 0
> /sys/module/ksocklnd/parameters/keepalive : 30
> /sys/module/ksocklnd/parameters/keepalive_count : 5
> /sys/module/ksocklnd/parameters/keepalive_idle : 30
> /sys/module/ksocklnd/parameters/keepalive_intvl : 5
> /sys/module/ksocklnd/parameters/max_reconnectms : 6
> /sys/module/ksocklnd/parameters/min_bulk : 1024
> /sys/module/ksocklnd/parameters/min_reconnectms : 1000
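As a rough sketch of the suggestions above on a client (the osc.*/llite.* wildcards and the in-flight value of 16 are illustrative, so narrow them to your FS name; max_pages_per_rpc is counted in 4KiB pages, so 4096 pages = 16MB RPCs):

# sysctl -w net.ipv4.tcp_sack=1
# lctl set_param osc.*.checksums=0
# lctl set_param osc.*.max_pages_per_rpc=4096
# lctl set_param osc.*.max_rpcs_in_flight=16
# lctl set_param osc.*.max_dirty_mb=32            # 2 x max_rpcs_in_flight
# lctl set_param llite.*.max_read_ahead_mb=1024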
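And since you asked for doco on lnet_selftest: a minimal read session that produces lfrom/lto stats like the ones you pasted looks roughly like this (the NIDs are placeholders for your clients and OSS, and the concurrency value is illustrative):

# export LST_SESSION=$$                           # lst needs a session ID in the environment
# lst new_session read_test
# lst add_group lfrom 10.0.0.[10-20]@tcp          # client NIDs (placeholders)
# lst add_group lto 10.0.0.1@tcp                  # OSS NID (placeholder)
# lst add_batch bulk_read
# lst add_test --batch bulk_read --from lfrom --to lto --concurrency 8 brw read size=1M
# lst run bulk_read
# lst stat lfrom lto                              # prints the [LNet Rates]/[LNet Bandwidth] blocks
# lst end_session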
Re: [lustre-discuss] Very bad lnet ethernet read performance
Do you have Ethernet flow control configured on all ports (especially the uplink ports)? We’ve found that flow control is critical when there are mismatched uplink/client port speeds.

Shawn

From: lustre-discuss On Behalf Of Louis Bailleul
Sent: Monday, August 12, 2019 1:08 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Very bad lnet ethernet read performance

Hi all,

I am trying to understand what I am doing wrong here.
I have a Lustre 2.12.1 system backed by NVMe drives under ZFS, for which obdfilter-survey gives decent values:

ost 2 sz 536870912K rsz 1024K obj 2 thr 256 write 15267.49 [6580.36, 8664.20] rewrite 15225.24 [6559.05, 8900.54] read 19739.86 [9062.25, 10429.04]

But my actual Lustre performance is pretty poor in comparison (I can't top 8GB/s write and 13.5GB/s read).
So I started to question my LNet tuning, but playing with peer_credits and max_pages_per_rpc didn't help.

My test setup consists of 133 x 10G Ethernet clients (uplinks between end devices and the OSS are 2x100G for every 20 nodes).
The single OSS is fitted with a bond of 2x100G Ethernet.

I have tried to understand the problem using lnet_selftest, but I'll need some help/doco, as this doesn't make sense to me.

Testing a single 10G client:

[LNet Rates of lfrom]
[R] Avg: 2231 RPC/s Min: 2231 RPC/s Max: 2231 RPC/s
[W] Avg: 1156 RPC/s Min: 1156 RPC/s Max: 1156 RPC/s
[LNet Bandwidth of lfrom]
[R] Avg: 1075.16 MiB/s Min: 1075.16 MiB/s Max: 1075.16 MiB/s
[W] Avg: 0.18 MiB/s Min: 0.18 MiB/s Max: 0.18 MiB/s
[LNet Rates of lto]
[R] Avg: 1179 RPC/s Min: 1179 RPC/s Max: 1179 RPC/s
[W] Avg: 2254 RPC/s Min: 2254 RPC/s Max: 2254 RPC/s
[LNet Bandwidth of lto]
[R] Avg: 0.19 MiB/s Min: 0.19 MiB/s Max: 0.19 MiB/s
[W] Avg: 1075.17 MiB/s Min: 1075.17 MiB/s Max: 1075.17 MiB/s

With 10x10G clients:

[LNet Rates of lfrom]
[R] Avg: 1416 RPC/s Min: 1102 RPC/s Max: 1642 RPC/s
[W] Avg: 708 RPC/s Min: 551 RPC/s Max: 821 RPC/s
[LNet Bandwidth of lfrom]
[R] Avg: 708.20 MiB/s Min: 550.77 MiB/s Max: 820.96 MiB/s
[W] Avg: 0.11 MiB/s Min: 0.08 MiB/s Max: 0.13 MiB/s
[LNet Rates of lto]
[R] Avg: 7084 RPC/s Min: 7084 RPC/s Max: 7084 RPC/s
[W] Avg: 14165 RPC/s Min: 14165 RPC/s Max: 14165 RPC/s
[LNet Bandwidth of lto]
[R] Avg: 1.08 MiB/s Min: 1.08 MiB/s Max: 1.08 MiB/s
[W] Avg: 7081.86 MiB/s Min: 7081.86 MiB/s Max: 7081.86 MiB/s

With all 133x10G clients:

[LNet Rates of lfrom]
[R] Avg: 510 RPC/s Min: 98 RPC/s Max: 23457 RPC/s
[W] Avg: 510 RPC/s Min: 49 RPC/s Max: 45863 RPC/s
[LNet Bandwidth of lfrom]
[R] Avg: 169.87 MiB/s Min: 48.77 MiB/s Max: 341.26 MiB/s
[W] Avg: 169.86 MiB/s Min: 0.01 MiB/s Max: 22757.92 MiB/s
[LNet Rates of lto]
[R] Avg: 23458 RPC/s Min: 23458 RPC/s Max: 23458 RPC/s
[W] Avg: 45876 RPC/s Min: 45876 RPC/s Max: 45876 RPC/s
[LNet Bandwidth of lto]
[R] Avg: 341.12 MiB/s Min: 341.12 MiB/s Max: 341.12 MiB/s
[W] Avg: 22758.42 MiB/s Min: 22758.42 MiB/s Max: 22758.42 MiB/s

So if I add clients, the aggregate write bandwidth somewhat stacks, but the read bandwidth decreases???
When throwing all the nodes at the system, I am pretty happy with the ~22GB/s on write, as this is about 90% of the 2x100G, but the 341MB/s read seems very weird considering that this is a third of the performance of a single client.

These are my ksocklnd tunings:

# for i in /sys/module/ksocklnd/parameters/*; do echo "$i : $(cat $i)"; done
/sys/module/ksocklnd/parameters/credits : 1024
/sys/module/ksocklnd/parameters/eager_ack : 0
/sys/module/ksocklnd/parameters/enable_csum : 0
/sys/module/ksocklnd/parameters/enable_irq_affinity : 0
/sys/module/ksocklnd/parameters/inject_csum_error : 0
/sys/module/ksocklnd/parameters/keepalive : 30
/sys/module/ksocklnd/parameters/keepalive_count : 5
/sys/module/ksocklnd/parameters/keepalive_idle : 30
/sys/module/ksocklnd/parameters/keepalive_intvl : 5
/sys/module/ksocklnd/parameters/max_reconnectms : 6
/sys/module/ksocklnd/parameters/min_bulk : 1024
/sys/module/ksocklnd/parameters/min_reconnectms : 1000
/sys/module/ksocklnd/parameters/nagle : 0
/sys/module/ksocklnd/parameters/nconnds : 4
/sys/module/ksocklnd/parameters/nconnds_max : 64
/sys/module/ksocklnd/parameters/nonblk_zcack : 1
/sys/module/ksocklnd/parameters/nscheds : 12
/sys/module/ksocklnd/parameters/peer_buffer_credits : 0
/sys/module/ksocklnd/parameters/peer_credits : 128
/sys/module/ksocklnd/parameters/peer_timeout : 180
/sys/module/ksocklnd/parameters/round_robin : 1
/sys/module/ksocklnd/parameters/rx_buffer_size : 0
/sys/module/ksocklnd/parameters/sock_timeout : 50
/sys/module/ksocklnd/parameters/tx_buffer_size : 0
/sys/module/ksocklnd/parameters/typed_conns : 1
/sys/module/ksocklnd/parameters/zc_min_payload : 16384
/sys/module/ksocklnd/parameters/zc_recv : 0
/sys/module/ksocklnd/parameters/zc_recv_min_nfrags : 16

Best regards,
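For reference on the flow control question: pause-frame settings can be checked and, where the NIC/driver supports it, toggled per port with ethtool. The interface name below is a placeholder, and the switch ports need matching configuration:

# ethtool -a eth0                 # show current pause-frame (flow control) settings
# ethtool -A eth0 rx on tx on     # enable RX/TX flow control, if supported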
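Also for reference, ksocklnd module parameters like the listing above are normally made persistent through a modprobe options file rather than written to sysfs at runtime; a sketch using a few of the values shown (the file name is just the usual convention):

# cat /etc/modprobe.d/ksocklnd.conf
options ksocklnd credits=1024 peer_credits=128 nscheds=12 nagle=0 enable_csum=0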