[lustre-discuss] FC and sector size over 1024kb

2019-08-13 Thread Рачко Антон Сергеевич
Hi. I installed Lustre on a CentOS 7.6 server with an HP MSA 2040 SAN connected
via FC. When I try to mount a Lustre volume with a sector size over 1024 kb, an
error occurs (invalid path, etc.).

Two questions:

1) Is it possible to fix this?

2) Is it safe for the data if I remount from sector_size=1024kb to
sector_size=16384kb?

[root@ast1 ~]# uname -a
Linux ast1 3.10.0-957.10.1.el7_lustre.x86_64 #1 SMP Sun May 26 21:48:35 UTC 
2019 x86_64 x86_64 x86_64 GNU/Linux

[root@ast1 ~]# multipath -ll
mpathe (3600c0ff0001e5f28329a0b5d0100) dm-4 HP  ,MSA 2040 SAN
size=22T features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 4:0:0:3 sdd 8:48  active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 5:0:0:3 sdi 8:128 active ready running
mpathd (3600c0ff0001e5f62439a0b5d0100) dm-3 HP  ,MSA 2040 SAN
size=22T features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 5:0:0:4 sdj 8:144 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 4:0:0:4 sde 8:64  active ready running
mpathc (3600c0ff0001e5f62c6bbe35c0100) dm-0 HP  ,MSA 2040 SAN
size=3.3T features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 5:0:0:2 sdh 8:112 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 4:0:0:2 sdc 8:32  active ready running
mpathb (3600c0ff0001e5f28b8bbe35c0100) dm-1 HP  ,MSA 2040 SAN
size=6.5T features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 4:0:0:1 sdb 8:16  active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 5:0:0:1 sdg 8:96  active ready running
mpatha (3600c0ff0001e5f28e7b2e25c0100) dm-2 HP  ,MSA 2040 SAN
size=558G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 4:0:0:0 sda 8:0   active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 5:0:0:0 sdf 8:80  active ready running
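
For reference, one way to check the logical/physical sector sizes the kernel
actually reports for these multipath devices is blockdev (a diagnostic
sketch; the mpath names match the output above):

[root@ast1 ~]# for m in /dev/mapper/mpath?; do echo "$m: logical=$(blockdev --getss $m) physical=$(blockdev --getpbsz $m)"; done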


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Very bad lnet ethernet read performance

2019-08-13 Thread Raj
Louis,
I would also try:
- turning on selective ack (net.ipv4.tcp_sack=1) on all nodes. This helps
although there is a CVE out there for older kernels.
- turning off checksums (osc.ostid*.checksums). This can be turned off per
OST/FS on clients.
- Increasing max_pages_per_rpc to 16M. Although this may not help with your
reads.
- Increasing max_rpcs_in_flight, and setting max_dirty_mb to 2 x
max_rpcs_in_flight.
- Increasing llite.ostid*.max_read_ahead_mb up to 1024 on clients. Again
this can be set per OST/FS. (A sketch of the commands follows this list.)
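
A sketch of the corresponding commands (the wildcarded osc/llite names and
the values are illustrative starting points, not tested settings; actual
target names are fsname-OSTxxxx-* and depend on your filesystem):

# sysctl -w net.ipv4.tcp_sack=1                  # selective ack, all nodes
# lctl set_param osc.*.checksums=0               # disable client checksums
# lctl set_param osc.*.max_pages_per_rpc=16M     # 16MB bulk RPCs (or 4096 pages)
# lctl set_param osc.*.max_rpcs_in_flight=64
# lctl set_param osc.*.max_dirty_mb=128          # 2 x max_rpcs_in_flight
# lctl set_param llite.*.max_read_ahead_mb=1024  # client readahead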

_Raj

On Mon, Aug 12, 2019 at 12:12 PM Shawn Hall  wrote:

> Do you have Ethernet flow control configured on all ports (especially the
> uplink ports)?  We’ve found that flow control is critical when there are
> mismatched uplink/client port speeds.
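>
> A quick way to inspect and set pause frames on a port (a sketch; eth0 is a
> placeholder interface name, and the switch side must be configured to
> match):
>
> # ethtool -a eth0               # show current rx/tx pause settings
> # ethtool -A eth0 rx on tx on   # enable flow control if the NIC supports it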
>
>
>
> Shawn
>
>
>
> *From:* lustre-discuss  *On
> Behalf Of *Louis Bailleul
> *Sent:* Monday, August 12, 2019 1:08 PM
> *To:* lustre-discuss@lists.lustre.org
> *Subject:* [lustre-discuss] Very bad lnet ethernet read performance
>
>
>
> Hi all,
>
> I am trying to understand what I am doing wrong here.
> I have a Lustre 2.12.1 system backed by NVMe drives under ZFS, for which
> obdfilter-survey gives decent values:
>
> ost  2 sz 536870912K rsz 1024K obj    2 thr  256 write 15267.49 [6580.36,
> 8664.20] rewrite 15225.24 [6559.05, 8900.54] read 19739.86 [9062.25,
> 10429.04]
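>
> For reference, an obdfilter-survey run producing numbers like these would
> be invoked roughly as follows (a sketch; the parameters are illustrative
> and targets must name the real OSTs):
>
> # thrhi=256 nobjhi=2 rszhi=1024 size=524288 case=disk \
>     targets="fsname-OST0000" obdfilter-survey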
>
> But my actual Lustre performance is pretty poor in comparison (I can't top
> 8GB/s write and 13.5GB/s read).
> So I started to question my lnet tuning, but playing with peer_credits and
> max_pages_per_rpc didn't help.
>
> My test setup consists of 133x10G Ethernet clients (uplinks between end
> devices and OSS are 2x100G for every 20 nodes).
> The single OSS is fitted with a bond of 2x100G Ethernet.
>
> I have tried to understand the problem using lnet_selftest, but I'll need
> some help/docs as the results don't make sense to me.
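>
> For reference, a minimal lnet_selftest session producing stats like the
> ones below might look like this sketch (the NIDs are placeholders; lfrom
> and lto match the group names in the output):
>
> # export LST_SESSION=$$
> # lst new_session rw_test
> # lst add_group lfrom 10.0.0.[1-133]@tcp     # clients
> # lst add_group lto 10.0.1.1@tcp             # OSS
> # lst add_batch bulk_rw
> # lst add_test --batch bulk_rw --from lfrom --to lto brw read size=1M
> # lst run bulk_rw
> # lst stat lfrom lto
> # lst end_session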
>
> Testing a single 10G client
>
> [LNet Rates of lfrom]
> [R] Avg: 2231 RPC/s Min: 2231 RPC/s Max: 2231 RPC/s
> [W] Avg: 1156 RPC/s Min: 1156 RPC/s Max: 1156 RPC/s
> [LNet Bandwidth of lfrom]
> [R] Avg: 1075.16  MiB/s Min: 1075.16  MiB/s Max: 1075.16  MiB/s
> [W] Avg: 0.18 MiB/s Min: 0.18 MiB/s Max: 0.18 MiB/s
> [LNet Rates of lto]
> [R] Avg: 1179 RPC/s Min: 1179 RPC/s Max: 1179 RPC/s
> [W] Avg: 2254 RPC/s Min: 2254 RPC/s Max: 2254 RPC/s
> [LNet Bandwidth of lto]
> [R] Avg: 0.19 MiB/s Min: 0.19 MiB/s Max: 0.19 MiB/s
> [W] Avg: 1075.17  MiB/s Min: 1075.17  MiB/s Max: 1075.17  MiB/s
>
> With 10x10G clients :
>
> [LNet Rates of lfrom]
> [R] Avg: 1416 RPC/s Min: 1102 RPC/s Max: 1642 RPC/s
> [W] Avg: 708  RPC/s Min: 551  RPC/s Max: 821  RPC/s
> [LNet Bandwidth of lfrom]
> [R] Avg: 708.20   MiB/s Min: 550.77   MiB/s Max: 820.96   MiB/s
> [W] Avg: 0.11 MiB/s Min: 0.08 MiB/s Max: 0.13 MiB/s
> [LNet Rates of lto]
> [R] Avg: 7084 RPC/s Min: 7084 RPC/s Max: 7084 RPC/s
> [W] Avg: 14165RPC/s Min: 14165RPC/s Max: 14165RPC/s
> [LNet Bandwidth of lto]
> [R] Avg: 1.08 MiB/s Min: 1.08 MiB/s Max: 1.08 MiB/s
> [W] Avg: 7081.86  MiB/s Min: 7081.86  MiB/s Max: 7081.86  MiB/s
>
>
> With all 133x10G clients:
>
> [LNet Rates of lfrom]
> [R] Avg: 510  RPC/s Min: 98   RPC/s Max: 23457RPC/s
> [W] Avg: 510  RPC/s Min: 49   RPC/s Max: 45863RPC/s
> [LNet Bandwidth of lfrom]
> [R] Avg: 169.87   MiB/s Min: 48.77MiB/s Max: 341.26   MiB/s
> [W] Avg: 169.86   MiB/s Min: 0.01 MiB/s Max: 22757.92 MiB/s
> [LNet Rates of lto]
> [R] Avg: 23458RPC/s Min: 23458RPC/s Max: 23458RPC/s
> [W] Avg: 45876RPC/s Min: 45876RPC/s Max: 45876RPC/s
> [LNet Bandwidth of lto]
> [R] Avg: 341.12   MiB/s Min: 341.12   MiB/s Max: 341.12   MiB/s
> [W] Avg: 22758.42 MiB/s Min: 22758.42 MiB/s Max: 22758.42 MiB/s
>
>
> So if I add clients the aggregate write bandwidth more or less stacks, but
> the read bandwidth decreases???
> When throwing all the nodes at the system, I am pretty happy with the
> ~22GB/s write, as this is within 90% of the 2x100G, but the 341MB/s read
> looks very weird considering that this is a third of the performance of a
> single client.
>
> These are my ksocklnd tunings:
>
> # for i in /sys/module/ksocklnd/parameters/*; do echo "$i : $(cat $i)";
> done
> /sys/module/ksocklnd/parameters/credits : 1024
> /sys/module/ksocklnd/parameters/eager_ack : 0
> /sys/module/ksocklnd/parameters/enable_csum : 0
> /sys/module/ksocklnd/parameters/enable_irq_affinity : 0
> /sys/module/ksocklnd/parameters/inject_csum_error : 0
> /sys/module/ksocklnd/parameters/keepalive : 30
> /sys/module/ksocklnd/parameters/keepalive_count : 5
> /sys/module/ksocklnd/parameters/keepalive_idle : 30
> /sys/module/ksocklnd/parameters/keepalive_intvl : 5
> /sys/module/ksocklnd/parameters/max_reconnectms : 6
> /sys/module/ksocklnd/parameters/min_bulk : 1024
> /sys/module/ksocklnd/parameters/min_reconnectms : 1000
> /sys/module/ksocklnd/parameter