[lustre-discuss] max_pages_per_rpc=4096 fails on the client nodes

2019-08-14 Thread Pinkesh Valdria
I want to enable a larger RPC size. I followed the steps in the Lustre 
manual, section 33.9.2 Usage (http://doc.lustre.org/lustre_manual.xhtml), but 
I get the error below when I try to update the client.

 

Updated the OSS server: 

[root@lustre-oss-server-nic0-1 test]# lctl set_param obdfilter.lfsbv-*.brw_size=16

obdfilter.lfsbv-OST.brw_size=16

obdfilter.lfsbv-OST0001.brw_size=16

obdfilter.lfsbv-OST0002.brw_size=16

obdfilter.lfsbv-OST0003.brw_size=16

obdfilter.lfsbv-OST0004.brw_size=16

obdfilter.lfsbv-OST0005.brw_size=16

obdfilter.lfsbv-OST0006.brw_size=16

obdfilter.lfsbv-OST0007.brw_size=16

obdfilter.lfsbv-OST0008.brw_size=16

obdfilter.lfsbv-OST0009.brw_size=16

[root@lustre-oss-server-nic0-1 test]#
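
The new value can be read back as a sanity check (brw_size is expressed in MB):

[root@lustre-oss-server-nic0-1 test]# lctl get_param obdfilter.lfsbv-*.brw_size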

 

Made the change persistent from the MGS node: 

[root@lustre-mds-server-nic0-1 ~]# lctl set_param -P obdfilter.lfsbv-*.brw_size=16

[root@lustre-mds-server-nic0-1 ~]#

 

 

Client-side update – failed 

[root@lustre-client-1 ~]# lctl set_param osc.lfsbv-OST*.max_pages_per_rpc=4096

error: set_param: setting /proc/fs/lustre/osc/lfsbv-OST-osc-8e66b4b08000/max_pages_per_rpc=4096: Numerical result out of range

error: set_param: setting /proc/fs/lustre/osc/lfsbv-OST0001-osc-8e66b4b08000/max_pages_per_rpc=4096: Numerical result out of range

error: set_param: setting /proc/fs/lustre/osc/lfsbv-OST0002-osc-8e66b4b08000/max_pages_per_rpc=4096: Numerical result out of range

error: set_param: setting /proc/fs/lustre/osc/lfsbv-OST0003-osc-8e66b4b08000/max_pages_per_rpc=4096: Numerical result out of range

…..

…..
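
This matches the caution in the manual section quoted below: the client negotiates its maximum RPC size with each OST at connect time, so an already-mounted client is still capped at the previously negotiated brw_size (typically 4MB, i.e. 1024 pages with a 4KB page size) and rejects 4096 as out of range until it reconnects. The current ceiling can be checked with:

[root@lustre-client-1 ~]# lctl get_param osc.lfsbv-OST*.max_pages_per_rpc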

 

 

33.9.2. Usage

In order to enable a larger RPC size, brw_size must be changed to an IO size 
value up to 16MB. To temporarily change brw_size, the following command should 
be run on the OSS:

oss# lctl set_param obdfilter.fsname-OST*.brw_size=16

To persistently change brw_size, the following command should be run:

oss# lctl set_param -P obdfilter.fsname-OST*.brw_size=16

When a client connects to an OST target, it will fetch brw_size from the target 
and pick the maximum value of brw_size and its local setting for 
max_pages_per_rpc as the actual RPC size. Therefore, the max_pages_per_rpc on 
the client side would have to be set to 16M, or 4096 if the PAGESIZE is 4KB, to 
enable a 16MB RPC. To temporarily make the change, the following command should 
be run on the client to set max_pages_per_rpc:

client$ lctl set_param osc.fsname-OST*.max_pages_per_rpc=16M

To persistently make this change, the following command should be run:

client$ lctl set_param -P obdfilter.fsname-OST*.osc.max_pages_per_rpc=16M

Caution

The brw_size of an OST can be changed on the fly. However, clients have to be 
remounted to renegotiate the new maximum RPC size.
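
For reference, the arithmetic behind the two equivalent client-side forms: with a 4KB page size, 16MB / 4KB per page = 4096 pages, so max_pages_per_rpc=16M and max_pages_per_rpc=4096 request the same RPC size.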

 

 

 

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] max_pages_per_rpc=4096 fails on the client nodes

2019-08-14 Thread Pinkesh Valdria
For others, in case they face this issue:


Solution: I had to unmount and remount the client for the command to work. 
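
For example, something along these lines (the mount point and MGS NID below are placeholders; substitute your own):

[root@lustre-client-1 ~]# umount /mnt/lustre
[root@lustre-client-1 ~]# mount -t lustre <mgs-nid>@tcp:/lfsbv /mnt/lustre
[root@lustre-client-1 ~]# lctl set_param osc.lfsbv-OST*.max_pages_per_rpc=4096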

 



[lustre-discuss] Frequent, silent OSS hangs on multi-homed system

2019-08-14 Thread Kirk, Benjamin (JSC-EG311)
Hi, I'd love some ideas to debug what has become a frequent annoyance for us. 
At a high level, we're observing fairly frequent OSS hangs, with absolutely 
no console or logging activity. Our BMC watchdogs then reboot the OSS, and ~6 
minutes later everything is back in line. This had been an infrequent 
occurrence on this system for a couple of years, but has become much more frequent 
in recent months.

I'd love any suggestions for either lustre/lnet or overall kernel tricks to raise 
the logging level, if possible, to see if we can get some more useful output. 
Right now we're blind.
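
For concreteness, the sort of knobs I've been considering (generic Lustre/kernel debug settings with example values, not a known fix):

oss# lctl set_param debug=+rpctrace      # widen the Lustre debug mask (e.g. also +net, +dlmtrace)
oss# lctl set_param debug_mb=512         # enlarge the in-memory debug buffer
oss# sysctl -w kernel.hung_task_timeout_secs=120   # make hung-task detection louder
oss# echo t > /proc/sysrq-trigger        # dump all task states to the console (requires kernel.sysrq enabled)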

More details below, and also what I'd characterize as uninformed speculation:

-) the overall system is (2x) MDS, (12x) OSS, and (2x) monitoring nodes, all identical 
servers, network cards, etc... 

-) the only difference is the JBOD type: the OSSes are connected to Supermicro 90-bay 
SC946ED-R2KJBOD enclosures. All other server hardware is identical. 

-) only the OSSes hang in this manner. Looking back, some seem more prone 
than others, but it's not obviously limited to just a few.

-) CentOS 7.6, lustre 2.10.8, ZFS 0.7.9

-) 2 active file systems: one is pure ZFS, the other uses ZFS on the OSSes with an ldiskfs MDT

-) Mellanox ConnectX3 FDR IB & 40GbE

-) LSI 9300-8e HBA

-) Lustre servers are triple-homed, they live on (2x) IB and (1x) 40GbE networks

-) previously, when we first moved to 2.10, we were bitten hard and frequently by 
LU-10163 (which may or may not be relevant)

-) the hangs don't correlate with any discrete event, as best I can tell.  
Importantly, we get no LBUGs or anything, which is different from the previous 
signature.

-) We have definitely stepped up the traffic on the ethernet network this year. 
 Whereas the primary I/O was previously just on the two IB networks, we are now 
taxing the ethernet as well with some regularity.

Any thoughts are most welcome, and thanks!

-Ben



