From what I can see, I think you just ran the wrong command (lctl list_param -R 
* ), or it doesn't work as you expected on 2.12.3.

But the llite params are definitely there on a *mounted* Lustre client.

This will give you the parameters you're looking for, which you will likely 
need to modify for better read performance:

lctl list_param -R llite | grep max_read_ahead


From: Pinkesh Valdria <pinkesh.vald...@oracle.com>
Date: Friday, 13 December 2019 at 17:33
To: "Moreno Diego (ID SIS)" <diego.mor...@id.ethz.ch>, 
"lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO 
(16MB RPC)

This is how I installed lustre clients (only showing packages installed steps).


cat > /etc/yum.repos.d/lustre.repo << EOF
[hpddLustreserver]
name=CentOS- - Lustre
baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el7/server/
gpgcheck=0

[e2fsprogs]
name=CentOS- - Ldiskfs
baseurl=https://downloads.whamcloud.com/public/e2fsprogs/latest/el7/
gpgcheck=0

[hpddLustreclient]
name=CentOS- - Lustre
baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el7/client/
gpgcheck=0
EOF

yum install lustre-client -y

reboot
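The steps above only install the client packages; as noted later in the thread, the llite.* parameters appear only once the filesystem is actually mounted. A minimal mount sketch, where the MGS NID (10.0.0.2@tcp1) and filesystem name (lfsbv, taken from the transcripts below) are placeholders to substitute for your own setup:

```shell
# Mount the Lustre filesystem so the llite.* parameters become available.
# 10.0.0.2@tcp1 is a placeholder MGS NID; "lfsbv" is the filesystem name
# used elsewhere in this thread -- substitute your own values.
mkdir -p /mnt/lfsbv
mount -t lustre 10.0.0.2@tcp1:/lfsbv /mnt/lfsbv

# The read-ahead tunables should now be listed:
lctl list_param -R llite | grep max_read_ahead
```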



From: "Moreno Diego (ID SIS)" <diego.mor...@id.ethz.ch>
Date: Friday, December 13, 2019 at 2:55 AM
To: Pinkesh Valdria <pinkesh.vald...@oracle.com>, 
"lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO 
(16MB RPC)

From what I can see they exist on my 2.12.3 client node:

[root@rufus4 ~]# lctl list_param -R llite | grep max_read_ahead
llite.reprofs-ffff9f7c3b4a8800.max_read_ahead_mb
llite.reprofs-ffff9f7c3b4a8800.max_read_ahead_per_file_mb
llite.reprofs-ffff9f7c3b4a8800.max_read_ahead_whole_mb

Regards,

Diego


From: Pinkesh Valdria <pinkesh.vald...@oracle.com>
Date: Wednesday, 11 December 2019 at 17:46
To: "Moreno Diego (ID SIS)" <diego.mor...@id.ethz.ch>, 
"lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO 
(16MB RPC)

I was not able to find those parameters on my client nodes, OSS nodes, or MGS 
nodes.  Here is how I was extracting all parameters:

mkdir -p lctl_list_param_R/
cd lctl_list_param_R/
lctl list_param -R *  > lctl_list_param_R

[opc@lustre-client-1 lctl_list_param_R]$ less lctl_list_param_R  | grep ahead
llite.lfsbv-ffff98231c3bc000.statahead_agl
llite.lfsbv-ffff98231c3bc000.statahead_max
llite.lfsbv-ffff98231c3bc000.statahead_running_max
llite.lfsnvme-ffff98232c30e000.statahead_agl
llite.lfsnvme-ffff98232c30e000.statahead_max
llite.lfsnvme-ffff98232c30e000.statahead_running_max
[opc@lustre-client-1 lctl_list_param_R]$

I also tried these commands:

Not working:
On client nodes
lctl get_param llite.lfsbv-*.max_read_ahead_mb
error: get_param: param_path 'llite/lfsbv-*/max_read_ahead_mb': No such file or 
directory
[opc@lustre-client-1 lctl_list_param_R]$

Works
On client nodes
lctl get_param llite.*.statahead_agl
llite.lfsbv-ffff98231c3bc000.statahead_agl=1
llite.lfsnvme-ffff98232c30e000.statahead_agl=1
[opc@lustre-client-1 lctl_list_param_R]$



From: "Moreno Diego (ID SIS)" <diego.mor...@id.ethz.ch>
Date: Tuesday, December 10, 2019 at 2:06 AM
To: Pinkesh Valdria <pinkesh.vald...@oracle.com>, 
"lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO 
(16MB RPC)

With that kind of read performance degradation I would immediately think of 
llite's max_read_ahead parameters on the client. Specifically these 2:

max_read_ahead_mb: the total amount of MB allocated for read ahead; usually 
quite low for bandwidth benchmarking purposes and when there are several files 
per client
max_read_ahead_per_file_mb: the default is quite low for 16MB RPCs (only a few 
RPCs per file)

You probably need to check the effect of increasing both of them.
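For example (the values here are purely illustrative, not recommendations; only the parameter names come from the thread):

```shell
# Inspect the current read-ahead settings on the client
lctl get_param llite.*.max_read_ahead_mb
lctl get_param llite.*.max_read_ahead_per_file_mb

# Illustrative increase for a 16MB-RPC bandwidth test -- tune for your
# workload and client memory; these numbers are examples only
lctl set_param llite.*.max_read_ahead_mb=1024
lctl set_param llite.*.max_read_ahead_per_file_mb=256
```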

Regards,

Diego


From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of 
Pinkesh Valdria <pinkesh.vald...@oracle.com>
Date: Tuesday, 10 December 2019 at 09:40
To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
Subject: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB 
RPC)

I was expecting better or at least the same read performance with Large Bulk IO 
(16MB RPC), but I see a degradation in performance.  Do I need to tune any 
other parameter to benefit from Large Bulk IO?  I would appreciate any pointers 
to troubleshoot further.

Throughput before

-          Read:  2563 MB/s

-          Write:  2585 MB/s

Throughput after

-          Read:  1527 MB/s (down by ~1036 MB/s)

-          Write:  2859 MB/s


Changes I did are:
On oss

-          lctl set_param obdfilter.lfsbv-*.brw_size=16

On clients

-          unmounted and remounted

-          lctl set_param osc.lfsbv-OST*.max_pages_per_rpc=4096  (got 
auto-updated after re-mount)

-          lctl set_param osc.*.max_rpcs_in_flight=64   (Had to manually 
increase this to 64,  since after re-mount, it was auto-set to 8,  but 
read/write performance was poor)

-          lctl set_param osc.*.max_dirty_mb=2040  (setting the value to 2048 
failed with "Numerical result out of range"; previously it was set to 2000, 
when I got good performance)
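As seen above with max_rpcs_in_flight being reset to 8, plain lctl set_param on a client does not survive a remount. If these tunings need to persist, one option (assuming the MGS node is accessible and runs Lustre 2.5 or later) is to set them permanently from the MGS with -P; a sketch using the fsname from this thread:

```shell
# Run on the MGS node: -P records the setting so clients re-apply it
# automatically after remounts. "lfsbv" is the filesystem name used in
# this thread -- substitute your own.
lctl set_param -P osc.lfsbv-OST*.max_rpcs_in_flight=64
lctl set_param -P osc.lfsbv-OST*.max_dirty_mb=2040
```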


My other settings:

-          lnetctl net add --net tcp1 --if $interface --peer-timeout 180 
--peer-credits 128 --credits 1024

-          echo "options ksocklnd nscheds=10 sock_timeout=100 credits=2560 
peer_credits=63 enable_irq_affinity=0"  >  /etc/modprobe.d/ksocklnd.conf

-          lfs setstripe -c 1 -S 1M /mnt/mdt_bv/test1
