Re: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)

2020-01-22 Thread Pinkesh Valdria
To close the loop on this topic.   

 

The parameters below were not set by default, so they were not showing up in the lctl list_param output. I had to set them first.

lctl set_param llite.*.max_read_ahead_mb=256

lctl set_param llite.*.max_read_ahead_per_file_mb=256
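
For completeness, a note on persistence: a plain lctl set_param only changes the running client and is lost after a remount or reboot. A minimal sketch of how to verify the values and make them stick, assuming a release where "lctl set_param -P" is available (it is run once on the MGS and pushed to all clients):

# on each client, confirm the live values
lctl get_param llite.*.max_read_ahead_mb llite.*.max_read_ahead_per_file_mb

# on the MGS, make the settings persistent (Lustre 2.5 and later)
lctl set_param -P llite.*.max_read_ahead_mb=256
lctl set_param -P llite.*.max_read_ahead_per_file_mb=256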

 

 

Thanks to the Lustre community for their help with tuning. I was able to tune Lustre on Oracle Cloud Infrastructure to get good performance on bare metal nodes with a 2x25 Gbps network. We have open sourced the deployment of Lustre on Oracle Cloud, along with all of the performance tuning done at the infrastructure level and at the Lustre filesystem level, so that everyone can benefit from it.

 

https://github.com/oracle-quickstart/oci-lustre

Terraform files are in:
https://github.com/oracle-quickstart/oci-lustre/tree/master/terraform

Tuning scripts are in this folder:  
https://github.com/oracle-quickstart/oci-lustre/tree/master/scripts

 

 

As a next step, I plan to test deploying Lustre on a 100 Gbps RoCEv2 RDMA network (Mellanox CX5).
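
For that test the clients would move from ksocklnd to the o2ib LND (ko2iblnd), which also handles RoCE. A minimal sketch of the client-side LNet setup, where enp94s0f0 is a hypothetical name for the RoCE-capable interface:

modprobe lnet
lnetctl lnet configure

# add an o2ib network on the RDMA-capable port (works for RoCEv2 as well as InfiniBand)
lnetctl net add --net o2ib0 --if enp94s0f0

# the client NID should now show up as <ip>@o2ib0
lnetctl net show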

 

 

Thanks, 

Pinkesh Valdria 

Oracle Cloud – Principal Solutions Architect 

https://blogs.oracle.com/cloud-infrastructure/lustre-file-system-performance-on-oracle-cloud-infrastructure

https://blogs.oracle.com/author/pinkesh-valdria

 

 

Re: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)

2019-12-13 Thread Pinkesh Valdria
I ran the latest command you provided and it does not show the parameters, as you can see below. I can do a screen share.

 

 

[opc@lustre-client-1 ~]$ df -h
Filesystem              Size  Used Avail Use% Mounted on
/dev/sda3                39G  2.5G   36G   7% /
devtmpfs                158G     0  158G   0% /dev
tmpfs                   158G     0  158G   0% /dev/shm
tmpfs                   158G   17M  158G   1% /run
tmpfs                   158G     0  158G   0% /sys/fs/cgroup
/dev/sda1               512M   12M  501M   3% /boot/efi
10.0.3.6@tcp1:/lfsbv     50T   89M   48T   1% /mnt/mdt_bv
10.0.3.6@tcp1:/lfsnvme  185T  8.7M  176T   1% /mnt/mdt_nvme
tmpfs                    32G     0   32G   0% /run/user/1000

 

 

[opc@lustre-client-1 ~]$ lctl list_param -R llite | grep max_read_ahead

[opc@lustre-client-1 ~]$

 

So I ran this: 

 

[opc@lustre-client-1 ~]$ lctl list_param -R llite  >  llite_parameters.txt

 

There are other parameters under llite.   I attached the complete list. 
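
In case it helps narrow this down, the read-ahead files can also be looked for directly on the client, independent of how lctl resolves the parameter path; the location varies between releases, so this checks the proc, sysfs and debugfs trees at once:

find /proc/fs/lustre /sys/fs/lustre /sys/kernel/debug/lustre \
    -name 'max_read_ahead*' 2>/dev/null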

 

 

From: "Moreno Diego (ID SIS)" 
Date: Friday, December 13, 2019 at 8:36 AM
To: Pinkesh Valdria , 
"lustre-discuss@lists.lustre.org" 
Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO 
(16MB RPC)

 

>From what I can see I think you just ran the wrong command (lctl list_param -R 
>* ) or it doesn’t work as you expected on 2.12.3.

 

But llite params are sure there on a *mounted* Lustre client. 

 

This will give you the parameters you’re looking for and need to modify to 
have, likely, better read performance:

 

lctl list_param -R llite | grep max_read_ahead

 

 

From: Pinkesh Valdria 
Date: Friday, 13 December 2019 at 17:33
To: "Moreno Diego (ID SIS)" , 
"lustre-discuss@lists.lustre.org" 
Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO 
(16MB RPC)

 

This is how I installed lustre clients (only showing packages installed steps). 

 

 

cat > /etc/yum.repos.d/lustre.repo << EOF

[hpddLustreserver]

name=CentOS- - Lustre

baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el7/server/

gpgcheck=0

 

[e2fsprogs]

name=CentOS- - Ldiskfs

baseurl=https://downloads.whamcloud.com/public/e2fsprogs/latest/el7/

gpgcheck=0

 

[hpddLustreclient]

name=CentOS- - Lustre

baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el7/client/

gpgcheck=0

EOF

 

yum  install  lustre-client  -y

 

reboot

 

 

 

From: "Moreno Diego (ID SIS)" 
Date: Friday, December 13, 2019 at 2:55 AM
To: Pinkesh Valdria , 
"lustre-discuss@lists.lustre.org" 
Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO 
(16MB RPC)

 

>From what I can see they exist on my 2.12.3 client node:

 

[root@rufus4 ~]# lctl list_param -R llite | grep max_read_ahead

llite.reprofs-9f7c3b4a8800.max_read_ahead_mb

llite.reprofs-9f7c3b4a8800.max_read_ahead_per_file_mb

llite.reprofs-9f7c3b4a8800.max_read_ahead_whole_mb

 

Regards,

 

Diego

 

 

From: Pinkesh Valdria 
Date: Wednesday, 11 December 2019 at 17:46
To: "Moreno Diego (ID SIS)" , 
"lustre-discuss@lists.lustre.org" 
Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO 
(16MB RPC)

 

I was not able to find those parameters on my client nodes,  OSS or MGS nodes.  
 Here is how I was extracting all parameters .  

 

mkdir -p lctl_list_param_R/

cd lctl_list_param_R/

lctl list_param -R *  > lctl_list_param_R

 

[opc@lustre-client-1 lctl_list_param_R]$ less lctl_list_param_R  | grep ahead

llite.lfsbv-98231c3bc000.statahead_agl

llite.lfsbv-98231c3bc000.statahead_max

llite.lfsbv-98231c3bc000.statahead_running_max

llite.lfsnvme-98232c30e000.statahead_agl

llite.lfsnvme-98232c30e000.statahead_max

llite.lfsnvme-98232c30e000.statahead_running_max

[opc@lustre-client-1 lctl_list_param_R]$

 

I also tried these commands:  

 

Not working: 

On client nodes

lctl get_param llite.lfsbv-*.max_read_ahead_mb

error: get_param: param_path 'llite/lfsbv-*/max_read_ahead_mb': No such file or 
directory

[opc@lustre-client-1 lctl_list_param_R]$

 

Works 

On client nodes

lctl get_param llite.*.statahead_agl

llite.lfsbv-98231c3bc000.statahead_agl=1

llite.lfsnvme-98232c30e000.statahead_agl=1

[opc@lustre-client-1 lctl_list_param_R]$

 

 

 

From: "Moreno Diego (ID SIS)" 
Date: Tuesday, December 10, 2019 at 2:06 AM
To: Pinkesh Valdria , 
"lustre-discuss@lists.lustre.org" 
Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO 
(16MB RPC)

 

With that kind of degradation performance on read I would immediately think on 
llite’s max_read_ahead parameters on the client. Specifically these 2:

 

max_read_ahead_mb: total amount of MB allocated for read ahead, usually quite 
low for bandwidth benchmarking purposes and when there’re several files per 
client

max_read_ahead_per_file_mb: the defa

Re: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)

2019-12-13 Thread Moreno Diego (ID SIS)
From what I can see, I think you just ran the wrong command (lctl list_param -R *), or it does not work as you expected on 2.12.3.

But the llite params are definitely there on a *mounted* Lustre client.

This will give you the parameters you are looking for, which you will likely need to modify to get better read performance:

lctl list_param -R llite | grep max_read_ahead
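
Once they are visible, the current values can be read directly; the defaults differ between releases, so the numbers printed will vary:

lctl get_param llite.*.max_read_ahead_mb llite.*.max_read_ahead_per_file_mb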



Re: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)

2019-12-13 Thread Pinkesh Valdria
This is how I installed the Lustre clients (showing only the package installation steps).

 

 

cat > /etc/yum.repos.d/lustre.repo << EOF
[hpddLustreserver]
name=CentOS- - Lustre
baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el7/server/
gpgcheck=0

[e2fsprogs]
name=CentOS- - Ldiskfs
baseurl=https://downloads.whamcloud.com/public/e2fsprogs/latest/el7/
gpgcheck=0

[hpddLustreclient]
name=CentOS- - Lustre
baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el7/client/
gpgcheck=0
EOF

yum install lustre-client -y

reboot
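
A quick sanity check after the reboot, to confirm the expected client version is actually running (illustrative only; package names differ slightly if DKMS builds are used):

rpm -qa | grep -i lustre        # expect lustre-client and the matching kmod package
modprobe lustre                 # load the client modules
lctl get_param version          # version of the running Lustre client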

 

 

 

From: "Moreno Diego (ID SIS)" 
Date: Friday, December 13, 2019 at 2:55 AM
To: Pinkesh Valdria , 
"lustre-discuss@lists.lustre.org" 
Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO 
(16MB RPC)

 

From what I can see they exist on my 2.12.3 client node:

 

[root@rufus4 ~]# lctl list_param -R llite | grep max_read_ahead

llite.reprofs-9f7c3b4a8800.max_read_ahead_mb

llite.reprofs-9f7c3b4a8800.max_read_ahead_per_file_mb

llite.reprofs-9f7c3b4a8800.max_read_ahead_whole_mb

 

Regards,

 

Diego

 

 



Re: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)

2019-12-11 Thread Pinkesh Valdria
I was not able to find those parameters on my client nodes, OSS nodes, or MGS nodes. Here is how I was extracting all the parameters:

 

mkdir -p lctl_list_param_R/

cd lctl_list_param_R/

lctl list_param -R *  > lctl_list_param_R

 

[opc@lustre-client-1 lctl_list_param_R]$ less lctl_list_param_R  | grep ahead

llite.lfsbv-98231c3bc000.statahead_agl

llite.lfsbv-98231c3bc000.statahead_max

llite.lfsbv-98231c3bc000.statahead_running_max

llite.lfsnvme-98232c30e000.statahead_agl

llite.lfsnvme-98232c30e000.statahead_max

llite.lfsnvme-98232c30e000.statahead_running_max

[opc@lustre-client-1 lctl_list_param_R]$

 

I also tried these commands:  

 

Not working: 

On client nodes

lctl get_param llite.lfsbv-*.max_read_ahead_mb

error: get_param: param_path 'llite/lfsbv-*/max_read_ahead_mb': No such file or 
directory

[opc@lustre-client-1 lctl_list_param_R]$

 

Works 

On client nodes

lctl get_param llite.*.statahead_agl

llite.lfsbv-98231c3bc000.statahead_agl=1

llite.lfsnvme-98232c30e000.statahead_agl=1

[opc@lustre-client-1 lctl_list_param_R]$
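
One way to narrow this down further is to dump the llite parameter names on this node and on a client where max_read_ahead_mb is visible, and diff the two lists (a sketch; "good-client" is a placeholder hostname):

lctl list_param -R llite > llite_params_$(hostname).txt
# copy the equivalent file over from the working client, then:
diff llite_params_$(hostname).txt llite_params_good-client.txt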

 

 

 

From: "Moreno Diego (ID SIS)" 
Date: Tuesday, December 10, 2019 at 2:06 AM
To: Pinkesh Valdria , 
"lustre-discuss@lists.lustre.org" 
Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO 
(16MB RPC)

 

With that kind of degradation performance on read I would immediately think on 
llite’s max_read_ahead parameters on the client. Specifically these 2:

 

max_read_ahead_mb: total amount of MB allocated for read ahead, usually quite 
low for bandwidth benchmarking purposes and when there’re several files per 
client

max_read_ahead_per_file_mb: the default is quite low for 16MB RPCs (only a few 
RPCs per file)

 

You probably need to check the effect increasing both of them.

 

Regards,

 

Diego

 

 

From: lustre-discuss  on behalf of 
Pinkesh Valdria 
Date: Tuesday, 10 December 2019 at 09:40
To: "lustre-discuss@lists.lustre.org" 
Subject: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB 
RPC)

 

I was expecting better or same read performance with Large Bulk IO (16MB RPC),  
but I see degradation in performance.   Do I need to tune any other parameter 
to benefit from Large Bulk IO?   Appreciate if I can get any pointers to 
troubleshoot further. 

 

Throughput before 

-  Read:  2563 MB/s

-  Write:  2585 MB/s

 

Throughput after

-  Read:  1527 MB/s. (down by ~1025)

-  Write:  2859 MB/s

 

 

Changes I did are: 

On oss

-  lctl set_param obdfilter.lfsbv-*.brw_size=16

 

On clients 

-  unmounted and remounted

-  lctl set_param osc.lfsbv-OST*.max_pages_per_rpc=4096  (got 
auto-updated after re-mount)

-  lctl set_param osc.*.max_rpcs_in_flight=64   (Had to manually 
increase this to 64,  since after re-mount, it was auto-set to 8,  but 
read/write performance was poor)

-  lctl set_param osc.*.max_dirty_mb=2040. (setting the value to 2048 
was failing with : Numerical result out of range error.   Previously it was set 
to 2000 when I got good performance. 

 

 

My other settings: 

-  lnetctl net add --net tcp1 --if $interface  –peer-timeout 180 
–peer-credits 128 –credits 1024

-  echo "options ksocklnd nscheds=10 sock_timeout=100 credits=2560 
peer_credits=63 enable_irq_affinity=0"  >  /etc/modprobe.d/ksocklnd.conf

-  lfs setstripe -c 1 -S 1M /mnt/mdt_bv/test1

 

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)

2019-12-10 Thread Moreno Diego (ID SIS)
With that kind of read performance degradation I would immediately think of llite's max_read_ahead parameters on the client. Specifically these two:

max_read_ahead_mb: the total amount of MB allocated for read-ahead, usually quite low for bandwidth benchmarking purposes and when there are several files per client
max_read_ahead_per_file_mb: the default is quite low for 16MB RPCs (only a few RPCs per file)

You probably need to check the effect of increasing both of them.
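
A minimal way to check the effect, using the striped test file from the original post (/mnt/mdt_bv/test1) and 256 MB for both parameters as illustrative values (the values eventually used at the top of this thread):

lctl set_param llite.*.max_read_ahead_mb=256
lctl set_param llite.*.max_read_ahead_per_file_mb=256

# drop the page cache so the read really goes to the OSTs, then repeat the read test
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/mdt_bv/test1 of=/dev/null bs=16M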

Regards,

Diego




[lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)

2019-12-10 Thread Pinkesh Valdria
I was expecting the same or better read performance with Large Bulk IO (16MB RPCs), but instead I see a degradation in performance. Do I need to tune any other parameters to benefit from Large Bulk IO? I would appreciate any pointers to troubleshoot further.

 

Throughput before:
- Read:  2563 MB/s
- Write: 2585 MB/s

Throughput after:
- Read:  1527 MB/s  (down by ~1036 MB/s)
- Write: 2859 MB/s
 

 

Changes I made:

On the OSS:
- lctl set_param obdfilter.lfsbv-*.brw_size=16

On the clients:
- unmounted and remounted
- lctl set_param osc.lfsbv-OST*.max_pages_per_rpc=4096  (this was auto-updated after the re-mount)
- lctl set_param osc.*.max_rpcs_in_flight=64  (I had to manually increase this to 64; after the re-mount it was auto-set to 8 and read/write performance was poor)
- lctl set_param osc.*.max_dirty_mb=2040  (setting the value to 2048 failed with a "Numerical result out of range" error; previously it was set to 2000, when I got good performance)
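
One extra check that would confirm whether 16MB RPCs are actually being issued on reads is the per-OSC RPC histogram (a sketch; I believe writing to rpc_stats resets the counters, so treat that first line as an assumption):

lctl set_param osc.*.rpc_stats=0                      # assumed to clear the histogram
# ... re-run the read benchmark ...
lctl get_param osc.*.rpc_stats | grep -A 20 "pages per rpc"
# with brw_size=16 and max_pages_per_rpc=4096, the 4096-pages bucket should dominate the read column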
 

 

My other settings:
- lnetctl net add --net tcp1 --if $interface --peer-timeout 180 --peer-credits 128 --credits 1024
- echo "options ksocklnd nscheds=10 sock_timeout=100 credits=2560 peer_credits=63 enable_irq_affinity=0" > /etc/modprobe.d/ksocklnd.conf
- lfs setstripe -c 1 -S 1M /mnt/mdt_bv/test1
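
Since the ksocklnd options only apply when the module is loaded, it may also be worth confirming which LNet tunables the client actually ended up with (a quick check; the /sys/module path is an assumption about how the module exposes its parameters):

lnetctl net show -v                              # the tunables section shows peer_timeout, peer_credits and credits per NI
cat /sys/module/ksocklnd/parameters/credits      # assumed location of the loaded module parameter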
 

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org