Both methods should produce equivalent numbers. On my 2.14.0 system:

# ps auxww | grep mdt0
root 594183 0.0 0.0 0 0 ? I Sep09 0:41 [mdt00_000]
root 594184 0.0 0.0 0 0 ? I Sep09 0:34 [mdt00_001]
root 594185 0.0 0.0 0 0 ? I Sep09 0:37 [mdt00_002]
root 594288 0.0 0.0 0 0 ? I Sep09 0:36 [mdt00_003]
root 594664 0.0 0.0 0 0 ? I Sep09 0:25 [mdt00_004]
root 594665 0.0 0.0 0 0 ? I Sep09 0:32 [mdt00_005]
root 594667 0.0 0.0 0 0 ? I Sep09 0:41 [mdt00_006]
root 594668 0.0 0.0 0 0 ? I Sep09 0:30 [mdt00_007]
root 594670 0.0 0.0 0 0 ? I Sep09 0:30 [mdt00_008]
root 594673 0.0 0.0 0 0 ? I Sep09 0:37 [mdt00_009]
root 594680 0.0 0.0 0 0 ? I Sep09 0:40 [mdt00_010]
# lctl get_param mds.MDS.mdt.threads_started
mds.MDS.mdt.threads_started=11
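The ps-versus-lctl cross-check above can be scripted. The sketch below counts the "mdt" service threads in saved ps output, using a few sample rows from the listing in this message; on a live MDS you would pipe ps auxww into the grep instead of the heredoc, and compare against lctl get_param -n mds.MDS.mdt.threads_started:

```shell
#!/bin/sh
# Count "mdt" service threads (names of the form [mdtNN_NNN]) in ps output.
# The heredoc holds sample rows from the listing above; on a live server use:
#   ps auxww | grep -c '\[mdt0[0-9]_[0-9]*\]'
# The pattern does not match the mdt_rdpg threads, since "mdt" there is
# followed by "_rdpg" rather than a digit.
count=$(grep -c '\[mdt0[0-9]_[0-9]*\]' <<'EOF'
root 594183 0.0 0.0 0 0 ? I Sep09 0:41 [mdt00_000]
root 594184 0.0 0.0 0 0 ? I Sep09 0:34 [mdt00_001]
root 594185 0.0 0.0 0 0 ? I Sep09 0:37 [mdt00_002]
EOF
)
echo "mdt threads: $count"
# Cross-check on a live MDS (not runnable here without Lustre installed):
#   lctl get_param -n mds.MDS.mdt.threads_started
```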
This is the service thread for most MDT RPC requests. Increasing the number of "mdt" threads will definitely improve metadata performance, as long as the underlying storage has the IOPS for it. Having too many threads would use more memory, and in the rare case of an HDD-based MDT this might cause excessive seeking (that is obviously not a problem for SSD/NVMe MDTs).

# ps auxww | grep mdt_rdpg
root 594186 0.0 0.0 0 0 ? I Sep09 0:02 [mdt_rdpg00_000]
root 594187 0.0 0.0 0 0 ? I Sep09 0:02 [mdt_rdpg00_001]
root 594663 0.0 0.0 0 0 ? I Sep09 0:03 [mdt_rdpg00_002]
# lctl get_param mds.MDS.mdt_readpage.threads_started
mds.MDS.mdt_readpage.threads_started=3

This service is for bulk readdir RPCs.

Cheers, Andreas

On Sep 22, 2021, at 15:16, Houkun Zhu <diskun....@gmail.com> wrote:

I'm running Lustre 2.12.7. The workload I was running was generated by fio, i.e., six processes sending I/O requests to the server. Since I could see a non-trivial performance difference when I increased the parameter mds.MDS.mdt.threads_max, I assume it played a role in the performance.

Thanks to the tip from Patrick, I just executed "ps axu | grep mdt" and got the following result:

root 17654 0.0 0.0 0 0 ? S Aug07 0:13 [mdt00_006]
root 18672 0.0 0.0 0 0 ? S Aug07 0:21 [mdt00_007]
root 18902 0.0 0.0 0 0 ? S Aug07 0:11 [mdt00_008]
root 23778 0.0 0.0 0 0 ? S Sep21 0:46 [mdt01_003]
root 24032 0.0 0.0 0 0 ? S Sep21 0:14 [mdt_rdpg01_002]
root 25292 0.0 0.0 0 0 ? S Sep21 0:34 [mdt01_004]
root 25293 0.0 0.0 0 0 ? S Sep21 0:36 [mdt01_005]
root 25294 0.0 0.0 0 0 ? S Sep21 0:35 [mdt01_006]
root 25295 0.0 0.0 0 0 ? S Sep21 0:37 [mdt01_007]
root 25296 0.0 0.0 0 0 ? S Sep21 0:36 [mdt01_008]
root 25297 0.0 0.0 0 0 ? S Sep21 0:11 [mdt_rdpg01_003]
root 25298 0.0 0.0 0 0 ? S Sep21 0:38 [mdt01_009]
root 25299 0.0 0.0 0 0 ? S Sep21 0:38 [mdt01_010]
root 25301 0.0 0.0 0 0 ? S Sep21 0:11 [mdt_rdpg01_004]
root 25302 0.0 0.0 0 0 ? S Sep21 0:37 [mdt01_011]
root 25303 0.0 0.0 0 0 ? S Sep21 0:35 [mdt01_012]
root 25304 0.0 0.0 0 0 ? S Sep21 0:11 [mdt_rdpg01_005]
root 25370 0.0 0.0 0 0 ? S Sep21 0:11 [mdt_rdpg01_006]
root 25375 0.0 0.0 0 0 ? S Sep21 0:09 [mdt_rdpg01_007]
root 29073 0.0 0.0 0 0 ? S Aug26 0:00 [mdt_rdpg00_003]

Could I verify my assumption by counting the number of processes matching mdt\d\d_\d*?

Best regards, Houkun

On 22. Sep 2021, at 21:21, Andreas Dilger <adil...@whamcloud.com> wrote:

What version of Lustre are you running? I tested with 2.14.0 and observed that *.*.threads_started increased and (eventually) decreased as the service threads were being used.

Note that the "*.*.threads_max" parameter is the *maximum* number of threads for a particular service (e.g. ost.OSS.ost_io.* is for bulk read/write IO operations, while ost.OSS.ost.* is for most other OST operations). New threads are only started if the number of incoming requests in the queue exceeds the number of currently running threads, so if requests are processed quickly and/or there are not enough clients generating RPCs, then new threads will not be started beyond the number needed to manage the current workload.
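Houkun's proposed check earlier in the thread (counting processes matching mdt\d\d_\d*) works, and can be sketched as below with a few sample rows from his ps listing; the extended regex mdt[0-9]{2}_ matches only the plain "mdt" service threads, not the mdt_rdpg (readpage) ones, so the two services can be counted separately:

```shell
#!/bin/sh
# Count "mdt" and "mdt_rdpg" service threads separately, using sample
# rows from the ps listing in this thread; a live server would pipe
# `ps auxww` into the same greps instead.
ps_sample='root 25302 0.0 0.0 0 0 ? S Sep21 0:37 [mdt01_011]
root 25303 0.0 0.0 0 0 ? S Sep21 0:35 [mdt01_012]
root 25304 0.0 0.0 0 0 ? S Sep21 0:11 [mdt_rdpg01_005]'
mdt=$(printf '%s\n' "$ps_sample" | grep -cE '\[mdt[0-9]{2}_[0-9]+\]')
rdpg=$(printf '%s\n' "$ps_sample" | grep -cE '\[mdt_rdpg[0-9]{2}_[0-9]+\]')
echo "mdt=$mdt rdpg=$rdpg"
```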
For example, I had reduced ost_io.threads_max=16 on my home filesystem yesterday to verify that the threads eventually stopped. That needed some ongoing IO workload, until the higher-numbered threads processed a request and were the last thread running (see the comment at ptlrpc_thread_should_stop() for details):

# lctl get_param ost.OSS.ost_io.threads*
ost.OSS.ost_io.threads_max=16
ost.OSS.ost_io.threads_min=3
ost.OSS.ost_io.threads_started=16

When I increased threads_max=32 and ran a parallel IO workload on a client, it increased threads_started, but the client wasn't able to generate enough RPCs in flight to hit the maximum number of threads:

# lctl get_param ost.OSS.ost_io.threads*
ost.OSS.ost_io.threads_max=32
ost.OSS.ost_io.threads_min=3
ost.OSS.ost_io.threads_started=26

On Sep 22, 2021, at 11:37, Houkun Zhu <diskun....@gmail.com> wrote:

Hi Andreas,

Thanks a lot for your help. I actually recorded the parameter mds.MDS.mdt.threads_started, but its value never changes. However, I can observe a performance difference (i.e., throughput increases tremendously) when I set a higher value of threads_max for the MDS.

Best regards, Houkun

On 22. Sep 2021, at 07:21, Andreas Dilger <adil...@whamcloud.com> wrote:

There is actually a parameter for this:

$ lctl get_param ost.OSS.*.thread*
ost.OSS.ost.threads_max=16
ost.OSS.ost.threads_min=3
ost.OSS.ost.threads_started=16
ost.OSS.ost_create.threads_max=10
ost.OSS.ost_create.threads_min=2
ost.OSS.ost_create.threads_started=3
ost.OSS.ost_io.threads_max=16
ost.OSS.ost_io.threads_min=3
ost.OSS.ost_io.threads_started=16
ost.OSS.ost_out.threads_max=10
ost.OSS.ost_out.threads_min=2
ost.OSS.ost_out.threads_started=2
ost.OSS.ost_seq.threads_max=10
ost.OSS.ost_seq.threads_min=2
ost.OSS.ost_seq.threads_started=2

$ lctl get_param mds.MDS.*.thread*
mds.MDS.mdt.threads_max=80
mds.MDS.mdt.threads_min=3
mds.MDS.mdt.threads_started=11
mds.MDS.mdt_fld.threads_max=256
mds.MDS.mdt_fld.threads_min=2
mds.MDS.mdt_fld.threads_started=3
mds.MDS.mdt_io.threads_max=80
mds.MDS.mdt_io.threads_min=3
mds.MDS.mdt_io.threads_started=4
mds.MDS.mdt_out.threads_max=80
mds.MDS.mdt_out.threads_min=2
mds.MDS.mdt_out.threads_started=2
mds.MDS.mdt_readpage.threads_max=56
mds.MDS.mdt_readpage.threads_min=2
mds.MDS.mdt_readpage.threads_started=3
mds.MDS.mdt_seqm.threads_max=256
mds.MDS.mdt_seqm.threads_min=2
mds.MDS.mdt_seqm.threads_started=2
mds.MDS.mdt_seqs.threads_max=256
mds.MDS.mdt_seqs.threads_min=2
mds.MDS.mdt_seqs.threads_started=2
mds.MDS.mdt_setattr.threads_max=56
mds.MDS.mdt_setattr.threads_min=2
mds.MDS.mdt_setattr.threads_started=2

On Sep 21, 2021, at 19:21, Patrick Farrell <pfarr...@ddn.com> wrote:

"Though I can wait for the number of threads to automatically decrease, I didn't find a way that really indicates the currently running threads. I've tried threads_started (e.g., lctl get_param mds.MDS.mdt.threads_started), but this param doesn't change."

I don't think Lustre exposes a stat which gives a *current* count of worker threads.
I've always used ps, grep, and wc -l to answer that question :)

From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of Andreas Dilger via lustre-discuss <lustre-discuss@lists.lustre.org>
Sent: Tuesday, September 21, 2021 8:03 PM
To: Houkun Zhu <diskun....@gmail.com>
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Question about max service threads

Hello Houkun,

The patch https://review.whamcloud.com/34400 "LU-947 ptlrpc: allow stopping threads above threads_max" landed in the 2.13 release. You could apply this patch to your 2.12 release, or test with 2.14.0.

Note that this patch only lazily stops threads as they become idle, so there is no guarantee that they will all stop immediately when the parameter is changed. It may take some time, and some processed RPCs, before the higher-numbered threads exit. It might be possible to wake up all of the threads when the threads_max parameter is reduced, to have them check for this condition and exit; however, this is a very unlikely condition under normal usage. I would recommend testing by increasing the thread count, rather than decreasing it...

Cheers, Andreas

On Sep 20, 2021, at 02:29, Houkun Zhu via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:

Hi guys,

I'm creating an automatic Lustre performance tuning system, but I find it hard to tune the max service thread parameters, because it seems the maximum thread count is only guaranteed when we increase the parameter. I found a similar discussion from 2011; are there any updates?

Though I can wait for the number of threads to automatically decrease, I didn't find a way that really indicates the currently running threads.
I've tried threads_started (e.g., lctl get_param mds.MDS.mdt.threads_started), but this param doesn't change.

Looking forward to your help! Thank you in advance!

Best regards, Houkun

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud
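Patrick's ps/grep/wc -l approach from earlier in the thread can be wrapped in a small helper; the function name below is my own invention, and the sample input only stands in for live ps output:

```shell
#!/bin/sh
# count_threads SERVICE: count kernel threads for a Lustre service by name
# (e.g. "mdt", "mdt_rdpg"), reading ps-style output from stdin.
# Hypothetical helper, not part of any Lustre tooling.
count_threads() {
    grep -E "\[$1[0-9]{2}_[0-9]+\]" | wc -l
}
# On a live MDS the usage would be, e.g.:
#   ps auxww | count_threads mdt
# and repeating that while lowering threads_max shows the lazy thread exit
# described above. Here we feed it two sample rows from the thread:
n=$(printf '%s\n' \
    'root 594183 0.0 0.0 0 0 ? I Sep09 0:41 [mdt00_000]' \
    'root 594186 0.0 0.0 0 0 ? I Sep09 0:02 [mdt_rdpg00_000]' | count_threads mdt)
echo "running mdt threads: $n"
```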