Both methods should produce equivalent numbers. On my 2.14.0 system:

# ps auxww | grep mdt0
root 594183 0.0 0.0 0 0 ? I Sep09 0:41 [mdt00_000]
root 594184 0.0 0.0 0 0 ? I Sep09 0:34 [mdt00_001]
root 594185 0.0 0.0 0 0 ? I Sep09 0:37 [mdt00_002]
root 594288 0.0 0.0 0 0 ? I Sep09 0:36 [mdt00_003]
root 594664 0.0 0.0 0 0 ? I Sep09 0:25 [mdt00_004]
root 594665 0.0 0.0 0 0 ? I Sep09 0:32 [mdt00_005]
root 594667 0.0 0.0 0 0 ? I Sep09 0:41 [mdt00_006]
root 594668 0.0 0.0 0 0 ? I Sep09 0:30 [mdt00_007]
root 594670 0.0 0.0 0 0 ? I Sep09 0:30 [mdt00_008]
root 594673 0.0 0.0 0 0 ? I Sep09 0:37 [mdt00_009]
root 594680 0.0 0.0 0 0 ? I Sep09 0:40 [mdt00_010]
# lctl get_param mds.MDS.mdt.threads_started
mds.MDS.mdt.threads_started=11
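The ps-versus-lctl cross-check above can be scripted. The sketch below counts the "mdt" service threads in saved ps output, using a few sample rows from the listing in this message; on a live MDS you would pipe ps auxww into the grep instead of the heredoc, and compare against lctl get_param -n mds.MDS.mdt.threads_started:

```shell
#!/bin/sh
# Count "mdt" service threads (names of the form [mdtNN_NNN]) in ps output.
# The heredoc holds sample rows from the listing above; on a live server use:
#   ps auxww | grep -c '\[mdt0[0-9]_[0-9]*\]'
# The pattern does not match the mdt_rdpg threads, since "mdt" there is
# followed by "_rdpg" rather than a digit.
count=$(grep -c '\[mdt0[0-9]_[0-9]*\]' <<'EOF'
root 594183 0.0 0.0 0 0 ? I Sep09 0:41 [mdt00_000]
root 594184 0.0 0.0 0 0 ? I Sep09 0:34 [mdt00_001]
root 594185 0.0 0.0 0 0 ? I Sep09 0:37 [mdt00_002]
EOF
)
echo "mdt threads: $count"
# Cross-check on a live MDS (not runnable here without Lustre installed):
#   lctl get_param -n mds.MDS.mdt.threads_started
```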
This is the service thread for most MDT RPC requests. Increasing the number of "mdt" threads will definitely improve metadata performance, as long as the underlying storage has the IOPS for it. Having too many threads would use more memory, and in the rare case of an HDD-based MDT this might cause excessive seeking (that is obviously not a problem for SSD/NVMe MDTs).

# ps auxww | grep mdt_rdpg
root 594186 0.0 0.0 0 0 ? I Sep09 0:02 [mdt_rdpg00_000]
root 594187 0.0 0.0 0 0 ? I Sep09 0:02 [mdt_rdpg00_001]
root 594663 0.0 0.0 0 0 ? I Sep09 0:03 [mdt_rdpg00_002]
# lctl get_param mds.MDS.mdt_readpage.threads_started
mds.MDS.mdt_readpage.threads_started=3

This service is for bulk readdir RPCs.

Cheers, Andreas

On Sep 22, 2021, at 15:16, Houkun Zhu <diskun....@gmail.com> wrote:

I'm running Lustre 2.12.7. The workload I was running was generated by fio, i.e., six processes sending I/O requests to the server. Since I could see a non-trivial performance difference when I increased the parameter mds.MDS.mdt.threads_max, I assume it played a role in the performance.

Thanks to the tip from Patrick, I just executed "ps axu | grep mdt" and got the following result:

root 17654 0.0 0.0 0 0 ? S Aug07 0:13 [mdt00_006]
root 18672 0.0 0.0 0 0 ? S Aug07 0:21 [mdt00_007]
root 18902 0.0 0.0 0 0 ? S Aug07 0:11 [mdt00_008]
root 23778 0.0 0.0 0 0 ? S Sep21 0:46 [mdt01_003]
root 24032 0.0 0.0 0 0 ? S Sep21 0:14 [mdt_rdpg01_002]
root 25292 0.0 0.0 0 0 ? S Sep21 0:34 [mdt01_004]
root 25293 0.0 0.0 0 0 ? S Sep21 0:36 [mdt01_005]
root 25294 0.0 0.0 0 0 ? S Sep21 0:35 [mdt01_006]
root 25295 0.0 0.0 0 0 ? S Sep21 0:37 [mdt01_007]
root 25296 0.0 0.0 0 0 ? S Sep21 0:36 [mdt01_008]
root 25297 0.0 0.0 0 0 ? S Sep21 0:11 [mdt_rdpg01_003]
root 25298 0.0 0.0 0 0 ? S Sep21 0:38 [mdt01_009]
root 25299 0.0 0.0 0 0 ? S Sep21 0:38 [mdt01_010]
root 25301 0.0 0.0 0 0 ? S Sep21 0:11 [mdt_rdpg01_004]
root 25302 0.0 0.0 0 0 ? S Sep21 0:37 [mdt01_011]
root 25303 0.0 0.0 0 0 ? S Sep21 0:35 [mdt01_012]
root 25304 0.0 0.0 0 0 ? S Sep21 0:11 [mdt_rdpg01_005]
root 25370 0.0 0.0 0 0 ? S Sep21 0:11 [mdt_rdpg01_006]
root 25375 0.0 0.0 0 0 ? S Sep21 0:09 [mdt_rdpg01_007]
root 29073 0.0 0.0 0 0 ? S Aug26 0:00 [mdt_rdpg00_003]

Could I verify my assumption by counting the number of processes matching mdt\d\d_\d*?

Best regards, Houkun

On 22. Sep 2021, at 21:21, Andreas Dilger <adil...@whamcloud.com> wrote:

What version of Lustre are you running? I tested with 2.14.0 and observed that *.*.threads_started increased and (eventually) decreased as the service threads were being used.

Note that the "*.*.threads_max" parameter is the *maximum* number of threads for a particular service (e.g. ost.OSS.ost_io.* is for bulk read/write IO operations, while ost.OSS.ost.* is for most other OST operations). New threads are only started if the number of incoming requests in the queue exceeds the number of currently running threads, so if requests are processed quickly and/or there are not enough clients generating RPCs, then new threads will not be started beyond the number needed to manage the current workload.
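Houkun's proposed check earlier in the thread (counting processes matching mdt\d\d_\d*) works, and can be sketched as below with a few sample rows from his ps listing; the extended regex mdt[0-9]{2}_ matches only the plain "mdt" service threads, not the mdt_rdpg (readpage) ones, so the two services can be counted separately:

```shell
#!/bin/sh
# Count "mdt" and "mdt_rdpg" service threads separately, using sample
# rows from the ps listing in this thread; a live server would pipe
# `ps auxww` into the same greps instead.
ps_sample='root 25302 0.0 0.0 0 0 ? S Sep21 0:37 [mdt01_011]
root 25303 0.0 0.0 0 0 ? S Sep21 0:35 [mdt01_012]
root 25304 0.0 0.0 0 0 ? S Sep21 0:11 [mdt_rdpg01_005]'
mdt=$(printf '%s\n' "$ps_sample" | grep -cE '\[mdt[0-9]{2}_[0-9]+\]')
rdpg=$(printf '%s\n' "$ps_sample" | grep -cE '\[mdt_rdpg[0-9]{2}_[0-9]+\]')
echo "mdt=$mdt rdpg=$rdpg"
```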
For example, I had reduced ost_io.threads_max=16 on my home filesystem yesterday to verify that the threads eventually stopped. That needed some ongoing IO workload, until the higher-numbered threads processed a request and were the last thread running (see the comment at ptlrpc_thread_should_stop() for details):

# lctl get_param ost.OSS.ost_io.threads*
ost.OSS.ost_io.threads_max=16
ost.OSS.ost_io.threads_min=3
ost.OSS.ost_io.threads_started=16

When I increased threads_max=32 and ran a parallel IO workload on a client, it increased threads_started, but the client wasn't able to generate enough RPCs in flight to hit the maximum number of threads:

# lctl get_param ost.OSS.ost_io.threads*
ost.OSS.ost_io.threads_max=32
ost.OSS.ost_io.threads_min=3
ost.OSS.ost_io.threads_started=26

On Sep 22, 2021, at 11:37, Houkun Zhu <diskun....@gmail.com> wrote:

Hi Andreas,

Thanks a lot for your help. I actually recorded the parameter mds.MDS.mdt.threads_started, but its value never changes. However, I can observe a performance difference (i.e., throughput increases tremendously) when I set a higher value of threads_max for the MDS.

Best regards, Houkun

On 22. Sep 2021, at 07:21, Andreas Dilger <adil...@whamcloud.com> wrote:

There is actually a parameter for this:

$ lctl get_param ost.OSS.*.thread*
ost.OSS.ost.threads_max=16
ost.OSS.ost.threads_min=3
ost.OSS.ost.threads_started=16
ost.OSS.ost_create.threads_max=10
ost.OSS.ost_create.threads_min=2
ost.OSS.ost_create.threads_started=3
ost.OSS.ost_io.threads_max=16
ost.OSS.ost_io.threads_min=3
ost.OSS.ost_io.threads_started=16
ost.OSS.ost_out.threads_max=10
ost.OSS.ost_out.threads_min=2
ost.OSS.ost_out.threads_started=2
ost.OSS.ost_seq.threads_max=10
ost.OSS.ost_seq.threads_min=2
ost.OSS.ost_seq.threads_started=2

$ lctl get_param mds.MDS.*.thread*
mds.MDS.mdt.threads_max=80
mds.MDS.mdt.threads_min=3
mds.MDS.mdt.threads_started=11
mds.MDS.mdt_fld.threads_max=256
mds.MDS.mdt_fld.threads_min=2
mds.MDS.mdt_fld.threads_started=3
mds.MDS.mdt_io.threads_max=80
mds.MDS.mdt_io.threads_min=3
mds.MDS.mdt_io.threads_started=4
mds.MDS.mdt_out.threads_max=80
mds.MDS.mdt_out.threads_min=2
mds.MDS.mdt_out.threads_started=2
mds.MDS.mdt_readpage.threads_max=56
mds.MDS.mdt_readpage.threads_min=2
mds.MDS.mdt_readpage.threads_started=3
mds.MDS.mdt_seqm.threads_max=256
mds.MDS.mdt_seqm.threads_min=2
mds.MDS.mdt_seqm.threads_started=2
mds.MDS.mdt_seqs.threads_max=256
mds.MDS.mdt_seqs.threads_min=2
mds.MDS.mdt_seqs.threads_started=2
mds.MDS.mdt_setattr.threads_max=56
mds.MDS.mdt_setattr.threads_min=2
mds.MDS.mdt_setattr.threads_started=2

On Sep 21, 2021, at 19:21, Patrick Farrell <pfarr...@ddn.com> wrote:

"Though I can wait for the number of threads to automatically decrease, I didn't find a way that really indicates the currently running threads. I've tried threads_started (e.g., lctl get_param mds.MDS.mdt.threads_started), but this param doesn't change."

I don't think Lustre exposes a stat which gives a *current* count of worker threads.
I've always used ps, grep, and wc -l to answer that question :)

From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of Andreas Dilger via lustre-discuss <lustre-discuss@lists.lustre.org>
Sent: Tuesday, September 21, 2021 8:03 PM
To: Houkun Zhu <diskun....@gmail.com>
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Question about max service threads

Hello Houkun,

The patch https://review.whamcloud.com/34400 "LU-947 ptlrpc: allow stopping threads above threads_max" landed in the 2.13 release. You could apply this patch to your 2.12 release, or test with 2.14.0.

Note that this patch only lazily stops threads as they become idle, so there is no guarantee that they will all stop immediately when the parameter is changed. It may take some time, and some processed RPCs, before the higher-numbered threads exit. It might be possible to wake up all of the threads when the threads_max parameter is reduced, to have them check for this condition and exit; however, this is a very unlikely condition under normal usage. I would recommend testing by increasing the thread count, rather than decreasing it...

Cheers, Andreas

On Sep 20, 2021, at 02:29, Houkun Zhu via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:

Hi guys,

I'm creating an automatic Lustre performance tuning system, but I find it hard to tune the max service thread parameters, because it seems the maximum thread count is only guaranteed when we increase the parameter. I found a similar discussion from 2011; are there any updates?

Though I can wait for the number of threads to automatically decrease, I didn't find a way that really indicates the currently running threads.
I've tried threads_started (e.g., lctl get_param mds.MDS.mdt.threads_started), but this param doesn't change.

Looking forward to your help! Thank you in advance!

Best regards, Houkun

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud
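Patrick's ps/grep/wc -l approach from earlier in the thread can be wrapped in a small helper; the function name below is my own invention, and the sample input only stands in for live ps output:

```shell
#!/bin/sh
# count_threads SERVICE: count kernel threads for a Lustre service by name
# (e.g. "mdt", "mdt_rdpg"), reading ps-style output from stdin.
# Hypothetical helper, not part of any Lustre tooling.
count_threads() {
    grep -E "\[$1[0-9]{2}_[0-9]+\]" | wc -l
}
# On a live MDS the usage would be, e.g.:
#   ps auxww | count_threads mdt
# and repeating that while lowering threads_max shows the lazy thread exit
# described above. Here we feed it two sample rows from the thread:
n=$(printf '%s\n' \
    'root 594183 0.0 0.0 0 0 ? I Sep09 0:41 [mdt00_000]' \
    'root 594186 0.0 0.0 0 0 ? I Sep09 0:02 [mdt_rdpg00_000]' | count_threads mdt)
echo "running mdt threads: $n"
```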