Hi,
Maybe you can also try this:
https://github.com/quentinbouyer/topmdt
On 28/05/2020 at 18:32, Chad DeWitt wrote:
Hi Heath,
Hope you're doing well!
Your mileage may vary (and quite frankly, there may be better
approaches), but this is a quick and dirty set of steps to find which
client is issuing a large number of metadata operations:
* Log into the affected MDS.
* Change into the exports directory.
cd /proc/fs/lustre/mdt/<Your affected MDT>/exports/
* OPTIONAL: Set all your stats to zero and clear out stale
clients. (If you don't want to do this step, you don't really
have to, but it does make it easier to see the stats if you
are starting with a clean slate. In fact, you may want to skip
this the first time through and just look for high numbers. If
a particular client is the source of the issue, the stats
should clearly be higher for that client when compared to the
others.)
echo "C" > clear
* Wait for a few seconds and dump the stats.
for client in */ ; do echo && echo && echo "${client}" &&
cat "${client}stats" && echo ; done
You'll get a listing of stats for each mounted client like so:
open 278676 samples [reqs]
close 278629 samples [reqs]
mknod 2320 samples [reqs]
unlink 495 samples [reqs]
mkdir 575 samples [reqs]
rename 1534 samples [reqs]
getattr 277552 samples [reqs]
setattr 550 samples [reqs]
getxattr 2742 samples [reqs]
statfs 350058 samples [reqs]
samedir_rename 1534 samples [reqs]
(Don't worry if some of the clients give back what appears to be empty
stats. That just means they are mounted, but have not yet performed
any metadata operations.) From this data, you are looking for any
"high" samples. The client with the high samples is usually the
culprit. For the example client stats above, I would look to see what
process(es) on this client are listing, opening, and then closing files
in Lustre... The advantage with this method is you are seeing exactly
which metadata operations are occurring. (I know there are also
various utilities included with Lustre that may give this information
as well, but I just go to the source.)
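If there are many clients, eyeballing the dump gets tedious. As a rough sketch (not one of the Lustre utilities), the per-client request counts can be summed and sorted so the busiest client comes first. The function name rank_clients is made up for illustration, and it assumes every stats line looks like the sample above ("<operation> <count> samples [reqs]"):

```shell
# Rank clients by total metadata requests, highest first.
# Assumed layout: <exports dir>/<client>/stats, one operation per line in the
# form "<operation> <count> samples [reqs]".
# rank_clients is a hypothetical helper name, not a Lustre command.
rank_clients() {
    local exports_dir="${1:-.}"
    local client_dir
    local total
    for client_dir in "${exports_dir}"/*/ ; do
        # Skip anything without a readable stats file.
        [ -r "${client_dir}stats" ] || continue
        # Only lines whose third field is "samples" are counted, so any
        # other lines in the file are ignored. Empty files total 0.
        total=$(awk '$3 == "samples" { sum += $2 } END { print sum + 0 }' "${client_dir}stats")
        printf '%s %s\n' "${total}" "$(basename "${client_dir}")"
    done | sort -rn
}
```

Run it from the exports directory as "rank_clients ." and look at the first few lines. It only adds up request counts, so it will not tell you which operation dominates; for that, go back to that client's stats file.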
Once you find the client, you can use various commands, such as mount
and lsof to get a better understanding of what may be hitting Lustre.
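For what it's worth, here is a minimal sketch of that last step on the client. The helper names lustre_mounts and open_lustre_files are hypothetical, and the sketch assumes the usual /proc/mounts layout (device, mount point, type, ...) and that lsof is installed on the client:

```shell
# Print the mount point of every Lustre file system in a mounts table.
# Defaults to /proc/mounts; an alternate file can be passed for testing.
# lustre_mounts is a hypothetical helper name.
lustre_mounts() {
    awk '$3 == "lustre" { print $2 }' "${1:-/proc/mounts}"
}

# For each Lustre mount point, list the files currently open on it.
# When given a mount point, lsof reports open files on that file system.
# Assumption: mount points contain no spaces.
open_lustre_files() {
    lustre_mounts | while read -r mnt ; do
        lsof "${mnt}"
    done
}
```

Running open_lustre_files as root on the suspect client shows which processes (and users) have files open under Lustre, which is usually enough to track down the job.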
Some of the more common issues I've found that can cause a high MDS load:
* List a directory containing a large number of files. (Instead,
unalias ls or better yet, use lfs find.)
* Remove many files.
* Open and close many files. (May be better to move the data over to
another file system, such as XFS, etc. We keep some of our deep
learning off Lustre, because of the sheer number of small files.)
Of course the actual mitigation of the load depends on what the user
is attempting to do...
I hope this helps...
Cheers,
Chad
------------------------------------------------------------
Chad DeWitt, CISSP
UNC Charlotte | ITS – University Research Computing
ccdew...@uncc.edu | www.uncc.edu
------------------------------------------------------------
On Thu, May 28, 2020 at 11:37 AM Peeples, Heath
<hea...@hpc.msstate.edu> wrote:
I have 2 MDSs, and periodically the load on one of them (either one,
at one time or another) peaks above 300, causing the file system to
basically stop. This lasts for a few minutes and then goes away. We can’t
identify any one user running jobs at the times we see this, so
it’s hard to pinpoint this on a user doing something to cause it.
Could anyone point me in the direction of how to begin debugging
this? Any help is greatly appreciated.
Heath
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org