Have you tried lfs check servers on the login node?
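If you want to rerun that check while users are logging in, a small wrapper can flag anything that is not reported as active. This is a minimal Python sketch, not a Lustre tool. It assumes each healthy line of "lfs check servers" output looks like "<target-name> active." (for example "home-OST0000-osc-ffff8800376a5000 active.", where the fsname and suffix are hypothetical). Any other line is reported verbatim rather than parsed. The helper names and the 120-second timeout are illustrative choices, not part of Lustre.

```python
import shutil
import subprocess
from typing import Dict, List, Tuple


def parse_check_output(text: str) -> Tuple[Dict[str, str], List[str]]:
    """Split 'lfs check servers' output into {target: state} plus unparsed lines.

    ASSUMPTION: a healthy line has the shape "<target-name> active."
    Anything that is not exactly two whitespace-separated fields is kept
    verbatim in the unparsed list so it is never silently dropped.
    """
    states: Dict[str, str] = {}
    unparsed: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) == 2:
            target, state = fields
            states[target] = state.rstrip(".")  # "active." -> "active"
        else:
            unparsed.append(line)
    return states, unparsed


def problem_targets(text: str) -> List[str]:
    """Return targets whose state is not 'active', followed by unparsed lines."""
    states, unparsed = parse_check_output(text)
    problems = [
        "{}: {}".format(target, state)
        for target, state in sorted(states.items())
        if state != "active"
    ]
    return problems + unparsed


def run_check(timeout: int) -> str:
    """Run 'lfs check servers' and return stdout and stderr together.

    Uses stdout/stderr=PIPE and universal_newlines so it also works on the
    Python 3.6 shipped with CentOS 7 (capture_output/text need 3.7+).
    """
    result = subprocess.run(
        ["lfs", "check", "servers"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )
    return result.stdout + result.stderr


def main() -> None:
    if shutil.which("lfs") is None:
        print("lfs not found in PATH; run this on a Lustre client such as the login node")
        return
    try:
        output = run_check(timeout=120)  # illustrative limit, not a Lustre default
    except subprocess.TimeoutExpired:
        print("lfs check servers did not finish within 120 s; a target may be hung")
        return
    if not output.strip():
        print("lfs check servers produced no output")
        return
    issues = problem_targets(output)
    if issues:
        print("Targets needing attention:")
        for item in issues:
            print("  " + item)
    else:
        print("All targets reported active.")


if __name__ == "__main__":
    main()
```

Run it on the login node (as root if your lfs requires it). The timeout is there because a check against an unresponsive server could otherwise block the script in the same way the login node appears to lock up.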

On Oct 11, 2021, at 2:58 AM, Sid Young via lustre-discuss 
<lustre-discuss@lists.lustre.org> wrote:



I'm having trouble diagnosing where the problem lies in my Lustre
installation. The clients are 2.12.6, and I have /home and /lustre filesystems
using Lustre.

/home has 4 OSTs and /lustre is made up of 6 OSTs. lfs df shows all OSTs as 
ACTIVE.

The /lustre file system appears fine, I can ls into every directory.

When people log into the login node, it appears to lock up. I have shut down
everything and remounted the OSTs, MDTs, etc. in order with no errors
reported, but the lockup returns soon after a few people log in.
The backend network is 100G Ethernet using ConnectX-5 cards and the OS is CentOS
7.9; everything was installed as RPMs and updates are disabled in yum.conf.

Two questions to start with:
Is there a command-line tool to check each OST individually?
Apart from /var/log/messages, is there a Lustre-specific log I can monitor on
the login node to see errors when I hit /home?



Sid Young
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org