Dear Lustre Community,

From time to time we run into a situation where no new files can be created on Lustre for certain MDT+OST combinations.

We observe this when at least one MDS shows high load and suspicious errors in its kernel logs.

We would like to ask whether this is a known bug.

After a recent crash of a file server (OSS), which also affected two metadata servers that then showed the problem described above, we can provide kernel and Lustre logs from those MDSs for debugging.

In more detail:

Shortly before that OSS crashed, file creation on its OSTs started failing for two of the MDSs.

While repairing the OSS we set max_create_count=0, and after a successful recovery we set it back to max_create_count=20000.
No further errors appear in either the MDS or the OSS logs, but MDS1 still does not create any new files on the OSTs of the repaired OSS, and MDS2 fails on a few of those OSTs (though not on all of them, as MDS1 does).
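
A minimal sketch of the max_create_count step, to be run on each MDS. It assumes the filesystem name `lustrefs` (as in the OSP parameter names used in this message) and OST names in Lustre's hex form, e.g. `OST0103`; the helper `set_create_count` and the `LCTL=echo` dry-run switch are hypothetical conveniences, not part of Lustre.

```shell
#!/bin/sh
# Sketch: pause/resume object precreation towards given OSTs on this MDS.
# Assumptions: fsname "lustrefs"; OST names like OST0103 (hex index).
# Hypothetical dry run: LCTL=echo prints the commands instead of running them.
set -eu

FSNAME=${FSNAME:-lustrefs}
LCTL=${LCTL:-lctl}

# set_create_count VALUE OST...
# Sets max_create_count on every OSP device (one per local MDT)
# that points at each of the given OSTs.
set_create_count() {
    value=$1
    shift
    for ost in "$@"; do
        "$LCTL" set_param "osp.${FSNAME}-${ost}-osc-*.max_create_count=${value}"
    done
}

# Usage on every MDS:
#   set_create_count 0 OST0103 OST0104      # before repairing the OSS
#   set_create_count 20000 OST0103 OST0104  # after recovery has completed
```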

Our usual workaround is to reactivate the affected OSTs on the MDS by setting `osp.lustrefs-${OST}-osc-*.active` from 0 back to 1 with `lctl set_param`.
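
That workaround can be sketched as a small shell function that sets `active=0` and then `active=1` for one OST and prints the resulting state. It assumes the filesystem name `lustrefs` and hex OST names such as `OST0104`; `reactivate_ost` and the `LCTL=echo` dry-run switch are hypothetical. It also reads `prealloc_status`, which reports the last precreation status the MDS holds for that OST (normally 0); checking it is an assumption about what is useful here, not a known fix.

```shell
#!/bin/sh
# Sketch: deactivate and reactivate the OSP devices for a given OST on this MDS.
# Assumptions: fsname "lustrefs"; OST names like OST0104 (hex index).
# Hypothetical dry run: LCTL=echo prints the commands instead of running them.
set -eu

FSNAME=${FSNAME:-lustrefs}
LCTL=${LCTL:-lctl}

# reactivate_ost OST: toggle active 0 -> 1, then show the resulting state.
reactivate_ost() {
    osp="osp.${FSNAME}-$1-osc-*"
    "$LCTL" set_param "${osp}.active=0"
    "$LCTL" set_param "${osp}.active=1"
    # prealloc_status: last precreation status the MDS has for this OST
    "$LCTL" get_param "${osp}.active" "${osp}.prealloc_status"
}

# Usage on the affected MDS:
#   for ost in OST0104 OST0105; do reactivate_ost "$ost"; done
```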

Currently, all MDSs have the following settings:

* max_create_count=20000
* active=1 for all OSTs

Even so, file creation still fails for the following MDT/OST combinations:

MDT0-OST[259-265]
MDT1-OST[260-262]
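
To see which MDT/OST pairs deviate from the intended settings, the `lctl get_param` output for the OSP devices on each MDS (one `name=value` line per parameter) can be filtered. A minimal sketch follows; the function name `flag_osps` is hypothetical, the expected values for `active` and `max_create_count` are the ones quoted above, and treating any non-zero `prealloc_status` as suspicious is an assumption.

```shell
#!/bin/sh
# Sketch: flag OSP devices whose state could explain failing file creation.
# Reads "name=value" lines as printed by lctl get_param, e.g.:
#   lctl get_param 'osp.*-osc-*.active' 'osp.*-osc-*.max_create_count' \
#       'osp.*-osc-*.prealloc_status' | flag_osps
# Expected values (assumption): active=1, max_create_count=20000,
# prealloc_status=0.
flag_osps() {
    awk -F= '
    {
        # $1 looks like osp.lustrefs-OST0104-osc-MDT0000.active
        n = split($1, parts, ".")
        param = parts[n]
        dev = substr($1, 1, length($1) - length(param) - 1)
        if ((param == "active" && $2 != 1) ||
            (param == "max_create_count" && $2 != 20000) ||
            (param == "prealloc_status" && $2 != 0))
            print dev ": " param "=" $2
    }'
}
```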

We are running Lustre 2.12.5.

Any help would be appreciated!


Best
Gabriele
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
