Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the 
following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11511



I don't have a crystal-clear reproducer -- other than to say an application
(obviously) gets a bunch of flock locks and then dies. The common thread seems
to be that they are all flock'ing the same resource on the MDS -- so probably
only one client gets a granted lock and the rest are waiting. Once the
application is dead, we come in with llrd to clean these nids up and do the
evictions. I am sure we are only going to see more of these. It should be quite
easy to write an MPI test app that does a bunch of flock enqueues on a single
resource and then falls over dead (segfault, etc.).

What does seem to happen is that we kill a node while it holds the lock, which
causes a completion AST to be sent to that client (silly, given that we _know_
one of the clients is dead); when that AST times out, we release the lock and
reprocess the queue of pending locks for that resource.

I understand there isn't much we can do, given that llrd only gives us a single
nid at a time. We *could* utilize the evict-nid-by-list changes that are
floating around somewhere in Bugzilla and update llrd to use them. I do not
know whether there is a limit on the number of nids we can write into this proc
file -- but we certainly need to know. This would give Lustre a single look at
all the nids we are trying to kill. If Lustre could then mark each one as
"ADMIN_EVICTION_IN_PROGRESS" before it started cleaning up granted locks, etc.,
the various paths that send RPCs to these clients could be prevented from
taking too much time.

Also -- it should be possible to look at the time spent waiting for the flock
locks and, if it was > obd_timeout (from request sent to actually being
granted), drop the request as old. I believe this is similar to the approach
for bug 11330.
