Hi;

We had a couple of our Lustre clients get horked last night (one
o2ib and one tcp); df on those clients reported 'Input/output error'
for the Lustre fs.

Here's what was in the syslog of our MDS/MGS regarding the two
nodes:

Feb 14 22:50:43 hpcmds kernel: LustreError: 26165:0:(handler.c:1498:mds_handle()) operation 35 on unconnected MDS from [EMAIL PROTECTED]
Feb 14 22:50:43 hpcmds kernel: LustreError: 26165:0:(ldlm_lib.c:1437:target_send_reply_msg()) @@@ processing error (-107)  [EMAIL PROTECTED] x7753405/t0 o35-><?>@<?>:-1 lens 296/0 ref 0 fl Interpret:/0/0 rc -107/0
Feb 14 22:50:43 hpcmds kernel: LustreError: 26165:0:(ldlm_lib.c:1437:target_send_reply_msg()) Skipped 9 previous similar messages
Feb 14 23:01:54 hpcmds kernel: LustreError: 31054:0:(handler.c:1498:mds_handle()) operation 35 on unconnected MDS from [EMAIL PROTECTED]
Feb 14 23:01:54 hpcmds kernel: LustreError: 31054:0:(ldlm_lib.c:1437:target_send_reply_msg()) @@@ processing error (-107)  [EMAIL PROTECTED] x73394313/t0 o35-><?>@<?>:-1 lens 296/0 ref 0 fl Interpret:/0/0 rc -107/0

The corresponding syslog from one of the clients is appended below
(they were very similar).

Does anyone recognize this?  The MDS reports -107 (ENOTCONN), but there is
no evidence of anything being wrong with our Ethernet or IB networks.  There
aren't any locking complaints either.
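The negative numbers in these logs are negated Linux errno values. On Linux, -107 is ENOTCONN, -108 is ESHUTDOWN and -4 is EINTR. Below is a minimal sketch for decoding them. It assumes the codes use the errno numbering of the host it runs on, which matches only if that host is Linux. The helper name `describe_rc` is hypothetical.

```python
import errno
import os


def describe_rc(rc: int) -> str:
    """Decode a Lustre/kernel return code such as -107 into 'NAME (message)'.

    Assumption: rc is a negated errno value that uses the numbering of the
    host this runs on (Linux numbering matches the logs in this thread).
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


if __name__ == "__main__":
    # Return codes seen in the MDS and client logs quoted in this post
    for rc in (-107, -108, -4):
        print(rc, describe_rc(rc))
```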

We are running Lustre 1.6.3 (plus a couple of patches) with a combo
MGS/MDS.  Kernel is a fully patched 2.6.18-8.1.14el5 everywhere.

Thanks,
Craig


Feb 14 23:01:54 r5b-s14 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_close operation failed with -107
Feb 14 23:01:54 r5b-s14 kernel: Lustre: ufhpc-MDT0000-mdc-ffff81022dfac400: Connection to service ufhpc-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Feb 14 23:01:54 r5b-s14 kernel: Lustre: Skipped 5 previous similar messages
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17647:0:(file.c:97:ll_close_inode_openhandle()) inode 41118870 mdc close failed: rc = -4
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 167-0: This client was evicted by ufhpc-MDT0000; in progress operations using this service will fail.
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 15785:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x73394323/t0 o35->[EMAIL PROTECTED]@o2ib:12 lens 296/1736 ref 1 fl Rpc:/0/0 rc 0/0
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 15785:0:(file.c:97:ll_close_inode_openhandle()) inode 24445371 mdc close failed: rc = -108
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17719:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x73394324/t0 o35->[EMAIL PROTECTED]@o2ib:12 lens 296/1736 ref 1 fl Rpc:/0/0 rc 0/0
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 15785:0:(file.c:97:ll_close_inode_openhandle()) Skipped 12 previous similar messages
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17640:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x73394331/t0 o35->[EMAIL PROTECTED]@o2ib:12 lens 296/1736 ref 1 fl Rpc:/0/0 rc 0/0
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17640:0:(client.c:519:ptlrpc_import_delay_req()) Skipped 6 previous similar messages
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17640:0:(file.c:97:ll_close_inode_openhandle()) inode 41118867 mdc close failed: rc = -108
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17640:0:(file.c:97:ll_close_inode_openhandle()) Skipped 3 previous similar messages
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17254:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x73394332/t0 o35->[EMAIL PROTECTED]@o2ib:12 lens 296/1736 ref 1 fl Rpc:/0/0 rc 0/0
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17254:0:(file.c:97:ll_close_inode_openhandle()) inode 42611428 mdc close failed: rc = -108
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17254:0:(file.c:97:ll_close_inode_openhandle()) Skipped 2 previous similar messages
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17323:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x73394341/t0 o35->[EMAIL PROTECTED]@o2ib:12 lens 296/1736 ref 1 fl Rpc:/0/0 rc 0/0
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17323:0:(client.c:519:ptlrpc_import_delay_req()) Skipped 8 previous similar messages
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17323:0:(file.c:97:ll_close_inode_openhandle()) inode 24249026 mdc close failed: rc = -108
Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17323:0:(file.c:97:ll_close_inode_openhandle()) Skipped 11 previous similar messages
Feb 14 23:02:35 r5b-s14 kernel: LustreError: 5746:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x73394350/t0 o35->[EMAIL PROTECTED]@o2ib:12 lens 296/1736 ref 1 fl Rpc:/0/0 rc 0/0
Feb 14 23:02:35 r5b-s14 kernel: LustreError: 5746:0:(client.c:519:ptlrpc_import_delay_req()) Skipped 8 previous similar messages
Feb 14 23:02:35 r5b-s14 kernel: LustreError: 5746:0:(file.c:97:ll_close_inode_openhandle()) inode 42195621 mdc close failed: rc = -108
Feb 14 23:02:35 r5b-s14 kernel: LustreError: 5746:0:(file.c:97:ll_close_inode_openhandle()) Skipped 2 previous similar messages
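When comparing logs from several affected clients, it can help to tally which targets evicted each client and which return codes the failed closes carried. Below is a rough sketch that reads a saved copy of a client syslog on stdin. The regexes are assumptions based only on the message formats quoted above, and the function name `summarize` is hypothetical. The counts are a lower bound, because "Skipped N previous similar messages" lines hide the suppressed entries.

```python
import re
from collections import Counter
from typing import List, Tuple

# Assumed pattern: "inode N mdc close failed: rc = -N" from ll_close_inode_openhandle()
CLOSE_RC = re.compile(r"mdc close failed: rc = (-?\d+)")
# Assumed pattern: "This client was evicted by <target>; ..."
EVICTED = re.compile(r"evicted by (\S+?);")


def summarize(log_text: str) -> Tuple[Counter, List[str]]:
    """Count close-failure return codes and list the targets that evicted us.

    Counts are a lower bound: suppressed "similar messages" are not visible.
    """
    # Collapse all whitespace so entries hard-wrapped by a mail client still match.
    flat = " ".join(log_text.split())
    rcs = Counter(int(rc) for rc in CLOSE_RC.findall(flat))
    evictors = EVICTED.findall(flat)
    return rcs, evictors


if __name__ == "__main__":
    import sys

    rcs, evictors = summarize(sys.stdin.read())
    print("evicted by:", ", ".join(evictors) or "none")
    for rc, n in sorted(rcs.items()):
        print(f"rc = {rc}: {n} line(s)")
```

Collapsing whitespace before matching makes the same script work on both raw syslog files and log excerpts pasted into email.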
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
