Hi; We had a couple of our Lustre clients get horked last night (one o2ib, and one tcp), df on the clients reported 'Input/output error' for the Lustre fs.
Here's what was in the syslog of our MDS/MGS regarding the two nodes: Feb 14 22:50:43 hpcmds kernel: LustreError: 26165:0:(handler.c:1498:mds_handle()) operation 35 on unconnected MDS from [EMAIL PROTECTED] Feb 14 22:50:43 hpcmds kernel: LustreError: 26165:0:(ldlm_lib.c:1437:target_send_reply_msg()) @@@ processing error (-107) [EMAIL PROTECTED] x7753405/t0 o35-><?>@<?>:-1 lens 296/0 ref 0 fl Interpret:/0/0 rc -107/0 Feb 14 22:50:43 hpcmds kernel: LustreError: 26165:0:(ldlm_lib.c:1437:target_send_reply_msg()) Skipped 9 previous similar messages Feb 14 23:01:54 hpcmds kernel: LustreError: 31054:0:(handler.c:1498:mds_handle()) operation 35 on unconnected MDS from [EMAIL PROTECTED] Feb 14 23:01:54 hpcmds kernel: LustreError: 31054:0:(ldlm_lib.c:1437:target_send_reply_msg()) @@@ processing error (-107) [EMAIL PROTECTED] x73394313/t0 o35-><?>@<?>:-1 lens 296/0 ref 0 fl Interpret:/0/0 rc -107/0 The corresponding syslog from one of the clients is appended below (they were very similar). Does anyone recognize this? It says ENOTCONN, but there is no evidence of anything being wrong with our ethernet or IB networks. There aren't any locking complaints. We are running Lustre 1.6.3 (plus a couple of patches) with a combo MGS/MDS. Kernel is a fully patched 2.6.18-8.1.14el5 everywhere. Thanks, Craig Feb 14 23:01:54 r5b-s14 kernel: LustreError: 11-0: an error occurred while communicating with [EMAIL PROTECTED] The mds_close operation failed with -107 Feb 14 23:01:54 r5b-s14 kernel: Lustre: ufhpc-MDT0000-mdc-ffff81022dfac400: Connection to service ufhpc-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete. Feb 14 23:01:54 r5b-s14 kernel: Lustre: Skipped 5 previous similar messages Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17647:0:(file.c:97:ll_close_inode_openhandle()) inode 41118870 mdc close failed: rc = -4 Feb 14 23:01:54 r5b-s14 kernel: LustreError: 167-0: This client was evicted by ufhpc-MDT0000; in progress operations using this service will fail. Feb 14 23:01:54 r5b-s14 kernel: LustreError: 15785:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x73394323/t0 o35->[EMAIL PROTECTED]@o2ib:12 lens 296/1736 ref 1 fl Rpc:/0/0 rc 0/0 Feb 14 23:01:54 r5b-s14 kernel: LustreError: 15785:0:(file.c:97:ll_close_inode_openhandle()) inode 24445371 mdc close failed: rc = -108 Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17719:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x73394324/t0 o35->[EMAIL PROTECTED]@o2ib:12 lens 296/1736 ref 1 fl Rpc:/0/0 rc 0/0 Feb 14 23:01:54 r5b-s14 kernel: LustreError: 15785:0:(file.c:97:ll_close_inode_openhandle()) Skipped 12 previous similar messages Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17640:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x73394331/t0 o35->[EMAIL PROTECTED]@o2ib:12 lens 296/1736 ref 1 fl Rpc:/0/0 rc 0/0 Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17640:0:(client.c:519:ptlrpc_import_delay_req()) Skipped 6 previous similar messages Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17640:0:(file.c:97:ll_close_inode_openhandle()) inode 41118867 mdc close failed: rc = -108 Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17640:0:(file.c:97:ll_close_inode_openhandle()) Skipped 3 previous similar messages Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17254:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x73394332/t0 o35->[EMAIL PROTECTED]@o2ib:12 lens 296/1736 ref 1 fl Rpc:/0/0 rc 0/0 Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17254:0:(file.c:97:ll_close_inode_openhandle()) inode 42611428 mdc close failed: rc = -108 Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17254:0:(file.c:97:ll_close_inode_openhandle()) Skipped 2 previous similar messages Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17323:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x73394341/t0 o35->[EMAIL PROTECTED]@o2ib:12 lens 296/1736 ref 1 fl Rpc:/0/0 rc 0/0 Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17323:0:(client.c:519:ptlrpc_import_delay_req()) Skipped 8 previous similar messages Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17323:0:(file.c:97:ll_close_inode_openhandle()) inode 24249026 mdc close failed: rc = -108 Feb 14 23:01:54 r5b-s14 kernel: LustreError: 17323:0:(file.c:97:ll_close_inode_openhandle()) Skipped 11 previous similar messages Feb 14 23:02:35 r5b-s14 kernel: LustreError: 5746:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID [EMAIL PROTECTED] x73394350/t0 o35->[EMAIL PROTECTED]@o2ib:12 lens 296/1736 ref 1 fl Rpc:/0/0 rc 0/0 Feb 14 23:02:35 r5b-s14 kernel: LustreError: 5746:0:(client.c:519:ptlrpc_import_delay_req()) Skipped 8 previous similar messages Feb 14 23:02:35 r5b-s14 kernel: LustreError: 5746:0:(file.c:97:ll_close_inode_openhandle()) inode 42195621 mdc close failed: rc = -108 Feb 14 23:02:35 r5b-s14 kernel: LustreError: 5746:0:(file.c:97:ll_close_inode_openhandle()) Skipped 2 previous similar messages _______________________________________________ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss