Hi, I'm running a mixed environment: a 1.8.0.1 MDS with 1.6.6 OSSs (I had a problem with the QLogic drivers and rolled back to 1.6.6). My MDS becomes unresponsive every day at 4-5 am local time, with no kernel panic or error messages beforehand. After I force-reboot the MDS and mount the MDT, some errors and an LBUG appear in the log; after that the log stays clean until the next morning:
Jan 4 06:27:32 tech-mds kernel: LustreError: 6290:0: (ldlm_lib.c:884:target_handle_connect()) technion-MDT0000: denying connection for new client 192.114.101...@tcp (ab671897-b1e2-76d3-b661-7b87e82d23e7): 34 clients in recovery for 337s
Jan 4 06:27:32 tech-mds kernel: LustreError: 6290:0: (ldlm_lib.c:1826:target_send_reply_msg()) @@@ processing error (-16) r...@ffff81006f99cc00 x1323646107950586/t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1262579352 ref 1 fl Interpret:/0/0 rc -16/0
Jan 4 06:27:41 tech-mds kernel: Lustre: 6280:0: (ldlm_lib.c:1718:target_queue_last_replay_reply()) technion-MDT0000: 33 recoverable clients remain
Jan 4 06:27:57 tech-mds kernel: LustreError: 6284:0: (ldlm_lib.c:884:target_handle_connect()) technion-MDT0000: denying connection for new client 192.114.101...@tcp (ab671897-b1e2-76d3-b661-7b87e82d23e7): 33 clients in recovery for 312s
Jan 4 06:27:57 tech-mds kernel: LustreError: 6284:0: (ldlm_lib.c:1826:target_send_reply_msg()) @@@ processing error (-16) r...@ffff81011c69d400 x1323646107950600/t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1262579377 ref 1 fl Interpret:/0/0 rc -16/0
Jan 4 06:28:22 tech-mds kernel: LustreError: 6302:0: (ldlm_lib.c:884:target_handle_connect()) technion-MDT0000: denying connection for new client 192.114.101...@tcp (ab671897-b1e2-76d3-b661-7b87e82d23e7): 33 clients in recovery for 287s
Jan 4 06:28:22 tech-mds kernel: LustreError: 6302:0: (ldlm_lib.c:1826:target_send_reply_msg()) @@@ processing error (-16) r...@ffff81006fa4e000 x1323646107950612/t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1262579402 ref 1 fl Interpret:/0/0 rc -16/0
Jan 4 06:28:47 tech-mds kernel: LustreError: 6305:0: (ldlm_lib.c:884:target_handle_connect()) technion-MDT0000: denying connection for new client 192.114.101...@tcp (ab671897-b1e2-76d3-b661-7b87e82d23e7): 33 clients in recovery for 262s
Jan 4 06:28:47 tech-mds kernel: LustreError: 6305:0: (ldlm_lib.c:1826:target_send_reply_msg()) @@@ processing error (-16) r...@ffff81011c69d800 x1323646107950624/t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1262579427 ref 1 fl Interpret:/0/0 rc -16/0
Jan 4 06:29:01 tech-mds ntpd[5999]: synchronized to 132.68.238.40, stratum 2
Jan 4 06:29:01 tech-mds ntpd[5999]: kernel time sync enabled 0001
Jan 4 06:29:12 tech-mds kernel: LustreError: 6278:0: (ldlm_lib.c:884:target_handle_connect()) technion-MDT0000: denying connection for new client 192.114.101...@tcp (ab671897-b1e2-76d3-b661-7b87e82d23e7): 33 clients in recovery for 237s
Jan 4 06:29:12 tech-mds kernel: LustreError: 6278:0: (ldlm_lib.c:1826:target_send_reply_msg()) @@@ processing error (-16) r...@ffff81007053ac00 x1323646107950636/t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1262579452 ref 1 fl Interpret:/0/0 rc -16/0
Jan 4 06:29:37 tech-mds kernel: LustreError: 6293:0: (ldlm_lib.c:884:target_handle_connect()) technion-MDT0000: denying connection for new client 192.114.101...@tcp (ab671897-b1e2-76d3-b661-7b87e82d23e7): 33 clients in recovery for 212s
Jan 4 06:29:37 tech-mds kernel: LustreError: 6293:0: (ldlm_lib.c:1826:target_send_reply_msg()) @@@ processing error (-16) r...@ffff81006f8a7000 x1323646107950648/t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1262579477 ref 1 fl Interpret:/0/0 rc -16/0
Jan 4 06:30:02 tech-mds kernel: LustreError: 6277:0: (ldlm_lib.c:884:target_handle_connect()) technion-MDT0000: denying connection for new client 192.114.101...@tcp (ab671897-b1e2-76d3-b661-7b87e82d23e7): 33 clients in recovery for 187s
Jan 4 06:30:02 tech-mds kernel: LustreError: 6277:0: (ldlm_lib.c:1826:target_send_reply_msg()) @@@ processing error (-16) r...@ffff81010bb61000 x1323646107950660/t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1262579502 ref 1 fl Interpret:/0/0 rc -16/0
Jan 4 06:30:27 tech-mds kernel: LustreError: 6300:0: (ldlm_lib.c:884:target_handle_connect()) technion-MDT0000: denying connection for new client 192.114.101...@tcp (ab671897-b1e2-76d3-b661-7b87e82d23e7): 33 clients in recovery for 162s
Jan 4 06:30:52 tech-mds kernel: LustreError: 6281:0: (ldlm_lib.c:1826:target_send_reply_msg()) @@@ processing error (-16) r...@ffff81006f8fd400 x1323646107950684/t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1262579552 ref 1 fl Interpret:/0/0 rc -16/0
Jan 4 06:30:52 tech-mds kernel: LustreError: 6281:0: (ldlm_lib.c:1826:target_send_reply_msg()) Skipped 1 previous similar message
Jan 4 06:31:11 tech-mds kernel: Lustre: 6264:0: (ldlm_lib.c:538:target_handle_reconnect()) MGS: ca34b32b-6fd6-b367-9c76-870c8c944b50 reconnecting
Jan 4 06:31:11 tech-mds kernel: Lustre: 6305:0: (ldlm_lib.c:1718:target_queue_last_replay_reply()) technion-MDT0000: 32 recoverable clients remain
Jan 4 06:31:17 tech-mds kernel: LustreError: 6285:0: (ldlm_lib.c:884:target_handle_connect()) technion-MDT0000: denying connection for new client 192.114.101...@tcp (ab671897-b1e2-76d3-b661-7b87e82d23e7): 32 clients in recovery for 112s
Jan 4 06:31:17 tech-mds kernel: LustreError: 6285:0: (ldlm_lib.c:884:target_handle_connect()) Skipped 1 previous similar message
Jan 4 06:31:19 tech-mds kernel: Lustre: 6263:0: (ldlm_lib.c:538:target_handle_reconnect()) MGS: c26bf58b-6583-5577-e6b8-f2ff1d0e5df8 reconnecting
Jan 4 06:31:19 tech-mds kernel: Lustre: 6299:0: (ldlm_lib.c:815:target_handle_connect()) technion-MDT0000: refuse reconnection from c6e1cf14-2820-92bb-4471-e48c5a5a0...@192.114.101.25@tcp to 0xffff81006fc1e000; still busy with 2 active RPCs
Jan 4 06:31:19 tech-mds kernel: Lustre: 6288:0: (ldlm_lib.c:1718:target_queue_last_replay_reply()) technion-MDT0000: 31 recoverable clients remain
Jan 4 06:31:32 tech-mds kernel: Lustre: 6302:0: (ldlm_lib.c:538:target_handle_reconnect()) technion-MDT0000: 5887f548-0db2-2b71-ff4c-0063614c0686 reconnecting
Jan 4 06:31:32 tech-mds kernel: Lustre: 6302:0: (ldlm_lib.c:538:target_handle_reconnect()) Skipped 2 previous similar messages
Jan 4 06:31:32 tech-mds kernel: LustreError: 6280:0: (service.c:612:ptlrpc_check_req()) @@@ DROPPING req from old connection 203 < 204 r...@ffff8100d3ecc450 x1323646281069438/t0 o101->5887f548-0db2-2b71-ff4c-0063614c0...@net_0x20000c0726514_uuid:0/0 lens 296/0 e 0 to 0 dl 1262579747 ref 1 fl Interpret:/0/0 rc 0/0
Jan 4 06:31:36 tech-mds kernel: Lustre: 6283:0: (ldlm_lib.c:538:target_handle_reconnect()) technion-MDT0000: c6e1cf14-2820-92bb-4471-e48c5a5a0cbf reconnecting
Jan 4 06:31:36 tech-mds kernel: Lustre: 6283:0: (ldlm_lib.c:538:target_handle_reconnect()) Skipped 1 previous similar message
Jan 4 06:31:39 tech-mds kernel: Lustre: 6283:0: (ldlm_lib.c:538:target_handle_reconnect()) technion-MDT0000: 410e0e8a-b08b-f77d-a88e-a216da983909 reconnecting
Jan 4 06:31:39 tech-mds kernel: Lustre: 6283:0: (ldlm_lib.c:538:target_handle_reconnect()) Skipped 1 previous similar message
Jan 4 06:31:47 tech-mds kernel: Lustre: 6302:0: (ldlm_lib.c:1718:target_queue_last_replay_reply()) technion-MDT0000: 30 recoverable clients remain
Jan 4 06:31:48 tech-mds kernel: Lustre: 6292:0: (ldlm_lib.c:538:target_handle_reconnect()) technion-MDT0000: 8383ca9c-fdbf-1edf-06d9-0fb98f7e1472 reconnecting
Jan 4 06:31:52 tech-mds kernel: Lustre: 6306:0: (ldlm_lib.c:815:target_handle_connect()) technion-MDT0000: refuse reconnection from ec65e3e4-19af-a532-f0c3-ae73899a2...@192.114.101.30@tcp to 0xffff81006fc88000; still busy with 2 active RPCs
Jan 4 06:31:57 tech-mds kernel: Lustre: 6281:0: (ldlm_lib.c:538:target_handle_reconnect()) technion-MDT0000: 58b52546-23b2-4857-cd8c-c172d4f64069 reconnecting
Jan 4 06:31:57 tech-mds kernel: Lustre: 6281:0: (ldlm_lib.c:538:target_handle_reconnect()) Skipped 4 previous similar messages
Jan 4 06:31:57 tech-mds kernel: Lustre: 6291:0: (ldlm_lib.c:1718:target_queue_last_replay_reply()) technion-MDT0000: 28 recoverable clients remain
Jan 4 06:31:57 tech-mds kernel: Lustre: 6291:0: (ldlm_lib.c:1718:target_queue_last_replay_reply()) Skipped 1 previous similar message
Jan 4 06:32:07 tech-mds kernel: LustreError: 6305:0: (ldlm_lib.c:1826:target_send_reply_msg()) @@@ processing error (-16) r...@ffff810054eb3000 x1323646107950720/t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1262579627 ref 1 fl Interpret:/0/0 rc -16/0
Jan 4 06:32:07 tech-mds kernel: LustreError: 6305:0: (ldlm_lib.c:1826:target_send_reply_msg()) Skipped 4 previous similar messages
Jan 4 06:32:13 tech-mds kernel: Lustre: 6304:0: (ldlm_lib.c:1718:target_queue_last_replay_reply()) technion-MDT0000: 27 recoverable clients remain
Jan 4 06:32:15 tech-mds kernel: Lustre: 6301:0: (ldlm_lib.c:538:target_handle_reconnect()) technion-MDT0000: 9ba11766-2e56-35b9-957e-5d186169b9c8 reconnecting
Jan 4 06:32:15 tech-mds kernel: Lustre: 6301:0: (ldlm_lib.c:538:target_handle_reconnect()) Skipped 1 previous similar message
Jan 4 06:32:17 tech-mds kernel: LustreError: 6164:0: (socklnd.c:1639:ksocknal_destroy_conn()) Completing partial receive from 12345-192.114.101...@tcp, ip 192.114.101.24:1022, with error
Jan 4 06:32:17 tech-mds kernel: LustreError: 6164:0: (events.c:229:request_in_callback()) event type 1, status -5, service mds
Jan 4 06:32:17 tech-mds kernel: LustreError: 6289:0: (pack_generic.c:871:lustre_unpack_msg()) message length 0 too small for magic/version check
Jan 4 06:32:17 tech-mds kernel: LustreError: 6289:0: (service.c:1102:ptlrpc_server_handle_req_in()) error unpacking request: ptl 12 from 12345-192.114.101...@tcp xid 1323646241338075
Jan 4 06:32:31 tech-mds kernel: Lustre: 6283:0: (ldlm_lib.c:1718:target_queue_last_replay_reply()) technion-MDT0000: 25 recoverable clients remain
Jan 4 06:32:31 tech-mds kernel: Lustre: 6283:0: (ldlm_lib.c:1718:target_queue_last_replay_reply()) Skipped 1 previous similar message
Jan 4 06:32:32 tech-mds kernel: LustreError: 6291:0: (ldlm_lib.c:884:target_handle_connect()) technion-MDT0000: denying connection for new client 192.114.101...@tcp (ab671897-b1e2-76d3-b661-7b87e82d23e7): 25 clients in recovery for 37s
Jan 4 06:32:32 tech-mds kernel: LustreError: 6291:0: (ldlm_lib.c:884:target_handle_connect()) Skipped 2 previous similar messages
Jan 4 06:32:55 tech-mds kernel: Lustre: 6295:0: (ldlm_lib.c:538:target_handle_reconnect()) technion-MDT0000: 1d2eebf8-db26-7093-3a42-f7f0ca8a6b1b reconnecting
Jan 4 06:32:55 tech-mds kernel: Lustre: 6295:0: (ldlm_lib.c:538:target_handle_reconnect()) Skipped 5 previous similar messages
Jan 4 06:33:02 tech-mds kernel: LustreError: 6164:0: (socklnd.c:1639:ksocknal_destroy_conn()) Completing partial receive from 12345-192.114.10...@tcp, ip 192.114.101.9:1023, with error
Jan 4 06:33:02 tech-mds kernel: LustreError: 6164:0: (events.c:229:request_in_callback()) event type 1, status -5, service mds
Jan 4 06:33:02 tech-mds kernel: LustreError: 6299:0: (pack_generic.c:871:lustre_unpack_msg()) message length 0 too small for magic/version check
Jan 4 06:33:02 tech-mds kernel: LustreError: 6299:0: (service.c:1102:ptlrpc_server_handle_req_in()) error unpacking request: ptl 12 from 12345-192.114.10...@tcp xid 1323646400066660
Jan 4 06:33:08 tech-mds kernel: Lustre: MGS: haven't heard from client 7f39d026-7d8e-6127-a73a-e0e30f4a0cbf (at 192.114.101...@tcp) in 193 seconds. I think it's dead, and I am evicting it.
Jan 4 06:33:09 tech-mds kernel: Lustre: 6298:0: (ldlm_lib.c:1718:target_queue_last_replay_reply()) technion-MDT0000: 21 recoverable clients remain
Jan 4 06:33:09 tech-mds kernel: Lustre: 6298:0: (ldlm_lib.c:1718:target_queue_last_replay_reply()) Skipped 3 previous similar messages
Jan 4 06:33:10 tech-mds kernel: Lustre: technion-MDT0000: recovery period over; 21 clients never reconnected after 375s (35 clients did)
Jan 4 06:33:19 tech-mds kernel: LustreError: 6263:0: (mgs_handler.c:572:mgs_handle()) lustre_mgs: operation 400 on unconnected MGS
Jan 4 06:33:20 tech-mds kernel: LustreError: 6263:0: (mgs_handler.c:572:mgs_handle()) lustre_mgs: operation 400 on unconnected MGS
Jan 4 06:33:26 tech-mds kernel: Lustre: 6281:0: (ldlm_lib.c:815:target_handle_connect()) technion-MDT0000: refuse reconnection from 7535d83e-42c3-217f-e06c-f503c9eac...@192.114.101.4@tcp to 0xffff81006fc80000; still busy with 2 active RPCs
Jan 4 06:33:26 tech-mds kernel: LustreError: 6264:0: (service.c:612:ptlrpc_check_req()) @@@ DROPPING req from old connection 298 < 299 r...@ffff810070eb7850 x1323646478059177/t0 o400->7eb753db-5828-0ada-8a05-fa96abac8...@net_0x20000c0726504_uuid:0/0 lens 192/0 e 0 to 0 dl 1262579612 ref 1 fl Interpret:H/0/0 rc 0/0
Jan 4 06:33:31 tech-mds kernel: LustreError: 6357:0: (class_hash.c:225:lustre_hash_findadd_unique_hnode()) ASSERTION(hlist_unhashed(hnode)) failed
Jan 4 06:33:31 tech-mds kernel: LustreError: 6357:0: (class_hash.c:225:lustre_hash_findadd_unique_hnode()) LBUG
Jan 4 06:33:31 tech-mds kernel: Lustre: 6357:0:(linux-debug.c:222:libcfs_debug_dumpstack()) showing stack for process 6357
Jan 4 06:33:31 tech-mds kernel: ll_mgs_02 R running task 0 6357 1 6340 (L-TLB)
Jan 4 06:33:31 tech-mds kernel: ffff810110dfde50 ffffffff80063097 ffff810070f28000 0000000000000082
Jan 4 06:33:31 tech-mds kernel: 0000008100002000 ffff810070e325b0 ffff810070ebb148 0000000000000001
Jan 4 06:33:31 tech-mds kernel: ffff810070e325a8 0000000000000000 ffff810110dfde10 ffffffff8008882b
Jan 4 06:33:31 tech-mds kernel: Call Trace:
Jan 4 06:33:31 tech-mds kernel: [<ffffffff80063097>] thread_return+0x62/0xfe
Jan 4 06:33:31 tech-mds kernel: [<ffffffff8008882b>] __wake_up_common+0x3e/0x68
Jan 4 06:33:31 tech-mds kernel: [<ffffffff886682e8>] :ptlrpc:ptlrpc_main+0x1218/0x13e0
Jan 4 06:33:31 tech-mds kernel: [<ffffffff8008a3f6>] default_wake_function+0x0/0xe
Jan 4 06:33:31 tech-mds kernel: [<ffffffff800b491a>] audit_syscall_exit+0x31b/0x336
Jan 4 06:33:31 tech-mds kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Jan 4 06:33:31 tech-mds kernel: [<ffffffff886670d0>] :ptlrpc:ptlrpc_main+0x0/0x13e0
Jan 4 06:33:31 tech-mds kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Jan 4 06:33:31 tech-mds kernel:
Jan 4 06:33:31 tech-mds kernel: LustreError: dumping log to /tmp/lustre-log.1262579611.6357
Jan 4 06:34:35 tech-mds kernel: Lustre: 6264:0: (ldlm_lib.c:538:target_handle_reconnect()) MGS: 055e7f6a-94fb-97e0-2117-bc6afa3f8b10 reconnecting
Jan 4 06:34:35 tech-mds kernel: Lustre: 6276:0: (ldlm_lib.c:1718:target_queue_last_replay_reply()) technion-MDT0000: 15 recoverable clients remain
Jan 4 06:34:35 tech-mds kernel: Lustre: 6276:0: (ldlm_lib.c:1718:target_queue_last_replay_reply()) Skipped 5 previous similar messages
Jan 4 06:34:35 tech-mds kernel: Lustre: 6264:0: (ldlm_lib.c:538:target_handle_reconnect()) Skipped 21 previous similar messages
Jan 4 06:34:37 tech-mds kernel: LustreError: 6287:0: (ldlm_lib.c:1826:target_send_reply_msg()) @@@ processing error (-16) r...@ffff8100d3eb0800 x1323646107950792/t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1262579777 ref 1 fl Interpret:/0/0 rc -16/0
Jan 4 06:34:37 tech-mds kernel: LustreError: 6287:0: (ldlm_lib.c:1826:target_send_reply_msg()) Skipped 8 previous similar messages
Jan 4 06:35:02 tech-mds kernel: LustreError: 6290:0: (ldlm_lib.c:884:target_handle_connect()) technion-MDT0000: denying connection for new client 192.114.101...@tcp (ab671897-b1e2-76d3-b661-7b87e82d23e7): 14 clients in recovery for 187s
Jan 4 06:35:02 tech-mds kernel: LustreError: 6290:0: (ldlm_lib.c:884:target_handle_connect()) Skipped 5 previous similar messages
Jan 4 06:35:21 tech-mds kernel: LustreError: 6263:0: (service.c:612:ptlrpc_check_req()) @@@ DROPPING req from old connection 296 < 297 r...@ffff810070e3f850 x1323645340887548/t0 o400->8c347320-a2f7-aa5a-14a1-35d466efd...@net_0x20000c0726522_uuid:0/0 lens 192/0 e 0 to 0 dl 1262579727 ref 1 fl Interpret:H/0/0 rc 0/0
Jan 4 06:36:51 tech-mds kernel: Lustre: 0:0:(watchdog.c:153:lcw_cb()) Watchdog triggered for pid 6357: it was inactive for 200.00s
Jan 4 06:36:51 tech-mds kernel: Lustre: 0:0:(linux-debug.c:222:libcfs_debug_dumpstack()) showing stack for process 6357
Jan 4 06:36:51 tech-mds kernel: ll_mgs_02 D ffff81000237e980 0 6357 1 6340 (L-TLB)
Jan 4 06:36:51 tech-mds kernel: ffff810110dfd9d0 0000000000000046 0000000000000000 0000000000000000
Jan 4 06:36:51 tech-mds kernel: ffff810110dfd990 0000000000000009 ffff81011e687820 ffff8101023ca080
Jan 4 06:36:51 tech-mds kernel: 00000057eeacb9c0 000000000000167b ffff81011e687a08 00000001000000e1
Jan 4 06:36:51 tech-mds kernel: Call Trace:
Jan 4 06:36:51 tech-mds kernel: [<ffffffff8008a3f6>] default_wake_function+0x0/0xe
Jan 4 06:36:51 tech-mds kernel: [<ffffffff884fab26>] :libcfs:lbug_with_loc+0xc6/0xd0
Jan 4 06:36:51 tech-mds kernel: [<ffffffff88502c70>] :libcfs:tracefile_init+0x0/0x110
Jan 4 06:36:51 tech-mds kernel: [<ffffffff88597702>] :obdclass:lustre_hash_findadd_unique_hnode+0x1a2/0x380
Jan 4 06:36:51 tech-mds kernel: [<ffffffff8859897e>] :obdclass:lustre_hash_add_unique+0x7e/0x230
Jan 4 06:36:51 tech-mds kernel: [<ffffffff8862941f>] :ptlrpc:target_handle_connect+0x250f/0x2880
Jan 4 06:36:51 tech-mds kernel: [<ffffffff8865e900>] :ptlrpc:lustre_msg_set_conn_cnt+0xc0/0x120
Jan 4 06:36:51 tech-mds kernel: [<ffffffff88653d78>] :ptlrpc:ptlrpc_send_reply+0x5c8/0x5e0
Jan 4 06:36:51 tech-mds kernel: [<ffffffff888c1cce>] :mgs:mgs_handle+0x4ee/0x1540
Jan 4 06:36:51 tech-mds kernel: [<ffffffff88664db3>] :ptlrpc:ptlrpc_server_handle_request+0xa93/0x1160
Jan 4 06:36:51 tech-mds kernel: [<ffffffff80063097>] thread_return+0x62/0xfe
Jan 4 06:36:51 tech-mds kernel: [<ffffffff8008882b>] __wake_up_common+0x3e/0x68
Jan 4 06:36:51 tech-mds kernel: [<ffffffff886682e8>] :ptlrpc:ptlrpc_main+0x1218/0x13e0
Jan 4 06:36:51 tech-mds kernel: [<ffffffff8008a3f6>] default_wake_function+0x0/0xe
Jan 4 06:36:51 tech-mds kernel: [<ffffffff800b491a>] audit_syscall_exit+0x31b/0x336
Jan 4 06:36:51 tech-mds kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Jan 4 06:36:51 tech-mds kernel: [<ffffffff886670d0>] :ptlrpc:ptlrpc_main+0x0/0x13e0
Jan 4 06:36:51 tech-mds kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Jan 4 06:36:51 tech-mds kernel:
Jan 4 06:36:51 tech-mds kernel: LustreError: dumping log to /tmp/lustre-log.1262579811.6357
Jan 4 06:37:01 tech-mds kernel: Lustre: 6306:0: (ldlm_lib.c:538:target_handle_reconnect()) technion-MDT0000: 5887f548-0db2-2b71-ff4c-0063614c0686 reconnecting
Jan 4 06:37:01 tech-mds kernel: Lustre: 6306:0: (ldlm_lib.c:538:target_handle_reconnect()) Skipped 6 previous similar messages
Jan 4 06:37:02 tech-mds kernel: Lustre: 6304:0: (ldlm_lib.c:1718:target_queue_last_replay_reply()) technion-MDT0000: 10 recoverable clients remain
Jan 4 06:37:02 tech-mds kernel: Lustre: 6304:0: (ldlm_lib.c:1718:target_queue_last_replay_reply()) Skipped 4 previous similar messages
Jan 4 06:38:10 tech-mds kernel: Lustre: technion-MDT0000: recovery period over; 10 clients never reconnected after 675s (35 clients did)
Jan 4 06:38:10 tech-mds kernel: LustreError: 6275:0: (handler.c:1554:mds_handle()) operation 101 on unconnected MDS from 12345-192.114.10...@tcp
Jan 4 06:38:10 tech-mds kernel: LustreError: 6303:0: (handler.c:1554:mds_handle()) operation 101 on unconnected MDS from 12345-192.114.101...@tcp
Jan 4 06:38:11 tech-mds kernel: LustreError: 6281:0: (handler.c:1554:mds_handle()) operation 101 on unconnected MDS from 12345-192.114.101...@tcp
Jan 4 06:38:12 tech-mds kernel: LustreError: 6296:0: (handler.c:1554:mds_handle()) operation 101 on unconnected MDS from 12345-192.114.10...@tcp
Jan 4 06:38:12 tech-mds kernel: LustreError: 6296:0: (handler.c:1554:mds_handle()) Skipped 1 previous similar message
Jan 4 06:38:14 tech-mds kernel: Lustre: 6301:0: (quota_master.c:1680:mds_quota_recovery()) Only 13/10 OSTs are active, abort quota recovery
Jan 4 06:38:14 tech-mds kernel: Lustre: technion-MDT0000: recovery complete: rc 0
Jan 4 06:38:14 tech-mds kernel: Lustre: technion-MDT0000: sending delayed replies to recovered clients
Jan 4 06:38:14 tech-mds kernel: LustreError: 6276:0: (mds_open.c:664:reconstruct_open()) Re-opened file
Jan 4 06:38:14 tech-mds kernel: LustreError: 6139:0: (handler.c:416:mds_destroy_export()) ASSERTION(list_empty(&exp->exp_mds_data.med_open_head)) failed
Jan 4 06:38:14 tech-mds kernel: LustreError: 6139:0: (handler.c:416:mds_destroy_export()) LBUG
Jan 4 06:38:14 tech-mds kernel: Lustre: 6139:0:(linux-debug.c:222:libcfs_debug_dumpstack()) showing stack for process 6139
Jan 4 06:38:14 tech-mds kernel: obd_zombid R running task 0 6139 1 6156 6026 (L-TLB)
Jan 4 06:38:14 tech-mds kernel: ffffffff88505ab5 ffffffff8895f8f8 ffff81006fc20000 ffff810071e29f00
Jan 4 06:38:14 tech-mds kernel: 00002b35469e6010 ffffffff8892604f ffff81011bc447a0 ffff81006fc20000
Jan 4 06:38:14 tech-mds kernel: ffff81011ab58078 ffff810071e29f00 00002b35469e6010 ffff81006fc20000
Jan 4 06:38:14 tech-mds kernel: Call Trace:
Jan 4 06:38:14 tech-mds kernel: [<ffffffff8892604f>] :mds:mds_destroy_export+0x9f/0x120
Jan 4 06:38:14 tech-mds kernel: [<ffffffff8859d3bc>] :obdclass:class_export_destroy+0x20c/0x2c0
Jan 4 06:38:15 tech-mds kernel: [<ffffffff8859bac1>] :obdclass:obd_zombi_impexp_check+0x11/0xc0
Jan 4 06:38:15 tech-mds kernel: [<ffffffff8859d4f2>] :obdclass:obd_zombie_impexp_cull+0x82/0xa0
Jan 4 06:38:15 tech-mds kernel: [<ffffffff885a226c>] :obdclass:obd_zombie_impexp_thread+0x1ec/0x290
Jan 4 06:38:15 tech-mds kernel: [<ffffffff8008a3f6>] default_wake_function+0x0/0xe
Jan 4 06:38:15 tech-mds kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Jan 4 06:38:15 tech-mds kernel: [<ffffffff885a2080>] :obdclass:obd_zombie_impexp_thread+0x0/0x290
Jan 4 06:38:15 tech-mds kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Jan 4 06:38:16 tech-mds kernel:
Jan 4 06:38:16 tech-mds kernel: LustreError: dumping log to /tmp/lustre-log.1262579894.6139
Jan 4 06:38:16 tech-mds kernel: LustreError: 6298:0: (handler.c:1554:mds_handle()) operation 101 on unconnected MDS from 12345-192.114.101...@tcp
Jan 4 06:38:16 tech-mds kernel: LustreError: 6298:0: (handler.c:1554:mds_handle()) Skipped 4 previous similar messages
Jan 4 06:38:16 tech-mds kernel: LustreError: 6288:0: (mds_open.c:664:reconstruct_open()) Re-opened file
Jan 4 06:38:16 tech-mds kernel: Lustre: MDS technion-MDT0000: technion-OST0008_UUID now active, resetting orphans
Jan 4 06:38:16 tech-mds kernel: Lustre: MDS technion-MDT0000: technion-OST000a_UUID now active, resetting orphans
Jan 4 06:38:21 tech-mds kernel: Lustre: MDS technion-MDT0000: technion-OST0002_UUID now active, resetting orphans
Jan 4 06:38:21 tech-mds kernel: Lustre: Skipped 5 previous similar messages
Jan 4 06:38:26 tech-mds kernel: Lustre: MDS technion-MDT0000: technion-OST0001_UUID now active, resetting orphans
Jan 4 06:38:31 tech-mds kernel: Lustre: MDS technion-MDT0000: technion-OST0000_UUID now active, resetting orphans
Jan 4 06:38:41 tech-mds kernel: LustreError: 6392:0: (mds_open.c:1665:mds_close()) @@@ no handle for file close ino 18531070: cookie 0xdcb9c7fd999ea709 r...@ffff8100d3ed0000 x1323646224495072/t0 o35->5d1ee8c1-f826-9ab3-89bf-342c4f9e2...@net_0x20000c0726512_uuid:0/0 lens 408/976 e 0 to 0 dl 1262579964 ref 1 fl Interpret:/0/0 rc 0/0
Jan 4 06:38:41 tech-mds kernel: LustreError: 6398:0: (mds_open.c:1665:mds_close()) @@@ no handle for file close ino 18531068: cookie 0xdcb9c7fd999e9dfc r...@ffff8100dc7c8c00 x1323646224495073/t0 o35->5d1ee8c1-f826-9ab3-89bf-342c4f9e2...@net_0x20000c0726512_uuid:0/0 lens 408/976 e 0 to 0 dl 1262579927 ref 1 fl Interpret:/0/0 rc 0/0
Jan 4 06:38:41 tech-mds kernel: LustreError: 6415:0: (mds_open.c:1665:mds_close()) @@@ no handle for file close ino 18508458: cookie 0xdcb9c7fd9983617e r...@ffff8100d4bfb400 x1323646224495345/t0 o35->5d1ee8c1-f826-9ab3-89bf-342c4f9e2...@net_0x20000c0726512_uuid:0/0 lens 408/976 e 0 to 0 dl 1262579927 ref 1 fl Interpret:/0/0 rc 0/0
Jan 4 06:38:41 tech-mds kernel: LustreError: 6415:0: (mds_open.c:1665:mds_close()) Skipped 271 previous similar messages
Jan 4 06:38:42 tech-mds kernel: LustreError: 6409:0: (mds_open.c:1665:mds_close()) @@@ no handle for file close ino 18498078: cookie 0xdcb9c7fd99273a35 r...@ffff810054d2e800 x1323646224496303/t0 o35->5d1ee8c1-f826-9ab3-89bf-342c4f9e2...@net_0x20000c0726512_uuid:0/0 lens 408/976 e 0 to 0 dl 1262579928 ref 1 fl Interpret:/0/0 rc 0/0
Jan 4 06:38:42 tech-mds kernel: LustreError: 6409:0: (mds_open.c:1665:mds_close()) Skipped 957 previous similar messages
Jan 4 06:38:44 tech-mds kernel: LustreError: 6413:0: (mds_open.c:1665:mds_close()) @@@ no handle for file close ino 18464618: cookie 0xdcb9c7fd9893064a r...@ffff8100d39f3400 x1323646224498078/t0 o35->5d1ee8c1-f826-9ab3-89bf-342c4f9e2...@net_0x20000c0726512_uuid:0/0 lens 408/976 e 0 to 0 dl 1262579930 ref 1 fl Interpret:/0/0 rc 0/0
Jan 4 06:38:44 tech-mds kernel: LustreError: 6413:0: (mds_open.c:1665:mds_close()) Skipped 1774 previous similar messages
Jan 4 06:38:48 tech-mds kernel: LustreError: 6423:0: (mds_open.c:1665:mds_close()) @@@ no handle for file close ino 18437710: cookie 0xdcb9c7fd9817e589 r...@ffff8100d45b5c00 x1323646224499484/t0 o35->5d1ee8c1-f826-9ab3-89bf-342c4f9e2...@net_0x20000c0726512_uuid:0/0 lens 408/976 e 0 to 0 dl 1262579934 ref 1 fl Interpret:/0/0 rc 0/0
Jan 4 06:38:48 tech-mds kernel: LustreError: 6423:0: (mds_open.c:1665:mds_close()) Skipped 1405 previous similar messages
Jan 4 06:38:53 tech-mds kernel: LustreError: 6422:0: (ldlm_lib.c:1826:target_send_reply_msg()) @@@ processing error (-116) r...@ffff810054d38000 x1323646224500886/t0 o35->5d1ee8c1-f826-9ab3-89bf-342c4f9e2...@net_0x20000c0726512_uuid:0/0 lens 408/976 e 0 to 0 dl 1262579939 ref 1 fl Interpret:/0/0 rc -116/0
Jan 4 06:38:53 tech-mds kernel: LustreError: 6422:0: (ldlm_lib.c:1826:target_send_reply_msg()) Skipped 5838 previous similar messages
Jan 4 06:38:56 tech-mds kernel: LustreError: 6420:0: (mds_open.c:1665:mds_close()) @@@ no handle for file close ino 13567564: cookie 0xde1fda06cd4d058c r...@ffff810055378800 x1323646224501408/t0 o35->5d1ee8c1-f826-9ab3-89bf-342c4f9e2...@net_0x20000c0726512_uuid:0/0 lens 408/976 e 0 to 0 dl 1262579942 ref 1 fl Interpret:/0/0 rc 0/0
Jan 4 06:38:56 tech-mds kernel: LustreError: 6420:0: (mds_open.c:1665:mds_close()) Skipped 1923 previous similar messages

--
David Cohen

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss