Hi,

My MDS crashed during MDT mount. The last_rcvd trick described in the knowledge base is not working: the kernel still crashes after truncating last_rcvd to 8k. (I have used it successfully before.)
Any ideas (other than upgrading from 1.6.4.3) on getting my MDT running again?

Thanks
/Jakob

[ 344.935438] BUG: scheduling while atomic: mount.lustre/0xffff8101/2024
[ 344.936754]
[ 344.936755] Call Trace:
[ 344.937738] [<ffffffff8025973a>] __sched_text_start+0x7a/0x769
[ 344.939092] ----------- [cut here ] --------- [please bite here ] ---------
[ 344.940751] Kernel BUG at kernel/sched.c:1008
[ 344.941801] invalid opcode: 0000 [1] SMP
[ 344.942784] CPU 0
[ 344.943308] Modules linked in: osc mds fsfilt_ldiskfs mgs mgc lustre lov lquota mdc ksocklnd ptlrpc obdclass lnet lvfs libcfs ldiskfs crc16 ipmi_devintf ipmi_si ipmi_msghandler bonding dm_snapshot dm_mirror dm_mod generic serio_raw piix ehci_hcd uhci_hcd ide_core
[ 344.949927] Pid: 2024, comm: mount.lustre Not tainted 2.6.18.8-bnx2-1.6.7b-cciss-3.6.18-5-lustre-1.6.4.3 #2
[ 344.951972] RIP: 0010:[<ffffffff80274371>] [<ffffffff80274371>] resched_task+0x24/0x65
[ 344.953893] RSP: 0018:ffffffff804ccdc0 EFLAGS: 00010002
[ 344.955099] RAX: 0000000000000001 RBX: 000000504ff8c8da RCX: ffff810124422000
[ 344.956687] RDX: ffff81012bd3bbc0 RSI: ffff810001023bf8 RDI: ffff81012b06a180
[ 344.958253] RBP: ffffffff804ccdc0 R08: 000000000000000d R09: 000000000000007f
[ 344.959865] R10: ffff81012baec420 R11: 0000000000000000 R12: ffff81012b8dd810
[ 344.961259] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8100010232a0
[ 344.962946] FS: 00002ac6e3d176d0(0000) GS:ffffffff8051a000(0000) knlGS:0000000000000000
[ 344.964530] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 344.965871] CR2: 00002b233f140160 CR3: 0000000124240000 CR4: 00000000000006e0
[ 344.967261] Process mount.lustre (pid: 2024, threadinfo ffff810124422000, task ffff81012b06a180)
[ 344.968992] Stack: ffffffff804cce20 ffffffff8024232e 0000000000000000 0000000000000001
[ 344.970865] 0000000000000001 0000000000000002 0000000000000082 ffff81012b8dd810
[ 344.972743] 000000000000000e 0000000000000001 ffff810001024d04 0000000000000000
[ 344.974502] Call Trace:
[ 344.975203] <IRQ> [<ffffffff8024232e>] try_to_wake_up+0x2e3/0x353
[ 344.976561] [<ffffffff8027fab7>] signal_wake_up+0x1e/0x2d
[ 344.977835] [<ffffffff8027fdcc>] __group_send_sig_info+0x89/0x94
[ 344.979030] [<ffffffff802551cf>] group_send_sig_info+0x4e/0x75
[ 344.980414] [<ffffffff80280cf3>] send_group_sig_info+0x28/0x35
[ 344.981591] [<ffffffff8027a99d>] it_real_fn+0x23/0x4f
[ 344.982775] [<ffffffff8027a97a>] it_real_fn+0x0/0x4f
[ 344.983792] [<ffffffff80249dbb>] hrtimer_run_queues+0x107/0x16d
[ 344.984974] [<ffffffff8027e434>] run_timer_softirq+0x21/0x1b0
[ 344.986369] [<ffffffff802101e5>] __do_softirq+0x5e/0xd6
[ 344.987602] [<ffffffff80305e65>] end_msi_irq_w_maskbit+0xf/0x1c
[ 344.994691] [<ffffffff80257f58>] call_softirq+0x1c/0x28
[ 344.996209] [<ffffffff802610a6>] do_softirq+0x2c/0x7d
[ 344.997383] [<ffffffff80261071>] do_IRQ+0x6a/0x73
[ 344.998472] [<ffffffff8025727d>] ret_from_intr+0x0/0xa
[ 344.999537] <EOI> [<ffffffff8027918c>] vprintk+0x29e/0x2ea
[ 345.000844] [<ffffffff80286a6c>] autoremove_wake_function+0x9/0x2e
[ 345.002332] [<ffffffff80273dbf>] __wake_up_common+0x3e/0x68
[ 345.003612] [<ffffffff8025973a>] __sched_text_start+0x7a/0x769
[ 345.004946] [<ffffffff80279226>] printk+0x4e/0x56
[ 345.006061] [<ffffffff8025973a>] __sched_text_start+0x7a/0x769
[ 345.007403] [<ffffffff8027918c>] vprintk+0x29e/0x2ea
[ 345.008607] [<ffffffff8028e1bc>] kallsyms_lookup+0xe7/0x1af
[ 345.009948] [<ffffffff8025973a>] __sched_text_start+0x7a/0x769
[ 345.011277] [<ffffffff8025f832>] printk_address+0x9f/0xac
[ 345.012519] [<ffffffff80279226>] printk+0x4e/0x56
[ 345.013507] [<ffffffff802f1216>] elv_insert+0xc9/0x192
[ 345.014549] [<ffffffff8025973a>] __sched_text_start+0x7a/0x769
[ 345.015890] [<ffffffff8025fa38>] show_trace+0x1f9/0x21f
[ 345.016965] [<ffffffff802130a8>] sync_buffer+0x0/0x3f
[ 345.018125] [<ffffffff8025fa70>] dump_stack+0x12/0x17
[ 345.019361] [<ffffffff8804a2bf>] :dm_mod:__map_bio+0x47/0x9b
[ 345.020664] [<ffffffff8025973a>] __sched_text_start+0x7a/0x769
[ 345.021999] [<ffffffff8023ab95>] lock_timer_base+0x1b/0x3c
[ 345.023258] [<ffffffff8022f226>] del_timer+0x4e/0x57
[ 345.024442] [<ffffffff802130a8>] sync_buffer+0x0/0x3f
[ 345.025663] [<ffffffff8025a59a>] io_schedule+0x28/0x34
[ 345.026919] [<ffffffff802130e3>] sync_buffer+0x3b/0x3f
[ 345.028132] [<ffffffff8025a8f5>] __wait_on_bit+0x40/0x6f
[ 345.029198] [<ffffffff802130a8>] sync_buffer+0x0/0x3f
[ 345.030400] [<ffffffff8025a990>] out_of_line_wait_on_bit+0x6c/0x78
[ 345.031660] [<ffffffff80286a91>] wake_bit_function+0x0/0x23
[ 345.032977] [<ffffffff80222c9f>] __bread+0x62/0x77
[ 345.034066] [<ffffffff880a1de2>] :ldiskfs:read_block_bitmap+0xa2/0xf0
[ 345.035359] [<ffffffff880a2695>] :ldiskfs:ldiskfs_free_blocks_sb+0x115/0x510
[ 345.036986] [<ffffffff880a2b21>] :ldiskfs:ldiskfs_free_blocks+0x91/0xe0
[ 345.038504] [<ffffffff880a7d1a>] :ldiskfs:ldiskfs_free_data+0x8a/0x110
[ 345.039828] [<ffffffff880a819c>] :ldiskfs:ldiskfs_truncate+0x20c/0x650
[ 345.041133] [<ffffffff802dbeab>] start_this_handle+0x355/0x405
[ 345.042556] [<ffffffff880a8bb4>] :ldiskfs:ldiskfs_delete_inode+0x84/0xf0
[ 345.044197] [<ffffffff880a8b30>] :ldiskfs:ldiskfs_delete_inode+0x0/0xf0
[ 345.045501] [<ffffffff8022c804>] generic_delete_inode+0x8e/0x10b
[ 345.046728] [<ffffffff883ed891>] :mds:mds_obd_destroy+0xa11/0xad0
[ 345.048128] [<ffffffff8022a2d7>] mntput_no_expire+0x19/0x8b
[ 345.049525] [<ffffffff8814961b>] :obdclass:llog_lvfs_close+0x6b/0x130
[ 345.051039] [<ffffffff8814a6c1>] :obdclass:llog_lvfs_destroy+0x841/0xa10
[ 345.052386] [<ffffffff88146a0f>] :obdclass:llog_cat_id2handle+0x4cf/0x5f0
[ 345.053994] [<ffffffff8021557d>] cache_grow+0x2ee/0x343
[ 345.055074] [<ffffffff881509c5>] :obdclass:cat_cancel_cb+0x405/0x630
[ 345.056634] [<ffffffff88146129>] :obdclass:llog_process+0xa09/0xe20
[ 345.058192] [<ffffffff8020c894>] dput+0x23/0x152
[ 345.059280] [<ffffffff881505c0>] :obdclass:cat_cancel_cb+0x0/0x630
[ 345.060717] [<ffffffff881503b3>] :obdclass:llog_obd_origin_setup+0x773/0x980
[ 345.062330] [<ffffffff8027486e>] find_busiest_group+0x20d/0x634
[ 345.063694] [<ffffffff8021819f>] vsnprintf+0x55e/0x5a3
[ 345.064967] [<ffffffff8815137d>] :obdclass:llog_setup+0x78d/0x860
[ 345.066364] [<ffffffff8842da94>] :osc:osc_llog_init+0x104/0x390
[ 345.067748] [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60
[ 345.069099] [<ffffffff8814e979>] :obdclass:obd_llog_init+0x179/0x210
[ 345.070579] [<ffffffff882b92ca>] :lov:lov_llog_init+0x2ca/0x400
[ 345.071958] [<ffffffff8814e979>] :obdclass:obd_llog_init+0x179/0x210
[ 345.073485] [<ffffffff8022a2d7>] mntput_no_expire+0x19/0x8b
[ 345.074837] [<ffffffff883b31ad>] :mds:mds_llog_init+0x1ad/0x270
[ 345.076015] [<ffffffff8029abcb>] map_vm_area+0x229/0x2a8
[ 345.077175] [<ffffffff8814e979>] :obdclass:obd_llog_init+0x179/0x210
[ 345.078448] [<ffffffff8029af5b>] __vmalloc_area_node+0x12b/0x153
[ 345.079650] [<ffffffff8814edc5>] :obdclass:llog_cat_initialize+0x3b5/0x670
[ 345.081268] [<ffffffff882cdc61>] :lov:lov_get_info+0x9f1/0xaa0
[ 345.082616] [<ffffffff8025a990>] out_of_line_wait_on_bit+0x6c/0x78
[ 345.083841] [<ffffffff80286a91>] wake_bit_function+0x0/0x23
[ 345.085059] [<ffffffff883bc5ac>] :mds:mds_lov_update_desc+0xbcc/0xd30
[ 345.086619] [<ffffffff883c0e21>] :mds:mds_lov_connect+0x12c1/0x2020
[ 345.088059] [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60
[ 345.089271] [<ffffffff8815135e>] :obdclass:llog_setup+0x76e/0x860
[ 345.090497] [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60
[ 345.091872] [<ffffffff880f9db8>] :lvfs:upcall_cache_init+0x2f8/0x3a0
[ 345.093153] [<ffffffff883ce381>] :mds:mds_setup+0x10a1/0x1bd0
[ 345.094315] [<ffffffff8021557d>] cache_grow+0x2ee/0x343
[ 345.095371] [<ffffffff802562d7>] cache_alloc_refill+0xde/0x1da
[ 345.096740] [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60
[ 345.098050] [<ffffffff8815a5cd>] :obdclass:class_new_export+0x52d/0x5b0
[ 345.099458] [<ffffffff8816fcdb>] :obdclass:class_setup+0x8bb/0xbe0
[ 345.100697] [<ffffffff8817236a>] :obdclass:class_process_config+0x14ca/0x19f0
[ 345.102340] [<ffffffff881756da>] :obdclass:class_config_llog_handler+0x153a/0x1990
[ 345.104079] [<ffffffff80224869>] do_filp_open+0x2d/0x3d
[ 345.105317] [<ffffffff8814bcfc>] :obdclass:llog_lvfs_next_block+0x2ac/0x710
[ 345.106876] [<ffffffff88146129>] :obdclass:llog_process+0xa09/0xe20
[ 345.108321] [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60
[ 345.109465] [<ffffffff881741a0>] :obdclass:class_config_llog_handler+0x0/0x1990
[ 345.111169] [<ffffffff8817402f>] :obdclass:class_config_parse_llog+0x43f/0x5b0
[ 345.112828] [<ffffffff8020c8a5>] dput+0x34/0x152
[ 345.113868] [<ffffffff880f9052>] :lvfs:lustre_rename+0x482/0x530
[ 345.115157] [<ffffffff88143fea>] :obdclass:llog_close+0x1aa/0x230
[ 345.116668] [<ffffffff8836fe03>] :mgc:mgc_process_log+0x20f3/0x2640
[ 345.117916] [<ffffffff88370b90>] :mgc:mgc_blocking_ast+0x0/0x450
[ 345.119221] [<ffffffff881ddeb0>] :ptlrpc:ldlm_completion_ast+0x0/0x6a0
[ 345.120556] [<ffffffff8836d85c>] :mgc:config_log_find+0x19c/0x340
[ 345.121954] [<ffffffff88373fc2>] :mgc:mgc_process_config+0xe02/0x1280
[ 345.123472] [<ffffffff881795bc>] :obdclass:lustre_process_log+0xb2c/0xee0
[ 345.125033] [<ffffffff88179a40>] :obdclass:server_find_mount+0x80/0x190
[ 345.126421] [<ffffffff8817f7a6>] :obdclass:server_start_targets+0xb36/0x17e0
[ 345.127819] [<ffffffff8022d4ac>] __up_write+0x21/0x10d
[ 345.128871] [<ffffffff88183c27>] :obdclass:server_fill_super+0x18c7/0x1ee0
[ 345.130308] [<ffffffff80208d6d>] __d_lookup+0xb0/0x100
[ 345.131812] [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60
[ 345.132994] [<ffffffff881778bf>] :obdclass:lustre_init_lsi+0x29f/0x660
[ 345.134301] [<ffffffff88184240>] :obdclass:lustre_fill_super+0x0/0x1ae0
[ 345.135680] [<ffffffff88185ba3>] :obdclass:lustre_fill_super+0x1963/0x1ae0
[ 345.137254] [<ffffffff802a95f5>] set_anon_super+0x3c/0xab
[ 345.138372] [<ffffffff802a95b9>] set_anon_super+0x0/0xab
[ 345.139609] [<ffffffff88184240>] :obdclass:lustre_fill_super+0x0/0x1ae0
[ 345.141115] [<ffffffff802a9805>] get_sb_nodev+0x4f/0x97
[ 345.142318] [<ffffffff802a910b>] vfs_kern_mount+0x93/0x11a
[ 345.143573] [<ffffffff802a91d4>] do_kern_mount+0x36/0x4d
[ 345.144754] [<ffffffff802b1982>] do_mount+0x68c/0x6ff
[ 345.145930] [<ffffffff802088d3>] __handle_mm_fault+0x530/0x91a
[ 345.147288] [<ffffffff80218776>] remove_vma+0x55/0x5c
[ 345.148307] [<ffffffff8021f84a>] __up_read+0x13/0x8a
[ 345.149455] [<ffffffff8020a6af>] do_page_fault+0x3d1/0x706
[ 345.150715] [<ffffffff8020c2e4>] do_path_lookup+0x268/0x28c
[ 345.151992] [<ffffffff80297807>] zone_statistics+0x3e/0x6d
[ 345.153145] [<ffffffff8020dcbc>] __alloc_pages+0x5c/0x29b
[ 345.154399] [<ffffffff802472dd>] sys_mount+0x8a/0xd7
[ 345.155550] [<ffffffff80256d82>] system_call+0x7e/0x83
[ 345.156591]
[ 345.156966]
[ 345.156967] Code: 0f 0b 68 aa 6c 3f 80 c2 f0 03 8b 41 10 a8 08 75 2e f0 0f ba
[ 345.159822] RIP [<ffffffff80274371>] resched_task+0x24/0x65
[ 345.161214] RSP <ffffffff804ccdc0>
[ 345.161948] <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 345.163565]
_______________________________________________
Lustre-discuss mailing list
http://lists.lustre.org/mailman/listinfo/lustre-discuss
