On 2010-09-09, at 10:56, Nirmal Seenu wrote:

> I just upgraded my lustre version from 1.8.1.1 to 1.8.4 and I can't reboot my
> lustre clients cleanly anymore. I am using the latest RHEL kernel and
> the openibd that comes part of that RHEL kernel + patchless lustre client
> installed from the tar ball.
>
> The lustre client gets unmounted cleanly but the system deadlocks once the
> openibd driver is removed. I had to modify the openibd stop script to
> include "umount lustre" and "lustre_rmmod" as a work around.

If you put "_netdev" in the lustre mount options, the shutdown scripts should unmount it before trying to stop the networking.

> The following is the error message that I get when I try to reboot the lustre
> client:
>
> Scientific Linux SLF release 5.3 (Lederman)
> Kernel 2.6.18-194.11.1.el5 on an x86_64
>
> INIT:Shutting down smartd: [ OK ]
> Stopping atd: [ OK ]
> Shutting down process accounting: [ OK ]
> Stopping xinetd: [ OK ]
> Stopping autofs: Stopping automount: [ OK ]
> [ OK ]
> Stopping acpi daemon: [ OK ]
> Shutting down ntpd: [ OK ]
> Unmounting network block filesystems: LustreError:
> 3697:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -108 from cancel
> RPC: canceling anyway
> LustreError: 3697:0:(ldlm_request.c:1587:ldlm_cli_cancel_list())
> ldlm_cli_cancel_list: -108
> Lustre: client ffff81020f145400 umount complete
> [ OK ]
> Unmounting NFS filesystems: [ OK ]
> Stopping system message bus: [ OK ]
> Stopping RPC idmapd: [ OK ]
> Stopping NFS locking: [ OK ]
> Stopping NFS statd: [ OK ]
> Stopping portmap: [ OK ]
> Stopping PC/SC smart card daemon (pcscd): [ OK ]
> Shutting down kernel logger: [ OK ]
> Shutting down system logger: [ OK ]
> Unloading OpenIB kernel modules:NET: Unregistered protocol family 27
>
> Failed to unload rdma_cm
>
> Failed to unload ib_cm
>
> Failed to unload iw_cm
> LustreError: 131-3: Received notification of device removal
> Please shutdown LNET to allow this to proceed
> INFO: task rmmod:4151 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> rmmod D ffff810227061420 0 4151 3795
> (NOTLB)
> ffff81021c8ddce8 0000000000000082 000000000000000f 0000000000000292
> 00000000000000ef 0000000000000001 ffff81020ecdd100 ffff8102271ef040
> 0000004a957c4bd9 000000000095dc57 ffff81020ecdd2e8 0000000480076646
> Call Trace:
> [<ffffffff80063167>] wait_for_completion+0x79/0xa2
> [<ffffffff8008cfa1>] default_wake_function+0x0/0xe
> [<ffffffff80063b05>] mutex_lock+0xd/0x1d
> [<ffffffff8838d155>] :rdma_cm:cma_remove_one+0x171/0x1a2
> [<ffffffff80076525>] do_flush_tlb_all+0x0/0x6a
> [<ffffffff8817d5f0>] :ib_core:ib_unregister_device+0x30/0xdb
> [<ffffffff881a918a>] :ib_mthca:__mthca_remove_one+0x30/0x11a
> [<ffffffff80063b05>] mutex_lock+0xd/0x1d
> [<ffffffff881a928c>] :ib_mthca:mthca_remove_one+0x18/0x25
> [<ffffffff8015daeb>] pci_device_remove+0x24/0x3a
> [<ffffffff801c7a3e>] __device_release_driver+0x9f/0xe9
> [<ffffffff801c7e04>] driver_detach+0xad/0x101
> [<ffffffff801c6ffe>] bus_remove_driver+0x6f/0x92
> [<ffffffff801c7e8b>] driver_unregister+0xd/0x16
> [<ffffffff8015ddb4>] pci_unregister_driver+0x2a/0x79
> [<ffffffff881bc398>] :ib_mthca:mthca_cleanup+0x10/0x16
> [<ffffffff800a6674>] sys_delete_module+0x196/0x1c5
> [<ffffffff8005d116>] system_call+0x7e/0x83
>
>
> Nirmal
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
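
The "_netdev" suggestion only helps if every Lustre entry in /etc/fstab actually carries that option. As a minimal sketch, assuming the standard whitespace-separated fstab layout (device, mount point, type, options, dump, pass) and no "#" characters inside fields, the following flags Lustre mounts that are still missing it. The function name `lustre_mounts_missing_netdev` and the sample entries (the `mgs@o2ib:/...` sources and mount points) are hypothetical, for illustration only.

```python
from typing import List


def lustre_mounts_missing_netdev(fstab_text: str) -> List[str]:
    """Return mount points of lustre fstab entries whose options lack _netdev.

    Assumes the usual fstab layout: device, mount point, fs type, options,
    then optional dump/pass fields, separated by whitespace.
    """
    missing = []
    for raw in fstab_text.splitlines():
        # Drop trailing comments; assumes no '#' appears inside a field.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        fields = line.split()
        if len(fields) < 4 or fields[2] != "lustre":
            continue  # not a lustre entry, or no options field to inspect
        options = fields[3].split(",")
        if "_netdev" not in options:
            missing.append(fields[1])
    return missing


# Hypothetical fstab contents; the MGS NIDs and mount points are made up.
SAMPLE_FSTAB = """\
# device               mount point  type    options           dump pass
/dev/sda1              /            ext3    defaults          1 1
mgs@o2ib:/lustre       /mnt/lustre  lustre  defaults,_netdev  0 0
mgs@o2ib:/scratch      /mnt/scratch lustre  defaults          0 0
"""

if __name__ == "__main__":
    for mount_point in lustre_mounts_missing_netdev(SAMPLE_FSTAB):
        print(f"{mount_point}: add _netdev to the mount options")
```

With the sample above, only /mnt/scratch is reported, since /mnt/lustre already lists _netdev and /dev/sda1 is not a Lustre mount.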
