Re: Soft lockup in unloading kernel modules
Klemens,

Can you add more details to the bug report on how you unloaded the modules (step by step)?

Thanks
Shirley

On 05/19/2014 10:51 AM, Chuck Lever wrote:
> Hi Klemens-
>
> On May 13, 2014, at 12:48 PM, Klemens Senn wrote:
>
>> Hi Anna,
>>
>> today I retried unloading the kernel modules with your updated kernel
>> and additionally I tried the nfsd-next kernel from J. Bruce Fields and
>> Chuck's nfs-rdma-client kernel.
>
> I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=252 to track this issue.
>
>> In short: None of these was able to unload the kernel modules with an
>> active connection.
>> [...]
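The step-by-step details Shirley asks for can be captured as a small script. The hypothetical sketch below follows the reproduction steps from the original report, which is quoted at the end of this thread. It ends with `modprobe -r mlx4_ib`, the removal that the traces show faulting (`mlx4_ib(-)`, `mlx4_ib_cleanup` -> `cma_remove_one`). Nobody posted this script to the thread. The following are assumptions:

- the openSUSE service name `nfsserver`;
- unloading `mlx4_ib` directly instead of through the openibd init script;
- the `DRY_RUN` switch.

By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical reproduction helper, split by host role.
# ASSUMPTIONS (not from the thread): service name "nfsserver" and removing
# mlx4_ib directly with modprobe instead of the openibd init script.
# DRY_RUN defaults to 1 (print only); use DRY_RUN=0 to execute, as root.

SERVER=172.16.100.19   # server address taken from the report's mount command
MNT=/mnt

run() {
    # Always echo the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    fi
}

server_start() {
    run systemctl start nfsserver
}

client_steps() {
    # Mount the NFSv4.0 export over RDMA, then list it once.
    run mount -t nfs -o port=20049,rdma,vers=4.0,timeo=900 "$SERVER:/" "$MNT"
    run ls "$MNT"
}

server_teardown() {
    # Stop nfsd while the client mount is still active...
    run systemctl stop nfsserver
    # ...then remove the HCA driver; this is the modprobe that faults
    # in the traces (mlx4_ib_cleanup -> cma_remove_one).
    run modprobe -r mlx4_ib
    run dmesg
}

case "${1:-}" in
    server-start)    server_start ;;
    client)          client_steps ;;
    server-teardown) server_teardown ;;
    *) echo "usage: $0 server-start|client|server-teardown" >&2 ;;
esac
```

Run `server-start` on the server, `client` on the client, then `server-teardown` on the server. The printed command list can be pasted into the bug report as-is.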
Re: Soft lockup in unloading kernel modules
Hi Klemens-

On May 13, 2014, at 12:48 PM, Klemens Senn wrote:

> Hi Anna,
>
> today I retried unloading the kernel modules with your updated kernel
> and additionally I tried the nfsd-next kernel from J. Bruce Fields and
> Chuck's nfs-rdma-client kernel.

I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=252 to track this issue.

> In short: None of these was able to unload the kernel modules with an
> active connection.
> [...]
Re: Soft lockup in unloading kernel modules
Hi Anna,

today I retried unloading the kernel modules with your updated kernel
and additionally I tried the nfsd-next kernel from J. Bruce Fields and
Chuck's nfs-rdma-client kernel.

In short: None of these was able to unload the kernel modules with an
active connection.

In detail:

With your kernel I got the following 3 faults:
o BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:4615]
o BUG: unable to handle kernel NULL pointer dereference at 0003
o BUG: unable to handle kernel paging request at 5b8c

With the nfsd-next kernel I got the following results:
o BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:4452]
o module unloading blocks forever, dmesg shows:
   nfsd: last server has exited, flushing export cache
   waiting module removal not supported: please upgrade
o Kernel keeps running but reports the following:
   nfsd: last server has exited, flushing export cache
   waiting module removal not supported: please upgrade
   svc_xprt_enqueue: threads and transports both waiting??
   INFO: task modprobe:4510 blocked for more than 480 seconds.
         Not tainted 3.15.0-rc1-bfields-master+ #1
   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
   modprobe        D 88087fc13440 0 4510 4458 0x
    88105bb23c58 0086 88105c14e690 00013440
    88105bb23fd8 00013440 81a14480 88105c14e690
    0037 88085d7f74d8 88085d7f74e0 7fff
   Call Trace:
    [] schedule+0x24/0x70
    [] schedule_timeout+0x1ec/0x260
    [] ? printk+0x5c/0x5e
    [] wait_for_completion+0x96/0x100
    [] ? try_to_wake_up+0x2b0/0x2b0
    [] cma_remove_one+0x1a9/0x220 [rdma_cm]
    [] ib_unregister_device+0x46/0x120 [ib_core]
    [] mlx4_ib_remove+0x29/0x260 [mlx4_ib]
    [] mlx4_remove_device+0xa0/0xc0 [mlx4_core]
    [] mlx4_unregister_interface+0x3b/0xa0 [mlx4_core]
    [] mlx4_ib_cleanup+0x10/0x23 [mlx4_ib]
    [] SyS_delete_module+0x152/0x220
    [] ? vm_munmap+0x54/0x70
    [] system_call_fastpath+0x1a/0x1f

With the nfs-rdma-client I got the following results:
o module unloading blocks forever, dmesg shows:
   nfsd: last server has exited, flushing export cache
   svc_xprt_enqueue: threads and transports both waiting??
o BUG: unable to handle kernel paging request at 4dec
   IP: [] _raw_spin_lock_bh+0x15/0x40
   PGD 107ba9a067 PUD 105c093067 PMD 0
   Oops: 0002 [#1] SMP
   Modules linked in: nfsd nfs_acl auth_rpcgss oid_registry svcrdma
    dm_mod cpuid nfs fscache lockd sunrpc af_packet 8021q garp stp llc
    rdma_ucm ib_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en
    mlx4_ib(-) ib_sa ib_mad ib_core ib_addr sr_mod cdrom usb_storage joydev
    mlx4_core usbhid x86_pkg_temp_thermal coretemp kvm_intel kvm
    ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul
    glue_helper ehci_pci aes_x86_64 ehci_hcd isci iTCO_wdt libsas pcspkr
    iTCO_vendor_support igb i2c_algo_bit sb_edac lpc_ich edac_core ioatdma
    usbcore tpm_tis ptp microcode i2c_i801 sg mfd_core scsi_transport_sas
    ipmi_si usb_common tpm wmi pps_core dca ipmi_msghandler acpi_cpufreq
    button edd autofs4 xfs libcrc32c crc32c_intel processor thermal_sys
    scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh
   CPU: 14 PID: 4813 Comm: modprobe Not tainted 3.15.0-rc5-cel-nfs-rdma-client-unpatched+ #2
   Hardware name: Supermicro B9DRG-E/B9DRG-E, BIOS 3.0 09/04/2013
   task: 88085bf96190 ti: 88085d42a000 task.ti: 88085d42a000
   RIP: 0010:[] [] _raw_spin_lock_bh+0x15/0x40
   RSP: 0018:88085d42bd18 EFLAGS: 00010286
   RAX: 0001 RBX: 4de8 RCX:
   RDX: 000b RSI: 000e RDI: 4dec
   RBP: 88085d42bd18 R08: 88087c611f38 R09: a140
   R10: 002b R11: R12: 88085dcc3c00
   R13: 88105ca13280 R14: 4dec R15: 4df0
   FS: 7f0e49fb5700() GS:88107fcc() knlGS:
   CS: 0010 DS: ES: CR0: 80050033
   CR2: 4dec CR3: 00105b027000 CR4: 000407e0
   Stack:
    88085d42bd58 a03bd9f0 01328b88 88085dcc3c00
    88085dce8000 88105ca13280 88085dce8260 88085dce81c8
    88085d42bd78 a0441ce9 88085dce8000 88105ca13240
   Call Trace:
    [] svc_xprt_enqueue+0x50/0x220 [sunrpc]
    [] rdma_cma_handler+0x69/0x180 [svcrdma]
    [] cma_remove_one+0x1f6/0x220 [rdma_cm]
    [] ib_unregister_device+0x46/0x120 [ib_core]
    [] mlx4_ib_remove+0x29/0x260 [mlx4_ib]
    [] mlx4_remove_device+0xa0/0xc0 [mlx4_core]
    [] mlx4_unregister_interface+0x3b/0xa0 [mlx4_core]
    [] mlx4_ib_cleanup+0x10/0x23 [mlx4_ib]
    [] SyS_delete_module+0x170/0x1f0
    [] ? vm_munmap+0x54/0x70
    [] system_call_fastpath+0x1a/0x1f
   Code: 5d c3 0f b7 17 66 39 ca 74 f6 f3 90 0f b7 17 66 39 d1 75 f6 5d c3 55 65 81
Re: Soft lockup in unloading kernel modules
I haven't applied Chuck's recent (v3) patches to that kernel yet (I've been waiting to see if people have comments). I'll try to push something out today.

On 05/08/2014 10:28 AM, Senn Klemens wrote:
> Hi,
>
> I am getting a soft lockup on the NFS server when it reboots if at least
> one client mount is established. I am using OpenSUSE 12.3 with the
> nfs-rdma kernel from Anna Schumaker
> (git://git.linux-nfs.org/projects/anna/nfs-rdma.git).
>
> The export on the server side is done with
> /data *(fsid=0,crossmnt,rw,mp,no_root_squash,sync,no_subtree_check,insecure)
>
> The following command is used for mounting the NFSv4 share:
> mount -t nfs -o port=20049,rdma,vers=4.0,timeo=900 172.16.100.19:/ /mnt
>
> The HCA is a Mellanox MT4099 on the server and the client.
>
> The soft lockup can be reproduced by the following steps:
> o server: Start the nfs server
> o client: Mount the share
> o client: Run "ls" in the mounted directory
> o server: Stop the nfs server
> o server: Unload the nfs and mlx4 modules or reboot the server (I used
>   the openibd init script from the Mellanox driver without having the
>   Mellanox stack installed)
>
> Most of the time the server reports a soft lockup:
> BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:6146]
>
> Sometimes I get the following kernel panic:
> BUG: unable to handle kernel NULL pointer dereference at 0003
> IP: [] _raw_spin_lock_bh+0x15/0x40
> PGD 82a820067 PUD 857832067 PMD 0
> Oops: 0002 [#1] SMP
> Modules linked in: nfsd nfs_acl auth_rpcgss oid_registry nfnetlink_log
> nfnetlink bluetooth rfkill nfsv4 svcrdma dm_mod cpuid nfs fscache lockd
> sunrpc af_packet 8021q garp stp llc rdma_ucm ib_ucm rdma_cm iw_cm
> ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en mlx4_ib(-) ib_sa ib_mad ib_core
> ib_addr sr_mod cdrom usb_storage joydev mlx4_core usbhid
> x86_pkg_temp_thermal coretemp kvm_intel kvm ghash_clmulni_intel
> aesni_intel ablk_helper cryptd iTCO_wdt lrw igb gf128mul
> iTCO_vendor_support ehci_pci glue_helper pcspkr i2c_algo_bit isci
> ehci_hcd aes_x86_64 ptp libsas ioatdma lpc_ich microcode sb_edac sg
> pps_core usbcore ipmi_si tpm_tis edac_core scsi_transport_sas i2c_i801
> mfd_core dca usb_common tpm ipmi_msghandler wmi acpi_cpufreq button edd
> autofs4 xfs libcrc32c crc32c_intel processor thermal_sys scsi_dh_rdac
> scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh [last unloaded: oid_registry]
> CPU: 0 PID: 6603 Comm: modprobe Not tainted 3.15.0-rc2-anna-nfs-rdma+ #3
> Hardware name: Supermicro B9DRG-E/B9DRG-E, BIOS 3.0 09/04/2013
> task: 88105b8c6050 ti: 88105d814000 task.ti: 88105d814000
> RIP: 0010:[] [] _raw_spin_lock_bh+0x15/0x40
> RSP: 0018:88105d815d18 EFLAGS: 00010286
> RAX: 0001 RBX: RCX:
> RDX: 000b RSI: RDI: 0003
> RBP: 88105d815d18 R08: 88087c611f38 R09: 0001
> R10: R11: R12: 88087c3c9800
> R13: 88107b82ab00 R14: 0003 R15: 0007
> FS: 7fef64612700() GS:88087fc0() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: 0003 CR3: 00087c2c7000 CR4: 000407f0
> Stack:
>  88105d815d58 a05199f0 88105d815d88 88087c3c9800
>  88087c3c9400 88107b82ab00 88087c3c9660 88087c3c95c8
>  88105d815d78 a0421ce9 88087c3c9400 88107b82aac0
> Call Trace:
>  [] svc_xprt_enqueue+0x50/0x220 [sunrpc]
>  [] rdma_cma_handler+0x69/0x180 [svcrdma]
>  [] cma_remove_one+0x1f6/0x220 [rdma_cm]
>  [] ib_unregister_device+0x46/0x120 [ib_core]
>  [] mlx4_ib_remove+0x29/0x260 [mlx4_ib]
>  [] mlx4_remove_device+0xa0/0xc0 [mlx4_core]
>  [] mlx4_unregister_interface+0x3b/0xa0 [mlx4_core]
>  [] mlx4_ib_cleanup+0x10/0x23 [mlx4_ib]
>  [] SyS_delete_module+0x152/0x220
>  [] ? vm_munmap+0x54/0x70
>  [] system_call_fastpath+0x1a/0x1f
> Code: 5d c3 0f b7 17 66 39 ca 74 f6 f3 90 0f b7 17 66 39 d1 75 f6 5d c3
> 55 65 81 04 25 20 b9 00 00 00 02 00 00 48 89 e5 b8 00 00 01 00 0f
> c1 07 89 c2 c1 ea 10 66 39 c2 75 04 5d c3 f3 90 0f b7 07
> RIP  [] _raw_spin_lock_bh+0x15/0x40
>  RSP
> CR2: 0003
> ---[ end trace 18e02ff413ac4b9b ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> Kernel Offset: 0x0 from 0x8100 (relocation range: 0x8000-0x9fff)
> ---[ end Kernel panic - not syncing: Fatal exception in interrupt
>
> Kind regards,
> Klemens
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html