Re: Soft lockup in unloading kernel modules

2014-05-19 Thread Shirley Ma

Klemens,

Can you add more details on how you unload the modules (step by step)
in the bug report?
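
One possible shape for such a step-by-step record is sketched below. It is a dry run that only prints the sequence and executes nothing. The mount and ls commands are taken from the original report in this thread. The systemctl service name and the plain "modprobe -r mlx4_ib" call are assumptions made for illustration, since the report only says the server was started and stopped and that the openibd init script was used; Step, STEPS and render are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    host: str              # "server" or "client"
    command: str           # shell command to run on that host
    assumed: bool = False  # True when the thread does not give the exact command


STEPS = [
    # ASSUMPTION: service name; the report only says "Start the nfs server".
    Step("server", "systemctl start nfsserver", assumed=True),
    # Mount command exactly as quoted in the original report.
    Step("client", "mount -t nfs -o port=20049,rdma,vers=4.0,timeo=900 "
                   "172.16.100.19:/ /mnt"),
    Step("client", "ls /mnt"),
    # ASSUMPTION: service name, as above.
    Step("server", "systemctl stop nfsserver", assumed=True),
    # ASSUMPTION: the report used the openibd init script; the traces only
    # show modprobe removing mlx4_ib.
    Step("server", "modprobe -r mlx4_ib", assumed=True),
]


def render(steps):
    """Return the steps as a numbered, host-tagged listing (nothing is executed)."""
    lines = []
    for number, step in enumerate(steps, start=1):
        marker = "  (assumed)" if step.assumed else ""
        lines.append(f"{number}. [{step.host}] {step.command}{marker}")
    return "\n".join(lines)


print(render(STEPS))
```

Replacing the assumed entries with the exact commands that were run, in the order they were run, would give the level of detail asked for here.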


Thanks
Shirley

On 05/19/2014 10:51 AM, Chuck Lever wrote:

> Hi Klemens-
>
> On May 13, 2014, at 12:48 PM, Klemens Senn  wrote:
>
> > Hi Anna,
> >
> > today I retried unloading the kernel modules with your updated kernel
> > and additionally I tried the nfsd-next kernel from J. Bruce Fields and
> > Chuck's nfs-rdma-client kernel.
>
> I filed
>
>    https://bugzilla.linux-nfs.org/show_bug.cgi?id=252
>
> to track this issue.
>
> > In short: None of these was able to unload the kernel modules with an
> > active connection.
> >
> > In detail:
> >
> > With your kernel I got following 3 faults:
> >   o BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:4615]
> >   o BUG: unable to handle kernel NULL pointer dereference at
> > 0003
> >   o BUG: unable to handle kernel paging request at 5b8c
> >
> > With the nfsd-next kernel I got following results:
> >   o BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:4452]
> >   o module unloading blocks forever, dmesg shows:
> > nfsd: last server has exited, flushing export cache
> > waiting module removal not supported: please upgrade
> >   o Kernel keeps running but reports the following:
> > nfsd: last server has exited, flushing export cache
> > waiting module removal not supported: please upgrade
> > svc_xprt_enqueue: threads and transports both waiting??
> > INFO: task modprobe:4510 blocked for more than 480 seconds.
> >   Not tainted 3.15.0-rc1-bfields-master+ #1
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> > message.
> > modprobeD 88087fc13440 0  4510   4458 0x
> >  88105bb23c58 0086 88105c14e690 00013440
> >  88105bb23fd8 00013440 81a14480 88105c14e690
> >  0037 88085d7f74d8 88085d7f74e0 7fff
> > Call Trace:
> >  [] schedule+0x24/0x70
> >  [] schedule_timeout+0x1ec/0x260
> >  [] ? printk+0x5c/0x5e
> >  [] wait_for_completion+0x96/0x100
> >  [] ? try_to_wake_up+0x2b0/0x2b0
> >  [] cma_remove_one+0x1a9/0x220 [rdma_cm]
> >  [] ib_unregister_device+0x46/0x120 [ib_core]
> >  [] mlx4_ib_remove+0x29/0x260 [mlx4_ib]
> >  [] mlx4_remove_device+0xa0/0xc0 [mlx4_core]
> >  [] mlx4_unregister_interface+0x3b/0xa0 [mlx4_core]
> >  [] mlx4_ib_cleanup+0x10/0x23 [mlx4_ib]
> >  [] SyS_delete_module+0x152/0x220
> >  [] ? vm_munmap+0x54/0x70
> >  [] system_call_fastpath+0x1a/0x1f
> >
> > With the nfs-rdma-client I got following results:
> >   o module unloading blocks forever, dmesg shows:
> > nfsd: last server has exited, flushing export cache
> > svc_xprt_enqueue: threads and transports both waiting??
> >   o BUG: unable to handle kernel paging request at 4dec
> > IP: [] _raw_spin_lock_bh+0x15/0x40
> > PGD 107ba9a067 PUD 105c093067 PMD 0
> > Oops: 0002 [#1] SMP
> > Modules linked in: nfsd nfs_acl auth_rpcgss oid_registry svcrdma
> > dm_mod cpuid nfs fscache lockd sunrpc af_packet 8021q garp stp llc
> > rdma_ucm ib_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en
> > mlx4_ib(-) ib_sa ib_mad ib_core ib_addr sr_mod cdrom usb_storage joydev
> > mlx4_core usbhid x86_pkg_temp_thermal coretemp kvm_intel kvm
> > ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul
> > glue_helper ehci_pci aes_x86_64 ehci_hcd isci iTCO_wdt libsas pcspkr
> > iTCO_vendor_support igb i2c_algo_bit sb_edac lpc_ich edac_core ioatdma
> > usbcore tpm_tis ptp microcode i2c_i801 sg mfd_core scsi_transport_sas
> > ipmi_si usb_common tpm wmi pps_core dca ipmi_msghandler acpi_cpufreq
> > button edd autofs4 xfs libcrc32c crc32c_intel processor thermal_sys
> > scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh
> > CPU: 14 PID: 4813 Comm: modprobe Not tainted
> > 3.15.0-rc5-cel-nfs-rdma-client-unpatched+ #2
> > Hardware name: Supermicro B9DRG-E/B9DRG-E, BIOS 3.0 09/04/2013
> > task: 88085bf96190 ti: 88085d42a000 task.ti: 88085d42a000
> > RIP: 0010:[]  []
> > _raw_spin_lock_bh+0x15/0x40
> > RSP: 0018:88085d42bd18  EFLAGS: 00010286
> > RAX: 0001 RBX: 4de8 RCX: 
> > RDX: 000b RSI: 000e RDI: 4dec
> > RBP: 88085d42bd18 R08: 88087c611f38 R09: a140
> > R10: 002b R11:  R12: 88085dcc3c00
> > R13: 88105ca13280 R14: 4dec R15: 4df0
> > FS:  7f0e49fb5700() GS:88107fcc()
> > knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: 4dec CR3: 00105b027000 CR4: 000407e0
> > Stack:
> >  88085d42bd58 a03bd9f0 01328b88 88085dcc3c00
> >  88085dce8000 88105ca13280 88085dce8260 88085dce81c8
> >  88085d42bd78 a0441ce9 88085dce8000 88105ca13240
> > Call Trace:
> >  [] svc_xprt_enqueue+0x50/0x220 [sunrpc]
> >  [] rdma_cma_handler+0x69/0x180 [svcrdma]
> >  [] cma_remove_one+0x1f6/0x220 [rdma_cm]
> >  [] ib_unregister_device+0x46/0x120 [ib_core]
> >  [] mlx4_ib_remove+0x29/0x260 [mlx4_ib]
> >  [] mlx4_remove_d

Re: Soft lockup in unloading kernel modules

2014-05-19 Thread Chuck Lever
Hi Klemens-

On May 13, 2014, at 12:48 PM, Klemens Senn  wrote:

> Hi Anna,
> 
> today I retried unloading the kernel modules with your updated kernel
> and additionally I tried the nfsd-next kernel from J. Bruce Fields and
> Chuck's nfs-rdma-client kernel.

I filed

  https://bugzilla.linux-nfs.org/show_bug.cgi?id=252

to track this issue.


> In short: None of these was able to unload the kernel modules with an
> active connection.
> 
> In detail:
> 
> With your kernel I got following 3 faults:
>  o BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:4615]
>  o BUG: unable to handle kernel NULL pointer dereference at
> 0003
>  o BUG: unable to handle kernel paging request at 5b8c
> 
> With the nfsd-next kernel I got following results:
>  o BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:4452]
>  o module unloading blocks forever, dmesg shows:
>nfsd: last server has exited, flushing export cache
>waiting module removal not supported: please upgrade
>  o Kernel keeps running but reports the following:
>nfsd: last server has exited, flushing export cache
>waiting module removal not supported: please upgrade
>svc_xprt_enqueue: threads and transports both waiting??
>INFO: task modprobe:4510 blocked for more than 480 seconds.
>  Not tainted 3.15.0-rc1-bfields-master+ #1
>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
>modprobeD 88087fc13440 0  4510   4458 0x
> 88105bb23c58 0086 88105c14e690 00013440
> 88105bb23fd8 00013440 81a14480 88105c14e690
> 0037 88085d7f74d8 88085d7f74e0 7fff
>Call Trace:
> [] schedule+0x24/0x70
> [] schedule_timeout+0x1ec/0x260
> [] ? printk+0x5c/0x5e
> [] wait_for_completion+0x96/0x100
> [] ? try_to_wake_up+0x2b0/0x2b0
> [] cma_remove_one+0x1a9/0x220 [rdma_cm]
> [] ib_unregister_device+0x46/0x120 [ib_core]
> [] mlx4_ib_remove+0x29/0x260 [mlx4_ib]
> [] mlx4_remove_device+0xa0/0xc0 [mlx4_core]
> [] mlx4_unregister_interface+0x3b/0xa0 [mlx4_core]
> [] mlx4_ib_cleanup+0x10/0x23 [mlx4_ib]
> [] SyS_delete_module+0x152/0x220
> [] ? vm_munmap+0x54/0x70
> [] system_call_fastpath+0x1a/0x1f
> 
> With the nfs-rdma-client I got following results:
>  o module unloading blocks forever, dmesg shows:
>nfsd: last server has exited, flushing export cache
>svc_xprt_enqueue: threads and transports both waiting??
>  o BUG: unable to handle kernel paging request at 4dec
>IP: [] _raw_spin_lock_bh+0x15/0x40
>PGD 107ba9a067 PUD 105c093067 PMD 0
>Oops: 0002 [#1] SMP
>Modules linked in: nfsd nfs_acl auth_rpcgss oid_registry svcrdma
> dm_mod cpuid nfs fscache lockd sunrpc af_packet 8021q garp stp llc
> rdma_ucm ib_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en
> mlx4_ib(-) ib_sa ib_mad ib_core ib_addr sr_mod cdrom usb_storage joydev
> mlx4_core usbhid x86_pkg_temp_thermal coretemp kvm_intel kvm
> ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul
> glue_helper ehci_pci aes_x86_64 ehci_hcd isci iTCO_wdt libsas pcspkr
> iTCO_vendor_support igb i2c_algo_bit sb_edac lpc_ich edac_core ioatdma
> usbcore tpm_tis ptp microcode i2c_i801 sg mfd_core scsi_transport_sas
> ipmi_si usb_common tpm wmi pps_core dca ipmi_msghandler acpi_cpufreq
> button edd autofs4 xfs libcrc32c crc32c_intel processor thermal_sys
> scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh
>CPU: 14 PID: 4813 Comm: modprobe Not tainted
> 3.15.0-rc5-cel-nfs-rdma-client-unpatched+ #2
>Hardware name: Supermicro B9DRG-E/B9DRG-E, BIOS 3.0 09/04/2013
>task: 88085bf96190 ti: 88085d42a000 task.ti: 88085d42a000
>RIP: 0010:[]  []
> _raw_spin_lock_bh+0x15/0x40
>RSP: 0018:88085d42bd18  EFLAGS: 00010286
>RAX: 0001 RBX: 4de8 RCX: 
>RDX: 000b RSI: 000e RDI: 4dec
>RBP: 88085d42bd18 R08: 88087c611f38 R09: a140
>R10: 002b R11:  R12: 88085dcc3c00
>R13: 88105ca13280 R14: 4dec R15: 4df0
>FS:  7f0e49fb5700() GS:88107fcc()
> knlGS:
>CS:  0010 DS:  ES:  CR0: 80050033
>CR2: 4dec CR3: 00105b027000 CR4: 000407e0
>Stack:
> 88085d42bd58 a03bd9f0 01328b88 88085dcc3c00
> 88085dce8000 88105ca13280 88085dce8260 88085dce81c8
> 88085d42bd78 a0441ce9 88085dce8000 88105ca13240
>Call Trace:
> [] svc_xprt_enqueue+0x50/0x220 [sunrpc]
> [] rdma_cma_handler+0x69/0x180 [svcrdma]
> [] cma_remove_one+0x1f6/0x220 [rdma_cm]
> [] ib_unregister_device+0x46/0x120 [ib_core]
> [] mlx4_ib_remove+0x29/0x260 [mlx4_ib]
> [] mlx4_remove_device+0xa0/0xc0 [mlx4_core]
> [] mlx4_

Re: Soft lockup in unloading kernel modules

2014-05-13 Thread Klemens Senn
Hi Anna,

today I retried unloading the kernel modules with your updated kernel
and additionally I tried the nfsd-next kernel from J. Bruce Fields and
Chuck's nfs-rdma-client kernel.

In short: None of these was able to unload the kernel modules with an
active connection.

In detail:

With your kernel I got the following 3 faults:
  o BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:4615]
  o BUG: unable to handle kernel NULL pointer dereference at
0003
  o BUG: unable to handle kernel paging request at 5b8c

With the nfsd-next kernel I got the following results:
  o BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:4452]
  o module unloading blocks forever, dmesg shows:
nfsd: last server has exited, flushing export cache
waiting module removal not supported: please upgrade
  o Kernel keeps running but reports the following:
nfsd: last server has exited, flushing export cache
waiting module removal not supported: please upgrade
svc_xprt_enqueue: threads and transports both waiting??
INFO: task modprobe:4510 blocked for more than 480 seconds.
  Not tainted 3.15.0-rc1-bfields-master+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
modprobeD 88087fc13440 0  4510   4458 0x
 88105bb23c58 0086 88105c14e690 00013440
 88105bb23fd8 00013440 81a14480 88105c14e690
 0037 88085d7f74d8 88085d7f74e0 7fff
Call Trace:
 [] schedule+0x24/0x70
 [] schedule_timeout+0x1ec/0x260
 [] ? printk+0x5c/0x5e
 [] wait_for_completion+0x96/0x100
 [] ? try_to_wake_up+0x2b0/0x2b0
 [] cma_remove_one+0x1a9/0x220 [rdma_cm]
 [] ib_unregister_device+0x46/0x120 [ib_core]
 [] mlx4_ib_remove+0x29/0x260 [mlx4_ib]
 [] mlx4_remove_device+0xa0/0xc0 [mlx4_core]
 [] mlx4_unregister_interface+0x3b/0xa0 [mlx4_core]
 [] mlx4_ib_cleanup+0x10/0x23 [mlx4_ib]
 [] SyS_delete_module+0x152/0x220
 [] ? vm_munmap+0x54/0x70
 [] system_call_fastpath+0x1a/0x1f

With the nfs-rdma-client kernel I got the following results:
  o module unloading blocks forever, dmesg shows:
nfsd: last server has exited, flushing export cache
svc_xprt_enqueue: threads and transports both waiting??
  o BUG: unable to handle kernel paging request at 4dec
IP: [] _raw_spin_lock_bh+0x15/0x40
PGD 107ba9a067 PUD 105c093067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: nfsd nfs_acl auth_rpcgss oid_registry svcrdma
dm_mod cpuid nfs fscache lockd sunrpc af_packet 8021q garp stp llc
rdma_ucm ib_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en
mlx4_ib(-) ib_sa ib_mad ib_core ib_addr sr_mod cdrom usb_storage joydev
mlx4_core usbhid x86_pkg_temp_thermal coretemp kvm_intel kvm
ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul
glue_helper ehci_pci aes_x86_64 ehci_hcd isci iTCO_wdt libsas pcspkr
iTCO_vendor_support igb i2c_algo_bit sb_edac lpc_ich edac_core ioatdma
usbcore tpm_tis ptp microcode i2c_i801 sg mfd_core scsi_transport_sas
ipmi_si usb_common tpm wmi pps_core dca ipmi_msghandler acpi_cpufreq
button edd autofs4 xfs libcrc32c crc32c_intel processor thermal_sys
scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh
CPU: 14 PID: 4813 Comm: modprobe Not tainted
3.15.0-rc5-cel-nfs-rdma-client-unpatched+ #2
Hardware name: Supermicro B9DRG-E/B9DRG-E, BIOS 3.0 09/04/2013
task: 88085bf96190 ti: 88085d42a000 task.ti: 88085d42a000
RIP: 0010:[]  []
_raw_spin_lock_bh+0x15/0x40
RSP: 0018:88085d42bd18  EFLAGS: 00010286
RAX: 0001 RBX: 4de8 RCX: 
RDX: 000b RSI: 000e RDI: 4dec
RBP: 88085d42bd18 R08: 88087c611f38 R09: a140
R10: 002b R11:  R12: 88085dcc3c00
R13: 88105ca13280 R14: 4dec R15: 4df0
FS:  7f0e49fb5700() GS:88107fcc()
knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 4dec CR3: 00105b027000 CR4: 000407e0
Stack:
 88085d42bd58 a03bd9f0 01328b88 88085dcc3c00
 88085dce8000 88105ca13280 88085dce8260 88085dce81c8
 88085d42bd78 a0441ce9 88085dce8000 88105ca13240
Call Trace:
 [] svc_xprt_enqueue+0x50/0x220 [sunrpc]
 [] rdma_cma_handler+0x69/0x180 [svcrdma]
 [] cma_remove_one+0x1f6/0x220 [rdma_cm]
 [] ib_unregister_device+0x46/0x120 [ib_core]
 [] mlx4_ib_remove+0x29/0x260 [mlx4_ib]
 [] mlx4_remove_device+0xa0/0xc0 [mlx4_core]
 [] mlx4_unregister_interface+0x3b/0xa0 [mlx4_core]
 [] mlx4_ib_cleanup+0x10/0x23 [mlx4_ib]
 [] SyS_delete_module+0x170/0x1f0
 [] ? vm_munmap+0x54/0x70
 [] system_call_fastpath+0x1a/0x1f
Code: 5d c3 0f b7 17 66 39 ca 74 f6 f3 90 0f b7 17 66 39 d1 75 f6 5d
c3 55 65 81 
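
For readers comparing the traces above, the hung-task trace and the oops trace can be lined up mechanically. The following is a minimal sketch, not a tool used in this thread: it parses call-trace frames in the form they appear in this archive (addresses stripped to "[]"), skips the unreliable "?" frames, and walks both traces from the outermost frame inward to find where they first differ. The names FRAME_RE, parse_frames and first_divergence are hypothetical, and only the innermost frames of each trace are pasted in.

```python
import re

# Matches frames such as " [] cma_remove_one+0x1a9/0x220 [rdma_cm]".
# The bracketed address may be empty (as in this archive) or filled in.
FRAME_RE = re.compile(
    r"^\s*\[[^\]]*\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[A-Za-z_][\w.]*)\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def parse_frames(text):
    """Return (function, offset, module) for each reliable frame, innermost first."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m is None or m.group("unreliable"):
            continue  # not a frame, or a "?" frame the unwinder was unsure about
        frames.append((m.group("func"), int(m.group("off"), 16), m.group("module")))
    return frames


def first_divergence(trace_a, trace_b):
    """Compare from the outermost frame inward; return the first differing pair."""
    for frame_a, frame_b in zip(reversed(trace_a), reversed(trace_b)):
        if frame_a != frame_b:
            return frame_a, frame_b
    return None


HANG = """
 [] schedule+0x24/0x70
 [] schedule_timeout+0x1ec/0x260
 [] ? printk+0x5c/0x5e
 [] wait_for_completion+0x96/0x100
 [] ? try_to_wake_up+0x2b0/0x2b0
 [] cma_remove_one+0x1a9/0x220 [rdma_cm]
 [] ib_unregister_device+0x46/0x120 [ib_core]
 [] mlx4_ib_remove+0x29/0x260 [mlx4_ib]
"""

OOPS = """
 [] svc_xprt_enqueue+0x50/0x220 [sunrpc]
 [] rdma_cma_handler+0x69/0x180 [svcrdma]
 [] cma_remove_one+0x1f6/0x220 [rdma_cm]
 [] ib_unregister_device+0x46/0x120 [ib_core]
 [] mlx4_ib_remove+0x29/0x260 [mlx4_ib]
"""

print(first_divergence(parse_frames(HANG), parse_frames(OOPS)))
```

On these two excerpts the frames agree up to cma_remove_one and differ only in the offset inside it (0x1a9 in the hang, 0x1f6 in the oops), which matches what can be read off the traces by eye: one path blocks in wait_for_completion, the other calls into rdma_cma_handler in svcrdma.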

Re: Soft lockup in unloading kernel modules

2014-05-08 Thread Anna Schumaker
I haven't applied Chuck's recent (v3) patches to that kernel yet (I've been 
waiting to see if people have comments).  I'll try to push something out today.

On 05/08/2014 10:28 AM, Senn Klemens wrote:
> Hi,
>
> I am getting a soft lockup on the NFS server when it reboots if at least
> one client mount is established. I am using OpenSUSE 12.3 with the
> nfs-rdma kernel from Anna Schumaker
> (git://git.linux-nfs.org/projects/anna/nfs-rdma.git).
>
> The export on the server side is done with
> /data *(fsid=0,crossmnt,rw,mp,no_root_squash,sync,no_subtree_check,insecure)
>
> The following command is used to mount the NFSv4 share:
> mount -t nfs -o port=20049,rdma,vers=4.0,timeo=900 172.16.100.19:/ /mnt
>
> The HCA is a Mellanox MT4099 on the server and the client.
>
> The soft lockup can be reproduced with the following steps:
>   o server: Start the nfs server
>   o client: Mount the share
>   o client: Do a "ls" in the mounted directory
>   o server: Stop the nfs server
>   o server: Unload the nfs and mlx4 modules or reboot the server (I used
> the openibd init script from the Mellanox driver without having the
> Mellanox stack installed)
>
> The server reports a soft lockup
>   BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:6146]
> most times.
>
> Sometimes I get the following kernel panic:
> BUG: unable to handle kernel NULL pointer dereference at 0003
> IP: [] _raw_spin_lock_bh+0x15/0x40
> PGD 82a820067 PUD 857832067 PMD 0
> Oops: 0002 [#1] SMP
> Modules linked in: nfsd nfs_acl auth_rpcgss oid_registry nfnetlink_log
> nfnetlink bluetooth rfkill nfsv4 svcrdma dm_mod cpuid nfs fscache lockd
> sunrpc af_packet 8021q garp stp llc rdma_ucm ib_ucm rdma_cm iw_cm
> ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en mlx4_ib(-) ib_sa ib_mad ib_core
> ib_addr sr_mod cdrom usb_storage joydev mlx4_core usbhid
> x86_pkg_temp_thermal coretemp kvm_intel kvm ghash_clmulni_intel
> aesni_intel ablk_helper cryptd iTCO_wdt lrw igb gf128mul
> iTCO_vendor_support ehci_pci glue_helper pcspkr i2c_algo_bit isci
> ehci_hcd aes_x86_64 ptp libsas ioatdma lpc_ich microcode sb_edac sg
> pps_core usbcore ipmi_si tpm_tis edac_core scsi_transport_sas i2c_i801
> mfd_core dca usb_common tpm ipmi_msghandler wmi acpi_cpufreq button edd
> autofs4 xfs libcrc32c crc32c_intel processor thermal_sys scsi_dh_rdac
> scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh [last unloaded: oid_registry]
> CPU: 0 PID: 6603 Comm: modprobe Not tainted 3.15.0-rc2-anna-nfs-rdma+ #3
> Hardware name: Supermicro B9DRG-E/B9DRG-E, BIOS 3.0 09/04/2013
> task: 88105b8c6050 ti: 88105d814000 task.ti: 88105d814000
> RIP: 0010:[]  []
> _raw_spin_lock_bh+0x15/0x40
> RSP: 0018:88105d815d18  EFLAGS: 00010286
> RAX: 0001 RBX:  RCX: 
> RDX: 000b RSI:  RDI: 0003
> RBP: 88105d815d18 R08: 88087c611f38 R09: 0001
> R10:  R11:  R12: 88087c3c9800
> R13: 88107b82ab00 R14: 0003 R15: 0007
> FS:  7fef64612700() GS:88087fc0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0003 CR3: 00087c2c7000 CR4: 000407f0
> Stack:
>  88105d815d58 a05199f0 88105d815d88 88087c3c9800
>  88087c3c9400 88107b82ab00 88087c3c9660 88087c3c95c8
>  88105d815d78 a0421ce9 88087c3c9400 88107b82aac0
> Call Trace:
>  [] svc_xprt_enqueue+0x50/0x220 [sunrpc]
>  [] rdma_cma_handler+0x69/0x180 [svcrdma]
>  [] cma_remove_one+0x1f6/0x220 [rdma_cm]
>  [] ib_unregister_device+0x46/0x120 [ib_core]
>  [] mlx4_ib_remove+0x29/0x260 [mlx4_ib]
>  [] mlx4_remove_device+0xa0/0xc0 [mlx4_core]
>  [] mlx4_unregister_interface+0x3b/0xa0 [mlx4_core]
>  [] mlx4_ib_cleanup+0x10/0x23 [mlx4_ib]
>  [] SyS_delete_module+0x152/0x220
>  [] ? vm_munmap+0x54/0x70
>  [] system_call_fastpath+0x1a/0x1f
> Code: 5d c3 0f b7 17 66 39 ca 74 f6 f3 90 0f b7 17 66 39 d1 75 f6 5d c3
> 55 65 81 04 25 20 b9 00 00 00 02 00 00 48 89 e5 b8 00 00 01 00  0f
> c1 07 89 c2 c1 ea 10 66 39 c2 75 04 5d c3 f3 90 0f b7 07
> RIP  [] _raw_spin_lock_bh+0x15/0x40
>  RSP 
> CR2: 0003
> ---[ end trace 18e02ff413ac4b9b ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> Kernel Offset: 0x0 from 0x8100 (relocation range:
> 0x8000-0x9fff)
> ---[ end Kernel panic - not syncing: Fatal exception in interrupt
>
> Kind regards,
> Klemens
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
