Re: 2.6.22-rc3-mm1 - pppd hanging in netdev_run_todo while holding mutex

2007-06-06 Thread Andrew Morton
On Mon, 04 Jun 2007 14:00:56 -0400 [EMAIL PROTECTED] wrote:

 On Wed, 30 May 2007 23:58:23 PDT, Andrew Morton said:
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc3/2.6.22-rc3-mm1/
 
 Under 22-rc2-mm1, if my VPN connection got reset, ppp0 just quietly went away.
 
 Under 22-rc3-mm1, it seems to end up wedged and waiting for references to
 go away:
 
 Jun  4 09:23:01 turing-police kernel: [90089.270707] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8
 Jun  4 09:23:11 turing-police kernel: [90099.396121] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8
 Jun  4 09:23:21 turing-police kernel: [90109.520574] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8
 Jun  4 09:23:32 turing-police kernel: [90119.653129] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8

Interesting refcount.

 'echo t > /proc/sysrq-trigger' shows pppd hung up here:
 
 Jun  4 10:52:57 turing-police kernel: [95478.047892] pppd  D 000105ad3830  4968  3815  1 (NOTLB)
 Jun  4 10:52:57 turing-police kernel: [95478.047902]  810008d5fd78 0086  81000349
 Jun  4 10:52:57 turing-police kernel: [95478.047911]  810008d5fd28 810008d4a040 810003461820 810008d4a2b0
 Jun  4 10:52:57 turing-police kernel: [95478.047920]  000105ad3733 0202 00ff 80239795
 Jun  4 10:52:57 turing-police kernel: [95478.047928] Call Trace:
 Jun  4 10:52:57 turing-police kernel: [95478.047936]  [805207a2] schedule_timeout+0x8d/0xb4
 Jun  4 10:52:57 turing-police kernel: [95478.047945]  [805207e2] schedule_timeout_uninterruptible+0x19/0x1b
 Jun  4 10:52:57 turing-police kernel: [95478.047954]  [802397bb] msleep+0x14/0x1e
 Jun  4 10:52:57 turing-police kernel: [95478.047963]  [8048aa4e] netdev_run_todo+0x12f/0x234
 Jun  4 10:52:57 turing-police kernel: [95478.047972]  [8049166f] rtnl_unlock+0x35/0x37
 Jun  4 10:52:57 turing-police kernel: [95478.047981]  [804894a9] unregister_netdev+0x1e/0x23
 Jun  4 10:52:57 turing-police kernel: [95478.047994]  [88a5f2c2] :ppp_generic:ppp_shutdown_interface+0x67/0xbb
 Jun  4 10:52:57 turing-police kernel: [95478.048018]  [88a5f5b8] :ppp_generic:ppp_release+0x33/0x65
 Jun  4 10:52:57 turing-police kernel: [95478.048028]  [8028d54a] __fput+0xac/0x176
 Jun  4 10:52:57 turing-police kernel: [95478.048036]  [8028d628] fput+0x14/0x16
 Jun  4 10:52:57 turing-police kernel: [95478.048045]  [8028a9c6] filp_close+0x66/0x71
 Jun  4 10:52:57 turing-police kernel: [95478.048054]  [8028bd54] sys_close+0x98/0xd7
 Jun  4 10:52:57 turing-police kernel: [95478.048062]  [8020a03c] tracesys+0xdc/0xe1
 Jun  4 10:52:57 turing-police kernel: [95478.048073]  [2b45cd2429a0]

I don't know what could have caused this, sorry.  If it's still there in next -mm
(which is still 10 compile fixes away) it'd be good if you could bisect it.
Suspects would be git-net.patch, get-netdev-all.patch and gregkh-driver-*.patch

Thanks.


2.6.22-rc3-mm1 - pppd hanging in netdev_run_todo while holding mutex

2007-06-04 Thread Valdis . Kletnieks
On Wed, 30 May 2007 23:58:23 PDT, Andrew Morton said:
 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc3/2.6.22-rc3-mm1/

Under 22-rc2-mm1, if my VPN connection got reset, ppp0 just quietly went away.

Under 22-rc3-mm1, it seems to end up wedged and waiting for references to
go away:

Jun  4 09:23:01 turing-police kernel: [90089.270707] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8
Jun  4 09:23:11 turing-police kernel: [90099.396121] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8
Jun  4 09:23:21 turing-police kernel: [90109.520574] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8
Jun  4 09:23:32 turing-police kernel: [90119.653129] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8

'echo t > /proc/sysrq-trigger' shows pppd hung up here:

Jun  4 10:52:57 turing-police kernel: [95478.047892] pppd  D 000105ad3830  4968  3815  1 (NOTLB)
Jun  4 10:52:57 turing-police kernel: [95478.047902]  810008d5fd78 0086  81000349
Jun  4 10:52:57 turing-police kernel: [95478.047911]  810008d5fd28 810008d4a040 810003461820 810008d4a2b0
Jun  4 10:52:57 turing-police kernel: [95478.047920]  000105ad3733 0202 00ff 80239795
Jun  4 10:52:57 turing-police kernel: [95478.047928] Call Trace:
Jun  4 10:52:57 turing-police kernel: [95478.047936]  [805207a2] schedule_timeout+0x8d/0xb4
Jun  4 10:52:57 turing-police kernel: [95478.047945]  [805207e2] schedule_timeout_uninterruptible+0x19/0x1b
Jun  4 10:52:57 turing-police kernel: [95478.047954]  [802397bb] msleep+0x14/0x1e
Jun  4 10:52:57 turing-police kernel: [95478.047963]  [8048aa4e] netdev_run_todo+0x12f/0x234
Jun  4 10:52:57 turing-police kernel: [95478.047972]  [8049166f] rtnl_unlock+0x35/0x37
Jun  4 10:52:57 turing-police kernel: [95478.047981]  [804894a9] unregister_netdev+0x1e/0x23
Jun  4 10:52:57 turing-police kernel: [95478.047994]  [88a5f2c2] :ppp_generic:ppp_shutdown_interface+0x67/0xbb
Jun  4 10:52:57 turing-police kernel: [95478.048018]  [88a5f5b8] :ppp_generic:ppp_release+0x33/0x65
Jun  4 10:52:57 turing-police kernel: [95478.048028]  [8028d54a] __fput+0xac/0x176
Jun  4 10:52:57 turing-police kernel: [95478.048036]  [8028d628] fput+0x14/0x16
Jun  4 10:52:57 turing-police kernel: [95478.048045]  [8028a9c6] filp_close+0x66/0x71
Jun  4 10:52:57 turing-police kernel: [95478.048054]  [8028bd54] sys_close+0x98/0xd7
Jun  4 10:52:57 turing-police kernel: [95478.048062]  [8020a03c] tracesys+0xdc/0xe1
Jun  4 10:52:57 turing-police kernel: [95478.048073]  [2b45cd2429a0]

Which in itself wouldn't be so bad, except that it's holding a mutex and
lots of other stuff gets wedged up waiting for it (here's 1 of 6 processes
that was wedged this morning):

Jun  4 10:52:58 turing-police kernel: [95478.051129] ifconfig  D 810005e19820  5800  9787  20510 (NOTLB)
Jun  4 10:52:58 turing-police kernel: [95478.051141]  81000868fd08 0082 81000868fec8 0246
Jun  4 10:52:58 turing-police kernel: [95478.051150]  00010101 810005e19820 810003fe0820 810005e19a90
Jun  4 10:52:58 turing-police kernel: [95478.051159]  0a3f26c0 0006 81000868ff28 8028aacc
Jun  4 10:52:58 turing-police kernel: [95478.051167] Call Trace:
Jun  4 10:52:58 turing-police kernel: [95478.051176]  [80520bc4] __mutex_lock_slowpath+0x74/0xb6
Jun  4 10:52:58 turing-police kernel: [95478.051185]  [805209f3] mutex_lock+0xe/0x10
Jun  4 10:52:58 turing-police kernel: [95478.051193]  [8048a938] netdev_run_todo+0x19/0x234
Jun  4 10:52:58 turing-police kernel: [95478.051202]  [8049166f] rtnl_unlock+0x35/0x37
Jun  4 10:52:58 turing-police kernel: [95478.051210]  [8048a3f2] dev_ioctl+0x3e3/0x483
Jun  4 10:52:58 turing-police kernel: [95478.051218]  [8047df30] sock_ioctl+0x1ef/0x1fc
Jun  4 10:52:58 turing-police kernel: [95478.051227]  [802989be] do_ioctl+0x2a/0x77
Jun  4 10:52:58 turing-police kernel: [95478.051235]  [80298c52] vfs_ioctl+0x247/0x264
Jun  4 10:52:58 turing-police kernel: [95478.051243]  [80298cce] sys_ioctl+0x5f/0x85
Jun  4 10:52:58 turing-police kernel: [95478.051252]  [8020a03c] tracesys+0xdc/0xe1
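
For readers following the two traces, here is a rough userspace model of the pattern they suggest: one caller takes a mutex in netdev_run_todo(), then sleeps and re-checks the device's reference count, printing a warning roughly every 10 seconds (the interval visible in the log), while any later caller stalls on the same mutex. This is a sketch under stated assumptions and not the kernel source: `fake_netdev`, `todo_mutex_held`, `run_todo_sim()`, the 250 ms poll interval, and the use of simulated time are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define POLL_MS 250     /* assumed sleep between refcount checks */
#define WARN_MS 10000   /* warning interval suggested by the log timestamps */

/* Hypothetical stand-in for a net device; only the refcount matters here. */
struct fake_netdev {
    const char *name;
    int refcnt;
};

/* Hypothetical stand-in for the mutex taken at the top of netdev_run_todo(). */
static bool todo_mutex_held;

/*
 * Simulate one caller of netdev_run_todo() using simulated time.
 * drop_per_poll is how many references other holders release per poll.
 * Returns the number of warnings printed, or -1 if the mutex is already
 * held (a real caller would sleep in mutex_lock() instead of returning).
 */
static int run_todo_sim(struct fake_netdev *dev, int drop_per_poll, int max_ms)
{
    int waited_ms = 0, since_warn_ms = 0, warnings = 0;

    assert(dev != NULL && drop_per_poll >= 0);

    if (todo_mutex_held)
        return -1;
    todo_mutex_held = true;

    while (dev->refcnt > 0 && waited_ms < max_ms) {
        waited_ms += POLL_MS;       /* stands in for msleep() */
        since_warn_ms += POLL_MS;
        dev->refcnt -= (drop_per_poll < dev->refcnt) ? drop_per_poll
                                                     : dev->refcnt;
        if (dev->refcnt > 0 && since_warn_ms >= WARN_MS) {
            printf("unregister_netdevice: waiting for %s to become free. "
                   "Usage count = %d\n", dev->name, dev->refcnt);
            warnings++;
            since_warn_ms = 0;
        }
    }

    /* Only a wait that completes releases the mutex; a leaked reference
     * leaves it held, which is what wedges every later caller. */
    if (dev->refcnt == 0)
        todo_mutex_held = false;
    return warnings;
}
```

In this model, a device whose 8 references are dropped one per poll finishes without a warning and releases the mutex. A device whose count stays at 8 prints one warning per simulated 10 seconds and never releases it, so the next call (the ifconfig case above) returns -1 immediately.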

(And of course, you can't shutdown cleanly, because /etc/init.d/network tries
to down other interfaces on the way out, and)

(I'd bisect this, except I don't have a better way to replicate it than waiting
for our VPN box to reset the connection after 24 hours of being connected, which
basically means I get 2 tries per weekend.)

An hour or so of digging through the -rc3-mm1 broken-out/ didn't find any
obvious-to-me culprits.  Any ideas/suggestions?

