On Wed, 30 May 2007 23:58:23 PDT, Andrew Morton said:
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc3/2.6.22-rc3-mm1/
Under 22-rc2-mm1, if my VPN connection got reset, ppp0 just quietly went away.
Under 22-rc3-mm1, it seems to end up wedged and waiting for references to
go away:
Jun 4 09:23:01 turing-police kernel: [90089.270707] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8
Jun 4 09:23:11 turing-police kernel: [90099.396121] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8
Jun 4 09:23:21 turing-police kernel: [90109.520574] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8
Jun 4 09:23:32 turing-police kernel: [90119.653129] unregister_netdevice: waiting for ppp0 to become free. Usage count = 8
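For anyone who wants to check their own logs for the same symptom, here is a minimal sketch that pulls the device name and usage count out of these messages. The function names (`stuck_devices`, `never_drops`) are made up for illustration, and the pattern assumes the message wording shown above; it tolerates the line being wrapped by a mail client.

```python
import re
from collections import defaultdict

# Match only the message text. \s+ also matches a newline, so a log line
# that was wrapped after "unregister_netdevice:" is still recognized.
PATTERN = re.compile(
    r"unregister_netdevice:\s+waiting for (\S+) to become free\.\s+"
    r"Usage count = (\d+)"
)

def stuck_devices(log_text):
    """Return {device: [usage counts in order of appearance]}."""
    counts = defaultdict(list)
    for dev, count in PATTERN.findall(log_text):
        counts[dev].append(int(count))
    return dict(counts)

def never_drops(counts):
    """True if the count was reported more than once and never decreased."""
    return len(counts) > 1 and all(
        later >= earlier for earlier, later in zip(counts, counts[1:])
    )

sample = """\
Jun 4 09:23:01 turing-police kernel: [90089.270707] unregister_netdevice:
waiting for ppp0 to become free. Usage count = 8
Jun 4 09:23:11 turing-police kernel: [90099.396121] unregister_netdevice:
waiting for ppp0 to become free. Usage count = 8
"""

for dev, counts in stuck_devices(sample).items():
    print(dev, counts, "stuck" if never_drops(counts) else "draining")
```

A count that stays flat across repeated messages, as it does here, suggests a leaked reference rather than a slow teardown.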
'echo t > /proc/sysrq-trigger' shows pppd hung up here:
Jun 4 10:52:57 turing-police kernel: [95478.047892] pppd D 000105ad3830 4968 3815 1 (NOTLB)
Jun 4 10:52:57 turing-police kernel: [95478.047902] 810008d5fd78 0086 81000349
Jun 4 10:52:57 turing-police kernel: [95478.047911] 810008d5fd28 810008d4a040 810003461820 810008d4a2b0
Jun 4 10:52:57 turing-police kernel: [95478.047920] 000105ad3733 0202 00ff 80239795
Jun 4 10:52:57 turing-police kernel: [95478.047928] Call Trace:
Jun 4 10:52:57 turing-police kernel: [95478.047936] [805207a2] schedule_timeout+0x8d/0xb4
Jun 4 10:52:57 turing-police kernel: [95478.047945] [805207e2] schedule_timeout_uninterruptible+0x19/0x1b
Jun 4 10:52:57 turing-police kernel: [95478.047954] [802397bb] msleep+0x14/0x1e
Jun 4 10:52:57 turing-police kernel: [95478.047963] [8048aa4e] netdev_run_todo+0x12f/0x234
Jun 4 10:52:57 turing-police kernel: [95478.047972] [8049166f] rtnl_unlock+0x35/0x37
Jun 4 10:52:57 turing-police kernel: [95478.047981] [804894a9] unregister_netdev+0x1e/0x23
Jun 4 10:52:57 turing-police kernel: [95478.047994] [88a5f2c2] :ppp_generic:ppp_shutdown_interface+0x67/0xbb
Jun 4 10:52:57 turing-police kernel: [95478.048018] [88a5f5b8] :ppp_generic:ppp_release+0x33/0x65
Jun 4 10:52:57 turing-police kernel: [95478.048028] [8028d54a] __fput+0xac/0x176
Jun 4 10:52:57 turing-police kernel: [95478.048036] [8028d628] fput+0x14/0x16
Jun 4 10:52:57 turing-police kernel: [95478.048045] [8028a9c6] filp_close+0x66/0x71
Jun 4 10:52:57 turing-police kernel: [95478.048054] [8028bd54] sys_close+0x98/0xd7
Jun 4 10:52:57 turing-police kernel: [95478.048062] [8020a03c] tracesys+0xdc/0xe1
Jun 4 10:52:57 turing-police kernel: [95478.048073] [2b45cd2429a0]
Which in itself wouldn't be so bad, except that it's holding a mutex and
lots of other stuff gets wedged up waiting for it (here's 1 of the 6 processes
that were wedged this morning):
Jun 4 10:52:58 turing-police kernel: [95478.051129] ifconfig D 810005e19820 5800 9787 20510 (NOTLB)
Jun 4 10:52:58 turing-police kernel: [95478.051141] 81000868fd08 0082 81000868fec8 0246
Jun 4 10:52:58 turing-police kernel: [95478.051150] 00010101 810005e19820 810003fe0820 810005e19a90
Jun 4 10:52:58 turing-police kernel: [95478.051159] 0a3f26c0 0006 81000868ff28 8028aacc
Jun 4 10:52:58 turing-police kernel: [95478.051167] Call Trace:
Jun 4 10:52:58 turing-police kernel: [95478.051176] [80520bc4] __mutex_lock_slowpath+0x74/0xb6
Jun 4 10:52:58 turing-police kernel: [95478.051185] [805209f3] mutex_lock+0xe/0x10
Jun 4 10:52:58 turing-police kernel: [95478.051193] [8048a938] netdev_run_todo+0x19/0x234
Jun 4 10:52:58 turing-police kernel: [95478.051202] [8049166f] rtnl_unlock+0x35/0x37
Jun 4 10:52:58 turing-police kernel: [95478.051210] [8048a3f2] dev_ioctl+0x3e3/0x483
Jun 4 10:52:58 turing-police kernel: [95478.051218] [8047df30] sock_ioctl+0x1ef/0x1fc
Jun 4 10:52:58 turing-police kernel: [95478.051227] [802989be] do_ioctl+0x2a/0x77
Jun 4 10:52:58 turing-police kernel: [95478.051235] [80298c52] vfs_ioctl+0x247/0x264
Jun 4 10:52:58 turing-police kernel: [95478.051243] [80298cce] sys_ioctl+0x5f/0x85
Jun 4 10:52:58 turing-police kernel: [95478.051252] [8020a03c] tracesys+0xdc/0xe1
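To make the failure mode in the two traces concrete, here is a toy userspace model of it, not the kernel code: one task takes a mutex and then polls a count that never reaches zero, sleeping between polls, while a second task blocks trying to take the same mutex. All names (`todo_mutex`, `refcount`, `run_model`) are hypothetical, and the `give_up_after` timeout exists only so the model terminates; the wait in the pppd trace shows no sign of giving up.

```python
import threading
import time

todo_mutex = threading.Lock()  # stand-in for the mutex held by the stuck task
refcount = 8                   # stand-in for the usage count that never drops

def unregister(acquired, poll_interval, give_up_after):
    """Model of the stuck task: hold the mutex while polling the count."""
    with todo_mutex:
        acquired.set()  # tell the caller the mutex is now held
        deadline = time.monotonic() + give_up_after
        while refcount > 0 and time.monotonic() < deadline:
            time.sleep(poll_interval)  # plays the role of msleep()

def other_task(waited):
    """Model of an unrelated task that needs the same mutex."""
    start = time.monotonic()
    with todo_mutex:
        waited.append(time.monotonic() - start)

def run_model(give_up_after=1.0):
    """Return how long the second task was blocked, in seconds."""
    acquired = threading.Event()
    waited = []
    holder = threading.Thread(
        target=unregister, args=(acquired, 0.05, give_up_after)
    )
    holder.start()
    acquired.wait()  # make sure the holder owns the mutex first
    waiter = threading.Thread(target=other_task, args=(waited,))
    waiter.start()
    holder.join()
    waiter.join()
    return waited[0]

print(f"second task was blocked for {run_model():.2f}s")
```

The second task is blocked for roughly as long as the first keeps polling, which is why a count that never drops takes every later caller of the same path down with it.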
(And of course, you can't shut down cleanly, because /etc/init.d/network tries
to down other interfaces on the way out, and that wedges as well.)
I'd bisect this, except I don't have a better way to replicate it than waiting
for our VPN box to reset the connection after 24 hours of being connected,
which basically means I get 2 tries per weekend.
An hour or so of digging through the -rc3-mm1 broken-out/ didn't find any
obvious-to-me culprits. Any ideas/suggestions?