> 24 sep 2015 kl. 00:05 skrev Palle Girgensohn <gir...@pingpong.net>: > > Hi, > >> 23 sep 2015 kl. 20:32 skrev Julien Charbon <j...@freebsd.org>: >> >> >> Hi, >> >> On 23/09/15 20:26, Palle Girgensohn wrote: >>>> 23 sep. 2015 kl. 20:01 skrev Julien Charbon <j...@freebsd.org>: >>>> On 23/09/15 16:36, Palle Girgensohn wrote: >>>>>> 22 sep 2015 kl. 23:59 skrev Julien Charbon <j...@freebsd.org>: On >>>>>> 22/09/15 22:58, Palle Girgensohn wrote: >>>>>>>> 22 sep 2015 kl. 20:16 skrev Julien Charbon <j...@freebsd.org>: >>>>>>>> On 22/09/15 18:49, Palle Girgensohn wrote: >>>>>>>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn >>>>>>>>>> <gir...@freebsd.org>: >>>>>>>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn >>>>>>>>>>> <gir...@freebsd.org>: >>>>>>>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon >>>>>>>>>>>> <j...@freebsd.org>: On 18/09/15 18:06, Konstantin >>>>>>>>>>>> Belousov wrote: >>>>>>>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien >>>>>>>>>>>>> Charbon wrote: >>>>>>>>>>>>>> [...] >>>>>>>>>>>> - Second, if issue is still in stable/10, compile 10.2 >>>>>>>>>>>> kernel with these options: >>>>>>>>>>>> >>>>>>>>>>>> options DDB options DEADLKRES options >>>>>>>>>>>> INVARIANTS options INVARIANT_SUPPORT options >>>>>>>>>>>> WITNESS options WITNESS_SKIPSPIN >>>>>>>>>>>> >>>>>>>>>>>> To see where the original fault is coming from. >>>>>>>>>>> [...] >>>>>>>>>>> >>>>>>>>>>> I'll try stable/10 now. Would you suggest a "clean" >>>>>>>>>>> stable/10, or could 287621 and 287780 help? >>>>>>>>>>> >>>>>>>>>>> I'll add the debugging suggested options right away. >>>>>>>>>>> >>>>>>>>>>> Palle >>>>>>>>>> >>>>>>>>>> I have a new core dump from ^/stable/10 with: >>>>>>>>>> >>>>>>>>>> options DDB options DEADLKRES options >>>>>>>>>> INVARIANTS options INVARIANT_SUPPORT options WITNESS >>>>>>>>>> options WITNESS_SKIPSPIN >>>>>>>>> >>>>>>>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD] >>>>>>>>> Copyright 2004 Free Software Foundation, Inc. GDB is free >>>>>>>>> software, covered by the GNU General Public License, and you >>>>>>>>> are welcome to change it and/or distribute copies of it under >>>>>>>>> certain conditions. Type "show copying" to see the >>>>>>>>> conditions. There is absolutely no warranty for GDB. Type >>>>>>>>> "show warranty" for details. This GDB was configured as >>>>>>>>> "amd64-marcel-freebsd"... >>>>>>>>> >>>>>>>>> Unread portion of the kernel message buffer: panic: >>>>>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp != NULL cpuid = >>>>>>>>> 16 KDB: stack backtrace: db_trace_self_wrapper() at >>>>>>>>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0 >>>>>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame >>>>>>>>> 0xfffffe183d9e9890 vpanic() at vpanic+0x126/frame >>>>>>>>> 0xfffffe183d9e98d0 kassert_panic() at >>>>>>>>> kassert_panic+0x139/frame 0xfffffe183d9e9940 >>>>>>>>> tcp_usr_detach() at tcp_usr_detach+0xf9/frame >>>>>>>>> 0xfffffe183d9e9970 sofree() at sofree+0x1f1/frame >>>>>>>>> 0xfffffe183d9e99a0 soclose() at soclose+0x3a0/frame >>>>>>>>> 0xfffffe183d9e99f0 _fdrop() at _fdrop+0x29/frame >>>>>>>>> 0xfffffe183d9e9a10 closef() at closef+0x1e2/frame >>>>>>>>> 0xfffffe183d9e9aa0 closefp() at closefp+0x9d/frame >>>>>>>>> 0xfffffe183d9e9ae0 amd64_syscall() at >>>>>>>>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall() >>>>>>>>> at Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall >>>>>>>>> (6, FreeBSD ELF64, sys_close), rip = 0x801c8d94a, rsp = >>>>>>>>> 0x7ffff91c8668, rbp = 0x7ffff91c8680 --- KDB: enter: panic >>>>>>>>> Uptime: 18h57m59s Dumping 23085 out of 98263 >>>>>>>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >>>>>>>>> >>>>>>>>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done. >>>>>>>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading >>>>>>>>> symbols from /boot/kernel/zfs.ko.symbols...done. Loaded >>>>>>>>> symbols for /boot/kernel/zfs.ko.symbols Reading symbols from >>>>>>>>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols >>>>>>>>> for /boot/kernel/opensolaris.ko.symbols Reading symbols from >>>>>>>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for >>>>>>>>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from >>>>>>>>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for >>>>>>>>> /boot/kernel/netgraph.ko.symbols Reading symbols from >>>>>>>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for >>>>>>>>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from >>>>>>>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for >>>>>>>>> /boot/kernel/ng_ether.ko.symbols Reading symbols from >>>>>>>>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for >>>>>>>>> /boot/kernel/accf_data.ko.symbols Reading symbols from >>>>>>>>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for >>>>>>>>> /boot/kernel/accf_http.ko.symbols Reading symbols from >>>>>>>>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for >>>>>>>>> /boot/kernel/ums.ko.symbols Reading symbols from >>>>>>>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for >>>>>>>>> /boot/kernel/ng_socket.ko.symbols Reading symbols from >>>>>>>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for >>>>>>>>> /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=1) at >>>>>>>>> pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) bt >>>>>>>>> #0 doadump (textdump=1) at pcpu.h:219 #1 0xffffffff8094b337 >>>>>>>>> in kern_reboot (howto=260) at >>>>>>>>> /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff8094b845 >>>>>>>>> in vpanic (fmt=<value optimized out>, ap=<value optimized >>>>>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:758 #3 >>>>>>>>> 0xffffffff8094b6d9 in kassert_panic (fmt=<value optimized >>>>>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:646 #4 >>>>>>>>> 0xffffffff80b1ee59 in tcp_usr_detach (so=<value optimized >>>>>>>>> out>) at /usr/src/sys/netinet/tcp_usrreq.c:202 #5 >>>>>>>>> 0xffffffff809cd291 in sofree (so=0xfffff801dd302000) at >>>>>>>>> /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in >>>>>>>>> soclose (so=<value optimized out>) at >>>>>>>>> /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in >>>>>>>>> _fdrop (fp=0xfffff802a593db40, td=0x0) at file.h:343 #8 >>>>>>>>> 0xffffffff80901092 in closef (fp=0xfffff802a593db40, >>>>>>>>> td=0xfffff80eebc894a0) at >>>>>>>>> /usr/src/sys/kern/kern_descrip.c:2338 #9 0xffffffff808feb5d >>>>>>>>> in closefp (fdp=0xfffff80b20cce000, fd=<value optimized out>, >>>>>>>>> fp=0xfffff802a593db40, td=0xfffff80eebc894a0, >>>>>>>>> holdleaders=<value optimized out>) at >>>>>>>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a >>>>>>>>> in amd64_syscall (td=0xfffff80eebc894a0, traced=0) at >>>>>>>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () >>>>>>>>> at /usr/src/sys/amd64/amd64/exception.S:396 #12 >>>>>>>>> 0x0000000801c8d94a in ?? () Previous frame inner to this >>>>>>>>> frame (corrupt stack?) Current language: auto; currently >>>>>>>>> minimal >>>>>>>> >>>>>>>> Thanks for the information. As I suspected the initial error >>>>>>>> was elsewhere than tcp_twclose(), never got this assertion >>>>>>>> before: >>>>>>>> >>>>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp != NULL >>>>>>>> >>>>>>>> from here: >>>>>>>> >>>>>>>> static void tcp_detach(struct socket *so, struct inpcb *inp) { >>>>>>>> struct tcpcb *tp; >>>>>>>> >>>>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); >>>>>>>> >>>>>>>> KASSERT(so->so_pcb == inp, ("tcp_detach: so_pcb != inp")); >>>>>>>> KASSERT(inp->inp_socket == so, ("tcp_detach: inp_socket != >>>>>>>> so")); >>>>>>>> >>>>>>>> tp = intotcpcb(inp); >>>>>>>> >>>>>>>> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags & >>>>>>>> INP_DROPPED) { KASSERT(tp == NULL, ("tcp_detach: INP_TIMEWAIT >>>>>>>> && " "INP_DROPPED && tp != NULL")); >>>>>>>> >>>>>>>> Let me check if I could find a path that could lead to this >>>>>>>> unexpected case. Unexpected because: INP_DROPPED and >>>>>>>> inp->inp_ppcb is set to NULL are set at same time here: >>>>>>>> >>>>>>>> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket >>>>>>>> *so; struct inpcb *inp; >>>>>>>> >>>>>>>> inp = tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT), >>>>>>>> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) == tw, >>>>>>>> ("tcp_twclose: inp_ppcb != tw")); >>>>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */ >>>>>>>> INP_WLOCK_ASSERT(inp); >>>>>>>> >>>>>>>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb = NULL; >>>>>>>> in_pcbdrop(inp); ... >>>>>>>> >>>>>>>> Interesting [...] >>>>> >>>>> Just a quick update. Julien is pursuing this off list with a core >>>>> dump and we are now waiting for a new core dump with the first >>>>> KASSERT removed. this is on a stable/10 kernel. >>>> >>>> By the way Palle could you also run below Dtrace script to see where >>>> this tcp_close() in INP_TIMEWAIT comes from: >>>> >>>> $ cat tcp-close-tw.d >>>> fbt::tcp_close:entry >>>> /args[0]->t_inpcb->inp_flags & 0x01000000/ >>>> { >>>> @s1[stack()] = count() >>>> } >>>> >>>> tick-1sec { >>>> printa(@s1); >>>> } >>>> $ sudo dtrace -s tcp-close-tw.d >>>> >>>> And share any backtraces reported in this dtrace script output. >>>> >>>> George, could you check if this dtrace script makes sense for you, and >>>> if you have any improvements to add, I am quite a rookie in Dtrace scripts. >>> >>> Shall I let the dtrace script run continuously until the machine crashes? >>> Or just run it once? >> >> Continuously until the machine crashes. You can report any backstrace >> outputs like: >> >> kernel`tcp_usr_close+0x86 >> kernel`soclose+0xe4 >> kernel`_fdrop+0x29 >> kernel`closef+0x237 >> kernel`closefp+0x95 >> kernel`amd64_syscall+0x357 >> kernel`0xffffffff80c83c4b >> 1 >> >> before the machine crashes. But I expect the problematic case >> detection with Dtrace to be quickly followed by the crash. Will see. >> >> -- >> Julien >> > > Kernels and userland are updated to 10.2-p3 with the patch removing the > suspicous KASSERT. > > dtrace running continously redirecting to a log file. > > now we're just waiting... :) > > Palle
Is the dtrace correct? $ sort -u dtrace.out 0 59779 :tick-1sec CPU ID FUNCTION:NAME $ wc -l dtrace.out 56233 dtrace.out All it does is write 0 59779 :tick-1sec once a second. Just checking... :) Palle _______________________________________________ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"