Hi Palle, On 22/09/15 22:58, Palle Girgensohn wrote: >> 22 sep 2015 kl. 20:16 skrev Julien Charbon <j...@freebsd.org>: >> On 22/09/15 18:49, Palle Girgensohn wrote: >>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn >>>> <gir...@freebsd.org>: >>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn >>>>> <gir...@freebsd.org>: >>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon >>>>>> <j...@freebsd.org>: On 18/09/15 18:06, Konstantin Belousov >>>>>> wrote: >>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon >>>>>>> wrote: >>>>>>>> [...] >>>>>> - Second, if issue is still in stable/10, compile 10.2 >>>>>> kernel with these options: >>>>>> >>>>>> options DDB options DEADLKRES options >>>>>> INVARIANTS options INVARIANT_SUPPORT options WITNESS >>>>>> options WITNESS_SKIPSPIN >>>>>> >>>>>> To see where the original fault is coming from. >>>>> [...] >>>>> >>>>> I'll try stable/10 now. Would you suggest a "clean" >>>>> stable/10, or could 287621 and 287780 help? >>>>> >>>>> I'll add the debugging suggested options right away. >>>>> >>>>> Palle >>>> >>>> I have a new core dump from ^/stable/10 with: >>>> >>>> options DDB options DEADLKRES options INVARIANTS >>>> options INVARIANT_SUPPORT options WITNESS options >>>> WITNESS_SKIPSPIN >>> >>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD] >>> Copyright 2004 Free Software Foundation, Inc. GDB is free >>> software, covered by the GNU General Public License, and you are >>> welcome to change it and/or distribute copies of it under certain >>> conditions. Type "show copying" to see the conditions. There is >>> absolutely no warranty for GDB. Type "show warranty" for >>> details. This GDB was configured as "amd64-marcel-freebsd"... >>> >>> Unread portion of the kernel message buffer: panic: tcp_detach: >>> INP_TIMEWAIT && INP_DROPPED && tp != NULL cpuid = 16 KDB: stack >>> backtrace: db_trace_self_wrapper() at >>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0 >>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe183d9e9890 >>> vpanic() at vpanic+0x126/frame 0xfffffe183d9e98d0 kassert_panic() >>> at kassert_panic+0x139/frame 0xfffffe183d9e9940 tcp_usr_detach() >>> at tcp_usr_detach+0xf9/frame 0xfffffe183d9e9970 sofree() at >>> sofree+0x1f1/frame 0xfffffe183d9e99a0 soclose() at >>> soclose+0x3a0/frame 0xfffffe183d9e99f0 _fdrop() at >>> _fdrop+0x29/frame 0xfffffe183d9e9a10 closef() at >>> closef+0x1e2/frame 0xfffffe183d9e9aa0 closefp() at >>> closefp+0x9d/frame 0xfffffe183d9e9ae0 amd64_syscall() at >>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall() at >>> Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall (6, >>> FreeBSD ELF64, sys_close), rip = 0x801c8d94a, rsp = >>> 0x7ffff91c8668, rbp = 0x7ffff91c8680 --- KDB: enter: panic >>> Uptime: 18h57m59s Dumping 23085 out of 98263 >>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >>> >>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done. >>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading symbols >>> from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for >>> /boot/kernel/zfs.ko.symbols Reading symbols from >>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for >>> /boot/kernel/opensolaris.ko.symbols Reading symbols from >>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for >>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from >>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for >>> /boot/kernel/netgraph.ko.symbols Reading symbols from >>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for >>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from >>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for >>> /boot/kernel/ng_ether.ko.symbols Reading symbols from >>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for >>> /boot/kernel/accf_data.ko.symbols Reading symbols from >>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for >>> /boot/kernel/accf_http.ko.symbols Reading symbols from >>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for >>> /boot/kernel/ums.ko.symbols Reading symbols from >>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for >>> /boot/kernel/ng_socket.ko.symbols Reading symbols from >>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for >>> /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=1) at >>> pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) bt >>> #0 >>> doadump (textdump=1) at pcpu.h:219 #1 0xffffffff8094b337 in >>> kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451 >>> #2 0xffffffff8094b845 in vpanic (fmt=<value optimized out>, >>> ap=<value optimized out>) at >>> /usr/src/sys/kern/kern_shutdown.c:758 #3 0xffffffff8094b6d9 in >>> kassert_panic (fmt=<value optimized out>) at >>> /usr/src/sys/kern/kern_shutdown.c:646 #4 0xffffffff80b1ee59 in >>> tcp_usr_detach (so=<value optimized out>) at >>> /usr/src/sys/netinet/tcp_usrreq.c:202 #5 0xffffffff809cd291 in >>> sofree (so=0xfffff801dd302000) at >>> /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in >>> soclose (so=<value optimized out>) at >>> /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in >>> _fdrop (fp=0xfffff802a593db40, td=0x0) at file.h:343 #8 >>> 0xffffffff80901092 in closef (fp=0xfffff802a593db40, >>> td=0xfffff80eebc894a0) at /usr/src/sys/kern/kern_descrip.c:2338 >>> #9 0xffffffff808feb5d in closefp (fdp=0xfffff80b20cce000, >>> fd=<value optimized out>, fp=0xfffff802a593db40, >>> td=0xfffff80eebc894a0, holdleaders=<value optimized out>) at >>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a in >>> amd64_syscall (td=0xfffff80eebc894a0, traced=0) at >>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () at >>> /usr/src/sys/amd64/amd64/exception.S:396 #12 0x0000000801c8d94a >>> in ?? () Previous frame inner to this frame (corrupt stack?) >>> Current language: auto; currently minimal >> >> Thanks for the information. As I suspected the initial error was >> elsewhere than tcp_twclose(), never got this assertion before: >> >> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp != NULL >> >> from here: >> >> static void tcp_detach(struct socket *so, struct inpcb *inp) { >> struct tcpcb *tp; >> >> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); >> >> KASSERT(so->so_pcb == inp, ("tcp_detach: so_pcb != inp")); >> KASSERT(inp->inp_socket == so, ("tcp_detach: inp_socket != so")); >> >> tp = intotcpcb(inp); >> >> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags & >> INP_DROPPED) { KASSERT(tp == NULL, ("tcp_detach: INP_TIMEWAIT && " >> "INP_DROPPED && tp != NULL")); >> >> Let me check if I could find a path that could lead to this >> unexpected case. Unexpected because: INP_DROPPED and >> inp->inp_ppcb is set to NULL are set at same time here: >> >> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket *so; >> struct inpcb *inp; >> >> inp = tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT), >> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) == tw, >> ("tcp_twclose: inp_ppcb != tw")); >> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */ >> INP_WLOCK_ASSERT(inp); >> >> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb = NULL; >> in_pcbdrop(inp); ... >> >> Interesting and by the way could you try: >> >> # kgdb kernel /var/crash/vmcore.2 (kgdb) info threads >> >> To see if other thread are also in TCP stack at the same time, and >> if one of this thread is referencing the same inp. >> > > BTW, this backtrace looks quite different from the previous ones? Is > it a different problem, or just a different way to reveal the same > problem?
Having a different backstrace is expected, the first backstrace was quite deep in the stack, and kernel panic-ed quite late after the original issue. With the kernel debug options, the kernel stopped at the first suspicious fact. And I would say that I still does not understand how it is possible to reach that state, it is like kernel exclusive lock stopped to work and allow several thread to work on the same locked inp at same time. (Stack overflow, VIMAGE memory corruption, UMA issue? ... weird). I will ask you the kernel and the core dump but off list to avoid too much spam in -net. -- Julien
signature.asc
Description: OpenPGP digital signature