Re: panic in RELENG_5 UMA - two new stack traces
Gleb Smirnoff wrote:
> G> > How often does it crash? Does debug.mpsafenet=0 increase stability?
> G>
> G> I can reproduce the crash within 60 seconds of firing off 30+ ping/arp -d scripts, all running in parallel.
> G>
> G> debug.mpsafenet=0 seems to have solved the problem. I'm running 100+ instances of the above script and the system has been stable for over an hour.
>
> Thanks! We definitely see that the bug is a race, not broken logic. I am almost sure that you are experiencing the same bug as I described in the beginning of the thread.
>
> Although there is no fix available yet for the race between 'arp -d' and an outgoing packet, there is one for the race between an incoming ARP reply and an outgoing packet. We will probably commit it soon, after more review.

Sorry to say, but it looks like debug.mpsafenet=0 reduced the frequency of the problem, but did not eliminate it. The system crashed and hung again over the weekend with very little load. There was no kernel panic, so no core files.

I can leave 5.4 on this system for a week or so before installing 4.11, if you want me to continue doing diagnostics on it.

Gary
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: panic in RELENG_5 UMA - two new stack traces
On Tue, Jun 28, 2005 at 11:24:47AM -0400, Gary Mu1der wrote:
G> I spent the day yesterday trying to reproduce the crash that I posted last week and you kindly replied to. This is due to the fact that I stupidly managed to overwrite the kernel.debug that I used to generate the stack trace. Sadly I could not cause the system to crash again with the same sb* errors.
G>
G> I did however remove both the Berkeley Packet Filter and IPFilter from my custom kernel to try and isolate the problem. This has caused the crash to occur in a different and more reproducible form. I have both INVARIANTS and WITNESS enabled, as you can see from my kernel conf. which is included at the end of this e-mail.
G>
G> Below are the latest stack traces (using bge and then fxp NICs), kernel conf. and dmesg. Any help would be appreciated. This time I have a copy of both the core files and corresponding kernel.debug so I can hopefully provide you with any info you need.

How often does it crash? Does debug.mpsafenet=0 increase stability?

--
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
Re: panic in RELENG_5 UMA - two new stack traces
Gleb Smirnoff wrote:
> On Tue, Jun 28, 2005 at 11:24:47AM -0400, Gary Mu1der wrote:
> G> I spent the day yesterday trying to reproduce the crash that I posted last week and you kindly replied to. This is due to the fact that I stupidly managed to overwrite the kernel.debug that I used to generate the stack trace. Sadly I could not cause the system to crash again with the same sb* errors.
> G>
> G> I did however remove both the Berkeley Packet Filter and IPFilter from my custom kernel to try and isolate the problem. This has caused the crash to occur in a different and more reproducible form. I have both INVARIANTS and WITNESS enabled, as you can see from my kernel conf. which is included at the end of this e-mail.
> G>
> G> Below are the latest stack traces (using bge and then fxp NICs), kernel conf. and dmesg. Any help would be appreciated. This time I have a copy of both the core files and corresponding kernel.debug so I can hopefully provide you with any info you need.
>
> How often does it crash? Does debug.mpsafenet=0 increase stability?

I can reproduce the crash within 60 seconds of firing off 30+ ping/arp -d scripts, all running in parallel.

debug.mpsafenet=0 seems to have solved the problem. I'm running 100+ instances of the above script and the system has been stable for over an hour.

As I wanted some background on what debug.mpsafenet=0 does, I did some Googling and found a good write-up here:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2004-08/2280.html

Thanks,
Gary
Re: panic in RELENG_5 UMA - two new stack traces
On Fri, Jul 01, 2005 at 01:54:59PM -0400, Gary Mu1der wrote:
G> > On Tue, Jun 28, 2005 at 11:24:47AM -0400, Gary Mu1der wrote:
G> > G> I spent the day yesterday trying to reproduce the crash that I posted last week and you kindly replied to. This is due to the fact that I stupidly managed to overwrite the kernel.debug that I used to generate the stack trace. Sadly I could not cause the system to crash again with the same sb* errors.
G> > G>
G> > G> I did however remove both the Berkeley Packet Filter and IPFilter from my custom kernel to try and isolate the problem. This has caused the crash to occur in a different and more reproducible form. I have both INVARIANTS and WITNESS enabled, as you can see from my kernel conf. which is included at the end of this e-mail.
G> > G>
G> > G> Below are the latest stack traces (using bge and then fxp NICs), kernel conf. and dmesg. Any help would be appreciated. This time I have a copy of both the core files and corresponding kernel.debug so I can hopefully provide you with any info you need.
G> >
G> > How often does it crash? Does debug.mpsafenet=0 increase stability?
G>
G> I can reproduce the crash within 60 seconds of firing off 30+ ping/arp -d scripts, all running in parallel.
G>
G> debug.mpsafenet=0 seems to have solved the problem. I'm running 100+ instances of the above script and the system has been stable for over an hour.

Thanks! We definitely see that the bug is a race, not broken logic. I am almost sure that you are experiencing the same bug as I described in the beginning of the thread.

Although there is no fix available yet for the race between 'arp -d' and an outgoing packet, there is one for the race between an incoming ARP reply and an outgoing packet. We will probably commit it soon, after more review.

--
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
Re: panic in RELENG_5 UMA - two new stack traces
Gleb Smirnoff wrote:
> G> I can reproduce the crash within 60 seconds of firing off 30+ ping/arp -d scripts, all running in parallel.
> G>
> G> debug.mpsafenet=0 seems to have solved the problem. I'm running 100+ instances of the above script and the system has been stable for over an hour.
>
> Thanks! We definitely see that the bug is a race, not broken logic. I am almost sure that you are experiencing the same bug as I described in the beginning of the thread.
>
> Although there is no fix available yet for the race between 'arp -d' and an outgoing packet, there is one for the race between an incoming ARP reply and an outgoing packet. We will probably commit it soon, after more review.

Is this bug specific to using arp -d, or does it look like the arp -d tests identify a bug that might cause TCP/IP-related crashes with other types of real-world network traffic?

To rephrase: does it look like fixing this bug may fix a lot of the network-related crashes a number of people have reported?

Thanks,
Gary
Re: panic in RELENG_5 UMA - two new stack traces
On Fri, Jul 01, 2005 at 04:32:38PM -0400, Gary Mu1der wrote:
G> > G> I can reproduce the crash within 60 seconds of firing off 30+ ping/arp -d scripts, all running in parallel.
G> > G>
G> > G> debug.mpsafenet=0 seems to have solved the problem. I'm running 100+ instances of the above script and the system has been stable for over an hour.
G> >
G> > Thanks! We definitely see that the bug is a race, not broken logic. I am almost sure that you are experiencing the same bug as I described in the beginning of the thread.
G> >
G> > Although there is no fix available yet for the race between 'arp -d' and an outgoing packet, there is one for the race between an incoming ARP reply and an outgoing packet. We will probably commit it soon, after more review.
G>
G> Is this bug specific to using arp -d, or does it look like the arp -d tests identify a bug that might cause TCP/IP-related crashes with other types of real-world network traffic?
G>
G> To rephrase: does it look like fixing this bug may fix a lot of the network-related crashes a number of people have reported?

See above in the thread. We have two races: one that can fire at any time at runtime, which we are going to fix, and the other with 'arp -d', which is not fixed yet.

I am not sure how many reports of network-related panics were related to this race. Let's fix it and see. You can apply the patch to your boxes and see whether they are more stable at runtime.

--
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
Re: panic in RELENG_5 UMA - two new stack traces
Gleb,

Thank you very much for your reply.

I spent the day yesterday trying to reproduce the crash that I posted last week and you kindly replied to. This is due to the fact that I stupidly managed to overwrite the kernel.debug that I used to generate the stack trace. Sadly I could not cause the system to crash again with the same sb* errors.

I did however remove both the Berkeley Packet Filter and IPFilter from my custom kernel to try and isolate the problem. This has caused the crash to occur in a different and more reproducible form. I have both INVARIANTS and WITNESS enabled, as you can see from my kernel conf. which is included at the end of this e-mail.

Below are the latest stack traces (using bge and then fxp NICs), kernel conf. and dmesg. Any help would be appreciated. This time I have a copy of both the core files and corresponding kernel.debug so I can hopefully provide you with any info you need.

d5# uname -a
FreeBSD d5.bidx.com 5.4-RELEASE FreeBSD 5.4-RELEASE #12: Tue Jun 28 09:19:34 EDT 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/DB-DUAL-AMD64-RAID5 amd64

Here is a stack trace taken while using the bge NIC driver (which, according to reports on the freebsd-amd64 list, is unstable under load):

d5# kgdb kernel.debug.20 vmcore.20
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol ps_pglobal_lookup]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB. Type show warranty for details.
This GDB was configured as amd64-marcel-freebsd.
#0 doadump () at pcpu.h:167
167 pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt full
#0 doadump () at pcpu.h:167
No locals.
#1 0x80241dc9 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
        _ep = (struct eventhandler_entry *) 0x0
        _el = (struct eventhandler_list *) 0xff829e00
        first_buf_printf = 1
#2 0x8024185b in panic (
    fmt=0x803b35a8 Duplicate free of item %p from zone %p(%s)\n)
    at /usr/src/sys/kern/kern_shutdown.c:566
        bootopt = 260
        newpanic = 0
        ap = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0xb3431ad0, reg_save_area = 0xb34319f0}}
        buf = Duplicate free of item 0xff00d318bb00 from zone 0xff00f3fe46c0(Mbuf)\n, '\0' repeats 178 times
#3 0x8031f2e8 in uma_dbg_free (zone=0xff00f3fe46c0, slab=0xff00d318bf50, item=0xff00d318bb00)
    at /usr/src/sys/vm/uma_dbg.c:301
        keg = 0xff00f3fde000
        freei = 11
#4 0x8031d720 in uma_zfree_arg (zone=0xff00f3fe46c0, item=0xff00d318bb00, udata=0x0)
    at /usr/src/sys/vm/uma_core.c:2273
        keg = 0xff00f3fde000
        cache = 0xff00f3fe4740
        bucket = 0x9
        bflags = 0
        skip = SKIP_DTOR
#5 0x8027f5d1 in m_freem (mb=0x0) at uma.h:304
No locals.
#6 0x801d424e in bge_intr (xsc=0x0) at /usr/src/sys/dev/bge/if_bge.c:2862
        sc = (struct bge_softc *) 0x80843000
        status = 0
#7 0x8022c899 in ithread_loop (arg=0xff022300) at /usr/src/sys/kern/kern_intr.c:547
        ih = (struct intrhand *) 0xffa1eb80
        p = (struct proc *) 0xff00ec16f8b8
        count = 0
        warming = 0
        warned = 0
        __func__ = ithread_loop
#8 0x8022b8d3 in fork_exit (
    callout=0x8022c7c0 ithread_loop, arg=0xff022300, frame=0xb3431c50)
    at /usr/src/sys/kern/kern_fork.c:791
        p = (struct proc *) 0xff00ec16f8b8
#9 0x8032879e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:296
No locals.
#10 0x in ?? ()
No symbol table info available.
#11 0x in ?? ()
No symbol table info available.
#12 0x0001 in ?? ()
No symbol table info available.
#13 0x in ?? ()
No symbol table info available.
#14 0x in ?? ()
No symbol table info available.
#15 0x in ?? ()
No symbol table info available.
#16 0x in ?? ()
No symbol table info available.
#17 0x in ?? ()
No symbol table info available.
#18 0x in ?? ()
No symbol table info available.
#19 0x in ?? ()
No symbol table info available.
#20 0x in ?? ()
No symbol table info available.
#21 0x in ?? ()
No symbol table info available.
#22 0x in ?? ()
No symbol table info available.
#23 0x in ?? ()
No symbol table info available.
#24 0x in ?? ()
No symbol table info available.
#25 0x in ?? ()
No symbol table info available.
#26
Re: panic in RELENG_5 UMA
On Fri, Jun 24, 2005 at 03:28:34PM -0400, Gary Mu1der wrote:
G> Can someone confirm that the following stack trace is showing the same problem, or not?
G> I can reproduce the problem with the custom kernel config included below (which is basically GENERIC stripped of devices I don't have or need and IPFILTER added), but not with a stock GENERIC kernel.
G>
G> To cause the crash I'm running 20-30 instances of the following script:
G>
G> d5# cat arping.sh
G> #!/bin/sh
G>
G> while :
G> do
G>         arp -d 192.168.4.$1 > /dev/null 2>&1;
G>         ping -c 1 -t 1 192.168.4.$1 > /dev/null 2>&1;
G> done

When running without INVARIANTS, it is much more difficult to analyze panics. If this script makes your kernel panic, then it is very likely that it is the same problem.

Can you please provide the following info:

G> (kgdb) bt
G> #0 doadump () at pcpu.h:167
G> #1 0x in ?? ()
G> #2 0x802557b7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
G> #3 0x80255fef in panic (fmt=0xff00b5907500 ?6?)
G>     at /usr/src/sys/kern/kern_shutdown.c:566

(kgdb) p *panicstr

G> #4 0x8029ad2a in sbdrop_locked (sb=0xb6274860, len=1146)
G>     at /usr/src/sys/kern/uipc_socket2.c:1149

(kgdb) frame 4
(kgdb) ls
(kgdb) p *m

G> #5 0x8029afe2 in sbflush_locked (sb=0xb6274860)
G>     at /usr/src/sys/kern/uipc_socket2.c:1116
G> #6 0x8029b049 in sbrelease_locked (sb=0xb6274860, so=0xff00a0a2a8a0)
G>     at /usr/src/sys/kern/uipc_socket2.c:564
G> #7 0x8029b0d5 in sbrelease (sb=0xb6274860, so=0xff00a0a2a8a0)
G>     at /usr/src/sys/kern/uipc_socket2.c:577
G> #8 0x80297b03 in sorflush (so=0xff00a0a2a8a0)
G>     at /usr/src/sys/kern/uipc_socket.c:1483
G> #9 0x80297e42 in sofree (so=0xff00a0a2a8a0) at /usr/src/sys/kern/uipc_socket.c:407
G> #10 0x80298467 in soclose (so=0xff00a0a2a8a0) at /usr/src/sys/kern/uipc_socket.c:485
G> #11 0x802847b5 in soo_close (fp=0xff009ca95b60, td=0x0)
G>     at /usr/src/sys/kern/sys_socket.c:299
G> #12 0x8022c2c0 in fdrop_locked (fp=0xff009ca95b60, td=0xff00b5907500)
G>     at file.h:288

(kgdb) frame 12
(kgdb) p *td
(kgdb) p *td->td_proc

G> #13 0x8022c40a in closef (fp=0xff009ca95b60,

--
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
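For readers wondering why INVARIANTS matters here: the "Duplicate free of item" panics in this thread come from debug checks made when an item is returned to its zone. The following is a minimal sketch of that idea only, assuming a hypothetical fixed-size slab with one bit per item. It is not UMA's code, and `dbg_slab`, `dbg_alloc`, `dbg_free` and `demo_duplicate_free` are invented names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SLAB_ITEMS 16u  /* hypothetical slab size, not a UMA constant */

/* Hypothetical debug slab: bit i is set while item i is free. */
struct dbg_slab {
    uint32_t free_mask;
};

static void dbg_slab_init(struct dbg_slab *s)
{
    s->free_mask = (1u << SLAB_ITEMS) - 1u;  /* every item starts free */
}

/* Mark item idx as allocated. Returns 0 on success, -1 if it was not free. */
static int dbg_alloc(struct dbg_slab *s, unsigned idx)
{
    if (idx >= SLAB_ITEMS || !(s->free_mask & (1u << idx)))
        return -1;
    s->free_mask &= ~(1u << idx);
    return 0;
}

/*
 * Return item idx to the slab. Returns 0 on a valid free and -1 when the
 * item is already free, which is where a debug kernel would panic.
 */
static int dbg_free(struct dbg_slab *s, unsigned idx)
{
    if (idx >= SLAB_ITEMS)
        return -1;
    if (s->free_mask & (1u << idx)) {
        fprintf(stderr, "Duplicate free of item %u\n", idx);
        return -1;
    }
    s->free_mask |= 1u << idx;
    return 0;
}

/* Allocate one item, free it twice, and report what the second free saw. */
static int demo_duplicate_free(void)
{
    struct dbg_slab s;

    dbg_slab_init(&s);
    if (dbg_alloc(&s, 11) != 0)
        return 1;
    if (dbg_free(&s, 11) != 0)
        return 1;
    return dbg_free(&s, 11);  /* second free of the same item */
}
```

Without such bookkeeping the second free silently corrupts the allocator's state, and the crash shows up later in unrelated code, which is why traces from kernels built without INVARIANTS are harder to interpret.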
Re: panic in RELENG_5 UMA
All,

Can someone confirm that the following stack trace is showing the same problem, or not?

I can reproduce the problem with the custom kernel config included below (which is basically GENERIC stripped of devices I don't have or need and IPFILTER added), but not with a stock GENERIC kernel.

To cause the crash I'm running 20-30 instances of the following script:

d5# cat arping.sh
#!/bin/sh

while :
do
        arp -d 192.168.4.$1 > /dev/null 2>&1;
        ping -c 1 -t 1 192.168.4.$1 > /dev/null 2>&1;
done

d5# uname -a
FreeBSD d5.bidx.com 5.4-RELEASE FreeBSD 5.4-RELEASE #6: Thu Jun 23 13:45:20 EDT 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/DB-DUAL-AMD64-RAID5 amd64

d5# kgdb /usr/obj/usr/src/sys/DB-DUAL-AMD64-RAID5/kernel.debug ./vmcore.5
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol ps_pglobal_lookup]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB. Type show warranty for details.
This GDB was configured as amd64-marcel-freebsd.
#0 doadump () at pcpu.h:167
167 pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0 doadump () at pcpu.h:167
#1 0x in ?? ()
#2 0x802557b7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
#3 0x80255fef in panic (fmt=0xff00b5907500 ë6µ) at /usr/src/sys/kern/kern_shutdown.c:566
#4 0x8029ad2a in sbdrop_locked (sb=0xb6274860, len=1146) at /usr/src/sys/kern/uipc_socket2.c:1149
#5 0x8029afe2 in sbflush_locked (sb=0xb6274860) at /usr/src/sys/kern/uipc_socket2.c:1116
#6 0x8029b049 in sbrelease_locked (sb=0xb6274860, so=0xff00a0a2a8a0) at /usr/src/sys/kern/uipc_socket2.c:564
#7 0x8029b0d5 in sbrelease (sb=0xb6274860, so=0xff00a0a2a8a0) at /usr/src/sys/kern/uipc_socket2.c:577
#8 0x80297b03 in sorflush (so=0xff00a0a2a8a0) at /usr/src/sys/kern/uipc_socket.c:1483
#9 0x80297e42 in sofree (so=0xff00a0a2a8a0) at /usr/src/sys/kern/uipc_socket.c:407
#10 0x80298467 in soclose (so=0xff00a0a2a8a0) at /usr/src/sys/kern/uipc_socket.c:485
#11 0x802847b5 in soo_close (fp=0xff009ca95b60, td=0x0) at /usr/src/sys/kern/sys_socket.c:299
#12 0x8022c2c0 in fdrop_locked (fp=0xff009ca95b60, td=0xff00b5907500) at file.h:288
#13 0x8022c40a in closef (fp=0xff009ca95b60, td=0xff00b5907500) at /usr/src/sys/kern/kern_descrip.c:1920
#14 0x8022e5be in fdfree (td=0xff00b5907500) at /usr/src/sys/kern/kern_descrip.c:1624
#15 0x80238bd0 in exit1 (td=0xff00b5907500, rv=0) at /usr/src/sys/kern/kern_exit.c:236
#16 0x8023a04e in sys_exit (td=0x0, uap=0x0) at /usr/src/sys/kern/kern_exit.c:93
#17 0x8035cd8c in syscall (frame=
      {tf_rdi = 0, tf_rsi = 5263360, tf_rdx = 0, tf_rcx = 34366596768, tf_r8 = 0, tf_r9 = 140737488350136, tf_rax = 1, tf_rbx = 0, tf_rbp = 3, tf_r10 = -1099499764224, tf_r11 = 515, tf_r12 = 140737488350376, tf_r13 = 0, tf_r14 = 0, tf_r15 = 0, tf_trapno = 12, tf_addr = 34368259080, tf_flags = 0, tf_err = 2, tf_rip = 34366590280, tf_cs = 43, tf_rflags = 514, tf_rsp = 140737488350296, tf_ss = 35})
    at /usr/src/sys/amd64/amd64/trap.c:771
#18 0x80349f88 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:248
#19 0x in ?? ()
#20 0x00505000 in ?? ()
#21 0x in ?? ()
#22 0x00080068a6a0 in ?? ()
#23 0x in ?? ()
#24 0x7fffebb8 in ?? ()
#25 0x0001 in ?? ()
#26 0x in ?? ()
#27 0x0003 in ?? ()
#28 0xffb50600 in ?? ()
#29 0x0203 in ?? ()
#30 0x7fffeca8 in ?? ()
#31 0x in ?? ()
#32 0x in ?? ()
#33 0x in ?? ()
#34 0x000c in ?? ()
#35 0x000800820408 in ?? ()
#36 0x in ?? ()
#37 0x0002 in ?? ()
#38 0x000800688d48 in ?? ()
#39 0x002b in ?? ()
#40 0x0202 in ?? ()
#41 0x7fffec58 in ?? ()
#42 0x0023 in ?? ()
#43 0x7fffe968 in ?? ()
#44 0x0023 in ?? ()
#45 0x in ?? ()
#46 0x in ?? ()
#47 0x in ?? ()
#48 0x in ?? ()
#49 0x in ?? ()
#50 0x in ?? ()
#51 0x in ?? ()
#52 0x in ?? ()
#53 0xa14b4000 in ?? ()
#54 0xb6274c40 in ?? ()
#55 0x0101 in ?? ()
#56 0x in ?? ()
#57 0xff00b536eba0 in ?? ()
#58 0xff00ec19a780 in ?? ()
#59 0xb6274b58 in
Re: panic in RELENG_5 UMA
Sorry, I forgot to add that this is a Tyan Thunder K8SPRO w/dual AMD Opteron Processors, model no. 246, 4GB of RAM and an Adaptec 2200S RAID controller. The NIC being used is the onboard Broadcom Gigabit Ethernet (bge).

Thanks,
Gary
Re: panic in RELENG_5 UMA
On Wed, Jun 22, 2005 at 03:03:53PM +0200, Andre Oppermann wrote:
A> > Fixing this one is harder. We take la from unlocked rtentry obtained via rt_check(), or from arplookup(). The latter drops lock on rtentry, too. Then we do some work and use this la. It may have already been freed in arp_rtrequest(), the RTM_DELETE case.
A> >
A> > I see two approaches here:
A> >
A> > 1) Protecting llinfo with route lock. In this case we need rt_check() to return locked *rt (just reference won't help). We also need arplookup() to return locked rt. And do not unlock it within all of arpresolve() and a big part of in_arpinput() functions.
A>
A> I think for 5-stable this is the way to go.

What about fixing it step by step? The patch attached to my previous message fixes the panic reported by Jeremie, I suppose. It is a race between the output path and the input path that can occur at any time at runtime.

The race that is not fixed by my patch (discussed above), between the output path and the RTM_DELETE message, is less critical: it can occur only when the administrator runs arp -d.

Can you please review my patch? I think we should commit it first, and then work on the second race.

--
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
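To illustrate why the race between the input and output paths ends in a duplicate free: both paths can see the same held packet in `la_hold` and each free it. The sketch below is not the patch discussed in this thread, which is not shown here. It assumes a simplified ownership handoff using a C11 atomic exchange in place of the kernel's route lock; `struct pkt`, `struct arp_entry`, `claim_hold` and `demo_single_free` are invented stand-ins, and the two paths are simulated sequentially rather than with real threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct pkt {                      /* stand-in for struct mbuf */
    int len;
};

struct arp_entry {                /* stand-in for the la_hold slot */
    _Atomic(struct pkt *) hold;
};

static int frees;                 /* counts real frees, for the demo only */

static void pkt_free(struct pkt *p)
{
    if (p != NULL) {
        free(p);
        frees++;
    }
}

/*
 * Take ownership of the held packet. The exchange returns the old pointer
 * and leaves NULL behind, so at most one caller ever gets a non-NULL value.
 */
static struct pkt *claim_hold(struct arp_entry *e)
{
    return atomic_exchange(&e->hold, (struct pkt *)NULL);
}

/* Both "paths" try to dispose of the held packet; only one may succeed. */
static int demo_single_free(void)
{
    struct arp_entry e;
    struct pkt *p = malloc(sizeof(*p));

    if (p == NULL)
        return -1;
    p->len = 64;
    atomic_init(&e.hold, p);
    frees = 0;

    pkt_free(claim_hold(&e));     /* "input path": ARP reply arrived */
    pkt_free(claim_hold(&e));     /* "output path": finds nothing to free */
    return frees;
}
```

The unsafe pattern is the opposite order: read the pointer, act on it, and only afterwards clear the slot, which leaves a window in which the other path reads the same pointer.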
Re: panic in RELENG_5 UMA
Gleb,

> What about fixing it step by step? The patch attached to my previous message fixes the panic reported by Jeremie, I suppose. It is a race between the output path and the input path that can occur at any time at runtime.

FYI, I compiled my kernel with your patch and I have had no panic since then. Note that my previous uptime was multiple tens of days and I haven't done stress tests. But anyway, I think your massively parallel arp -d/ping tests are far more significant than my box, which only communicates with a couple of settled machines.

Regards,
--
Jeremie Le Hen
jeremie at le-hen dot org ttz at chchile dot org
Re: panic in RELENG_5 UMA
Gleb Smirnoff wrote:
> [ cc'ing parties involved in this part of code ]
>
> On Tue, Jun 21, 2005 at 01:07:01PM +0400, Gleb Smirnoff wrote:
> T> On Tue, Jun 21, 2005 at 09:04:27AM +0200, Jeremie Le Hen wrote:
> T> J> #25 0xc05a0a0b in m_freem (mb=0x0) at uma.h:304
> T> J> No locals.
> T> J> #26 0xc05ee0d5 in arpresolve (ifp=0xc1a5b000, rt0=0xc1d44000, m=0xc1be7200,
> T> J>     dst=0xd6d3fa94, desten=0xd6d3fa2c /??]??w??)
> T> J>     at ../../../netinet/if_ether.c:442
> T> J>         la = (struct llinfo_arp *) 0xc1a75a00
> T> J>         sdl = (struct sockaddr_dl *) 0xc2128910
> T> J>         error = -1038972656
> T> J>         rt = (struct rtentry *) 0xc1d44000
> T>
> T> IMHO, this looks like a race. The route is not locked when its llinfo is edited.
> T>
> T> Probably the mbuf was freed when the ARP reply arrived and la_hold was sent. Look into in_arpinput() near line 736:
> T>
> T>         (*ifp->if_output)(ifp, la->la_hold, rt_key(rt), rt);
> T>         la->la_hold = 0;
> T>
> T> Yeah, I have just triggered another panic running 15 instances of this script on an SMP box:
> T>
> T> (
> T> while (true); do
> T>         arp -d 81.19.64.111 > /dev/null 2>&1;
> T>         ping -c 1 -t 1 81.19.64.111 > /dev/null 2>&1;
> T> done
> T> )
> T>
> T> But my duplicate free is in fxp_txeof(). This means that the output thread has won the race.
>
> I suppose that the attached patch closes your race.
>
> However, there is still a race between RTM_DELETE and the output path. The above script still makes the kernel panic, but with a different panic. The output path works with an already freed llinfo:
>
> #28 0xc0507000 in m_freem (mb=0x0) at mbuf.h:410
> #29 0xc053fde3 in arpresolve (ifp=0xc2012800, rt0=0xc22fcdec, m=0xc25a8000, dst=0xe720bb28, desten=0xe720bacc uøbÀ+\001)
>     at /usr/src/sys/netinet/if_ether.c:443
> #30 0xc0538078 in ether_output (ifp=0xc2012800, m=0xc25a8000, dst=0xe720bb28, rt0=0xc22fcdec)
>     at /usr/src/sys/net/if_ethersubr.c:173
> #31 0xc054b5b4 in ip_output (m=0xc25a8000, opt=0xc25a80ac, ro=0xe720bb24, flags=0x20, imo=0x0, inp=0xc25eb5a0)
>     at /usr/src/sys/netinet/ip_output.c:772
> #32 0xc054d36b in rip_output (m=0xc25a8000, so=0x0, dst=0x0) at /usr/src/sys/netinet/raw_ip.c:320
> #33 0xc054de7b in rip_send (so=0xc248c914, flags=0x0, m=0xc25a8000, nam=0xc218d410, control=0x0, td=0xc224d7d0)
>     at /usr/src/sys/netinet/raw_ip.c:785
> #34 0xc050a30f in sosend (so=0xc248c914, addr=0xc218d410, uio=0xe720bc3c, top=0xc25a8000, control=0x0, flags=0x0, td=0xc224d7d0)
>     at /usr/src/sys/kern/uipc_socket.c:827
> (kgdb) frame 29
> #29 0xc053fde3 in arpresolve (ifp=0xc2012800, rt0=0xc22fcdec, m=0xc25a8000, dst=0xe720bb28, desten=0xe720bacc uøbÀ+\001)
>     at /usr/src/sys/netinet/if_ether.c:443
> 443             m_freem(la->la_hold);
> (kgdb) p *la
> $3 = {
>   la_le = {
>     le_next = 0xdeadc0de,
>     le_prev = 0xdeadc0de
>   },
>   la_rt = 0xdeadc0de,
>   la_hold = 0xdeadc0de,
>   la_preempt = 0xc0de,
>   la_asked = 0xdead
> }
>
> Fixing this one is harder. We take la from unlocked rtentry obtained via rt_check(), or from arplookup(). The latter drops lock on rtentry, too. Then we do some work and use this la. It may have already been freed in arp_rtrequest(), the RTM_DELETE case.
>
> I see two approaches here:
>
> 1) Protecting llinfo with route lock. In this case we need rt_check() to return locked *rt (just reference won't help). We also need arplookup() to return locked rt. And do not unlock it within all of arpresolve() and a big part of in_arpinput() functions.

I think for 5-stable this is the way to go.

> 2) Add mutex to llinfo_arp. I'm afraid this will hurt performance.

The new ARP stuff should fix these issues; however, it is not ready yet. At the moment it looks like it won't make it right away into 6.0 but will go into 7-current and then be MFC'd back for 6.1R.

--
Andre
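The `0xdeadc0de` values in the `p *la` output above are what make the use-after-free visible: the freed structure was overwritten with a recognizable pattern, so every field read through the stale pointer shows it. A minimal sketch of that technique follows, assuming a hypothetical one-slot static pool so that reading after the "free" stays well defined in a user-space demo. The struct layout, `entry_alloc`, `entry_free` and `looks_freed` are invented for illustration and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRASH_PATTERN 0xdeadc0deu  /* pattern seen in the kgdb output */

/* Simplified stand-in for an ARP entry; fields are plain 32-bit words. */
struct entry {
    uint32_t next;
    uint32_t prev;
    uint32_t rt;
    uint32_t hold;
};

/* One-slot static pool: the storage outlives the "free" on purpose. */
static struct entry pool[1];

static struct entry *entry_alloc(void)
{
    pool[0] = (struct entry){ 1, 2, 3, 4 };
    return &pool[0];
}

/* "Free" the entry by trashing every field with the marker pattern. */
static void entry_free(struct entry *e)
{
    e->next = TRASH_PATTERN;
    e->prev = TRASH_PATTERN;
    e->rt = TRASH_PATTERN;
    e->hold = TRASH_PATTERN;
}

/* A debugger-style check: does this entry look like freed memory? */
static int looks_freed(const struct entry *e)
{
    return e->hold == TRASH_PATTERN && e->rt == TRASH_PATTERN;
}

/* Keep a stale pointer across the free, as the output path does. */
static int demo_stale_read(void)
{
    struct entry *la = entry_alloc();

    entry_free(la);           /* e.g. the RTM_DELETE side */
    return looks_freed(la);   /* the stale user now sees the pattern */
}
```

Seeing the pattern in every field, as in the trace, indicates that the whole structure had been released before the output path dereferenced it, rather than a single field having been corrupted.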
Re: panic in RELENG_5 UMA
On Wed, Jun 22, 2005 at 03:03:53PM +0200, Andre Oppermann wrote:
A> > Fixing this one is harder. We take la from unlocked rtentry obtained via rt_check(), or from arplookup(). The latter drops lock on rtentry, too. Then we do some work and use this la. It may have already been freed in arp_rtrequest(), the RTM_DELETE case.
A> >
A> > I see two approaches here:
A> >
A> > 1) Protecting llinfo with route lock. In this case we need rt_check() to return locked *rt (just reference won't help). We also need arplookup() to return locked rt. And do not unlock it within all of arpresolve() and a big part of in_arpinput() functions.
A>
A> I think for 5-stable this is the way to go.

I have started working on this. Making arplookup() return a locked rt looks possible. There are two more questions:

- Is it possible to make rt_check() return a locked *rt? This requires editing nd6.c and if_*subr.c. We can't MFC this to RELENG_5. Probably, as a first step, I'll try to avoid changing rt_check() and see whether changing arplookup() is enough to avoid panics.

- Is the following statement always true?

        la->la_rt->rt_llinfo == la

A> > 2) Add mutex to llinfo_arp. I'm afraid this will hurt performance.
A>
A> The new ARP stuff should fix these issues; however, it is not ready yet. At the moment it looks like it won't make it right away into 6.0 but will go into 7-current and then be MFC'd back for 6.1R.

Yeah. I've already compiled a kernel with it. It is bootable and working, but I haven't yet run hard tests. I'll work on locking now and perform testing. In general it looks much better than what we have now. The locking is going to be simple and straightforward. Thanks for the nice code!

Do you mind if I pull it into a perforce branch to work on it together?

--
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
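The question of whether `la->la_rt->rt_llinfo == la` always holds can be phrased as a back-pointer invariant and checked mechanically. The sketch below assumes simplified stand-in structures that keep only the two fields named in the message (`la_rt` and `rt_llinfo`); the `_s` struct names, `backptr_ok` and the two demo functions are hypothetical and do not mirror the kernel's real definitions.

```c
#include <assert.h>
#include <stddef.h>

struct rtentry_s;                 /* stand-in for struct rtentry */

struct llinfo_s {                 /* stand-in for struct llinfo_arp */
    struct rtentry_s *la_rt;      /* back pointer to the route */
};

struct rtentry_s {
    void *rt_llinfo;              /* link-layer info attached to the route */
};

/* Returns 1 when the route still points back at this llinfo, else 0. */
static int backptr_ok(const struct llinfo_s *la)
{
    return la != NULL && la->la_rt != NULL &&
        la->la_rt->rt_llinfo == (const void *)la;
}

/* A freshly linked pair satisfies the invariant. */
static int demo_linked(void)
{
    struct rtentry_s rt;
    struct llinfo_s la;

    la.la_rt = &rt;
    rt.rt_llinfo = &la;
    return backptr_ok(&la);
}

/* If the route side is detached first, the invariant no longer holds. */
static int demo_unlinked(void)
{
    struct rtentry_s rt;
    struct llinfo_s la;

    la.la_rt = &rt;
    rt.rt_llinfo = &la;
    rt.rt_llinfo = NULL;          /* e.g. a delete that clears the route side */
    return backptr_ok(&la);
}
```

A check like this only says something if it runs while whatever lock protects both structures is held; evaluated without that lock, it can pass and then become false immediately afterwards.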
Re: panic in RELENG_5 UMA
Gleb Smirnoff wrote:
> On Wed, Jun 22, 2005 at 03:03:53PM +0200, Andre Oppermann wrote:
> A> > Fixing this one is harder. We take la from unlocked rtentry obtained via rt_check(), or from arplookup(). The latter drops lock on rtentry, too. Then we do some work and use this la. It may have already been freed in arp_rtrequest(), the RTM_DELETE case.
> A> >
> A> > I see two approaches here:
> A> >
> A> > 1) Protecting llinfo with route lock. In this case we need rt_check() to return locked *rt (just reference won't help). We also need arplookup() to return locked rt. And do not unlock it within all of arpresolve() and a big part of in_arpinput() functions.
> A>
> A> I think for 5-stable this is the way to go.
>
> I have started working on this. Making arplookup() return a locked rt looks possible. There are two more questions:
>
> - Is it possible to make rt_check() return a locked *rt? This requires editing nd6.c and if_*subr.c. We can't MFC this to RELENG_5. Probably, as a first step, I'll try to avoid changing rt_check() and see whether changing arplookup() is enough to avoid panics.

Actually I don't know if rt_check() can return a locked *rt.

> - Is the following statement always true?
>
>         la->la_rt->rt_llinfo == la

Good question. I'll look into Design and Implementation of 4.4BSD and FreeBSD 5 when I get home.

> A> > 2) Add mutex to llinfo_arp. I'm afraid this will hurt performance.
> A>
> A> The new ARP stuff should fix these issues; however, it is not ready yet. At the moment it looks like it won't make it right away into 6.0 but will go into 7-current and then be MFC'd back for 6.1R.
>
> Yeah. I've already compiled a kernel with it. It is bootable and working, but I haven't yet run hard tests. I'll work on locking now and perform testing. In general it looks much better than what we have now. The locking is going to be simple and straightforward. Thanks for the nice code!
>
> Do you mind if I pull it into a perforce branch to work on it together?

Better wait a bit before you pull it into perforce. First we have to move Qing along, and second I'd like to do one more iteration with him over the code. There are a couple of rough edges and style issues I'd like to carve out first. And then there is the tab-space problem, which makes importing a pain. We need to fix Qing's editor as the very first thing. ;-)

--
Andre
panic in RELENG_5 UMA
Hi list,

I caught a panic last night on my RELENG_5 box. The kernel was compiled on 2005/05/21. Please feel free to ask for further information (and include me explicitly in the recipient list, since I'm not subscribed to this list).

kgdb stacktrace:
%%%
#22 0xc0566d1d in panic (
    fmt=0xc0728d5d "Duplicate free of item %p from zone %p(%s)\n")
    at ../../../kern/kern_shutdown.c:550
        td = (struct thread *) 0xc205ec00
        bootopt = 256
        newpanic = 1
        ap = 0xd6d3f968
        buf = "Duplicate free of item 0xc1be8800 from zone 0xc1045ae0(Mbuf)\n", '\0' <repeats 194 times>
#23 0xc069e280 in uma_dbg_free (zone=0xc1045ae0, slab=0xc1be8fa8,
    item=0xc1be8800) at ../../../vm/uma_dbg.c:301
        keg = 0xc101f3c0
        slabref = 0x0
        freei = 8
#24 0xc069cc39 in uma_zfree_arg (zone=0xc1045ae0, item=0xc1be8800, udata=0x0)
    at ../../../vm/uma_core.c:2273
        keg = 0xc101f3c0
        cache = 0xc1045b18
        bucket = 0xc1be2000
        bflags = 0
        cpu = 0
        skip = SKIP_DTOR
#25 0xc05a0a0b in m_freem (mb=0x0) at uma.h:304
No locals.
#26 0xc05ee0d5 in arpresolve (ifp=0xc1a5b000, rt0=0xc1d44000, m=0xc1be7200,
    dst=0xd6d3fa94, desten=0xd6d3fa2c "/æ]ÀäµwÀ")
    at ../../../netinet/if_ether.c:442
        la = (struct llinfo_arp *) 0xc1a75a00
        sdl = (struct sockaddr_dl *) 0xc2128910
        error = -1038972656
        rt = (struct rtentry *) 0xc1d44000
#27 0xc05dac65 in ether_output (ifp=0xc1a5b000, m=0xc1be7200, dst=0xd6d3fa94,
    rt0=0x0) at ../../../net/if_ethersubr.c:165
        type = -10541
        error = 50
        hdrcmplt = 0
        esrc = "K\000\000\000\214z"
        edst = "/æ]Àäµ"
        eh = (struct ether_header *) 0x32
        loop_copy = 0
#28 0xc060150c in ip_output (m=0xc1be7200, opt=0xc1be7240, ro=0xd6d3fa90,
    flags=0, imo=0x0, inp=0xc40f7a8c) at ../../../netinet/ip_output.c:770
        ip = (struct ip *) 0xc1be7240
        ifp = (struct ifnet *) 0xc1a5b000
        m0 = (struct mbuf *) 0xc1be7240
        hlen = 20
        len = 1
        error = 0
        dst = (struct sockaddr_in *) 0xd6d3fa94
        ia = (struct in_ifaddr *) 0xc1c2b300
        isbroadcast = 0
        sw_csum = 1
        iproute = {ro_rt = 0xc1d44000, ro_dst = {sa_len = 16 '\020', sa_family = 2 '\002', sa_data = "\000\000Àš\001²\000\000\000\000\000\000\000"}}
        odst = {s_addr = 1}
        fwd_tag = (struct m_tag *) 0x0
        __func__ = "ip_output"
#29 0xc060aba1 in tcp_output (tp=0xc1d75534)
    at ../../../netinet/tcp_output.c:1119
        so = (struct socket *) 0xc2afe000
        len = 144
        recwin = 66608
        sendwin = -1044483500
        flags = 24
        error = -1044483500
        m = (struct mbuf *) 0xc1be7200
        ip = (struct ip *) 0xc1be7240
        th = (struct tcphdr *) 0xc1be7254
        opt = "\001\001\b\n\002äm\003õJÁ+\001\000\000žà¯ÂÐà¯Â\000à¯Â\204ûÓÖ\203 ~ZÀÐà¯Â"
        ipoptlen = 0
        optlen = 12
        hdrlen = 52
        idle = 1
        sendalot = 0
        i = 299
        sack_rxmit = 0
        sack_bytes_rxmt = 0
        p = (struct sackhole *) 0x0
        tao = {tao_cc = 767, tao_ccsent = 3228670914, tao_mssopt = 64356}
        __func__ = "tcp_output"
#30 0xc061167c in tcp_usr_send (so=0xc2afe000, flags=0, m=0xc1be7600, nam=0x0,
    control=0x0, td=0xc205ec00) at ../../../netinet/tcp_usrreq.c:699
        error = 0
        inp = (struct inpcb *) 0xc40f7a8c
        tp = (struct tcpcb *) 0xc1d75534
#31 0xc05a41e8 in sosend (so=0xc2afe000, addr=0x0, uio=0xd6d3fc70,
    top=0xc1be7600, control=0x0, flags=0, td=0xc205ec00)
    at ../../../kern/uipc_socket.c:835
        mp = (struct mbuf **) 0xc1be7600
        m = (struct mbuf *) 0xc1be7600
        space = 33160
        len = 144
        resid = 0
        clen = -1044482560
        error = 0
        dontroute = 0
        atomic = 0
#32 0xc05928bf in soo_write (fp=0x0, uio=0xd6d3fc70, active_cred=0xc4211e80,
    flags=0, td=0xc205ec00) at ../../../kern/sys_socket.c:118
        so = (struct socket *) 0xc2afe000
        error = 144
#33 0xc058bc0b in dofilewrite (td=0xc205ec00, fp=0xc2aff83c, fd=0, buf=0x0,
    nbyte=3228877920, offset=Unhandled dwarf expression opcode 0x93
) at file.h:245
        auio = {uio_iov = 0xd6d3fc68, uio_iovcnt = 1, uio_offset = 143, uio_resid = 0, uio_segflg = UIO_USERSPACE, uio_rw = UIO_WRITE, uio_td = 0xc205ec00}
        aiov = {iov_base = 0x807d090, iov_len = 0}
        cnt = 144
        error = -1066089376
        ktruio = (struct uio *) 0x0
#34 0xc058ba74 in write (td=0xc205ec00, uap=0xd6d3fd04)
    at ../../../kern/sys_generic.c:300
        fp = (struct file *) 0xc2aff83c
        error = 0
#35 0xc06d2a12 in syscall (frame=
      {tf_fs = -1078001617, tf_es = 47, tf_ds = -1078001617, tf_edi = 134671528, tf_esi = 144, tf_ebp = -1077943016, tf_isp = -690750108, tf_ebx = 671922152, tf_edx = 134671528, tf_ecx = 4, tf_eax = 4, tf_trapno = 12, tf_err = 2, tf_eip = 673631499, tf_cs = 31, tf_eflags = 518, tf_esp = -1077943044, tf_ss = 47}) at
Re: panic in RELENG_5 UMA
Hi,

> I caught a panic last night on my RELENG_5 box. The kernel was compiled on
> 2005/05/21. Please feel free to ask for further information (and include
> me explicitly in the recipient list, since I'm not subscribed to this list).
>
> kgdb stacktrace:
> %%%
> [snip]
> %%%

I was a little bit sleepy earlier this morning. I forgot to mention that my kernel is compiled with INVARIANTS and PREEMPTION.

%%%
(kgdb) up 26
#26 0xc05ee0d5 in arpresolve (ifp=0xc1a5b000, rt0=0xc1d44000, m=0xc1be7200,
    dst=0xd6d3fa94, desten=0xd6d3fa2c "/æ]ÀäµwÀ")
    at ../../../netinet/if_ether.c:442
442			m_freem(la->la_hold);
(kgdb) l
437		 * There is an arptab entry, but no ethernet address
438		 * response yet.  Replace the held mbuf with this
439		 * latest one.
440		 */
441		if (la->la_hold)
442			m_freem(la->la_hold);
443		la->la_hold = m;
444		if (rt->rt_expire) {
445			RT_LOCK(rt);
446			rt->rt_flags &= ~RTF_REJECT;
(kgdb) print *la
$1 = {la_le = {le_next = 0xc1e74400, le_prev = 0xc077aa68}, la_rt = 0xc1d44000,
  la_hold = 0x0, la_preempt = 5, la_asked = 0}
%%%

-- 
Jeremie Le Hen
jeremie at le-hen dot org ttz at chchile dot org
Re: panic in RELENG_5 UMA
On Tue, Jun 21, 2005 at 09:04:27AM +0200, Jeremie Le Hen wrote:
J> #25 0xc05a0a0b in m_freem (mb=0x0) at uma.h:304
J> No locals.
J> #26 0xc05ee0d5 in arpresolve (ifp=0xc1a5b000, rt0=0xc1d44000, m=0xc1be7200,
J>     dst=0xd6d3fa94, desten=0xd6d3fa2c "/??]??w??")
J>     at ../../../netinet/if_ether.c:442
J>         la = (struct llinfo_arp *) 0xc1a75a00
J>         sdl = (struct sockaddr_dl *) 0xc2128910
J>         error = -1038972656
J>         rt = (struct rtentry *) 0xc1d44000

IMHO, this looks like a race. The route is not locked when its llinfo is edited.

Probably the mbuf was freed when the ARP reply arrived and la_hold was sent. Look into in_arpinput() near line 736:

	(*ifp->if_output)(ifp, la->la_hold, rt_key(rt), rt);
	la->la_hold = 0;

Yeah, I have just triggered another panic running 15 instances of this script on an SMP box:

(
while (true); do
    arp -d 81.19.64.111 > /dev/null 2>&1;
    ping -c 1 -t 1 81.19.64.111 > /dev/null 2>&1;
done
)

But my duplicate free is in fxp_txeof(). This means that the output thread has won the race.

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
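
The race described in this message is a check-then-act on la->la_hold: one path tests and frees the held mbuf while another path sends it and clears the pointer, so both can end up owning the same mbuf. What follows is only a minimal userland sketch of the usual remedy (detach the pointer while holding a lock, then act on it after the lock is dropped), not the kernel code. It assumes a pthread mutex standing in for the route lock; struct pkt, struct arp_entry, entry_replace_hold() and entry_take_hold() are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userland stand-ins; not the kernel's mbuf or llinfo_arp. */
struct pkt {
	int id;
};

struct arp_entry {
	pthread_mutex_t lock;	/* plays the role of the route lock */
	struct pkt *hold;	/* plays the role of la->la_hold */
};

/*
 * Output path: park a new packet and return the one it displaced, or NULL.
 * The swap happens entirely under the lock, so a displaced packet is
 * handed to exactly one caller, who frees it after the lock is dropped.
 */
static struct pkt *
entry_replace_hold(struct arp_entry *e, struct pkt *p)
{
	struct pkt *old;

	assert(e != NULL);
	pthread_mutex_lock(&e->lock);
	old = e->hold;
	e->hold = p;
	pthread_mutex_unlock(&e->lock);
	return (old);
}

/*
 * Input path: detach the parked packet and leave NULL behind. A second
 * caller racing with this one sees NULL instead of a stale pointer.
 */
static struct pkt *
entry_take_hold(struct arp_entry *e)
{
	return (entry_replace_hold(e, NULL));
}
```

Under these assumptions each packet has a single owner at any time: whoever received it from one of the two functions. Neither path tests the pointer outside the lock, which is the step that goes wrong in the unlocked version.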
Re: panic in RELENG_5 UMA
Hi Gleb,

> IMHO, this looks like a race. The route is not locked when
> its llinfo is edited.
>
> Probably the mbuf was freed when the ARP reply arrived and la_hold was sent.
> Look into in_arpinput() near line 736:
>
> 	(*ifp->if_output)(ifp, la->la_hold, rt_key(rt), rt);
> 	la->la_hold = 0;
>
> Yeah, I have just triggered another panic running 15 instances of this
> script on an SMP box:
>
> (
> while (true); do
>     arp -d 81.19.64.111 > /dev/null 2>&1;
>     ping -c 1 -t 1 81.19.64.111 > /dev/null 2>&1;
> done
> )
>
> But my duplicate free is in fxp_txeof(). This means that the output thread has
> won the race.

This explanation sounds good, but my box is UP with PREEMPTION. Is the race supposed to be possible in this case as well?

Regards,
-- 
Jeremie Le Hen
jeremie at le-hen dot org ttz at chchile dot org
Re: panic in RELENG_5 UMA
On Tue, Jun 21, 2005 at 11:28:36AM +0200, Jeremie Le Hen wrote:
J> > IMHO, this looks like a race. The route is not locked when
J> > its llinfo is edited.
J> >
J> > Probably the mbuf was freed when the ARP reply arrived and la_hold was sent.
J> > Look into in_arpinput() near line 736:
J> >
J> > 	(*ifp->if_output)(ifp, la->la_hold, rt_key(rt), rt);
J> > 	la->la_hold = 0;
J> >
J> > Yeah, I have just triggered another panic running 15 instances of this
J> > script on an SMP box:
J> >
J> > (
J> > while (true); do
J> >     arp -d 81.19.64.111 > /dev/null 2>&1;
J> >     ping -c 1 -t 1 81.19.64.111 > /dev/null 2>&1;
J> > done
J> > )
J> >
J> > But my duplicate free is in fxp_txeof(). This means that the output thread has
J> > won the race.
J>
J> This explanation sounds good, but my box is UP with PREEMPTION.
J> Is the race supposed to be possible in this case as well?

I guess yes, because of preemption.

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
Re: panic in RELENG_5 UMA
[ cc'ing parties involved in this part of the code ]

On Tue, Jun 21, 2005 at 01:07:01PM +0400, Gleb Smirnoff wrote:
T> On Tue, Jun 21, 2005 at 09:04:27AM +0200, Jeremie Le Hen wrote:
T> J> #25 0xc05a0a0b in m_freem (mb=0x0) at uma.h:304
T> J> No locals.
T> J> #26 0xc05ee0d5 in arpresolve (ifp=0xc1a5b000, rt0=0xc1d44000, m=0xc1be7200,
T> J>     dst=0xd6d3fa94, desten=0xd6d3fa2c "/??]??w??")
T> J>     at ../../../netinet/if_ether.c:442
T> J>         la = (struct llinfo_arp *) 0xc1a75a00
T> J>         sdl = (struct sockaddr_dl *) 0xc2128910
T> J>         error = -1038972656
T> J>         rt = (struct rtentry *) 0xc1d44000
T>
T> IMHO, this looks like a race. The route is not locked when
T> its llinfo is edited.
T>
T> Probably the mbuf was freed when the ARP reply arrived and la_hold was sent.
T> Look into in_arpinput() near line 736:
T>
T> 	(*ifp->if_output)(ifp, la->la_hold, rt_key(rt), rt);
T> 	la->la_hold = 0;
T>
T> Yeah, I have just triggered another panic running 15 instances of this script on
T> an SMP box:
T>
T> (
T> while (true); do
T>     arp -d 81.19.64.111 > /dev/null 2>&1;
T>     ping -c 1 -t 1 81.19.64.111 > /dev/null 2>&1;
T> done
T> )
T>
T> But my duplicate free is in fxp_txeof(). This means that the output thread has
T> won the race.

I suppose that the attached patch closes your race. However, there is still a race between RTM_DELETE and the output path. The above script still drives the kernel to a panic, but a different one.
The output path works with an already freed llinfo:

#28 0xc0507000 in m_freem (mb=0x0) at mbuf.h:410
#29 0xc053fde3 in arpresolve (ifp=0xc2012800, rt0=0xc22fcdec, m=0xc25a8000,
    dst=0xe720bb28, desten=0xe720bacc "uЬbю+\001")
    at /usr/src/sys/netinet/if_ether.c:443
#30 0xc0538078 in ether_output (ifp=0xc2012800, m=0xc25a8000, dst=0xe720bb28,
    rt0=0xc22fcdec) at /usr/src/sys/net/if_ethersubr.c:173
#31 0xc054b5b4 in ip_output (m=0xc25a8000, opt=0xc25a80ac, ro=0xe720bb24,
    flags=0x20, imo=0x0, inp=0xc25eb5a0)
    at /usr/src/sys/netinet/ip_output.c:772
#32 0xc054d36b in rip_output (m=0xc25a8000, so=0x0, dst=0x0)
    at /usr/src/sys/netinet/raw_ip.c:320
#33 0xc054de7b in rip_send (so=0xc248c914, flags=0x0, m=0xc25a8000,
    nam=0xc218d410, control=0x0, td=0xc224d7d0)
    at /usr/src/sys/netinet/raw_ip.c:785
#34 0xc050a30f in sosend (so=0xc248c914, addr=0xc218d410, uio=0xe720bc3c,
    top=0xc25a8000, control=0x0, flags=0x0, td=0xc224d7d0)
    at /usr/src/sys/kern/uipc_socket.c:827

(kgdb) frame 29
#29 0xc053fde3 in arpresolve (ifp=0xc2012800, rt0=0xc22fcdec, m=0xc25a8000,
    dst=0xe720bb28, desten=0xe720bacc "uЬbю+\001")
    at /usr/src/sys/netinet/if_ether.c:443
443			m_freem(la->la_hold);
(kgdb) p *la
$3 = {
  la_le = {
    le_next = 0xdeadc0de,
    le_prev = 0xdeadc0de
  },
  la_rt = 0xdeadc0de,
  la_hold = 0xdeadc0de,
  la_preempt = 0xc0de,
  la_asked = 0xdead
}

Fixing this one is harder. We take la from an unlocked rtentry obtained via rt_check(), or from arplookup(). The latter drops the lock on the rtentry, too. Then we do some work and use this la. By then it may already have been freed in arp_rtrequest(), in the RTM_DELETE case.

I see two approaches here:

1) Protect llinfo with the route lock. In this case we need rt_check() to
   return a locked *rt (just a reference won't help). We also need
   arplookup() to return a locked rt, and we must not unlock it anywhere
   within arpresolve() or within a big part of in_arpinput().

2) Add a mutex to llinfo_arp. I'm afraid this will hurt performance.

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE

Index: if_ether.c
===
RCS file: /home/ncvs/src/sys/netinet/if_ether.c,v
retrieving revision 1.137
diff -u -r1.137 if_ether.c
--- if_ether.c 5 Jun 2005 03:13:12 - 1.137
+++ if_ether.c 21 Jun 2005 10:36:08 -
@@ -438,11 +438,11 @@
 	 * response yet.  Replace the held mbuf with this
 	 * latest one.
 	 */
+	RT_LOCK(rt);
 	if (la->la_hold)
 		m_freem(la->la_hold);
 	la->la_hold = m;
 	if (rt->rt_expire) {
-		RT_LOCK(rt);
 		rt->rt_flags &= ~RTF_REJECT;
 		if (la->la_asked == 0 || rt->rt_expire != time_second) {
 			rt->rt_expire = time_second;
@@ -459,8 +459,8 @@
 			}
 
 		}
-		RT_UNLOCK(rt);
 	}
+	RT_UNLOCK(rt);
 	return (EWOULDBLOCK);
 }
 
@@ -642,6 +642,8 @@
 		goto reply;
 	la = arplookup(isaddr.s_addr, itaddr.s_addr == myaddr.s_addr, 0);
 	if (la && (rt = la->la_rt) && (sdl = SDL(rt->rt_gateway))) {
+		struct mbuf *hold;
+
 		/* the following is not an error when doing bridging */
 		if (!bridged && rt->rt_ifp != ifp
 #ifdef DEV_CARP
@@ -729,11 +731,13 @@
 		if (rt->rt_expire)
 			rt->rt_expire = time_second + arpt_keep;
 		rt->rt_flags &= ~RTF_REJECT;
-		RT_UNLOCK(rt);
 		la->la_asked = 0;
 		la->la_preempt