Re: pf crash on -current
On 2015-02-24 08:05:47 (+0100), Kristof Provost kris...@sigsegv.be wrote:
> On 2015-02-23 17:23:55 (-0800), Davide Italiano dav...@freebsd.org wrote:
> > The bt you posted suggests this could be a stack overflow, probably due
> > to infinite recursion. Also, as a wild guess, just looking at the
> > stacktrace, I think this might be related to the recent IPv6 fragment
> > changes. Try backing them out, and see if things get more stable
> > (r278831 and r278843).
>
> That's almost certainly what it is.

After a bit of fiddling around I've managed to reproduce this locally. Essentially we get caught in a loop of defragmenting and refragmenting:

Fragmented packets come in on one interface and get collected until we can defragment them. At that point the defragmented packet is handed back to the IP stack (at the pfil point in ip6_input()) and normal processing continues. Eventually we figure out that the packet has to be forwarded and we end up at the pfil hook in ip6_forward(). After doing the inspection on the defragmented packet we see that it has been defragmented, and because we're forwarding we have to refragment it. That's indicated by the presence of the PF_REASSEMBLED tag. In pf_refragment6() we remove that tag, split the packet up again and ip6_forward() the individual fragments. Those fragments hit the pfil hook on the way out, so they're collected until we can reconstruct the full packet, at which point we're right back where we left off and things continue until we run out of stack.

There are two reasons Allan is seeing this and no one else has so far. The first is that he's scrubbing both on input and output. My own tests have always been done with 'scrub in all fragment reassemble' rather than 'scrub all fragment reassemble', so I didn't see this problem. The second is that he's got an internal interface with a higher MTU, so the refragmentation actually works for him. There's an open problem where ip6_forward() drops the defragmented packet before the pfil(PFIL_OUT) hook because it's too big for the output interface. If the last patch of my series (https://reviews.freebsd.org/D1815) had been merged as well, more people would have been affected.

One possible fix for Allan's problem would be to tag the fragments after refragmentation so that pf ignores them. After all, the defragmented packet has already been inspected, so there's no point in checking the fragments again. I have the feeling there's a way to fix both this problem and the issue D1815 tries to fix in one go, though. I'll need to think about it a bit more.

Regards,
Kristof
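To make that last suggestion concrete, here is a minimal sketch of what "tag the fragments after refragmentation so that pf ignores them" could look like, built on FreeBSD's stock mbuf_tags(9) interfaces (m_tag_get(9), m_tag_prepend(9), m_tag_find(9)). The PF_REFRAGMENTED tag type, its value, and both helper functions are invented for illustration; this is one possible shape of the workaround, not necessarily the fix that was eventually committed.

/*
 * Sketch only: mark fragments that pf_refragment6() produces so
 * that pf recognizes them on the next pass and lets them through
 * without reassembling them again.  PF_REFRAGMENTED is an assumed
 * tag type, not an existing constant.
 */
#include <sys/param.h>
#include <sys/errno.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>

#define	PF_REFRAGMENTED	0x4242		/* hypothetical tag type */

/*
 * pf_refragment6() would call this on each fragment it is about to
 * hand to ip6_forward(), right after splitting the packet up.
 */
static int
pf_mark_refragmented(struct mbuf *m)
{
	struct m_tag *mtag;

	mtag = m_tag_get(PF_REFRAGMENTED, 0, M_NOWAIT);
	if (mtag == NULL)
		return (ENOMEM);
	m_tag_prepend(m, mtag);
	return (0);
}

/*
 * pf_test6() would check this early on: a fragment that pf itself
 * produced was already inspected as part of the full packet, so it
 * can be passed untouched.
 */
static int
pf_is_refragmented(struct mbuf *m)
{
	return (m_tag_find(m, PF_REFRAGMENTED, NULL) != NULL);
}

Letting such fragments pass untouched breaks the ip6_forward() -> pfil_run_hooks() -> pf_test6() -> pf_refragment6() cycle visible in the backtrace, since the fragments are never collected for reassembly a second time.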
Re: pf crash on -current
On 2015-02-24 18:10, Kristof Provost wrote:
> There are two reasons Allan is seeing this and no one else has so far.
> The first is that he's scrubbing both on input and output. [...]
> The second is that he's got an internal interface with a higher MTU, so
> the refragmentation actually works for him.
> [...]
> One possible fix for Allan's problem would be to tag the fragments after
> refragmentation so that pf ignores them.

Admittedly, when I got new hardware and wanted to run bhyve as well, I switched from pfSense to vanilla FreeBSD for my firewall and just dumped the rules from my pfSense box into a file, then started editing by hand. So the scrub rule was not really a decision I made; it was inherited from the pfSense setup.

It is actually my external interface (a point-to-point fibre transport from my basement to the data center) that has the higher MTU (4400), whereas my internal network is all standard 1500 MTU.

--
Allan Jude
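Allan's correction doesn't change Kristof's diagnosis: what matters is that the MTU of the output link is large enough for the reassembled packet to survive ip6_forward()'s size check and reach the PFIL_OUT hook at all. The toy program below (not kernel code; the function, the recursion guard and the 2000-byte packet size are invented for the model, while the MTUs match Allan's description) shows why a 4400-byte output MTU sustains the loop where a standard 1500-byte one would have stopped it.

#include <stdio.h>

/*
 * Toy model of the order of operations in ip6_forward(): a packet
 * larger than the output link MTU is dropped (with an ICMPv6
 * "packet too big" sent back) before the PFIL_OUT hook runs;
 * otherwise pf's output hook refragments the reassembled packet
 * and the fragments re-enter forwarding, where pf reassembles
 * them again.
 */
static void
forward(int pkt_len, int out_mtu, int depth)
{
	if (depth > 3) {
		/* The real kernel has no such guard; it recurses
		 * until the stack overflows in a double fault. */
		printf("  ... and so on, until the stack overflows\n");
		return;
	}
	if (pkt_len > out_mtu) {
		/* ip6_forward()'s too-big check fires first. */
		printf("  depth %d: %d > MTU %d, dropped before PFIL_OUT\n",
		    depth, pkt_len, out_mtu);
		return;
	}
	/* PFIL_OUT: refragment, forward, reassemble, repeat. */
	printf("  depth %d: %d <= MTU %d, refragment and forward again\n",
	    depth, pkt_len, out_mtu);
	forward(pkt_len, out_mtu, depth + 1);
}

int
main(void)
{
	printf("2000-byte reassembled packet, 1500-MTU output link:\n");
	forward(2000, 1500, 0);
	printf("2000-byte reassembled packet, 4400-MTU output link:\n");
	forward(2000, 4400, 0);
	return (0);
}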
Re: pf crash on -current
On Mon, Feb 23, 2015 at 5:17 PM, Allan Jude allanj...@freebsd.org wrote:
> Upgraded my router today, because it was approaching the 24 uptime days
> of doom. Now, it likes to die on me, a lot.

The bt you posted suggests this could be a stack overflow, probably due to infinite recursion. Also, as a wild guess, just looking at the stacktrace, I think this might be related to the recent IPv6 fragment changes. Try backing them out, and see if things get more stable (r278831 and r278843).

--
Davide
Re: pf crash on -current
On Mon, 23 Feb 2015 20:17:06 -0500, Allan Jude allanj...@freebsd.org wrote:
> Upgraded my router today, because it was approaching the 24 uptime days
> of doom.

As to the "uptime days of doom": I inquired about this a week ago, and was informed the matter had been resolved about a week earlier. I can't find the message at the moment, or I'd share the revision I was provided. But I thought you'd like to know.

--Chris

> Now, it likes to die on me, a lot.
>
> FreeBSD Nexus.HML3.ScaleEngine.net 11.0-CURRENT FreeBSD 11.0-CURRENT #0
> r279218: Mon Feb 23 22:16:24 UTC 2015
> r...@nexus.hml3.scaleengine.net:/usr/obj/usr/src/sys/GENERIC amd64
> panic: double fault
> GNU gdb 6.1.1 [FreeBSD]
> [...]
> Previous frame inner to this frame (corrupt stack?)
>
> --
> Allan Jude
Re: pf crash on -current
On 2015-02-23 17:23:55 (-0800), Davide Italiano dav...@freebsd.org wrote:
> On Mon, Feb 23, 2015 at 5:17 PM, Allan Jude allanj...@freebsd.org wrote:
> > Upgraded my router today, because it was approaching the 24 uptime days
> > of doom. Now, it likes to die on me, a lot.
>
> The bt you posted suggests this could be a stack overflow, probably due
> to infinite recursion. Also, as a wild guess, just looking at the
> stacktrace, I think this might be related to the recent IPv6 fragment
> changes. Try backing them out, and see if things get more stable
> (r278831 and r278843).

That's almost certainly what it is.

Allan, can you give me a bit more information about your setup? Specifically the pf rules, the network interfaces, the IP(v6) addresses and the routes?

Thanks,
Kristof
pf crash on -current
Upgraded my router today, because it was approaching the 24 uptime days of doom. Now, it likes to die on me, a lot.

FreeBSD Nexus.HML3.ScaleEngine.net 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r279218: Mon Feb 23 22:16:24 UTC 2015 r...@nexus.hml3.scaleengine.net:/usr/obj/usr/src/sys/GENERIC amd64

panic: double fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:

Fatal double fault
rip = 0x820457a4
rsp = 0xfe01ee1f6ed0
rbp = 0xfe01ee1f73b0
cpuid = 0; apic id = 00
panic: double fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0x817d0cc0
vpanic() at vpanic+0x189/frame 0x817d0d40
panic() at panic+0x43/frame 0x817d0da0
dblfault_handler() at dblfault_handler+0xa2/frame 0x817d0dc0
Xdblfault() at Xdblfault+0xac/frame 0x817d0dc0
--- trap 0x17, rip = 0x820457a4, rsp = 0x817d0e80, rbp = 0xfe01ee1f73b0 ---
pf_test_rule() at pf_test_rule+0x14/frame 0xfe01ee1f73b0
pf_test6() at pf_test6+0x1074/frame 0xfe01ee1f7570
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f75a0
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f7630
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1f7780
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1f7840
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1f7a00
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f7a30
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f7ac0
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1f7c10
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1f7cd0
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1f7e90
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f7ec0
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f7f50
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1f80a0
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1f8160
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1f8320
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f8350
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f83e0
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1f8530
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1f85f0
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1f87b0
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f87e0
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f8870
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1f89c0
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1f8a80
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1f8c40
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f8c70
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f8d00
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1f8e50
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1f8f10
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1f90d0
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f9100
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f9190
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1f92e0
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1f93a0
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1f9560
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f9590
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f9620
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1f9770
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1f9830
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1f99f0
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f9a20
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f9ab0
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1f9c00
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1f9cc0
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1f9e80
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f9eb0
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f9f40
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1fa090
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1fa150
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1fa310
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1fa340
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1fa3d0
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1fa520
ip6_input() at ip6_input+0x2ed/frame 0xfe01ee1fa600
netisr_dispatch_src() at netisr_dispatch_src+0x86/frame 0xfe01ee1fa670
ether_demux() at ether_demux+0x17b/frame 0xfe01ee1fa6a0
ether_nh_input() at ether_nh_input+0x336/frame 0xfe01ee1fa6e0
netisr_dispatch_src() at
Re: pf crash on -current
On 2015-02-23 20:44, Chris H wrote:
> As to the "uptime days of doom": I inquired about this a week ago, and
> was informed the matter had been resolved about a week earlier. I can't
> find the message at the moment, or I'd share the revision I was
> provided. But I thought you'd like to know.

Yes, I was installing the latest to get that fix; I know it is fixed.

--
Allan Jude
Re: pf crash on -current
On Mon, 23 Feb 2015 20:49:02 -0500, Allan Jude allanj...@freebsd.org wrote:
> On 2015-02-23 20:44, Chris H wrote:
> > As to the "uptime days of doom": I inquired about this a week ago, and
> > was informed the matter had been resolved about a week earlier. I
> > can't find the message at the moment, or I'd share the revision I was
> > provided. But I thought you'd like to know.
>
> Yes, I was installing the latest to get that fix; I know it is fixed.

Sure, of course. I just thought the time frame might help with walking back revisions until whatever caused the panic stopped, while still staying ahead of the "uptime days of doom".

--Chris

FWIW, the revision that fixed it was r278229.

--
Allan Jude