Re: pf crash on -current

2015-02-24 Thread Kristof Provost
On 2015-02-24 08:05:47 (+0100), Kristof Provost kris...@sigsegv.be wrote:
 On 2015-02-23 17:23:55 (-0800), Davide Italiano dav...@freebsd.org wrote:
  The bt you posted suggests this could be a stack overflow, probably due
  to infinite recursion.
  Also, as a wild guess, just looking at the stack trace, I think this
  might be related to the recent IPv6 fragment changes. Try backing them
  out and see if things get more stable (r278831 and r278843).
  
 That's almost certainly what it is.
 
After a bit of fiddling around I've managed to reproduce this locally.

Essentially we get caught in a loop of defragmenting and refragmenting:
Fragmented packets come in on one interface and are collected until we
can defragment them. At that point the defragmented packet is handed back
to the ip stack (at the pfil hook in ip6_input()). Normal processing
continues.
Eventually we figure out that the packet has to be forwarded and we end
up at the pfil hook in ip6_forward(). After inspecting the defragmented
packet we see that it has been defragmented and, because we're
forwarding, we have to refragment it. That's indicated by the presence
of the PF_REASSEMBLED tag.

In pf_refragment6() we remove that tag, split the packet up again and
then ip6_forward() the individual fragments.
Those fragments hit the pfil hook on the way out, so they're collected
until we can reconstruct the full packet, at which point we're right
back where we left off, and things continue until we run out of stack.

There are two reasons Allan is seeing this and no one else has so far.

The first is that he's scrubbing on both input and output. My own tests
have always been done with 'scrub in all fragment reassemble', rather
than 'scrub all fragment reassemble', so I didn't see this problem.
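For reference, the two rule forms differ only in the direction keyword;
a minimal pf.conf sketch (interfaces and other rules omitted):

```
# Reassemble fragments on input only; the forwarded packet is not
# re-inspected on the way out, so the defrag/refrag loop cannot start:
scrub in all fragment reassemble

# Scrub in both directions; refragmented packets hit the outbound hook
# and are collected for reassembly again, restarting the loop:
scrub all fragment reassemble
```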

The second is that he's got an internal interface with a higher MTU,
so the refragmentation actually works for him.
There's an open problem where ip6_forward() drops the defragmented
packet before the pfil(PFIL_OUT) hook because it's too big for the
output interface.
If the last patch of my series (https://reviews.freebsd.org/D1815) had
been merged as well, more people would have been affected.

One possible fix for Allan's problem would be to tag the fragments after
refragmentation so that pf ignores them. After all, the defragmented
packet has already been inspected, so there's no point in checking the
fragments again.

I have a feeling there's a way to fix both this problem and the issue
D1815 tries to fix in one go, though. I'll need to think about it a bit
more.

Regards,
Kristof
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: pf crash on -current

2015-02-24 Thread Allan Jude
On 2015-02-24 18:10, Kristof Provost wrote:
 [...]
 
 The first is that he's scrubbing on both input and output. My own tests
 have always been done with 'scrub in all fragment reassemble', rather
 than 'scrub all fragment reassemble', so I didn't see this problem.
 
 The second is that he's got an internal interface with a higher MTU,
 so the refragmentation actually works for him.
 
 [...]

Admittedly, when I switched from pfSense to vanilla FreeBSD for my
firewall (when I got new hardware and wanted to run bhyve as well), I
just dumped the rules from pfSense into a file and then started editing
by hand. So the scrub rule was not really a decision I made, but was
inherited from pfSense.

It is actually my external interface (point-to-point fibre transport
from my basement to the data center) that has the higher MTU (4400),
whereas my internal network is all standard 1500 MTU.


-- 
Allan Jude





Re: pf crash on -current

2015-02-23 Thread Davide Italiano
On Mon, Feb 23, 2015 at 5:17 PM, Allan Jude allanj...@freebsd.org wrote:
 Upgraded my router today, because it was approaching the 24 uptime days
 of doom

 Now, it likes to die on me, a lot



The bt you posted suggests this could be a stack overflow, probably due
to infinite recursion.
Also, as a wild guess, just looking at the stack trace, I think this
might be related to the recent IPv6 fragment changes. Try backing them
out and see if things get more stable (r278831 and r278843).

--
Davide


Re: pf crash on -current

2015-02-23 Thread Chris H
On Mon, 23 Feb 2015 20:17:06 -0500 Allan Jude allanj...@freebsd.org wrote

 Upgraded my router today, because it was approaching the 24 uptime days
 of doom
As to the uptime days of doom...
I inquired about this a week ago, and was informed the matter
had been resolved about a week earlier. I can't find the message
at the moment, or I'd share the revision I was provided. But I
thought you'd like to know.

--Chris
 
 Now, it likes to die on me, a lot
 
 
 FreeBSD Nexus.HML3.ScaleEngine.net 11.0-CURRENT FreeBSD 11.0-CURRENT #0
 r279218:
  Mon Feb 23 22:16:24 UTC 2015
 r...@nexus.hml3.scaleengine.net:/usr/obj/usr/src/sys/GENERIC  amd64
 
 panic: double fault
 
 GNU gdb 6.1.1 [FreeBSD]
..
8-
..
 Previous frame inner to this frame (corrupt stack?)
 
 
 -- 
 Allan Jude




Re: pf crash on -current

2015-02-23 Thread Kristof Provost
On 2015-02-23 17:23:55 (-0800), Davide Italiano dav...@freebsd.org wrote:
 On Mon, Feb 23, 2015 at 5:17 PM, Allan Jude allanj...@freebsd.org wrote:
  Upgraded my router today, because it was approaching the 24 uptime days
  of doom
 
  Now, it likes to die on me, a lot
 
 
 
 The bt you posted suggests this could be a stack overflow, probably due
 to infinite recursion.
 Also, as a wild guess, just looking at the stack trace, I think this
 might be related to the recent IPv6 fragment changes. Try backing them
 out and see if things get more stable (r278831 and r278843).
 
That's almost certainly what it is.

Allan, can you give me a bit more information about your setup?
Specifically the pf rules, the network interfaces, the IP(v6)
addresses and the routes?

Thanks,
Kristof


pf crash on -current

2015-02-23 Thread Allan Jude
Upgraded my router today, because it was approaching the 24 uptime days
of doom

Now, it likes to die on me, a lot


FreeBSD Nexus.HML3.ScaleEngine.net 11.0-CURRENT FreeBSD 11.0-CURRENT #0
r279218:
 Mon Feb 23 22:16:24 UTC 2015
r...@nexus.hml3.scaleengine.net:/usr/obj/usr/src/sys/GENERIC  amd64

panic: double fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as amd64-marcel-freebsd...

Unread portion of the kernel message buffer:

Fatal double fault
rip = 0x820457a4
rsp = 0xfe01ee1f6ed0
rbp = 0xfe01ee1f73b0
cpuid = 0; apic id = 00
panic: double fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
0x817d0cc0
vpanic() at vpanic+0x189/frame 0x817d0d40
panic() at panic+0x43/frame 0x817d0da0
dblfault_handler() at dblfault_handler+0xa2/frame 0x817d0dc0
Xdblfault() at Xdblfault+0xac/frame 0x817d0dc0
--- trap 0x17, rip = 0x820457a4, rsp = 0x817d0e80, rbp =
0xfe01ee1f73b0 ---
pf_test_rule() at pf_test_rule+0x14/frame 0xfe01ee1f73b0
pf_test6() at pf_test6+0x1074/frame 0xfe01ee1f7570
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f75a0
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f7630
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1f7780
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1f7840
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1f7a00
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f7a30
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f7ac0
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1f7c10
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1f7cd0
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1f7e90
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f7ec0
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f7f50
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1f80a0
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1f8160
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1f8320
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f8350
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f83e0
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1f8530
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1f85f0
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1f87b0
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f87e0
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f8870
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1f89c0
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1f8a80
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1f8c40
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f8c70
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f8d00
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1f8e50
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1f8f10
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1f90d0
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f9100
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f9190
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1f92e0
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1f93a0
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1f9560
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f9590
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f9620
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1f9770
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1f9830
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1f99f0
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f9a20
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f9ab0
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1f9c00
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1f9cc0
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1f9e80
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1f9eb0
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1f9f40
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1fa090
pf_refragment6() at pf_refragment6+0x17a/frame 0xfe01ee1fa150
pf_test6() at pf_test6+0x98a/frame 0xfe01ee1fa310
pf_check6_out() at pf_check6_out+0x4d/frame 0xfe01ee1fa340
pfil_run_hooks() at pfil_run_hooks+0xa3/frame 0xfe01ee1fa3d0
ip6_forward() at ip6_forward+0x44e/frame 0xfe01ee1fa520
ip6_input() at ip6_input+0x2ed/frame 0xfe01ee1fa600
netisr_dispatch_src() at netisr_dispatch_src+0x86/frame 0xfe01ee1fa670
ether_demux() at ether_demux+0x17b/frame 0xfe01ee1fa6a0
ether_nh_input() at ether_nh_input+0x336/frame 0xfe01ee1fa6e0
netisr_dispatch_src() at 

Re: pf crash on -current

2015-02-23 Thread Allan Jude
On 2015-02-23 20:44, Chris H wrote:
 On Mon, 23 Feb 2015 20:17:06 -0500 Allan Jude allanj...@freebsd.org wrote
 
 Upgraded my router today, because it was approaching the 24 uptime days
 of doom
 As to the uptime days of doom...
 I inquired about this a week ago, and was informed the matter
 had been resolved about a week earlier. I can't find the message
 at the moment, or I'd share the revision I was provided. But
 thought you'd like to know.
 
 --Chris

Yes, I was installing the latest to get that fix, I know it is fixed.

-- 
Allan Jude





Re: pf crash on -current

2015-02-23 Thread Chris H
On Mon, 23 Feb 2015 20:49:02 -0500 Allan Jude allanj...@freebsd.org wrote

 On 2015-02-23 20:44, Chris H wrote:
  [...]
 
 Yes, I was installing the latest to get that fix, I know it is fixed.
Sure, of course. I just thought the time frame might help with walking
back revisions until whatever caused the panic stopped, while still
staying ahead of the uptime days of doom.
FWIW, the revision that fixed it was r278229.
 
--Chris

