tl;dr: How do I figure out what state ip6_forward (or other associated
functions) was in after a crash? It is really a pain trying to work
this out from the traffic while also trying to replicate the correct
set of pfSense configurations.


Just an update on my adventures. It's quite complicated so I shall put
it as a list of points...

Part 1
1. Spent a weekend copying the pfSense VM to a physical machine and
running it in place of the VM.

2. Eventually the crashes do occur, so I'm entirely convinced this is
not a VM issue

3. Over the last week, I've been disabling various interfaces and
services to narrow down the cause. It has finally been narrowed down
to our VPN bridge interface <LAN, OVPN1, OVPN2>

4. If I bring any of the VPN interfaces down, take any of them out of
the bridge, or bring the bridge interface down, the crashes stop


Part 2
1. I managed to dig up an old (closed) bug with the exact backtrace I
was getting https://redmine.pfsense.org/issues/5428

2. There are two more bugs on the FreeBSD bugtracker with similar
crashes. The latest was in July this year

3. However, they've all been fixed with the same patch. I checked
RELENG_2_4/sys/netpfil/pf/pf.c, and the patch should be in the 2.4.X
release

    a. The network is running with jumbo frames (9k). Is it possible
the patch does not cover such a case?

4. This similarity led me to believe that I could be facing a similar
issue, apparently with IPv6 multicast traffic

5. Set up port remote tcpdump so I could capture traffic right before the crash

6. Isolated the traffic cause! The crash happens when two conditions
hold together:

    a. There is at least one VPN client connected
    b. There is a MacBook running Sierra/High Sierra on the main network

7. Each time the MacBook joins the network/sleeps/wakes, the IPv6
traffic, specifically a certain mDNS query, causes the crash

8. Now the somewhat random but consistent timing makes sense! We have
someone using a MacBook come in at around 8pm every day

9. Isolated 2 packet specimens that cause the crash, and 2 of the
same type that do not

    a. They contain the names of our users' computers (which on a Mac
include real names), so I'm not inclined to share them on the list.
Furthermore, I don't have steps to reproduce the crash with the
packets on a vanilla install, so they're of limited use
    b. If anyone is interested in taking a look at the differences
between these two sets of packets, I can email them to you
directly
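
For anyone who does compare the two sets of packets, here is a minimal
sketch of a byte-level diff, assuming each packet has been exported as
raw bytes. The function name and the sample bytes are made up for
illustration; they are not the real captures.

```python
def diff_offsets(a: bytes, b: bytes):
    """Return (offset, byte_in_a, byte_in_b) for every position where a and b differ.

    None stands in for a missing byte when the two packets have different lengths.
    """
    diffs = []
    for i in range(max(len(a), len(b))):
        x = a[i] if i < len(a) else None
        y = b[i] if i < len(b) else None
        if x != y:
            diffs.append((i, x, y))
    return diffs


# Hypothetical stand-in payloads, NOT the packets from the crash.
crashing = bytes.fromhex("00008400000000010000")
harmless = bytes.fromhex("00000000000100000000")

for offset, x, y in diff_offsets(crashing, harmless):
    print(offset, x, y)
```

Offsets that differ in both crashing specimens but in neither harmless
one would be the first fields worth looking up in the packet layout.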

Part 3
1. Since I could cause the crash at will, I tried creating
reproducible steps so I can properly report this as a bug

2. Set up a new pfSense install, replicated the interfaces, set up
oVPN, made a single client connection

3. Unable to reproduce crash with clean install

4. Tried reproducing crash on the actual pfSense install... crashes
now not happening??!!?

5. It was very late into the night, and no devices except my dummy oVPN
connection and test machine were online. Maybe the captured mDNS
packets were not the direct cause, but the response from one of the
devices is?
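
One way to check the "response, not query" theory from point 5 is to
list which mDNS packets show up in the capture shortly before each
crash and whether they are queries or responses. Below is a rough
sketch using only the Python standard library. It assumes a classic
(non-pcapng) capture file with microsecond timestamps, plain Ethernet
framing without VLAN tags, and no IPv6 extension headers. All function
names, the 5 second window, and the crash timestamp are my own
placeholders.

```python
import struct

ETHERTYPE_IPV6 = 0x86DD
IPPROTO_UDP = 17
MDNS_PORT = 5353


def iter_pcap_records(data: bytes):
    """Yield (timestamp_seconds, frame_bytes) from the bytes of a classic pcap file."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian machine
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset
        )
        offset += 16
        yield ts_sec + ts_usec / 1e6, data[offset:offset + incl_len]
        offset += incl_len


def classify_mdns(frame: bytes):
    """Return 'query' or 'response' for an IPv6 mDNS frame, else None."""
    if len(frame) < 14 + 40 + 8 + 12:  # Ethernet + IPv6 + UDP + DNS header
        return None
    if struct.unpack_from("!H", frame, 12)[0] != ETHERTYPE_IPV6:
        return None
    ip = 14
    if frame[ip + 6] != IPPROTO_UDP:  # Next Header field; extension headers not handled
        return None
    udp = ip + 40
    src_port, dst_port = struct.unpack_from("!HH", frame, udp)
    if MDNS_PORT not in (src_port, dst_port):
        return None
    # The DNS flags word follows the 2-byte ID; its top bit (QR) marks a response.
    flags = struct.unpack_from("!H", frame, udp + 8 + 2)[0]
    return "response" if flags & 0x8000 else "query"


def mdns_before_crash(data: bytes, crash_time: float, window: float = 5.0):
    """Return [(seconds_before_crash, kind)] for mDNS packets seen in the window."""
    hits = []
    for ts, frame in iter_pcap_records(data):
        kind = classify_mdns(frame)
        if kind and crash_time - window <= ts <= crash_time:
            hits.append((round(crash_time - ts, 6), kind))
    return hits


# Usage (file name and crash timestamp are placeholders):
#   with open("capture.pcap", "rb") as f:
#       print(mdns_before_crash(f.read(), crash_time=1511370000.0))
```

If responses consistently appear in the window and the lone query does
not crash the box at night, that would point at the responder rather
than the MacBook's query.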


So I'm at a loss right now. I have things narrowed down really tight
on the traffic end, but still have no way to reproduce it from a
vanilla install, nor do I know where to even begin looking for the
cause in the kernel code.

I'll try again tomorrow to see if there is a response from some device
that is the actual cause of the crash. But some suggestions are
welcome!



Liwei



On 23 November 2017 at 01:17, WebDawg <webd...@gmail.com> wrote:
> The bridging may need to be tested and filed as a bug.
>
> On Wed, Nov 22, 2017 at 11:15 AM, Liwei <xieli...@gmail.com> wrote:
>> On Thu, 23 Nov 2017 at 00:38 WebDawg <webd...@gmail.com> wrote:
>>
>>> I am glad that you seemed to have resolved it, does the serial port
>>> get the standard kernel messages...
>>>
>>
>> It isn't really solved though as I have to take our bridged VPNs offline.
>>
>> Yes it does, but nothing relevant gets spewed out of the serial port before
>> the panic comes up. The first sign I can see on the serial port of things
>> going wrong is the kernel panic itself.
>>
>>
>>>
>>> usually you log in and tail some log files
>>>
>>
>> Got it
>>
>>
>>>
>>> (bridging our oVPN tap interfaces to the main and private LANs)
>>>
>>> This was bridging done in pfSense right?
>>>
>>
>> That's right.
>>
>>
>>>
>>> On Wed, Nov 22, 2017 at 8:07 AM, Liwei <xieli...@gmail.com> wrote:
>>> > On Tue, 21 Nov 2017 at 01:08 WebDawg <webd...@gmail.com> wrote:
>>> >
>>> >> It should work though.  A great many people virtualize pfSense:
>>> >>
>>> >> https://doc.pfsense.org/index.php/PfSense_on_VMware_vSphere_/_ESXi
>>> >>
>>> >> Here is some more information:
>>> >>
>>> >> https://doc.pfsense.org/index.php/VirtIO_Driver_Support
>>> >> https://doc.pfsense.org/index.php/Lost_Traffic_/_Packets_Disappear
>>> >> https://doc.pfsense.org/index.php/Virtualizing_pfSense_on_Proxmox
>>> >>
>>> >> I know what it is like to ask for support and see people stop helping
>>> >> because something is virtualized.  I have seen bad code fail in
>>> >> virtualization situations only to hear 'do not virtualize'.
>>> >>
>>> >> From what I know, BSD has trouble with NIC interfaces and such.  Do
>>> >> you have any limiters or QOS installed?  I would take a look at the
>>> >> nic interfaces first.  Can you actively monitor the log to look for
>>> >> errors once the VM is booted?
>>> >>
>>> >> I virtualized pfSense on proxmox about a year ago and BSD hated the
>>> >> cpu timers and such.  I would get so many issues from it until I
>>> >> figured it out but everything was plain as day in the kernel messages
>>> >> that were outputted.
>>> >>
>>> >> There is an ova file available via the gold subscription:
>>> >>
>>> >> https://doc.pfsense.org/index.php/VMware_Appliance
>>> >>
>>> >> You need to get more information for me to help further.  It would be
>>> >> great to get a copy of some logs.
>>> >>
>>> >> Here is a XenServer thread:
>>> >> https://forum.pfsense.org/index.php?topic=88467
>>> >>
>>> >> Last time I virtualized, the big deal was hvm NIC vs pvhvm NIC.  You
>>> >> could do limiters on one (I think hvm) but the NICs become CPU bound
>>> >> because of how HVM works.  I could only push like 10-30 mbits out of
>>> >> an i3 processor.
>>> >>
>>> >> I do not know if this has been solved, or if it is solvable.  pfSense
>>> >> follows FreeBSD so most of the fixes for this come from FreeBSD,
>>> >> though pfSense had/has some of its own kernel hacks.
>>> >>
>>> >>
>>> >>
>>> > Hi Vick, thanks for the assistance, nonetheless!
>>> >
>>> > Hi WebDawg,
>>> >     Yeah, I guessed as much that the problem should be on my side,
>>> > because something this fatal should already be widely reported.
>>> >
>>> >     I don't have any limiters or QoS set. I've set up logging of the
>>> > serial port so at least I know what are the events leading up to the
>>> > crash. Nothing interesting though, it just... happens. How do I set up
>>> > log monitoring? My guess is I'll probably have to turn on remote syslog
>>> > and log over. Will set up when I get the chance.
>>> >
>>> >     The odd thing is this is a 7+ years old setup (but we did do a fresh
>>> > install of 2.3 when we upgraded hardware 1+ years ago), and we never had
>>> > any serious issues. In fact it was purring along nicely on 2.3 since it
>>> > was first installed, until we upgraded to 2.4.
>>> >
>>> >     I'm pretty confident of the hardware since it is only a year old, the
>>> > other VMs are not having any issues, and reverting to 2.3 works fine.
>>> > Thus based on a hunch I decided to remove a couple of bridge interfaces
>>> > (bridging our oVPN tap interfaces to the main and private LANs) when I
>>> > sent my first email to the list.
>>> >
>>> >     The crashes haven't occurred since then for 2 days. I'm not sure if
>>> > it is a coincidence or not, but it does seem like my configuration may
>>> > be triggering some bug. Or I may have mis-configured something.
>>> >
>>> >     I'll continue to iterate things around to narrow down the problem,
>>> > but given that I have to wait a few days after each change to be sure on
>>> > whether it crashes or not, any suggestion is very welcome!
>>> >
>>> > Warm regards,
>>> > Liwei
_______________________________________________
pfSense mailing list
https://lists.pfsense.org/mailman/listinfo/list
Support the project with Gold! https://pfsense.org/gold
