Re: Kernel Panic

2018-03-01 Thread Joe Jones
Hi,


There is a function called pf_get_sport in /usr/src/sys/netpfil/pf/pf_lb.c
which contains a do-while loop; the guard is !PF_AEQ(&init_addr, naddr, af).
We put a counter in this loop and saw it spin 431728 times, which appears to
coincide with a lockup. We'll continue investigating tomorrow.
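
A minimal user-space sketch of the loop shape described above, in case it helps
others reproduce the reasoning. It is not the pf code: the pool, next_addr(),
walk_pool() and MAX_SPINS are hypothetical names made up for illustration. The
loop keeps asking for the next address until the candidate wraps back to the
initial one, so if the initial address never comes round again the loop cannot
end on its own. The iteration cap is only there to make that case observable.

```c
#include <stdint.h>

#define POOL_SIZE 4u
#define MAX_SPINS 1000L          /* hypothetical safety cap, not present in pf */

/* Hypothetical stand-in for a translation address pool. */
static const uint32_t pool[POOL_SIZE] = {
    0x0a000001, 0x0a000002, 0x0a000003, 0x0a000004
};

/* Advance the cursor by one slot (wrapping) and return that slot's address. */
static uint32_t next_addr(unsigned *idx)
{
    *idx = (*idx + 1) % POOL_SIZE;
    return pool[*idx];
}

/*
 * Walk the pool until the candidate address equals init_addr, mirroring a
 * guard of the form "while (candidate != initial)".
 * Returns the number of iterations, or -1 if MAX_SPINS was exceeded.
 */
static long walk_pool(uint32_t init_addr, unsigned start_idx)
{
    unsigned idx = start_idx;
    uint32_t naddr;
    long spins = 0;

    do {
        if (++spins > MAX_SPINS)
            return -1;           /* guard never became false */
        naddr = next_addr(&idx);
    } while (naddr != init_addr);

    return spins;
}
```

Calling walk_pool(pool[0], 0) makes one full pass over the four entries, while
an initial address that is not in the pool, for example walk_pool(0xdeadbeef, 0),
only stops because of the cap.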


Regards

Joe Jones


From: Kristof Provost 
Sent: 01 March 2018 09:57:18
To: Joe Jones
Cc: freebsd-pf@freebsd.org
Subject: Re: Kernel Panic

On 1 Mar 2018, at 15:37, Joe Jones wrote:
> yes we use pfsync. Yesterday we tried with pfsync switched off, the
> box still locked up but this time without a panic.
>
> We make the DIOCRADDADDRS ioctl on the master and the backup (we use
> CARPed pairs).
>
Interesting. It might be related to pfsync. Is it the master that panics
or the backup? Or both?

Regards,
Kristof
___
freebsd-pf@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"


Re: Kernel Panic

2018-03-01 Thread Ermal Luçi
On Thu, Mar 1, 2018 at 9:43 AM, Joe Jones 
wrote:

> Hi Kristof,
>
> It's just the master that crashed, the backup can take over.
>
> We think the panic we got by compiling with witness and invariant may be a
> red herring.
>
> We are now looking at rules like
>
> nat on $isp_if from  to any ->  sticky-address
>
> if we replace the external_napts table with a single address rather than a
> block of addresses the box does not crash.
>
> We are following this line of investigation at the moment.
>

This is a known issue and should be documented somewhere, possibly in the
man page. Its source is the redesign of locking for pf(4).

https://github.com/freebsd/freebsd/blob/releng/11.1/sys/netpfil/pf/pf_lb.c#L428

* XXXGL: in the round-robin case we need to store
* the round-robin machine state in the rule, thus
* forwarding thread needs to modify rule.
*
* This is done w/o locking, because performance is assumed
* more important than round-robin precision.
*
* In the simpliest case we just update the "rpool->cur"
* pointer. However, if pool contains tables or dynamic
* addresses, then "tblidx" is also used to store machine
* state. Since "tblidx" is int, concurrent access to it can't
* lead to inconsistence, only to lost of precision.
*
* Things get worse, if table contains not hosts, but
* prefixes. In this case counter also stores machine state,
* and for IPv6 address, counter can't be updated atomically.
* Probably, using round-robin on a table containing IPv6
* prefixes (or even IPv4) would cause a panic.

The fix is to add proper locking around this scenario.
At minimum a RULES_WLOCK would be needed there, or maybe resort to
atomics.
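
To illustrate the atomics option only, here is a minimal user-space sketch
using C11 <stdatomic.h>. It is not a patch for pf: struct rr_state, rr_next()
and RR_POOL_SIZE are hypothetical names, and it covers just a single-word
round-robin cursor. State that does not fit in one word, such as the IPv6
counter mentioned in the comment above, could not be handled this way and
would presumably still need a lock.

```c
#include <stdatomic.h>

#define RR_POOL_SIZE 4u

/* Hypothetical round-robin cursor shared by several forwarding threads. */
struct rr_state {
    atomic_uint idx;
};

/*
 * Atomically claim the current slot and advance the cursor, wrapping at
 * RR_POOL_SIZE. The compare-exchange loop retries if another thread moved
 * the cursor in between, so the stored value always stays in range and no
 * two callers are handed the same update.
 */
static unsigned rr_next(struct rr_state *st)
{
    unsigned cur = atomic_load(&st->idx);
    unsigned next;

    do {
        next = (cur + 1) % RR_POOL_SIZE;
        /* On failure, cur is reloaded with the value currently stored. */
    } while (!atomic_compare_exchange_weak(&st->idx, &cur, next));

    return cur;
}
```

Initialise the cursor with atomic_init(&st.idx, 0) and successive calls to
rr_next() hand out slots 0, 1, 2, 3 and then wrap back to 0.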



> Regards
> Joe Jones
>
>
> On 01/03/18 09:57, Kristof Provost wrote:
>
>> On 1 Mar 2018, at 15:37, Joe Jones wrote:
>>
>>> yes we use pfsync. Yesterday we tried with pfsync switched off, the box
>>> still locked up but this time without a panic.
>>>
>>> We make the DIOCRADDADDRS ioctl on the master and the backup (we use
>>> CARPed pairs).
>>>
>> Interesting. It might be related to pfsync. Is it the master that panics
>> or the backup? Or both?
>>
>> Regards,
>> Kristof
>>
>
--
Ermal


Re: Kernel Panic

2018-03-01 Thread Joe Jones

Hi Kristof,

It's just the master that crashed, the backup can take over.

We think the panic we got by compiling with witness and invariant may be 
a red herring.


We are now looking at rules like

nat on $isp_if from  to any ->  sticky-address

if we replace the external_napts table with a single address rather than 
a block of addresses the box does not crash.


We are following this line of investigation at the moment.

Regards
Joe Jones

On 01/03/18 09:57, Kristof Provost wrote:

> On 1 Mar 2018, at 15:37, Joe Jones wrote:
>> yes we use pfsync. Yesterday we tried with pfsync switched off, the
>> box still locked up but this time without a panic.
>>
>> We make the DIOCRADDADDRS ioctl on the master and the backup (we use
>> CARPed pairs).
>>
> Interesting. It might be related to pfsync. Is it the master that
> panics or the backup? Or both?
>
> Regards,
> Kristof




Re: Kernel Panic

2018-03-01 Thread Joe Jones

Hi Kristof,

yes we use pfsync. Yesterday we tried with pfsync switched off, the box 
still locked up but this time without a panic.


We make the DIOCRADDADDRS ioctl on the master and the backup (we use 
CARPed pairs).


Regards Joe Jones

On 01/03/18 03:00, Kristof Provost wrote:

> On 28 Feb 2018, at 9:52, Kristof Provost wrote:
>> On 27 Feb 2018, at 20:40, Joe Jones wrote:
>>> we have a kernel panic after compiling with witness and invariant
>>>
>>> Feb 27 13:49:33 sovapn1 kernel: lock order reversal:
>>> Feb 27 13:49:33 sovapn1 kernel: 1st 0xfe000fed78b8 pf_idhash (pf_idhash) @ /usr/src/sys/netpfil/pf/pf.c:1078
>>> Feb 27 13:49:33 sovapn1 kernel: 2nd 0xf8001e0474a8 pfsync (pfsync) @ /usr/src/sys/netpfil/pf/if_pfsync.c:1667
>>>
>> That’s a lock order reversal. It’s not good, but it should at worst
>> result in a deadlock. Did the system stop after this?
>> It also looks like a different problem from the panic you initially
>> reported.
>>
> Also, do you actively use pfsync in this setup? Does the panic happen
> on the box where you DIOCRADDADDRS or the other(s)?
>
> Regards,
> Kristof




Re: Kernel Panic

2018-03-01 Thread Kristof Provost

On 1 Mar 2018, at 15:37, Joe Jones wrote:
> yes we use pfsync. Yesterday we tried with pfsync switched off, the
> box still locked up but this time without a panic.
>
> We make the DIOCRADDADDRS ioctl on the master and the backup (we use
> CARPed pairs).
>
Interesting. It might be related to pfsync. Is it the master that panics
or the backup? Or both?

Regards,
Kristof