Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-07-25 Thread Mathieu BLANC
On Tue, May 02, 2017 at 05:03:20PM +, Stuart Henderson wrote:
> Probably the best thing to do at this point is to write a mail to bugs@:
> 
> 1. describe what the machine is doing in detail. carp? ipsec? pfsync?
> what sort of relays? include config (sanitized if necessary, but do that
> consistently).
> 
> 2. copy in the panic message and stack trace as text (re-type it,
> don't attach a picture or send a link to a picture).
> 
> 3. make it a self-contained report with description etc all in the one
> message, don't rely on people having message history.
> 
> 4. include dmesg.

Hi Stuart, 

Thanks for your answer!
I haven't had time to work on this since early May.
But I check the commits to pf.c from time to time, and I saw this one, which
seemed to match my bug perfectly:
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf.c?rev=1.1035=text/x-cvsweb-markup

I tried the diff and it seems to be OK! I can't trigger the bug anymore (before,
it triggered 100% of the time).

So, thank you again, and special thanks to bluhm@, who wrote the patch!

-- 
Mathieu



Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-05-02 Thread Stuart Henderson
On 2017-05-02, Mathieu BLANC  wrote:
> On Wed, Mar 29, 2017 at 02:06:23PM +0200, Mathieu BLANC wrote:
> >> The kernel also panics with just these pf rules:
>> # cat pf_minimal.conf 
>> set limit { states 10 }  
>> set skip on lo   
>> anchor "relayd/*"
>> pass 
>> 
>
> I upgraded the system to the 6.1 release last week; the kernel panic is still
> there (with the same logs).

Probably the best thing to do at this point is to write a mail to bugs@:

1. describe what the machine is doing in detail. carp? ipsec? pfsync?
what sort of relays? include config (sanitized if necessary, but do that
consistently).

2. copy in the panic message and stack trace as text (re-type it,
don't attach a picture or send a link to a picture).

3. make it a self-contained report with description etc all in the one
message, don't rely on people having message history.

4. include dmesg.




Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-05-02 Thread Mathieu BLANC
On Tue, May 02, 2017 at 03:44:43PM +0200, Andre Ruppert wrote:
> Hi,
> 
> I'm running 6.0 amd64 on a pair of R210s with relayd, but these are R210 II machines.
> 
> No kernel panics at all, and these systems are working in a live
> environment...
> 
> Regards
> Andre

Hi,

Yes, I also have several OpenBSD machines on R210 + 6.0 (or 6.1) + relayd and
they work like a charm.

The only problem appeared when an admin added an iptables REJECT rule on one of
the hosts checked by relayd with "check tcp" (I tried to put all the details I
could in the previous mails).

The next step is to try -current (until now I had waited for the 6.1 release,
which was about to come out).

-- 
Mathieu



Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-05-02 Thread Andre Ruppert

Hi,

I'm running 6.0 amd64 on a pair of R210s with relayd, but these are R210 II machines.

No kernel panics at all, and these systems are working in a live 
environment...


Regards
Andre



On 02.05.17 at 15:03, Mathieu BLANC wrote:
> On Wed, Mar 29, 2017 at 02:06:23PM +0200, Mathieu BLANC wrote:
>> The kernel also panics with just these pf rules:
>> # cat pf_minimal.conf
>> set limit { states 10 }
>> set skip on lo
>> anchor "relayd/*"
>> pass
>
> I upgraded the system to the 6.1 release last week; the kernel panic is still
> there (with the same logs).







Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-05-02 Thread Mathieu BLANC
On Wed, Mar 29, 2017 at 02:06:23PM +0200, Mathieu BLANC wrote:
> The kernel also panics with just these pf rules:
> # cat pf_minimal.conf 
> set limit { states 10 }  
> set skip on lo   
> anchor "relayd/*"
> pass 
> 

I upgraded the system to the 6.1 release last week; the kernel panic is still
there (with the same logs).

-- 
Mathieu



Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-03-29 Thread Mathieu BLANC
On Wed, Mar 29, 2017 at 10:40:08AM +0200, Mathieu BLANC wrote:
> On Tue, Mar 28, 2017 at 05:58:02PM +0200, Hiltjo Posthuma wrote:
> > On Tue, Mar 28, 2017 at 02:39:44PM +0200, Mathieu BLANC wrote:
> > > On Tue, Mar 28, 2017 at 02:22:28PM +0200, Mathieu BLANC wrote:
> > > > I can reproduce the bug (on the slave firewall) as many times as I want.
> > > > 
> > > 
> > > I've just read https://www.openbsd.org/ddb.html and saw that you need a
> > > trace for all CPUs.
> > > 
> > > http://www.hostingpics.net/viewer.php?id=238876panic9.jpg
> > > http://www.hostingpics.net/viewer.php?id=275943panic10.jpg
> > > http://www.hostingpics.net/viewer.php?id=375143panic11.jpg
> > > http://www.hostingpics.net/viewer.php?id=220012panic12.jpg
> > > 
> > > (It's a different crash from the previous screenshots I made; if that's a
> > > problem, I can provide a full new set of pictures.)
> > > 
> > > -- 
> > > Mathieu
> > > 
> > 
> > Hey,
> > 
> > Can you also provide your pf.conf ?
> > 
> > Can you test if it also happens on -current?
> > 
> > -- 
> > Kind regards,
> > Hiltjo
> 
> Hello,
> 
> Unfortunately, I can't provide pf.conf as is (too many references to customers,
> IPs, etc.). But I think I can work on a minimal file that triggers the bug.
> I'll look into it.
> 

The kernel also panics with just these pf rules:
# cat pf_minimal.conf 
set limit { states 10 }  
set skip on lo   
anchor "relayd/*"
pass 

-- 
Mathieu



Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-03-29 Thread Mathieu BLANC
On Tue, Mar 28, 2017 at 05:58:02PM +0200, Hiltjo Posthuma wrote:
> On Tue, Mar 28, 2017 at 02:39:44PM +0200, Mathieu BLANC wrote:
> > On Tue, Mar 28, 2017 at 02:22:28PM +0200, Mathieu BLANC wrote:
> > > I can reproduce the bug (on the slave firewall) as many times as I want.
> > > 
> > 
> > I've just read https://www.openbsd.org/ddb.html and saw that you need a
> > trace for all CPUs.
> > 
> > http://www.hostingpics.net/viewer.php?id=238876panic9.jpg
> > http://www.hostingpics.net/viewer.php?id=275943panic10.jpg
> > http://www.hostingpics.net/viewer.php?id=375143panic11.jpg
> > http://www.hostingpics.net/viewer.php?id=220012panic12.jpg
> > 
> > (It's a different crash from the previous screenshots I made; if that's a
> > problem, I can provide a full new set of pictures.)
> > 
> > -- 
> > Mathieu
> > 
> 
> Hey,
> 
> Can you also provide your pf.conf ?
> 
> Can you test if it also happens on -current?
> 
> -- 
> Kind regards,
> Hiltjo

Hello,

Unfortunately, I can't provide pf.conf as is (too many references to customers,
IPs, etc.). But I think I can work on a minimal file that triggers the bug.
I'll look into it.

For -current, my plan was to try it if I didn't get any response on the list
about -stable.

But for now, we don't run -current anywhere in production, so I'm not sure :)

I know plenty of people run -current and I'm fairly confident in it, but it's
more a question of procedure, for example how to follow -current efficiently
over time. With -release and -stable it's pretty simple: upgrade every 6 months,
apply a few patches, and it's OK :)

-- 
Mathieu



Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-03-28 Thread Hiltjo Posthuma
On Tue, Mar 28, 2017 at 02:39:44PM +0200, Mathieu BLANC wrote:
> On Tue, Mar 28, 2017 at 02:22:28PM +0200, Mathieu BLANC wrote:
> > I can reproduce the bug (on the slave firewall) as many times as I want.
> > 
> 
> I've just read https://www.openbsd.org/ddb.html and saw that you need a trace
> for all CPUs.
> 
> http://www.hostingpics.net/viewer.php?id=238876panic9.jpg
> http://www.hostingpics.net/viewer.php?id=275943panic10.jpg
> http://www.hostingpics.net/viewer.php?id=375143panic11.jpg
> http://www.hostingpics.net/viewer.php?id=220012panic12.jpg
> 
> (It's a different crash from the previous screenshots I made; if that's a
> problem, I can provide a full new set of pictures.)
> 
> -- 
> Mathieu
> 

Hey,

Can you also provide your pf.conf ?

Can you test if it also happens on -current?

-- 
Kind regards,
Hiltjo



Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-03-28 Thread Mathieu BLANC
On Tue, Mar 28, 2017 at 02:22:28PM +0200, Mathieu BLANC wrote:
> I can reproduce the bug (on the slave firewall) as many times as I want.
> 

I've just read https://www.openbsd.org/ddb.html and saw that you need a trace
for all CPUs.

http://www.hostingpics.net/viewer.php?id=238876panic9.jpg
http://www.hostingpics.net/viewer.php?id=275943panic10.jpg
http://www.hostingpics.net/viewer.php?id=375143panic11.jpg
http://www.hostingpics.net/viewer.php?id=220012panic12.jpg

(It's a different crash from the previous screenshots I made; if that's a
problem, I can provide a full new set of pictures.)

-- 
Mathieu



Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-03-28 Thread Mathieu BLANC
On Tue, Mar 28, 2017 at 12:05:56PM +0300, Mihai Popescu wrote:
> Isn't there a message written in capital letters on the screen at panic time?
> If not, look here:
> http://www.openbsd.org/report.html
> 

I can reproduce the bug (on the slave firewall) as many times as I want.

I took some screenshots. Sorry, I didn't manage to capture text logs (I'm on
the DRAC console).

In http://man.openbsd.org/OpenBSD-6.0/crash I saw that I might be able to get
the ddb output in dmesg after a warm reboot, but it didn't work for me.

I don't know whether you prefer HTTP links or attached files. I have uploaded
the JPEGs here:
http://www.hostingpics.net/viewer.php?id=835545panic1.jpg
http://www.hostingpics.net/viewer.php?id=149061panic2.jpg
http://www.hostingpics.net/viewer.php?id=328015panic3.jpg
http://www.hostingpics.net/viewer.php?id=730910panic4.jpg
http://www.hostingpics.net/viewer.php?id=607164panic5.jpg
http://www.hostingpics.net/viewer.php?id=272177panic6.jpg
http://www.hostingpics.net/viewer.php?id=689399panic7.jpg
http://www.hostingpics.net/viewer.php?id=499214panic8.jpg

I can attach the files if you want.

Here is my relayd configuration:

_front_vip="A.B.C.D"

_front1="E.F.G.H"
_front2="I.J.K.L"

table  { $_front1 $_front2 }

redirect _http_vip {
listen on $_front_vip port http
forward to  mode source-hash check tcp
pftag RELAYD_VIP_NAT
}

On front1, if I run this command, my OpenBSD system crashes. With DROP instead
of REJECT it's OK (tested 5-6 times):
iptables -I INPUT -j REJECT -p tcp --dport 80 -s 
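For context on why REJECT and DROP could behave differently here: relayd's "check tcp" decides whether a host is up by opening a TCP connection to it. The following is a minimal C sketch of such a probe, written for illustration only; it is not relayd's implementation, and tcp_check() and the check_result values are hypothetical names. It separates three outcomes: the connection succeeds, the peer actively refuses it, or nothing answers before the timeout (which is what a DROP rule produces). Whether a given REJECT shows up as a refusal or as a timeout depends on what the rule sends back and on how the probing TCP stack handles it, so this sketch does not by itself explain the panic.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical result codes for a single probe. */
enum check_result { CHECK_UP, CHECK_REFUSED, CHECK_TIMEOUT, CHECK_ERROR };

/*
 * Hypothetical helper: one TCP connect probe to ip:port, waiting at most
 * timeout_ms milliseconds. It only opens and closes a connection; no data
 * is exchanged. Errors other than a refusal (e.g. unreachable) are
 * reported as CHECK_ERROR.
 */
static enum check_result tcp_check(const char *ip, uint16_t port, int timeout_ms)
{
	struct sockaddr_in sin = { 0 };
	enum check_result res = CHECK_ERROR;
	int s, flags;

	assert(ip != NULL && timeout_ms >= 0);
	sin.sin_family = AF_INET;
	sin.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1)
		return CHECK_ERROR;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return CHECK_ERROR;

	/* Non-blocking connect so that poll() can bound the wait. */
	if ((flags = fcntl(s, F_GETFL)) == -1 ||
	    fcntl(s, F_SETFL, flags | O_NONBLOCK) == -1) {
		close(s);
		return CHECK_ERROR;
	}

	if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) == 0) {
		res = CHECK_UP;                 /* connected immediately */
	} else if (errno == ECONNREFUSED) {
		res = CHECK_REFUSED;            /* peer actively refused */
	} else if (errno == EINPROGRESS) {
		struct pollfd pfd = { .fd = s, .events = POLLOUT };
		int n = poll(&pfd, 1, timeout_ms);

		if (n == 0) {
			res = CHECK_TIMEOUT;    /* no answer at all, e.g. dropped SYN */
		} else if (n == 1) {
			int err = 0;
			socklen_t len = sizeof(err);

			/* The handshake outcome is reported through SO_ERROR. */
			if (getsockopt(s, SOL_SOCKET, SO_ERROR, &err, &len) == 0) {
				if (err == 0)
					res = CHECK_UP;
				else if (err == ECONNREFUSED)
					res = CHECK_REFUSED;
			}
		}
	}
	close(s);
	return res;
}
```

Usage: with a local listener on some port p, tcp_check("127.0.0.1", p, 1000) should return CHECK_UP; once that listener is closed, the same call should no longer report the host as up.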

-- 
Mathieu



Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-03-28 Thread Mihai Popescu
Isn't there a message written in capital letters on the screen at panic time?
If not, look here:
http://www.openbsd.org/report.html



Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)

2017-03-28 Thread Mathieu BLANC
On Mon, Mar 27, 2017 at 02:42:23PM +0200, Mathieu BLANC wrote:
> Hello all,
> 
> I have a pair of firewalls running 6.0 (patched with openup in October, no
> patch applied since then).
> 
> Since the upgrade, this pair has had problems with kernel panics (4 times
> since the upgrade in October).
> 
> The last one was this morning. Both firewalls crashed at the same time with
> these logs:
> 
> /bsd: panic: kernel diagnostic assertion "(sk->inp == NULL) || (sk->inp->inp_pf_sk == NULL)" failed: file "../../../../net/pf.c", line 6891
> /bsd: Starting stack trace...
> /bsd: panic() at panic+0x10b
> /bsd: __assert() at __assert+0x25
> /bsd: pf_state_key_unref() at pf_state_key_unref+0xc6
> /bsd: pf_pkt_unlink_state_key() at pf_pkt_unlink_state_key+0x15
> /bsd: m_free() at m_free+0xa0
> /bsd: sbdroprecord() at sbdroprecord+0x61
> /bsd: soreceive() at soreceive+0xb4f
> /bsd: recvit() at recvit+0x139
> /bsd: sys_recvfrom() at sys_recvfrom+0x9d
> /bsd: syscall() at syscall+0x27b
> /bsd: --- syscall (number 29) ---
> /bsd: end of kernel
> /bsd: end trace frame: 0x7f7dc870, count: 247
> /bsd: 0x18ccb3b21ada:
> /bsd: End of stack trace. 
> 

Hello,

This morning, another crash.

I found something very interesting in daemon.log. In the same second the
firewall crashed, relayd marked the same checked host as down:

Yesterday:
Mar 27 11:51:48 fw5 relayd[94179]: host W.X.Y.Z, check tcp (16010ms,tcp connect timeout), state up -> down, availability 99.94%
Mar 27 11:51:48 fw5 relayd[89662]: table _http_vip: 0 added, 1 deleted, 0 changed, 0 killed

This morning:
Mar 28 09:08:54 fw5 relayd[46733]: host W.X.Y.Z, check tcp (16010ms,tcp connect timeout), state up -> down, availability 99.95%
Mar 28 09:08:54 fw5 relayd[29633]: table _http_vip: 0 added, 1 deleted, 0 changed, 0 killed

I called the admin in charge of host W.X.Y.Z. What he had done on W.X.Y.Z was
add an iptables REJECT rule on the host (to take it out of relayd's rotation).
We have tested with DROP and it does not seem to trigger the bug (I'll run more
tests).



Kernel panic on Dell R210 with OpenBSD 6.0

2017-03-27 Thread Mathieu BLANC
Hello all,

I have a pair of firewalls running 6.0 (patched with openup in October, no patch
applied since then).

Since the upgrade, this pair has had problems with kernel panics (4 times since
the upgrade in October).

The last one was this morning. Both firewalls crashed at the same time with
these logs:

/bsd: panic: kernel diagnostic assertion "(sk->inp == NULL) || (sk->inp->inp_pf_sk == NULL)" failed: file "../../../../net/pf.c", line 6891
/bsd: Starting stack trace...
/bsd: panic() at panic+0x10b
/bsd: __assert() at __assert+0x25
/bsd: pf_state_key_unref() at pf_state_key_unref+0xc6
/bsd: pf_pkt_unlink_state_key() at pf_pkt_unlink_state_key+0x15
/bsd: m_free() at m_free+0xa0
/bsd: sbdroprecord() at sbdroprecord+0x61
/bsd: soreceive() at soreceive+0xb4f
/bsd: recvit() at recvit+0x139
/bsd: sys_recvfrom() at sys_recvfrom+0x9d
/bsd: syscall() at syscall+0x27b
/bsd: --- syscall (number 29) ---
/bsd: end of kernel
/bsd: end trace frame: 0x7f7dc870, count: 247
/bsd: 0x18ccb3b21ada:
/bsd: End of stack trace. 
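To make the failed assertion easier to read, here is a simplified C model of the condition it checks. It is a sketch under stated assumptions, not the OpenBSD source: only the field names inp and inp_pf_sk come from the panic message; the struct layouts, the refcnt field and the helpers sk_link(), sk_unlink(), sk_unref() and sk_unlinked_from_socket() are hypothetical. Reading the trace bottom-up, a pf state key was being released from m_free() (via pf_pkt_unlink_state_key() and pf_state_key_unref()) while the socket it pointed to still had a non-NULL inp_pf_sk, which is exactly the case the assertion rejects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pf_state_key;

/* Hypothetical stand-in for a socket's protocol control block (PCB). */
struct inpcb {
	struct pf_state_key *inp_pf_sk;   /* PCB -> pf state key */
};

/* Hypothetical stand-in for a pf state key: only the fields we need. */
struct pf_state_key {
	struct inpcb *inp;                /* pf state key -> PCB */
	int refcnt;
};

/* The condition from the panic message, as a predicate. */
static bool sk_unlinked_from_socket(const struct pf_state_key *sk)
{
	return sk->inp == NULL || sk->inp->inp_pf_sk == NULL;
}

/* Hypothetical: link a state key and a PCB in both directions. */
static void sk_link(struct pf_state_key *sk, struct inpcb *inp)
{
	sk->inp = inp;
	inp->inp_pf_sk = sk;
}

/* Hypothetical: clear the PCB -> key link, as the invariant expects. */
static void sk_unlink(struct pf_state_key *sk)
{
	if (sk->inp != NULL)
		sk->inp->inp_pf_sk = NULL;
}

/*
 * Hypothetical release path: drop one reference. When the last one goes
 * away the key must no longer be reachable from a socket; a dangling
 * PCB -> key link makes assert() abort here.
 */
static bool sk_unref(struct pf_state_key *sk)
{
	if (--sk->refcnt > 0)
		return false;             /* still referenced elsewhere */
	assert(sk_unlinked_from_socket(sk));
	sk->inp = NULL;
	return true;                      /* caller may now free sk */
}
```

In this model the bug corresponds to calling sk_unref() for the last reference without calling sk_unlink() first; assert() then aborts, much like the kernel's diagnostic assertion did.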

I have another pair of firewalls with the same hardware (Dell R210) which is
running without problems.

After the crash this morning, I applied the latest patches with openup. But after
reading the errata page, I'm not sure they will help... Or could this one be
related?
https://ftp.openbsd.org/pub/OpenBSD/patches/6.0/common/019_pf.patch.sig

Thank you very much !

--
Mathieu
OpenBSD 6.0 (GENERIC.MP) #2: Mon Oct 17 10:22:47 CEST 2016
    r...@stable-60-amd64.mtier.org:/binpatchng/work-binpatch60-amd64/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 1047105536 (998MB)
avail mem = 1010954240 (964MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0x3f79c000 (63 entries)
bios0: vendor Dell Inc. version "1.10.0" date 09/10/2013
bios0: Dell Inc. PowerEdge R210
acpi0 at bios0: rev 2
acpi0: sleep states S0 S4 S5
acpi0: tables DSDT FACP APIC SPCR HPET DM__ MCFG WD__ SLIC ERST HEST BERT EINJ TCPA SSDT
acpi0: wakeup devices PCI0(S5) USBA(S0) USBB(S0)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU L3406 @ 2.27GHz, 2261.27 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 132MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
cpu1 at mainbus0: apid 4 (application processor)
cpu1: Intel(R) Xeon(R) CPU L3406 @ 2.27GHz, 2260.99 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 2, package 0
cpu2 at mainbus0: apid 1 (application processor)
cpu2: Intel(R) Xeon(R) CPU L3406 @ 2.27GHz, 2260.99 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 1, core 0, package 0
cpu3 at mainbus0: apid 5 (application processor)
cpu3: Intel(R) Xeon(R) CPU L3406 @ 2.27GHz, 2260.99 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC,SENSOR,ARAT
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 1, core 2, package 0
ioapic0 at mainbus0: apid 0 pa 0xfec0, version 20, 24 pins
acpihpet0 at acpi0: 14318179 Hz
acpimcfg0 at acpi0 addr 0xe000, bus 0-255
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (LYD0)
acpiprt2 at acpi0: bus -1 (LYD2)
acpiprt3 at acpi0: bus 1 (HVD0)
acpiprt4 at acpi0: bus -1 (HVD2)
acpiprt5 at acpi0: bus 5 (PEX0)
acpiprt6 at acpi0: bus -1 (PEX4)
acpiprt7 at acpi0: bus -1 (PEX5)
acpiprt8 at acpi0: bus 6 (COMP)
acpicpu0 at acpi0: C3(350@96 mwait.1@0x20), C1(1000@1 mwait.1)
acpicpu1 at acpi0: C3(350@96 mwait.1@0x20), C1(1000@1 mwait.1)
acpicpu2 at acpi0: C3(350@96 mwait.1@0x20), C1(1000@1 mwait.1)
acpicpu3 at acpi0: C3(350@96 mwait.1@0x20), C1(1000@1 mwait.1)
"PNP0C33" at acpi0 not configured
"ACPI000D" at acpi0 not configured
"PNP0501" at acpi0 not configured
"PNP0501" at acpi0 not configured
"IPI0001" at acpi0 not configured
"PNP0C14" at acpi0 not configured
ipmi at mainbus0 not configured
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core Host" rev 0x18
ppb0 at pci0 dev 1