Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)
On Tue, May 02, 2017 at 05:03:20PM +, Stuart Henderson wrote:
> Probably the best thing to do at this point is to write a mail to bugs@:
>
> 1. describe what the machine is doing in detail. carp? ipsec? pfsync?
> what sort of relays? include config (sanitized if necessary, but do that
> consistently).
>
> 2. copy in the panic message and stack trace as text (re-type it,
> don't attach a picture or send a link to a picture).
>
> 3. make it a self-contained report with description etc all in the one
> message, don't rely on people having message history.
>
> 4. include dmesg.

Hi Stuart,

Thanks for your answer!

I haven't had time to work on this since early May. But from time to time I
check the commits to pf.c, and I saw this one, which seemed to match my bug
perfectly:

http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf.c?rev=1.1035&content-type=text/x-cvsweb-markup

I tried the diff, and it seems to be OK! I can't trigger the bug any more
(before, it triggered 100% of the time).

So thank you again, and special thanks to bluhm@ who made the patch!

--
Mathieu
Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)
On 2017-05-02, Mathieu BLANC wrote:
> On Wed, Mar 29, 2017 at 02:06:23PM +0200, Mathieu BLANC wrote:
>> It also kernel panics with just this pf rules :
>> # cat pf_minimal.conf
>> set limit { states 10 }
>> set skip on lo
>> anchor "relayd/*"
>> pass
>>
>
> I upgraded the system to 6.1 release last week, the kernel panic is still here
> (with the same logs).

Probably the best thing to do at this point is to write a mail to bugs@:

1. describe what the machine is doing in detail. carp? ipsec? pfsync?
what sort of relays? include config (sanitized if necessary, but do that
consistently).

2. copy in the panic message and stack trace as text (re-type it,
don't attach a picture or send a link to a picture).

3. make it a self-contained report with description etc all in the one
message, don't rely on people having message history.

4. include dmesg.
Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)
On Tue, May 02, 2017 at 03:44:43PM +0200, Andre Ruppert wrote:
> Hi,
>
> Im running 6.0 amd64 on a pair of R210 with relayd, but these are R210 (II).
>
> No kernel panics at all, and these systems are working in a live
> environment...
>
> Regards
> Andre

Hi,

Yes, I also have several OpenBSD machines on R210 + 6.0 (or 6.1) + relayd, and
they work like a charm.

The only problem appeared when an admin did a REJECT (iptables) on one of the
hosts checked by relayd with check tcp (I tried to put all the details I could
in the previous mails).

The next step is to try with -current (until now I was waiting for the 6.1
release, which was very close).

--
Mathieu
Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)
Hi,

I'm running 6.0 amd64 on a pair of R210 with relayd, but these are R210 (II).

No kernel panics at all, and these systems are working in a live
environment...

Regards
Andre

On 02.05.17 at 15:03, Mathieu BLANC wrote:
> On Wed, Mar 29, 2017 at 02:06:23PM +0200, Mathieu BLANC wrote:
>> It also kernel panics with just this pf rules :
>> # cat pf_minimal.conf
>> set limit { states 10 }
>> set skip on lo
>> anchor "relayd/*"
>> pass
>
> I upgraded the system to 6.1 release last week, the kernel panic is still here
> (with the same logs).
Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)
On Wed, Mar 29, 2017 at 02:06:23PM +0200, Mathieu BLANC wrote:
> It also kernel panics with just this pf rules :
> # cat pf_minimal.conf
> set limit { states 10 }
> set skip on lo
> anchor "relayd/*"
> pass
>

I upgraded the system to 6.1 release last week, the kernel panic is still here
(with the same logs).

--
Mathieu
Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)
On Wed, Mar 29, 2017 at 10:40:08AM +0200, Mathieu BLANC wrote:
> On Tue, Mar 28, 2017 at 05:58:02PM +0200, Hiltjo Posthuma wrote:
> > On Tue, Mar 28, 2017 at 02:39:44PM +0200, Mathieu BLANC wrote:
> > > On Tue, Mar 28, 2017 at 02:22:28PM +0200, Mathieu BLANC wrote:
> > > > I can reproduce the bug (on the slave firewall) as many times as I want.
> > > >
> > >
> > > I've just read https://www.openbsd.org/ddb.html and saw that you need a
> > > trace for all cpu.
> > >
> > > http://www.hostingpics.net/viewer.php?id=238876panic9.jpg
> > > http://www.hostingpics.net/viewer.php?id=275943panic10.jpg
> > > http://www.hostingpics.net/viewer.php?id=375143panic11.jpg
> > > http://www.hostingpics.net/viewer.php?id=220012panic12.jpg
> > >
> > > (it's a different crash from the last screenshots i've made, if it's not
> > > good i can provide a full new set of pics)
> > >
> > > --
> > > Mathieu
> > >
> >
> > Hey,
> >
> > Can you also provide your pf.conf ?
> >
> > Can you test if it also happens on -current?
> >
> > --
> > Kind regards,
> > Hiltjo
>
> Hello,
>
> Unfortunately, i can't provide pf.conf as is (too many references to
> customers, ips, etc...). But i think i can work on a minimal file which
> triggers the bug. I'll see that.
>

It also kernel panics with just this pf rules :

# cat pf_minimal.conf
set limit { states 10 }
set skip on lo
anchor "relayd/*"
pass

--
Mathieu
Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)
On Tue, Mar 28, 2017 at 05:58:02PM +0200, Hiltjo Posthuma wrote:
> On Tue, Mar 28, 2017 at 02:39:44PM +0200, Mathieu BLANC wrote:
> > On Tue, Mar 28, 2017 at 02:22:28PM +0200, Mathieu BLANC wrote:
> > > I can reproduce the bug (on the slave firewall) as many times as I want.
> > >
> >
> > I've just read https://www.openbsd.org/ddb.html and saw that you need a
> > trace for all cpu.
> >
> > http://www.hostingpics.net/viewer.php?id=238876panic9.jpg
> > http://www.hostingpics.net/viewer.php?id=275943panic10.jpg
> > http://www.hostingpics.net/viewer.php?id=375143panic11.jpg
> > http://www.hostingpics.net/viewer.php?id=220012panic12.jpg
> >
> > (it's a different crash from the last screenshots i've made, if it's not
> > good i can provide a full new set of pics)
> >
> > --
> > Mathieu
> >
>
> Hey,
>
> Can you also provide your pf.conf ?
>
> Can you test if it also happens on -current?
>
> --
> Kind regards,
> Hiltjo

Hello,

Unfortunately, I can't provide pf.conf as is (too many references to
customers, IPs, etc.). But I think I can work out a minimal file which
triggers the bug. I'll look into that.

For -current, my plan was to try it if I didn't get any response on the list
for -stable. But for now we don't have any -current in production, so I'm not
sure :)

I know plenty of people run -current, and I'm pretty confident in it, but it's
more a question of procedure, for example how to follow -current efficiently
over time. With -release and -stable it's pretty simple: upgrade every 6
months plus a few patches and it's OK :)

--
Mathieu
Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)
On Tue, Mar 28, 2017 at 02:39:44PM +0200, Mathieu BLANC wrote:
> On Tue, Mar 28, 2017 at 02:22:28PM +0200, Mathieu BLANC wrote:
> > I can reproduce the bug (on the slave firewall) as many times as I want.
> >
>
> I've just read https://www.openbsd.org/ddb.html and saw that you need a trace
> for all cpu.
>
> http://www.hostingpics.net/viewer.php?id=238876panic9.jpg
> http://www.hostingpics.net/viewer.php?id=275943panic10.jpg
> http://www.hostingpics.net/viewer.php?id=375143panic11.jpg
> http://www.hostingpics.net/viewer.php?id=220012panic12.jpg
>
> (it's a different crash from the last screenshots i've made, if it's not good
> i can provide a full new set of pics)
>
> --
> Mathieu
>

Hey,

Can you also provide your pf.conf ?

Can you test if it also happens on -current?

--
Kind regards,
Hiltjo
Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)
On Tue, Mar 28, 2017 at 02:22:28PM +0200, Mathieu BLANC wrote:
> I can reproduce the bug (on the slave firewall) as many times as I want.
>

I've just read https://www.openbsd.org/ddb.html and saw that you need a trace
for all cpu.

http://www.hostingpics.net/viewer.php?id=238876panic9.jpg
http://www.hostingpics.net/viewer.php?id=275943panic10.jpg
http://www.hostingpics.net/viewer.php?id=375143panic11.jpg
http://www.hostingpics.net/viewer.php?id=220012panic12.jpg

(it's a different crash from the last screenshots i've made, if it's not good
i can provide a full new set of pics)

--
Mathieu
Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)
On Tue, Mar 28, 2017 at 12:05:56PM +0300, Mihai Popescu wrote:
> Isn't there a CAPSLOOK written message at panic time on the screen?
> If not, look here:
> http://www.openbsd.org/report.html
>

I can reproduce the bug (on the slave firewall) as many times as I want.

I made some screenshots. Sorry, I didn't manage to get text logs (I'm working
through the DRAC). In http://man.openbsd.org/OpenBSD-6.0/crash I saw that I
might be able to get the ddb logs in dmesg after a warm reboot, but it didn't
work for me.

I don't know if you prefer http links or attached files. I have uploaded the
jpg files here:

http://www.hostingpics.net/viewer.php?id=835545panic1.jpg
http://www.hostingpics.net/viewer.php?id=149061panic2.jpg
http://www.hostingpics.net/viewer.php?id=328015panic3.jpg
http://www.hostingpics.net/viewer.php?id=730910panic4.jpg
http://www.hostingpics.net/viewer.php?id=607164panic5.jpg
http://www.hostingpics.net/viewer.php?id=272177panic6.jpg
http://www.hostingpics.net/viewer.php?id=689399panic7.jpg
http://www.hostingpics.net/viewer.php?id=499214panic8.jpg

I can attach the files if you want.

Here is my relayd conf:

_front_vip="A.B.C.D"
_front1="E.F.G.H"
_front2="I.J.K.L"

table { $_front1 $_front2 }

redirect _http_vip {
        listen on $_front_vip port http
        forward to mode source-hash check tcp
        pftag RELAYD_VIP_NAT
}

If I run this command on front1, my OpenBSD system crashes. With DROP instead
of REJECT it's OK (tested 5-6 times):

iptables -I INPUT -j REJECT -p tcp --dport 80 -s

--
Mathieu
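For readers wondering why REJECT and DROP can look different to whatever is probing the backend: REJECT answers the probe with an error packet (iptables sends ICMP port unreachable by default, or a TCP RST with --reject-with tcp-reset), while DROP discards it silently. Below is a minimal Python sketch of a generic TCP connect probe that separates the two outcomes. It is not relayd's check code, and the function name tcp_check and its return strings are made up for illustration.

```python
import errno
import socket


def tcp_check(host: str, port: int, timeout: float = 1.0) -> str:
    """Classify a single TCP connect attempt (hypothetical helper).

    Returns "up" if the handshake completes, "refused" if the peer or a
    firewall actively rejects the attempt, "timeout" if nothing comes
    back within `timeout` seconds, or "error:<name>" for anything else.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "up"
    except ConnectionRefusedError:
        # An active rejection typically surfaces here.
        return "refused"
    except socket.timeout:
        # Silently discarded packets typically end up here.
        return "timeout"
    except OSError as exc:
        return "error:" + str(errno.errorcode.get(exc.errno, exc.errno))
    finally:
        sock.close()
```

A quick way to try it is tcp_check("127.0.0.1", 80) on a machine with and without something listening on port 80. How a given checker reports an active rejection depends on its implementation and on what the firewall in between does with the error packet.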
Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)
Isn't there a CAPSLOOK written message at panic time on the screen?
If not, look here:
http://www.openbsd.org/report.html
Re: Kernel panic on Dell R210 with OpenBSD 6.0 (relayd related ?)
On Mon, Mar 27, 2017 at 02:42:23PM +0200, Mathieu BLANC wrote:
> Hello all,
>
> I have a pair of firewalls running 6.0 (patched with openup in october, no
> patch applied since then).
>
> Since the upgrade, this pair has some problem with kernel
> panics (4 times since the upgrade in october).
>
> The last one was this morning. The two firewall crashed at the same time with
> these logs :
>
> /bsd: panic: kernel diagnostic assertion "(sk->inp == NULL) || (sk->inp->inp_pf_sk == NULL)" failed: file "../../../../net/pf.c", line 6891
> /bsd: Starting stack trace...
> /bsd: panic() at panic+0x10b
> /bsd: __assert() at __assert+0x25
> /bsd: pf_state_key_unref() at pf_state_key_unref+0xc6
> /bsd: pf_pkt_unlink_state_key() at pf_pkt_unlink_state_key+0x15
> /bsd: m_free() at m_free+0xa0
> /bsd: sbdroprecord() at sbdroprecord+0x61
> /bsd: soreceive() at soreceive+0xb4f
> /bsd: recvit() at recvit+0x139
> /bsd: sys_recvfrom() at sys_recvfrom+0x9d
> /bsd: syscall() at syscall+0x27b
> /bsd: --- syscall (number 29) ---
> /bsd: end of kernel
> /bsd: end trace frame: 0x7f7dc870, count: 247
> /bsd: 0x18ccb3b21ada:
> /bsd: End of stack trace.
>

Hello,

This morning, another crash.

I found something very interesting in daemon.log. Both times, in the very
second the firewall crashed, relayd marked the same checked host as down:

Yesterday:
Mar 27 11:51:48 fw5 relayd[94179]: host W.X.Y.Z, check tcp (16010ms,tcp connect timeout), state up -> down, availability 99.94%
Mar 27 11:51:48 fw5 relayd[89662]: table _http_vip: 0 added, 1 deleted, 0 changed, 0 killed

This morning:
Mar 28 09:08:54 fw5 relayd[46733]: host W.X.Y.Z, check tcp (16010ms,tcp connect timeout), state up -> down, availability 99.95%
Mar 28 09:08:54 fw5 relayd[29633]: table _http_vip: 0 added, 1 deleted, 0 changed, 0 killed

I called the admin in charge of host W.X.Y.Z. What he had done was run an
iptables REJECT command on the host (to take it out of relayd).

We have tested with DROP and it does not seem to trigger the bug (I'll try to
run more tests).
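To line up crashes with relayd events, the host state transitions can be pulled out of daemon.log mechanically. Here is a small Python sketch that parses lines in the format quoted above. The regular expression is derived only from those two sample lines, so other relayd versions or check types may need adjustments, and the name parse_transition is made up for illustration.

```python
import re

# Pattern derived from the two relayd lines quoted in this thread;
# it is an assumption that other transitions follow the same layout.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<node>\S+) relayd\[\d+\]: "
    r"host (?P<host>\S+), check (?P<check>\w+) "
    r"\((?P<ms>\d+)ms,(?P<reason>[^)]*)\), "
    r"state (?P<old>\w+) -> (?P<new>\w+), availability (?P<avail>[\d.]+)%$"
)


def parse_transition(line):
    """Return a dict describing a host state change, or None if the
    line is not a host state transition (e.g. a table update line)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["ms"] = int(fields["ms"])          # check duration in milliseconds
    fields["avail"] = float(fields["avail"])  # availability in percent
    return fields
```

Feeding each line of daemon.log through parse_transition and keeping the entries where new is "down" gives a list of timestamps to compare against the reboot times.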