Today we completed two weeks with no kernel panics since I applied that patch.
I think it is safe to consider this issue fixed.
Many thanks to all.

On Fri, Nov 25, 2022 at 9:41 AM Josmar Pierri <jcpie...@gmail.com> wrote:
>
> I think that some feedback from me on this issue is appropriate now.
>
> After I applied the patch suggested by Hrvoje, our firewall has
> endured full traffic for 2 days without crashing.
>
> FWIW:
> - With 7.2 snapshot #849 it was crashing twice a day.
> - With 7.2 release it was crashing randomly within a week.
> - With 7.1 release it used to crash randomly twice a month.
>
> So far so good!
>
> On Tue, Nov 22, 2022 at 2:53 PM Hrvoje Popovski <hrv...@srce.hr> wrote:
> >
> > On 22.11.2022. 18:48, Josmar Pierri wrote:
> > > I upgraded to 7.2 snapshot #849 early this morning, but it crashed
> > > twice in a few hours.
> > > This time, however, the panic message is different:
> > >
> >
> > Could you compile a kernel with this diff
> > https://www.mail-archive.com/tech@openbsd.org/msg72582.html
> >
> > At least for me, that diff makes my firewall stable.
> >
> >
> >
> >
> > > uvm_fault(0xffffffff8236dcb8, 0x17, 0, 2) -> e
> > > kernel: page fault trap, code=0
> > > Stopped at         pfsync_q_del+0x96:    movq      %rdx,0x8(%rax)
> > >     TID       PID      UID      PRFLAGS      PFLAGS   CPU   COMMAND
> > >  436110  83038      0       0x14000          0x200          3     softnet
> > >  395295  39926      0       0x14000          0x200          0     softnet
> > >  189958   2208      0       0x14000          0x200          2     softnet
> > > * 65839    5423      0       0x14000          0x200          1     systqmp
> > > pfsync_q_del(fffffd8401d63890) at pfsync_q_del+0x96
> > > pfsync_delete_state(fffffd8401d63890) at pfsync_delete_state+0x118
> > > pf_remove_state(fffffd8401d63890) at pfsync_remove_state+0x14b
> > > pf_purge_expired_states(4031,40) at pf_purge_expired_states+0x242
> > > pf_purge_states(0) at pf_purge_states+0x1c
> > > taskq_thread(ffffffff822a1a10) at taskq_thread+0x100
> > > end trace frame: 0x0, count: 9
> > >
> > > This is all I could manage to get, since the crash happened while I was
> > > away (and the Dell console times out when idle, which removes the USB
> > > keyboard).
> > >
> > > I observed a thing that may or may not be related to this issue: The
> > > "output fail" counter keeps steadily increasing both on aggregate and
> > > the two member interfaces:
> > >
> > > :~# netstat -i -I aggr0
> > > Name    Mtu   Network     Address              Ipkts Ifail    Opkts Ofail Colls
> > > aggr0   9200  <Link>      fe:e1:ba:d0:91:13 224426940     0 200785282   357     0
> > >
> > > At first I thought it could be something related to the switches but I
> > > still haven't found anything wrong with them.
> > >
> > >
> > >
> > > On Mon, Nov 21, 2022 at 1:22 PM Hrvoje Popovski <hrv...@srce.hr> wrote:
> > >>
> > >> On 21.11.2022. 16:04, Josmar Pierri wrote:
> > >>> Hi,
> > >>>
> > >>> I managed to get screenshots of a random kernel panic that we are
> > >>> having on a server here.
> > >>> They were taken using a console management tool embedded into the
> > >>> server (Dell IDRAC) and are PNG images of the panic itself, trace of
> > >>> all cpus and ps.
> > >>> I'm not attaching them here right now because I don't know how the
> > >>> list would react to them.
> > >>>
> > >>> I attached the output of:
> > >>> 1 - sendbug -P
> > >>> 2 - dmesg right after reboot
> > >>> 3 - dmesg-boot
> > >>>
> > >>> This server has an aggr0 grouping bnxt0 and bnxt1, both at 10 Gbps.
> > >>> Its task is to load-balance RDP traffic (TCP 3389) among 2 large pools
> > >>> (more than 50 servers on each one) and 3 small ones using pf (tables)
> > >>> for that.
> > >>>
> > >>> These panics happen at random times without an apparent cause.
> > >>>
> > >>> The panic message reads:
> > >>>
> > >>> ddb{3}> show panic
> > >>> *cpu3: kernel diagnostic assertion "st->snapped == 0" failed: file
> > >>> "/usr/src/sys/net/if_pfsync.c", line 1591
> > >>>  cpu2: kernel diagnostic assertion "st->snapped == 0" failed: file
> > >>> "/usr/src/sys/net/if_pfsync.c", line 1591
> > >>>  cpu1: kernel diagnostic assertion "st->snapped == 0" failed: file
> > >>> "/usr/src/sys/net/if_pfsync.c", line 1591
> > >>> ddb{3}>
> > >>>
> > >>> Please advise how I should proceed to submit the screenshots.
> > >>
> > >> Hi,
> > >>
> > > >> I have a similar setup with aggr grouping ix0 and ix1 and pfsync. If you
> > > >> have two firewalls, can you sysupgrade this one to the latest snapshot?
> > >>
> > > >> I'm running a snapshot from after the last hackathon with this diff
> > > >> https://www.mail-archive.com/tech@openbsd.org/msg72582.html
> > > >>
> > > >> and for now the firewall seems to work just fine.
> > >>
> > >>
> > >>
> > >
> >
