Re: Crash in softnet on SGI

Jesse Darrone Sun, 17 Jul 2016 19:24:06 -0700

It crashed again, unfortunately. :(

ddb> trace
pool_put+0xa8 (fe8328730110b586,c000000002f1b000,c0000000030fa060,ffffffff888d1
fc8)  ra 0xffffffff888d1560 sp 0xffffffff91f43b48, sz 160
m_extfree+0x110 (fe8328730110b586,c000000002f1b000,c0000000030fa060,ffffffff888
d1fc8)  ra 0xffffffff888d1c10 sp 0xffffffff91f43be8, sz 32
m_free+0x138 (fe8328730110b586,c000000002f1b000,c0000000030fa060,ffffffff888d1f
c8)  ra 0xffffffff888d1d20 sp 0xffffffff91f43c08, sz 48
m_freem+0x28 (fe8328730110b586,c000000002f1b000,c0000000030fa060,ffffffff888d1f
c8)  ra 0xffffffff889619f0 sp 0xffffffff91f43c38, sz 32
in_arpinput+0x88 (fe8328730110b586,c000000002f1b000,c0000000030fa060,ffffffff88
8d1fc8)  ra 0xffffffff88961d8c sp 0xffffffff91f43c58, sz 144
arpintr+0x64 (fe8328730110b586,c000000002f1b000,c0000000030fa060,ffffffff888d1f
c8)  ra 0xffffffff8891d9e8 sp 0xffffffff91f43ce8, sz 64
if_netisr+0x140 (fe8328730110b586,c000000002f1b000,c0000000030fa060,ffffffff888
d1fc8)  ra 0xffffffff888a35d8 sp 0xffffffff91f43d28, sz 64
taskq_thread+0xd0 (fe8328730110b586,c000000002f1b000,c0000000030fa060,ffffffff8
88d1fc8)  ra 0xffffffff88a795ac sp 0xffffffff91f43d68, sz 80
proc_trampoline+0x1c (fe8328730110b586,c000000002f1b000,c0000000030fa060,ffffff
ff888d1fc8)  ra 0x0 sp 0xffffffff91f43db8, sz 0
User-level: pid 30639
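
A note for anyone reading the trace above: the frames can be checked for consistency by adding each frame's "sz" to its "sp" and comparing the result with the caller's "sp". This assumes "sz" is the frame size in bytes and that the stack grows downward, which is only a reading of the ddb output, not something stated in this thread. A minimal sketch, with the values copied from the trace and a made-up helper name:

```python
# (function, sp, sz) copied from the first "trace" output above.
FRAMES = [
    ("pool_put",        0xffffffff91f43b48, 160),
    ("m_extfree",       0xffffffff91f43be8, 32),
    ("m_free",          0xffffffff91f43c08, 48),
    ("m_freem",         0xffffffff91f43c38, 32),
    ("in_arpinput",     0xffffffff91f43c58, 144),
    ("arpintr",         0xffffffff91f43ce8, 64),
    ("if_netisr",       0xffffffff91f43d28, 64),
    ("taskq_thread",    0xffffffff91f43d68, 80),
    ("proc_trampoline", 0xffffffff91f43db8, 0),
]

def broken_links(frames):
    """Return the callee names whose sp + sz differs from the caller's sp.

    Hypothetical helper; assumes sz is the frame size and the stack
    grows downward, so caller sp == callee sp + sz.
    """
    bad = []
    for (name, sp, sz), (_, caller_sp, _) in zip(frames, frames[1:]):
        if sp + sz != caller_sp:
            bad.append(name)
    return bad

print(broken_links(FRAMES))
```

An empty list would suggest the stack frames themselves chain together cleanly, which would point away from a smashed stack and toward the data being freed.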
ddb> ps
   TID   PPID   PGRP    UID  S       FLAGS  WAIT          COMMAND
 11563      1  11563      0  3    0x100083  ttyin         getty
 28919      1  28919      0  3    0x100098  poll          cron
 97586  60856  60856    619  3        0x82  kqread        bandb
 35239  60856  60856    619  3        0x82  kqread        ssld
 61725  60856  60856    619  3        0x82  kqread        resolver
 60856      1  60856    619  3        0x90  kqread        ircd
  2439      1   2439      0  3        0x80  select        sshd
 14087  40477  35467     83  3    0x100090  poll          ntpd
 40477  35467  35467     83  3    0x100090  poll          ntpd
 35467      1  35467      0  3        0x80  poll          ntpd
 13056  50432  50432     74  3    0x100090  bpf           pflogd
 50432      1  50432      0  3        0x80  netio         pflogd
 32095  51132  51132     73  2    0x100090                syslogd
 51132      1  51132      0  3    0x100080  netio         syslogd
 46894      0      0      0  3     0x14200  pgzero        zerothread
 37484      0      0      0  3     0x14200  aiodoned      aiodoned
 92831      0      0      0  3     0x14200  syncer        update
 63336      0      0      0  3     0x14200  cleaner       cleaner
 74532      0      0      0  3     0x14200  reaper        reaper
 94086      0      0      0  3     0x14200  pgdaemon      pagedaemon
 94815      0      0      0  3     0x14200  bored         crynlk
  3084      0      0      0  3     0x14200  bored         crypto
 81861      0      0      0  3     0x14200  pftm          pfpurge
*30639      0      0      0  7     0x14210                softnet
 56005      0      0      0  3     0x14200  bored         systqmp
 45756      0      0      0  3     0x14200  bored         systq
 92539      0      0      0  3  0x40014200                idle0
 96023      0      0      0  3     0x14200  kmalloc       kmthread
     1      0      1      0  3        0x82  wait          init
     0     -1      0      0  3     0x10200  scheduler     swapper
ddb> show panic
the kernel did not panic

ddb> show registers
at                0xffffffff88b60000    sysent+0x1320
v0                0xfe8dac10ee1eb5a0
v1                0xfe8dac10ee1eb5a0
a0                0xfe8328730110b586
a1                0xc000000002f1b000
a2                0xc0000000030fa060
a3                0xffffffff888d1fc8    m_extfree_pool
a4                0xffffffff91f43c26    end+0x92e3916
a5                              0x14
a6                              0x18
a7                               0x8
t0                               0x4
t1                0xffffffff88c0ded0    kernel_pmap_store
t2                                 0
t3                0xffffffff91f40000    end+0x92dfcf0
s0                0xc0000000030fa060
s1                0xc000000002f1b000
s2                0xffffffff88b87c50    mclpools
s3                               0x1
s4                0xc0000000000de078
s5                                 0
s6                0xc000000002f1b018
s7                0xffffffff91f43c78    end+0x92e3968
t8                        0x59605df7
t9                0xffffffff88a94f38    int2_splx
k0                0xffffffff91f43c20    end+0x92e3910
k1                0xc000000002f448c0
gp                0xffffffff88b63fd0    _gp
sp                0xffffffff91f43b48    end+0x92e3838
s8                                 0
ra                0xffffffff888b2a2c    pool_put+0x284
sr                        0x1000cfa3
lo                0x231285d0dc100a00
hi                                 0
bad               0xfe8dac10ee1eb5a8
cs                              0x10
pc                0xffffffff888b2850    pool_put+0xa8
pool_put+0xa8:  ld      v0,8(v1)
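
A note on the register dump above: the faulting instruction is "ld v0,8(v1)", so the address in "bad" should be v1 + 8. The arithmetic can be checked with a short sketch (values copied from the dump; the function name is made up for illustration):

```python
# Values copied from the "show registers" output above.
V1 = 0xfe8dac10ee1eb5a0    # base register of the faulting load
BAD = 0xfe8dac10ee1eb5a8   # "bad": the faulting virtual address
OFFSET = 8                 # displacement in "ld v0,8(v1)"

def effective_address(base, offset):
    """Effective address of the load: base + offset, wrapped to 64 bits."""
    return (base + offset) & 0xFFFFFFFFFFFFFFFF

addr = effective_address(V1, OFFSET)
print(hex(addr), addr == BAD)
```

The value in v1 does not resemble the other kernel addresses in the dump (0xffffffff8... or 0xc000...), which would be consistent with pool_put following a bogus pointer, though that is only a guess from the numbers.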

ddb> continue
panic: trap
Stopped at      Debugger+0x4:   jr      ra
Debugger+0x8:    nop
   TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*30639  30639      0     0x14000      0x210    0  softnet
Debugger+0x4 (f2b05ff317e288e3,900000001fbd9880,900000001fbd9830,ffffffff91f438
70)  ra 0xffffffff888b5cb0 sp 0xffffffff91f438a8, sz 0
panic+0x100 (f2b05ff317e288e3,ffffffff91f43b30,0,ffffffff88c0e700)  ra 0xffffff
ff88a7689c sp 0xffffffff91f438a8, sz 112
itsa+0xf4 (f2b05ff317e288e3,ffffffff91f43b30,0,ffffffff88c0e700)  ra 0xffffffff
88a7a0ac sp 0xffffffff91f43918, sz 176
k_general+0x114 (ffffffff91f439f0,ffffffff91f43b30,0,ffffffff888b2850)  ra 0x0 s
p 0xffffffff91f439c8, sz 0
(KERNEL TRAP)
pool_put+0xa8 (ffffffff91f439f0,ffffffff91f43b30,0,ffffffff888b2850)  ra 0xffff
ffff888d1560 sp 0xffffffff91f43b48, sz 160
m_extfree+0x110 (ffffffff91f439f0,ffffffff91f43b30,0,ffffffff888b2850)  ra 0xff
ffffff888d1c10 sp 0xffffffff91f43be8, sz 32
m_free+0x138 (ffffffff91f439f0,ffffffff91f43b30,0,ffffffff888b2850)  ra 0xfffff
fff888d1d20 sp 0xffffffff91f43c08, sz 48
m_freem+0x28 (ffffffff91f439f0,ffffffff91f43b30,0,ffffffff888b2850)  ra 0xfffff
fff889619f0 sp 0xffffffff91f43c38, sz 32
in_arpinput+0x88 (ffffffff91f439f0,ffffffff91f43b30,0,ffffffff888b2850)  ra 0xf
fffffff88961d8c sp 0xffffffff91f43c58, sz 144
arpintr+0x64 (ffffffff91f439f0,ffffffff91f43b30,0,ffffffff888b2850)  ra 0xfffff
fff8891d9e8 sp 0xffffffff91f43ce8, sz 64
if_netisr+0x140 (ffffffff91f439f0,ffffffff91f43b30,0,ffffffff888b2850)  ra 0xff
ffffff888a35d8 sp 0xffffffff91f43d28, sz 64
taskq_thread+0xd0 (ffffffff91f439f0,ffffffff91f43b30,0,ffffffff888b2850)  ra 0x
ffffffff88a795ac sp 0xffffffff91f43d68, sz 80
proc_trampoline+0x1c (ffffffff91f439f0,ffffffff91f43b30,0,ffffffff888b2850)  ra
 0x0 sp 0xffffffff91f43db8, sz 0
User-level: pid 30639
http://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.

On Sat, Jul 16, 2016 at 1:06 PM, Jesse Darrone <[email protected]> wrote:
> Hey Miod,
>
> It crashed again last night so I rebuilt the kernel with your patch.
> Both hpc0 and hpc1 now report 25 MHz.  I've attached the full dmesg
> below for reference.
>
> Thanks again, Miod!
> -Jesse
>
>
> [ using 388904 bytes of bsd ELF symbol table ]
> Copyright (c) 1982, 1986, 1989, 1991, 1993
>         The Regents of the University of California.  All rights reserved.
> Copyright (c) 1995-2016 OpenBSD. All rights reserved.  http://www.OpenBSD.org
>
> OpenBSD 6.0 (GENERIC-IP22) #0: Sat Jul 16 12:21:12 EDT 2016
>     [email protected]:/usr/src/sys/arch/sgi/compile/GENERIC-IP22
> real mem = 167772160 (160MB)
> rsvd mem = 802816 (1MB)
> avail mem = 160169984 (152MB)
> mainbus0 at root: Challenge S
> cpu0 at mainbus0: MIPS R5000 CPU rev 1.0 150 MHz, R5000 based FPC rev 1.0
> cpu0: cache L1-I 32KB D 32KB 2 way, L2 512KB direct
> int0 at mainbus0 addr 0x1fbd9880
> imc0 at mainbus0: revision 3
> gio0 at imc0
> hpc0 at gio0 addr 0x1fb80000: SGI HPC3 (onboard, 25MHz)
> zs0 at hpc0 offset 0x00059830 irq 29: 85230
> zstty0 at zs0 channel 1: console
> zstty1 at zs0 channel 0
> sq0 at hpc0 offset 0x00054000 irq 3: Seeq 80c03, address 08:00:69:0a:34:09
> wdsc0 at hpc0 offset 0x00044000 irq 1: WD33C93B, 20.0 MHz, burst DMA
> wdsc0: microcode revision 0x0d, fast SCSI
> scsibus0 at wdsc0: 8 targets, initiator 0
> sd0 at scsibus0 targ 1 lun 0: <SEAGATE, ST39103LCSUN9.0G, 034A> SCSI2
> 0/direct fixed serial.SEAGATE_ST39103LCSUN9.0GLS4557570000101519ZQ
> sd0: 8637MB, 512 bytes/sector, 17689267 sectors
> pione at hpc0 offset 0x00059800 irq 5 not configured
> panel0 at hpc0 offset 0x00059850 irq 9: power button
> dsclock0 at hpc0 offset 0x00060000
> hpc1 at gio0 addr 0x1fb00000: SGI HPC3 (IO+ mezzanine, 25MHz)
> hpc1: using EXP1's DMA channel
> sq1 at hpc1 offset 0x00054000 irq 0: Seeq 80c03, address 08:00:69:02:64:d1
> clock0 at mainbus0: int 5
> vscsi0 at root
> scsibus1 at vscsi0: 256 targets
> softraid0 at root
> scsibus2 at softraid0: 256 targets
> boot device: sd0
> root on sd0a (ffbd62fcf39fc195.a) swap on sd0b dump on sd0b
>
> On Fri, Jul 15, 2016 at 1:00 PM, Miod Vallat <[email protected]> wrote:
>>> Theo suggested that a fix for ARP committed the other day might have
>>> some impact on this, so I've been testing the latest snapshot.  I've
>>> been up for 19:47 so far, so it's looking good.  Interestingly
>>> enough, I was getting "sq1: receive FIFO overflow" periodically with
>>> the previous snapshot; so far on this boot that has not recurred.
>>>
>>> If the box cores again I'll test your diff and see if that improves my
>>> situation.
>>
>> If you still get `receive FIFO overflow' messages, even if the kernel
>> does not panic, please test this diff and tell me what speed gets
>> reported for hpc0 and hpc1 attachments.
>>
>> Thanks,
>> Miod
