On Sat, Nov 5, 2011 at 8:29 AM, Norman Golisz <li...@zcat.de> wrote:

> Hi Jeffrey,
>
> On Sat Nov  5 2011 07:49, Forman, Jeffrey wrote:
> > I am in the process of building a new OpenBSD i386 5.0-release Intel Atom
> > D510-based fw/router. I was editing some config files on the box in emacs
> > when the process threw a core dump. Thinking perhaps it was just emacs, I
> > went to do something else, 'sudo pkg_add -v mutt', and received a coredump
> > again.
> >
> > I went looking for stress testing apps, thinking I might have a bad CPU or
> > RAM module and came upon 'stress'. After several iterations of stress
> > seeming to cause kernel panics, and then upgrading to a 5.0 snapshot from
> > November 13, 2011[1], I was still seeing panics. I provide the below detail
> > to help those more knowledgeable in debugging.
> >
> > Thanks in advance,
> > Jeff
> >
> > [1] http://openbsd.mirrors.tds.net/pub/OpenBSD/snapshots/i386/
> >
> > Full stress command line:
> > # stress --cpu 8 --io 4 --vm 2 -m 5 --vm-bytes 128M --timeout 30s -v
>
> I did this on my machine as well; it's an i386 single-core processor
> running a single-processor kernel. I ran this stress test several times
> with no panic. Your panic trace also points to uvm's page fault handler,
> with an MP locking mechanism involved.
>
> Therefore, could you try bsd.sp and do the stress testing again? Is it
> running well now?
>
> Norman.
>
> OpenBSD 5.0-current (GENERIC) #85: Wed Nov  2 22:27:31 MDT 2011
>    dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
> cpu0: Intel(R) Pentium(R) M processor 1.70GHz ("GenuineIntel" 686-class) 1.70 GHz
> cpu0: FPU,V86,DE,PSE,TSC,MSR,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,TM,SBF,EST,TM2
> real mem  = 2146299904 (2046MB)
> avail mem = 2101112832 (2003MB)
> mainbus0 at root
> bios0 at mainbus0: AT/286+ BIOS, date 06/18/07, BIOS32 rev. 0 @ 0xfd750, SMBIOS rev. 2.33 @ 0xe0010 (61 entries)
> bios0: vendor IBM version "1RETDRWW (3.23 )" date 06/18/2007
> bios0: IBM 2374VDL
> apm0 at bios0: Power Management spec V1.2
> acpi at bios0 function 0x0 not configured
> pcibios0 at bios0: rev 2.1 @ 0xfd6e0/0x920
> pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfdea0/272 (15 entries)
> pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 82371FB ISA" rev 0x00)
> pcibios0: PCI bus #6 is the last bus
> bios0: ROM list: 0xc0000/0x10000 0xdc000/0x4000! 0xe0000/0x10000!
> cpu0 at mainbus0: (uniprocessor)
> cpu0: Enhanced SpeedStep 1695 MHz: speeds: 1700, 1400, 1200, 1000, 800, 600 MHz
> pci0 at mainbus0 bus 0: configuration mode 1 (bios)
> io address conflict 0x5800/0x8
> io address conflict 0x5808/0x4
> io address conflict 0x5810/0x8
> io address conflict 0x580c/0x4
> pchb0 at pci0 dev 0 function 0 "Intel 82855PM Host" rev 0x03
> intelagp0 at pchb0
> agp0 at intelagp0: aperture at 0xd0000000, size 0x10000000
> ppb0 at pci0 dev 1 function 0 "Intel 82855PM AGP" rev 0x03
> pci1 at ppb0 bus 1
> vga1 at pci1 dev 0 function 0 "ATI Radeon Mobility M7" rev 0x00
> wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> radeondrm0 at vga1: irq 11
> drm0 at radeondrm0
> uhci0 at pci0 dev 29 function 0 "Intel 82801DB USB" rev 0x01: irq 11
> uhci1 at pci0 dev 29 function 1 "Intel 82801DB USB" rev 0x01: irq 11
> uhci2 at pci0 dev 29 function 2 "Intel 82801DB USB" rev 0x01: irq 11
> ehci0 at pci0 dev 29 function 7 "Intel 82801DB USB" rev 0x01: irq 11
> usb0 at ehci0: USB revision 2.0
> uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
> ppb1 at pci0 dev 30 function 0 "Intel 82801BAM Hub-to-PCI" rev 0x81
> pci2 at ppb1 bus 2
> mem address conflict 0xb0000000/0x1000
> mem address conflict 0xb1000000/0x1000
> cbb0 at pci2 dev 0 function 0 "TI PCI4520 CardBus" rev 0x01: irq 11
> cbb1 at pci2 dev 0 function 1 "TI PCI4520 CardBus" rev 0x01: irq 11
> em0 at pci2 dev 1 function 0 "Intel PRO/1000MT (82540EP)" rev 0x03: irq 11, address 00:11:25:32:45:72
> iwi0 at pci2 dev 2 function 0 "Intel PRO/Wireless 2200BG" rev 0x05: irq 11, address 00:0e:35:bc:03:c1
> cardslot0 at cbb0 slot 0 flags 0
> cardbus0 at cardslot0: bus 3 device 0 cacheline 0x8, lattimer 0xb0
> pcmcia0 at cardslot0
> cardslot1 at cbb1 slot 1 flags 0
> cardbus1 at cardslot1: bus 6 device 0 cacheline 0x8, lattimer 0xb0
> pcmcia1 at cardslot1
> ichpcib0 at pci0 dev 31 function 0 "Intel 82801DBM LPC" rev 0x01: 24-bit timer at 3579545Hz
> pciide0 at pci0 dev 31 function 1 "Intel 82801DBM IDE" rev 0x01: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
> wd0 at pciide0 channel 0 drive 0: <SAMSUNG HM160HC>
> wd0: 16-sector PIO, LBA48, 152627MB, 312581808 sectors
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
> atapiscsi0 at pciide0 channel 1 drive 0
> scsibus0 at atapiscsi0: 2 targets
> cd0 at scsibus0 targ 0 lun 0: <MATSHITA, UJDA745 DVD/CDRW, 1.03> ATAPI 5/cdrom removable
> cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
> ichiic0 at pci0 dev 31 function 3 "Intel 82801DB SMBus" rev 0x01: irq 11
> iic0 at ichiic0
> spdmem0 at iic0 addr 0x50: 1GB DDR SDRAM non-parity PC2700CL2.5
> spdmem1 at iic0 addr 0x51: 1GB DDR SDRAM non-parity PC2700CL2.5
> auich0 at pci0 dev 31 function 5 "Intel 82801DB AC97" rev 0x01: irq 11, ICH4 AC97
> ac97: codec id 0x41445374 (Analog Devices AD1981B)
> ac97: codec features headphone, 20 bit DAC, No 3D Stereo
> audio0 at auich0
> "Intel 82801DB Modem" rev 0x01 at pci0 dev 31 function 6 not configured
> usb1 at uhci0: USB revision 1.0
> uhub1 at usb1 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> usb2 at uhci1: USB revision 1.0
> uhub2 at usb2 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> usb3 at uhci2: USB revision 1.0
> uhub3 at usb3 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> isa0 at ichpcib0
> isadma0 at isa0
> pckbc0 at isa0 port 0x60/5
> pckbd0 at pckbc0 (kbd slot)
> pckbc0: using irq 1 for kbd slot
> wskbd0 at pckbd0: console keyboard, using wsdisplay0
> pms0 at pckbc0 (aux slot)
> pckbc0: using irq 12 for aux slot
> wsmouse0 at pms0 mux 0
> pcppi0 at isa0 port 0x61
> spkr0 at pcppi0
> aps0 at isa0 port 0x1600/31
> npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
> mtrr: Pentium Pro MTRR support
> uhidev0 at uhub2 port 1 configuration 1 interface 0 "Logitech USB-PS/2 Optical Mouse" rev 2.00/18.00 addr 2
> uhidev0: iclass 3/1
> ums0 at uhidev0: 6 buttons, Z dir
> wsmouse1 at ums0 mux 0
> umass0 at uhub2 port 2 configuration 1 interface 0 "TOSHIBA TransMemory" rev 2.00/1.00 addr 3
> umass0: using SCSI over Bulk-Only
> scsibus1 at umass0: 2 targets, initiator 0
> sd0 at scsibus1 targ 1 lun 0: <TOSHIBA, TransMemory, 1.00> SCSI2 0/direct removable serial.09306544C940942403F1
> sd0: 7643MB, 512 bytes/sector, 15654848 sectors
> ugen0 at uhub3 port 2 "STMicroelectronics Biometric Coprocessor" rev 1.00/0.01 addr 2
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> softraid0: sd1 was not shutdown properly
> sd1 at scsibus3 targ 1 lun 0: <OPENBSD, SR CRYPTO, 005> SCSI2 0/direct fixed
> sd1: 61446MB, 512 bytes/sector, 125843232 sectors
> root device (default wd0a): sd1a
> swap device (default sd1b): wd0b
> root on sd1a swap on wd0b dump on wd0b
>
>
Norman had a good idea, one I had not considered, since the OpenBSD
installer selected the MP kernel by default for the CPU I am running. I
booted the box on the SP kernel and got through a few 30-second stress
runs, but a 90-second run then produced another panic:

pmap_page_remove: pg=0xd2342ce0: va=8f9d4000, pv_ptp=0xd2956da4
pmap_page_remove: PTP's phys addr: actual=0, recorded=71112000
panic: pmap_page_remove: mapped managed page has invalid pv_ptp field
Stopped at      Debugger+0x4:   popl    %ebp
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!

ddb> ps
   PID   PPID   PGRP    UID  S       FLAGS  WAIT          COMMAND
 17010  22606  22606      0  2           0                stress
  9052  22606  22606      0  2           0                stress
 18279  22606  22606      0  2           0                stress
  5192  22606  22606      0  2           0                stress
 22557  22606  22606      0  2           0                stress
 31262  22606  22606      0  2           0                stress
 29473  22606  22606      0  2           0                stress
 12947  22606  22606      0  2           0                stress
 21017  22606  22606      0  2           0                stress
 24320  22606  22606      0  2           0                stress
 26998  22606  22606      0  2           0                stress
 23032  22606  22606      0  2           0                stress
 19909  22606  22606      0  2           0                stress
 23706  22606  22606      0  2           0                stress
* 4608  22606  22606      0  7           0                stress
  4070  22606  22606      0  2           0                stress
 10494  22606  22606      0  2           0                stress
 22606  14553  22606      0  3        0x80  wait          stress
 14553  25593  14553      0  3        0x88  pause         ksh
 25593  10983  25593   1000  3        0x88  pause         ksh
 10983  12086  12086   1000  3        0x80  select        sshd
 12086   4870  12086      0  3        0x80  poll          sshd
  8902      1   8902      0  3        0x80  ttyin         getty
 17445      1  17445      0  3        0x80  ttyin         getty
   294      1    294      0  3        0x80  ttyin         getty
 10058      1  10058      0  3        0x80  ttyin         getty
  8789      1   8789      0  3        0x80  ttyin         getty
 11270      1  11270      0  3        0x80  ttyin         getty
  3408      1   3408      0  3        0x80  select        cron
 17208      1  17208      0  3        0x80  select        inetd
  1018  11457  11457     95  3        0x80  kqread        smtpd
  1605  11457  11457     95  3        0x80  kqread        smtpd
 20814  11457  11457     95  3        0x80  kqread        smtpd
 32539  11457  11457     95  3        0x80  kqread        smtpd
  3932  11457  11457     95  3        0x80  kqread        smtpd
 30194  11457  11457     95  3        0x80  kqread        smtpd
  9971  11457  11457     95  3        0x80  kqread        smtpd
 17016  11457  11457     95  3        0x80  kqread        smtpd
 11457      1  11457      0  3        0x80  kqread        smtpd
  4870      1   4870      0  3        0x80  select        sshd
  8865  32197  22908     83  3        0x80  poll          ntpd
 32197  22908  22908     83  3        0x80  poll          ntpd
 22908      1  22908      0  3        0x80  poll          ntpd
  6070  12921  12921     74  3        0x80  bpf           pflogd
 12921      1  12921      0  3        0x80  netio         pflogd
   138  10686  10686     73  2        0x80                syslogd
 10686      1  10686      0  3        0x80  netio         syslogd
    15      0      0      0  3    0x100200  aiodoned      aiodoned
    14      0      0      0  3    0x100200  syncer        update
    13      0      0      0  3    0x100200  cleaner       cleaner
    12      0      0      0  3    0x100200  reaper        reaper
    11      0      0      0  3    0x100200  pgdaemon      pagedaemon
    10      0      0      0  3    0x100200  bored         crypto
     9      0      0      0  3    0x100200  pftm          pfpurge
     8      0      0      0  3    0x100200  usbtsk        usbtask
     7      0      0      0  3    0x100200  usbatsk       usbatsk
     6      0      0      0  3    0x100200  bored         intelrel
     5      0      0      0  3    0x100200  acpi0         acpi0
     4      0      0      0  3    0x100200  bored         syswq
     3      0      0      0  3  0x40100200                idle0
     2      0      0      0  3    0x100200  kmalloc       kmthread
     1      0      1      0  3        0x80  wait          init
     0     -1      0      0  3       0x200  scheduler     swapper

ddb> trace
Debugger(d08d3b58,de8a9e08,d08d7008,de8a9e08,d09b9b94) at Debugger+0x4
panic(d08d7008,0,71112000,d2956da4,d8af872c) at panic+0x5d
pmap_page_remove(d2342ce0,d8af872c,d0a23360,d0a48c20,d89d7be0) at pmap_page_remove+0x1d2
uvm_anfree(d8af8738,d8910028,0,d8915f40,d8915f40) at uvm_anfree+0xac
amap_wipeout(d89d7be0,0,10,0,d8c4c9dc) at amap_wipeout+0x94
uvm_map_unreference_amap(d8915f40,0,de8a9ecc,d03d8591,d8c4c9e0) at uvm_map_unreference_amap+0x2f
uvm_unmap_detach(d8ac8e50,0,92823000,de8a9f0c,d8c292c0) at uvm_unmap_detach+0x5b
sys_munmap(d8c292c0,de8a9f64,de8a9f84,de8a9fa8,d8c292c0) at sys_munmap+0x15d
syscall() at syscall+0x2d8
--- syscall (number 134217728) ---
0x2:
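
ddb output copied off an 80-column console tends to get wrapped in the
middle of symbol names. In case it helps anyone reading traces like this
one, here is a rough Python sketch that glues wrapped lines back together
and lists the call chain. The helper names (rejoin_wrapped, call_chain) are
made up for illustration, and it assumes every frame starts with "name("
and that nothing after the "--- syscall" marker is a kernel frame.

```python
import re

# One ddb frame: "func(args) at func+0xoffset".
FRAME_RE = re.compile(r"^(\w+)\((.*?)\) at (\w+)\+0x([0-9a-f]+)$")
# Assumption: a line that begins a new frame starts with "func(".
FRAME_START_RE = re.compile(r"^\w+\(")


def rejoin_wrapped(lines):
    """Merge continuation lines created by console or mail wrapping."""
    frames = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("---"):
            break  # syscall marker: stop collecting kernel frames
        if FRAME_START_RE.match(line) or not frames:
            frames.append(line)
        elif frames[-1].endswith(" at"):
            frames[-1] += " " + line  # break fell right after "at"
        else:
            frames[-1] += line  # break fell inside a symbol or offset
    return frames


def call_chain(lines):
    """Return (function, offset) pairs, innermost frame first."""
    chain = []
    for frame in rejoin_wrapped(lines):
        m = FRAME_RE.match(frame)
        if m:
            chain.append((m.group(3), int(m.group(4), 16)))
    return chain


# A few lines exactly as they came off the console.
SAMPLE = """\
panic(d08d7008,0,71112000,d2956da4,d8af872c) at panic+0x5d
pmap_page_remove(d2342ce0,d8af872c,d0a23360,d0a48c20,d89d7be0) at
pmap_page_rem
ove+0x1d2
uvm_anfree(d8af8738,d8910028,0,d8915f40,d8915f40) at uvm_anfree+0xac
syscall() at syscall+0x2d8
--- syscall (number 134217728) ---
""".splitlines()

if __name__ == "__main__":
    for name, offset in call_chain(SAMPLE):
        print(f"{name}+{offset:#x}")
```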

Am I barking up the wrong tree trying to deduce if I really do have a
hardware problem? I am open to accepting diffs and compiling from source if
other developers think there might be a bug to fix here.
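
If it helps separate hardware from software, here is a minimal sketch
(Python, and only my guess at a useful test, not anything taken from
stress itself) that repeatedly maps anonymous memory, writes to every
page, and unmaps it again. The trace above ends in sys_munmap, so the idea
is to exercise the same map/fault/unmap cycle in isolation. The function
name map_touch_unmap and the sizes are invented for illustration, and I
have not confirmed that this triggers the panic.

```python
import mmap


def map_touch_unmap(size_bytes, rounds):
    """Map anonymous memory, dirty every page, unmap; repeat.

    Returns the total number of pages written across all rounds.
    """
    page = mmap.PAGESIZE
    touched = 0
    for _ in range(rounds):
        buf = mmap.mmap(-1, size_bytes)  # fd -1 requests an anonymous mapping
        try:
            for off in range(0, size_bytes, page):
                buf[off] = 0x5A  # a write forces the page to be faulted in
                touched += 1
        finally:
            buf.close()  # releases the mapping
    return touched


if __name__ == "__main__":
    # Small default so a first run finishes quickly; raise both values
    # to put more pressure on the VM system.
    print(map_touch_unmap(16 * mmap.PAGESIZE, 10))
```

Something like map_touch_unmap(128 * 1024 * 1024, 100) would roughly
mirror the --vm-bytes 128M setting from the stress command line.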

Cheers,
Jeff
