On Sat, Nov 5, 2011 at 8:29 AM, Norman Golisz <li...@zcat.de> wrote:

> Hi Jeffrey,
>
> On Sat Nov 5 2011 07:49, Forman, Jeffrey wrote:
> > I am in the process of building a new OpenBSD i386 5.0-release Intel Atom
> > D510-based fw/router. I was editing some config files on the box in emacs
> > when the process threw a core dump. Thinking perhaps it was just emacs, I
> > went to do something else, 'sudo pkg_add -v mutt', and received a coredump
> > again.
> >
> > I went looking for stress testing apps, thinking I might have a bad CPU or
> > RAM module and came upon 'stress'. After several iterations of stress
> > seeming to cause kernel panics, and then upgrading to a 5.0 snapshot from
> > November 13, 2011[1], I was still seeing panics. I provide the below detail
> > to help those more knowledgeable in debugging.
> >
> > Thanks in advance,
> > Jeff
> >
> > [1] http://openbsd.mirrors.tds.net/pub/OpenBSD/snapshots/i386/
> >
> > Full stress command line:
> > # stress --cpu 8 --io 4 --vm 2 -m 5 --vm-bytes 128M --timeout 30s -v
>
> I did this on my machine as well, it's a i386 single core processor
> running a single processor kernel. I ran this stress test several times,
> no panic. Your panic trace also indicates complications with uvm's page
> fault handler and an MP locking mechanism involved.
>
> Therefore, could you try bsd.sp and do the stress testing again? Is it
> running well now?
>
> Norman.
>
> OpenBSD 5.0-current (GENERIC) #85: Wed Nov 2 22:27:31 MDT 2011
>     dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
> cpu0: Intel(R) Pentium(R) M processor 1.70GHz ("GenuineIntel" 686-class) 1.70 GHz
> cpu0: FPU,V86,DE,PSE,TSC,MSR,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,TM,SBF,EST,TM2
> real mem = 2146299904 (2046MB)
> avail mem = 2101112832 (2003MB)
> mainbus0 at root
> bios0 at mainbus0: AT/286+ BIOS, date 06/18/07, BIOS32 rev. 0 @ 0xfd750, SMBIOS rev. 2.33 @ 0xe0010 (61 entries)
> bios0: vendor IBM version "1RETDRWW (3.23 )" date 06/18/2007
> bios0: IBM 2374VDL
> apm0 at bios0: Power Management spec V1.2
> acpi at bios0 function 0x0 not configured
> pcibios0 at bios0: rev 2.1 @ 0xfd6e0/0x920
> pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfdea0/272 (15 entries)
> pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 82371FB ISA" rev 0x00)
> pcibios0: PCI bus #6 is the last bus
> bios0: ROM list: 0xc0000/0x10000 0xdc000/0x4000! 0xe0000/0x10000!
> cpu0 at mainbus0: (uniprocessor)
> cpu0: Enhanced SpeedStep 1695 MHz: speeds: 1700, 1400, 1200, 1000, 800, 600 MHz
> pci0 at mainbus0 bus 0: configuration mode 1 (bios)
> io address conflict 0x5800/0x8
> io address conflict 0x5808/0x4
> io address conflict 0x5810/0x8
> io address conflict 0x580c/0x4
> pchb0 at pci0 dev 0 function 0 "Intel 82855PM Host" rev 0x03
> intelagp0 at pchb0
> agp0 at intelagp0: aperture at 0xd0000000, size 0x10000000
> ppb0 at pci0 dev 1 function 0 "Intel 82855PM AGP" rev 0x03
> pci1 at ppb0 bus 1
> vga1 at pci1 dev 0 function 0 "ATI Radeon Mobility M7" rev 0x00
> wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> radeondrm0 at vga1: irq 11
> drm0 at radeondrm0
> uhci0 at pci0 dev 29 function 0 "Intel 82801DB USB" rev 0x01: irq 11
> uhci1 at pci0 dev 29 function 1 "Intel 82801DB USB" rev 0x01: irq 11
> uhci2 at pci0 dev 29 function 2 "Intel 82801DB USB" rev 0x01: irq 11
> ehci0 at pci0 dev 29 function 7 "Intel 82801DB USB" rev 0x01: irq 11
> usb0 at ehci0: USB revision 2.0
> uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
> ppb1 at pci0 dev 30 function 0 "Intel 82801BAM Hub-to-PCI" rev 0x81
> pci2 at ppb1 bus 2
> mem address conflict 0xb0000000/0x1000
> mem address conflict 0xb1000000/0x1000
> cbb0 at pci2 dev 0 function 0 "TI PCI4520 CardBus" rev 0x01: irq 11
> cbb1 at pci2 dev 0 function 1 "TI PCI4520 CardBus" rev 0x01: irq 11
> em0 at pci2 dev 1 function 0 "Intel PRO/1000MT (82540EP)" rev 0x03: irq 11, address 00:11:25:32:45:72
> iwi0 at pci2 dev 2 function 0 "Intel PRO/Wireless 2200BG" rev 0x05: irq 11, address 00:0e:35:bc:03:c1
> cardslot0 at cbb0 slot 0 flags 0
> cardbus0 at cardslot0: bus 3 device 0 cacheline 0x8, lattimer 0xb0
> pcmcia0 at cardslot0
> cardslot1 at cbb1 slot 1 flags 0
> cardbus1 at cardslot1: bus 6 device 0 cacheline 0x8, lattimer 0xb0
> pcmcia1 at cardslot1
> ichpcib0 at pci0 dev 31 function 0 "Intel 82801DBM LPC" rev 0x01: 24-bit timer at 3579545Hz
> pciide0 at pci0 dev 31 function 1 "Intel 82801DBM IDE" rev 0x01: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
> wd0 at pciide0 channel 0 drive 0: <SAMSUNG HM160HC>
> wd0: 16-sector PIO, LBA48, 152627MB, 312581808 sectors
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
> atapiscsi0 at pciide0 channel 1 drive 0
> scsibus0 at atapiscsi0: 2 targets
> cd0 at scsibus0 targ 0 lun 0: <MATSHITA, UJDA745 DVD/CDRW, 1.03> ATAPI 5/cdrom removable
> cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
> ichiic0 at pci0 dev 31 function 3 "Intel 82801DB SMBus" rev 0x01: irq 11
> iic0 at ichiic0
> spdmem0 at iic0 addr 0x50: 1GB DDR SDRAM non-parity PC2700CL2.5
> spdmem1 at iic0 addr 0x51: 1GB DDR SDRAM non-parity PC2700CL2.5
> auich0 at pci0 dev 31 function 5 "Intel 82801DB AC97" rev 0x01: irq 11, ICH4 AC97
> ac97: codec id 0x41445374 (Analog Devices AD1981B)
> ac97: codec features headphone, 20 bit DAC, No 3D Stereo
> audio0 at auich0
> "Intel 82801DB Modem" rev 0x01 at pci0 dev 31 function 6 not configured
> usb1 at uhci0: USB revision 1.0
> uhub1 at usb1 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> usb2 at uhci1: USB revision 1.0
> uhub2 at usb2 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> usb3 at uhci2: USB revision 1.0
> uhub3 at usb3 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> isa0 at ichpcib0
> isadma0 at isa0
> pckbc0 at isa0 port 0x60/5
> pckbd0 at pckbc0 (kbd slot)
> pckbc0: using irq 1 for kbd slot
> wskbd0 at pckbd0: console keyboard, using wsdisplay0
> pms0 at pckbc0 (aux slot)
> pckbc0: using irq 12 for aux slot
> wsmouse0 at pms0 mux 0
> pcppi0 at isa0 port 0x61
> spkr0 at pcppi0
> aps0 at isa0 port 0x1600/31
> npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
> mtrr: Pentium Pro MTRR support
> uhidev0 at uhub2 port 1 configuration 1 interface 0 "Logitech USB-PS/2 Optical Mouse" rev 2.00/18.00 addr 2
> uhidev0: iclass 3/1
> ums0 at uhidev0: 6 buttons, Z dir
> wsmouse1 at ums0 mux 0
> umass0 at uhub2 port 2 configuration 1 interface 0 "TOSHIBA TransMemory" rev 2.00/1.00 addr 3
> umass0: using SCSI over Bulk-Only
> scsibus1 at umass0: 2 targets, initiator 0
> sd0 at scsibus1 targ 1 lun 0: <TOSHIBA, TransMemory, 1.00> SCSI2 0/direct removable serial.09306544C940942403F1
> sd0: 7643MB, 512 bytes/sector, 15654848 sectors
> ugen0 at uhub3 port 2 "STMicroelectronics Biometric Coprocessor" rev 1.00/0.01 addr 2
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> softraid0: sd1 was not shutdown properly
> sd1 at scsibus3 targ 1 lun 0: <OPENBSD, SR CRYPTO, 005> SCSI2 0/direct fixed
> sd1: 61446MB, 512 bytes/sector, 125843232 sectors
> root device (default wd0a): sd1a
> swap device (default sd1b): wd0b
> root on sd1a swap on wd0b dump on wd0b
>

Norman had a good thought, one I had not considered, since the OpenBSD
installer pre-selected the MP kernel given the CPU I am running. I booted
the box on the SP kernel and was able to get through a few 30-second stress
runs, but when I then ran one for 90 seconds I received another panic:

pmap_page_remove: pg=0xd2342ce0: va=8f9d4000, pv_ptp=0xd2956da4
pmap_page_remove: PTP's phys addr: actual=0, recorded=71112000
panic: pmap_page_remove: mapped managed page has invalid pv_ptp field
Stopped at      Debugger+0x4:   popl    %ebp
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb> ps
   PID   PPID   PGRP    UID  S       FLAGS  WAIT       COMMAND
 17010  22606  22606      0  2           0             stress
  9052  22606  22606      0  2           0             stress
 18279  22606  22606      0  2           0             stress
  5192  22606  22606      0  2           0             stress
 22557  22606  22606      0  2           0             stress
 31262  22606  22606      0  2           0             stress
 29473  22606  22606      0  2           0             stress
 12947  22606  22606      0  2           0             stress
 21017  22606  22606      0  2           0             stress
 24320  22606  22606      0  2           0             stress
 26998  22606  22606      0  2           0             stress
 23032  22606  22606      0  2           0             stress
 19909  22606  22606      0  2           0             stress
 23706  22606  22606      0  2           0             stress
* 4608  22606  22606      0  7           0             stress
  4070  22606  22606      0  2           0             stress
 10494  22606  22606      0  2           0             stress
 22606  14553  22606      0  3        0x80  wait       stress
 14553  25593  14553      0  3        0x88  pause      ksh
 25593  10983  25593   1000  3        0x88  pause      ksh
 10983  12086  12086   1000  3        0x80  select     sshd
 12086   4870  12086      0  3        0x80  poll       sshd
  8902      1   8902      0  3        0x80  ttyin      getty
 17445      1  17445      0  3        0x80  ttyin      getty
   294      1    294      0  3        0x80  ttyin      getty
 10058      1  10058      0  3        0x80  ttyin      getty
  8789      1   8789      0  3        0x80  ttyin      getty
 11270      1  11270      0  3        0x80  ttyin      getty
  3408      1   3408      0  3        0x80  select     cron
 17208      1  17208      0  3        0x80  select     inetd
  1018  11457  11457     95  3        0x80  kqread     smtpd
  1605  11457  11457     95  3        0x80  kqread     smtpd
 20814  11457  11457     95  3        0x80  kqread     smtpd
 32539  11457  11457     95  3        0x80  kqread     smtpd
  3932  11457  11457     95  3        0x80  kqread     smtpd
 30194  11457  11457     95  3        0x80  kqread     smtpd
  9971  11457  11457     95  3        0x80  kqread     smtpd
 17016  11457  11457     95  3        0x80  kqread     smtpd
 11457      1  11457      0  3        0x80  kqread     smtpd
  4870      1   4870      0  3        0x80  select     sshd
  8865  32197  22908     83  3        0x80  poll       ntpd
 32197  22908  22908     83  3        0x80  poll       ntpd
 22908      1  22908      0  3        0x80  poll       ntpd
  6070  12921  12921     74  3        0x80  bpf        pflogd
 12921      1  12921      0  3        0x80  netio      pflogd
   138  10686  10686     73  2        0x80             syslogd
 10686      1  10686      0  3        0x80  netio      syslogd
    15      0      0      0  3    0x100200  aiodoned   aiodoned
    14      0      0      0  3    0x100200  syncer     update
    13      0      0      0  3    0x100200  cleaner    cleaner
    12      0      0      0  3    0x100200  reaper     reaper
    11      0      0      0  3    0x100200  pgdaemon   pagedaemon
    10      0      0      0  3    0x100200  bored      crypto
     9      0      0      0  3    0x100200  pftm       pfpurge
     8      0      0      0  3    0x100200  usbtsk     usbtask
     7      0      0      0  3    0x100200  usbatsk    usbatsk
     6      0      0      0  3    0x100200  bored      intelrel
     5      0      0      0  3    0x100200  acpi0      acpi0
     4      0      0      0  3    0x100200  bored      syswq
     3      0      0      0  3  0x40100200             idle0
     2      0      0      0  3    0x100200  kmalloc    kmthread
     1      0      1      0  3        0x80  wait       init
     0     -1      0      0  3       0x200  scheduler  swapper
ddb> trace
Debugger(d08d3b58,de8a9e08,d08d7008,de8a9e08,d09b9b94) at Debugger+0x4
panic(d08d7008,0,71112000,d2956da4,d8af872c) at panic+0x5d
pmap_page_remove(d2342ce0,d8af872c,d0a23360,d0a48c20,d89d7be0) at pmap_page_remove+0x1d2
uvm_anfree(d8af8738,d8910028,0,d8915f40,d8915f40) at uvm_anfree+0xac
amap_wipeout(d89d7be0,0,10,0,d8c4c9dc) at amap_wipeout+0x94
uvm_map_unreference_amap(d8915f40,0,de8a9ecc,d03d8591,d8c4c9e0) at uvm_map_unreference_amap+0x2f
uvm_unmap_detach(d8ac8e50,0,92823000,de8a9f0c,d8c292c0) at uvm_unmap_detach+0x5b
sys_munmap(d8c292c0,de8a9f64,de8a9f84,de8a9fa8,d8c292c0) at sys_munmap+0x15d
syscall() at syscall+0x2d8
--- syscall (number 134217728) ---
0x2:

Am I barking up the wrong tree trying to deduce if I really do have a
hardware problem? I am open to accepting diffs and compiling from source if
other developers think there might be a bug to fix here.

Cheers,
Jeff
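
P.S. For anyone who wants to reproduce the escalating runs described above (a few 30-second passes, then a 90-second one), here is a minimal sketch. It assumes stress is installed and on PATH. STRESS_BIN, run_pass and run_escalating are names made up for this illustration, and the stress flags are copied verbatim from the command line quoted earlier in the thread.

```shell
#!/bin/sh
# Sketch only: repeat the stress invocation from this thread with a list of
# timeouts. STRESS_BIN, run_pass and run_escalating are illustrative names,
# not part of stress itself.

# Assumption: stress is on PATH; override STRESS_BIN to point elsewhere.
STRESS_BIN="${STRESS_BIN:-stress}"

# Run one pass with the flags quoted in the thread and the given timeout
# (for example 30s or 90s).
run_pass() {
    "$STRESS_BIN" --cpu 8 --io 4 --vm 2 -m 5 --vm-bytes 128M --timeout "$1" -v
}

# Run one pass per argument, in order, and stop at the first pass that
# exits non-zero. A banner is printed before each pass.
run_escalating() {
    for t in "$@"; do
        echo "=== stress pass, timeout $t ==="
        if ! run_pass "$t"; then
            echo "pass with timeout $t failed" >&2
            return 1
        fi
    done
    echo "all passes completed"
}
```

Called as `run_escalating 30s 30s 90s`, this prints a banner before each pass, so the last banner on the console shows which timeout was in effect when the machine dropped into ddb.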