>Synopsis:	Regular crashes on rpi4 when running PostgreSQL tests
>Category:	aarch64
>Environment:
	System      : OpenBSD 7.2
	Details     : OpenBSD 7.2-current (GENERIC.MP) #1896: Sat Nov 19 21:38:32 MST 2022
			 dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP

	Architecture: OpenBSD.arm64
	Machine     : arm64
>Description:
	When running the PostgreSQL regression tests (using the community
	buildfarm tooling) on a Raspberry Pi 4, the system occasionally
	panics, typically after a few hours.

	The system is significantly slower than rpi4 machines running Linux
	(by a factor of ~5), so the whole test suite should take about 24
	hours, but I have never seen it finish because the system crashes
	first.

	I suspected this particular rpi4 might be broken, so I booted Linux
	on it and ran the same set of tests, and that worked just fine. It
	completed ~10 rounds of testing over ~2 days, while on OpenBSD I
	can't get a single complete run.

	I also suspected a faulty SD card, so I moved the work directory to
	a USB flash drive and then to a reliable SSD (connected through a
	USB/SATA adapter). The SSD improved performance somewhat (compared
	to the USB flash drive), but the panics are still there.

	I collected the information below for two crashes, following the
	ddb page (I can try again if more information is needed).

	For the first crash I have only the stuff from the console:

	Stopped at      panic+0x160     cmp     w21,#0x0
	    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
	*178534  88804   1000           0          0    2  postgres
	 464655  67171   1000           0          0    0  postgres
	 470045  34591   1000           0          0    3  postgres
	 326421  84018   1000           0          0    3K postgres
	db_enter() at panic+0x15c
	panic() at __assert+0x24
	panic() at uvm_fault_upper_lookup+0x258
	uvm_fault_upper() at uvm_fault+0xec
	uvm_fault() at udata_abort+0x128
	udata_abort() at do_el0_sync+0xdc
	do_el0_sync() at handle_el0_sync+0x74

	For the second crash, I have more:

	Stopped at      panic+0x160     cmp     w21,#0x0
	    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
	*315901  52422   1000           0          0    0  postgres
	 286288  16150   1000           0          0    3  postgres
	 235152  96037      0     0x14000      0x200    1  zerothread

	ddb{0}> bt
	db_enter() at panic+0x15c
	panic() at kdata_abort+0x168
	kdata_abort() at handle_el1h_sync+0x6c
	handle_el1h_sync() at pmap_copy_page+0x98
	pmap_copy_page() at pmap_copy_page+0x98
	pmap_copy_page() at uvm_fault_upper+0x13c
	uvm_fault_upper() at uvm_fault+0xb4
	uvm_fault() at udata_abort+0x128
	udata_abort() at do_el0_sync+0xdc
	do_el0_sync() at handle_el0_sync+0x74
	handle_el0_sync() at 0x1b02613208

	ddb{0}> show uvm
	Current UVM status:
	  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
	  967776 VM pages: 44735 active, 183278 inactive, 1 wired, 344603 free (43089 zero)
	  min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
	  freemin=32259, free-target=43012, inactive-target=74248, wired-max=322592
	  faults=87269298, traps=0, intrs=0, ctxswitch=19407000 fpuswitch=8
	  softint=20156649, syscalls=124374327, kmapent=21
	  fault counts:
	    noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
	    ok relocks(total)=332129(335474), anget(retries)=35295179(0), amapcopy=8887865
	    neighbor anon/obj pg=16711715/60577268, gets(lock/unlock)=18290527/335627
	    cases: anon=32635596, anoncow=2659583, obj=16018472, prcopy=2268557, przero=33701122
	  daemon and swap counts:
	    woke=10195, revs=27, scans=0, obscans=0, anscans=0
	    busy=0, freed=0, reactivate=0, deactivate=28030
	    pageouts=0, pending=0, nswget=0
	    nswapdev=1
	    swpages=517387, swpginuse=0, swpgonly=0 paging=0
	  kernel pointers:
	    objs(kern)=0xffffff80010d6f78

	ddb{0}> show bcstats
	Current Buffer Cache status:
	numbufs 88448 busymapped 0, delwri 1783
	kvaslots 2855 avail kva slots 2855
	bufpages 353757, dmapages 45613, dirtypages 7132
	pendingreads 0, pendingwrites 3
	highflips 817351, highflops 0, dmaflips 79223

	ddb{0}> show panic
	*cpu: uvm_fault failed: ffffff80008e5274 esr 96000007 far ffffff80022a2338

	# mount
	/dev/sd0a on / type ffs (local)
	/dev/sd0l on /home type ffs (local, nodev, nosuid)
	/dev/sd0d on /tmp type ffs (local, nodev, nosuid)
	/dev/sd0f on /usr type ffs (local, nodev)
	/dev/sd0g on /usr/X11R6 type ffs (local, nodev)
	/dev/sd0h on /usr/local type ffs (local, nodev, wxallowed)
	/dev/sd0k on /usr/obj type ffs (local, nodev, nosuid)
	/dev/sd0j on /usr/src type ffs (local, nodev, nosuid)
	/dev/sd0e on /var type ffs (local, nodev, nosuid)
	/dev/sd1c on /mnt/data type ffs (local, noatime, nodev, nosuid)
>How-To-Repeat:
	Get the PostgreSQL buildfarm client installed and configured, per:

	https://buildfarm.postgresql.org/
	https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto

	I can provide more detailed instructions/config if needed.

	Then run the whole test suite using

	./run_branches.pl --run-all --nosend --nostatus --verbose

	which runs the tests on all supported PostgreSQL branches (10-HEAD).
	I've never seen the whole run complete.
>Fix:
	No idea.

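For anyone reading the "show panic" line from the second crash, the esr value can be split into its fields by hand. The following is a minimal sketch, not part of the captured output; it assumes the standard ARMv8-A ESR_EL1 layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for data aborts WnR in ISS bit 6 and DFSC in ISS bits 5:0). The function name decode_esr and the two lookup tables are illustrative only and list just the codes relevant here.

```python
# Decode an AArch64 ESR_EL1 value such as the "esr 96000007" from "show panic".
# Assumed layout (ARMv8-A): EC = bits [31:26], IL = bit 25, ISS = bits [24:0];
# for data aborts, WnR = ISS bit 6 and DFSC = ISS bits [5:0].

# Hypothetical, deliberately incomplete tables: only codes relevant to this report.
EC_NAMES = {
    0x24: "data abort from a lower exception level",
    0x25: "data abort taken without a change in exception level",
}
DFSC_NAMES = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}


def decode_esr(esr):
    """Split a 32-bit ESR value into the fields used for data aborts."""
    ec = (esr >> 26) & 0x3F      # exception class
    il = (esr >> 25) & 0x1       # instruction length bit
    iss = esr & 0x1FFFFFF        # instruction specific syndrome
    wnr = (iss >> 6) & 0x1       # 1 = write, 0 = read
    dfsc = iss & 0x3F            # data fault status code
    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "unknown"),
        "il": il,
        "wnr": wnr,
        "dfsc": dfsc,
        "dfsc_name": DFSC_NAMES.get(dfsc, "unknown"),
    }


if __name__ == "__main__":
    # Value copied from the "show panic" output of the second crash.
    for field, value in decode_esr(0x96000007).items():
        print(f"{field}: {value}")
```

For 0x96000007 this gives EC 0x25, WnR 0 (a read) and DFSC 0x07 (translation fault, level 3). That reads as a kernel-mode read of an address without a valid mapping, which would be consistent with the fault being reported under pmap_copy_page() in the backtrace, though it does not by itself say why the mapping was missing.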
dmesg:
OpenBSD 7.2-current (GENERIC.MP) #1896: Sat Nov 19 21:38:32 MST 2022
    dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
real mem  = 4124958720 (3933MB)
avail mem = 3963834368 (3780MB)
random: good seed from bootblocks
mainbus0 at root: Raspberry Pi 4 Model B Rev 1.2
cpu0 at mainbus0 mpidr 0: ARM Cortex-A72 r0p3
cpu0: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu0: 1024KB 64b/line 16-way L2 cache
cpu0: CRC32,ASID16
cpu1 at mainbus0 mpidr 1: ARM Cortex-A72 r0p3
cpu1: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu1: 1024KB 64b/line 16-way L2 cache
cpu1: CRC32,ASID16
cpu2 at mainbus0 mpidr 2: ARM Cortex-A72 r0p3
cpu2: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu2: 1024KB 64b/line 16-way L2 cache
cpu2: CRC32,ASID16
cpu3 at mainbus0 mpidr 3: ARM Cortex-A72 r0p3
cpu3: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu3: 1024KB 64b/line 16-way L2 cache
cpu3: CRC32,ASID16
efi0 at mainbus0: UEFI 2.8
efi0: Das U-Boot rev 0x20211000
smbios0 at efi0: SMBIOS 3.0
smbios0: vendor U-Boot version "2021.10" date 10/01/2021
smbios0: Unknown Unknown Product
apm0 at mainbus0
simplefb0 at mainbus0: 1920x1280, 32bpp
wsdisplay0 at simplefb0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
"system" at mainbus0 not configured
"axi" at mainbus0 not configured
simplebus0 at mainbus0: "soc"
bcmclock0 at simplebus0
bcmmbox0 at simplebus0
bcmgpio0 at simplebus0
bcmaux0 at simplebus0
ampintc0 at simplebus0 nirq 256, ncpu 4 ipi: 0, 1, 2: "interrupt-controller"
bcmtmon0 at simplebus0
bcmdmac0 at simplebus0: DMA0 DMA2 DMA4 DMA5 DMA6 DMA7 DMA8 DMA9 DMA10
"timer" at simplebus0 not configured
pluart0 at simplebus0: rev 2, 16 byte fifo
"local_intc" at simplebus0 not configured
bcmdog0 at simplebus0
bcmirng0 at simplebus0
"firmware" at simplebus0 not configured
"power" at simplebus0 not configured
"mailbox" at simplebus0 not configured
sdhc0 at simplebus0
sdhc0: SDHC 3.0, 250 MHz base clock
sdmmc0 at sdhc0: 4-bit, sd high-speed, mmc high-speed
"gpiomem" at simplebus0 not configured
"fb" at simplebus0 not configured
"vcsm" at simplebus0 not configured
"clocks" at mainbus0 not configured
"phy" at mainbus0 not configured
"clk-27M" at mainbus0 not configured
"clk-108M" at mainbus0 not configured
simplebus1 at mainbus0: "emmc2bus"
sdhc1 at simplebus1
sdhc1: SDHC 3.0, 100 MHz base clock
sdmmc1 at sdhc1: 8-bit, sd high-speed, mmc high-speed, ddr52, dma
"arm-pmu" at mainbus0 not configured
agtimer0 at mainbus0: 54000 kHz
simplebus2 at mainbus0: "scb"
bcmpcie0 at simplebus2
pci0 at bcmpcie0
ppb0 at pci0 dev 0 function 0 "Broadcom BCM2711" rev 0x10
pci1 at ppb0 bus 1
xhci0 at pci1 dev 0 function 0 "VIA VL805 xHCI" rev 0x01: intx, xHCI 1.0
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "VIA xHCI root hub" rev 3.00/1.00 addr 1
bse0 at simplebus2: address dc:a6:32:74:f0:2b
brgphy0 at bse0 phy 1: BCM54210E 10/100/1000baseT PHY, rev. 2
"dma" at simplebus2 not configured
"hevc-decoder" at simplebus2 not configured
"rpivid-local-intc" at simplebus2 not configured
"h264-decoder" at simplebus2 not configured
"vp9-decoder" at simplebus2 not configured
gpioleds0 at mainbus0: "led0", "led1"
"sd_io_1v8_reg" at mainbus0 not configured
"sd_vcc_reg" at mainbus0 not configured
"fixedregulator_3v3" at mainbus0 not configured
"fixedregulator_5v0" at mainbus0 not configured
simplebus3 at mainbus0: "v3dbus"
"bootloader" at mainbus0 not configured
scsibus0 at sdmmc1: 2 targets, initiator 0
sd0 at scsibus0 targ 1 lun 0: <SD/MMC, SN32G, 0080> removable
sd0: 30424MB, 512 bytes/sector, 62309376 sectors
uhub1 at uhub0 port 1 configuration 1 interface 0 "VIA Labs USB2.0 Hub" rev 2.10/4.21 addr 2
bwfm0 at sdmmc0 function 1
manufacturer 0x02d0, product 0xa9a6 at sdmmc0 function 2 not configured
manufacturer 0x02d0, product 0xa9a6 at sdmmc0 function 3 not configured
uhidev0 at uhub1 port 4 configuration 1 interface 0 "SINO WEALTH USB KEYBOARD" rev 1.10/1.00 addr 3
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd0 at ukbd0: console keyboard, using wsdisplay0
uhidev1 at uhub1 port 4 configuration 1 interface 1 "SINO WEALTH USB KEYBOARD" rev 1.10/1.00 addr 3
uhidev1: iclass 3/0, 5 report ids
uhid0 at uhidev1 reportid 2: input=1, output=0, feature=0
ucc0 at uhidev1 reportid 3: 24 usages, 13 keys, enum
wskbd1 at ucc0 mux 1
wskbd1: connecting to wsdisplay0
uhid1 at uhidev1 reportid 5: input=0, output=0, feature=5
umass0 at uhub0 port 2 configuration 1 interface 0 "USB SanDisk 3.2Gen1" rev 3.20/1.00 addr 4
umass0: using SCSI over Bulk-Only
scsibus1 at umass0: 2 targets, initiator 0
sd1 at scsibus1 targ 1 lun 0: <USB, SanDisk 3.2Gen1, 1.00> removable serial.078155838107b7ab67f3
sd1: 29340MB, 512 bytes/sector, 60088320 sectors
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0a (9b2ba6937baf4c1f.a) swap on sd0b dump on sd0b
WARNING: / was not properly unmounted
WARNING: CHECK AND RESET THE DATE!
gpio0 at bcmgpio0: 58 pins
bwfm0: address dc:a6:32:74:f0:2c

usbdevs:
Controller /dev/usb0:
addr 01: 1106:0000 VIA, xHCI root hub
	 super speed, self powered, config 1, rev 1.00
	 driver: uhub0
addr 02: 2109:3431 VIA Labs, USB2.0 Hub
	 high speed, self powered, config 1, rev 4.21
	 driver: uhub1
addr 03: 258a:0001 SINO WEALTH, USB KEYBOARD
	 low speed, power 100 mA, config 1, rev 1.00
	 driver: uhidev0
	 driver: uhidev1
addr 04: 0781:5583 USB, SanDisk 3.2Gen1
	 super speed, power 224 mA, config 1, rev 1.00, iSerial 050160872597eba701145d5d63b2c86e87619ad5b7d72657a7bd19b6a916325ae60a00000000000000000000e5824fefff9c171083558107b7ab67f3
	 driver: umass0

pcidump:
Domain /dev/pci0:
 0:0:0: Broadcom BCM2711
	0x0000: Vendor ID: 14e4, Product ID: 2711
	0x0004: Command: 0006, Status: 0010
	0x0008: Class: 06 Bridge, Subclass: 04 PCI, Interface: 00, Revision: 10
	0x000c: BIST: 00, Header Type: 01, Latency Timer: 00, Cache Line Size: 08
	0x0010: BAR empty (00000000)
	0x0014: BAR empty (00000000)
	0x0018: Primary Bus: 0, Secondary Bus: 1, Subordinate Bus: 1, Secondary Latency Timer: 00
	0x001c: I/O Base: 00, I/O Limit: 00, Secondary Status: 0000
	0x0020: Memory Base: c000, Memory Limit: c000
	0x0024: Prefetch Memory Base: 1001, Prefetch Memory Limit: 0001
	0x0028: Prefetch Memory Base Upper 32 Bits: 00000000
	0x002c: Prefetch Memory Limit Upper 32 Bits: 00000000
	0x0030: I/O Base Upper 16 Bits: 0000, I/O Limit Upper 16 Bits: 0000
	0x0038: Expansion ROM Base Address: 00000000
	0x003c: Interrupt Pin: 01, Line: 00, Bridge Control: 0000
	0x0048: Capability 0x01: Power Management
		State: D0
	0x00ac: Capability 0x10: PCI Express
		Max Payload Size: 128 / 512 bytes
		Max Read Request Size: 512 bytes
		Link Speed: 5.0 / 5.0 GT/s
		Link Width: x1 / x1
	0x0100: Enhanced Capability 0x01: Advanced Error Reporting
	0x0180: Enhanced Capability 0x0b: Vendor-Specific
	0x0240: Enhanced Capability 0x1e: L1 PM
	0x0000: 271114e4 00100006 06040010 00010008
	0x0010: 00000000 00000000 00010100 00000000
	0x0020: c000c000 00011001 00000000 00000000
	0x0030: 00000000 00000048 00000000 00000100
	0x0040: 00000000 00000000 4813ac01 00002008
	0x0050: 00000000 00000000 00000000 00000000
	0x0060: 00000000 00000000 00000000 00000000
	0x0070: 00000000 00000000 00000000 00000000
	0x0080: 00000000 00000000 00000000 00000000
	0x0090: 00000000 00000000 00000000 00000000
	0x00a0: 00000000 00000000 00000000 00420010
	0x00b0: 00008002 00002c10 00655c12 90120000
	0x00c0: 00000000 00400000 00010000 00000000
	0x00d0: 0008081f 00000000 80000006 00000002
	0x00e0: 00000000 00000000 00000000 00000000
	0x00f0: 00000000 00000000 00000000 00000000
 1:0:0: VIA VL805 xHCI
	0x0000: Vendor ID: 1106, Product ID: 3483
	0x0004: Command: 0006, Status: 0010
	0x0008: Class: 0c Serial Bus, Subclass: 03 USB, Interface: 30, Revision: 01
	0x000c: BIST: 00, Header Type: 00, Latency Timer: 00, Cache Line Size: 08
	0x0010: BAR mem 64bit addr: 0x00000000c0000000/0x00001000
	0x0018: BAR empty (00000000)
	0x001c: BAR empty (00000000)
	0x0020: BAR empty (00000000)
	0x0024: BAR empty (00000000)
	0x0028: Cardbus CIS: 00000000
	0x002c: Subsystem Vendor ID: 1106 Product ID: 3483
	0x0030: Expansion ROM Base Address: 00000000
	0x0038: 00000000
	0x003c: Interrupt Pin: 01 Line: 00 Min Gnt: 00 Max Lat: 00
	0x0080: Capability 0x01: Power Management
		State: D0
	0x0090: Capability 0x05: Message Signalled Interrupts (MSI)
		Enabled: no
	0x00c4: Capability 0x10: PCI Express
		Max Payload Size: 128 / 256 bytes
		Max Read Request Size: 512 bytes
		Link Speed: 5.0 / 5.0 GT/s
		Link Width: x1 / x1
	0x0100: Enhanced Capability 0x01: Advanced Error Reporting
	0x0000: 34831106 00100006 0c033001 00000008
	0x0010: c0000004 00000000 00000000 00000000
	0x0020: 00000000 00000000 00000000 34831106
	0x0030: 00000000 00000080 00000000 00000100
	0x0040: 00000000 00000100 39df4009 00000004
	0x0050: 000138a1 00000000 00000000 34831106
	0x0060: 00002030 00000000 00000000 00000000
	0x0070: 00000000 00000000 00000000 00000000
	0x0080: 89c39001 00000000 00000000 00000000
	0x0090: 0084c405 00000000 00000000 00000000
	0x00a0: 00000000 00000000 00000000 00000000
	0x00b0: 00000000 00000000 00000000 00000000
	0x00c0: 00002000 00020010 00008001 00192810
	0x00d0: 00065c12 10120043 00000000 00000000
	0x00e0: 00000000 00000000 00000012 00000000
	0x00f0: 00000000 00010022 00000000 00000000

acpidump: