>Synopsis:      Regular crashes on rpi4 when running PostgreSQL tests
>Category:      aarch64
>Environment:
        System      : OpenBSD 7.2
        Details     : OpenBSD 7.2-current (GENERIC.MP) #1896: Sat Nov 19 21:38:32 MST 2022
                         dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP

        Architecture: OpenBSD.arm64
        Machine     : arm64
>Description:

When running the PostgreSQL regression tests (using the community buildfarm
tooling) on a Raspberry Pi 4, the system occasionally panics; this happens after
a few hours. The system is significantly slower than rpi4 machines running Linux
(by a factor of ~5x), so the whole test suite would take about 24 hours to
finish, but I have never seen that happen because the machine crashes first.

I suspected this particular rpi4 might be broken, so I booted Linux on it and
ran the same set of tests, and that worked just fine. In fact, it completed
~10 rounds of testing over ~2 days, while on OpenBSD I can't get a single
complete run.

I also suspected a faulty SD card, so I moved the work directory to a USB flash
drive and then to a reliable SSD (connected using a USB/SATA adapter). The SSD
did improve performance somewhat (compared to running from the USB flash
drive), but the panics are still there, unfortunately.

I managed to collect some information for two crashes, following the ddb(4)
manual page (I can try again if more information is needed).

For the first crash I have only the console output:

    Stopped at           panic+0x160      cmp      w21,#0x0
        TID     PID     UID   PFFLAGS    PFLAGS   CPU   COMMAND
    *178534   88804    1000         0         0     2   postgres
     464655   67171    1000         0         0     0   postgres
     470045   34591    1000         0         0     3   postgres
     326421   84018    1000         0         0     3K  postgres
    
    db_enter() at panic+0x15c
    panic() at __assert+0x24
    panic() at uvm_fault_upper_lookup+0x258
    uvm_fault_upper() at uvm_fault+0xec
    uvm_fault() at udata_abort+0x128
    udata_abort() at do_el0_sync+0xdc
    do_el0_sync() at handle_el0_sync+0x74

For the second crash, I have more:

    Stopped at           panic+0x160      cmp      w21,#0x0
        TID     PID     UID   PFFLAGS    PFLAGS   CPU   COMMAND
    *315901   52422    1000         0         0     0   postgres
     286288   16150    1000         0         0     3   postgres
     235152   96037       0   0x14000     0x200     1   zerothread
    
    ddb{0}> bt
    db_enter() at panic+0x15c
    panic() at kdata_abort+0x168
    kdata_abort() at handle_el1h_sync+0x6c
    handle_el1h_sync() at pmap_copy_page+0x98
    pmap_copy_page() at pmap_copy_page+0x98
    pmap_copy_page() at uvm_fault_upper+0x13c
    uvm_fault_upper() at uvm_fault+0xb4
    uvm_fault() at udata_abort+0x128
    udata_abort() at do_el0_sync+0xdc
    do_el0_sync() at handle_el0_sync+0x74
    handle_el0_sync() at 0x1b02613208

    ddb{0}> show uvm
    Current UVM status:
      pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
      967776 VM pages: 44735 active, 183278 inactive, 1 wired, 344603 free (43089 zero)
      min 10% (25) anon, 10% (25) vnode, 5% (12) vtext
      freemin=32259, free-target=43012, inactive-target=74248, wired-max=322592
      faults=87269298, traps=0, intrs=0, ctxswitch=19407000 fpuswitch=8
      softint=20156649, syscalls=124374327, kmapent=21
      fault counts:
        noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
        ok relocks(total)=332129(335474), anget(retries)=35295179(0), amapcopy=8887865
        neighbor anon/obj pg=16711715/60577268, gets(lock/unlock)=18290527/335627
        cases: anon=32635596, anoncow=2659583, obj=16018472, prcopy=2268557, przero=33701122
      daemon and swap counts:
        woke=10195, revs=27, scans=0, obscans=0, anscans=0
        busy=0, freed=0, reactivate=0, deactivate=28030
        pageouts=0, pending=0, nswget=0
        nswapdev=1
        swpages=517387, swpginuse=0, swpgonly=0 paging=0
      kernel pointers:
        objs(kern)=0xffffff80010d6f78
    
    ddb{0}> show bcstats
    Current Buffer Cache status:
    numbufs 88448 busymapped 0, delwri 1783
    kvaslots 2855 avail kva slots 2855
    bufpages 353757, dmapages 45613, dirtypages 7132
    pendingreads 0, pendingwrites 3
    highflips 817351, highflops 0, dmaflips 79223
    
    ddb{0}> show panic
    *cpu: uvm_fault failed: ffffff80008e5274 esr 96000007 far ffffff80022a2338
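
For readers of the panic line above: "esr" is the AArch64 exception syndrome
and "far" is the faulting address. The following is a minimal Python sketch for
decoding the syndrome, assuming the standard ESR_ELx layout (EC in bits 31:26,
IL in bit 25, ISS in bits 24:0). decode_esr and its lookup tables are
hypothetical helpers written for this note, not part of ddb, and they cover only
a few common codes. With that caveat, 0x96000007 decodes as a data abort taken
without a change in exception level (that is, in the kernel), on a read, with a
level 3 translation fault, which is consistent with kdata_abort() appearing in
the second backtrace.

```python
# Decode an AArch64 ESR_ELx value such as the one printed by "show panic".
# Only a small subset of exception classes and fault codes is covered.

EC_NAMES = {
    0x20: "instruction abort from a lower exception level",
    0x21: "instruction abort without a change in exception level",
    0x24: "data abort from a lower exception level",
    0x25: "data abort without a change in exception level",
}

# For aborts, DFSC bits [5:2] select the fault family and bits [1:0] the level.
FAULT_FAMILIES = {
    1: "translation fault",
    2: "access flag fault",
    3: "permission fault",
}


def decode_esr(esr):
    """Split an ESR value into EC, IL, ISS and, for data aborts, WnR/DFSC."""
    ec = (esr >> 26) & 0x3F          # exception class, bits [31:26]
    iss = esr & 0x1FFFFFF            # instruction specific syndrome, bits [24:0]
    out = {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "other"),
        "il": (esr >> 25) & 1,       # instruction length bit
        "iss": iss,
    }
    if ec in (0x24, 0x25):           # data abort: decode WnR and DFSC
        dfsc = iss & 0x3F
        family = FAULT_FAMILIES.get(dfsc >> 2)
        out["write"] = bool((iss >> 6) & 1)   # WnR: True means a write access
        out["dfsc"] = dfsc
        out["fault"] = f"{family}, level {dfsc & 3}" if family else "other"
    return out


# The value from the panic message in this report.
for key, value in decode_esr(0x96000007).items():
    print(f"{key}: {value}")
```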

    # mount
    /dev/sd0a on / type ffs (local)
    /dev/sd0l on /home type ffs (local, nodev, nosuid)
    /dev/sd0d on /tmp type ffs (local, nodev, nosuid)
    /dev/sd0f on /usr type ffs (local, nodev)
    /dev/sd0g on /usr/X11R6 type ffs (local, nodev)
    /dev/sd0h on /usr/local type ffs (local, nodev, wxallowed)
    /dev/sd0k on /usr/obj type ffs (local, nodev, nosuid)
    /dev/sd0j on /usr/src type ffs (local, nodev, nosuid)
    /dev/sd0e on /var type ffs (local, nodev, nosuid)
    /dev/sd1c on /mnt/data type ffs (local, noatime, nodev, nosuid)


>How-To-Repeat:

Get the PostgreSQL buildfarm client installed and configured, per:

    https://buildfarm.postgresql.org/

    https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto

I can provide more detailed instructions/config if needed. Then run the whole
test suite using

    ./run_branches.pl --run-all --nosend --nostatus --verbose

which runs tests on all supported PostgreSQL branches (10-HEAD). I've never seen
the whole run complete.
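
To record how far each attempt gets before a panic, the command above can be
wrapped in a small driver. This is only a sketch: run_rounds, the log file name
and the number of rounds are hypothetical choices made for illustration, and
only the run_branches.pl flags come from this report. Each log line is flushed
and fsync'ed so that it has a chance of surviving a crash.

```python
import os
import subprocess
from datetime import datetime, timezone

# Command line taken from this report; run it from the buildfarm client directory.
CMD = ["./run_branches.pl", "--run-all", "--nosend", "--nostatus", "--verbose"]


def run_rounds(rounds, cmd=CMD, log_path="buildfarm-rounds.log", runner=subprocess.run):
    """Run the test suite `rounds` times and return the list of exit codes."""

    def note(log, message):
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        log.write(f"{stamp} {message}\n")
        log.flush()
        os.fsync(log.fileno())   # push the line to disk before anything else runs

    exit_codes = []
    with open(log_path, "a") as log:
        for i in range(1, rounds + 1):
            note(log, f"round {i} start")
            result = runner(cmd)
            note(log, f"round {i} exit {result.returncode}")
            exit_codes.append(result.returncode)
    return exit_codes
```

Calling run_rounds(10) would attempt ten rounds; after a panic and reboot, a
"start" line without a matching "exit" line marks the round that was cut short.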

>Fix:

No idea.


dmesg:
OpenBSD 7.2-current (GENERIC.MP) #1896: Sat Nov 19 21:38:32 MST 2022
    dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
real mem  = 4124958720 (3933MB)
avail mem = 3963834368 (3780MB)
random: good seed from bootblocks
mainbus0 at root: Raspberry Pi 4 Model B Rev 1.2
cpu0 at mainbus0 mpidr 0: ARM Cortex-A72 r0p3
cpu0: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu0: 1024KB 64b/line 16-way L2 cache
cpu0: CRC32,ASID16
cpu1 at mainbus0 mpidr 1: ARM Cortex-A72 r0p3
cpu1: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu1: 1024KB 64b/line 16-way L2 cache
cpu1: CRC32,ASID16
cpu2 at mainbus0 mpidr 2: ARM Cortex-A72 r0p3
cpu2: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu2: 1024KB 64b/line 16-way L2 cache
cpu2: CRC32,ASID16
cpu3 at mainbus0 mpidr 3: ARM Cortex-A72 r0p3
cpu3: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu3: 1024KB 64b/line 16-way L2 cache
cpu3: CRC32,ASID16
efi0 at mainbus0: UEFI 2.8
efi0: Das U-Boot rev 0x20211000
smbios0 at efi0: SMBIOS 3.0
smbios0: vendor U-Boot version "2021.10" date 10/01/2021
smbios0: Unknown Unknown Product
apm0 at mainbus0
simplefb0 at mainbus0: 1920x1280, 32bpp
wsdisplay0 at simplefb0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
"system" at mainbus0 not configured
"axi" at mainbus0 not configured
simplebus0 at mainbus0: "soc"
bcmclock0 at simplebus0
bcmmbox0 at simplebus0
bcmgpio0 at simplebus0
bcmaux0 at simplebus0
ampintc0 at simplebus0 nirq 256, ncpu 4 ipi: 0, 1, 2: "interrupt-controller"
bcmtmon0 at simplebus0
bcmdmac0 at simplebus0: DMA0 DMA2 DMA4 DMA5 DMA6 DMA7 DMA8 DMA9 DMA10
"timer" at simplebus0 not configured
pluart0 at simplebus0: rev 2, 16 byte fifo
"local_intc" at simplebus0 not configured
bcmdog0 at simplebus0
bcmirng0 at simplebus0
"firmware" at simplebus0 not configured
"power" at simplebus0 not configured
"mailbox" at simplebus0 not configured
sdhc0 at simplebus0
sdhc0: SDHC 3.0, 250 MHz base clock
sdmmc0 at sdhc0: 4-bit, sd high-speed, mmc high-speed
"gpiomem" at simplebus0 not configured
"fb" at simplebus0 not configured
"vcsm" at simplebus0 not configured
"clocks" at mainbus0 not configured
"phy" at mainbus0 not configured
"clk-27M" at mainbus0 not configured
"clk-108M" at mainbus0 not configured
simplebus1 at mainbus0: "emmc2bus"
sdhc1 at simplebus1
sdhc1: SDHC 3.0, 100 MHz base clock
sdmmc1 at sdhc1: 8-bit, sd high-speed, mmc high-speed, ddr52, dma
"arm-pmu" at mainbus0 not configured
agtimer0 at mainbus0: 54000 kHz
simplebus2 at mainbus0: "scb"
bcmpcie0 at simplebus2
pci0 at bcmpcie0
ppb0 at pci0 dev 0 function 0 "Broadcom BCM2711" rev 0x10
pci1 at ppb0 bus 1
xhci0 at pci1 dev 0 function 0 "VIA VL805 xHCI" rev 0x01: intx, xHCI 1.0
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "VIA xHCI root hub" rev 3.00/1.00 addr 1
bse0 at simplebus2: address dc:a6:32:74:f0:2b
brgphy0 at bse0 phy 1: BCM54210E 10/100/1000baseT PHY, rev. 2
"dma" at simplebus2 not configured
"hevc-decoder" at simplebus2 not configured
"rpivid-local-intc" at simplebus2 not configured
"h264-decoder" at simplebus2 not configured
"vp9-decoder" at simplebus2 not configured
gpioleds0 at mainbus0: "led0", "led1"
"sd_io_1v8_reg" at mainbus0 not configured
"sd_vcc_reg" at mainbus0 not configured
"fixedregulator_3v3" at mainbus0 not configured
"fixedregulator_5v0" at mainbus0 not configured
simplebus3 at mainbus0: "v3dbus"
"bootloader" at mainbus0 not configured
scsibus0 at sdmmc1: 2 targets, initiator 0
sd0 at scsibus0 targ 1 lun 0: <SD/MMC, SN32G, 0080> removable
sd0: 30424MB, 512 bytes/sector, 62309376 sectors
uhub1 at uhub0 port 1 configuration 1 interface 0 "VIA Labs USB2.0 Hub" rev 2.10/4.21 addr 2
bwfm0 at sdmmc0 function 1
manufacturer 0x02d0, product 0xa9a6 at sdmmc0 function 2 not configured
manufacturer 0x02d0, product 0xa9a6 at sdmmc0 function 3 not configured
uhidev0 at uhub1 port 4 configuration 1 interface 0 "SINO WEALTH USB KEYBOARD" rev 1.10/1.00 addr 3
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd0 at ukbd0: console keyboard, using wsdisplay0
uhidev1 at uhub1 port 4 configuration 1 interface 1 "SINO WEALTH USB KEYBOARD" rev 1.10/1.00 addr 3
uhidev1: iclass 3/0, 5 report ids
uhid0 at uhidev1 reportid 2: input=1, output=0, feature=0
ucc0 at uhidev1 reportid 3: 24 usages, 13 keys, enum
wskbd1 at ucc0 mux 1
wskbd1: connecting to wsdisplay0
uhid1 at uhidev1 reportid 5: input=0, output=0, feature=5
umass0 at uhub0 port 2 configuration 1 interface 0 "USB SanDisk 3.2Gen1" rev 3.20/1.00 addr 4
umass0: using SCSI over Bulk-Only
scsibus1 at umass0: 2 targets, initiator 0
sd1 at scsibus1 targ 1 lun 0: <USB, SanDisk 3.2Gen1, 1.00> removable serial.078155838107b7ab67f3
sd1: 29340MB, 512 bytes/sector, 60088320 sectors
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0a (9b2ba6937baf4c1f.a) swap on sd0b dump on sd0b
WARNING: / was not properly unmounted
WARNING: CHECK AND RESET THE DATE!
gpio0 at bcmgpio0: 58 pins
bwfm0: address dc:a6:32:74:f0:2c

usbdevs:
Controller /dev/usb0:
addr 01: 1106:0000 VIA, xHCI root hub
         super speed, self powered, config 1, rev 1.00
         driver: uhub0
addr 02: 2109:3431 VIA Labs, USB2.0 Hub
         high speed, self powered, config 1, rev 4.21
         driver: uhub1
addr 03: 258a:0001 SINO WEALTH, USB KEYBOARD
         low speed, power 100 mA, config 1, rev 1.00
         driver: uhidev0
         driver: uhidev1
addr 04: 0781:5583 USB, SanDisk 3.2Gen1
         super speed, power 224 mA, config 1, rev 1.00, iSerial 050160872597eba701145d5d63b2c86e87619ad5b7d72657a7bd19b6a916325ae60a00000000000000000000e5824fefff9c171083558107b7ab67f3
         driver: umass0

pcidump:
Domain /dev/pci0:
 0:0:0: Broadcom BCM2711
        0x0000: Vendor ID: 14e4, Product ID: 2711
        0x0004: Command: 0006, Status: 0010
        0x0008: Class: 06 Bridge, Subclass: 04 PCI,
                Interface: 00, Revision: 10
        0x000c: BIST: 00, Header Type: 01, Latency Timer: 00,
                Cache Line Size: 08
        0x0010: BAR empty (00000000)
        0x0014: BAR empty (00000000)
        0x0018: Primary Bus: 0, Secondary Bus: 1, Subordinate Bus: 1,
                Secondary Latency Timer: 00
        0x001c: I/O Base: 00, I/O Limit: 00, Secondary Status: 0000
        0x0020: Memory Base: c000, Memory Limit: c000
        0x0024: Prefetch Memory Base: 1001, Prefetch Memory Limit: 0001
        0x0028: Prefetch Memory Base Upper 32 Bits: 00000000
        0x002c: Prefetch Memory Limit Upper 32 Bits: 00000000
        0x0030: I/O Base Upper 16 Bits: 0000, I/O Limit Upper 16 Bits: 0000
        0x0038: Expansion ROM Base Address: 00000000
        0x003c: Interrupt Pin: 01, Line: 00, Bridge Control: 0000
        0x0048: Capability 0x01: Power Management
                State: D0
        0x00ac: Capability 0x10: PCI Express
                Max Payload Size: 128 / 512 bytes
                Max Read Request Size: 512 bytes
                Link Speed: 5.0 / 5.0 GT/s
                Link Width: x1 / x1
        0x0100: Enhanced Capability 0x01: Advanced Error Reporting
        0x0180: Enhanced Capability 0x0b: Vendor-Specific
        0x0240: Enhanced Capability 0x1e: L1 PM
        0x0000: 271114e4 00100006 06040010 00010008
        0x0010: 00000000 00000000 00010100 00000000
        0x0020: c000c000 00011001 00000000 00000000
        0x0030: 00000000 00000048 00000000 00000100
        0x0040: 00000000 00000000 4813ac01 00002008
        0x0050: 00000000 00000000 00000000 00000000
        0x0060: 00000000 00000000 00000000 00000000
        0x0070: 00000000 00000000 00000000 00000000
        0x0080: 00000000 00000000 00000000 00000000
        0x0090: 00000000 00000000 00000000 00000000
        0x00a0: 00000000 00000000 00000000 00420010
        0x00b0: 00008002 00002c10 00655c12 90120000
        0x00c0: 00000000 00400000 00010000 00000000
        0x00d0: 0008081f 00000000 80000006 00000002
        0x00e0: 00000000 00000000 00000000 00000000
        0x00f0: 00000000 00000000 00000000 00000000
 1:0:0: VIA VL805 xHCI
        0x0000: Vendor ID: 1106, Product ID: 3483
        0x0004: Command: 0006, Status: 0010
        0x0008: Class: 0c Serial Bus, Subclass: 03 USB,
                Interface: 30, Revision: 01
        0x000c: BIST: 00, Header Type: 00, Latency Timer: 00,
                Cache Line Size: 08
        0x0010: BAR mem 64bit addr: 0x00000000c0000000/0x00001000
        0x0018: BAR empty (00000000)
        0x001c: BAR empty (00000000)
        0x0020: BAR empty (00000000)
        0x0024: BAR empty (00000000)
        0x0028: Cardbus CIS: 00000000
        0x002c: Subsystem Vendor ID: 1106 Product ID: 3483
        0x0030: Expansion ROM Base Address: 00000000
        0x0038: 00000000
        0x003c: Interrupt Pin: 01 Line: 00 Min Gnt: 00 Max Lat: 00
        0x0080: Capability 0x01: Power Management
                State: D0
        0x0090: Capability 0x05: Message Signalled Interrupts (MSI)
                Enabled: no
        0x00c4: Capability 0x10: PCI Express
                Max Payload Size: 128 / 256 bytes
                Max Read Request Size: 512 bytes
                Link Speed: 5.0 / 5.0 GT/s
                Link Width: x1 / x1
        0x0100: Enhanced Capability 0x01: Advanced Error Reporting
        0x0000: 34831106 00100006 0c033001 00000008
        0x0010: c0000004 00000000 00000000 00000000
        0x0020: 00000000 00000000 00000000 34831106
        0x0030: 00000000 00000080 00000000 00000100
        0x0040: 00000000 00000100 39df4009 00000004
        0x0050: 000138a1 00000000 00000000 34831106
        0x0060: 00002030 00000000 00000000 00000000
        0x0070: 00000000 00000000 00000000 00000000
        0x0080: 89c39001 00000000 00000000 00000000
        0x0090: 0084c405 00000000 00000000 00000000
        0x00a0: 00000000 00000000 00000000 00000000
        0x00b0: 00000000 00000000 00000000 00000000
        0x00c0: 00002000 00020010 00008001 00192810
        0x00d0: 00065c12 10120043 00000000 00000000
        0x00e0: 00000000 00000000 00000012 00000000
        0x00f0: 00000000 00010022 00000000 00000000

acpidump:
