On 2025/01/14 04:00, Radek wrote:
> There were 3m null-modem cables connected to both APUs; the APU4's cable
> also had an RS232/USB adapter.
> APUs have a fixed console baud rate of 115200 and I didn't find a way to
> change it to a lower speed.
You can still set the OpenBSD side to a lower speed; it just means
switching the terminal speed if you want to access the BIOS. (I am very
surprised, though: I was convinced you could change this, but I don't see
a way to do it without rebuilding the firmware. Perhaps that was the ALIX.)
> I'm testing only APU4 now. I disconnected the null-modem cable and set
> ddb.console to 0.
> After a few hours APU4 dropped to ddb again:
>
> ddb{2}> show panic
> the kernel did not panic
Was there any output before the ddb{2} prompt?
> ddb{2}> trace
> sched_steal_proc(ffff80002d4b7ff0) at sched_steal_proc+0x11c
> sched_chooseproc() at sched_chooseproc+0x1aa
That seems strange.
Is everything OK with cooling? Power?
> mi_switch() at mi_switch+0x1e5
> sched_peg_curproc(ffff80002d4c0ff0) at sched_peg_curproc+0x67
> cpu_hz_update_sensor(ffff80002d4c0ff0) at cpu_hz_update_sensor+0x15
> sensor_task_work(ffff800000030a00) at sensor_task_work+0x51
> taskq_thread(ffff80000008db80) at taskq_thread+0x129
> end trace frame: 0x0, count: -7
> ddb{2}> show register
> rdi 0x1000 __ALIGN_SIZE
> rsi 0x7dc0 __ALIGN_SIZE+0x6dc0
> rbp 0xffff80002d695900
> rbx 0
> rdx 0x394dc21 __kernel_phys_end+0xf4dc21
> rcx 0
> rax 0xc
> r8 0xf627043 __kernel_phys_end+0xcc27043
> r9 0x5e42f67f
> r10 0xcc3bd7032b4f63e
> r11 0x63a9870a5e938412
> r12 0xffff80002d4c0ff0
> r13 0x7fffffff
> r14 0xffff80002d4b7ff0
> r15 0
> rip 0xffffffff81e8636c sched_steal_proc+0x11c
> cs 0x8
> rflags 0x10206 __ALIGN_SIZE+0xf206
> rsp 0xffff80002d6958c0
> ss 0x10
> sched_steal_proc+0x11c: cdqe
>
> ddb{2}> ps
> PID TID PPID UID S FLAGS WAIT COMMAND
> 41911 146814 1 0 3 0x100083 ttyin getty
> 5762 405322 1 0 3 0x100098 kqread cron
> 73181 284842 1 0 3 0x80 ugenrintr apcupsd
> 73181 270787 1 0 3 0x4000088 sigwait apcupsd
> 73181 486672 1 0 3 0x4000080 netacc apcupsd
> 58439 119507 1 99 3 0x1100090 kqread sndiod
> 69152 431640 1 110 3 0x100090 kqread sndiod
> 31588 226733 19541 95 3 0x1100092 kqread smtpd
> 34359 221468 19541 103 3 0x1100092 kqread smtpd
> 98195 498132 19541 95 3 0x1100092 kqread smtpd
> 86017 459136 19541 95 3 0x100092 kqread smtpd
> 70895 101640 19541 95 3 0x1100092 kqread smtpd
> 93103 373510 19541 95 3 0x1100092 kqread smtpd
> 19541 363543 1 0 3 0x100080 kqread smtpd
> 19263 467127 1 77 3 0x1100090 kqread dhcpd
> 96610 325819 1 0 3 0x88 kqread sshd
> 46440 163929 87714 68 3 0x1000090 kqread isakmpd
> 87714 108971 1 0 3 0x80 sbwait isakmpd
> 73323 396657 1 0 3 0x100080 kqread ntpd
> 38281 201772 34209 83 3 0x100092 kqread ntpd
> 34209 396498 1 83 3 0x1100092 kqread ntpd
> 96977 422652 1 53 3 0x1000090 kqread unbound
> 79026 198934 37215 73 3 0x1100090 kqread syslogd
> 37215 230033 1 0 3 0x100082 sbwait syslogd
> 19526 197700 1 0 3 0x100080 kqread resolvd
> 59258 127015 28251 77 3 0x100092 kqread dhcpleased
> 61342 136779 28251 77 3 0x100092 kqread dhcpleased
> 28251 416947 1 0 3 0x80 kqread dhcpleased
> 3370 165413 49206 115 3 0x100092 kqread slaacd
> 64831 78796 49206 115 3 0x100092 kqread slaacd
> 49206 464326 1 0 3 0x100080 kqread slaacd
> 94171 226931 0 0 3 0x14200 bored smr
> 89428 262409 0 0 3 0x14200 pgzero zerothread
> 52956 245859 0 0 3 0x14200 aiodoned aiodoned
> 54747 256091 0 0 3 0x14200 syncer update
> 4892 59507 0 0 3 0x14200 cleaner cleaner
> 82718 198935 0 0 3 0x14200 reaper reaper
> 21459 261399 0 0 3 0x14200 pgdaemon pagedaemon
> 41174 416209 0 0 3 0x14200 mmctsk sdmmc0
> 69111 190214 0 0 3 0x14200 usbtsk usbtask
> 13632 51893 0 0 3 0x14200 usbatsk usbatsk
> 43371 179039 0 0 3 0x40014200 acpi0 acpi0
> 98806 21031 0 0 7 0x40014200 idle3
> 86373 483372 0 0 3 0x40014200 idle2
> 78455 458933 0 0 7 0x40014200 idle1
> *13993 484783 0 0 2 0x40014200 sensors
> 62636 436251 0 0 3 0x14200 bored softnet3
> 56088 519338 0 0 3 0x14200 bored softnet2
> 67073 169850 0 0 3 0x14200 bored softnet1
> 18689 250204 0 0 3 0x14200 bored softnet0
> 15311 500938 0 0 3 0x14200 bored systqmp
> 9702 36446 0 0 3 0x14200 bored systq
> 75771 412492 0 0 3 0x14200 tmoslp softclockmp
> 81164 300625 0 0 3 0x40014200 tmoslp softclock
> 55664 33044 0 0 7 0x40014200 idle0
> 1 504540 0 0 3 0x82 wait init
> 0 0 -1 0 3 0x10200 scheduler swapper
> ddb{2}> mach ddbcpu 0
> Stopped at x86_ipi_db+0x16: leave
> ddb{0}> mach ddbcpu 1
> Stopped at x86_ipi_db+0x16: leave
> ddb{1}> mach ddbcpu 2
> Stopped at sched_steal_proc+0x11c: cdqe
> ddb{2}> mach ddbcpu 3
> Stopped at x86_ipi_db+0x16: leave
>
> ddb{3}> dmesg
> OpenBSD 7.6 (GENERIC.MP) #0: Thu Jan 9 07:32:40 MST 2025
>     [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 4259897344 (4062MB)
> avail mem = 4107575296 (3917MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xcfe92040 (13 entries)
> bios0: vendor coreboot version "v4.17.0.1" date 06/22/2022
> bios0: PC Engines apu4
> acpi0 at bios0: ACPI 6.0
> acpi0: sleep states S0 S1 S4 S5
> acpi0: tables DSDT FACP SSDT MCFG TPM2 APIC HEST SSDT SSDT DRTM HPET
> acpi0: wakeup devices PBR4(S4) PBR5(S4) PBR6(S4) PBR7(S4) PBR8(S4) UOH1(S3) UOH2(S3) UOH3(S3) UOH4(S3) UOH5(S3) UOH6(S3) XHC0(S4)
> acpitimer0 at acpi0: 3579545 Hz, 32 bits
> acpimcfg0 at acpi0
> acpimcfg0: addr 0xf8000000, bus 0-63
> acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD GX-412TC SOC, 998.18 MHz, 16-30-01, patch 07030105
> cpu0: cpuid 1 edx=178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT> ecx=36d8220b<SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C>
> cpu0: cpuid 6 eax=4<ARAT> ecx=1<EFFFREQ>
> cpu0: cpuid 7.0 ebx=8<BMI1>
> cpu0: cpuid d.1 eax=1<XSAVEOPT>
> cpu0: cpuid 80000001 edx=2fd3fbff<NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG> ecx=1d4037ff<LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PERFTSC,PCTRL3>
> cpu0: cpuid 80000007 edx=33d9<HWPSTATE,ITSC>
> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 2-way I-cache, 2MB 64b/line 16-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: AMD GX-412TC SOC, 998.24 MHz, 16-30-01, patch 07030105
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: AMD GX-412TC SOC, 998.33 MHz, 16-30-01, patch 07030105
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 3 (application processor)
> cpu3: AMD GX-412TC SOC, 998.52 MHz, 16-30-01, patch 07030105
> cpu3: smt 0, core 3, package 0
> ioapic0 at mainbus0: apid 4 pa 0xfec00000, version 21, 24 pins
> ioapic1 at mainbus0: apid 5 pa 0xfec20000, version 21, 32 pins
> acpihpet0 at acpi0: 14318180 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 1 (PBR4)
> acpiprt2 at acpi0: bus 2 (PBR5)
> acpiprt3 at acpi0: bus 3 (PBR6)
> acpiprt4 at acpi0: bus 4 (PBR7)
> acpiprt5 at acpi0: bus -1 (PBR8)
> acpicpu0 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> acpicpu1 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> acpicpu2 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> acpicpu3 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> acpipci0 at acpi0 PCI0: 0x00000000 0x00000011 0x00000001
> acpicmos0 at acpi0
> com0 at acpi0 COM1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
> com0: console
> com1 at acpi0 COM2 addr 0x2f8/0x8 irq 3: ns16550a, 16 byte fifo
> amdgpio0 at acpi0 GPIO uid 0 addr 0xfed81500/0x300 irq 7, 184 pins
> "PRP0001" at acpi0 not configured
> "PRP0001" at acpi0 not configured
> "PRP0001" at acpi0 not configured
> "PRP0001" at acpi0 not configured
> "PRP0001" at acpi0 not configured
> "PRP0001" at acpi0 not configured
> "BOOT0000" at acpi0 not configured
> acpitz0 at acpi0: critical temperature is 115 degC
> cpu0: 998 MHz: speeds: 1000 800 600 MHz
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 "AMD 16h Root Complex" rev 0x00
> vendor "AMD", unknown product 0x1567 (class system subclass IOMMU, rev 0x00) at pci0 dev 0 function 2 not configured
> pchb1 at pci0 dev 2 function 0 "AMD 16h Host" rev 0x00
> ppb0 at pci0 dev 2 function 1 "AMD 16h PCIE" rev 0x00: msi
> pci1 at ppb0 bus 1
> em0 at pci1 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e4
> ppb1 at pci0 dev 2 function 2 "AMD 16h PCIE" rev 0x00: msi
> pci2 at ppb1 bus 2
> em1 at pci2 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e5
> ppb2 at pci0 dev 2 function 3 "AMD 16h PCIE" rev 0x00: msi
> pci3 at ppb2 bus 3
> em2 at pci3 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e6
> ppb3 at pci0 dev 2 function 4 "AMD 16h PCIE" rev 0x00: msi
> pci4 at ppb3 bus 4
> em3 at pci4 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e7
> ccp0 at pci0 dev 8 function 0 "AMD 16h Crypto" rev 0x00: msix
> xhci0 at pci0 dev 16 function 0 "AMD Bolton xHCI" rev 0x11: msix, xHCI 1.0
> usb0 at xhci0: USB revision 3.0
> uhub0 at usb0 configuration 1 interface 0 "AMD xHCI root hub" rev 3.00/1.00 addr 1
> ahci0 at pci0 dev 17 function 0 "AMD Hudson-2 SATA" rev 0x40: apic 4 int 19, AHCI 1.3
> ahci0: port 0: 6.0Gb/s
> scsibus1 at ahci0: 32 targets
> sd0 at scsibus1 targ 0 lun 0: <ATA, Hoodisk SSD, SBFM> t10.ATA_Hoodisk_SSD_L7DTC7A11208345_
> sd0: 15272MB, 512 bytes/sector, 31277232 sectors, thin
> ehci0 at pci0 dev 18 function 0 "AMD Hudson-2 USB2" rev 0x39: apic 4 int 18
> usb1 at ehci0: USB revision 2.0
> uhub1 at usb1 configuration 1 interface 0 "AMD EHCI root hub" rev 2.00/1.00 addr 1
> ehci1 at pci0 dev 19 function 0 "AMD Hudson-2 USB2" rev 0x39: apic 4 int 18
> usb2 at ehci1: USB revision 2.0
> uhub2 at usb2 configuration 1 interface 0 "AMD EHCI root hub" rev 2.00/1.00 addr 1
> piixpm0 at pci0 dev 20 function 0 "AMD Hudson-2 SMBus" rev 0x42: SMI
> iic0 at piixpm0
> iic1 at piixpm0
> iic1: addr 0x4c 3e=00 48=00 4a=00 4e=00 fc=00 fe=00 words 00=ffff 01=ffff 02=ffff 03=ffff 04=ffff 05=ffff 06=ffff 07=ffff
> pcib0 at pci0 dev 20 function 3 "AMD Hudson-2 LPC" rev 0x11
> sdhc0 at pci0 dev 20 function 7 "AMD Bolton SD/MMC" rev 0x01: apic 4 int 16
> sdhc0: SDHC 2.00, 50 MHz base clock
> sdmmc0 at sdhc0: 4-bit, sd high-speed, mmc high-speed, dma
> pchb2 at pci0 dev 24 function 0 "AMD 16h Link Cfg" rev 0x00
> pchb3 at pci0 dev 24 function 1 "AMD 16h Address Map" rev 0x00
> pchb4 at pci0 dev 24 function 2 "AMD 16h DRAM Cfg" rev 0x00
> km0 at pci0 dev 24 function 3 "AMD 16h Misc Cfg" rev 0x00
> pchb5 at pci0 dev 24 function 4 "AMD 16h CPU Power" rev 0x00
> pchb6 at pci0 dev 24 function 5 "AMD 16h Misc Cfg" rev 0x00
> isa0 at pcib0
> isadma0 at isa0
> com2 at isa0 port 0x3e8/8 irq 5: ns16550a, 16 byte fifo
> pcppi0 at isa0 port 0x61
> spkr0 at pcppi0
> lpt0 at isa0 port 0x378/4 irq 7
> intr_establish: pic ioapic0 pin 7: can't share type 3 with 2
> wbsio0 at isa0 port 0x2e/2: NCT5104D rev 0x53
> vmm0 at mainbus0: SVM/RVI
> ugen0 at uhub0 port 3 "American Power Conversion Back-UPS CS 350 FW:807.q10 .I USB FW:q10" rev 1.10/0.06 addr 2
> uhub3 at uhub1 port 1 configuration 1 interface 0 "Advanced Micro Devices Hub" rev 2.00/0.18 addr 2
> uhub4 at uhub2 port 1 configuration 1 interface 0 "Advanced Micro Devices Hub" rev 2.00/0.18 addr 2
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> root on sd0a (cbb37b39d1463c87.a) swap on sd0b dump on sd0b
>
>
> On Mon, 13 Jan 2025 11:53:11 +0000
> Stuart Henderson <[email protected]> wrote:
>
> > On 2025/01/13 11:53, Stefan Sperling wrote:
> > > On Sun, Jan 12, 2025 at 09:35:03PM +0100, Radek wrote:
> > > > Hi,
> > > > I have two fresh installs of 7.6/amd64 as a router/gateway on APU2 and
> > > > APU4. There is a site-to-site IPsec tunnel between them with ~30Mbps
> > > > of continuous traffic. The boxes usually drop into ddb (no kernel panic)
> > > > within a few hours of boot.
> > > >
> > > > I attached dmesgs and ddb console outputs of the boxes.
> > > >
> > > > ### APU2
> > > > ddb{0}> show panic
> > > > the kernel did not panic
> > > >
> > > > ddb{0}> trace
> > > > db_enter() at db_enter+0x14
> > > > comintr(ffff800000098000) at comintr+0x33e
> > ^^
> > >
> > > This looks like sysctl ddb.console is set to 1, and then something
> > > causes a "break" to appear on the serial port which triggers ddb.
> >
> > yes, that is a classic "break" trace.
> >
> > > > rdx 0x3f8
> >
> > + there's your serial port :)
> >
> > Things you can try:
> >
> > - if you have a cable connected to the APU but unplugged at the other
> > end, either try disconnecting it, or plug it in to something
> >
> > - check for a loose connection/intermittent short inside the cable
> >
> > - if it's a long cable, try a shorter one
> >
> > - lower the console port speed
> >
> > To send 'break' you hold the line in the 'space' or 'logic 0' condition
> > for longer than the time to transmit a valid character (including
> > start/stop/any parity bits) at the current bitrate.
> >
> > This is detected by the UART on the receiving system; e.g. here is an
> > excerpt from TI's datasheet for the 16550 UART:
> >
> > "Bit 4: This bit is the Break Interrupt (BI) indicator. Bit 4 is set
> > to a logic 1 whenever the received data input is held in the Spacing
> > (logic 0) state for longer than a full word transmission time (that
> > is, the total time of Start bit + data bits + Parity + Stop bits)."
> >
> > With a standard 8n1 setting, at 115200 that's "longer than about 86
> > microseconds" and at 9600 it's "longer than about 1ms".
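As a minimal sketch of that arithmetic, assuming 8N1 framing (1 start + 8 data + no parity + 1 stop = 10 bit times), the break threshold is simply one word time at the line speed. The helper name `frame_time_us` is made up for illustration and is not part of any library.

```python
def frame_time_us(baud: int, data_bits: int = 8, parity_bits: int = 0,
                  stop_bits: int = 1) -> float:
    """Time to send one asynchronous word (start + data + parity + stop), in us.

    Per the 16550 datasheet wording, the receiver flags a break when the
    line is held at space (logic 0) for longer than this.
    """
    bits = 1 + data_bits + parity_bits + stop_bits  # 8N1 -> 10 bit times
    return bits * 1_000_000 / baud


if __name__ == "__main__":
    for baud in (115200, 57600, 38400, 19200, 9600):
        print(f"{baud:>6} baud: break threshold is about {frame_time_us(baud):7.1f} us")
```

Running it prints about 86.8 us for 115200 and about 1041.7 us for 9600, i.e. the "about 86 microseconds" and "about 1ms" figures; each halving of the speed doubles the glitch length needed to fake a break.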
> >
> > So at higher speeds, either quite a short glitch, or a single character
> > sent by a device connected to the port at a slower speed (e.g. 9600),
> > can be enough to trigger it.
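A rough Python model of the mismatched-speed case, offered as a sketch under stated assumptions: it treats a break as any space period longer than one full 8N1 word at the receiver's speed (the criterion in the TI excerpt quoted above) and ignores the mid-bit sampling details of a real 16550. `longest_space_us` and `looks_like_break` are hypothetical helpers for illustration only.

```python
# Simplified model (assumption): a break is flagged whenever the line stays at
# space (logic 0) longer than one full word time at the receiver's speed.
# Real UARTs sample mid-bit and may differ at the margins.

def frame_time_us(baud: int, data_bits: int = 8, parity_bits: int = 0,
                  stop_bits: int = 1) -> float:
    """Duration of one asynchronous word (start + data + parity + stop), in us."""
    return (1 + data_bits + parity_bits + stop_bits) * 1_000_000 / baud


def longest_space_us(byte: int, baud: int) -> float:
    """Longest continuous space period when `byte` is sent as 8N1 at `baud`.

    Bits on the wire: start bit (0), eight data bits LSB first, stop bit (1).
    """
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    longest = run = 0
    for bit in bits:
        run = run + 1 if bit == 0 else 0
        longest = max(longest, run)
    return longest * 1_000_000 / baud


def looks_like_break(byte: int, sender_baud: int, receiver_baud: int) -> bool:
    """True if the sender's longest space outlasts a full word at the receiver."""
    return longest_space_us(byte, sender_baud) > frame_time_us(receiver_baud)


if __name__ == "__main__":
    samples = {"'U' (0x55)": 0x55, "CR (0x0d)": 0x0D, "NUL (0x00)": 0x00}
    for receiver in (115200, 57600, 9600):
        for name, byte in samples.items():
            verdict = looks_like_break(byte, 9600, receiver)
            print(f"sent at 9600, console at {receiver:>6}: {name:<11} break={verdict}")
```

Under this model, any character sent at 9600 trips a 115200 console, because the 9600 start bit alone (about 104 microseconds) outlasts a whole 115200 word (about 86.8 microseconds). At 57600 only characters with runs of zero bits, such as CR or NUL, do, and at a matched 9600 none of the three does.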
> >
> > In particular, I do not recommend 115200 for serial ports on devices
> > which do break detection, and 57600 might be a bit high. On my own
> > systems I normally use 9600 for debug console ports as there's not
> > normally that much data sent over them and it's way more robust.
> > You just have to watch out for things that do a bunch of kernel
> > printfs - 'debug' on pppoe(4) for example is not very fun :)
> > On the OpenBSD side, update /etc/boot.conf and /etc/ttys to change
> > this; you'll also have the setting in the APU's BIOS.
> >
>
>
> --
> Please do not CC me
> Radek
>