There were 3m null-modem cables connected to both APUs; the APU4's cable 
also had an RS232/USB adapter.
The APUs have a fixed console baud rate of 115200, and I didn't find a way to 
change it to a lower speed.

I'm testing only the APU4 now. I disconnected the null-modem cable and set 
ddb.console to 0.
After a few hours the APU4 dropped into ddb again:

ddb{2}> show panic
the kernel did not panic

ddb{2}> trace 
sched_steal_proc(ffff80002d4b7ff0) at sched_steal_proc+0x11c
sched_chooseproc() at sched_chooseproc+0x1aa
mi_switch() at mi_switch+0x1e5
sched_peg_curproc(ffff80002d4c0ff0) at sched_peg_curproc+0x67
cpu_hz_update_sensor(ffff80002d4c0ff0) at cpu_hz_update_sensor+0x15
sensor_task_work(ffff800000030a00) at sensor_task_work+0x51
taskq_thread(ffff80000008db80) at taskq_thread+0x129
end trace frame: 0x0, count: -7

ddb{2}> show register 
rdi                           0x1000    __ALIGN_SIZE
rsi                           0x7dc0    __ALIGN_SIZE+0x6dc0
rbp               0xffff80002d695900
rbx                                0
rdx                        0x394dc21    __kernel_phys_end+0xf4dc21
rcx                                0
rax                              0xc
r8                         0xf627043    __kernel_phys_end+0xcc27043
r9                        0x5e42f67f
r10                0xcc3bd7032b4f63e
r11               0x63a9870a5e938412
r12               0xffff80002d4c0ff0
r13                       0x7fffffff
r14               0xffff80002d4b7ff0
r15                                0
rip               0xffffffff81e8636c    sched_steal_proc+0x11c
cs                               0x8
rflags                       0x10206    __ALIGN_SIZE+0xf206
rsp               0xffff80002d6958c0
ss                              0x10
sched_steal_proc+0x11c: cdqe

ddb{2}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
 41911  146814      1      0  3    0x100083  ttyin         getty
  5762  405322      1      0  3    0x100098  kqread        cron
 73181  284842      1      0  3        0x80  ugenrintr     apcupsd
 73181  270787      1      0  3   0x4000088  sigwait       apcupsd
 73181  486672      1      0  3   0x4000080  netacc        apcupsd
 58439  119507      1     99  3   0x1100090  kqread        sndiod
 69152  431640      1    110  3    0x100090  kqread        sndiod
 31588  226733  19541     95  3   0x1100092  kqread        smtpd
 34359  221468  19541    103  3   0x1100092  kqread        smtpd
 98195  498132  19541     95  3   0x1100092  kqread        smtpd
 86017  459136  19541     95  3    0x100092  kqread        smtpd
 70895  101640  19541     95  3   0x1100092  kqread        smtpd
 93103  373510  19541     95  3   0x1100092  kqread        smtpd
 19541  363543      1      0  3    0x100080  kqread        smtpd
 19263  467127      1     77  3   0x1100090  kqread        dhcpd
 96610  325819      1      0  3        0x88  kqread        sshd
 46440  163929  87714     68  3   0x1000090  kqread        isakmpd
 87714  108971      1      0  3        0x80  sbwait        isakmpd
 73323  396657      1      0  3    0x100080  kqread        ntpd
 38281  201772  34209     83  3    0x100092  kqread        ntpd
 34209  396498      1     83  3   0x1100092  kqread        ntpd
 96977  422652      1     53  3   0x1000090  kqread        unbound
 79026  198934  37215     73  3   0x1100090  kqread        syslogd
 37215  230033      1      0  3    0x100082  sbwait        syslogd
 19526  197700      1      0  3    0x100080  kqread        resolvd
 59258  127015  28251     77  3    0x100092  kqread        dhcpleased
 61342  136779  28251     77  3    0x100092  kqread        dhcpleased
 28251  416947      1      0  3        0x80  kqread        dhcpleased
  3370  165413  49206    115  3    0x100092  kqread        slaacd
 64831   78796  49206    115  3    0x100092  kqread        slaacd
 49206  464326      1      0  3    0x100080  kqread        slaacd
 94171  226931      0      0  3     0x14200  bored         smr
 89428  262409      0      0  3     0x14200  pgzero        zerothread
 52956  245859      0      0  3     0x14200  aiodoned      aiodoned
 54747  256091      0      0  3     0x14200  syncer        update
  4892   59507      0      0  3     0x14200  cleaner       cleaner
 82718  198935      0      0  3     0x14200  reaper        reaper
 21459  261399      0      0  3     0x14200  pgdaemon      pagedaemon
 41174  416209      0      0  3     0x14200  mmctsk        sdmmc0
 69111  190214      0      0  3     0x14200  usbtsk        usbtask
 13632   51893      0      0  3     0x14200  usbatsk       usbatsk
 43371  179039      0      0  3  0x40014200  acpi0         acpi0
 98806   21031      0      0  7  0x40014200                idle3
 86373  483372      0      0  3  0x40014200                idle2
 78455  458933      0      0  7  0x40014200                idle1
*13993  484783      0      0  2  0x40014200                sensors
 62636  436251      0      0  3     0x14200  bored         softnet3
 56088  519338      0      0  3     0x14200  bored         softnet2
 67073  169850      0      0  3     0x14200  bored         softnet1
 18689  250204      0      0  3     0x14200  bored         softnet0
 15311  500938      0      0  3     0x14200  bored         systqmp
  9702   36446      0      0  3     0x14200  bored         systq
 75771  412492      0      0  3     0x14200  tmoslp        softclockmp
 81164  300625      0      0  3  0x40014200  tmoslp        softclock
 55664   33044      0      0  7  0x40014200                idle0
     1  504540      0      0  3        0x82  wait          init
     0       0     -1      0  3     0x10200  scheduler     swapper
ddb{2}> mach ddbcpu 0
Stopped at      x86_ipi_db+0x16:        leave
ddb{0}> mach ddbcpu 1
Stopped at      x86_ipi_db+0x16:        leave
ddb{1}> mach ddbcpu 2
Stopped at      sched_steal_proc+0x11c: cdqe
ddb{2}> mach ddbcpu 3
Stopped at      x86_ipi_db+0x16:        leave

ddb{3}> dmesg
OpenBSD 7.6 (GENERIC.MP) #0: Thu Jan  9 07:32:40 MST 2025
    [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4259897344 (4062MB)
avail mem = 4107575296 (3917MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xcfe92040 (13 entries)
bios0: vendor coreboot version "v4.17.0.1" date 06/22/2022
bios0: PC Engines apu4
acpi0 at bios0: ACPI 6.0
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP SSDT MCFG TPM2 APIC HEST SSDT SSDT DRTM HPET
acpi0: wakeup devices PBR4(S4) PBR5(S4) PBR6(S4) PBR7(S4) PBR8(S4) UOH1(S3) UOH2(S3) UOH3(S3) UOH4(S3) UOH5(S3) UOH6(S3) XHC0(S4)
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimcfg0 at acpi0
acpimcfg0: addr 0xf8000000, bus 0-63
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD GX-412TC SOC, 998.18 MHz, 16-30-01, patch 07030105
cpu0: cpuid 1 edx=178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT> ecx=36d8220b<SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C>
cpu0: cpuid 6 eax=4<ARAT> ecx=1<EFFFREQ>
cpu0: cpuid 7.0 ebx=8<BMI1>
cpu0: cpuid d.1 eax=1<XSAVEOPT>
cpu0: cpuid 80000001 edx=2fd3fbff<NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG> ecx=1d4037ff<LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PERFTSC,PCTRL3>
cpu0: cpuid 80000007 edx=33d9<HWPSTATE,ITSC>
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 2-way I-cache, 2MB 64b/line 16-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD GX-412TC SOC, 998.24 MHz, 16-30-01, patch 07030105
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: AMD GX-412TC SOC, 998.33 MHz, 16-30-01, patch 07030105
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: AMD GX-412TC SOC, 998.52 MHz, 16-30-01, patch 07030105
cpu3: smt 0, core 3, package 0
ioapic0 at mainbus0: apid 4 pa 0xfec00000, version 21, 24 pins
ioapic1 at mainbus0: apid 5 pa 0xfec20000, version 21, 32 pins
acpihpet0 at acpi0: 14318180 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (PBR4)
acpiprt2 at acpi0: bus 2 (PBR5)
acpiprt3 at acpi0: bus 3 (PBR6)
acpiprt4 at acpi0: bus 4 (PBR7)
acpiprt5 at acpi0: bus -1 (PBR8)
acpicpu0 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
acpicpu1 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
acpicpu2 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
acpicpu3 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
acpipci0 at acpi0 PCI0: 0x00000000 0x00000011 0x00000001
acpicmos0 at acpi0
com0 at acpi0 COM1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
com0: console
com1 at acpi0 COM2 addr 0x2f8/0x8 irq 3: ns16550a, 16 byte fifo
amdgpio0 at acpi0 GPIO uid 0 addr 0xfed81500/0x300 irq 7, 184 pins
"PRP0001" at acpi0 not configured
"PRP0001" at acpi0 not configured
"PRP0001" at acpi0 not configured
"PRP0001" at acpi0 not configured
"PRP0001" at acpi0 not configured
"PRP0001" at acpi0 not configured
"BOOT0000" at acpi0 not configured
acpitz0 at acpi0: critical temperature is 115 degC
cpu0: 998 MHz: speeds: 1000 800 600 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "AMD 16h Root Complex" rev 0x00
vendor "AMD", unknown product 0x1567 (class system subclass IOMMU, rev 0x00) at pci0 dev 0 function 2 not configured
pchb1 at pci0 dev 2 function 0 "AMD 16h Host" rev 0x00
ppb0 at pci0 dev 2 function 1 "AMD 16h PCIE" rev 0x00: msi
pci1 at ppb0 bus 1
em0 at pci1 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e4
ppb1 at pci0 dev 2 function 2 "AMD 16h PCIE" rev 0x00: msi
pci2 at ppb1 bus 2
em1 at pci2 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e5
ppb2 at pci0 dev 2 function 3 "AMD 16h PCIE" rev 0x00: msi
pci3 at ppb2 bus 3
em2 at pci3 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e6
ppb3 at pci0 dev 2 function 4 "AMD 16h PCIE" rev 0x00: msi
pci4 at ppb3 bus 4
em3 at pci4 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e7
ccp0 at pci0 dev 8 function 0 "AMD 16h Crypto" rev 0x00: msix
xhci0 at pci0 dev 16 function 0 "AMD Bolton xHCI" rev 0x11: msix, xHCI 1.0
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "AMD xHCI root hub" rev 3.00/1.00 addr 1
ahci0 at pci0 dev 17 function 0 "AMD Hudson-2 SATA" rev 0x40: apic 4 int 19, AHCI 1.3
ahci0: port 0: 6.0Gb/s
scsibus1 at ahci0: 32 targets
sd0 at scsibus1 targ 0 lun 0: <ATA, Hoodisk SSD, SBFM> t10.ATA_Hoodisk_SSD_L7DTC7A11208345_
sd0: 15272MB, 512 bytes/sector, 31277232 sectors, thin
ehci0 at pci0 dev 18 function 0 "AMD Hudson-2 USB2" rev 0x39: apic 4 int 18
usb1 at ehci0: USB revision 2.0
uhub1 at usb1 configuration 1 interface 0 "AMD EHCI root hub" rev 2.00/1.00 addr 1
ehci1 at pci0 dev 19 function 0 "AMD Hudson-2 USB2" rev 0x39: apic 4 int 18
usb2 at ehci1: USB revision 2.0
uhub2 at usb2 configuration 1 interface 0 "AMD EHCI root hub" rev 2.00/1.00 addr 1
piixpm0 at pci0 dev 20 function 0 "AMD Hudson-2 SMBus" rev 0x42: SMI
iic0 at piixpm0
iic1 at piixpm0
iic1: addr 0x4c 3e=00 48=00 4a=00 4e=00 fc=00 fe=00 words 00=ffff 01=ffff 02=ffff 03=ffff 04=ffff 05=ffff 06=ffff 07=ffff
pcib0 at pci0 dev 20 function 3 "AMD Hudson-2 LPC" rev 0x11
sdhc0 at pci0 dev 20 function 7 "AMD Bolton SD/MMC" rev 0x01: apic 4 int 16
sdhc0: SDHC 2.00, 50 MHz base clock
sdmmc0 at sdhc0: 4-bit, sd high-speed, mmc high-speed, dma
pchb2 at pci0 dev 24 function 0 "AMD 16h Link Cfg" rev 0x00
pchb3 at pci0 dev 24 function 1 "AMD 16h Address Map" rev 0x00
pchb4 at pci0 dev 24 function 2 "AMD 16h DRAM Cfg" rev 0x00
km0 at pci0 dev 24 function 3 "AMD 16h Misc Cfg" rev 0x00
pchb5 at pci0 dev 24 function 4 "AMD 16h CPU Power" rev 0x00
pchb6 at pci0 dev 24 function 5 "AMD 16h Misc Cfg" rev 0x00
isa0 at pcib0
isadma0 at isa0
com2 at isa0 port 0x3e8/8 irq 5: ns16550a, 16 byte fifo
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
intr_establish: pic ioapic0 pin 7: can't share type 3 with 2
wbsio0 at isa0 port 0x2e/2: NCT5104D rev 0x53
vmm0 at mainbus0: SVM/RVI
ugen0 at uhub0 port 3 "American Power Conversion Back-UPS CS 350 FW:807.q10 .I USB FW:q10" rev 1.10/0.06 addr 2
uhub3 at uhub1 port 1 configuration 1 interface 0 "Advanced Micro Devices Hub" rev 2.00/0.18 addr 2
uhub4 at uhub2 port 1 configuration 1 interface 0 "Advanced Micro Devices Hub" rev 2.00/0.18 addr 2
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0a (cbb37b39d1463c87.a) swap on sd0b dump on sd0b


On Mon, 13 Jan 2025 11:53:11 +0000
Stuart Henderson <[email protected]> wrote:

> On 2025/01/13 11:53, Stefan Sperling wrote:
> > On Sun, Jan 12, 2025 at 09:35:03PM +0100, Radek wrote:
> > > Hi,
> > > I have two fresh installs of 7.6/amd64 as a router/gateway on APU2 and 
> > > APU4. There is a site-to-site IPsec tunnel between them with ~30Mbps 
> > > of permanent traffic. The boxes usually drop into ddb (no kernel panic) 
> > > within a few hours of boot.
> > > 
> > > I attached dmesgs and ddb console outputs of the boxes.
> > > 
> > > ### APU2
> > > ddb{0}> show panic
> > > the kernel did not panic
> > > 
> > > ddb{0}> trace
> > > db_enter() at db_enter+0x14
> > > comintr(ffff800000098000) at comintr+0x33e
> ^^
> > 
> > This looks like sysctl ddb.console is set to 1, and then something
> > causes a "break" to appear on the serial port which triggers ddb.
> 
> yes, that is a classic "break" trace.
> 
> > > rdx                            0x3f8
> 
> + there's your serial port :)
> 
> Things you can try:
> 
> - if you have a cable connected to the APU but unplugged at the other
> end, either try disconnecting it, or plug it in to something
> 
> - check for a loose connection/intermittent short inside the cable
> 
> - if it's a long cable, try a shorter one
> 
> - lower the console port speed
> 
> To send 'break' you hold the line at 'space' or 'logic 0' condition
> for longer than the time to transmit a valid character (including
> stop/start/any parity bits) at the current bitrate.
> 
> This is detected by the UART on the receiving system, e.g. here is an
> excerpt from TI's datasheet for 16550 uart
> 
>     "Bit 4: This bit is the Break Interrupt (BI) indicator. Bit 4 is set
>     to a logic 1 whenever the received data input is held in the Spacing
>     (logic 0) state for longer than a full word transmission time (that
>     is, the total time of Start bit + data bits + Parity + Stop bits)."
> 
> With a standard 8n1 setting, at 115200 that's "longer than about 86
> microseconds" and at 9600 it's "longer than about 1ms".
> 
> So at higher speeds, either quite a short glitch or a single char sent
> from a device connected to the port at a slower speed (e.g. 9600)
> can be enough to trigger it.
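
As a sanity check on the figures above, here is a minimal Python sketch of
the arithmetic. It is only an illustration: the function name and defaults
are my own, and it assumes the break threshold is exactly one full character
frame (start + data + parity + stop bits), as the datasheet excerpt describes.

```python
def frame_time_us(baud, data_bits=8, parity_bits=0, stop_bits=1):
    """Microseconds needed to send one character frame, start bit included."""
    bits = 1 + data_bits + parity_bits + stop_bits  # 10 bits for 8n1
    return bits * 1_000_000 / baud


def bit_time_us(baud):
    """Microseconds that a single bit occupies on the line."""
    return 1_000_000 / baud


if __name__ == "__main__":
    for baud in (9600, 57600, 115200):
        print(f"{baud:>6} baud: line held at space longer than "
              f"{frame_time_us(baud):.1f} us reads as a break")
    # One bit from a 9600 sender compared with a full 8n1 frame at 115200.
    print(f"one bit at 9600: {bit_time_us(9600):.1f} us, "
          f"one frame at 115200: {frame_time_us(115200):.1f} us")
```

Under those assumptions, even a single bit time from a 9600 sender is longer
than a whole frame at 115200, which fits the point about a slower device
being able to trigger a break.
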
> 
> In particular I do not recommend 115200 for serial ports on devices
> which do break detection and 57600 might be a bit high. On my own
> systems I normally use 9600 for debug console ports as there's not
> normally that much data sent over them and it's way more robust.
> You just have to watch out for things that do a bunch of kernel
> printfs - 'debug' on pppoe(4) for example is not very fun :)
> On the OpenBSD side, update /etc/boot.conf and /etc/ttys to change
> this; there is also a setting for it in the APU's BIOS.
> 


-- 
Please do not CC me
Radek
