On 2025/01/14 04:00, Radek wrote:
> There were 3m null-modem cables connected to both APUs, the APU4's cable 
> also had an RS232/USB adapter.
> APUs have a fixed console baud rate of 115200 and I didn't find a way to 
> change it to a lower speed.

You can still set the OpenBSD side to a lower speed, it just means
switching speed if you want to access the BIOS. (I am very surprised
though, I was convinced you could change this, but I don't see a way
to do it without rebuilding firmware, perhaps that was the alix).
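
As a rough illustration of why a lower speed helps (see the break timing
explanation quoted further down), here is a minimal Python sketch of the
arithmetic. The function names are made up for this example, and it assumes
plain 8n1 framing with one start bit:

```python
def bit_time_us(baud):
    """Duration of a single bit on the wire, in microseconds."""
    return 1_000_000 / baud


def break_threshold_us(baud, data_bits=8, parity_bits=0, stop_bits=1):
    """Time to transmit one full character frame, in microseconds.

    A 16550-style UART flags a break when the line stays at space
    (logic 0) for longer than this.
    """
    frame_bits = 1 + data_bits + parity_bits + stop_bits  # 1 start bit
    return frame_bits * bit_time_us(baud)


if __name__ == "__main__":
    for baud in (115200, 57600, 9600):
        print(f"{baud:>6} baud: break if held at space longer than "
              f"{break_threshold_us(baud):.1f} us")
```

Under these assumptions the threshold at 115200 is roughly 87 us, while a
single bit sent at 9600 already lasts about 104 us, so the start bit of one
character from a slower sender is enough to look like a break to a receiver
running at 115200.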

> I'm testing only APU4 now. I disconnected the null-modem cable and I set 
> ddb.console to 0.
> After a few hours APU4 drops to ddb again:
> 
> ddb{2}> show panic
> the kernel did not panic

was there some output before the ddb{2} prompt?

> ddb{2}> trace 
> sched_steal_proc(ffff80002d4b7ff0) at sched_steal_proc+0x11c
> sched_chooseproc() at sched_chooseproc+0x1aa

seems strange.

is everything ok with cooling? power?

> mi_switch() at mi_switch+0x1e5
> sched_peg_curproc(ffff80002d4c0ff0) at sched_peg_curproc+0x67
> cpu_hz_update_sensor(ffff80002d4c0ff0) at cpu_hz_update_sensor+0x15
> sensor_task_work(ffff800000030a00) at sensor_task_work+0x51
> taskq_thread(ffff80000008db80) at taskq_thread+0x129
> end trace frame: 0x0, count: -7

> ddb{2}> show register 
> rdi                           0x1000    __ALIGN_SIZE
> rsi                           0x7dc0    __ALIGN_SIZE+0x6dc0
> rbp               0xffff80002d695900
> rbx                                0
> rdx                        0x394dc21    __kernel_phys_end+0xf4dc21
> rcx                                0
> rax                              0xc
> r8                         0xf627043    __kernel_phys_end+0xcc27043
> r9                        0x5e42f67f
> r10                0xcc3bd7032b4f63e
> r11               0x63a9870a5e938412
> r12               0xffff80002d4c0ff0
> r13                       0x7fffffff
> r14               0xffff80002d4b7ff0
> r15                                0
> rip               0xffffffff81e8636c    sched_steal_proc+0x11c
> cs                               0x8
> rflags                       0x10206    __ALIGN_SIZE+0xf206
> rsp               0xffff80002d6958c0
> ss                              0x10
> sched_steal_proc+0x11c: cdqe
> 
> ddb{2}> ps
>    PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
>  41911  146814      1      0  3    0x100083  ttyin         getty
>   5762  405322      1      0  3    0x100098  kqread        cron
>  73181  284842      1      0  3        0x80  ugenrintr     apcupsd
>  73181  270787      1      0  3   0x4000088  sigwait       apcupsd
>  73181  486672      1      0  3   0x4000080  netacc        apcupsd
>  58439  119507      1     99  3   0x1100090  kqread        sndiod
>  69152  431640      1    110  3    0x100090  kqread        sndiod
>  31588  226733  19541     95  3   0x1100092  kqread        smtpd
>  34359  221468  19541    103  3   0x1100092  kqread        smtpd
>  98195  498132  19541     95  3   0x1100092  kqread        smtpd
>  86017  459136  19541     95  3    0x100092  kqread        smtpd
>  70895  101640  19541     95  3   0x1100092  kqread        smtpd
>  93103  373510  19541     95  3   0x1100092  kqread        smtpd
>  19541  363543      1      0  3    0x100080  kqread        smtpd
>  19263  467127      1     77  3   0x1100090  kqread        dhcpd
>  96610  325819      1      0  3        0x88  kqread        sshd
>  46440  163929  87714     68  3   0x1000090  kqread        isakmpd
>  87714  108971      1      0  3        0x80  sbwait        isakmpd
>  73323  396657      1      0  3    0x100080  kqread        ntpd
>  38281  201772  34209     83  3    0x100092  kqread        ntpd
>  34209  396498      1     83  3   0x1100092  kqread        ntpd
>  96977  422652      1     53  3   0x1000090  kqread        unbound
>  79026  198934  37215     73  3   0x1100090  kqread        syslogd
>  37215  230033      1      0  3    0x100082  sbwait        syslogd
>  19526  197700      1      0  3    0x100080  kqread        resolvd
>  59258  127015  28251     77  3    0x100092  kqread        dhcpleased
>  61342  136779  28251     77  3    0x100092  kqread        dhcpleased
>  28251  416947      1      0  3        0x80  kqread        dhcpleased
>   3370  165413  49206    115  3    0x100092  kqread        slaacd
>  64831   78796  49206    115  3    0x100092  kqread        slaacd
>  49206  464326      1      0  3    0x100080  kqread        slaacd
>  94171  226931      0      0  3     0x14200  bored         smr
>  89428  262409      0      0  3     0x14200  pgzero        zerothread
>  52956  245859      0      0  3     0x14200  aiodoned      aiodoned
>  54747  256091      0      0  3     0x14200  syncer        update
>   4892   59507      0      0  3     0x14200  cleaner       cleaner
>  82718  198935      0      0  3     0x14200  reaper        reaper
>  21459  261399      0      0  3     0x14200  pgdaemon      pagedaemon
>  41174  416209      0      0  3     0x14200  mmctsk        sdmmc0
>  69111  190214      0      0  3     0x14200  usbtsk        usbtask
>  13632   51893      0      0  3     0x14200  usbatsk       usbatsk
>  43371  179039      0      0  3  0x40014200  acpi0         acpi0
>  98806   21031      0      0  7  0x40014200                idle3
>  86373  483372      0      0  3  0x40014200                idle2
>  78455  458933      0      0  7  0x40014200                idle1
> *13993  484783      0      0  2  0x40014200                sensors
>  62636  436251      0      0  3     0x14200  bored         softnet3
>  56088  519338      0      0  3     0x14200  bored         softnet2
>  67073  169850      0      0  3     0x14200  bored         softnet1
>  18689  250204      0      0  3     0x14200  bored         softnet0
>  15311  500938      0      0  3     0x14200  bored         systqmp
>   9702   36446      0      0  3     0x14200  bored         systq
>  75771  412492      0      0  3     0x14200  tmoslp        softclockmp
>  81164  300625      0      0  3  0x40014200  tmoslp        softclock
>  55664   33044      0      0  7  0x40014200                idle0
>      1  504540      0      0  3        0x82  wait          init
>      0       0     -1      0  3     0x10200  scheduler     swapper
> ddb{2}> mach ddbcpu 0
> Stopped at      x86_ipi_db+0x16:        leave
> ddb{0}> mach ddbcpu 1
> Stopped at      x86_ipi_db+0x16:        leave
> ddb{1}> mach ddbcpu 2
> Stopped at      sched_steal_proc+0x11c: cdqe
> ddb{2}> mach ddbcpu 3
> Stopped at      x86_ipi_db+0x16:        leave
> 
> ddb{3}> dmesg
> OpenBSD 7.6 (GENERIC.MP) #0: Thu Jan  9 07:32:40 MST 2025
>     [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 4259897344 (4062MB)
> avail mem = 4107575296 (3917MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xcfe92040 (13 entries)
> bios0: vendor coreboot version "v4.17.0.1" date 06/22/2022
> bios0: PC Engines apu4
> acpi0 at bios0: ACPI 6.0
> acpi0: sleep states S0 S1 S4 S5
> acpi0: tables DSDT FACP SSDT MCFG TPM2 APIC HEST SSDT SSDT DRTM HPET
> acpi0: wakeup devices PBR4(S4) PBR5(S4) PBR6(S4) PBR7(S4) PBR8(S4) UOH1(S3) UOH2(S3) UOH3(S3) UOH4(S3) UOH5(S3) UOH6(S3) XHC0(S4)
> acpitimer0 at acpi0: 3579545 Hz, 32 bits
> acpimcfg0 at acpi0
> acpimcfg0: addr 0xf8000000, bus 0-63
> acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD GX-412TC SOC, 998.18 MHz, 16-30-01, patch 07030105
> cpu0: cpuid 1 edx=178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT> ecx=36d8220b<SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C>
> cpu0: cpuid 6 eax=4<ARAT> ecx=1<EFFFREQ>
> cpu0: cpuid 7.0 ebx=8<BMI1>
> cpu0: cpuid d.1 eax=1<XSAVEOPT>
> cpu0: cpuid 80000001 edx=2fd3fbff<NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG> ecx=1d4037ff<LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PERFTSC,PCTRL3>
> cpu0: cpuid 80000007 edx=33d9<HWPSTATE,ITSC>
> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 2-way I-cache, 2MB 64b/line 16-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: AMD GX-412TC SOC, 998.24 MHz, 16-30-01, patch 07030105
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: AMD GX-412TC SOC, 998.33 MHz, 16-30-01, patch 07030105
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 3 (application processor)
> cpu3: AMD GX-412TC SOC, 998.52 MHz, 16-30-01, patch 07030105
> cpu3: smt 0, core 3, package 0
> ioapic0 at mainbus0: apid 4 pa 0xfec00000, version 21, 24 pins
> ioapic1 at mainbus0: apid 5 pa 0xfec20000, version 21, 32 pins
> acpihpet0 at acpi0: 14318180 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 1 (PBR4)
> acpiprt2 at acpi0: bus 2 (PBR5)
> acpiprt3 at acpi0: bus 3 (PBR6)
> acpiprt4 at acpi0: bus 4 (PBR7)
> acpiprt5 at acpi0: bus -1 (PBR8)
> acpicpu0 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> acpicpu1 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> acpicpu2 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> acpicpu3 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> acpipci0 at acpi0 PCI0: 0x00000000 0x00000011 0x00000001
> acpicmos0 at acpi0
> com0 at acpi0 COM1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
> com0: console
> com1 at acpi0 COM2 addr 0x2f8/0x8 irq 3: ns16550a, 16 byte fifo
> amdgpio0 at acpi0 GPIO uid 0 addr 0xfed81500/0x300 irq 7, 184 pins
> "PRP0001" at acpi0 not configured
> "PRP0001" at acpi0 not configured
> "PRP0001" at acpi0 not configured
> "PRP0001" at acpi0 not configured
> "PRP0001" at acpi0 not configured
> "PRP0001" at acpi0 not configured
> "BOOT0000" at acpi0 not configured
> acpitz0 at acpi0: critical temperature is 115 degC
> cpu0: 998 MHz: speeds: 1000 800 600 MHz
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 "AMD 16h Root Complex" rev 0x00
> vendor "AMD", unknown product 0x1567 (class system subclass IOMMU, rev 0x00) at pci0 dev 0 function 2 not configured
> pchb1 at pci0 dev 2 function 0 "AMD 16h Host" rev 0x00
> ppb0 at pci0 dev 2 function 1 "AMD 16h PCIE" rev 0x00: msi
> pci1 at ppb0 bus 1
> em0 at pci1 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e4
> ppb1 at pci0 dev 2 function 2 "AMD 16h PCIE" rev 0x00: msi
> pci2 at ppb1 bus 2
> em1 at pci2 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e5
> ppb2 at pci0 dev 2 function 3 "AMD 16h PCIE" rev 0x00: msi
> pci3 at ppb2 bus 3
> em2 at pci3 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e6
> ppb3 at pci0 dev 2 function 4 "AMD 16h PCIE" rev 0x00: msi
> pci4 at ppb3 bus 4
> em3 at pci4 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e7
> ccp0 at pci0 dev 8 function 0 "AMD 16h Crypto" rev 0x00: msix
> xhci0 at pci0 dev 16 function 0 "AMD Bolton xHCI" rev 0x11: msix, xHCI 1.0
> usb0 at xhci0: USB revision 3.0
> uhub0 at usb0 configuration 1 interface 0 "AMD xHCI root hub" rev 3.00/1.00 addr 1
> ahci0 at pci0 dev 17 function 0 "AMD Hudson-2 SATA" rev 0x40: apic 4 int 19, AHCI 1.3
> ahci0: port 0: 6.0Gb/s
> scsibus1 at ahci0: 32 targets
> sd0 at scsibus1 targ 0 lun 0: <ATA, Hoodisk SSD, SBFM> t10.ATA_Hoodisk_SSD_L7DTC7A11208345_
> sd0: 15272MB, 512 bytes/sector, 31277232 sectors, thin
> ehci0 at pci0 dev 18 function 0 "AMD Hudson-2 USB2" rev 0x39: apic 4 int 18
> usb1 at ehci0: USB revision 2.0
> uhub1 at usb1 configuration 1 interface 0 "AMD EHCI root hub" rev 2.00/1.00 addr 1
> ehci1 at pci0 dev 19 function 0 "AMD Hudson-2 USB2" rev 0x39: apic 4 int 18
> usb2 at ehci1: USB revision 2.0
> uhub2 at usb2 configuration 1 interface 0 "AMD EHCI root hub" rev 2.00/1.00 addr 1
> piixpm0 at pci0 dev 20 function 0 "AMD Hudson-2 SMBus" rev 0x42: SMI
> iic0 at piixpm0
> iic1 at piixpm0
> iic1: addr 0x4c 3e=00 48=00 4a=00 4e=00 fc=00 fe=00 words 00=ffff 01=ffff 02=ffff 03=ffff 04=ffff 05=ffff 06=ffff 07=ffff
> pcib0 at pci0 dev 20 function 3 "AMD Hudson-2 LPC" rev 0x11
> sdhc0 at pci0 dev 20 function 7 "AMD Bolton SD/MMC" rev 0x01: apic 4 int 16
> sdhc0: SDHC 2.00, 50 MHz base clock
> sdmmc0 at sdhc0: 4-bit, sd high-speed, mmc high-speed, dma
> pchb2 at pci0 dev 24 function 0 "AMD 16h Link Cfg" rev 0x00
> pchb3 at pci0 dev 24 function 1 "AMD 16h Address Map" rev 0x00
> pchb4 at pci0 dev 24 function 2 "AMD 16h DRAM Cfg" rev 0x00
> km0 at pci0 dev 24 function 3 "AMD 16h Misc Cfg" rev 0x00
> pchb5 at pci0 dev 24 function 4 "AMD 16h CPU Power" rev 0x00
> pchb6 at pci0 dev 24 function 5 "AMD 16h Misc Cfg" rev 0x00
> isa0 at pcib0
> isadma0 at isa0
> com2 at isa0 port 0x3e8/8 irq 5: ns16550a, 16 byte fifo
> pcppi0 at isa0 port 0x61
> spkr0 at pcppi0
> lpt0 at isa0 port 0x378/4 irq 7
> intr_establish: pic ioapic0 pin 7: can't share type 3 with 2
> wbsio0 at isa0 port 0x2e/2: NCT5104D rev 0x53
> vmm0 at mainbus0: SVM/RVI
> ugen0 at uhub0 port 3 "American Power Conversion Back-UPS CS 350 FW:807.q10 .I USB FW:q10" rev 1.10/0.06 addr 2
> uhub3 at uhub1 port 1 configuration 1 interface 0 "Advanced Micro Devices Hub" rev 2.00/0.18 addr 2
> uhub4 at uhub2 port 1 configuration 1 interface 0 "Advanced Micro Devices Hub" rev 2.00/0.18 addr 2
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> root on sd0a (cbb37b39d1463c87.a) swap on sd0b dump on sd0b
> 
> 
> On Mon, 13 Jan 2025 11:53:11 +0000
> Stuart Henderson <[email protected]> wrote:
> 
> > On 2025/01/13 11:53, Stefan Sperling wrote:
> > > On Sun, Jan 12, 2025 at 09:35:03PM +0100, Radek wrote:
> > > > Hi,
> > > > I have two fresh installs of 7.6/amd64 as a router/gateway on APU2 and 
> > > > APU4. There is a site-to-site IPsec tunnel between them with ~30Mbps 
> > > > permanent traffic. The boxes usually drop into ddb (no kernel panic) 
> > > > within a few hours of boot.
> > > > 
> > > > I attached dmesgs and ddb console outputs of the boxes.
> > > > 
> > > > ### APU2
> > > > ddb{0}> show panic
> > > > the kernel did not panic
> > > > 
> > > > ddb{0}> trace
> > > > db_enter() at db_enter+0x14
> > > > comintr(ffff800000098000) at comintr+0x33e
> > ^^
> > > 
> > > This looks like sysctl ddb.console is set to 1, and then something
> > > causes a "break" to appear on the serial port which triggers ddb.
> > 
> > yes, that is a classic "break" trace.
> > 
> > > > rdx                            0x3f8
> > 
> > + there's your serial port :)
> > 
> > Things you can try:
> > 
> > - if you have a cable connected to the APU but unplugged at the other
> > end, either try disconnecting it, or plug it in to something
> > 
> > - check for a loose connection/intermittent short inside the cable
> > 
> > - if it's a long cable, try a shorter one
> > 
> > - lower the console port speed
> > 
> > To send 'break' you hold the line at 'space' or 'logic 0' condition
> > for longer than the time to transmit a valid character (including
> > stop/start/any parity bits) at the current bitrate.
> > 
> > This is detected by the UART on the receiving system, e.g. here is an
> > excerpt from TI's datasheet for 16550 uart
> > 
> >     "Bit 4: This bit is the Break Interrupt (BI) indicator. Bit 4 is set
> >     to a logic 1 whenever the received data input is held in the Spacing
> >     (logic 0) state for longer than a full word transmission time (that
> >     is, the total time of Start bit + data bits + Parity + Stop bits)."
> > 
> > With a standard 8n1 setting, at 115200 that's "longer than about 86
> > microseconds" and at 9600 it's "longer than about 1ms".
> > 
> > So at higher speeds, either quite a short glitch, or a single char
> > sent from a device connected to the port at a slower speed e.g. 9600,
> > can be enough to trigger it.
> > 
> > In particular I do not recommend 115200 for serial ports on devices
> > which do break detection and 57600 might be a bit high. On my own
> > systems I normally use 9600 for debug console ports as there's not
> > normally that much data sent over them and it's way more robust.
> > You just have to watch out for things that do a bunch of kernel
> > printfs - 'debug' on pppoe(4) for example is not very fun :)
> > On the OpenBSD side, update /etc/boot.conf and /etc/ttys to change
> > this, you'll also have the setting in the APU's bios.
> > 
> 
> 
> -- 
> Please do not CC me
> Radek
> 
