On 2025/01/14 18:05, Radek wrote:
> Hi,
>
> On Tue, 14 Jan 2025 12:38:57 +0000
> Stuart Henderson <[email protected]> wrote:
>
> > On 2025/01/14 04:00, Radek wrote:
> > > There were 3 m null-modem cables connected to both APUs; the APU4's cable
> > > also had an RS232/USB adapter.
> > > APUs have a fixed console baud rate of 115200 and I didn't find a way to
> > > change it to a lower speed.
> >
> > You can still set the OpenBSD side to a lower speed, it just means
> > switching speed if you want to access the BIOS.
> Yep, good idea :)
>
> > (I am very surprised
> > though, I was convinced you could change this, but I don't see a way
> > to do it without rebuilding firmware, perhaps that was the alix).
> >
> > > I'm testing only APU4 now. I disconnected the null-modem cable and I set
> > > ddb.console to 0.
> > > After a few hours APU4 drops to ddb again:
> > >
> > > ddb{2}> show panic
> > > the kernel did not panic
> >
> > was there some output before the ddb{2} prompt?
> The APU wasn't connected to a PC until the crash, and the first line I got
> after hitting Enter on the console was ddb{2}>
OK, then please leave it connected and check that; there may be some
important information.
> >
> > > ddb{2}> trace
> > > sched_steal_proc(ffff80002d4b7ff0) at sched_steal_proc+0x11c
> > > sched_chooseproc() at sched_chooseproc+0x1aa
> >
> > seems strange.
> >
> > is everything ok with cooling? power?
> I think so. The box had over 2 years of uptime on a 7.2 snapshot [1].
> Nobody touches it; all the cables and power are the same. I only unplugged the
> null-modem cable, which had been connected to the box for as long as I can remember.
> 1. https://marc.info/?l=openbsd-bugs&m=166412911321566&w=2
>
> >
> > > mi_switch() at mi_switch+0x1e5
> > > sched_peg_curproc(ffff80002d4c0ff0) at sched_peg_curproc+0x67
> > > cpu_hz_update_sensor(ffff80002d4c0ff0) at cpu_hz_update_sensor+0x15
> > > sensor_task_work(ffff800000030a00) at sensor_task_work+0x51
> > > taskq_thread(ffff80000008db80) at taskq_thread+0x129
> > > end trace frame: 0x0, count: -7
> >
> > > ddb{2}> show register
> > > rdi 0x1000 __ALIGN_SIZE
> > > rsi 0x7dc0 __ALIGN_SIZE+0x6dc0
> > > rbp 0xffff80002d695900
> > > rbx 0
> > > rdx 0x394dc21 __kernel_phys_end+0xf4dc21
> > > rcx 0
> > > rax 0xc
> > > r8 0xf627043 __kernel_phys_end+0xcc27043
> > > r9 0x5e42f67f
> > > r10 0xcc3bd7032b4f63e
> > > r11 0x63a9870a5e938412
> > > r12 0xffff80002d4c0ff0
> > > r13 0x7fffffff
> > > r14 0xffff80002d4b7ff0
> > > r15 0
> > > rip 0xffffffff81e8636c sched_steal_proc+0x11c
> > > cs 0x8
> > > rflags 0x10206 __ALIGN_SIZE+0xf206
> > > rsp 0xffff80002d6958c0
> > > ss 0x10
> > > sched_steal_proc+0x11c: cdqe
> > >
> > > ddb{2}> ps
> > > PID TID PPID UID S FLAGS WAIT COMMAND
> > > 41911 146814 1 0 3 0x100083 ttyin getty
> > > 5762 405322 1 0 3 0x100098 kqread cron
> > > 73181 284842 1 0 3 0x80 ugenrintr apcupsd
> > > 73181 270787 1 0 3 0x4000088 sigwait apcupsd
> > > 73181 486672 1 0 3 0x4000080 netacc apcupsd
> > > 58439 119507 1 99 3 0x1100090 kqread sndiod
> > > 69152 431640 1 110 3 0x100090 kqread sndiod
> > > 31588 226733 19541 95 3 0x1100092 kqread smtpd
> > > 34359 221468 19541 103 3 0x1100092 kqread smtpd
> > > 98195 498132 19541 95 3 0x1100092 kqread smtpd
> > > 86017 459136 19541 95 3 0x100092 kqread smtpd
> > > 70895 101640 19541 95 3 0x1100092 kqread smtpd
> > > 93103 373510 19541 95 3 0x1100092 kqread smtpd
> > > 19541 363543 1 0 3 0x100080 kqread smtpd
> > > 19263 467127 1 77 3 0x1100090 kqread dhcpd
> > > 96610 325819 1 0 3 0x88 kqread sshd
> > > 46440 163929 87714 68 3 0x1000090 kqread isakmpd
> > > 87714 108971 1 0 3 0x80 sbwait isakmpd
> > > 73323 396657 1 0 3 0x100080 kqread ntpd
> > > 38281 201772 34209 83 3 0x100092 kqread ntpd
> > > 34209 396498 1 83 3 0x1100092 kqread ntpd
> > > 96977 422652 1 53 3 0x1000090 kqread unbound
> > > 79026 198934 37215 73 3 0x1100090 kqread syslogd
> > > 37215 230033 1 0 3 0x100082 sbwait syslogd
> > > 19526 197700 1 0 3 0x100080 kqread resolvd
> > > 59258 127015 28251 77 3 0x100092 kqread dhcpleased
> > > 61342 136779 28251 77 3 0x100092 kqread dhcpleased
> > > 28251 416947 1 0 3 0x80 kqread dhcpleased
> > > 3370 165413 49206 115 3 0x100092 kqread slaacd
> > > 64831 78796 49206 115 3 0x100092 kqread slaacd
> > > 49206 464326 1 0 3 0x100080 kqread slaacd
> > > 94171 226931 0 0 3 0x14200 bored smr
> > > 89428 262409 0 0 3 0x14200 pgzero zerothread
> > > 52956 245859 0 0 3 0x14200 aiodoned aiodoned
> > > 54747 256091 0 0 3 0x14200 syncer update
> > > 4892 59507 0 0 3 0x14200 cleaner cleaner
> > > 82718 198935 0 0 3 0x14200 reaper reaper
> > > 21459 261399 0 0 3 0x14200 pgdaemon pagedaemon
> > > 41174 416209 0 0 3 0x14200 mmctsk sdmmc0
> > > 69111 190214 0 0 3 0x14200 usbtsk usbtask
> > > 13632 51893 0 0 3 0x14200 usbatsk usbatsk
> > > 43371 179039 0 0 3 0x40014200 acpi0 acpi0
> > > 98806 21031 0 0 7 0x40014200 idle3
> > > 86373 483372 0 0 3 0x40014200 idle2
> > > 78455 458933 0 0 7 0x40014200 idle1
> > > *13993 484783 0 0 2 0x40014200 sensors
> > > 62636 436251 0 0 3 0x14200 bored softnet3
> > > 56088 519338 0 0 3 0x14200 bored softnet2
> > > 67073 169850 0 0 3 0x14200 bored softnet1
> > > 18689 250204 0 0 3 0x14200 bored softnet0
> > > 15311 500938 0 0 3 0x14200 bored systqmp
> > > 9702 36446 0 0 3 0x14200 bored systq
> > > 75771 412492 0 0 3 0x14200 tmoslp softclockmp
> > > 81164 300625 0 0 3 0x40014200 tmoslp softclock
> > > 55664 33044 0 0 7 0x40014200 idle0
> > > 1 504540 0 0 3 0x82 wait init
> > > 0 0 -1 0 3 0x10200 scheduler swapper
> > > ddb{2}> mach ddbcpu 0
> > > Stopped at x86_ipi_db+0x16: leave
> > > ddb{0}> mach ddbcpu 1
> > > Stopped at x86_ipi_db+0x16: leave
> > > ddb{1}> mach ddbcpu 2
> > > Stopped at sched_steal_proc+0x11c: cdqe
> > > ddb{2}> mach ddbcpu 3
> > > Stopped at x86_ipi_db+0x16: leave
> > >
> > > ddb{3}> dmesg
> > > OpenBSD 7.6 (GENERIC.MP) #0: Thu Jan 9 07:32:40 MST 2025
> > >     [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > real mem = 4259897344 (4062MB)
> > > avail mem = 4107575296 (3917MB)
> > > random: good seed from bootblocks
> > > mpath0 at root
> > > scsibus0 at mpath0: 256 targets
> > > mainbus0 at root
> > > bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xcfe92040 (13 entries)
> > > bios0: vendor coreboot version "v4.17.0.1" date 06/22/2022
> > > bios0: PC Engines apu4
> > > acpi0 at bios0: ACPI 6.0
> > > acpi0: sleep states S0 S1 S4 S5
> > > acpi0: tables DSDT FACP SSDT MCFG TPM2 APIC HEST SSDT SSDT DRTM HPET
> > > acpi0: wakeup devices PBR4(S4) PBR5(S4) PBR6(S4) PBR7(S4) PBR8(S4) UOH1(S3) UOH2(S3) UOH3(S3) UOH4(S3) UOH5(S3) UOH6(S3) XHC0(S4)
> > > acpitimer0 at acpi0: 3579545 Hz, 32 bits
> > > acpimcfg0 at acpi0
> > > acpimcfg0: addr 0xf8000000, bus 0-63
> > > acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> > > cpu0 at mainbus0: apid 0 (boot processor)
> > > cpu0: AMD GX-412TC SOC, 998.18 MHz, 16-30-01, patch 07030105
> > > cpu0: cpuid 1 edx=178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT> ecx=36d8220b<SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C>
> > > cpu0: cpuid 6 eax=4<ARAT> ecx=1<EFFFREQ>
> > > cpu0: cpuid 7.0 ebx=8<BMI1>
> > > cpu0: cpuid d.1 eax=1<XSAVEOPT>
> > > cpu0: cpuid 80000001 edx=2fd3fbff<NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG> ecx=1d4037ff<LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PERFTSC,PCTRL3>
> > > cpu0: cpuid 80000007 edx=33d9<HWPSTATE,ITSC>
> > > cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 2-way I-cache, 2MB 64b/line 16-way L2 cache
> > > cpu0: smt 0, core 0, package 0
> > > mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> > > cpu0: apic clock running at 99MHz
> > > cpu0: mwait min=64, max=64, IBE
> > > cpu1 at mainbus0: apid 1 (application processor)
> > > cpu1: AMD GX-412TC SOC, 998.24 MHz, 16-30-01, patch 07030105
> > > cpu1: smt 0, core 1, package 0
> > > cpu2 at mainbus0: apid 2 (application processor)
> > > cpu2: AMD GX-412TC SOC, 998.33 MHz, 16-30-01, patch 07030105
> > > cpu2: smt 0, core 2, package 0
> > > cpu3 at mainbus0: apid 3 (application processor)
> > > cpu3: AMD GX-412TC SOC, 998.52 MHz, 16-30-01, patch 07030105
> > > cpu3: smt 0, core 3, package 0
> > > ioapic0 at mainbus0: apid 4 pa 0xfec00000, version 21, 24 pins
> > > ioapic1 at mainbus0: apid 5 pa 0xfec20000, version 21, 32 pins
> > > acpihpet0 at acpi0: 14318180 Hz
> > > acpiprt0 at acpi0: bus 0 (PCI0)
> > > acpiprt1 at acpi0: bus 1 (PBR4)
> > > acpiprt2 at acpi0: bus 2 (PBR5)
> > > acpiprt3 at acpi0: bus 3 (PBR6)
> > > acpiprt4 at acpi0: bus 4 (PBR7)
> > > acpiprt5 at acpi0: bus -1 (PBR8)
> > > acpicpu0 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> > > acpicpu1 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> > > acpicpu2 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> > > acpicpu3 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> > > acpipci0 at acpi0 PCI0: 0x00000000 0x00000011 0x00000001
> > > acpicmos0 at acpi0
> > > com0 at acpi0 COM1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
> > > com0: console
> > > com1 at acpi0 COM2 addr 0x2f8/0x8 irq 3: ns16550a, 16 byte fifo
> > > amdgpio0 at acpi0 GPIO uid 0 addr 0xfed81500/0x300 irq 7, 184 pins
> > > "PRP0001" at acpi0 not configured
> > > "PRP0001" at acpi0 not configured
> > > "PRP0001" at acpi0 not configured
> > > "PRP0001" at acpi0 not configured
> > > "PRP0001" at acpi0 not configured
> > > "PRP0001" at acpi0 not configured
> > > "BOOT0000" at acpi0 not configured
> > > acpitz0 at acpi0: critical temperature is 115 degC
> > > cpu0: 998 MHz: speeds: 1000 800 600 MHz
> > > pci0 at mainbus0 bus 0
> > > pchb0 at pci0 dev 0 function 0 "AMD 16h Root Complex" rev 0x00
> > > vendor "AMD", unknown product 0x1567 (class system subclass IOMMU, rev 0x00) at pci0 dev 0 function 2 not configured
> > > pchb1 at pci0 dev 2 function 0 "AMD 16h Host" rev 0x00
> > > ppb0 at pci0 dev 2 function 1 "AMD 16h PCIE" rev 0x00: msi
> > > pci1 at ppb0 bus 1
> > > em0 at pci1 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e4
> > > ppb1 at pci0 dev 2 function 2 "AMD 16h PCIE" rev 0x00: msi
> > > pci2 at ppb1 bus 2
> > > em1 at pci2 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e5
> > > ppb2 at pci0 dev 2 function 3 "AMD 16h PCIE" rev 0x00: msi
> > > pci3 at ppb2 bus 3
> > > em2 at pci3 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e6
> > > ppb3 at pci0 dev 2 function 4 "AMD 16h PCIE" rev 0x00: msi
> > > pci4 at ppb3 bus 4
> > > em3 at pci4 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e7
> > > ccp0 at pci0 dev 8 function 0 "AMD 16h Crypto" rev 0x00: msix
> > > xhci0 at pci0 dev 16 function 0 "AMD Bolton xHCI" rev 0x11: msix, xHCI 1.0
> > > usb0 at xhci0: USB revision 3.0
> > > uhub0 at usb0 configuration 1 interface 0 "AMD xHCI root hub" rev 3.00/1.00 addr 1
> > > ahci0 at pci0 dev 17 function 0 "AMD Hudson-2 SATA" rev 0x40: apic 4 int 19, AHCI 1.3
> > > ahci0: port 0: 6.0Gb/s
> > > scsibus1 at ahci0: 32 targets
> > > sd0 at scsibus1 targ 0 lun 0: <ATA, Hoodisk SSD, SBFM> t10.ATA_Hoodisk_SSD_L7DTC7A11208345_
> > > sd0: 15272MB, 512 bytes/sector, 31277232 sectors, thin
> > > ehci0 at pci0 dev 18 function 0 "AMD Hudson-2 USB2" rev 0x39: apic 4 int 18
> > > usb1 at ehci0: USB revision 2.0
> > > uhub1 at usb1 configuration 1 interface 0 "AMD EHCI root hub" rev 2.00/1.00 addr 1
> > > ehci1 at pci0 dev 19 function 0 "AMD Hudson-2 USB2" rev 0x39: apic 4 int 18
> > > usb2 at ehci1: USB revision 2.0
> > > uhub2 at usb2 configuration 1 interface 0 "AMD EHCI root hub" rev 2.00/1.00 addr 1
> > > piixpm0 at pci0 dev 20 function 0 "AMD Hudson-2 SMBus" rev 0x42: SMI
> > > iic0 at piixpm0
> > > iic1 at piixpm0
> > > iic1: addr 0x4c 3e=00 48=00 4a=00 4e=00 fc=00 fe=00 words 00=ffff 01=ffff 02=ffff 03=ffff 04=ffff 05=ffff 06=ffff 07=ffff
> > > pcib0 at pci0 dev 20 function 3 "AMD Hudson-2 LPC" rev 0x11
> > > sdhc0 at pci0 dev 20 function 7 "AMD Bolton SD/MMC" rev 0x01: apic 4 int 16
> > > sdhc0: SDHC 2.00, 50 MHz base clock
> > > sdmmc0 at sdhc0: 4-bit, sd high-speed, mmc high-speed, dma
> > > pchb2 at pci0 dev 24 function 0 "AMD 16h Link Cfg" rev 0x00
> > > pchb3 at pci0 dev 24 function 1 "AMD 16h Address Map" rev 0x00
> > > pchb4 at pci0 dev 24 function 2 "AMD 16h DRAM Cfg" rev 0x00
> > > km0 at pci0 dev 24 function 3 "AMD 16h Misc Cfg" rev 0x00
> > > pchb5 at pci0 dev 24 function 4 "AMD 16h CPU Power" rev 0x00
> > > pchb6 at pci0 dev 24 function 5 "AMD 16h Misc Cfg" rev 0x00
> > > isa0 at pcib0
> > > isadma0 at isa0
> > > com2 at isa0 port 0x3e8/8 irq 5: ns16550a, 16 byte fifo
> > > pcppi0 at isa0 port 0x61
> > > spkr0 at pcppi0
> > > lpt0 at isa0 port 0x378/4 irq 7
> > > intr_establish: pic ioapic0 pin 7: can't share type 3 with 2
> > > wbsio0 at isa0 port 0x2e/2: NCT5104D rev 0x53
> > > vmm0 at mainbus0: SVM/RVI
> > > ugen0 at uhub0 port 3 "American Power Conversion Back-UPS CS 350 FW:807.q10 .I USB FW:q10" rev 1.10/0.06 addr 2
> > > uhub3 at uhub1 port 1 configuration 1 interface 0 "Advanced Micro Devices Hub" rev 2.00/0.18 addr 2
> > > uhub4 at uhub2 port 1 configuration 1 interface 0 "Advanced Micro Devices Hub" rev 2.00/0.18 addr 2
> > > vscsi0 at root
> > > scsibus2 at vscsi0: 256 targets
> > > softraid0 at root
> > > scsibus3 at softraid0: 256 targets
> > > root on sd0a (cbb37b39d1463c87.a) swap on sd0b dump on sd0b
> > >
> > >
> > > On Mon, 13 Jan 2025 11:53:11 +0000
> > > Stuart Henderson <[email protected]> wrote:
> > >
> > > > On 2025/01/13 11:53, Stefan Sperling wrote:
> > > > > On Sun, Jan 12, 2025 at 09:35:03PM +0100, Radek wrote:
> > > > > > Hi,
> > > > > > I have two fresh installs of 7.6/amd64 as a router/gateway on APU2
> > > > > > and APU4. There is a site-to-site IPsec tunnel between them with
> > > > > > ~30 Mbps of permanent traffic. The boxes usually drop into ddb (no
> > > > > > kernel panic) within a few hours of boot.
> > > > > >
> > > > > > I attached dmesgs and ddb console outputs of the boxes.
> > > > > >
> > > > > > ### APU2
> > > > > > ddb{0}> show panic
> > > > > > the kernel did not panic
> > > > > >
> > > > > > ddb{0}> trace
> > > > > > db_enter() at db_enter+0x14
> > > > > > comintr(ffff800000098000) at comintr+0x33e
> > > > ^^
> > > > >
> > > > > This looks like sysctl ddb.console is set to 1, and then something
> > > > > causes a "break" to appear on the serial port which triggers ddb.
> > > >
> > > > yes, that is a classic "break" trace.
> > > >
> > > > > > rdx 0x3f8
> > > >
> > > > + there's your serial port :)
> > > >
> > > > Things you can try:
> > > >
> > > > - if you have a cable connected to the APU but unplugged at the other
> > > > end, either try disconnecting it, or plug it in to something
> > > >
> > > > - check for a loose connection/intermittent short inside the cable
> > > >
> > > > - if it's a long cable, try a shorter one
> > > >
> > > > - lower the console port speed
> > > >
> > > > To send a 'break', you hold the line in the 'space' (logic 0) condition
> > > > for longer than the time it takes to transmit a valid character
> > > > (including start/stop/any parity bits) at the current bitrate.
> > > >
> > > > This is detected by the UART on the receiving system; e.g. here is an
> > > > excerpt from TI's datasheet for the 16550 UART:
> > > >
> > > > "Bit 4: This bit is the Break Interrupt (BI) indicator. Bit 4 is set
> > > > to a logic 1 whenever the received data input is held in the Spacing
> > > > (logic 0) state for longer than a full word transmission time (that
> > > > is, the total time of Start bit + data bits + Parity + Stop bits)."
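A minimal Python sketch of testing that bit in a raw Line Status Register value, assuming the standard 16550 LSR layout (bit 4 = BI). The names `decode_lsr` and `is_break` and the sample value are hypothetical illustrations, not OpenBSD's com(4) driver code:

```python
# Bit masks for the 16550 Line Status Register (LSR), lowest bit first.
LSR_FLAGS = [
    (0x01, "DR"),    # bit 0: data ready
    (0x02, "OE"),    # bit 1: overrun error
    (0x04, "PE"),    # bit 2: parity error
    (0x08, "FE"),    # bit 3: framing error
    (0x10, "BI"),    # bit 4: break interrupt
    (0x20, "THRE"),  # bit 5: transmitter holding register empty
    (0x40, "TEMT"),  # bit 6: transmitter empty
    (0x80, "RXFE"),  # bit 7: error in receiver FIFO
]
LSR_BI = 0x10


def decode_lsr(lsr: int) -> list:
    """Return the names of the flags set in an LSR value, lowest bit first."""
    return [name for mask, name in LSR_FLAGS if lsr & mask]


def is_break(lsr: int) -> bool:
    """True if the Break Interrupt (BI) bit is set."""
    return bool(lsr & LSR_BI)


if __name__ == "__main__":
    sample = 0x11  # hypothetical value: data ready + break interrupt
    print(decode_lsr(sample), is_break(sample))  # ['DR', 'BI'] True
```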
> > > >
> > > > With a standard 8n1 setting, at 115200 that's "longer than about 86
> > > > microseconds" and at 9600 it's "longer than about 1ms".
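The same arithmetic works for any rate: a frame is 1 start bit plus data, parity and stop bits, so the break threshold is that bit count divided by the baud rate. A short sketch assuming 8N1 framing by default (the helper name `frame_time_us` is hypothetical):

```python
def frame_time_us(baud: int, data_bits: int = 8, parity_bits: int = 0,
                  stop_bits: int = 1) -> float:
    """Duration of one character frame on the wire, in microseconds.

    A frame is 1 start bit + data bits + parity bits + stop bits; per the
    16550 datasheet, the line must stay at logic 0 for longer than this
    before the receiver flags a break.
    """
    bits = 1 + data_bits + parity_bits + stop_bits
    return bits * 1_000_000 / baud


if __name__ == "__main__":
    for baud in (9600, 19200, 38400, 57600, 115200):
        print(f"{baud:>6} baud 8N1: break if low for > {frame_time_us(baud):.1f} us")
```

For 9600 this gives about 1041.7 us and for 115200 about 86.8 us, matching the figures in the paragraph.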
> > > >
> > > > So at higher speeds, either quite a short glitch or a single char sent
> > > > by a device connected to the port at a slower speed (e.g. 9600) can be
> > > > enough to trigger it.
> > > >
> > > > In particular I do not recommend 115200 for serial ports on devices
> > > > which do break detection and 57600 might be a bit high. On my own
> > > > systems I normally use 9600 for debug console ports as there's not
> > > > normally that much data sent over them and it's way more robust.
> > > > You just have to watch out for things that do a bunch of kernel
> > > > printfs - 'debug' on pppoe(4) for example is not very fun :)
> > > > On the OpenBSD side, update /etc/boot.conf and /etc/ttys to change
> > > > this; you'll also have the setting in the APU's BIOS.
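To put numbers on the throughput side of that trade-off, a sketch assuming 8N1 framing (10 bits per character on the wire) and a hypothetical burst of 100 lines of 80-character debug output; the helper names are illustrative only:

```python
def chars_per_second(baud: int, bits_per_frame: int = 10) -> float:
    """Maximum characters per second on an 8N1 line (10 bits per char)."""
    return baud / bits_per_frame


def seconds_to_print(n_bytes: int, baud: int) -> float:
    """Minimum time needed to push n_bytes through the serial console."""
    return n_bytes / chars_per_second(baud)


if __name__ == "__main__":
    burst = 100 * 80  # hypothetical: 100 lines of 80-character kernel output
    for baud in (9600, 115200):
        print(f"{baud:>6} baud: {chars_per_second(baud):6.0f} chars/s, "
              f"{seconds_to_print(burst, baud):5.2f} s for {burst} bytes")
```

At 9600 that burst needs roughly 8.3 s of line time versus about 0.7 s at 115200.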
> > > >
> > >
> > >
> > > --
> > > Please do not CC me
> > > Radek
> > >
> >
>
>