On 2025/01/14 18:05, Radek wrote:
> Hi, 
> 
> On Tue, 14 Jan 2025 12:38:57 +0000
> Stuart Henderson <[email protected]> wrote:
> 
> > On 2025/01/14 04:00, Radek wrote:
> > > There were 3m null-modem cables connected to both APUs; the APU4's cable
> > > also had an RS232/USB adapter.
> > > The APUs have a fixed console baud rate of 115200 and I didn't find a way to
> > > change it to a lower speed.
> > 
> > You can still set the OpenBSD side to a lower speed, it just means
> > switching speed if you want to access the BIOS. 
> Yep, good idea :)
> 
> > (I am very surprised
> > though, I was convinced you could change this, but I don't see a way
> > to do it without rebuilding firmware, perhaps that was the alix).
> > 
> > > I'm testing only APU4 now. I disconnected the null-modem cable and I set 
> > > ddb.console to 0.
> > > After a few hours APU4 drops to ddb again:
> > > 
> > > ddb{2}> show panic
> > > the kernel did not panic
> > 
> > was there some output before the ddb{2} prompt?
> The APU wasn't connected to PC until the crash and the first line I got after 
> hitting enter on the console was ddb{2}>

ok - then please leave it connected and check that, there may be some
important information.

> > 
> > > ddb{2}> trace 
> > > sched_steal_proc(ffff80002d4b7ff0) at sched_steal_proc+0x11c
> > > sched_chooseproc() at sched_chooseproc+0x1aa
> > 
> > seems strange.
> > 
> > is everything ok with cooling? power?
> I think so. The box had over 2 years uptime on 7.2 snapshot [1].
> Nobody touches it, all the cables and power are the same. I only unplugged the
> null-modem cable - it was connected to the box since I can remember.
> 1. https://marc.info/?l=openbsd-bugs&m=166412911321566&w=2
> 
> > 
> > > mi_switch() at mi_switch+0x1e5
> > > sched_peg_curproc(ffff80002d4c0ff0) at sched_peg_curproc+0x67
> > > cpu_hz_update_sensor(ffff80002d4c0ff0) at cpu_hz_update_sensor+0x15
> > > sensor_task_work(ffff800000030a00) at sensor_task_work+0x51
> > > taskq_thread(ffff80000008db80) at taskq_thread+0x129
> > > end trace frame: 0x0, count: -7
> > 
> > > ddb{2}> show register 
> > > rdi                           0x1000    __ALIGN_SIZE
> > > rsi                           0x7dc0    __ALIGN_SIZE+0x6dc0
> > > rbp               0xffff80002d695900
> > > rbx                                0
> > > rdx                        0x394dc21    __kernel_phys_end+0xf4dc21
> > > rcx                                0
> > > rax                              0xc
> > > r8                         0xf627043    __kernel_phys_end+0xcc27043
> > > r9                        0x5e42f67f
> > > r10                0xcc3bd7032b4f63e
> > > r11               0x63a9870a5e938412
> > > r12               0xffff80002d4c0ff0
> > > r13                       0x7fffffff
> > > r14               0xffff80002d4b7ff0
> > > r15                                0
> > > rip               0xffffffff81e8636c    sched_steal_proc+0x11c
> > > cs                               0x8
> > > rflags                       0x10206    __ALIGN_SIZE+0xf206
> > > rsp               0xffff80002d6958c0
> > > ss                              0x10
> > > sched_steal_proc+0x11c: cdqe
> > > 
> > > ddb{2}> ps
> > >    PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
> > >  41911  146814      1      0  3    0x100083  ttyin         getty
> > >   5762  405322      1      0  3    0x100098  kqread        cron
> > >  73181  284842      1      0  3        0x80  ugenrintr     apcupsd
> > >  73181  270787      1      0  3   0x4000088  sigwait       apcupsd
> > >  73181  486672      1      0  3   0x4000080  netacc        apcupsd
> > >  58439  119507      1     99  3   0x1100090  kqread        sndiod
> > >  69152  431640      1    110  3    0x100090  kqread        sndiod
> > >  31588  226733  19541     95  3   0x1100092  kqread        smtpd
> > >  34359  221468  19541    103  3   0x1100092  kqread        smtpd
> > >  98195  498132  19541     95  3   0x1100092  kqread        smtpd
> > >  86017  459136  19541     95  3    0x100092  kqread        smtpd
> > >  70895  101640  19541     95  3   0x1100092  kqread        smtpd
> > >  93103  373510  19541     95  3   0x1100092  kqread        smtpd
> > >  19541  363543      1      0  3    0x100080  kqread        smtpd
> > >  19263  467127      1     77  3   0x1100090  kqread        dhcpd
> > >  96610  325819      1      0  3        0x88  kqread        sshd
> > >  46440  163929  87714     68  3   0x1000090  kqread        isakmpd
> > >  87714  108971      1      0  3        0x80  sbwait        isakmpd
> > >  73323  396657      1      0  3    0x100080  kqread        ntpd
> > >  38281  201772  34209     83  3    0x100092  kqread        ntpd
> > >  34209  396498      1     83  3   0x1100092  kqread        ntpd
> > >  96977  422652      1     53  3   0x1000090  kqread        unbound
> > >  79026  198934  37215     73  3   0x1100090  kqread        syslogd
> > >  37215  230033      1      0  3    0x100082  sbwait        syslogd
> > >  19526  197700      1      0  3    0x100080  kqread        resolvd
> > >  59258  127015  28251     77  3    0x100092  kqread        dhcpleased
> > >  61342  136779  28251     77  3    0x100092  kqread        dhcpleased
> > >  28251  416947      1      0  3        0x80  kqread        dhcpleased
> > >   3370  165413  49206    115  3    0x100092  kqread        slaacd
> > >  64831   78796  49206    115  3    0x100092  kqread        slaacd
> > >  49206  464326      1      0  3    0x100080  kqread        slaacd
> > >  94171  226931      0      0  3     0x14200  bored         smr
> > >  89428  262409      0      0  3     0x14200  pgzero        zerothread
> > >  52956  245859      0      0  3     0x14200  aiodoned      aiodoned
> > >  54747  256091      0      0  3     0x14200  syncer        update
> > >   4892   59507      0      0  3     0x14200  cleaner       cleaner
> > >  82718  198935      0      0  3     0x14200  reaper        reaper
> > >  21459  261399      0      0  3     0x14200  pgdaemon      pagedaemon
> > >  41174  416209      0      0  3     0x14200  mmctsk        sdmmc0
> > >  69111  190214      0      0  3     0x14200  usbtsk        usbtask
> > >  13632   51893      0      0  3     0x14200  usbatsk       usbatsk
> > >  43371  179039      0      0  3  0x40014200  acpi0         acpi0
> > >  98806   21031      0      0  7  0x40014200                idle3
> > >  86373  483372      0      0  3  0x40014200                idle2
> > >  78455  458933      0      0  7  0x40014200                idle1
> > > *13993  484783      0      0  2  0x40014200                sensors
> > >  62636  436251      0      0  3     0x14200  bored         softnet3
> > >  56088  519338      0      0  3     0x14200  bored         softnet2
> > >  67073  169850      0      0  3     0x14200  bored         softnet1
> > >  18689  250204      0      0  3     0x14200  bored         softnet0
> > >  15311  500938      0      0  3     0x14200  bored         systqmp
> > >   9702   36446      0      0  3     0x14200  bored         systq
> > >  75771  412492      0      0  3     0x14200  tmoslp        softclockmp
> > >  81164  300625      0      0  3  0x40014200  tmoslp        softclock
> > >  55664   33044      0      0  7  0x40014200                idle0
> > >      1  504540      0      0  3        0x82  wait          init
> > >      0       0     -1      0  3     0x10200  scheduler     swapper
> > > ddb{2}> mach ddbcpu 0
> > > Stopped at      x86_ipi_db+0x16:        leave
> > > ddb{0}> mach ddbcpu 1
> > > Stopped at      x86_ipi_db+0x16:        leave
> > > ddb{1}> mach ddbcpu 2
> > > Stopped at      sched_steal_proc+0x11c: cdqe
> > > ddb{2}> mach ddbcpu 3
> > > Stopped at      x86_ipi_db+0x16:        leave
> > > 
> > > ddb{3}> dmesg
> > > OpenBSD 7.6 (GENERIC.MP) #0: Thu Jan  9 07:32:40 MST 2025
> > >     [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > real mem = 4259897344 (4062MB)
> > > avail mem = 4107575296 (3917MB)
> > > random: good seed from bootblocks
> > > mpath0 at root
> > > scsibus0 at mpath0: 256 targets
> > > mainbus0 at root
> > > bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xcfe92040 (13 entries)
> > > bios0: vendor coreboot version "v4.17.0.1" date 06/22/2022
> > > bios0: PC Engines apu4
> > > acpi0 at bios0: ACPI 6.0
> > > acpi0: sleep states S0 S1 S4 S5
> > > acpi0: tables DSDT FACP SSDT MCFG TPM2 APIC HEST SSDT SSDT DRTM HPET
> > > acpi0: wakeup devices PBR4(S4) PBR5(S4) PBR6(S4) PBR7(S4) PBR8(S4) UOH1(S3) UOH2(S3) UOH3(S3) UOH4(S3) UOH5(S3) UOH6(S3) XHC0(S4)
> > > acpitimer0 at acpi0: 3579545 Hz, 32 bits
> > > acpimcfg0 at acpi0
> > > acpimcfg0: addr 0xf8000000, bus 0-63
> > > acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> > > cpu0 at mainbus0: apid 0 (boot processor)
> > > cpu0: AMD GX-412TC SOC, 998.18 MHz, 16-30-01, patch 07030105
> > > cpu0: cpuid 1 edx=178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT> ecx=36d8220b<SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C>
> > > cpu0: cpuid 6 eax=4<ARAT> ecx=1<EFFFREQ>
> > > cpu0: cpuid 7.0 ebx=8<BMI1>
> > > cpu0: cpuid d.1 eax=1<XSAVEOPT>
> > > cpu0: cpuid 80000001 edx=2fd3fbff<NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG> ecx=1d4037ff<LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PERFTSC,PCTRL3>
> > > cpu0: cpuid 80000007 edx=33d9<HWPSTATE,ITSC>
> > > cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 2-way I-cache, 2MB 64b/line 16-way L2 cache
> > > cpu0: smt 0, core 0, package 0
> > > mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> > > cpu0: apic clock running at 99MHz
> > > cpu0: mwait min=64, max=64, IBE
> > > cpu1 at mainbus0: apid 1 (application processor)
> > > cpu1: AMD GX-412TC SOC, 998.24 MHz, 16-30-01, patch 07030105
> > > cpu1: smt 0, core 1, package 0
> > > cpu2 at mainbus0: apid 2 (application processor)
> > > cpu2: AMD GX-412TC SOC, 998.33 MHz, 16-30-01, patch 07030105
> > > cpu2: smt 0, core 2, package 0
> > > cpu3 at mainbus0: apid 3 (application processor)
> > > cpu3: AMD GX-412TC SOC, 998.52 MHz, 16-30-01, patch 07030105
> > > cpu3: smt 0, core 3, package 0
> > > ioapic0 at mainbus0: apid 4 pa 0xfec00000, version 21, 24 pins
> > > ioapic1 at mainbus0: apid 5 pa 0xfec20000, version 21, 32 pins
> > > acpihpet0 at acpi0: 14318180 Hz
> > > acpiprt0 at acpi0: bus 0 (PCI0)
> > > acpiprt1 at acpi0: bus 1 (PBR4)
> > > acpiprt2 at acpi0: bus 2 (PBR5)
> > > acpiprt3 at acpi0: bus 3 (PBR6)
> > > acpiprt4 at acpi0: bus 4 (PBR7)
> > > acpiprt5 at acpi0: bus -1 (PBR8)
> > > acpicpu0 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> > > acpicpu1 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> > > acpicpu2 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> > > acpicpu3 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
> > > acpipci0 at acpi0 PCI0: 0x00000000 0x00000011 0x00000001
> > > acpicmos0 at acpi0
> > > com0 at acpi0 COM1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
> > > com0: console
> > > com1 at acpi0 COM2 addr 0x2f8/0x8 irq 3: ns16550a, 16 byte fifo
> > > amdgpio0 at acpi0 GPIO uid 0 addr 0xfed81500/0x300 irq 7, 184 pins
> > > "PRP0001" at acpi0 not configured
> > > "PRP0001" at acpi0 not configured
> > > "PRP0001" at acpi0 not configured
> > > "PRP0001" at acpi0 not configured
> > > "PRP0001" at acpi0 not configured
> > > "PRP0001" at acpi0 not configured
> > > "BOOT0000" at acpi0 not configured
> > > acpitz0 at acpi0: critical temperature is 115 degC
> > > cpu0: 998 MHz: speeds: 1000 800 600 MHz
> > > pci0 at mainbus0 bus 0
> > > pchb0 at pci0 dev 0 function 0 "AMD 16h Root Complex" rev 0x00
> > > vendor "AMD", unknown product 0x1567 (class system subclass IOMMU, rev 0x00) at pci0 dev 0 function 2 not configured
> > > pchb1 at pci0 dev 2 function 0 "AMD 16h Host" rev 0x00
> > > ppb0 at pci0 dev 2 function 1 "AMD 16h PCIE" rev 0x00: msi
> > > pci1 at ppb0 bus 1
> > > em0 at pci1 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e4
> > > ppb1 at pci0 dev 2 function 2 "AMD 16h PCIE" rev 0x00: msi
> > > pci2 at ppb1 bus 2
> > > em1 at pci2 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e5
> > > ppb2 at pci0 dev 2 function 3 "AMD 16h PCIE" rev 0x00: msi
> > > pci3 at ppb2 bus 3
> > > em2 at pci3 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e6
> > > ppb3 at pci0 dev 2 function 4 "AMD 16h PCIE" rev 0x00: msi
> > > pci4 at ppb3 bus 4
> > > em3 at pci4 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:59:e0:e7
> > > ccp0 at pci0 dev 8 function 0 "AMD 16h Crypto" rev 0x00: msix
> > > xhci0 at pci0 dev 16 function 0 "AMD Bolton xHCI" rev 0x11: msix, xHCI 1.0
> > > usb0 at xhci0: USB revision 3.0
> > > uhub0 at usb0 configuration 1 interface 0 "AMD xHCI root hub" rev 3.00/1.00 addr 1
> > > ahci0 at pci0 dev 17 function 0 "AMD Hudson-2 SATA" rev 0x40: apic 4 int 19, AHCI 1.3
> > > ahci0: port 0: 6.0Gb/s
> > > scsibus1 at ahci0: 32 targets
> > > sd0 at scsibus1 targ 0 lun 0: <ATA, Hoodisk SSD, SBFM> t10.ATA_Hoodisk_SSD_L7DTC7A11208345_
> > > sd0: 15272MB, 512 bytes/sector, 31277232 sectors, thin
> > > ehci0 at pci0 dev 18 function 0 "AMD Hudson-2 USB2" rev 0x39: apic 4 int 18
> > > usb1 at ehci0: USB revision 2.0
> > > uhub1 at usb1 configuration 1 interface 0 "AMD EHCI root hub" rev 2.00/1.00 addr 1
> > > ehci1 at pci0 dev 19 function 0 "AMD Hudson-2 USB2" rev 0x39: apic 4 int 18
> > > usb2 at ehci1: USB revision 2.0
> > > uhub2 at usb2 configuration 1 interface 0 "AMD EHCI root hub" rev 2.00/1.00 addr 1
> > > piixpm0 at pci0 dev 20 function 0 "AMD Hudson-2 SMBus" rev 0x42: SMI
> > > iic0 at piixpm0
> > > iic1 at piixpm0
> > > iic1: addr 0x4c 3e=00 48=00 4a=00 4e=00 fc=00 fe=00 words 00=ffff 01=ffff 02=ffff 03=ffff 04=ffff 05=ffff 06=ffff 07=ffff
> > > pcib0 at pci0 dev 20 function 3 "AMD Hudson-2 LPC" rev 0x11
> > > sdhc0 at pci0 dev 20 function 7 "AMD Bolton SD/MMC" rev 0x01: apic 4 int 16
> > > sdhc0: SDHC 2.00, 50 MHz base clock
> > > sdmmc0 at sdhc0: 4-bit, sd high-speed, mmc high-speed, dma
> > > pchb2 at pci0 dev 24 function 0 "AMD 16h Link Cfg" rev 0x00
> > > pchb3 at pci0 dev 24 function 1 "AMD 16h Address Map" rev 0x00
> > > pchb4 at pci0 dev 24 function 2 "AMD 16h DRAM Cfg" rev 0x00
> > > km0 at pci0 dev 24 function 3 "AMD 16h Misc Cfg" rev 0x00
> > > pchb5 at pci0 dev 24 function 4 "AMD 16h CPU Power" rev 0x00
> > > pchb6 at pci0 dev 24 function 5 "AMD 16h Misc Cfg" rev 0x00
> > > isa0 at pcib0
> > > isadma0 at isa0
> > > com2 at isa0 port 0x3e8/8 irq 5: ns16550a, 16 byte fifo
> > > pcppi0 at isa0 port 0x61
> > > spkr0 at pcppi0
> > > lpt0 at isa0 port 0x378/4 irq 7
> > > intr_establish: pic ioapic0 pin 7: can't share type 3 with 2
> > > wbsio0 at isa0 port 0x2e/2: NCT5104D rev 0x53
> > > vmm0 at mainbus0: SVM/RVI
> > > ugen0 at uhub0 port 3 "American Power Conversion Back-UPS CS 350 FW:807.q10 .I USB FW:q10" rev 1.10/0.06 addr 2
> > > uhub3 at uhub1 port 1 configuration 1 interface 0 "Advanced Micro Devices Hub" rev 2.00/0.18 addr 2
> > > uhub4 at uhub2 port 1 configuration 1 interface 0 "Advanced Micro Devices Hub" rev 2.00/0.18 addr 2
> > > vscsi0 at root
> > > scsibus2 at vscsi0: 256 targets
> > > softraid0 at root
> > > scsibus3 at softraid0: 256 targets
> > > root on sd0a (cbb37b39d1463c87.a) swap on sd0b dump on sd0b
> > > 
> > > 
> > > On Mon, 13 Jan 2025 11:53:11 +0000
> > > Stuart Henderson <[email protected]> wrote:
> > > 
> > > > On 2025/01/13 11:53, Stefan Sperling wrote:
> > > > > On Sun, Jan 12, 2025 at 09:35:03PM +0100, Radek wrote:
> > > > > > Hi,
> > > > > > > I have two fresh installs of 7.6/amd64 as a router/gateway on APU2
> > > > > > > and APU4. There is a site-to-site IPsec tunnel between them with
> > > > > > > ~30Mbps of permanent traffic. The boxes usually drop into ddb (no
> > > > > > > kernel panic) within a few hours of boot.
> > > > > > 
> > > > > > I attached dmesgs and ddb console outputs of the boxes.
> > > > > > 
> > > > > > ### APU2
> > > > > > ddb{0}> show panic
> > > > > > the kernel did not panic
> > > > > > 
> > > > > > ddb{0}> trace
> > > > > > db_enter() at db_enter+0x14
> > > > > > comintr(ffff800000098000) at comintr+0x33e
> > > > ^^
> > > > > 
> > > > > This looks like sysctl ddb.console is set to 1, and then something
> > > > > causes a "break" to appear on the serial port which triggers ddb.
> > > > 
> > > > yes, that is a classic "break" trace.
> > > > 
> > > > > > rdx                            0x3f8
> > > > 
> > > > + there's your serial port :)
> > > > 
> > > > Things you can try:
> > > > 
> > > > - if you have a cable connected to the APU but unplugged at the other
> > > > end, either try disconnecting it, or plug it in to something
> > > > 
> > > > - check for a loose connection/intermittent short inside the cable
> > > > 
> > > > - if it's a long cable, try a shorter one
> > > > 
> > > > - lower the console port speed
> > > > 
> > > > To send 'break' you hold the line at 'space' or 'logic 0' condition
> > > > for longer than the time to transmit a valid character (including
> > > > stop/start/any parity bits) at the current bitrate.
> > > > 
> > > > This is detected by the UART on the receiving system, e.g. here is an
> > > > excerpt from TI's datasheet for 16550 uart
> > > > 
> > > >     "Bit 4: This bit is the Break Interrupt (BI) indicator. Bit 4 is set
> > > >     to a logic 1 whenever the received data input is held in the Spacing
> > > >     (logic 0) state for longer than a full word transmission time (that
> > > >     is, the total time of Start bit + data bits + Parity + Stop bits)."
> > > > 
> > > > With a standard 8n1 setting, at 115200 that's "longer than about 86
> > > > microseconds" and at 9600 it's "longer than about 1ms".
> > > > 
> > > > > So at higher speeds, either quite a short glitch or sending a single
> > > > > char from a device connected to the port at a slower speed, e.g. 9600,
> > > > > can be enough to trigger it.
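
As a rough check of the timing figures quoted above, the following is a minimal sketch that computes the 8n1 frame time at a few bit rates. The function names are made up for illustration, and the comparison at the end is a simplification: a real UART samples each bit near its midpoint rather than measuring the low period directly.

```python
def bit_time_us(baud):
    """Duration of a single bit on the wire, in microseconds."""
    return 1_000_000 / baud


def frame_time_us(baud, data_bits=8, parity_bits=0, stop_bits=1):
    """Time to send one character frame (start + data + parity + stop bits)."""
    bits = 1 + data_bits + parity_bits + stop_bits
    return bits * bit_time_us(baud)


# The line must stay at 'space' (logic 0) longer than one full frame
# for the receiving UART to flag a break.
for baud in (115200, 57600, 9600):
    print(f"{baud:>6} 8n1: break if held low longer than ~{frame_time_us(baud):.1f} us")

# Simplified view of the "slower sender" case: the start bit alone of a
# character sent at 9600 lasts longer than a whole frame at 115200.
start_bit_9600 = bit_time_us(9600)
print(f"9600 start bit: {start_bit_9600:.1f} us, "
      f"exceeds 115200 frame time: {start_bit_9600 > frame_time_us(115200)}")
```

Under these assumptions the 115200 threshold comes out just under 87 microseconds and the 9600 threshold just over 1 millisecond, which is in line with the "about 86 microseconds" and "about 1ms" figures above.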
> > > > 
> > > > In particular I do not recommend 115200 for serial ports on devices
> > > > which do break detection and 57600 might be a bit high. On my own
> > > > systems I normally use 9600 for debug console ports as there's not
> > > > normally that much data sent over them and it's way more robust.
> > > > You just have to watch out for things that do a bunch of kernel
> > > > printfs - 'debug' on pppoe(4) for example is not very fun :)
> > > > On the OpenBSD side, update /etc/boot.conf and /etc/ttys to change
> > > > this, you'll also have the setting in the APU's bios.
> > > > 
> > > 
> > > 
> > > -- 
> > > Please do not CC me
> > > Radek
> > > 
> > 
> 
> 
> -- 
> Please do not CC me
> Radek
> 
