I've dug a bit deeper into this and found that the interface outages occur
when the CPU frequency changes, either by e.g. obsdfreqd(1) doing it or by
manually changing hw.setperf, such that an actual frequency change can be
observed in hw.cpuspeed. I can thus reproduce the problem reliably by
just fiddling about with hw.setperf. The bug manifests mainly on 1000BaseT,
but I have seen signs of it (on a smaller scale) on 100BaseTX as well.
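
For anyone who wants to try this, a minimal sketch of the toggling could
look like the following. It assumes OpenBSD, root privileges, and that
nothing else (obsdfreqd, apmd) is adjusting the frequency at the same time.
The function name toggle_setperf, its arguments and the SYSCTL override are
made up for illustration; only the hw.setperf and hw.cpuspeed sysctls come
from the report above.

```sh
#!/bin/sh
# Sketch: flip hw.setperf between two levels to provoke a CPU frequency
# change. SYSCTL and toggle_setperf are illustrative names, not part of
# any existing tool.

SYSCTL=${SYSCTL:-sysctl}

# toggle_setperf COUNT [LOW] [HIGH] [PAUSE]
#   COUNT  number of low/high cycles to run
#   LOW    lower hw.setperf value (default 0)
#   HIGH   upper hw.setperf value (default 100)
#   PAUSE  seconds to wait after each change (default 5)
toggle_setperf() {
	count=$1
	low=${2:-0}
	high=${3:-100}
	pause=${4:-5}
	i=0
	while [ "$i" -lt "$count" ]; do
		for level in "$low" "$high"; do
			$SYSCTL hw.setperf="$level" >/dev/null
			# Report the speed the kernel says it is running at now.
			echo "setperf=$level cpuspeed=$($SYSCTL -n hw.cpuspeed)"
			sleep "$pause"
		done
		i=$((i + 1))
	done
}
```

Running something like "toggle_setperf 10" while tcpbench is pushing
traffic over the interface should make any dip line up with the reported
hw.cpuspeed changes.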

A totally unqualified guess is that the CPU frequency change results in some
sort of timer or interrupt hiccup.

Sometimes re(4) (or some other part of the kernel?) gets back on track within a
second, making it look like just a little dip in network performance when
monitoring things with tcpbench. Sometimes it takes 10-15 seconds. Sometimes
the interface goes so hard off the radar that I cannot get it back on track
by down+delete'ing it, or even by unplugging the network cable, forcing me
to reboot.

The problem goes away if I lock the frequency by shutting obsdfreqd(1) down
and invoking apmd -H (or -L). This is obviously not ideal and should not be
considered a solution. Same goes for limiting the interface to 100BaseTX.

On Thu, Sep 19, 2024 at 1:06 PM stolen data <[email protected]> wrote:
>
> To try to rule out a hardware problem I booted a Linux distro (running Linux 6.8)
> on the machine, and there the interface is fully stable and performs well.
>
> Here's the complete dmesg:
>
> OpenBSD 7.5 (GENERIC.MP) #2: Mon Sep 16 15:56:43 CEST 2024
>     [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> ...
