It is well known that the APU2/4 underperforms when used as a router with OpenBSD, but I found that the throughput fluctuates quite a bit, and I think it has to do with CPU allocation and interrupts. My trivial setup simulating a home router/gateway:

hostname.em0:
    dhcp

hostname.em1:
    inet 10.3.2.1 255.255.255.0

pf.conf:
    pass
    match out on em0 inet from !(em0:network) to any nat-to (em0)

Nothing else is running on the router. Throughput is tested with a simple iperf3 TCP benchmark between Linux hosts on each side of the router, capped at 600 Mbit/s to get a stable baseline:

    iperf3 -b600M

(That's just under the maximum throughput I saw, around 620 Mbit/s.)

I noticed that the speed always starts at a clean 600 Mbit/s, then eventually drops to 400-500 Mbit/s, then goes back up again. The interval is on the order of a minute, but varies greatly.

Looking at "systat cpu" during the transfer, I noticed that at the fast speed, CPU 1 and 2 were busy at 45% each, while CPU 0 was handling interrupts at 25%.

    CPU   User   Nice  System   Spin  Interrupt   Idle
    0     0.0%   0.0%    0.0%   0.4%      25.0%  74.7%
    1     0.0%   0.0%   45.3%   1.0%       0.0%  53.7%
    2     0.0%   0.0%   44.7%   0.8%       0.0%  54.5%
    3     0.0%   0.0%    0.0%   0.0%       0.0%   100%

This could go on for anything from seconds to minutes. Eventually whatever was running on CPU 1 and 2 migrated to CPU 0, causing the bandwidth to drop to 400-500 Mbit/s:

    CPU   User   Nice  System   Spin  Interrupt   Idle
    0     0.0%   0.0%   76.8%   1.0%      22.2%   0.0%
    1     0.0%   0.0%    0.0%   0.0%       0.0%   100%
    2     0.0%   0.0%    0.0%   0.0%       0.0%   100%
    3     0.0%   0.0%    0.0%   0.0%       0.0%   100%

Waiting even longer, I saw that the "System" load could sometimes move back to an idle core, and then the speed would return to 600 Mbit/s:

    CPU   User   Nice  System   Spin  Interrupt   Idle
    0     0.0%   0.0%    0.2%   0.2%      23.0%  76.6%
    1     0.0%   0.0%   99.0%   1.0%       0.0%   0.0%
    2     0.0%   0.0%    0.0%   0.0%       0.0%   100%
    3     0.0%   0.0%    0.0%   0.0%       0.0%   100%

I'm guessing that the interrupts are all tied to CPU 0 in hardware, and that whatever process handles the networking initially selects one or more random idle cores. Then the system thinks "Aha, we should run these on the same core that handles the interrupts" and moves them over, which then starves that core.
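
To make the pattern in the three tables explicit, here is a minimal Python sketch that re-reads the numbers quoted above and flags any core that is both taking interrupts and out of idle time. The names (CpuRow, SAMPLES, contended_cpus) and the 5% thresholds are my own assumptions for illustration; nothing here comes from systat or the kernel, and it only restates the hypothesis rather than proving it.

```python
from typing import NamedTuple


class CpuRow(NamedTuple):
    """One row of the per-CPU table, all values in percent."""
    user: float
    nice: float
    system: float
    spin: float
    interrupt: float
    idle: float


# The three snapshots quoted in this mail, indexed by CPU number.
SAMPLES = {
    "fast": [
        CpuRow(0.0, 0.0, 0.0, 0.4, 25.0, 74.7),
        CpuRow(0.0, 0.0, 45.3, 1.0, 0.0, 53.7),
        CpuRow(0.0, 0.0, 44.7, 0.8, 0.0, 54.5),
        CpuRow(0.0, 0.0, 0.0, 0.0, 0.0, 100.0),
    ],
    "slow": [
        CpuRow(0.0, 0.0, 76.8, 1.0, 22.2, 0.0),
        CpuRow(0.0, 0.0, 0.0, 0.0, 0.0, 100.0),
        CpuRow(0.0, 0.0, 0.0, 0.0, 0.0, 100.0),
        CpuRow(0.0, 0.0, 0.0, 0.0, 0.0, 100.0),
    ],
    "recovered": [
        CpuRow(0.0, 0.0, 0.2, 0.2, 23.0, 76.6),
        CpuRow(0.0, 0.0, 99.0, 1.0, 0.0, 0.0),
        CpuRow(0.0, 0.0, 0.0, 0.0, 0.0, 100.0),
        CpuRow(0.0, 0.0, 0.0, 0.0, 0.0, 100.0),
    ],
}


def contended_cpus(rows, min_interrupt=5.0, max_idle=5.0):
    """Return CPU numbers that handle interrupts AND have almost no idle time.

    Both thresholds are arbitrary assumptions chosen for this illustration.
    """
    return [
        cpu
        for cpu, row in enumerate(rows)
        if row.interrupt >= min_interrupt and row.idle <= max_idle
    ]


for phase, rows in SAMPLES.items():
    print(phase, contended_cpus(rows))
```

Only the slow snapshot flags a core (CPU 0). In the recovered snapshot CPU 1 is fully busy as well, but it takes no interrupts, which fits the idea that the slowdown comes from sharing a core with interrupt handling rather than from a single busy core as such.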

This tells me that the rumour that OpenBSD can't use more than one core on this little device is not completely true. It works well for a long time initially, with the load shared between two cores while a third handles interrupts.

Does this make sense? Is there a way to enforce the "shared cores" behaviour?

// Anders