Dear David,

In my experience, the IPv4/IPv6 packet forwarding performance of OpenBSD is about an order of magnitude lower than that of Linux on a 16-core server.

When I tried to identify the root causes, I found two things:

1. I used an RFC 2544 compliant test with a single IP address pair and RFC 4814 pseudorandom port numbers. However, the interrupts caused by packet arrivals were processed by only two CPU cores (one core per direction); the others did not take part. This is because OpenBSD does not support configuring RSS (Receive-Side Scaling) appropriately; please see the details at: https://marc.info/?l=openbsd-misc&m=166581934723445&w=2 If you forward Internet traffic, you have many different IP addresses, so this one will not be an issue for you.

2. When I checked the CPU utilization with the top command, only 3 of the 32 CPU cores of my server showed non-zero load: two of them processed interrupts at about 25-27% utilization each, and the third, which very likely did the packet forwarding, was at about 90-95% utilization in my particular experiment. That is, the packet forwarding path can very likely use only a single CPU core.
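To illustrate the first point, here is a minimal sketch of why a single IP address pair can end up on one core per direction. It assumes that the receive queue is chosen by a hash over the IP addresses only, and compares that with a hash that also covers the port numbers. The names queue_for and queues_used, the queue count, the addresses and the port ranges are all hypothetical, and zlib.crc32 is only a stand-in for whatever hash a real NIC uses; this is not OpenBSD or driver code.

```python
import random
import zlib

NUM_QUEUES = 16  # hypothetical number of receive queues (one per core)


def queue_for(fields, num_queues=NUM_QUEUES):
    """Pick a receive queue from a tuple of header fields.

    zlib.crc32 is only a stand-in for the NIC's real hash function.
    """
    key = "|".join(str(f) for f in fields).encode()
    return zlib.crc32(key) % num_queues


def queues_used(flows, use_ports):
    """Return the set of queues hit by the given flows."""
    used = set()
    for src_ip, dst_ip, src_port, dst_port in flows:
        if use_ports:
            fields = (src_ip, dst_ip, src_port, dst_port)
        else:
            fields = (src_ip, dst_ip)
        used.add(queue_for(fields))
    return used


# One direction of the test traffic: a single IP address pair with
# pseudorandom port numbers (addresses and port ranges are illustrative).
rng = random.Random(4814)  # fixed seed so the run is repeatable
flows = [
    ("198.18.0.2", "198.19.0.2", rng.randint(1024, 65535), rng.randint(1, 49151))
    for _ in range(10000)
]

ip_only = queues_used(flows, use_ports=False)
with_ports = queues_used(flows, use_ports=True)
print("queues used, hash over addresses only:", len(ip_only))
print("queues used, hash over addresses and ports:", len(with_ports))
```

With the address-only hash every packet of one direction lands in the same queue, no matter how many port combinations the tester generates; once the ports are part of the hash input, the same traffic spreads over several queues.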

I saved the output of the top command and copy it here:

36 processes: 35 idle, 1 on processor up  0:12
CPU00 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU01 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
*CPU02 states:  0.0% user,  0.0% nice, 93.8% sys,  6.2% spin, 0.0% intr,  0.0% idle*
CPU03 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU04 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU05 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU06 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU07 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU08 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
*CPU09 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 25.0% intr, 75.0% idle*
CPU10 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU11 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU12 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU13 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU14 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU15 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU16 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU17 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU18 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU19 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU20 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU21 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU22 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU23 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU24 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
*CPU25 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 26.7% intr, 73.3% idle*
CPU26 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU27 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU28 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU29 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU30 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
CPU31 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
Memory: Real: 32M/1397M act/tot Free: 371G Cache: 712M Swap: 0K/256M

As you can see, I made the lines with non-zero CPU utilization *bold*.
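If you want to repeat this check on your own router without reading every line, a small script can pick out the busy cores from saved top output. This is only a sketch: the function name busy_cpus is hypothetical, the sample text is a shortened copy of the output above, and the regular expression assumes the "CPUnn states: ... % idle" layout shown there.

```python
import re

# Matches "CPUnn states: ... <number>% idle" and captures the CPU name
# and its idle percentage.
CPU_LINE = re.compile(r"(CPU\d+) states:.*?(\d+(?:\.\d+)?)% idle")


def busy_cpus(top_output):
    """Return {cpu_name: busy_percent} for every CPU that is not 100% idle."""
    busy = {}
    for name, idle in CPU_LINE.findall(top_output):
        idle_pct = float(idle)
        if idle_pct < 100.0:
            busy[name] = round(100.0 - idle_pct, 1)
    return busy


# Shortened sample taken from the top output above.
sample = """\
CPU00 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
*CPU02 states:  0.0% user,  0.0% nice, 93.8% sys,  6.2% spin, 0.0% intr,  0.0% idle*
*CPU09 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 25.0% intr, 75.0% idle*
*CPU25 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 26.7% intr, 73.3% idle*
CPU31 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 0.0% intr,  100% idle
"""

result = busy_cpus(sample)
for cpu, pct in sorted(result.items()):
    print(cpu, pct)
```

If only one core shows high sys load while the rest stay idle during a speed test, the forwarding path is probably limited by that single core.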

I expect that this issue will be a problem for you, too: the packet forwarding performance of your OpenBSD system will not scale up with the number of CPU cores.

Best regards,

Gábor

On 12/19/2022 5:35 PM, David Hajes wrote:
hi guys,

I have a simple PcEngines APU2 router running the latest OpenBSD stable.

em0 is WAN (bridge to CaTV modem with 1Gbps/100Mbps connectivity with normal 
ether connectivity with DHCP...no special stuff like PPPoE)

em1-3 is in vether/bridge mode with NAT routing to local network.

I have complained to my ISP about the speeds because the line is supposed to run at almost 1Gbps.

results (speedtest.net used by ISP for some reason):

800+/85 Mbps measured by ISP technician directly from CaTV modem.
440/85 Mbps with simple NAT firewall pf.conf based on OpenBSD suggestions
380/80 Mbps with my strict firewall rules

I have used the following guide: http://dant.net.ru/calomel/network_performance.html
No changes, same performance.

Checking out router monitoring

3k packets/s firewall throughput
pf_states lookup max. 12k/s, ~2k/s
CPU bored, max. load 25%
RAM 2.6 GB from 4GB free, swap never used

I am guessing the HW is not the issue.

Are there any known issues with bridging local interfaces and routing/NAT
performance, please?

I tried to Google answers, and there is lots of whining but no real info. It
is supposed to run at double the speed, at least 800Mbps as shown by the ISP technicians.

Any suggestions for bottleneck, please?

Regards

DavidH

