On 09/08/16 12:26, Momtchil Momtchev wrote:
On 09/08/16 03:03, Darren Tucker wrote:
On Fri, Aug 05, 2016 at 11:56:15AM +1000, Darren Tucker wrote:
On Thu, Aug 04, 2016 at 02:46:44PM +0200, Momtchil Momtchev wrote:
[...]
Also, what I find very puzzling is that lower IRQ rates lead to
lower CPU utilization but not to higher throughput?
I just completely disabled the interrupt moderation by bypassing
the code. 1 packet = 1 IRQ. Now I get a whopping 30k+ IRQ/s and the RX
performance is still the same, even a little bit higher - 350 MBit/s -
with the pf rules and everything. Now I am definitely CPU-bound - idle
is about 5%, the interrupt load is about 110% and the system load is
about 80% (the APU has 2 cores). Idle was 50% to 60% with interrupt
moderation. I am testing on a remote board so I can't easily disable pf
(it does NAT). I will try to properly test it with routing-only.
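For what it's worth, the 30k+ IRQ/s figure is consistent with one interrupt per MTU-sized frame. A quick back-of-the-envelope check (the 1500-byte frame size is an assumption, not something measured in the test):

```python
# Sanity check: does ~30k IRQ/s at 350 MBit/s match 1 packet = 1 IRQ?
# Assumes MTU-sized 1500-byte Ethernet frames (an assumption; the
# actual frame sizes in the iperf run were not reported).

LINK_MBIT = 350          # observed RX throughput, MBit/s
FRAME_BYTES = 1500       # assumed frame size

pps = LINK_MBIT * 1_000_000 / (FRAME_BYTES * 8)
print(f"~{pps:,.0f} packets/s")  # roughly 29k pps, close to the 30k+ IRQ/s
```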
Then I did another test with interrupt moderation and multithreaded
iperf. This way the board goes up to 500 MBit/s and idle is down to
practically 0% - which is consistent with your 600 MBit/s result without
pf. So the main problem seems to be the interrupt moderation-induced
latency which confuses the end-point TCP. I also get slightly higher
performance with a larger TCP window. So one should actually aim to
transfer fewer packets per IRQ in order to minimize the latency. Two
packets per IRQ seems a very good compromise. Maybe this explains the
0x5151 value, where the 1 means every second packet?
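The latency explanation fits the classic window/RTT bound: a single TCP stream can move at most one window of data per round trip, so any moderation delay added to the RTT caps per-stream throughput, while parallel iperf streams side-step that per-stream cap. A sketch with made-up numbers (the window size and delays below are assumptions for illustration, not measurements from the APU):

```python
# Illustrative only: single-stream TCP throughput is bounded by
# window / RTT, so extra latency from interrupt moderation lowers
# the ceiling. The window and delays are made-up numbers.

WINDOW_BYTES = 64 * 1024   # assumed 64 KiB TCP window

def ceiling_mbit(rtt_s: float) -> float:
    """Max single-stream throughput in MBit/s for a given RTT."""
    return WINDOW_BYTES * 8 / rtt_s / 1_000_000

print(ceiling_mbit(0.001))    # 1 ms base RTT            -> ~524 MBit/s
print(ceiling_mbit(0.0015))   # +0.5 ms moderation delay -> ~350 MBit/s
```

A larger window or more parallel streams raises the aggregate ceiling without touching the IRQ rate, which is consistent with the multithreaded iperf result above.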