On 09/08/16 12:26, Momtchil Momtchev wrote:
On 09/08/16 03:03, Darren Tucker wrote:
On Fri, Aug 05, 2016 at 11:56:15AM +1000, Darren Tucker wrote:
On Thu, Aug 04, 2016 at 02:46:44PM +0200, Momtchil Momtchev wrote:
[...]


Also, what I find very puzzling is that lower IRQ rates lead to lower CPU utilization, but not to higher throughput.

I just completely disabled interrupt moderation by bypassing the code: 1 packet = 1 IRQ. Now I get a whopping 30k+ IRQ/s and the RX performance is still the same, even a little higher - 350 Mbit/s - with the pf rules and everything. Now I am definitely CPU-bound: idle is about 5%, the interrupt load is about 110% and the system load is about 80% (the APU has 2 cores). With interrupt moderation, idle was 50% to 60%. I am testing on a remote board, so I can't easily disable pf (it does NAT); I will try to properly test with routing only.

Then I did another test with interrupt moderation and multithreaded iperf. This way the board goes up to 500 Mbit/s and idle drops to practically 0%, which is consistent with your 600 Mbit/s result without pf. So the main problem seems to be the latency induced by interrupt moderation, which confuses the TCP endpoint. I also get slightly higher performance with a larger TCP window.

So one should actually aim to transfer fewer packets per IRQ in order to minimize latency; 2 packets per IRQ seems a very good compromise. Maybe this explains the 0x5151 value, where 1 would mean an interrupt every second packet?
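To make the trade-off concrete, here is a minimal back-of-the-envelope sketch (not driver code; the uniform-arrival model, the full-size-frame assumption, and the candidate IRQ rates are all my assumptions, not measurements from this board): at the observed ~350 Mbit/s of 1500-byte frames (~29k packets/s), the chosen IRQ rate determines both how many packets get coalesced per interrupt and the average extra delivery latency the moderation timer adds.

/*
 * Illustrative model of interrupt moderation latency.
 * Assumes uniform packet arrival, full-size 1500-byte frames, and
 * that the NIC holds the IRQ until the moderation timer expires,
 * so a packet waits on average half the inter-interrupt interval.
 */
#include <stdio.h>

int
main(void)
{
	double mbit_per_s = 350.0;		/* observed RX throughput */
	double frame_bits = 1500.0 * 8;		/* full-size Ethernet frame */
	double pkt_per_s = mbit_per_s * 1e6 / frame_bits;	/* ~29k pps */
	double irq_rates[] = { 30000.0, 15000.0, 8000.0, 4000.0 };
	int i;

	for (i = 0; i < 4; i++) {
		double irq = irq_rates[i];
		double pkts_per_irq = pkt_per_s / irq;
		double added_us = 0.5 * 1e6 / irq;	/* avg added latency */
		printf("%6.0f IRQ/s: %4.1f pkts/IRQ, ~%5.1f us added latency\n",
		    irq, pkts_per_irq, added_us);
	}
	return 0;
}

Under this model, ~15k IRQ/s gives roughly the 2 packets per IRQ mentioned above with only ~33 us of added latency, while coalescing more aggressively (4k IRQ/s) costs ~125 us per packet - which would also be consistent with the observation that a larger TCP window or multiple iperf streams hides the added latency.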
