Hi Rick, thanks for your comments.
On Wed, Apr 26, 2006 at 03:26:17PM -0700, Rick Jones wrote:
>Robin Humble wrote:
>>attached is a small patch for e1000 that dynamically changes Interrupt
>>Throttle Rate for best performance - both latency and bandwidth.
>>it makes e1000 look really good on netpipe with a ~28 us latency and
>>890 Mbit/s bandwidth.
>>
>>the basic idea is that high InterruptThrottleRate (~200k) is best for
>>small messages,
>Best for small numbers of small messages?  If one is looking to have
>high aggregate small packet rates, the higher throttle rate may degrade
>the peak PPS one can achieve.

if small is <1kB, and there's a single client, then it looks to me like
the higher the ITR the better.

for a single netpipe client (running 10k repetitions and from 0 byte to
1kB messages), the driver chooses 200k ITR until it gets close to 1kB
messages, when it drops to its next level of 90k ITR. about 15-20% cpu
is used.

<short delay whilst I run some tests>

for 3 netpipe clients (again running 10k repetitions and from 0 byte to
1kB messages, all with the patched e1000 driver), the server is at 200k
ITR until the 3 clients get to ~96 bytes, then it drops to 90k ITR, and
at ~512 byte messages it drops the ITR once more to 30k.

so I think the patched driver is doing the right thing there and
lowering the ITR more rapidly as it gets more clients. but clearly I
should be using netperf to get more accurate cpu numbers and a more
convincing aggregate table :-)

>It is a bit rough/messy as a writeup, but here is what I've seen wrt
>the latency vs throughput tradeoffs:
>ftp://ftp.cup.hp.com/dist/networking/briefs/nic_latency_vs_tput.txt

from a quick read it looks like only the case with 32kB messages,
multiple simultaneous clients, and the driver set to unlimited ITR sees
reduced throughput. is that right?

if so, then I'm not surprised. this graph
  http://www.cita.utoronto.ca/mediawiki/index.php/Image:Cpu.100k.png
shows that (for our hardware etc. etc.) at 32kB the cpu usage with 100k
ITR is already excessive, and unlimited ITR would be worse than
that... :-/

so for 32kB messages and a single client (never mind multiple clients)
I'd agree with your study that unlimited ITR is probably not a good
idea. with a single client doing 32kB messages, my patched driver is
probably doing the right thing as it's at 30k ITR (and at its minimum
ITR of 15k with multiple clients doing 32kB messages).

>>	uint32_t goc = max(adapter->gotcl, adapter->gorcl) / 1000000;
>>	uint32_t itr = goc > 10 ? (goc > 20 ? (goc > 100 ? 15000: 30000): 90000):
>>	               200000;

Hmmmm... I've just noticed that the gotcl/gorcl count is >200M on the
server when 3 clients are doing 32kB netpipes... so I can probably use
goc > 150 or 200 as a threshold to switch to a lower ITR again. maybe
3k or 6k... (a rough sketch of what that could look like is at the end
of this mail)

but overall I'm actually more worried about a mix of small and large
messages than about multiple clients. a large/small mix might well
occur in 'the real world', and it'll be up to 2s until the watchdog
routine can adapt the ITR. potentially that 2s will be spent at 200k
ITR, which is too high for large messages, and up to 2s of cpu will be
burnt needlessly.

can netperf (or some other tool) mix up big and small message sizes
like 'the real world' perhaps does? that might help me find a good
frequency at which to try to adapt the ITR... (eg. 1, 10, 100 or 1000
times a second - the second sketch at the end shows how the thresholds
could be kept the same whatever rate I pick)

cheers,
robin
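
P.S. to make the extra-threshold idea concrete, here's roughly what I
have in mind - an untested sketch only, and the ">150 -> 6k" pair is a
guess to be tuned, not something I've measured. the caller would still
compute goc from max(adapter->gotcl, adapter->gorcl) / 1000000 as in
the current patch.

	/* untested sketch: same table as the ternary in the patch, with
	 * one extra low-ITR level for very busy links.  goc is MB moved
	 * since the last watchdog run, i.e. per ~2s. */
	static uint32_t pick_itr(uint32_t goc)
	{
		if (goc > 150)
			return 6000;	/* new: many clients / very large messages */
		if (goc > 100)
			return 15000;
		if (goc > 20)
			return 30000;
		if (goc > 10)
			return 90000;
		return 200000;		/* small messages: favour latency */
	}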
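
P.P.S. on the "how often to adapt" question: whatever frequency I end
up trying, the thresholds above could stay as they are if the byte
count is first normalised back to a per-2s figure. something like the
following (also untested; bytes_since_last and interval_ms are made-up
names, not existing e1000 fields):

	/* untested: scale the byte count up/down to the equivalent of a
	 * 2000ms watchdog period, then reuse the same thresholds as above */
	static uint32_t pick_itr_scaled(uint64_t bytes_since_last,
					uint32_t interval_ms)
	{
		uint64_t per_2s = bytes_since_last * 2000 / interval_ms;

		return pick_itr(per_2s / 1000000);
	}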