RE: Poorer networking performance in later kernels?

2016-04-19 Thread Butler, Peter
On Tue, Apr 19, 2016 at 9:54 AM, Butler, Peter  wrote:
>> -Original Message-
>> From: Rick Jones [mailto:rick.jon...@hpe.com]
>> Sent: April-15-16 6:37 PM
>> To: Butler, Peter ; netdev@vger.kernel.org
>> Subject: Re: Poorer networking performance in later kernels?
>>
>> On 04/15/2016 02:02 PM, Butler, Peter wrote:
>>> (Please keep me CC'd to all comments/responses)
>>>
>>> I've tried a kernel upgrade from 3.4.2 to 4.4.0 and see a marked 
>>> drop in networking performance.  Nothing was changed on the test 
>>> systems, other than the kernel itself (and kernel modules).  The 
>>> identical .config used to build the 3.4.2 kernel was brought over 
>>> into the
>>> 4.4.0 kernel source tree, and any configuration differences (e.g. 
>>> new parameters, etc.) were taken as default values.
>>>
>>> The testing was performed on the same actual hardware for both 
>>> kernel versions (i.e. take the existing 3.4.2 physical setup, simply 
>>> boot into the (new) kernel and run the same test).  The netperf 
>>> utility was used for benchmarking and the testing was always 
>>> performed on idle systems.
>>>
>>> TCP testing yielded the following results, where the 4.4.0 kernel 
>>> only got about 1/2 of the throughput:
>>>
>>
>>> Recv     Send     Send                               Utilization       Service Demand
>>> Socket   Socket   Message  Elapsed                   Send     Recv     Send     Recv
>>> Size     Size     Size     Time      Throughput      local    remote   local    remote
>>> bytes    bytes    bytes    secs.     10^6bits/s      % S      % S      us/KB    us/KB
>>>
>>> 3.4.2  13631488  13631488   89523   30.01   9370.29   10.14    6.50    0.709    0.454
>>> 4.4.0  13631488  13631488   89523   30.02   5314.03    9.14   14.31    1.127    1.765
>>>
>>> SCTP testing yielded the following results, where the 4.4.0 kernel only got
>>> about 1/3 of the throughput:
>>>
>>> Recv     Send     Send                               Utilization       Service Demand
>>> Socket   Socket   Message  Elapsed                   Send     Recv     Send     Recv
>>> Size     Size     Size     Time      Throughput      local    remote   local    remote
>>> bytes    bytes    bytes    secs.     10^6bits/s      % S      % S      us/KB    us/KB
>>>
>>> 3.4.2  13631488  13631488   89523   30.00   2306.22   13.87   13.19    3.941    3.747
>>> 4.4.0  13631488  13631488   89523   30.01    882.74   16.86   19.14   12.516   14.210
>>>
>>> The same tests were performed a multitude of times, and are always 
>>> consistent (within a few percent).  I've also tried playing with 
>>> various run-time kernel parameters (/proc/sys/kernel/net/...) on the
>>> 4.4.0 kernel to alleviate the issue but have had no success at all.
>>>
>>> I'm at a loss as to what could possibly account for such a discrepancy...
>>>
>>
>> I suspect I am not alone in being curious about the CPU(s) present in the 
>> systems and the model/whatnot of the NIC being used.  I'm also curious as to 
>> why you have what at first glance seem like absurdly large socket buffer 
>> sizes.
>>
>> That said, it looks like you have some Really Big (tm) increases in service 
>> demand.  Many more CPU cycles being consumed per KB of data transferred.
>>
>> Your message size makes me wonder if you were using a 9000 byte MTU.
>>
>> Perhaps in the move from 3.4.2 to 4.4.0 you lost some or all of the 
>> stateless offloads for your NIC(s)?  Running ethtool -k <interface> on both 
>> ends under both kernels might be good.
>>
>> Also, if you did have a 9000 byte MTU under 3.4.2 are you certain you still 
>> had it under 4.4.0?
>>
>> It would (at least to me) also be interesting to run a TCP_RR test comparing 
>> the two kernels.  TCP_RR (at least with the default request/response size of 
>> one byte) doesn't really care about stateless offloads or MTUs and could 
>> show how much difference there is in basic path length (or I suppose in 
>> interrupt coalescing behaviour if the NIC in question has a mildly dodgy 
>> heuristic for such things).
>>
>> happy benchmarking,
>>
>> rick jones
>>
>
>
> I think the issue is resolved.  I had to recompile my 4.4.0 kernel with a few
> options pertaining to the Intel NIC which somehow (?) got left out or
> otherwise clobbered when I ported my 3.4.2 .config to the 4.4.0 kernel source
> tree.  With those changes now in I see essentially identical performance with
> the two kernels.  Sorry for any confusion and/or waste of time here.  My bad.

Re: Poorer networking performance in later kernels?

2016-04-19 Thread David Miller
From: Oliver Hartkopp 
Date: Tue, 19 Apr 2016 17:58:03 +0200

> On 04/19/2016 04:54 PM, Butler, Peter wrote:
> 
>>
>> I think the issue is resolved.  I had to recompile my 4.4.0 kernel
>> with a few options pertaining to the Intel NIC which somehow (?) got
>> left out or otherwise clobbered when I ported my 3.4.2 .config to the
>> 4.4.0 kernel source tree.  With those changes now in I see essentially
>> identical performance with the two kernels.  Sorry for any confusion
>> and/or waste of time here.  My bad.
>>
> 
> Can you please send the relevant changes in the config that caused the
> discussed issue?
> 
> Just in the case other people do a similar kernel upgrade from 3.x to
> a recent kernel and the current defaults lead to this non-optimal
> result.

+1


RE: Poorer networking performance in later kernels?

2016-04-19 Thread Butler, Peter
> I think the issue is resolved.  I had to recompile my 4.4.0 kernel with a few 
> options pertaining to the Intel NIC which somehow (?) got left out or 
> otherwise clobbered when I ported my 3.4.2 .config to the 4.4.0 kernel source 
> tree.  With those changes now in I see essentially identical performance with 
> the two kernels.  Sorry for any confusion and/or waste of time here.  My bad.
>

Can you please send the relevant changes in the config that caused the 
discussed issue?

Just in the case other people do a similar kernel upgrade from 3.x to a recent 
kernel and the current defaults lead to this non-optimal result.

Thanks,
Oliver


Yes I just replied to another email on this thread asking the same thing and 
posted some info.

Cheers,

Peter


Re: Poorer networking performance in later kernels?

2016-04-19 Thread Oliver Hartkopp

On 04/19/2016 04:54 PM, Butler, Peter wrote:



I think the issue is resolved.  I had to recompile my 4.4.0 kernel with a few 
options pertaining to the Intel NIC which somehow (?) got left out or otherwise 
clobbered when I ported my 3.4.2 .config to the 4.4.0 kernel source tree.  With 
those changes now in I see essentially identical performance with the two 
kernels.  Sorry for any confusion and/or waste of time here.  My bad.



Can you please send the relevant changes in the config that caused the 
discussed issue?


Just in the case other people do a similar kernel upgrade from 3.x to a 
recent kernel and the current defaults lead to this non-optimal result.


Thanks,
Oliver



Re: Poorer networking performance in later kernels?

2016-04-19 Thread Josh Hunt
On Tue, Apr 19, 2016 at 9:54 AM, Butler, Peter  wrote:
>> -Original Message-
>> From: Rick Jones [mailto:rick.jon...@hpe.com]
>> Sent: April-15-16 6:37 PM
>> To: Butler, Peter ; netdev@vger.kernel.org
>> Subject: Re: Poorer networking performance in later kernels?
>>
>> On 04/15/2016 02:02 PM, Butler, Peter wrote:
>>> (Please keep me CC'd to all comments/responses)
>>>
>>> I've tried a kernel upgrade from 3.4.2 to 4.4.0 and see a marked drop
>>> in networking performance.  Nothing was changed on the test systems,
>>> other than the kernel itself (and kernel modules).  The identical
>>> .config used to build the 3.4.2 kernel was brought over into the
>>> 4.4.0 kernel source tree, and any configuration differences (e.g. new
>>> parameters, etc.) were taken as default values.
>>>
>>> The testing was performed on the same actual hardware for both kernel
>>> versions (i.e. take the existing 3.4.2 physical setup, simply boot
>>> into the (new) kernel and run the same test).  The netperf utility
>>> was used for benchmarking and the testing was always performed on
>>> idle systems.
>>>
>>> TCP testing yielded the following results, where the 4.4.0 kernel
>>> only got about 1/2 of the throughput:
>>>
>>
>>> Recv     Send     Send                               Utilization       Service Demand
>>> Socket   Socket   Message  Elapsed                   Send     Recv     Send     Recv
>>> Size     Size     Size     Time      Throughput      local    remote   local    remote
>>> bytes    bytes    bytes    secs.     10^6bits/s      % S      % S      us/KB    us/KB
>>>
>>> 3.4.2  13631488  13631488   89523   30.01   9370.29   10.14    6.50    0.709    0.454
>>> 4.4.0  13631488  13631488   89523   30.02   5314.03    9.14   14.31    1.127    1.765
>>>
>>> SCTP testing yielded the following results, where the 4.4.0 kernel only got
>>> about 1/3 of the throughput:
>>>
>>> Recv     Send     Send                               Utilization       Service Demand
>>> Socket   Socket   Message  Elapsed                   Send     Recv     Send     Recv
>>> Size     Size     Size     Time      Throughput      local    remote   local    remote
>>> bytes    bytes    bytes    secs.     10^6bits/s      % S      % S      us/KB    us/KB
>>>
>>> 3.4.2  13631488  13631488   89523   30.00   2306.22   13.87   13.19    3.941    3.747
>>> 4.4.0  13631488  13631488   89523   30.01    882.74   16.86   19.14   12.516   14.210
>>>
>>> The same tests were performed a multitude of times, and are always
>>> consistent (within a few percent).  I've also tried playing with
>>> various run-time kernel parameters (/proc/sys/kernel/net/...) on the
>>> 4.4.0 kernel to alleviate the issue but have had no success at all.
>>>
>>> I'm at a loss as to what could possibly account for such a discrepancy...
>>>
>>
>> I suspect I am not alone in being curious about the CPU(s) present in the 
>> systems and the model/whatnot of the NIC being used.  I'm also curious as to 
>> why you have what at first glance seem like absurdly large socket buffer 
>> sizes.
>>
>> That said, it looks like you have some Really Big (tm) increases in service 
>> demand.  Many more CPU cycles being consumed per KB of data transferred.
>>
>> Your message size makes me wonder if you were using a 9000 byte MTU.
>>
>> Perhaps in the move from 3.4.2 to 4.4.0 you lost some or all of the 
>> stateless offloads for your NIC(s)?  Running ethtool -k <interface> on both 
>> ends under both kernels might be good.
>>
>> Also, if you did have a 9000 byte MTU under 3.4.2 are you certain you still 
>> had it under 4.4.0?
>>
>> It would (at least to me) also be interesting to run a TCP_RR test comparing 
>> the two kernels.  TCP_RR (at least with the default request/response size of 
>> one byte) doesn't really care about stateless offloads or MTUs and could 
>> show how much difference there is in basic path length (or I suppose in 
>> interrupt coalescing behaviour if the NIC in question has a mildly dodgy 
>> heuristic for such things).
>>
>> happy benchmarking,
>>
>> rick jones
>>
>
>
> I think the issue is resolved.  I had to recompile my 4.4.0 kernel with a few 
> options pertaining to the Intel NIC which somehow (?) got left out or 
> otherwise clobbered when I ported my 3.4.2 .config to the 4.4.0 kernel source 
> tree.  With those changes now in I see essentially identical performance with 
> the two kernels.  Sorry for any confusion and/or waste of time here.  My bad.
>
>

Can you share which config options you enabled to get your performance back?

-- 
Josh


RE: Poorer networking performance in later kernels?

2016-04-19 Thread Butler, Peter
> -Original Message-
> From: Rick Jones [mailto:rick.jon...@hpe.com]
> Sent: April-15-16 6:37 PM
> To: Butler, Peter ; netdev@vger.kernel.org
> Subject: Re: Poorer networking performance in later kernels?
>
> On 04/15/2016 02:02 PM, Butler, Peter wrote:
>> (Please keep me CC'd to all comments/responses)
>>
>> I've tried a kernel upgrade from 3.4.2 to 4.4.0 and see a marked drop 
>> in networking performance.  Nothing was changed on the test systems, 
>> other than the kernel itself (and kernel modules).  The identical 
>> .config used to build the 3.4.2 kernel was brought over into the
>> 4.4.0 kernel source tree, and any configuration differences (e.g. new 
>> parameters, etc.) were taken as default values.
>>
>> The testing was performed on the same actual hardware for both kernel 
>> versions (i.e. take the existing 3.4.2 physical setup, simply boot 
>> into the (new) kernel and run the same test).  The netperf utility 
>> was used for benchmarking and the testing was always performed on 
>> idle systems.
>>
>> TCP testing yielded the following results, where the 4.4.0 kernel 
>> only got about 1/2 of the throughput:
>>
>
>> Recv     Send     Send                               Utilization       Service Demand
>> Socket   Socket   Message  Elapsed                   Send     Recv     Send     Recv
>> Size     Size     Size     Time      Throughput      local    remote   local    remote
>> bytes    bytes    bytes    secs.     10^6bits/s      % S      % S      us/KB    us/KB
>>
>> 3.4.2  13631488  13631488   89523   30.01   9370.29   10.14    6.50    0.709    0.454
>> 4.4.0  13631488  13631488   89523   30.02   5314.03    9.14   14.31    1.127    1.765
>>
>> SCTP testing yielded the following results, where the 4.4.0 kernel only got
>> about 1/3 of the throughput:
>>
>> Recv     Send     Send                               Utilization       Service Demand
>> Socket   Socket   Message  Elapsed                   Send     Recv     Send     Recv
>> Size     Size     Size     Time      Throughput      local    remote   local    remote
>> bytes    bytes    bytes    secs.     10^6bits/s      % S      % S      us/KB    us/KB
>>
>> 3.4.2  13631488  13631488   89523   30.00   2306.22   13.87   13.19    3.941    3.747
>> 4.4.0  13631488  13631488   89523   30.01    882.74   16.86   19.14   12.516   14.210
>>
>> The same tests were performed a multitude of times, and are always 
>> consistent (within a few percent).  I've also tried playing with 
>> various run-time kernel parameters (/proc/sys/kernel/net/...) on the
>> 4.4.0 kernel to alleviate the issue but have had no success at all.
>>
>> I'm at a loss as to what could possibly account for such a discrepancy...
>>
>
> I suspect I am not alone in being curious about the CPU(s) present in the 
> systems and the model/whatnot of the NIC being used.  I'm also curious as to 
> why you have what at first glance seem like absurdly large socket buffer 
> sizes.
>
> That said, it looks like you have some Really Big (tm) increases in service 
> demand.  Many more CPU cycles being consumed per KB of data transferred.
>
> Your message size makes me wonder if you were using a 9000 byte MTU.
>
> Perhaps in the move from 3.4.2 to 4.4.0 you lost some or all of the stateless 
> offloads for your NIC(s)?  Running ethtool -k <interface> on both ends under 
> both kernels might be good.
>
> Also, if you did have a 9000 byte MTU under 3.4.2 are you certain you still 
> had it under 4.4.0?
>
> It would (at least to me) also be interesting to run a TCP_RR test comparing 
> the two kernels.  TCP_RR (at least with the default request/response size of 
> one byte) doesn't really care about stateless offloads or MTUs and could show 
> how much difference there is in basic path length (or I suppose in interrupt 
> coalescing behaviour if the NIC in question has a mildly dodgy heuristic for 
> such things).
>
> happy benchmarking,
>
> rick jones
>


I think the issue is resolved.  I had to recompile my 4.4.0 kernel with a few 
options pertaining to the Intel NIC which somehow (?) got left out or otherwise 
clobbered when I ported my 3.4.2 .config to the 4.4.0 kernel source tree.  With 
those changes now in I see essentially identical performance with the two 
kernels.  Sorry for any confusion and/or waste of time here.  My bad.
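
The thread never names the specific options that were lost.  As a hedged illustration only, one way to spot such driver-related differences when carrying a .config forward is to diff the two configs and grep for the NIC driver's symbols (CONFIG_IXGBE* for this adapter); the file paths below are placeholders, not taken from the thread:

# scripts/diffconfig ships with the kernel source tree
cd linux-4.4
scripts/diffconfig /boot/config-3.4.2 .config | grep -iE 'ixgbe|dca'

# Or simply list the driver-related symbols from both configs
grep -E 'CONFIG_IXGBE|CONFIG_DCA' /boot/config-3.4.2 .config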




Re: Poorer networking performance in later kernels?

2016-04-18 Thread Rick Jones

On 04/18/2016 04:27 AM, Butler, Peter wrote:

Hi Rick

Thanks for the reply.

Here is some hardware information, as requested (the two systems are
identical, and are communicating with one another over a 10Gb 
full-duplex Ethernet backplane):

- processor type: Intel(R) Xeon(R) CPU C5528  @ 2.13GHz
- NIC: Intel 82599EB 10Gb XAUI/BX4
- NIC driver: ixgbe version 4.2.1-k (part of 4.4.0 kernel)

As for the buffer sizes, those rather large ones work fine for us
with the 3.4.2 kernel.  However, for the sake of being complete, I
have re-tried the tests with the 'standard' 4.4.0 kernel parameters
for all /proc/sys/net/* values, and the results still were extremely
poor in comparison to the 3.4.2 kernel.

Our MTU is actually just the standard 1500 bytes, however the message
size was chosen to mimic actual traffic which will be segmented.

I ran ethtool -k (indeed I checked all ethtool parameters, not just
those via -k) and the only real difference I could find was in
"large-receive-offload" which was ON in 3.4.2 but OFF in 4.4.0 - so I
used ethtool to change this to match the 3.4.2 settings and re-ran
the tests.  Didn't help :-(   It's possible of course that I have
missed a parameter here or there in comparing the 3.4.2 setup to the
4.4.0 setup.  I also tried running the ethtool config with the latest
and greatest ethtool version (4.5) on the 4.4.0 kernel, as compared
to the old 3.1 version on our 3.4.2 kernel.


So it would seem the stateless offloads are still enabled.  My next 
question would be to wonder if they are still "effective."  To that end, 
you could run a netperf test specifying a particular port number in the 
test-specific portion:


netperf ...   -- -P ,12345

and while that is running something like

tcpdump -s 96 -c 20 -w /tmp/foo.pcap -i <interface> port 12345

then post-processed with the likes of:

tcpdump -n -r /tmp/foo.pcap | grep -v "length 0" | awk '{sum += $NF}END{print "average",sum/NR}'


the intent behind that is to see what the average post-GRO segment size 
happens to be on the receiver and then to compare it between the two 
kernels.  Grepping-away the "length 0" is to avoid counting ACKs and 
look only at data segments.  The specific port number is to avoid 
including any other connections which might happen to have traffic 
passing through at the time.


You could I suspect do the same comparison on the sending side.

There might I suppose be an easier way to get the average segment size - 
perhaps something from looking at ethtool stats - but the stone knives 
and bear skins of tcpdump above would have the added benefit of having a 
packet trace or three for someone to look at if they felt the need.  And 
for that, I would actually suggest starting the capture *before* the 
netperf test so the connection establishment is included.
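
On the "easier way" point, a hedged alternative sketch (not from the thread) is to derive an average from interface counter deltas instead of a capture.  The interface name and the rx_packets/rx_bytes counter names (as reported by ethtool -S for ixgbe) are assumptions here, and depending on the driver those counters may reflect wire packets rather than post-GRO aggregates, so the tcpdump approach above remains the more conclusive one:

# On the receiving system: snapshot the counters, let a 30-second netperf
# TCP_STREAM test run from the sender, then snapshot again.
rxp0=$(ethtool -S eth0 | awk '/^ *rx_packets:/ {print $2; exit}')
rxb0=$(ethtool -S eth0 | awk '/^ *rx_bytes:/ {print $2; exit}')
sleep 35    # test running from the other end during this window
rxp1=$(ethtool -S eth0 | awk '/^ *rx_packets:/ {print $2; exit}')
rxb1=$(ethtool -S eth0 | awk '/^ *rx_bytes:/ {print $2; exit}')
echo "average rx bytes/packet: $(( (rxb1 - rxb0) / (rxp1 - rxp0) ))"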



I performed the TCP_RR test as requested and in that case, the
results are much more comparable.  The old kernel is still better,
but now only around 10% better as opposed to 2-3x better.


Did the service demand change by 10% or just the transaction rate?


However I still contend that the *_STREAM tests are giving us more
pertinent data, since our product application is only getting 1/3 to
1/2 of the performance on the 4.4.0 kernel, and this is the same
thing I see when I use netperf to test.

One other note: I tried running our 3.4.2 and 4.4.0 kernels in a VM
environment on my workstation, so as to take the 'real' production
hardware out of the equation.  When I perform the tests in this setup
the 3.4.2 and 4.4.0 kernels perform identically - just as you would
expect.


Running in a VM will likely change things massively and could I suppose 
mask other behaviour changes.


happy benchmarking,

rick jones
raj@tardy:~$ cat signatures/toppost
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

:)



Any other ideas?  What can I be missing here?

Peter




-Original Message-
From: Rick Jones [mailto:rick.jon...@hpe.com]
Sent: April-15-16 6:37 PM
To: Butler, Peter ; netdev@vger.kernel.org
Subject: Re: Poorer networking performance in later kernels?

On 04/15/2016 02:02 PM, Butler, Peter wrote:

(Please keep me CC'd to all comments/responses)

I've tried a kernel upgrade from 3.4.2 to 4.4.0 and see a marked drop
in networking performance.  Nothing was changed on the test systems,
other than the kernel itself (and kernel modules).  The identical
.config used to build the 3.4.2 kernel was brought over into the
4.4.0 kernel source tree, and any configuration differences (e.g. new
parameters, etc.) were taken as default values.

The testing was performed on the same actual hardware for both kernel
versions (i.e. take the existing 3.4.2 physical setup, simply boot
into the (new) kernel and run the same test).  The netperf utility was used
for benchmarking and the testing was always performed on idle systems.

Re: Poorer networking performance in later kernels?

2016-04-18 Thread Eric Dumazet
On Mon, 2016-04-18 at 16:27 +, Butler, Peter wrote:
> Hi Eric
> 
> Thanks for your response.  My apologies for being late in getting back
> to you - I wasn't able to have access to the lab hardware on the
> weekend.
> 
> I performed your test as suggested - I've provided a side-by-side diff
> of the nstat output below for the SCTP test only (not the TCP test).
> Note that the fields that are output are somewhat different for the
> two kernels - i.e. some fields exist in one but not the other
> (presumably this comes from the kernel internals?).
> 
> Other than seeing 'larger' throughput numbers in this output I'm not
> sure what to take from it - I'm certainly not a networking
> expert :-(   Let me know if there's anything that speaks to you.
> 
> Note that this test was again done on a clean, freshly rebooted and
> idle system.  Let me know if there's any issues with the output format
> of this data in the email.
> 
> Thanks,
> 
> Peter


OK, please do not top-post on netdev.


I can not really comment on SCTP, could you please post numbers with
TCP ?

Thanks !



RE: Poorer networking performance in later kernels?

2016-04-18 Thread Butler, Peter
P.S.  Cancel my comment about some fields existing in one kernel but not the 
other - that was probably just an artefact of the fact that for one kernel the 
value was zero but not in the other kernel (and that I did not run nstat with 
the -z option to output zero counts).  So the data I provided is still good, 
but the 'empty' fields that exist in one kernel or the other can safely be 
assumed to be zero counts where left out.



-Original Message-
From: Butler, Peter 
Sent: April-18-16 12:27 PM
To: 'Eric Dumazet' 
Cc: netdev@vger.kernel.org
Subject: RE: Poorer networking performance in later kernels?

Hi Eric

Thanks for your response.  My apologies for being late in getting back to you - 
I wasn't able to have access to the lab hardware on the weekend.

I performed your test as suggested - I've provided a side-by-side diff of the 
nstat output below for the SCTP test only (not the TCP test).  Note that the 
fields that are output are somewhat different for the two kernels - i.e. some 
fields exist in one but not the other (presumably this comes from the kernel 
internals?).

Other than seeing 'larger' throughput numbers in this output I'm not sure what 
to take from it - I'm certainly not a networking expert :-(   Let me know if 
there's anything that speaks to you.

Note that this test was again done on a clean, freshly rebooted and idle 
system.  Let me know if there's any issues with the output format of this data 
in the email.

Thanks,

Peter

                             3.4.2                  |                             4.4.0
----------------------------------------------------------------------------------------------------
IpInReceives              3457295       0.0         |   IpInReceives              1151189       0.0
IpInDelivers              3457295       0.0         |   IpInDelivers              1151189       0.0
IpOutRequests             6864955       0.0         |   IpOutRequests             2249622       0.0
IcmpInErrors              158           0.0         |   IcmpInErrors              159           0.0
IcmpInTimeExcds           152           0.0         |   IcmpInTimeExcds           151           0.0
IcmpInEchoReps            6             0.0         |   IcmpInEchoReps            8             0.0
IcmpOutErrors             158           0.0         |   IcmpOutErrors             159           0.0
IcmpOutTimeExcds          152           0.0         |   IcmpOutTimeExcds          151           0.0
IcmpOutTimestamps         6             0.0         |   IcmpOutTimestamps         8             0.0
IcmpMsgInType3            152           0.0         |   IcmpMsgInType3            151           0.0
IcmpMsgInType8            6             0.0         |   IcmpMsgInType8            8             0.0
IcmpMsgOutType0           6             0.0         |   IcmpMsgOutType0           8             0.0
IcmpMsgOutType3           152           0.0         |   IcmpMsgOutType3           151           0.0
TcpActiveOpens            1             0.0             TcpActiveOpens            1             0.0
TcpPassiveOpens           3             0.0         |   TcpPassiveOpens           4             0.0
TcpInSegs                 70            0.0         |   TcpInSegs                 117           0.0
TcpOutSegs                66            0.0         |   TcpOutSegs                110           0.0
                                                    |   TcpOutRsts                24            0.0
UdpInDatagrams            608           0.0         |   UdpInDatagrams            604           0.0
UdpNoPorts                152           0.0         |   UdpNoPorts                151           0.0
UdpOutDatagrams           760           0.0         |   UdpOutDatagrams           755           0.0
                                                    |   UdpIgnoredMulti           144           0.0
TcpExtTW                  2             0.0
TcpExtDelayedACKs         3             0.0         |   TcpExtDelayedACKs         4             0.0
TcpExtTCPHPHits           25            0.0         |   TcpExtTCPHPHits           41            0.0
TcpExtTCPPureAcks         12            0.0         |   TcpExtTCPPureAcks         14            0.0
TcpExtTCPHPAcks           18            0.0         |   TcpExtTCPHPAcks           26            0.0
                                                    |   TcpExtTCPRcvCoalesce      12            0.0
                                                    |   TcpExtTCPOrigDataSent     57            0.0
IpExtInBcastPkts          152           0.0         |   IpExtInBcastPkts          144           0.0
IpExtInOctets             166191161     0.0         |   IpExtInOctets             55395212      0.0
IpExtOutOctets            9107586685    0.0         |   IpExtOutOctets            2983660504    0.0
IpExtInBcastOctets        37356         0.0         |   IpExtInBcastOctets        35328         0.0
                                                    |   IpExtInNoECTPkts          1175          0.0
                                                    |   IpExtInECT0Pkts           1150014       0.0

RE: Poorer networking performance in later kernels?

2016-04-18 Thread Butler, Peter
Hi Eric

Thanks for your response.  My apologies for being late in getting back to you - 
I wasn't able to have access to the lab hardware on the weekend.

I performed your test as suggested - I've provided a side-by-side diff of the 
nstat output below for the SCTP test only (not the TCP test).  Note that the 
fields that are output are somewhat different for the two kernels - i.e. some 
fields exist in one but not the other (presumably this comes from the kernel 
internals?).

Other than seeing 'larger' throughput numbers in this output I'm not sure what 
to take from it - I'm certainly not a networking expert :-(   Let me know if 
there's anything that speaks to you.

Note that this test was again done on a clean, freshly rebooted and idle 
system.  Let me know if there's any issues with the output format of this data 
in the email.

Thanks,

Peter

                             3.4.2                  |                             4.4.0
----------------------------------------------------------------------------------------------------
IpInReceives              3457295       0.0         |   IpInReceives              1151189       0.0
IpInDelivers              3457295       0.0         |   IpInDelivers              1151189       0.0
IpOutRequests             6864955       0.0         |   IpOutRequests             2249622       0.0
IcmpInErrors              158           0.0         |   IcmpInErrors              159           0.0
IcmpInTimeExcds           152           0.0         |   IcmpInTimeExcds           151           0.0
IcmpInEchoReps            6             0.0         |   IcmpInEchoReps            8             0.0
IcmpOutErrors             158           0.0         |   IcmpOutErrors             159           0.0
IcmpOutTimeExcds          152           0.0         |   IcmpOutTimeExcds          151           0.0
IcmpOutTimestamps         6             0.0         |   IcmpOutTimestamps         8             0.0
IcmpMsgInType3            152           0.0         |   IcmpMsgInType3            151           0.0
IcmpMsgInType8            6             0.0         |   IcmpMsgInType8            8             0.0
IcmpMsgOutType0           6             0.0         |   IcmpMsgOutType0           8             0.0
IcmpMsgOutType3           152           0.0         |   IcmpMsgOutType3           151           0.0
TcpActiveOpens            1             0.0             TcpActiveOpens            1             0.0
TcpPassiveOpens           3             0.0         |   TcpPassiveOpens           4             0.0
TcpInSegs                 70            0.0         |   TcpInSegs                 117           0.0
TcpOutSegs                66            0.0         |   TcpOutSegs                110           0.0
                                                    |   TcpOutRsts                24            0.0
UdpInDatagrams            608           0.0         |   UdpInDatagrams            604           0.0
UdpNoPorts                152           0.0         |   UdpNoPorts                151           0.0
UdpOutDatagrams           760           0.0         |   UdpOutDatagrams           755           0.0
                                                    |   UdpIgnoredMulti           144           0.0
TcpExtTW                  2             0.0
TcpExtDelayedACKs         3             0.0         |   TcpExtDelayedACKs         4             0.0
TcpExtTCPHPHits           25            0.0         |   TcpExtTCPHPHits           41            0.0
TcpExtTCPPureAcks         12            0.0         |   TcpExtTCPPureAcks         14            0.0
TcpExtTCPHPAcks           18            0.0         |   TcpExtTCPHPAcks           26            0.0
                                                    |   TcpExtTCPRcvCoalesce      12            0.0
                                                    |   TcpExtTCPOrigDataSent     57            0.0
IpExtInBcastPkts          152           0.0         |   IpExtInBcastPkts          144           0.0
IpExtInOctets             166191161     0.0         |   IpExtInOctets             55395212      0.0
IpExtOutOctets            9107586685    0.0         |   IpExtOutOctets            2983660504    0.0
IpExtInBcastOctets        37356         0.0         |   IpExtInBcastOctets        35328         0.0
                                                    |   IpExtInNoECTPkts          1175          0.0
                                                    |   IpExtInECT0Pkts           1150014       0.0



-Original Message-
From: Eric Dumazet [mailto:eric.duma...@gmail.com] 
Sent: April-18-16 8:17 AM
To: Butler, Peter 
Cc: netdev@vger.kernel.org
Subject: Re: Poorer networking performance in later kernels?

On Fri, 2016-04-15 at 15:33 -0700, Eric Dumazet wrote:
> On Fri, 2016-04-15 at 21:02 +, Butler, Peter wrote:

Re: Poorer networking performance in later kernels?

2016-04-18 Thread Eric Dumazet
On Fri, 2016-04-15 at 15:33 -0700, Eric Dumazet wrote:
> On Fri, 2016-04-15 at 21:02 +, Butler, Peter wrote:
> > (Please keep me CC'd to all comments/responses)
> > 
> > I've tried a kernel upgrade from 3.4.2 to 4.4.0 and see a marked drop in 
> > networking performance.  Nothing was changed on the test systems, other 
> > than the kernel itself (and kernel modules).  The identical .config used to 
> > build the 3.4.2 kernel was brought over into the 4.4.0 kernel source tree, 
> > and any configuration differences (e.g. new parameters, etc.) were taken as 
> > default values.
> > 
> > The testing was performed on the same actual hardware for both kernel 
> > versions (i.e. take the existing 3.4.2 physical setup, simply boot into the 
> > (new) kernel and run the same test).  The netperf utility was used for 
> > benchmarking and the testing was always performed on idle systems.
> > 
> > TCP testing yielded the following results, where the 4.4.0 kernel only got 
> > about 1/2 of the throughput:
> > 
> >  Recv     Send     Send                               Utilization       Service Demand
> >  Socket   Socket   Message  Elapsed                   Send     Recv     Send     Recv
> >  Size     Size     Size     Time      Throughput      local    remote   local    remote
> >  bytes    bytes    bytes    secs.     10^6bits/s      % S      % S      us/KB    us/KB
> > 
> > 3.4.2  13631488  13631488   89523   30.01   9370.29   10.14    6.50    0.709    0.454
> > 4.4.0  13631488  13631488   89523   30.02   5314.03    9.14   14.31    1.127    1.765
> > 
> > SCTP testing yielded the following results, where the 4.4.0 kernel only got
> > about 1/3 of the throughput:
> > 
> >  Recv     Send     Send                               Utilization       Service Demand
> >  Socket   Socket   Message  Elapsed                   Send     Recv     Send     Recv
> >  Size     Size     Size     Time      Throughput      local    remote   local    remote
> >  bytes    bytes    bytes    secs.     10^6bits/s      % S      % S      us/KB    us/KB
> > 
> > 3.4.2  13631488  13631488   89523   30.00   2306.22   13.87   13.19    3.941    3.747
> > 4.4.0  13631488  13631488   89523   30.01    882.74   16.86   19.14   12.516   14.210
> > 
> > The same tests were performed a multitude of times, and are always 
> > consistent (within a few percent).  I've also tried playing with various 
> > run-time kernel parameters (/proc/sys/kernel/net/...) on the 4.4.0 kernel 
> > to alleviate the issue but have had no success at all.
> > 
> > I'm at a loss as to what could possibly account for such a discrepancy...
> 
> Maybe new kernel is faster and you have drops somewhere ?
> 
> nstat >/dev/null
> netperf -H ...
> nstat
> 
> Would help
> 

Are you receiving my mails, or simply ignoring them ?

Thanks.





RE: Poorer networking performance in later kernels?

2016-04-18 Thread Butler, Peter
Just a minor clarification to my last paragraph ("When I perform the tests in 
this setup the 3.4.2 and 4.4.0 kernels perform identically - just as you would 
expect.").  By this I don't mean that the 3.4.2 and 4.4.0 kernels on the VMs 
perform identically to the 3.4.2 and 4.4.0 kernels on the actual hardware; what 
I mean is that in VM-land the original problem is essentially gone, as I get 
the same throughput with either kernel.



-Original Message-
From: Butler, Peter 
Sent: April-18-16 7:28 AM
To: 'Rick Jones' ; netdev@vger.kernel.org
Subject: RE: Poorer networking performance in later kernels?

Hi Rick

Thanks for the reply.

Here is some hardware information, as requested (the two systems are identical, 
and are communicating with one another over a 10Gb full-duplex Ethernet 
backplane):

- processor type: Intel(R) Xeon(R) CPU C5528  @ 2.13GHz
- NIC: Intel 82599EB 10Gb XAUI/BX4
- NIC driver: ixgbe version 4.2.1-k (part of 4.4.0 kernel)

As for the buffer sizes, those rather large ones work fine for us with the 
3.4.2 kernel.  However, for the sake of being complete, I have re-tried the 
tests with the 'standard' 4.4.0 kernel parameters for all /proc/sys/net/* 
values, and the results still were extremely poor in comparison to the 3.4.2 
kernel.

Our MTU is actually just the standard 1500 bytes, however the message size was 
chosen to mimic actual traffic which will be segmented.

I ran ethtool -k (indeed I checked all ethtool parameters, not just those via 
-k) and the only real difference I could find was in "large-receive-offload" 
which was ON in 3.4.2 but OFF in 4.4.0 - so I used ethtool to change this to 
match the 3.4.2 settings and re-ran the tests.  Didn't help :-(   It's possible 
of course that I have missed a parameter here or there in comparing the 3.4.2 
setup to the 4.4.0 setup.  I also tried running the ethtool config with the 
latest and greatest ethtool version (4.5) on the 4.4.0 kernel, as compared to 
the old 3.1 version on our 3.4.2 kernel.

I performed the TCP_RR test as requested and in that case, the results are much 
more comparable.  The old kernel is still better, but now only around 10% 
better as opposed to 2-3x better.

However I still contend that the *_STREAM tests are giving us more pertinent 
data, since our product application is only getting 1/3 to 1/2 of the 
performance on the 4.4.0 kernel, and this is the same thing I see when I use 
netperf to test.

One other note: I tried running our 3.4.2 and 4.4.0 kernels in a VM environment 
on my workstation, so as to take the 'real' production hardware out of the 
equation.  When I perform the tests in this setup the 3.4.2 and 4.4.0 kernels 
perform identically - just as you would expect.

Any other ideas?  What can I be missing here?

Peter




-Original Message-
From: Rick Jones [mailto:rick.jon...@hpe.com]
Sent: April-15-16 6:37 PM
To: Butler, Peter ; netdev@vger.kernel.org
Subject: Re: Poorer networking performance in later kernels?

On 04/15/2016 02:02 PM, Butler, Peter wrote:
> (Please keep me CC'd to all comments/responses)
>
> I've tried a kernel upgrade from 3.4.2 to 4.4.0 and see a marked drop 
> in networking performance.  Nothing was changed on the test systems, 
> other than the kernel itself (and kernel modules).  The identical 
> .config used to build the 3.4.2 kernel was brought over into the
> 4.4.0 kernel source tree, and any configuration differences (e.g. new 
> parameters, etc.) were taken as default values.
>
> The testing was performed on the same actual hardware for both kernel 
> versions (i.e. take the existing 3.4.2 physical setup, simply boot 
> into the (new) kernel and run the same test).  The netperf utility was 
> used for benchmarking and the testing was always performed on idle 
> systems.
>
> TCP testing yielded the following results, where the 4.4.0 kernel only 
> got about 1/2 of the throughput:
>

> Recv     Send     Send                               Utilization       Service Demand
> Socket   Socket   Message  Elapsed                   Send     Recv     Send     Recv
> Size     Size     Size     Time      Throughput      local    remote   local    remote
> bytes    bytes    bytes    secs.     10^6bits/s      % S      % S      us/KB    us/KB
>
> 3.4.2  13631488  13631488   89523   30.01   9370.29   10.14    6.50    0.709    0.454
> 4.4.0  13631488  13631488   89523   30.02   5314.03    9.14   14.31    1.127    1.765
>
> SCTP testing yielded the following results, where the 4.4.0 kernel only got
> about 1/3 of the throughput:
>
> Recv     Send     Send                               Utilization       Service Demand
> Socket   Socket   Message  Elapsed                   Send     Recv     Send     Recv
> Size     Size     Size     Time      Throughput      local    remote   local    remote
> bytes    bytes    bytes    secs.     10^6bits/s      % S      % S      us/KB    us/KB
>
> 3.4.2  13631488  13631488   89523   30.00   2306.22   13.87   13.19    3.941    3.747
> 4.4.0  13631488  13631488   89523   30.01    882.74   16.86   19.14   12.516   14.210

RE: Poorer networking performance in later kernels?

2016-04-18 Thread Butler, Peter
Hi Rick

Thanks for the reply.

Here is some hardware information, as requested (the two systems are identical, 
and are communicating with one another over a 10Gb full-duplex Ethernet 
backplane):

- processor type: Intel(R) Xeon(R) CPU C5528  @ 2.13GHz
- NIC: Intel 82599EB 10Gb XAUI/BX4
- NIC driver: ixgbe version 4.2.1-k (part of 4.4.0 kernel)

As for the buffer sizes, those rather large ones work fine for us with the 
3.4.2 kernel.  However, for the sake of being complete, I have re-tried the 
tests with the 'standard' 4.4.0 kernel parameters for all /proc/sys/net/* 
values, and the results still were extremely poor in comparison to the 3.4.2 
kernel.

Our MTU is actually just the standard 1500 bytes, however the message size was 
chosen to mimic actual traffic which will be segmented.

I ran ethtool -k (indeed I checked all ethtool parameters, not just those via 
-k) and the only real difference I could find was in "large-receive-offload" 
which was ON in 3.4.2 but OFF in 4.4.0 - so I used ethtool to change this to 
match the 3.4.2 settings and re-ran the tests.  Didn't help :-(   It's possible 
of course that I have missed a parameter here or there in comparing the 3.4.2 
setup to the 4.4.0 setup.  I also tried running the ethtool config with the 
latest and greatest ethtool version (4.5) on the 4.4.0 kernel, as compared to 
the old 3.1 version on our 3.4.2 kernel.
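
For reference, a minimal sketch of the LRO comparison described above (the interface name is an assumption):

ethtool -k eth0 | grep large-receive-offload   # check the current state under 4.4.0
ethtool -K eth0 lro on                         # turn LRO back on to match the 3.4.2 setting
ethtool -k eth0 | grep large-receive-offload   # confirm, then re-run the netperf tests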

I performed the TCP_RR test as requested and in that case, the results are much 
more comparable.  The old kernel is still better, but now only around 10% 
better as opposed to 2-3x better.

However I still contend that the *_STREAM tests are giving us more pertinent 
data, since our product application is only getting 1/3 to 1/2 of the 
performance on the 4.4.0 kernel, and this is the same thing I see when I use 
netperf to test.

One other note: I tried running our 3.4.2 and 4.4.0 kernels in a VM environment 
on my workstation, so as to take the 'real' production hardware out of the 
equation.  When I perform the tests in this setup the 3.4.2 and 4.4.0 kernels 
perform identically - just as you would expect.

Any other ideas?  What can I be missing here?

Peter




-Original Message-
From: Rick Jones [mailto:rick.jon...@hpe.com] 
Sent: April-15-16 6:37 PM
To: Butler, Peter ; netdev@vger.kernel.org
Subject: Re: Poorer networking performance in later kernels?

On 04/15/2016 02:02 PM, Butler, Peter wrote:
> (Please keep me CC'd to all comments/responses)
>
> I've tried a kernel upgrade from 3.4.2 to 4.4.0 and see a marked drop 
> in networking performance.  Nothing was changed on the test systems, 
> other than the kernel itself (and kernel modules).  The identical 
> .config used to build the 3.4.2 kernel was brought over into the
> 4.4.0 kernel source tree, and any configuration differences (e.g. new 
> parameters, etc.) were taken as default values.
>
> The testing was performed on the same actual hardware for both kernel 
> versions (i.e. take the existing 3.4.2 physical setup, simply boot 
> into the (new) kernel and run the same test).  The netperf utility was 
> used for benchmarking and the testing was always performed on idle 
> systems.
>
> TCP testing yielded the following results, where the 4.4.0 kernel only 
> got about 1/2 of the throughput:
>

> Recv     Send     Send                               Utilization       Service Demand
> Socket   Socket   Message  Elapsed                   Send     Recv     Send     Recv
> Size     Size     Size     Time      Throughput      local    remote   local    remote
> bytes    bytes    bytes    secs.     10^6bits/s      % S      % S      us/KB    us/KB
>
> 3.4.2  13631488  13631488   89523   30.01   9370.29   10.14    6.50    0.709    0.454
> 4.4.0  13631488  13631488   89523   30.02   5314.03    9.14   14.31    1.127    1.765
>
> SCTP testing yielded the following results, where the 4.4.0 kernel only got
> about 1/3 of the throughput:
>
> Recv     Send     Send                               Utilization       Service Demand
> Socket   Socket   Message  Elapsed                   Send     Recv     Send     Recv
> Size     Size     Size     Time      Throughput      local    remote   local    remote
> bytes    bytes    bytes    secs.     10^6bits/s      % S      % S      us/KB    us/KB
>
> 3.4.2  13631488  13631488   89523   30.00   2306.22   13.87   13.19    3.941    3.747
> 4.4.0  13631488  13631488   89523   30.01    882.74   16.86   19.14   12.516   14.210
>
> The same tests were performed a multitude of times, and are always 
> consistent (within a few percent).  I've also tried playing with 
> various run-time kernel parameters (/proc/sys/kernel/net/...) on the
> 4.4.0 kernel to alleviate the issue but have had no success at all.

Re: Poorer networking performance in later kernels?

2016-04-15 Thread Rick Jones

On 04/15/2016 02:02 PM, Butler, Peter wrote:

(Please keep me CC'd to all comments/responses)

I've tried a kernel upgrade from 3.4.2 to 4.4.0 and see a marked drop
in networking performance.  Nothing was changed on the test systems,
other than the kernel itself (and kernel modules).  The identical
.config used to build the 3.4.2 kernel was brought over into the
4.4.0 kernel source tree, and any configuration differences (e.g. new
parameters, etc.) were taken as default values.
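
A hedged illustration of that carry-forward procedure, not taken from the thread (paths are placeholders):

cp /boot/config-3.4.2 linux-4.4/.config
cd linux-4.4
make olddefconfig      # keep the existing answers, take defaults for every new option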

The testing was performed on the same actual hardware for both kernel
versions (i.e. take the existing 3.4.2 physical setup, simply boot
into the (new) kernel and run the same test).  The netperf utility
was used for benchmarking and the testing was always performed on
idle systems.

TCP testing yielded the following results, where the 4.4.0 kernel
only got about 1/2 of the throughput:




Recv     Send     Send                               Utilization       Service Demand
Socket   Socket   Message  Elapsed                   Send     Recv     Send     Recv
Size     Size     Size     Time      Throughput      local    remote   local    remote
bytes    bytes    bytes    secs.     10^6bits/s      % S      % S      us/KB    us/KB

3.4.2  13631488  13631488   89523   30.01   9370.29   10.14    6.50    0.709    0.454
4.4.0  13631488  13631488   89523   30.02   5314.03    9.14   14.31    1.127    1.765

SCTP testing yielded the following results, where the 4.4.0 kernel only got
about 1/3 of the throughput:

Recv     Send     Send                               Utilization       Service Demand
Socket   Socket   Message  Elapsed                   Send     Recv     Send     Recv
Size     Size     Size     Time      Throughput      local    remote   local    remote
bytes    bytes    bytes    secs.     10^6bits/s      % S      % S      us/KB    us/KB

3.4.2  13631488  13631488   89523   30.00   2306.22   13.87   13.19    3.941    3.747
4.4.0  13631488  13631488   89523   30.01    882.74   16.86   19.14   12.516   14.210

The same tests were performed a multitude of times, and are always
consistent (within a few percent).  I've also tried playing with
various run-time kernel parameters (/proc/sys/kernel/net/...) on the
4.4.0 kernel to alleviate the issue but have had no success at all.

I'm at a loss as to what could possibly account for such a discrepancy...



I suspect I am not alone in being curious about the CPU(s) present in 
the systems and the model/whatnot of the NIC being used.  I'm also 
curious as to why you have what at first glance seem like absurdly large 
socket buffer sizes.
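
For context, the thread never shows the exact netperf invocation; a hedged example of pinning such socket buffer and message sizes explicitly on the command line (host address is a placeholder, and the reported sizes also depend on kernel limits and SO_*BUF doubling) would be:

netperf -H 192.0.2.10 -t TCP_STREAM -l 30 -- -s 13631488 -S 13631488 -m 89523
# test-specific -s/-S request local/remote socket buffer sizes, -m sets the send message size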


That said, it looks like you have some Really Big (tm) increases in 
service demand.  Many more CPU cycles being consumed per KB of data 
transferred.


Your message size makes me wonder if you were using a 9000 byte MTU.

Perhaps in the move from 3.4.2 to 4.4.0 you lost some or all of the 
stateless offloads for your NIC(s)?  Running ethtool -k <interface> on 
both ends under both kernels might be good.
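
A minimal sketch of that comparison (interface name and file names are placeholders):

# On each end, under each kernel, capture the offload settings:
ethtool -k eth0 > /tmp/offloads-$(uname -r).txt
# After booting the other kernel and repeating, diff the captures:
diff /tmp/offloads-3.4.2.txt /tmp/offloads-4.4.0.txt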


Also, if you did have a 9000 byte MTU under 3.4.2 are you certain you 
still had it under 4.4.0?


It would (at least to me) also be interesting to run a TCP_RR test 
comparing the two kernels.  TCP_RR (at least with the default 
request/response size of one byte) doesn't really care about stateless 
offloads or MTUs and could show how much difference there is in basic 
path length (or I suppose in interrupt coalescing behaviour if the NIC 
in question has a mildly dodgy heuristic for such things).
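
A hedged example of that comparison, with a placeholder host address:

netperf -H 192.0.2.10 -t TCP_RR -l 30       # single-byte request/response by default
netperf -H 192.0.2.10 -t TCP_STREAM -l 30   # the bulk-throughput test, for contrast

Comparing the transaction rate and service demand from the first command across the two kernels isolates basic per-packet path cost from the offload and MTU effects that dominate the stream test.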


happy benchmarking,

rick jones



Re: Poorer networking performance in later kernels?

2016-04-15 Thread Eric Dumazet
On Fri, 2016-04-15 at 21:02 +, Butler, Peter wrote:
> (Please keep me CC'd to all comments/responses)
> 
> I've tried a kernel upgrade from 3.4.2 to 4.4.0 and see a marked drop in 
> networking performance.  Nothing was changed on the test systems, other than 
> the kernel itself (and kernel modules).  The identical .config used to build 
> the 3.4.2 kernel was brought over into the 4.4.0 kernel source tree, and any 
> configuration differences (e.g. new parameters, etc.) were taken as default 
> values.
> 
> The testing was performed on the same actual hardware for both kernel 
> versions (i.e. take the existing 3.4.2 physical setup, simply boot into the 
> (new) kernel and run the same test).  The netperf utility was used for 
> benchmarking and the testing was always performed on idle systems.
> 
> TCP testing yielded the following results, where the 4.4.0 kernel only got 
> about 1/2 of the throughput:
> 
>   Recv     Send     Send                               Utilization       Service Demand
>   Socket   Socket   Message  Elapsed                   Send     Recv     Send     Recv
>   Size     Size     Size     Time      Throughput      local    remote   local    remote
>   bytes    bytes    bytes    secs.     10^6bits/s      % S      % S      us/KB    us/KB
> 
> 3.4.2  13631488  13631488   89523   30.01   9370.29   10.14    6.50    0.709    0.454
> 4.4.0  13631488  13631488   89523   30.02   5314.03    9.14   14.31    1.127    1.765
> 
> SCTP testing yielded the following results, where the 4.4.0 kernel only got
> about 1/3 of the throughput:
> 
>   Recv     Send     Send                               Utilization       Service Demand
>   Socket   Socket   Message  Elapsed                   Send     Recv     Send     Recv
>   Size     Size     Size     Time      Throughput      local    remote   local    remote
>   bytes    bytes    bytes    secs.     10^6bits/s      % S      % S      us/KB    us/KB
> 
> 3.4.2  13631488  13631488   89523   30.00   2306.22   13.87   13.19    3.941    3.747
> 4.4.0  13631488  13631488   89523   30.01    882.74   16.86   19.14   12.516   14.210
> 
> The same tests were performed a multitude of times, and are always consistent 
> (within a few percent).  I've also tried playing with various run-time kernel 
> parameters (/proc/sys/kernel/net/...) on the 4.4.0 kernel to alleviate the 
> issue but have had no success at all.
> 
> I'm at a loss as to what could possibly account for such a discrepancy...

Maybe new kernel is faster and you have drops somewhere ?

nstat >/dev/null
netperf -H ...
nstat

Would help
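
A slightly fleshed-out, hedged version of that sequence (host address is a placeholder); the grep simply narrows the after-test output to counters that typically indicate drops or retransmissions:

nstat > /dev/null                           # snapshot so the next nstat shows deltas
netperf -H 192.0.2.10 -t TCP_STREAM -l 30
nstat | grep -Ei 'retrans|drop|fail|prune|overflow|loss'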